LLM Token Tree
Exploring intuitions around token selection

Enter a prompt. We'll visualize the tokens that get generated in response. At each step you can see some of the possible tokens, optionally showing the token IDs, probabilities, and paths not taken. You can run the same prompt many times and overlay the results. Currently uses the OpenAI model gpt-4o-mini.

How many times to run the model. More runs give a better sense of the distribution of tokens. (max 100)
How many tokens to generate. 10 is probably fine. (max 50)
How many token alternatives to show at each step. Setting this above 0 requests the top token choices ranked by probability. 0 and 3 are good starting values. (max 10)
Seed for the "random" number generator. Set it to -1 for a random seed. The same seed with the same parameters should give the same result. Kinda! Determinism is best-effort, and in practice the logprobs can still vary between runs.
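The parameters above map pretty directly onto an API request. Here's a minimal sketch, assuming the OpenAI Chat Completions API (the field names `n`, `max_tokens`, `logprobs`, `top_logprobs`, and `seed` are real API parameters; the helper name `build_request` is made up for illustration):

```python
def build_request(prompt, runs=10, max_tokens=10, alternatives=3, seed=-1):
    """Sketch of the request parameters a tool like this might send."""
    params = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "n": runs,                # number of completions to generate
        "max_tokens": max_tokens, # tokens to generate per completion
        "logprobs": True,         # return per-token log probabilities
    }
    if alternatives > 0:
        # Also return the top-N alternative tokens at each step.
        params["top_logprobs"] = alternatives
    if seed != -1:
        # Best-effort determinism; omitted entirely for a random seed.
        params["seed"] = seed
    return params
```

With `alternatives=0` and `seed=-1`, the `top_logprobs` and `seed` fields are simply left out of the request.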
The token ID is the number that represents the token in the model's vocabulary. Vocabulary sizes vary: many open models use around 32k tokens, while recent OpenAI models like gpt-4o use a vocabulary of roughly 200k tokens. Note in particular that some tokens look pretty much the same but one has a space in front of it. The token ID makes it easy to tell them apart.
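The space-prefix point is easy to miss on screen, so here's a toy illustration. The IDs below are made up for the example, not taken from a real vocabulary:

```python
# Hypothetical vocabulary fragment: "the" and " the" render almost
# identically, but they are distinct entries with distinct token IDs.
toy_vocab = {"the": 1820, " the": 279}

def describe(token):
    """Show the ID alongside a repr() that makes the leading space visible."""
    return f"ID {toy_vocab[token]}: {token!r}"
```

`repr()` quotes the string, so `describe(" the")` makes the leading space obvious where plain printing would not.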
"Log Probs" aka log probabilities are assigned to each token at each position. Lower (more negative) numbers mean less probable. A logprob of zero means the token is certain to be chosen, but displayed values are rounded, so a shown 0 might be not-quite-zero.
Each time we re-run the prompt we MIGHT follow the same path we did before. This counts how many times we generated this token at this step.
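Overlaying runs amounts to counting how many runs passed through each node of the tree, i.e. each token-sequence prefix. A minimal sketch of that bookkeeping (the helper name `count_prefixes` is made up for illustration):

```python
from collections import Counter

def count_prefixes(runs):
    """Count how many runs passed through each node of the token tree.

    Each run is a list of generated tokens; each node is identified by
    the prefix of tokens leading to it.
    """
    counts = Counter()
    for tokens in runs:
        for step in range(1, len(tokens) + 1):
            counts[tuple(tokens[:step])] += 1
    return counts
```

Two runs that start with the same token share the root branch, so that branch's count is 2 even if the runs diverge afterwards.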
Show a node at the end of each tree branch with a summary of the overall completion.
Maybe you like a top-down tree, which works better for long completions or narrow screens.