Why This Matters
When you type a message to an LLM, it does not see letters or words — it sees tokens. Tokens are the fundamental units that a language model reads and generates. How text is split into tokens affects everything: cost, speed, accuracy, and what the model can even "see."
The context window is the model's working memory — the maximum number of tokens it can process at once. Understanding tokenization explains why LLMs struggle to count letters in "strawberry," why code costs more than prose, and why your conversation eventually gets "forgotten."
Visual Model
The full process at a glance.
Text is split into subword tokens via BPE, converted to IDs, embedded as vectors, and processed within the fixed context window.
Code Example
// Simple BPE-style tokenizer demonstration
// In practice, use tiktoken or the model's tokenizer

// Simulate how BPE might tokenize words
const vocabulary = {
  "hello": 1024,
  "world": 2048,
  "un": 512,
  "happi": 789,
  "ness": 345,
  ",": 11,
  "!": 0,
  " ": 220,
};
function simpleTokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    let matched = false;
    // Greedy: try the longest match first
    for (let len = Math.min(10, text.length - i); len > 0; len--) {
      const substr = text.slice(i, i + len).toLowerCase();
      if (vocabulary[substr] !== undefined) {
        tokens.push({ text: substr, id: vocabulary[substr] });
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      tokens.push({ text: text[i], id: -1 }); // unknown character
      i++;
    }
  }
  return tokens;
}
console.log(simpleTokenize("Hello, world!"));
// Shows how text becomes token objects
// Why "strawberry" is tricky
const word = "strawberry";
console.log("Characters:", word.split(""));
console.log("Length:", word.length);
// A tokenizer might split this into ["straw", "berry"]
// The model sees 2 tokens, not 10 characters
// It cannot easily count individual letters!
// Context window math
const contextWindow = 128000; // Claude's context
const avgTokensPerWord = 1.3;
const wordsPerPage = 300;
const pagesInContext = contextWindow / (avgTokensPerWord * wordsPerPage);
console.log("Approximate pages in context:", Math.round(pagesInContext));
Interactive Experiment
Try these exercises:
- Count the letters 'r' in "strawberry" yourself. Now think: if the tokenizer splits it into ["straw", "berry"], how would a model that only sees token-level representations count 'r's?
- Estimate how many tokens your last message to an LLM used. Multiply your word count by 1.3 as an approximation.
- What happens when a very long conversation exceeds the context window? Try sending very long messages and observe when the model starts "forgetting" earlier parts.
- Look up the token count for code vs. natural language. Why does code typically use more tokens per "concept" than prose?
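The word-count estimate in the second exercise can be sketched as a tiny helper. Note that 1.3 tokens per word is only a rough heuristic for English prose, and `approxTokens` is a name chosen here for illustration:

```javascript
// Rough token estimate for English prose: word count × 1.3 (heuristic only)
function approxTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  return Math.round(words.length * 1.3);
}

console.log(approxTokens("The quick brown fox jumps over the lazy dog")); // 9 words ≈ 12 tokens
```

Real tokenizers will diverge from this estimate, especially on code, rare words, and non-English text.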
Coding Challenge
Write a function called `estimateTokens` that takes a string and returns an estimated token count. Use these rules: split on spaces to get words, then each word contributes 1 token for every 4 characters (minimum 1 token per word). Also count each punctuation character (.,!?;:) as 1 separate token.
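One possible solution sketch, under one interpretation of the rules (punctuation characters are stripped from words before counting, and "1 token for every 4 characters" is read as rounding up):

```javascript
// Sketch solution: 1 token per 4 characters per word (min 1), punctuation counted separately
function estimateTokens(text) {
  const punctuation = new Set([".", ",", "!", "?", ";", ":"]);
  let count = 0;
  for (const rawWord of text.split(" ")) {
    let letters = 0;
    for (const ch of rawWord) {
      if (punctuation.has(ch)) {
        count += 1; // each punctuation character is its own token
      } else {
        letters += 1;
      }
    }
    if (letters > 0) {
      count += Math.max(1, Math.ceil(letters / 4)); // min 1 token per word
    }
  }
  return count;
}

console.log(estimateTokens("Hello, world!")); // 2 + 2 word tokens, 2 punctuation tokens → 6
```

Try writing your own version before reading this one; then compare your estimates against a real tokenizer's counts.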
Real-World Usage
Tokenization and context windows directly affect how you use LLMs in production:
- API pricing: OpenAI, Anthropic, and Google all charge per token. Efficient prompting saves money.
- Multilingual gaps: BPE tokenizers trained mostly on English split non-English text into more tokens, making other languages "cost" more and perform worse.
- Code generation: Programming languages often tokenize inefficiently — brackets, semicolons, and indentation all consume tokens.
- Document analysis: Context window size determines how much of a document the model can "see" at once. RAG systems work around limits by retrieving only relevant chunks.
- Chat applications: Long conversations overflow the context window. Systems must decide what to keep and what to summarize.
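The pricing point above can be made concrete with a back-of-envelope calculator. The per-million-token rates below are hypothetical placeholders, not any provider's published prices — check the provider's pricing page for real figures:

```javascript
// Hypothetical rates in dollars per million tokens (NOT real published prices)
const INPUT_RATE = 3.0;
const OUTPUT_RATE = 15.0;

function estimateCost(inputTokens, outputTokens) {
  return (inputTokens / 1e6) * INPUT_RATE + (outputTokens / 1e6) * OUTPUT_RATE;
}

// A chat turn with a 2,000-token prompt and a 500-token reply:
console.log(estimateCost(2000, 500).toFixed(4)); // "0.0135" dollars at these assumed rates
```

Output tokens typically cost more than input tokens, which is why verbose model replies add up faster than long prompts.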