A token in AI is the small chunk of text a language model actually reads and produces: often a whole common word, and often just a piece of a longer one. Before a model can process your sentence, it breaks the text into these pieces, called tokens, using a fixed vocabulary. The model then predicts the next token, one at a time, to build its reply. In practice one token is roughly four characters of English, or about three-quarters of a word. That is why token count, not word count, determines how much you pay and how much text a model can handle at once.
Why models use tokens instead of words
Feeding a model one letter at a time makes inputs very long and expensive to process, while a vocabulary of every possible word would be enormous and would still miss new or misspelled words. Tokenization sits between the two by building a vocabulary of common word pieces. Frequent words like "the" become a single token, while a rare or long word gets split into several. A longer word such as "tokenization" might break into "token" and "ization." This keeps the vocabulary manageable while still letting the model represent any text, including typos, code, and other languages.
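The splitting idea can be sketched with a toy tokenizer. This is a minimal Python sketch, assuming a tiny hand-picked vocabulary and a simple greedy longest-match rule; `VOCAB` and `tokenize` are hypothetical names for illustration. Real models learn vocabularies of tens of thousands of pieces, typically with byte-pair encoding or a similar algorithm, so their actual splits will differ.

```python
import string

# Hypothetical toy vocabulary: a few word pieces plus a space and single letters,
# so any plain-ASCII word can still be spelled out when no bigger piece matches.
VOCAB = {"the", "token", "ization", "iz", "ation", " "} | set(string.ascii_letters)


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match: at each position, take the longest piece found in vocab."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until some piece matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No piece matched (a character outside the vocabulary).
            tokens.append("<unk>")
            i += 1
    return tokens


print(tokenize("tokenization"))  # ['token', 'ization']
```

A misspelling such as `"tokenizaton"` still tokenizes under this sketch, just into more and smaller pieces, because single letters are in the vocabulary. That is the trade-off the paragraph above describes: rare text costs more tokens, but nothing is unrepresentable.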
How tokens map to text
The mapping is not one token per word. It depends on the language and content.
| Content | Rough token cost |
|---|---|
| Common English word | About 1 token |
| Average English | About 0.75 words per token |
| 1,000 words of prose | Around 1,300 tokens |
| Code or JSON | More tokens per character |
| Non-English text | Often more tokens per word |
English prose is usually the cheapest per word. Code, punctuation-heavy text, and many non-English scripts fit fewer words into the same token budget, so the same idea can cost noticeably more tokens.
Why tokens matter for cost and limits
Two practical things ride on tokens. First, pricing: most AI APIs bill per token, and both your input (the prompt) and the output (the reply) count. A long document you paste in is billed even though you wrote almost none of it. Second, the context window: a model can only hold a fixed number of tokens at once, and that ceiling is measured in tokens too. When a conversation outgrows it, earlier text has to be dropped or summarized, or the request is rejected. If you are deciding between approaches for grounding a model in long documents, see what a context window is and how RAG works.
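Both constraints reduce to simple arithmetic on token counts. The sketch below is illustrative only: `Pricing`, `request_cost`, and `fits_context` are hypothetical names, the per-million-token rates and the 128,000-token window are made-up numbers, and it assumes the common rule that the prompt plus the maximum reply length must fit inside one shared window. Check your provider's documentation for real prices and limits.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical rates in dollars per million tokens; real prices vary by provider and model.
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Bill both sides of the exchange: the prompt (input) and the reply (output)."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Assumes prompt and reply share one window, so room for the reply is reserved."""
    return input_tokens + max_output_tokens <= context_window


# A 50,000-token pasted document plus a 1,000-token reply, at the hypothetical rates.
rates = Pricing(input_per_million=3.0, output_per_million=15.0)
print(f"${request_cost(50_000, 1_000, rates):.3f}")         # $0.165
print(fits_context(50_000, 1_000, context_window=128_000))  # True
```

In this made-up example the pasted document accounts for $0.150 of the $0.165 total, even though the reply is priced higher per token, which matches the point above: long inputs you did not write tend to dominate the bill.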
How to estimate and control tokens
- Estimate fast: multiply word count by roughly 1.3 to get tokens for English.
- Trim the prompt: remove boilerplate, repeated instructions, and stale chat history.
- Cap the output: set a max-tokens limit so a model does not ramble and run up cost.
- Watch long inputs: pasted PDFs and transcripts are usually the real cost driver, not your questions.
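The estimate-and-trim advice above can be sketched in a few lines of Python. `estimate_tokens` and `trim_history` are hypothetical helpers: they apply the 1.3-tokens-per-word and 4-characters-per-token rules of thumb rather than a real tokenizer, and taking the larger of the two is an assumption made so the estimate errs on the high side. For exact counts, use the tokenizer that matches your model; the output cap itself is a setting on the request you send to your provider.

```python
def estimate_tokens(text: str) -> int:
    """Rough English estimate using two rules of thumb, keeping the larger (safer) one."""
    words = len(text.split())
    by_words = (words * 13 + 9) // 10  # ceil(words * 1.3), in integer math
    by_chars = (len(text) + 3) // 4    # ceil(characters / 4)
    return max(by_words, by_chars)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the estimated total fits the token budget."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # stale history goes first
    return kept


history = ["a " * 100, "b " * 100, "What is a token?"]
print([estimate_tokens(m) for m in history])  # [130, 130, 6]
print(len(trim_history(history, budget=150)))  # 2
```

Dropping the oldest messages first is one simple policy; an application could instead summarize them, at the cost of an extra model call.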
What to skip
- Do not count tokens by hand for casual chats. It only matters at scale or with very long inputs.
- Do not assume one token equals one word. That undercounts cost, especially for code.
- Do not forget output tokens. Many providers charge more per output token than per input token, so a long reply can cost more than the prompt.
FAQ
How many words is a token?
On average about three-quarters of a word in English, so 1,000 tokens is roughly 750 words. Code and other languages use more tokens per word.
Do I pay for both input and output tokens?
Yes. Most providers bill the prompt you send and the reply you get back, and both fill the context window.
Why does my non-English or code prompt cost more?
Those split into more tokens per character, so the same content uses more of your budget than plain English would.
Is a token the same as a character?
No. A token is usually several characters, often a word piece, not a single letter.
Where to go next
Learn what a context window is, see how a large language model works, and understand what AI inference is.