What Are Input vs Output Tokens, and Why Are They Priced Differently?
Every AI API invoice has two line items: input tokens and output tokens. Output almost always costs more per token than input, typically 4 to 8 times as much. Here's the plain-English explanation of what tokens are, how the two types differ, and what it means for your monthly AI bill.
Input tokens are the words you send to the model (your prompt, context, documents). Output tokens are the words the model generates back. Output costs more because the model generates each output token sequentially, one full computation per token, while all input tokens are processed together in a single parallel pass.
What Is a Token?
Before exploring input vs output, it helps to understand what a token is. A token is the fundamental unit of data that Large Language Models process. Models don't read text letter by letter as humans do; they read sequences of tokens.
As a general rule of thumb for English text, 1 token is approximately 0.75 words (or about 4 characters). This means a 1,500-word blog post translates to roughly 2,000 tokens. Punctuation, spaces, and special coding characters often count as their own distinct tokens. Every API request is billed purely on how many tokens it consumes.
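A back-of-the-envelope sketch of the rule of thumb above (real counts depend on the model's tokenizer; this is only the ~0.75 words-per-token heuristic):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text: ~0.75 words per token."""
    words = len(text.split())
    return round(words / 0.75)

# A 1,500-word blog post lands near 2,000 tokens.
post = "word " * 1500
print(estimate_tokens(post))  # → 2000
```

For production billing estimates you would use the provider's actual tokenizer, since punctuation and code can shift the ratio considerably.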
Input Tokens Explained
Input tokens (sometimes called prompt tokens) are everything you send to the AI model in your API request.
This total includes your system prompt, the conversation history you resend with each turn, any Retrieval-Augmented Generation (RAG) documents you attach, and the user's query itself. When you make an API call, you pay for the model to "read" all of these input tokens in order to understand the context before it replies.
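As an illustration, here is how those pieces add up on a single request. The message contents are invented, and the counter uses the rough 4-characters-per-token rule rather than a real tokenizer:

```python
def rough_tokens(text: str) -> int:
    # ~4 characters per English token, per the rule of thumb above
    return max(1, len(text) // 4)

# Everything below is billed as input on this one request.
system_prompt = "You are a concise support assistant."
history = [
    "User: My invoice doubled last month.",
    "Assistant: Let's look at your usage together.",
]
rag_docs = ["Billing policy: overages are billed at the end of the cycle."]
user_query = "Why did my bill go up?"

parts = [system_prompt, *history, *rag_docs, user_query]
input_tokens = sum(rough_tokens(p) for p in parts)
print(input_tokens)
```

Note that the history is re-sent (and re-billed) on every turn, which is why long conversations get progressively more expensive on the input side.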
Output Tokens Explained
Output tokens (sometimes called completion tokens) are everything the AI model generates and sends back to you.
This is the actual answer, the summarised text, the generated code block, or the JSON object. You pay for the model to "write" these tokens. As you likely noticed on pricing pages, this "writing" process is significantly more expensive than the "reading" process.
Why Output Costs More: The Technical Reason
Processing input tokens is essentially a highly parallelised "reading" operation. GPU architectures can ingest and map the relationships of thousands of tokens simultaneously. This parallel efficiency is exactly why input tokens are cheap.
The key difference is that generation is an autoregressive process. The model must predict the next token, append it to the context, and then predict the next one.
Because output generation is sequential, it cannot be parallelised the way reading input can. The GPU must wait for token N before it can compute token N+1. This bottleneck demands significantly more compute time and energy per token, which explains the typical 4–8x price premium.
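The loop below is a toy sketch of that autoregressive process, with a stand-in `predict_next` function in place of a real model forward pass: the whole prompt is ingested at once, but each new token requires its own full computation over everything generated so far.

```python
def predict_next(context: list[str]) -> str:
    # Stand-in for one full forward pass of the model.
    return "tok" + str(len(context))

def generate(prompt_tokens: list[str], max_new: int) -> list[str]:
    context = list(prompt_tokens)      # entire prompt read in one parallel pass
    for _ in range(max_new):
        nxt = predict_next(context)    # one full computation per output token
        context.append(nxt)            # token N must exist before token N+1
    return context[len(prompt_tokens):]

print(generate(["Hello", ","], 3))  # three sequential passes for three tokens
```

The `for` loop is the cost driver: there is no way to compute the third token before the second exists, so generation time (and price) scales with output length.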
The Input-to-Output Price Ratio Across Models
Every major AI provider prices output tokens at a premium. The ratio ranges from 1.5x (DeepSeek V3.2) to 8x (GPT-5). See our full GPT-5 vs DeepSeek V3 pricing comparison for model-by-model details.
DeepSeek V3.2's unusually low output:input ratio (1.5x vs the industry norm of 4–8x) makes it especially cost-effective for generation-heavy tasks like writing, code generation, and long-form summarisation.
What This Means for Your Costs
The input/output split in your workload has a larger impact on your bill than the model you choose. A workflow that reads 10,000 tokens and outputs 50 words is almost entirely input-driven; one that outputs long essays from short prompts is almost entirely output-driven.
Typical Input:Output Ratios by Task Type
- Classification / sentiment tagging: ~100:1 (read 1,000 tokens, output a one-word label)
- Document summarisation: ~10:1 (read 1,000 tokens, output 100)
- RAG-based Q&A: ~4:1 (800 tokens of retrieved context, ~200 tokens of output)
- Customer support chatbot: ~2:1 (short user message plus history, short reply)
- Code generation / creative writing: ~1:3 or 1:4 (short prompt, long generated output)
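To see how the split drives cost, here is a sketch using illustrative prices ($1/M input, $5/M output are placeholders, not any provider's actual rates). Two requests with the same total token count produce very different bills:

```python
PRICE_IN, PRICE_OUT = 1.00, 5.00  # illustrative $ per million tokens

def request_cost(input_toks: int, output_toks: int) -> float:
    """Cost of one API request at the illustrative rates above."""
    return input_toks / 1e6 * PRICE_IN + output_toks / 1e6 * PRICE_OUT

# 1,200 total tokens either way, but the split changes the bill.
print(f"classification:  ${request_cost(1190, 10):.6f}")
print(f"code generation: ${request_cost(300, 900):.6f}")
```

The generation-heavy request costs several times more despite touching the same number of tokens, which is the practical takeaway from the ratios above.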
Since output tokens dominate cost in generation-heavy workflows, adding output-length constraints to your prompts directly reduces your bill. Instructions like "Reply in under 150 words" or "Use bullet points only" can cut output token volume by 40–60%. On GPT-5 at $10/M output tokens, trimming 1,000 output tokens per run across 10,000 monthly runs saves $100/month. Use the Burn Rate Calculator to model the impact.
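The savings arithmetic is simple enough to fold into a one-line helper (a sketch; plug in your own run counts and your model's actual output rate):

```python
def monthly_savings(tokens_saved_per_run: int, runs_per_month: int,
                    out_price_per_m: float) -> float:
    """Dollars saved per month by trimming output tokens from each run."""
    return tokens_saved_per_run * runs_per_month / 1e6 * out_price_per_m

# 1,000 output tokens trimmed per run, 10,000 runs/month, $10/M output
print(monthly_savings(1_000, 10_000, 10.0))  # → 100.0
```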
Context Caching: Reducing Input Costs
Context caching stores a repeated prompt prefix at a heavily discounted rate. Both DeepSeek and OpenAI apply it automatically when they detect a repeated prefix; no SDK changes are required. For apps that prepend the same 2,000-token system prompt to every API call, caching can reduce effective input costs by 75–90%.
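A sketch of that effect, assuming a hypothetical 90% discount on cached prefix tokens (actual discount rates and cache rules vary by provider):

```python
def effective_input_cost(prefix_toks: int, fresh_toks: int,
                         price_per_m: float, cache_discount: float = 0.90) -> float:
    """Input cost when the prompt prefix is served from cache."""
    cached = prefix_toks / 1e6 * price_per_m * (1 - cache_discount)
    fresh = fresh_toks / 1e6 * price_per_m
    return cached + fresh

# 2,000-token system prompt cached, 200-token user message fresh, $1/M input
full = (2000 + 200) / 1e6 * 1.0
hit = effective_input_cost(2000, 200, 1.0)
print(f"cached: ${hit:.6f} vs uncached: ${full:.6f}")
print(f"input cost reduced by {1 - hit / full:.0%}")
```

The bigger the shared prefix relative to the per-request suffix, the closer the saving gets to the discount rate itself.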
See Tokens in Context
Visualise what 2,000 tokens looks like in real-world terms (tweets, books, codebases) and compare live costs across models for your exact workload.
Open Token Visualizer →