1M+
Token context window
Approximately 2,500 pages of text, 11 hours of audio, or 1 hour of video. Matching Claude (1M) and nearly 8x larger than GPT (128K).
Context window comparison (tokens):
GPT: 128K tokens
Claude: 1M tokens
Gemini: 1M tokens
At a glance: roughly 2,500 pages of text per prompt, 11 hours of audio in a single request, and reduced cost through context caching.
When you might not need RAG
For many use cases, the 1M token context window is large enough to skip the complexity of retrieval-augmented generation (RAG) entirely. Instead of building an embedding pipeline, vector database, and retrieval logic, you can often include all relevant documents directly in the prompt. This dramatically simplifies the architecture at the cost of higher per-request token usage, which context caching can offset.
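One way to make this decision concrete is a simple token-budget check: estimate the token count of your documents and only fall back to RAG when they exceed the window. The sketch below is a hypothetical helper, assuming a rough heuristic of ~4 characters per token; real tokenizers vary by model, so treat the threshold as approximate.

```python
# Decide whether documents fit in the context window (skip RAG)
# or exceed it (fall back to retrieval).
# Assumption: ~4 characters per English token; not a real tokenizer.

CONTEXT_WINDOW = 1_000_000   # tokens, for a 1M-context model
RESERVED = 50_000            # head-room for the question and the answer

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return len(text) // 4

def fits_in_context(documents: list[str]) -> bool:
    """True if all documents can be sent directly in one prompt."""
    total = sum(estimate_tokens(d) for d in documents)
    return total <= CONTEXT_WINDOW - RESERVED

# 100 documents of ~3,000 characters each is about 75K tokens,
# comfortably inside the window, so no retrieval pipeline is needed.
docs = ["lorem ipsum " * 250] * 100
print(fits_in_context(docs))
```

In practice you would pair this with context caching: the large, stable document block is cached once, and only the short, changing question incurs full-price tokens on each request.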