Cost Optimization: Caching, Batching, and Model Selection

All models, all tiers · 2 min read

Cost per million input tokens by optimization method:

90% — max savings with prompt caching (cache reads bill at 0.1x the base input rate)
50% — Batch API discount on all tokens
100K — max requests per batch

                       Haiku 4.5                 Sonnet 4.6                  Opus 4.6
Input / MTok           $1                        $3                          $5
Output / MTok          $5                        $15                         $25
Batch input / MTok     $0.50                     $1.50                       $2.50
Batch output / MTok    $2.50                     $7.50                       $12.50
Cache read / MTok      $0.10                     $0.30                       $0.50
Best for               Routing, classification   Most production workloads   Complex reasoning, agents
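To compare the tiers on a concrete workload, a back-of-envelope estimate helps. This is a sketch using the per-MTok prices from the table above; the traffic figures (200M input, 20M output tokens per month) and the `monthly_cost` helper are illustrative, not part of any SDK.

```python
# $/MTok prices from the pricing table above.
PRICES = {
    "Haiku 4.5":  {"input": 1.0, "output": 5.0},
    "Sonnet 4.6": {"input": 3.0, "output": 15.0},
    "Opus 4.6":   {"input": 5.0, "output": 25.0},
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Undiscounted monthly spend for a given volume (in millions of tokens)."""
    p = PRICES[model]
    return input_mtok * p["input"] + output_mtok * p["output"]

# Hypothetical workload: 200M input tokens, 20M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 200, 20):,.2f}")
```

At this volume the spread is wide (Haiku $300 vs. Opus $1,500 per month), which is why routing cheap traffic to a smaller model matters before any discounts are applied.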

Stack your discounts

Prompt caching and Batch API discounts stack. For a large evaluation run with cached system prompts, the cached portion of the input bills at the cache read rate (0.1x the base input price), and the Batch API's 50% discount then applies on top of everything. On an Opus 4.6 workload, this brings the effective input cost for cached tokens from $5/MTok down to $0.25/MTok in batch mode.
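The stacking arithmetic above can be sketched as a small helper. The function name and the `cached_fraction` parameter are my own for illustration; the 0.1x cache read rate and 50% batch discount come from the text.

```python
def effective_input_cost(base_per_mtok: float,
                         cached_fraction: float,
                         use_batch: bool = True) -> float:
    """Blended $/MTok for input tokens under stacked discounts.

    cached_fraction: share of input tokens served from the prompt cache,
    billed at 0.1x the base rate. The Batch API then halves the total.
    """
    cache_rate = 0.1 * base_per_mtok
    blended = cached_fraction * cache_rate + (1 - cached_fraction) * base_per_mtok
    return blended * (0.5 if use_batch else 1.0)

# Opus 4.6 at $5/MTok base: a fully cached prompt in batch mode
print(effective_input_cost(5.0, cached_fraction=1.0))  # 0.25
```

In practice `cached_fraction` is below 1.0 (the user turn is never cached), so the realized rate lands somewhere between the batch rate and the fully cached floor.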