K2.5 Architecture, Benchmarks, and Operating Modes


At a glance:

- 1T total model parameters
- 32B active per token (MoE)

| Benchmark | K2.5 Score | Context |
| --- | --- | --- |
| SWE-Bench Verified | 76.8 | Outperforms Gemini Pro 3.1 |
| SWE-Bench Multilingual | 73.0 | Outperforms GPT-5.2 and Gemini Pro 3.1 |
| AIME 2025 | 96.1 | Advanced math competition problems |
| GPQA Diamond | 87.6 | Graduate-level science questions |
| HLE (with tools) | 50.2 | At 76% lower cost than Claude Opus 4.6 |
| LiveCodeBench | 85.0 | Real-time coding evaluation |
| MathVision | 84.2 | Mathematical visual reasoning |
| OCRBench | 92.3 | Optical character recognition accuracy |
| VideoMMMU | 86.6 | Video understanding and reasoning |
| WeirdML | 46% | Unusual/edge-case reasoning (lower than competitors) |
| Mode | Speed | Best For |
| --- | --- | --- |
| Instant | Fastest | Quick questions, simple tasks, casual conversation |
| Thinking | Medium | Math, logic, complex analysis, coding problems |
| Agent | Slower | Research, multi-step tasks, file processing |
| Agent Swarm (beta) | Variable | Deep research, large codebase analysis, comprehensive investigations |

Why MoE matters for cost

With only 32 billion parameters active per token out of 1 trillion total, K2.5 achieves frontier performance while using a fraction of the compute per inference. This is why API pricing can be $0.60/$3.00 per million tokens while matching models that cost 5x to 25x more.
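The arithmetic behind this claim can be sketched in a few lines. The parameter counts and the $0.60/$3.00 pricing come from this section; treating $0.60 as the input rate and $3.00 as the output rate, and using a flat 5x competitor multiple, are assumptions for illustration:

```python
# Back-of-the-envelope MoE cost arithmetic using figures from the text.
# Assumptions: 1T total params, 32B active per token, and the $0.60/$3.00
# per-million-token price interpreted as input/output rates.

TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B routed to each token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")  # 3.2%

# Price per million tokens (input, output)
k25_input, k25_output = 0.60, 3.00
# Hypothetical competitor at the low end of the article's 5x-25x range
rival_input, rival_output = k25_input * 5, k25_output * 5

def job_cost(input_mtok: float, output_mtok: float,
             in_price: float, out_price: float) -> float:
    """Cost in dollars for a job sized in millions of input/output tokens."""
    return input_mtok * in_price + output_mtok * out_price

# Example workload: 10M input tokens, 2M output tokens
print(job_cost(10, 2, k25_input, k25_output))    # 12.0
print(job_cost(10, 2, rival_input, rival_output))  # 60.0
```

Only about 3% of the network's weights participate in any single forward pass, which is what lets per-token pricing sit well below that of comparably sized dense models.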