Moonshot AI Releases Kimi K2.7-Code with 30% Better Token Efficiency
The new Mixture-of-Experts model slashes reasoning overhead while boosting agentic coding performance across complex software engineering workflows.
Moonshot AI has released Kimi K2.7-Code, a coding-focused agentic model designed to tackle complex, long-horizon software engineering workflows. Built on top of the previous Kimi K2.6, this new model introduces a major practical upgrade for developers: a roughly 30% reduction in thinking-token usage. By optimizing reasoning efficiency without sacrificing capability, Kimi K2.7-Code presents an intriguing option for both local inference and agentic developer tooling.
Under the Hood: A Massive Mixture-of-Experts Architecture
Kimi K2.7-Code uses a Mixture-of-Experts (MoE) architecture designed to balance high capacity with computational efficiency. While the model has 1 trillion total parameters, it activates only 32 billion per token, roughly 3.2% of the total. Per-token compute therefore scales with the 32 billion active parameters rather than the full trillion, although all of the weights must still be stored and served.
Key architectural specifications include:
- Layers: 61 total layers (including 1 dense layer).
- Experts: 384 total experts, with 8 selected experts per token and 1 shared expert.
- Attention: Multi-head Latent Attention (MLA) with a hidden dimension of 7168.
- Context Window: A 256K context length (262,144 tokens, i.e. 256 × 1,024).
- Activation: SwiGLU.
- Vision Capabilities: Includes the MoonViT vision encoder (400M parameters) for multimodal tasks.
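To make the sparsity figures concrete, the short calculation below works out how much of the model is touched per token. It is a back-of-the-envelope sketch that uses only the numbers quoted above. The constant names are illustrative, the round parameter counts are treated as exact, and the shared expert is assumed to be counted separately from the 384 routed experts.

```python
# Back-of-the-envelope sparsity arithmetic from the figures quoted in the article.
# Assumption: the round parameter counts are exact and the shared expert is
# counted separately from the 384 routed experts.
TOTAL_PARAMS = 1_000_000_000_000   # ~1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # ~32 billion activated per token
TOTAL_EXPERTS = 384                # routed experts
SELECTED_EXPERTS = 8               # routed experts chosen per token
SHARED_EXPERTS = 1                 # always-on shared expert

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
routed_fraction = SELECTED_EXPERTS / TOTAL_EXPERTS
experts_per_token = SELECTED_EXPERTS + SHARED_EXPERTS

print(f"Active parameters per token: {active_fraction:.1%}")
print(f"Routed experts used per token: {routed_fraction:.2%}")
print(f"Experts evaluated per token (routed + shared): {experts_per_token}")
```

Under these assumptions, only a few percent of the parameters and routed experts take part in any single token's forward pass, which is where the efficiency of the MoE design comes from.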
Benchmarking Performance and Agentic Tool Use
To prove its mettle in real-world scenarios, Moonshot AI evaluated Kimi K2.7-Code against its predecessor, as well as frontier models like GPT-5.5 (running in Codex with xhigh mode) and Claude Opus 4.8 (running in Claude Code with max effort/xhigh mode).
The model was put through several rigorous benchmarks:
- Kimi Code Bench v2: An in-house benchmark covering 10+ programming languages and full-stack production tasks (backend, infrastructure, security, frontend).
- Program Bench: A challenging test where agents must recreate a program's behavior from a compiled binary and documentation alone, passing thousands of fuzz-generated tests.
- MLS Bench Lite: A 30-task subset evaluating an agent's ability to invent scalable machine learning methods.
- Agentic Benchmarks: Including Kimi Claw 24/7 (multi-day coworking tasks), MCP Atlas, and MCP Mark Verified (evaluating Model Context Protocol tool-use across environments like GitHub, Postgres, and Playwright).
Here is how Kimi K2.7-Code stacks up:
| Benchmark | Kimi K2.6 | Kimi K2.7-Code | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 | 69.0 | 67.4 |
| Program Bench | 48.3 | 53.6 | 69.1 | 63.8 |
| MLS Bench Lite | 26.7 | 35.1 | 35.5 | 42.8 |
| Kimi Claw 24/7 Bench | 42.9 | 46.9 | 52.8 | 50.4 |
| MCP Atlas | 69.4 | 76.0 | 79.4 | 81.3 |
| MCP Mark Verified | 72.8 | 81.1 | 92.9 | 76.4 |
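While at least one frontier model still leads on every benchmark, Kimi K2.7-Code shows a substantial leap over K2.6. The largest gain is on Kimi Code Bench v2 (+11.1 points), followed by machine learning engineering (MLS Bench Lite, +8.4) and tool use (MCP Mark Verified, +8.3), where it also edges past Claude Opus 4.8 (81.1 vs. 76.4).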
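For readers who want to slice the table themselves, the following sketch recomputes the per-benchmark gain over K2.6 and the remaining gap to the better of the two frontier models. The scores are copied from the table; the `summarize` helper and its output format are illustrative choices, not part of any published evaluation tooling.

```python
# Scores copied from the table: (K2.6, K2.7-Code, GPT-5.5, Claude Opus 4.8).
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0, 69.0, 67.4),
    "Program Bench": (48.3, 53.6, 69.1, 63.8),
    "MLS Bench Lite": (26.7, 35.1, 35.5, 42.8),
    "Kimi Claw 24/7 Bench": (42.9, 46.9, 52.8, 50.4),
    "MCP Atlas": (69.4, 76.0, 79.4, 81.3),
    "MCP Mark Verified": (72.8, 81.1, 92.9, 76.4),
}


def summarize(scores):
    """Return (benchmark, gain over K2.6, gap to the best frontier score) rows."""
    rows = []
    for name, (k26, k27, gpt, claude) in scores.items():
        gain = round(k27 - k26, 1)                 # improvement over the predecessor
        gap = round(max(gpt, claude) - k27, 1)     # distance to the stronger frontier model
        rows.append((name, gain, gap))
    return rows


rows = summarize(SCORES)
# Print benchmarks ordered by the size of the improvement over K2.6.
for name, gain, gap in sorted(rows, key=lambda r: r[1], reverse=True):
    print(f"{name:<22} gain over K2.6: {gain:+5.1f}   gap to best frontier: {gap:5.1f}")
```

Sorting by gain makes it easy to see where the new model improved most, while the gap column shows how far it still trails the strongest competitor on each benchmark.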
Deployment and Local Inference Integration
For developers looking to run the model locally or integrate it into custom dev-tooling, Kimi K2.7-Code supports native INT4 quantization (reusing the same method as Kimi-K2-Thinking). Because it shares its underlying architecture with Kimi K2.5 and K2.6, existing deployment configurations can be directly reused.
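As a rough sense of scale for local deployment, the sketch below estimates the storage needed for the weights alone. It assumes, purely for illustration, that every one of the roughly 1 trillion parameters is stored at the stated bit width. It ignores quantization scales, any layers kept at higher precision, the KV cache, and activations, so the INT4 figure should be read as a floor rather than a hardware requirement.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~1 trillion parameters, all stored at the same precision.
TOTAL_PARAMS = 1e12

int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
print(f"INT4 weights:   ~{int4_gb:,.0f} GB")
print(f"16-bit weights: ~{bf16_gb:,.0f} GB")
```

Even with the fourfold reduction relative to 16-bit weights, the model remains a multi-GPU or multi-node deployment in this simplified estimate.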
The model is officially recommended for execution on several popular open-source inference engines:
To run the model via Hugging Face transformers, ensure your environment meets the dependency requirement: transformers >=4.57.1, <5.0.0.
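A quick way to confirm that an environment falls inside that range is sketched below. The helper names are hypothetical, and the parsing is deliberately simplified: it compares only the numeric release components and ignores pre-release or development suffixes, so it is not a full implementation of Python's version-specifier rules.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '4.57.1' -> (4, 57, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def satisfies_requirement(version: str) -> bool:
    """True when the version is >= 4.57.1 and < 5.0.0."""
    return (4, 57, 1) <= parse_version(version) < (5, 0, 0)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version as installed_version

    try:
        found = installed_version("transformers")
        status = "OK" if satisfies_requirement(found) else "outside the supported range"
        print(f"transformers {found}: {status}")
    except PackageNotFoundError:
        print("transformers is not installed")
```

Running the script prints whether the installed `transformers` release is within the supported range, or reports that the package is missing.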
API Usage and Constraints
If you prefer a managed option, Moonshot AI hosts an OpenAI- and Anthropic-compatible API on the Moonshot AI Platform.
When integrating Kimi K2.7-Code, developers must keep a few operational constraints in mind:
- Thinking Mode is Mandatory: The model forces both the `thinking` and `preserve_thinking` parameters to `True`. "Instant mode" is not supported.
- Hyperparameters: The recommended temperature for Thinking mode is `1.0`, and the recommended `top_p` is `0.95`.
- Experimental Video Support: Chatting with video content is currently experimental and only supported on the official Moonshot AI API.
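One way to keep these constraints from drifting across a codebase is to centralize them in a small helper. The sketch below is a hypothetical client-side convenience, not part of any Moonshot AI SDK. The `thinking` argument is used only to fail fast if a caller tries to disable Thinking mode, and nothing about how the API itself encodes that flag is assumed.

```python
# Recommended sampling settings for Thinking mode, as quoted in the article.
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95}


def build_sampling_kwargs(thinking: bool = True, **overrides) -> dict:
    """Merge caller overrides onto the recommended sampling settings.

    `thinking` is a hypothetical client-side flag used only for validation;
    it is not included in the returned keyword arguments.
    """
    if not thinking:
        raise ValueError(
            "Kimi K2.7-Code only runs in Thinking mode; Instant mode is not supported."
        )
    kwargs = dict(RECOMMENDED_SAMPLING)  # copy so the defaults are never mutated
    kwargs.update(overrides)
    return kwargs
```

The returned dictionary can be unpacked into a chat completion call, for example `client.chat.completions.create(model=..., messages=..., **build_sampling_kwargs(max_tokens=4096))`.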
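Here is a quick example of how to create a chat completion through the OpenAI-compatible endpoint using the OpenAI Python SDK:

```python
import openai

# The OpenAI SDK works here because the Moonshot AI endpoint is OpenAI-compatible.
client = openai.OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key="your-api-key",  # replace with your own key
)

response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Which one is bigger, 9.11 or 9.9? Think carefully."},
    ],
    stream=False,
    max_tokens=4096,
)

message = response.choices[0].message
print("====== Reasoning Content ======")
# Assumption: the thinking trace is returned in a `reasoning_content` field;
# getattr falls back to None if the field is absent.
print(getattr(message, "reasoning_content", None))
print("====== Response ======")
print(message.content)
```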
With its improved token efficiency and robust agentic capabilities, Kimi K2.7-Code offers a compelling, highly optimized alternative for developers building the next generation of AI-assisted coding tools.
Mariana covers the fast-moving world of machine learning and generative AI, with a particular focus on how these technologies are reshaping development workflows. When she isn't stress-testing the latest foundation models, she's usually at a local hackathon.