Moonshot AI Releases Kimi K2.7-Code with 30% Better Token Efficiency

The new Mixture-of-Experts model slashes reasoning overhead while boosting agentic coding performance across complex software engineering workflows.

Mariana Souza
Senior Editor · Jun 13, 2026 · 4 min read

Moonshot AI has released Kimi K2.7-Code, a coding-focused agentic model designed to tackle complex, long-horizon software engineering workflows. Built on top of the previous Kimi K2.6, this new model introduces a major practical upgrade for developers: a roughly 30% reduction in thinking-token usage. By optimizing reasoning efficiency without sacrificing capability, Kimi K2.7-Code presents an intriguing option for both local inference and agentic developer tooling.

Under the Hood: A Massive Mixture-of-Experts Architecture

Kimi K2.7-Code uses a Mixture-of-Experts (MoE) architecture that pairs high capacity with sparse computation. The model has 1 trillion total parameters but activates only 32 billion of them per token, about 3.2% of the total. Per-token compute therefore scales with the active parameters, although the full set of weights must still be stored and served.

Key architectural specifications include:

  • Layers: 61 total layers (including 1 dense layer).
  • Experts: 384 total experts, with 8 selected experts per token and 1 shared expert.
  • Attention: Multi-head Latent Attention (MLA) with a hidden dimension of 7168.
  • Context Window: A generous 256K context length (specifically tested up to 262,144 tokens).
  • Activation: SwiGLU.
  • Vision Capabilities: Includes the MoonViT vision encoder (400M parameters) for multimodal tasks.
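
To make the expert counts above concrete, here is a minimal sketch of top-k routing for a single token in plain Python. The 384 routed experts and 8 selected experts per token come from the specification list. The random router scores, the softmax over the selected experts, and the name route_token are illustrative assumptions, not Moonshot AI's actual gating implementation.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts, from the specification list
TOP_K = 8          # experts selected per token
# The single shared expert is always active and does not go through the router.


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This normalization is an illustrative choice, not a confirmed detail.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(f"{len(selected)} of {NUM_EXPERTS} routed experts active "
      f"({len(selected) / NUM_EXPERTS:.1%})")
```

Only the selected experts (plus the shared one) run for that token, which is why the active parameter count stays far below the total.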

Benchmarking Performance and Agentic Tool Use

To prove its mettle in real-world scenarios, Moonshot AI evaluated Kimi K2.7-Code against its predecessor, as well as frontier models like GPT-5.5 (running in Codex with xhigh mode) and Claude Opus 4.8 (running in Claude Code with max effort/xhigh mode).

The model was put through several rigorous benchmarks:

  • Kimi Code Bench v2: An in-house benchmark covering 10+ programming languages and full-stack production tasks (backend, infrastructure, security, frontend).
  • Program Bench: A challenging test where agents must recreate a program's behavior from a compiled binary and documentation alone, passing thousands of fuzz-generated tests.
  • MLS Bench Lite: A 30-task subset evaluating an agent's ability to invent scalable machine learning methods.
  • Agentic Benchmarks: Including Kimi Claw 24/7 (multi-day coworking tasks), MCP Atlas, and MCP Mark Verified (evaluating Model Context Protocol tool-use across environments like GitHub, Postgres, and Playwright).

Here is how Kimi K2.7-Code stacks up:

| Benchmark | Kimi K2.6 | Kimi K2.7-Code | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 | 69.0 | 67.4 |
| Program Bench | 48.3 | 53.6 | 69.1 | 63.8 |
| MLS Bench Lite | 26.7 | 35.1 | 35.5 | 42.8 |
| Kimi Claw 24/7 Bench | 42.9 | 46.9 | 52.8 | 50.4 |
| MCP Atlas | 69.4 | 76.0 | 79.4 | 81.3 |
| MCP Mark Verified | 72.8 | 81.1 | 92.9 | 76.4 |

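At least one frontier model still leads on every benchmark, and both lead on five of the six; the exception is MCP Mark Verified, where Kimi K2.7-Code (81.1) beats Claude Opus 4.8 (76.4) but trails GPT-5.5 (92.9). Against its predecessor, Kimi K2.7-Code improves on all six benchmarks, with the largest absolute gains on Kimi Code Bench v2 (+11.1 points), MLS Bench Lite (+8.4), and MCP Mark Verified (+8.3).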
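
The gains over K2.6 can be recomputed directly from the table. The following is a minimal sketch in Python, with the scores copied from the table above; the names SCORES and gains are illustrative.

```python
# (Kimi K2.6 score, Kimi K2.7-Code score) per benchmark, copied from the table.
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def gains(scores):
    """Return (name, absolute gain, relative gain in %) sorted by absolute gain."""
    rows = []
    for name, (old, new) in scores.items():
        absolute = round(new - old, 1)
        relative = round(100 * (new - old) / old, 1)
        rows.append((name, absolute, relative))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, absolute, relative in gains(SCORES):
    print(f"{name:<22} +{absolute:>4} points  (+{relative}%)")
```

Sorting by the relative column instead of the absolute one changes the ranking, so it is worth stating which measure a "largest gain" claim refers to.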

Deployment and Local Inference Integration

For developers looking to run the model locally or integrate it into custom dev-tooling, Kimi K2.7-Code supports native INT4 quantization (reusing the same method as Kimi-K2-Thinking). Because it shares its underlying architecture with Kimi K2.5 and K2.6, existing deployment configurations can be directly reused.
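
For a sense of scale, the back-of-the-envelope sketch below estimates weight storage at different precisions. It is a rough lower bound under loud assumptions: every one of the 1 trillion parameters is stored at the stated bit width, and quantization scales, any layers kept at higher precision, the KV cache, and activations are all ignored. The helper name weight_gb is illustrative.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion total parameters, per the article


def weight_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Assumes every parameter uses the same bit width, which real
    quantized checkpoints generally do not.
    """
    return num_params * bits_per_param / 8 / 1e9


int4_gb = weight_gb(TOTAL_PARAMS, 4)
bf16_gb = weight_gb(TOTAL_PARAMS, 16)
print(f"INT4 weights: ~{int4_gb:.0f} GB")
print(f"BF16 weights: ~{bf16_gb:.0f} GB")
```

Even at 4 bits per weight, the estimate lands in the hundreds of gigabytes, so "local inference" here means a multi-GPU or large-memory server rather than a typical workstation.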

Moonshot AI officially recommends running the model on several popular open-source inference engines.

To run the model via Hugging Face transformers, ensure your environment meets the dependency requirement: transformers >=4.57.1, <5.0.0.
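
One way to verify that requirement before loading the model is a small version check. This is a sketch using only the standard library; the helper names parse and in_supported_range are illustrative, and the parser is deliberately simplified: it compares only the numeric major.minor.patch part, so pre-release and dev suffixes are not ordered the way PEP 440 specifies.

```python
from importlib.metadata import PackageNotFoundError, version


def parse(version_string):
    """Parse the leading numeric 'X.Y.Z' part of a version into a tuple."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_supported_range(version_string, lower="4.57.1", upper="5.0.0"):
    """True if lower <= version < upper, matching '>=4.57.1, <5.0.0'."""
    return parse(lower) <= parse(version_string) < parse(upper)


try:
    installed = version("transformers")
    status = "ok" if in_supported_range(installed) else "outside supported range"
    print(f"transformers {installed}: {status}")
except PackageNotFoundError:
    print("transformers is not installed")
```

For production use, a full PEP 440 implementation such as the packaging library is a safer choice than this hand-rolled comparison.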

API Usage and Constraints

If you prefer a managed option, Moonshot AI hosts an OpenAI- and Anthropic-compatible API on the Moonshot AI Platform.

When integrating Kimi K2.7-Code, developers must keep a few operational constraints in mind:

  1. Thinking Mode is Mandatory: The model forces both thinking and preserve_thinking parameters to True. "Instant mode" is not supported.
  2. Hyperparameters: The recommended temperature for Thinking mode is 1.0, and the recommended top_p is 0.95.
  3. Experimental Video Support: Chatting with video content is currently experimental and only supported on the official Moonshot AI API.
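
The recommended sampling settings can be captured once and reused across calls. The sketch below builds the keyword arguments for a chat completion request without sending anything; the helper build_request and the constant RECOMMENDED_SAMPLING are hypothetical names, and the model identifier is taken from the article's own example.

```python
# Recommended Thinking-mode settings, from the constraints listed above.
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95}


def build_request(messages, model="kimi-k2.7-code", **overrides):
    """Return keyword arguments for a chat completion request.

    Starts from the recommended sampling defaults; any explicit
    override passed by the caller wins.
    """
    request = {"model": model, "messages": messages}
    request.update(RECOMMENDED_SAMPLING)
    request.update(overrides)
    return request


request = build_request(
    [{"role": "user", "content": "Refactor this function to be iterative."}],
    max_tokens=4096,
)
print(request["temperature"], request["top_p"], request["max_tokens"])
```

With an OpenAI-compatible client, the resulting dictionary can be passed as client.chat.completions.create(**request).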

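Here is a quick example of a chat completion through the OpenAI-compatible endpoint, using the OpenAI Python SDK:

import openai

# Point the OpenAI client at Moonshot AI's OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key="your-api-key",  # replace with your own key
)

response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Which one is bigger, 9.11 or 9.9? Think carefully."},
    ],
    stream=False,
    max_tokens=4096,
)

message = response.choices[0].message

# Thinking mode is always on, so the reply should carry a reasoning trace
# alongside the final answer. reasoning_content is not part of the standard
# OpenAI schema; it is assumed here to be the field the API returns, so it
# is read defensively.
print("====== Reasoning Content ======")
print(getattr(message, "reasoning_content", None))

print("====== Response ======")
print(message.content)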

With its improved token efficiency and robust agentic capabilities, Kimi K2.7-Code offers a compelling, highly optimized alternative for developers building the next generation of AI-assisted coding tools.

Sources & further reading

  1. Kimi K2.7-Code: open-source coding model with better token efficiency — huggingface.co