
GLM-5.2 Claims Top Open-Weights Spot on Artificial Analysis

Z ai's latest model matches proprietary frontier performance on agentic benchmarks while keeping an open MIT license.

Priya Nair
AI & Developer Experience Writer · Jun 17, 2026 · 4 min read

The open-weights landscape has a new frontrunner. Z ai has released GLM-5.2, and it has immediately claimed the top spot among open-weights models on the Artificial Analysis Intelligence Index v4.1. With a score of 51, the model leads its closest open-weights competitors, MiniMax-M3 and DeepSeek V4 Pro, by seven points.

For engineering teams looking to self-host frontier-grade intelligence without proprietary lock-in, GLM-5.2 presents a compelling new option. On the agentic GDPval-AA v2 benchmark it scores on par with a leading closed model, GPT-5.5 (xhigh reasoning), while remaining available under a highly permissive license.

The Architecture and Footprint

GLM-5.2 keeps the same parameter footprint as its predecessor, GLM-5.1: 744 billion total parameters, with 40 billion active per token. Because neither count has changed, the jump in intelligence must come from changes in training and optimization rather than from parameter scaling.
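
For self-hosting, the total parameter count usually matters more than the active count, since a sparse model typically needs all of its weights loaded even though only a fraction is used per token. The sketch below gives a rough, weights-only memory estimate. The bytes-per-parameter values are common storage precisions assumed here for illustration, not figures published for GLM-5.2, and the estimate ignores KV cache, activations, and runtime overhead.

```python
TOTAL_PARAMS = 744e9   # total parameters, from the article
ACTIVE_PARAMS = 40e9   # active parameters per token, from the article

# Assumed storage precisions (bytes per parameter); illustrative only.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Share of the weights that participate in any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    print(f"{precision}: ~{gb:,.0f} GB for weights alone")
print(f"active fraction per token: {active_fraction:.1%}")
```

Even at the most aggressive precision assumed here, the weights alone run to hundreds of gigabytes, which is why the hosted options discussed later matter for many teams.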

What has changed is context handling. Z ai has expanded the context window from 200K tokens on GLM-5.1 to 1M tokens on GLM-5.2, which makes the model far more viable for long-document analysis, complex codebase ingestion, and multi-turn agentic workflows.

Crucially for enterprise developers, GLM-5.2 is released under the MIT License. This allows teams to modify, distribute, and commercially deploy the model without the restrictive clauses often found in other "open" model releases.

Benchmarking the Intelligence Leap

GLM-5.2's score of 51 on the Artificial Analysis Intelligence Index v4.1 puts it comfortably ahead of its closest open-weights peers:

  • GLM-5.2: 51
  • MiniMax-M3: 44
  • DeepSeek V4 Pro (max): 44
  • Kimi K2.6: 43

At 51, GLM-5.2 sits 11 points above GLM-5.1. The jump is driven by substantial improvements across almost all evaluation suites, with the largest gains in scientific reasoning and developer-centric tasks.

Benchmark                     | GLM-5.2 Score | Improvement vs. GLM-5.1
CritPt (Scientific Reasoning) | 21%           | +16 points
TerminalBench v2.1            | 78%           | +16 points
tau3 banking                  | 27%           | +15 points
HLE (Humanity's Last Exam)    | 40%           | +12 points
AA-LCR                        | 71%           | +9 points
SciCode                       | 50%           | +7 points
GPQA Diamond                  | 89%           | +3 points
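
The same absolute gain means different things at different starting points. As a small sketch, the implied GLM-5.1 scores can be recovered by subtracting each gain from the GLM-5.2 score. This assumes "points" means percentage points and that the published figures are rounded, so the baselines are approximate.

```python
# benchmark -> (GLM-5.2 score in %, gain over GLM-5.1 in points), from the table
RESULTS = {
    "CritPt": (21, 16),
    "TerminalBench v2.1": (78, 16),
    "tau3 banking": (27, 15),
    "HLE": (40, 12),
    "AA-LCR": (71, 9),
    "SciCode": (50, 7),
    "GPQA Diamond": (89, 3),
}


def implied_baseline(score: int, gain: int) -> int:
    """Implied GLM-5.1 score, assuming the gain is in percentage points."""
    return score - gain


baselines = {name: implied_baseline(s, g) for name, (s, g) in RESULTS.items()}

for name, (score, gain) in RESULTS.items():
    print(f"{name}: {baselines[name]}% -> {score}% (+{gain})")
```

Read this way, the 16-point CritPt gain starts from a single-digit baseline, while the 3-point GPQA Diamond gain comes on a benchmark where GLM-5.1 was already in the mid-80s.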

Additionally, GLM-5.2's score on the AA-Omniscience Index has doubled from 2 to 4. This improvement is a product of both higher accuracy (25.1% vs. 24.2% on GLM-5.1) and a lower overall hallucination rate (28.1% vs. 29.4%), while the model's attempt rate remained flat at 47%.

Agentic Mastery: GDPval-AA v2

For developers building autonomous software agents, the most relevant metric in the Artificial Analysis suite is GDPval-AA v2, a benchmark designed to evaluate real-world agentic performance. It anchors its Elo scale to human performance at 1000, uses a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250 to accommodate longer-horizon agent trajectories.

GLM-5.2 leads the open-weights category here with a score of 1524, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). The score also puts GLM-5.2 level with leading proprietary models: it sits 10 points above GPT-5.5 (xhigh reasoning), which scores 1514.
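
To get a feel for what these Elo gaps mean, the sketch below converts a rating difference into an expected head-to-head score. It assumes the conventional logistic Elo formula with a 400-point scale; the article does not state which Elo variant GDPval-AA v2 uses, so treat the outputs as indicative only.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    The 400-point scale is the conventional default and is an assumption here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# GDPval-AA v2 ratings quoted in the article.
RATINGS = {
    "GLM-5.2": 1524,
    "GPT-5.5 (xhigh)": 1514,
    "MiniMax-M3": 1418,
    "DeepSeek V4 Pro (max)": 1328,
}

for name, rating in RATINGS.items():
    if name == "GLM-5.2":
        continue
    p = elo_expected_score(RATINGS["GLM-5.2"], rating)
    print(f"GLM-5.2 vs {name}: expected score {p:.3f}")
```

Under that assumption, a 10-point gap corresponds to an expected score of roughly 51%, which is close to a coin flip, while the 106-point gap over MiniMax-M3 corresponds to roughly 65%.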

The Developer's Catch: Token Verbosity and Cost

While the benchmark victories are impressive, developers must weigh a significant trade-off: GLM-5.2 is a highly verbose, "heavy-thinking" model.

GLM-5.2 uses an average of 43k output tokens per Intelligence Index task, a steep increase from GLM-5.1's 26k. Of those 43k tokens, 37k are spent on internal reasoning. The 43k total is higher than that of its open-weights peers, including MiniMax-M3 (24k), Kimi K2.6 (35k), and DeepSeek V4 Pro (max, 37k). Consequently, GLM-5.2 falls outside the most attractive quadrant (high intelligence, few output tokens) of the Intelligence vs. Output Tokens chart.

However, GLM-5.2 stays on the Pareto frontier of Intelligence vs. Cost per Task: no model that matches its intelligence score costs less per task, so it is the cheapest option at its intelligence level.

The cost per task for GLM-5.2 and its open-weights peers breaks down as follows:

  • DeepSeek V4 Pro (max): $0.05
  • MiniMax-M3: $0.18
  • GLM-5.1: $0.25
  • Kimi K2.6: $0.31
  • GLM-5.2: ~$0.46
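
A model is on the Pareto frontier when no other model is both at least as intelligent and at least as cheap, and strictly better on one of the two. The sketch below applies that definition to the five open-weights models above only; the full Artificial Analysis chart also includes proprietary models. The GLM-5.1 index value of 40 is derived from the 11-point jump rather than quoted directly, and the GLM-5.2 cost is the approximate figure from the list.

```python
# model -> (Intelligence Index, cost per task in USD), from the article
MODELS = {
    "GLM-5.2": (51, 0.46),                # cost is approximate (~$0.46)
    "MiniMax-M3": (44, 0.18),
    "DeepSeek V4 Pro (max)": (44, 0.05),
    "Kimi K2.6": (43, 0.31),
    "GLM-5.1": (40, 0.25),                # 51 minus the reported 11-point jump
}


def dominates(a: tuple, b: tuple) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    (intel_a, cost_a), (intel_b, cost_b) = a, b
    return (intel_a >= intel_b and cost_a <= cost_b
            and (intel_a > intel_b or cost_a < cost_b))


def pareto_frontier(models: dict) -> list:
    """Names of models that no other model dominates, sorted alphabetically."""
    return sorted(
        name for name, point in models.items()
        if not any(dominates(other, point) for other in models.values())
    )


print(pareto_frontier(MODELS))
```

Within this subset, being on the frontier does not mean being cheap: GLM-5.2 has the highest cost per task in the list, but nothing cheaper reaches its score.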

If you choose to run the model via Z ai's first-party API, pricing is identical to GLM-5.1: $1.40 per 1M input tokens, $4.40 per 1M output tokens, and $0.26 per 1M cache-hit tokens.
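
These rates can be turned into a rough per-request estimate. The sketch below assumes that cache-hit tokens are billed at the cache rate instead of the input rate, and that reasoning tokens are billed as ordinary output tokens; neither billing detail is confirmed in the article. The function name and the token counts in the example are hypothetical.

```python
# USD per 1M tokens, first-party API prices quoted in the article
PRICE_INPUT = 1.40
PRICE_OUTPUT = 4.40
PRICE_CACHE_HIT = 0.26


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated cost in USD for one request.

    cached_tokens is the portion of input_tokens served from cache
    (assumed to be billed at the cache-hit rate instead of the input rate).
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * PRICE_INPUT
             + cached_tokens * PRICE_CACHE_HIT
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


# Hypothetical request: 50k input tokens (20k cached) and 43k output tokens,
# the latter matching the article's average output per Intelligence Index task.
print(f"${request_cost(50_000, 43_000, cached_tokens=20_000):.4f}")
```

Under these assumptions, 43k output tokens cost about $0.19 on their own, so output verbosity is a meaningful part of the bill for short-prompt workloads.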

Deployment and Availability

For teams not looking to host the 744B parameter model on their own hardware, GLM-5.2 is widely available across the developer ecosystem. Alongside Z ai's first-party API, developers can access the model through several major third-party inference providers, including DeepInfra, Novita, Nebius, Parasail, Siliconflow, GMI Cloud, Baseten, and Fireworks.

Sources & further reading

  1. GLM-5.2 is the new leading open weights model on Artificial Analysis — artificialanalysis.ai