GLM-5.2 Claims Top Open-Weights Spot on Artificial Analysis
Z ai's latest model matches proprietary frontier performance on agentic benchmarks while keeping an open MIT license.
The open-weights landscape has a new frontrunner. Z ai has officially released GLM-5.2, and it has immediately claimed the top spot on the Artificial Analysis Intelligence Index v4.1. Scoring a 51, the model edges out formidable open-weights competitors like MiniMax-M3 and DeepSeek V4 Pro.
For engineering teams looking to self-host frontier-grade intelligence without proprietary lock-in, GLM-5.2 presents a compelling new option. It matches the performance of top-tier closed models on agentic tasks while remaining accessible under a highly permissive license.
The Architecture and Footprint
GLM-5.2 keeps the same parameter footprint as its predecessor, GLM-5.1: 744 billion total parameters, with 40 billion active per token. Because neither count has changed, the jump in intelligence points to training and optimization improvements rather than raw parameter scaling.
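To put the 744B figure in self-hosting terms, here is a rough back-of-envelope sketch of weight storage alone. The precisions listed (bf16, fp8, int4) are illustrative assumptions, not formats the article says Z ai ships, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Back-of-envelope weight storage for a mixture-of-experts model.
TOTAL_PARAMS = 744e9   # total parameters, from the article
ACTIVE_PARAMS = 40e9   # active parameters per token, from the article

# Assumed storage precisions in bytes per parameter (not stated in the article).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


# Share of the weights that participate in computing any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, width):,.0f} GB of weights")
print(f"active fraction per token: {active_fraction:.1%}")
```

All 744B parameters still have to be resident to serve the model, even though only about 5% of them are used for any given token. That is why the total count, not the active count, drives the hardware bill.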
The bigger change is in context handling. Z ai has expanded the context window from 200K tokens on GLM-5.1 to 1M tokens on GLM-5.2, which makes the model far more viable for long-document analysis, large-codebase ingestion, and multi-turn agentic workflows.
Crucially for enterprise developers, GLM-5.2 is released under the MIT License. This allows teams to modify, distribute, and commercially deploy the model without the restrictive clauses often found in other "open" model releases.
Benchmarking the Intelligence Leap
GLM-5.2's score of 51 on the Artificial Analysis Intelligence Index v4.1 puts it comfortably ahead of its closest open-weights peers:
- GLM-5.2: 51
- MiniMax-M3: 44
- DeepSeek V4 Pro (max): 44
- Kimi K2.6: 43
This 11-point jump over GLM-5.1 is driven by substantial improvements across almost all evaluation suites, with a heavy emphasis on scientific reasoning and developer-centric tasks.
| Benchmark | GLM-5.2 Score | Improvement vs. GLM-5.1 |
|---|---|---|
| CritPt (Scientific Reasoning) | 21% | +16 points |
| TerminalBench v2.1 | 78% | +16 points |
| tau3 banking | 27% | +15 points |
| HLE (Humanity's Last Exam) | 40% | +12 points |
| AA-LCR | 71% | +9 points |
| SciCode | 50% | +7 points |
| GPQA Diamond | 89% | +3 points |
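The table reports only GLM-5.2's scores and the point gains, so the GLM-5.1 baselines have to be inferred. A minimal sketch, assuming each gain is a simple percentage-point difference (the benchmark labels are shortened here for readability):

```python
# (benchmark, GLM-5.2 score in %, gain over GLM-5.1 in points), copied from the table.
RESULTS = [
    ("CritPt", 21, 16),
    ("TerminalBench v2.1", 78, 16),
    ("tau3 banking", 27, 15),
    ("HLE", 40, 12),
    ("AA-LCR", 71, 9),
    ("SciCode", 50, 7),
    ("GPQA Diamond", 89, 3),
]


def implied_baseline(score: int, gain: int) -> int:
    """GLM-5.1 score implied by a GLM-5.2 score and its point gain."""
    return score - gain


baselines = {name: implied_baseline(score, gain) for name, score, gain in RESULTS}

for name, score, gain in RESULTS:
    print(f"{name}: {baselines[name]}% -> {score}% (+{gain} points)")
```

Read this way, the largest relative movement is on CritPt, where the implied baseline is in the single digits, while GPQA Diamond was already close to saturation.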
Additionally, GLM-5.2's score on the AA-Omniscience Index has doubled from 2 to 4. This improvement is a product of both higher accuracy (25.1% vs. 24.2% on GLM-5.1) and a lower overall hallucination rate (28.1% vs. 29.4%), while the model's attempt rate remained flat at 47%.
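The article does not spell out how the index is computed. One reading that reproduces the reported values treats the index as percent correct minus percent incorrect, and treats the hallucination rate as the share of non-correct responses that are wrong answers rather than abstentions. Both definitions are assumptions in the sketch below.

```python
def omniscience_index(accuracy_pct: float, hallucination_pct: float) -> float:
    """Percent correct minus percent incorrect, under the assumed definitions."""
    # Everything that is not a correct answer: wrong answers plus abstentions.
    not_correct_pct = 100.0 - accuracy_pct
    # Assumed: the hallucination rate is the wrong-answer share of that remainder.
    incorrect_pct = hallucination_pct / 100.0 * not_correct_pct
    return accuracy_pct - incorrect_pct


glm_52 = omniscience_index(accuracy_pct=25.1, hallucination_pct=28.1)
glm_51 = omniscience_index(accuracy_pct=24.2, hallucination_pct=29.4)

print(f"GLM-5.2: {glm_52:.2f}")
print(f"GLM-5.1: {glm_51:.2f}")
```

Under these assumptions the two results round to 4 and 2. That matches the reported doubling, and it shows how a gain of under one point in accuracy combined with a roughly one-point drop in hallucination rate can move a small index noticeably.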
Agentic Mastery: GDPval-AA v2
For developers building autonomous software agents, the most critical metric in the Artificial Analysis suite is GDPval-AA v2, a benchmark designed to evaluate real-world agentic performance. It anchors its Elo scale to human performance at 1000, uses a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250 to accommodate longer-horizon agent trajectories.
GLM-5.2 dominates the open-weights category here, scoring 1524. This places it ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). More impressively, this score puts GLM-5.2 on par with leading proprietary models, effectively matching GPT-5.5 (xhigh reasoning), which scores 1514.
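To get a feel for what these Elo gaps mean, the sketch below converts rating differences into expected head-to-head win rates. It assumes the conventional logistic Elo formula with a 400-point scale; the article does not say whether GDPval-AA v2 uses exactly that scale, so treat the percentages as illustrative.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected win rate of A over B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


GLM_52 = 1524  # ratings from the article
OPPONENTS = {
    "GPT-5.5 (xhigh reasoning)": 1514,
    "MiniMax-M3": 1418,
    "DeepSeek V4 Pro (max)": 1328,
    "Human baseline": 1000,
}

for name, rating in OPPONENTS.items():
    win_rate = expected_score(GLM_52, rating)
    print(f"GLM-5.2 vs {name}: {win_rate:.1%} expected win rate")
```

On that scale a 10-point gap is close to a coin flip, which is why the comparison with GPT-5.5 is better described as a tie than as a lead.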
The Developer's Catch: Token Verbosity and Cost
While the benchmark victories are impressive, developers must weigh a significant practical trade-off: GLM-5.2 is a highly verbose, "heavy-thinking" model.
GLM-5.2 uses an average of 43k output tokens per Intelligence Index task, a steep increase from GLM-5.1's 26k. Of those 43k tokens, 37k are spent on internal reasoning. The 43k total is higher than that of its open-weights peers, including MiniMax-M3 (24k), Kimi K2.6 (35k), and DeepSeek V4 Pro (max, 37k). Consequently, GLM-5.2 sits outside the most attractive quadrant on the Intelligence vs. Output Tokens chart.
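The token figures are easier to compare as ratios. The sketch below reads the peer numbers as total output tokens per task, which is an interpretation of the text rather than something it states outright.

```python
# Average output tokens per Intelligence Index task, in thousands (from the article).
OUTPUT_TOKENS_K = {
    "GLM-5.2": 43,
    "GLM-5.1": 26,
    "MiniMax-M3": 24,
    "Kimi K2.6": 35,
    "DeepSeek V4 Pro (max)": 37,
}
GLM_52_REASONING_K = 37  # reasoning tokens within GLM-5.2's 43k total

total = OUTPUT_TOKENS_K["GLM-5.2"]
reasoning_share = GLM_52_REASONING_K / total      # fraction spent on reasoning
answer_tokens_k = total - GLM_52_REASONING_K      # what is left for the final answer
growth_vs_51 = total / OUTPUT_TOKENS_K["GLM-5.1"]  # verbosity growth over GLM-5.1

print(f"reasoning share: {reasoning_share:.0%}")
print(f"answer tokens: {answer_tokens_k}k")
print(f"growth vs GLM-5.1: {growth_vs_51:.2f}x")

# How much more GLM-5.2 emits than each other model.
for name, tokens in OUTPUT_TOKENS_K.items():
    if name != "GLM-5.2":
        print(f"GLM-5.2 emits {total / tokens:.2f}x the tokens of {name}")
```

Most of the output budget goes to reasoning rather than to the final answer, so latency and output-token spend are dominated by thinking time.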
However, GLM-5.2 remains on the Pareto frontier of Intelligence vs. Cost per Task: cheaper models exist, but none of them matches its intelligence, so it offers the lowest cost per task at its intelligence level.
Across the open-weights models discussed here, the cost per task breaks down as follows:
- DeepSeek V4 Pro (max): $0.05
- MiniMax-M3: $0.18
- GLM-5.1: $0.25
- Kimi K2.6: $0.31
- GLM-5.2: ~$0.46
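The Pareto claim can be checked against the figures quoted in this article. The sketch below covers only the five open-weights models listed here, not the full Artificial Analysis chart. GLM-5.1's intelligence score of 40 is derived from the reported 11-point jump rather than quoted directly, and GLM-5.2's cost is the approximate value given above.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float   # Intelligence Index v4.1 score
    cost_per_task: float  # USD per task


MODELS = [
    Model("GLM-5.2", 51, 0.46),               # cost is approximate in the article
    Model("MiniMax-M3", 44, 0.18),
    Model("DeepSeek V4 Pro (max)", 44, 0.05),
    Model("Kimi K2.6", 43, 0.31),
    Model("GLM-5.1", 40, 0.25),               # derived: 51 minus the 11-point jump
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as smart and as cheap as b, and strictly better on one axis."""
    no_worse = a.intelligence >= b.intelligence and a.cost_per_task <= b.cost_per_task
    strictly_better = a.intelligence > b.intelligence or a.cost_per_task < b.cost_per_task
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(other, m) for other in models)]


frontier = pareto_frontier(MODELS)
print([m.name for m in frontier])
```

Within this small set, only GLM-5.2 and DeepSeek V4 Pro (max) survive. GLM-5.2 stays on the frontier because nothing else reaches its score, not because it is cheap.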
If you choose to run the model via Z ai's first-party API, pricing is identical to GLM-5.1: $1.40 per 1M input tokens, $4.40 per 1M output tokens, and $0.26 per 1M cache-hit tokens.
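These rates translate into a per-request estimate fairly directly. The sketch below is a hypothetical helper, not part of any Z ai SDK, and it assumes cache-hit tokens are billed at the cache rate instead of the regular input rate.

```python
# First-party API prices in USD per 1M tokens, from the article.
PRICE_PER_M = {"input": 1.40, "cache_hit": 0.26, "output": 4.40}


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumes cached_tokens is the part of input_tokens served from cache and
    billed at the cache-hit rate in place of the normal input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    return (
        uncached_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cache_hit"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000


# Output side of an average Intelligence Index task: 43k output tokens.
print(f"output only: ${request_cost(0, 43_000):.4f}")

# Hypothetical agent turn: 200k-token prompt, 150k of it cached, 43k tokens out.
print(f"agent turn:  ${request_cost(200_000, 43_000, cached_tokens=150_000):.4f}")
```

At these prices, output tokens cost a little over three times as much as uncached input tokens, so the model's verbosity matters more to the bill than prompt length does unless prompts are very long.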
Deployment and Availability
For teams not looking to host the 744B parameter model on their own hardware, GLM-5.2 is widely available across the developer ecosystem. Alongside Z ai's first-party API, developers can access the model through several major third-party inference providers, including DeepInfra, Novita, Nebius, Parasail, Siliconflow, GMI Cloud, Baseten, and Fireworks.
Sources & further reading
- GLM-5.2 is the new leading open weights model on Artificial Analysis — artificialanalysis.ai
Priya covers AI frameworks, developer productivity tooling, and the startup ecosystem across South and Southeast Asia, bringing a researcher's rigour and a practitioner's empathy to every story. She is deeply sceptical of benchmarks and asks hard questions so her readers don't have to.