Skip to content
AI Article

Sakana's Fugu Puts a Multi-Agent System Behind One API

Japan's Sakana AI wraps a swappable pool of frontier LLMs into one OpenAI-compatible endpoint. Its benchmark wins come with an asterisk worth understanding.

Mariana Souza
Mariana Souza
Senior Editor · Jun 23, 2026 · 4 min read
Sakana's Fugu Puts a Multi-Agent System Behind One API

Tokyo lab Sakana AI shipped something that doesn't fit the usual model-release template. Fugu isn't a bigger transformer or a fresh set of weights. It's a language model trained to run a pool of other frontier LLMs, including copies of itself, and present the whole arrangement behind one OpenAI-compatible endpoint. Sakana calls it a "multi-agent system as a model." You send a request, and behind the API a coordinator picks models, splits the work, checks the output, and synthesizes an answer. You never see the machinery.

That framing matters more than the benchmark table everyone is quoting, so start there.

How it actually works

Fugu is built on two papers Sakana is presenting at ICLR 2026. The first, TRINITY, uses a lightweight coordinator (evolved rather than hand-designed) that assigns each model in the pool one of three jobs: Thinker, Worker, or Verifier. The second, the Conductor, is trained with reinforcement learning to discover coordination strategies expressed in plain natural language instead of a fixed routing table.

Put together, Fugu doesn't send a prompt to "the best model" and stop. It assembles a small team on the fly, lets them collaborate in patterns Sakana describes as "non-obvious but highly efficient," and runs a verification pass before returning anything. The closest mental model is a senior engineer who farms parts of a problem out to specialists, reviews their work, and stitches it into one answer. The twist is that the senior engineer here is also a model.

There are two tiers. Plain Fugu balances latency and quality for everyday work. Fugu Ultra throws more coordination and compute at hard problems, and it's the one Sakana points at Kaggle competitions, paper reproduction, and security analysis.

The benchmarks, and the asterisk

The numbers are real, and they're good. Sakana tested Fugu against Gemini 3.1 Pro, Opus 4.8, and GPT 5.5:

Benchmark Fugu Fugu Ultra
SWE-Bench Pro 59.0 73.7
LiveCodeBench 92.9 93.2
GPQA-D 95.5 95.5
Humanity's Last Exam 47.2 50.0

Fugu Ultra also edges out Anthropic's recently pulled models where they overlap, landing 93.2 on LiveCodeBench against Fable 5's 89.8 and 95.5 on GPQA-D against Mythos Preview's 94.6.

Here's the asterisk. Fugu isn't a single model, and most of the systems on the other side of that table are. Comparing an orchestrated team of frontier models to one model and calling it a head-to-head win is the kind of claim that reads cleanly in a launch post and falls apart under a second look. Coverage caught it fast. The honest version is narrower: Fugu Ultra, coordinating several frontier models with a verification step, can beat any individual frontier model on specific structured tasks. That's still a useful result. It just isn't "we built a model that beats GPT 5.5."

Keep one more thing in mind. Sakana's most eye-catching numbers are user-reported, not independent. The company cites users finding twenty-plus code-review issues where other tools found three, finishing patent-landscape work in hours instead of days, and running security assessments end to end without going out of scope. Treat those as vendor anecdotes until someone outside Sakana reproduces them.

What you actually get as a developer

The product decisions are more interesting than the leaderboard.

It's API only, OpenAI-compatible, and there are no open weights. If your plan was to pull this down and run it locally, it isn't that kind of release.

The pricing flips a common complaint on its head. The reflex reaction was "another subscription," but Fugu is meant to replace your direct API bills, not add to them. You pay Sakana ($20 to $200 a month, or pay-as-you-go at $5 in / $30 out per million tokens for Ultra, rising to $10 / $45 past a 272K context), and Sakana pays Anthropic, OpenAI, and Google underneath. When several agents run on one request, you're charged the top-tier model's rate, not the sum. For genuinely multi-model work, that math can come out ahead.

Teams with compliance constraints can opt specific providers or models out of the pool, but only on plain Fugu. Fugu Ultra doesn't allow it, which is a real limitation if your reason for reaching for it was avoiding a particular vendor.

One hard gate: Fugu is available everywhere except the EU and EEA. If your users or your company sit in Europe, it's a non-starter today.

Why a Japanese lab, why now

The timing isn't an accident. US export-control rules recently pulled Anthropic's Fable 5 and Mythos 5 from the market, and Sakana is walking straight through the gap. It pitches Fugu as standing "shoulder to shoulder" with restricted frontier systems while "delivering frontier capability without the risk of export controls."

Read past the marketing and there's a real strategic point. A lot of the frontier is now gated by where you are and which government signed off. An orchestration layer that sits on top of whatever models are legally available in your region, and hides the swapping, is a hedge against that volatility. You're buying capability and continuity, not a specific model.

The honest read

Fugu is a clever, real product, and the orchestration research under it is the part worth taking seriously. For agentic coding, reproduction work, and security analysis, the structured multi-step tasks where a verifier earns its keep, a coordinated pool beating any single model is both believable and useful.

The "beats GPT 5.5" framing is marketing, the headline numbers are partly self-reported, and the no-open-weights, no-EU, single-vendor-bill shape means you're trading control for convenience. Whether that trade is worth it depends entirely on what you're building. But the idea underneath, selling an orchestrated multi-agent system as if it were one model, is going to get copied. Sakana just shipped it first.

Sources & further reading

  1. Sakana Fugu — Multi-Agent System as a Model — sakana.ai
  2. Sakana AI Launches Sakana Fugu — marktechpost.com
  3. Fugu Ultra matches Mythos on certain benchmarks — business-standard.com
Mariana Souza
Written by
Mariana Souza · Senior Editor

Mariana covers the fast-moving world of machine learning and generative AI, with a particular focus on how these technologies are reshaping development workflows. When she isn't stress-testing the latest foundation models, she's usually at a local hackathon.

Discussion 3

Join the discussion

Sign in or create an account to comment and vote.

Nina Petrova @night_owl_nina · 1 day ago

it is 3am and i am wondering how fugu's coordinator handles conflicting outputs from the different llms, the article mentions it 'checks the output' but i'd love to know more about the specifics of that process

Leo Fontaine @ai_optimist_leo · 1 day ago

@night_owl_nina yeah that's the part that really gets me, how does it reconcile those conflicts

Marc Pope @marcpope · 1 day ago

From my initial testing, it's 4-5x slower and way more expensive.

Related Reading