Can a Pool of AI Models Beat One Big One? Sakana's Fugu Makes the Case

By Raju Shaik

Japanese start-up Sakana AI has launched Fugu, an AI platform built on multi-model orchestration, claiming it matches Anthropic's restricted Fable 5 and Mythos Preview while beating the frontier models anyone can buy today. The Tokyo company released Fugu and a heavier Fugu Ultra variant on 22 June, pitching the system as a hedge against the single-vendor dependence that left enterprises stranded when Anthropic's top models went dark this month.

Fugu skips the industry playbook of building one ever-larger model. It works as an orchestration layer: a language model trained to call other language models from a swappable pool, including copies of itself. Send one request to one API, and Fugu decides whether to answer directly or assemble a team of specialist models for the job. To the developer it behaves like a single model; inside, a coordinated system handles selection, delegation, checking and synthesis. The approach builds on two Sakana papers presented at ICLR 2026, Trinity and Conductor.
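To make the "one API in front, a team behind it" idea concrete, here is a toy Python sketch of an orchestration layer that either answers a request itself or hands it to specialists from a swappable pool and then synthesises their drafts. This is not Sakana's design. The `Orchestrator` class, the `stub` models and the keyword-based routing rule are hypothetical stand-ins, chosen only to make the control flow visible. In Fugu, the orchestrator is a trained model making these choices.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model interface: prompt in, text out. Real members would be API calls.
Model = Callable[[str], str]


@dataclass
class Orchestrator:
    """Toy orchestration layer with a swappable pool of specialist models.

    The routing rule (a keyword check on the request) is an illustrative
    assumption; it is not how Fugu actually decides.
    """
    self_model: Model                                        # answers simple requests directly
    pool: Dict[str, Model] = field(default_factory=dict)     # specialist name -> model
    team_keywords: tuple = ("refactor", "prove", "debug")    # hypothetical delegation triggers

    def needs_team(self, request: str) -> bool:
        text = request.lower()
        return any(word in text for word in self.team_keywords)

    def pick_specialists(self, request: str) -> List[str]:
        # Prefer pool members named in the request; otherwise use the whole pool.
        text = request.lower()
        chosen = [name for name in self.pool if name in text]
        return chosen or sorted(self.pool)

    def handle(self, request: str) -> str:
        if not self.needs_team(request) or not self.pool:
            return self.self_model(request)
        # Delegation: each selected specialist drafts an answer.
        drafts = [f"{name}: {self.pool[name](request)}" for name in self.pick_specialists(request)]
        # Synthesis: the orchestrator model combines the drafts into one reply.
        return self.self_model("combine -> " + " | ".join(drafts))


def stub(tag: str) -> Model:
    """Fake model that tags its output so the routing path is visible."""
    return lambda prompt: f"<{tag}>{prompt}"


if __name__ == "__main__":
    fugu = Orchestrator(self_model=stub("fugu"), pool={"code": stub("code"), "math": stub("math")})
    print(fugu.handle("hello"))            # answered directly
    print(fugu.handle("debug this code"))  # delegated to the "code" specialist, then synthesised
```

Because the caller only ever sees `handle`, members of `pool` can be added, removed or swapped without changing the calling code. That is the property the orchestration pitch depends on.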

The coding numbers Sakana is leaning on

Sakana's published table compares Fugu against the base models in its own pool. On LiveCodeBench, standard Fugu scored 92.9 and Fugu Ultra 93.2, ahead of Opus 4.8 at 87.8, Gemini 3.1 Pro at 88.5 and GPT-5.5 at 85.3. On the tougher SWE-Bench Pro, run with the mini-swe-agent scaffold, Fugu Ultra reached 73.7 against Opus 4.8's 69.2 and GPT-5.5's 58.6. LiveCodeBench draws on newly released programming problems, which makes memorised answers less useful, and Fugu's lead on both benchmarks suggests the orchestration approach holds up on code workloads.

Science and reasoning scores

On GPQA-Diamond, a set of graduate-level multiple-choice questions across biology, chemistry and physics, both Fugu and Fugu Ultra hit 95.5, ahead of Opus 4.8 at 92.0, Gemini 3.1 Pro at 94.3 and GPT-5.5 at 93.6. The point Sakana wants to make: one orchestration layer can carry both scientific reasoning and software work.
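One way to read these tables is to measure Fugu Ultra against the strongest rival on each benchmark rather than against the field as a whole. The short Python snippet below does that arithmetic using only the scores quoted in this article. The helper name `margin_over_best_baseline` is ours, not Sakana's, and any model without a reported score on a benchmark is simply left out.

```python
# Scores as quoted from Sakana's published table; only reported entries are included.
REPORTED = {
    "LiveCodeBench": {"Fugu": 92.9, "Fugu Ultra": 93.2, "Opus 4.8": 87.8,
                      "Gemini 3.1 Pro": 88.5, "GPT-5.5": 85.3},
    "SWE-Bench Pro": {"Fugu Ultra": 73.7, "Opus 4.8": 69.2, "GPT-5.5": 58.6},
    "GPQA-Diamond": {"Fugu": 95.5, "Fugu Ultra": 95.5, "Opus 4.8": 92.0,
                     "Gemini 3.1 Pro": 94.3, "GPT-5.5": 93.6},
}


def margin_over_best_baseline(scores: dict, system: str = "Fugu Ultra") -> tuple:
    """Return (strongest non-Fugu model, points by which `system` leads it)."""
    baselines = {name: s for name, s in scores.items() if not name.startswith("Fugu")}
    best_name = max(baselines, key=baselines.get)
    # Round to one decimal to match the precision of the reported scores.
    return best_name, round(scores[system] - baselines[best_name], 1)


if __name__ == "__main__":
    for bench, scores in REPORTED.items():
        best, gap = margin_over_best_baseline(scores)
        print(f"{bench}: Fugu Ultra leads {best} by {gap} points")
```

On these figures, Fugu Ultra leads Gemini 3.1 Pro by 4.7 points on LiveCodeBench, Opus 4.8 by 4.5 points on SWE-Bench Pro, and Gemini 3.1 Pro by 1.2 points on GPQA-Diamond. The science lead is therefore the narrowest of the three.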

Where the Fable 5 claim actually sits

Sakana stops short of saying Fugu beats Anthropic's best. Its own phrasing is that Fugu Ultra stands "shoulder-to-shoulder" with Fable 5 and Mythos Preview across engineering, science and reasoning benchmarks. The hedge matters, because neither Anthropic model sits in Fugu's pool — both became inaccessible after the US government suspended foreign access on 12 June. Sakana frames that suspension as the whole argument for orchestration: when access to one provider vanishes overnight, a swappable pool reroutes around the gap.

Two models, two jobs

The standard Fugu targets coding, chat and everyday work at lower latency, and slots into tools such as Codex. Teams with privacy or compliance limits can drop specific agents from the pool. Fugu Ultra goes after harder, multi-step problems: AI research, paper reproduction, cybersecurity analysis, and literature and patent searches. Sakana says that in its own tests Fugu beat Gemini 3.1 Pro, Opus 4.8 and GPT-5.5 on specialised tasks including automated research, mechanical design and financial forecasting.
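Two claims in this piece are mechanical. Teams can drop specific agents from the pool, and a swappable pool can route around a provider that goes offline. The minimal Python sketch below shows both, assuming a pool is just an ordered list of members, each tagged with a provider and a set of skills. `PoolMember`, `eligible`, `route` and every member and provider name are hypothetical placeholders. Sakana has not published Fugu's configuration format.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass(frozen=True)
class PoolMember:
    name: str          # hypothetical model identifier
    provider: str      # vendor serving this model
    skills: frozenset  # task types this member is assumed to handle


def eligible(
    pool: Iterable[PoolMember],
    excluded: AbstractSet[str],
    unavailable_providers: AbstractSet[str],
) -> List[PoolMember]:
    """Members left after compliance exclusions and provider outages, in pool order."""
    return [
        m for m in pool
        if m.name not in excluded and m.provider not in unavailable_providers
    ]


def route(
    task: str,
    pool: Iterable[PoolMember],
    excluded: AbstractSet[str] = frozenset(),
    unavailable_providers: AbstractSet[str] = frozenset(),
) -> Optional[str]:
    """Return the first eligible member that handles `task`, or None if nothing is left."""
    for member in eligible(pool, excluded, unavailable_providers):
        if task in member.skills:
            return member.name
    return None


# Pool order stands in for preference; all names are placeholders.
POOL = [
    PoolMember("model-a", "provider-a", frozenset({"code", "review"})),
    PoolMember("model-b", "provider-b", frozenset({"code", "science"})),
    PoolMember("model-c", "provider-c", frozenset({"science", "chat"})),
]

if __name__ == "__main__":
    print(route("code", POOL))                                        # model-a
    print(route("code", POOL, unavailable_providers={"provider-a"}))  # rerouted to model-b
    print(route("code", POOL, excluded={"model-b"},
                unavailable_providers={"provider-a"}))                # None: no coder left
```

The last call shows the limit critics raise: rerouting only works while some eligible member can still do the job, so the pool's composition bounds what the orchestrator can deliver.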

The early reception is split

The launch-week response cooled the benchmark story. Researcher Ethan Mollick called Fugu Ultra slow, with a routine coding test taking 30 minutes to return results that fell short of Fable in practice. Hacker News developers complained that even the $200 plan buys under three hours of use a week and that output quality sits below Fable, though its code review drew praise for roughly matching Opus 4.8 or GPT-5.5. The sovereignty pitch also drew fire, since Fugu still depends on whichever models sit in its pool, and Sakana uses proprietary models such as Claude Opus to post its benchmark numbers.

The team behind it

Sakana AI was founded in 2023 by David Ha (CEO), Llion Jones (CTO) and Ren Ito (Chairman). Jones co-authored the 2017 "Attention Is All You Need" paper at Google that introduced the transformer architecture; Ha is a former Google Brain researcher who later ran research at Stability AI. The Tokyo company has bet from the start on collective intelligence over scale, and Fugu is the most direct expression of that bet: a route to frontier-level output by combining models rather than building a bigger one.
