Shanghai's Quiet Challenger Just Rewrote the AI Price List
A month ago, MiniMax was a name most people outside China's tech circles had never heard. That changed on 12 February when the Shanghai-based lab released M2.5, an open-weight model that matches or beats the best from Anthropic, Google, and OpenAI on several key benchmarks, at a fraction of the cost.
The timing was deliberate. MiniMax had just completed its Hong Kong IPO, and M2.5 was its opening argument to the global developer market: frontier-class intelligence, priced for mass adoption.
The Numbers That Spooked Silicon Valley
M2.5 scored 80.2% on SWE-Bench Verified, the industry's go-to test for real-world coding ability. That puts it neck-and-neck with Anthropic's Claude Opus 4.6 at 80.8% and ahead of OpenAI's GPT-5.2 at 80.0%. On BrowseComp, which measures web search and retrieval, M2.5 hit 76.3%, outpacing GPT-5.2's 65.8% by a wide margin.
But the real story is the price tag. MiniMax charges $0.30 per million input tokens and $1.20 per million output tokens. That is 10 to 20 times cheaper than comparable offerings from Western labs. For developers building agentic applications that chew through millions of tokens daily, this is not a marginal saving. It is a structural shift.
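At those rates, the spend for a token-heavy agent is easy to estimate. Here is a minimal sketch using the article's M2.5 prices ($0.30 in, $1.20 out per million tokens); the daily token volumes are illustrative assumptions, not measurements from any real deployment.

```python
# Estimate monthly API spend from per-million-token prices.
# The M2.5 prices are from the article; the daily token volumes
# below are illustrative assumptions.

def monthly_cost(input_tokens_per_day, output_tokens_per_day,
                 input_price_per_m, output_price_per_m, days=30):
    """Return estimated monthly API cost in dollars."""
    daily = (input_tokens_per_day / 1e6) * input_price_per_m
    daily += (output_tokens_per_day / 1e6) * output_price_per_m
    return daily * days

# Hypothetical agent churning through 50M input / 10M output tokens a day.
m25_monthly = monthly_cost(50e6, 10e6, 0.30, 1.20)
print(f"M2.5: ${m25_monthly:,.2f}/month")  # $810.00/month
```

At that hypothetical volume, the bill lands in the hundreds of dollars per month rather than the tens of thousands, which is exactly the shift the article describes.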
"AI infrastructure is becoming foundational economic infrastructure. The companies that control cost-per-token will control the next wave of deployment." - Jensen Huang, CEO, NVIDIA
How MiniMax Built a Frontier Model on a Budget
M2.5 uses a Mixture-of-Experts (MoE) architecture with 230 billion total parameters, of which only about 10 billion are active on any given inference call. This design means the model carries the knowledge of a much larger system while running with the efficiency of a far smaller one.
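The principle is easy to see in a toy top-k router: a gating network scores every expert for each token, but only the best few are actually run. The expert counts and dimensions below are tiny, hypothetical values for illustration, not MiniMax's actual design.

```python
import numpy as np

# Toy top-k Mixture-of-Experts forward pass. Many experts are stored,
# but only TOP_K of them run for any given token, so compute scales
# with the active parameters, not the total.

rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D = 16, 2, 8                   # illustrative sizes

router_w = rng.normal(size=(D, N_EXPERTS))       # gating network
experts = rng.normal(size=(N_EXPERTS, D, D))     # one weight matrix per expert

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_w                        # score every expert
    top = np.argsort(logits)[-TOP_K:]            # keep only the best k
    gates = np.exp(logits[top])
    gates /= gates.sum()                         # softmax over chosen experts
    # Only TOP_K of the N_EXPERTS weight matrices are touched here.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

out = moe_forward(rng.normal(size=D))
active_fraction = TOP_K / N_EXPERTS              # 2/16 = 12.5% of experts used
```

In this sketch only 12.5% of the expert weights participate per token; M2.5's reported ratio (10B active of 230B total) is roughly 4%, which is where the cost advantage comes from.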
The result is a model that is 37% faster on complex tasks than its predecessor M2.1 and uses roughly 20% fewer search and tool iterations on agentic benchmarks. In practical terms, it does more work with fewer calls, which compounds the cost advantage.
By The Numbers
- 80.2%: M2.5's score on SWE-Bench Verified, matching frontier Western models
- $0.30 per million input tokens: 10-20x cheaper than comparable Western offerings
- 230 billion total parameters: But only 10 billion active during inference
- 76.3% on BrowseComp: Outperforming GPT-5.2 by more than 10 percentage points
- 42 on Artificial Analysis Intelligence Index: Well above the median of 27 for similar open-weight models
Five Models in Five Weeks
MiniMax is not alone. In the first weeks of March 2026, five major Chinese AI models hit the market from Tencent, Alibaba, Baidu, and ByteDance. The pace is relentless. The lag between Chinese releases and the Western frontier has shrunk from months to weeks, and in some benchmarks, the gap has closed entirely.
Beijing approved Alibaba, ByteDance, and Tencent to order roughly 400,000 NVIDIA H200 chips in January, while simultaneously pushing domestic alternatives from Huawei and Cambricon. China is running a dual-track strategy: buy the best available hardware now, and build domestic replacements for later.
"The cost per token is still too high for most enterprise deployments in Asia. Models like M2.5 change the equation completely for regional developers." - Kai-Fu Lee, CEO, Sinovation Ventures
The open-source angle matters too. Chinese labs have embraced open weights as a distribution strategy, earning goodwill in global developer communities. More Silicon Valley applications are expected to ship on top of Chinese open models in 2026 than ever before.
What This Means for Asian Developers
For startups across Southeast Asia, India, and Japan, MiniMax's pricing removes one of the biggest barriers to building AI-native products. A company in Jakarta or Bangalore that previously budgeted $50,000 a month for API calls can now get equivalent intelligence for $2,500. That changes what is economically viable.
| Model | SWE-Bench Verified | Input Cost (per 1M tokens) | Architecture |
|---|---|---|---|
| MiniMax M2.5 | 80.2% | $0.30 | MoE, 230B total / 10B active |
| Claude Opus 4.6 | 80.8% | $15.00 | Dense |
| GPT-5.2 | 80.0% | $10.00 | Dense |
| Gemini 3 Pro | 78.5% | $7.00 | MoE |
The table tells the story. When a $0.30 model performs within a percentage point of a $15.00 model, the conversation shifts from capability to deployment economics. And in markets where margins are thin and developer salaries are lower, deployment economics is everything.
The Catch Nobody Talks About
There are caveats. M2.5's output speed of 39.3 tokens per second sits below the median of 52.6 for comparable models. For latency-sensitive applications like real-time chat, that matters. The model also lacks the extensive safety tuning and alignment infrastructure that Western labs have built over years.
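The throughput gap is easy to translate into user-facing wait time. A quick back-of-envelope calculation, using the article's figures; the 400-token reply length is an assumed example, not a benchmark.

```python
# What 39.3 vs 52.6 tokens/sec means for a user waiting on a reply.
# Throughput figures are from the article; the reply length is an
# assumed example value.

reply_tokens = 400

m25_time = reply_tokens / 39.3      # MiniMax M2.5 output speed
median_time = reply_tokens / 52.6   # median of comparable models

print(f"M2.5:   {m25_time:.1f} s")     # 10.2 s
print(f"Median: {median_time:.1f} s")  # 7.6 s
```

A difference of roughly two and a half seconds per response is invisible in a batch pipeline but very noticeable in real-time chat, which is why the speed gap matters more for some workloads than others.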
Geopolitics remains the elephant in the room. Enterprise customers in regulated industries may hesitate to build critical systems on Chinese-origin models, regardless of performance. Data sovereignty concerns, export controls, and shifting regulatory landscapes all add friction that raw benchmarks cannot capture.
But for the vast middle market of developers building internal tools, content systems, and lightweight agentic workflows, those concerns are secondary to the question that MiniMax has forced: why pay 20 times more for the same result?
Is MiniMax M2.5 really as good as Claude or GPT?
On coding benchmarks like SWE-Bench Verified, M2.5 performs within one percentage point of both Claude Opus 4.6 and GPT-5.2. However, benchmark scores do not capture everything. Safety alignment, output consistency, and latency vary, and real-world performance depends heavily on the specific use case.
Why is MiniMax so much cheaper than Western models?
The Mixture of Experts architecture uses only 10 billion of its 230 billion parameters during inference, dramatically reducing compute costs. Lower labour costs, aggressive pricing strategy to capture market share, and open-weight distribution also play a role.
Can businesses outside China safely use Chinese AI models?
For non-sensitive applications like content generation, internal tooling, and development assistance, many businesses already do. For regulated industries handling personal data or operating under strict compliance requirements, additional due diligence on data handling and model provenance is advisable.
What does this mean for AI pricing in 2026?
Downward pressure is now structural. Western labs face a choice: match Chinese pricing by improving efficiency, differentiate on safety and alignment, or focus on enterprise features that justify premium pricing. Most will attempt all three.
If you could build any AI product with tokens at $0.30 per million, what would you build first? Drop your take in the comments below.