Shanghai's Quiet Challenger Just Rewrote the AI Price List
A month ago, MiniMax was a name most people outside China's tech circles had never heard. That changed on 12 February when the Shanghai-based lab released M2.5, an open-weight model that matches or beats the best from Anthropic, Google, and OpenAI on several key benchmarks, at a fraction of the cost.
The timing was deliberate. MiniMax had just completed its Hong Kong IPO, and M2.5 was its opening argument to the global developer market: frontier-class intelligence, priced for mass adoption. M2.5 joins a wave of Chinese models gaining global traction, and the competitive landscape is shifting rapidly.
The Numbers That Spooked Silicon Valley
M2.5 scored 80.2% on SWE-Bench Verified, the industry's go-to test for real-world coding ability. That puts it neck-and-neck with Anthropic's Claude Opus 4.6 at 80.8% and ahead of OpenAI's GPT-5.2 at 80.0%. On BrowseComp, which measures web search and retrieval, M2.5 hit 76.3%, outpacing GPT-5.2's 65.8% by a wide margin.
But the real story is the price tag. MiniMax charges $0.30 per million input tokens and $1.20 per million output tokens. That is 10 to 20 times cheaper than comparable offerings from Western labs.
For developers building agentic applications that chew through millions of tokens daily, this is not a marginal saving. It is a structural shift that mirrors the broader pricing pressures we're seeing across the AI industry.
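To make the pricing concrete, here is a back-of-envelope cost model. The per-token prices for M2.5 come from the figures above; the monthly traffic volume is an illustrative assumption, not a measured workload.

```python
def monthly_token_cost(input_mtok, output_mtok, price_in, price_out):
    """USD cost for one month of traffic.

    Volumes are in millions of tokens; prices are USD per million tokens.
    """
    return input_mtok * price_in + output_mtok * price_out

# Hypothetical agentic workload: 3,000M input / 600M output tokens a month.
# M2.5 prices ($0.30 in / $1.20 out) are the ones quoted above.
m25 = monthly_token_cost(3000, 600, 0.30, 1.20)
print(f"M2.5: ${m25:,.0f}/month")
print(f"At a 10-20x multiple: ${10 * m25:,.0f}-${20 * m25:,.0f}/month")
```

At those assumed volumes the same traffic priced 10 to 20 times higher lands in the tens of thousands of dollars per month, which is where the "structural shift" framing comes from.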
By The Numbers
- 80.2%: M2.5's score on SWE-Bench Verified, matching frontier Western models
- $0.30 per million input tokens: 10-20x cheaper than comparable Western offerings
- 230 billion total parameters: only 10 billion active during inference
- 76.3% on BrowseComp: Outperforming GPT-5.2 by more than 10 percentage points
- 42 on Artificial Analysis Intelligence Index: Well above the median of 27 for similar open-weight models
"AI infrastructure is becoming foundational economic infrastructure. The companies that control cost-per-token will control the next wave of deployment." - Jensen Huang, CEO, NVIDIA
How MiniMax Built a Frontier Model on a Budget
M2.5 is a Mixture of Experts architecture with 230 billion total parameters, but only 10 billion are active during any given inference call. This design means the model carries the knowledge of a much larger system while running with the efficiency of a far smaller one.
The result is a model that is 37% faster on complex tasks than its predecessor M2.1 and uses roughly 20% fewer search and tool iterations on agentic benchmarks. In practical terms, it does more work with fewer calls, which compounds the cost advantage.
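The active-parameter idea can be sketched in a few lines. This toy layer routes each input to its top-k experts, so only a small fraction of the expert weights are multiplied on any given call. The dimensions, the linear router, and k=2 are illustrative choices for the sketch, not M2.5's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, experts, gate_w, k=2):
    """Toy Mixture-of-Experts layer: route the input to its top-k experts.

    Only k of len(experts) expert weight matrices are used per call, which
    is where MoE's inference savings come from.
    """
    scores = x @ gate_w                  # router logits, one per expert
    top = np.argsort(scores)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()             # softmax over the selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
y = moe_layer(rng.standard_normal(d), experts, gate_w, k=2)
# Here 2 of 16 experts ran (12.5% of expert weights touched); M2.5's quoted
# ratio is 10B active of 230B total, roughly 4.3% per inference call.
```

The cost advantage follows directly: serving compute scales with the active parameters, while the total parameter count sets how much knowledge the model can store.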
Five Models in Five Weeks
MiniMax is not alone. In the first weeks of March 2026, five major Chinese AI models hit the market from Tencent, Alibaba, Baidu, and ByteDance. The pace is relentless: as Moonshot AI's rapid growth also shows, Chinese companies are scaling their AI capabilities aggressively.
Beijing approved Alibaba, ByteDance, and Tencent to order roughly 400,000 NVIDIA H200 chips in January, while simultaneously pushing domestic alternatives from Huawei and Cambricon. China is running a dual-track strategy: buy the best available hardware now, build your own for later.
"The cost per token is still too high for most enterprise deployments in Asia. Models like M2.5 change the equation completely for regional developers." - Kai-Fu Lee, CEO, Sinovation Ventures
The open-source angle matters too. Chinese labs have embraced open weights as a distribution strategy, earning goodwill in global developer communities. More Silicon Valley applications are expected to ship on top of Chinese open models in 2026 than ever before.
What This Means for Asian Developers
For startups across Southeast Asia, India, and Japan, MiniMax's pricing removes one of the biggest barriers to building AI-native products. A company in Jakarta or Bangalore that previously budgeted $50,000 a month for API calls can now get equivalent intelligence for $2,500. That changes what is economically viable.
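The Jakarta/Bangalore figure is simple arithmetic, assuming an identical token mix and volume before and after the switch:

```python
# Checking the article's example at the upper end of its 10-20x price gap.
western_monthly = 50_000          # USD, the previous monthly API budget
price_ratio = 20                  # the article's stated upper bound
m25_monthly = western_monthly / price_ratio
print(m25_monthly)                # 2500.0
```

In practice the exact ratio depends on the input/output token mix, since input and output are priced differently, but the order of magnitude holds.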
| Model | SWE-Bench Verified | Input Cost (per 1M tokens) | Architecture |
|---|---|---|---|
| MiniMax M2.5 | 80.2% | $0.30 | MoE, 230B total / 10B active |
| Claude Opus 4.6 | 80.8% | $15.00 | Dense |
| GPT-5.2 | 80.0% | $10.00 | Dense |
| Gemini 3 Pro | 78.5% | $7.00 | MoE |
The table tells the story. When a $0.30 model performs within a percentage point of a $15.00 model, the conversation shifts from capability to deployment economics. And in markets where margins are thin and developer salaries are lower, deployment economics is everything.
Cost has long been a primary barrier keeping Asian enterprises from scaling AI initiatives beyond pilot phases; pricing at this level attacks that barrier directly.
The Catch Nobody Talks About
There are caveats. M2.5's output speed of 39.3 tokens per second sits below the median of 52.6 for comparable models. For latency-sensitive applications like real-time chat, that matters. The model also lacks the extensive safety tuning and alignment infrastructure that Western labs have built over years.
Geopolitics remains the elephant in the room. Enterprise customers in regulated industries may hesitate to build critical systems on Chinese-origin models, regardless of performance. Data sovereignty concerns, export controls, and shifting regulatory landscapes all add friction that raw benchmarks cannot capture.
But for the vast middle market of developers building internal tools, content systems, and lightweight agentic workflows, those concerns are secondary to the question that MiniMax has forced: why pay 20 times more for the same result? This echoes the broader competitive pressures we're seeing as China positions AI at the centre of its industrial strategy.
The Practical Impact
Consider the practical implications for common use cases across Asian markets:
- Content generation: Thai media companies can now afford to run AI-powered translation and localisation at scale
- Customer service: Indonesian fintech firms can deploy multilingual chatbots without breaking their operational budgets
- Code assistance: Indian software teams can integrate AI pair programming tools across entire development organisations
- Document processing: Japanese enterprises can automate regulatory compliance workflows that were previously too expensive to digitise
- Educational tools: Vietnamese ed-tech companies can offer personalised tutoring systems to millions of students
Is MiniMax M2.5 really as good as Claude or GPT?
On coding benchmarks like SWE-Bench Verified, M2.5 performs within one percentage point of both Claude Opus 4.6 and GPT-5.2. However, benchmark scores do not capture everything. Safety alignment, output consistency, and latency vary, and real-world performance depends heavily on the specific use case.
Why is MiniMax so much cheaper than Western models?
The Mixture of Experts architecture uses only 10 billion of its 230 billion parameters during inference, dramatically reducing compute costs. Lower labour costs, aggressive pricing strategy to capture market share, and open-weight distribution also play a role in driving down prices.
Can businesses outside China safely use Chinese AI models?
For non-sensitive applications like content generation, internal tooling, and development assistance, many businesses already do. For regulated industries handling personal data or operating under strict compliance requirements, additional due diligence on data handling and model provenance is advisable.
What does this mean for AI pricing in 2026?
Downward pressure is now structural. Western labs face a choice: match Chinese pricing by improving efficiency, differentiate on safety and alignment, or focus on enterprise features that justify premium pricing. Most will attempt all three simultaneously.
How does this affect enterprise AI adoption in Asia?
Cost reduction of this magnitude removes the primary barrier preventing Asian enterprises from scaling AI beyond pilot projects. Companies can now afford to deploy AI across entire organisations rather than limiting it to high-value use cases only.
The AI pricing war has officially begun, and it's being fought with Asian characteristics: aggressive pricing, rapid iteration, and open distribution. As models like M2.5 prove that frontier performance doesn't require frontier pricing, the entire industry must recalibrate its assumptions about what AI deployment looks like at scale. What's your strategy for navigating this new landscape? Drop your take in the comments below.







