Xiaomi Open-Sources MiMo-V2.5 Models, Launches 100 Trillion-Token Incentive Program
The smartphone and EV manufacturer Xiaomi has officially open-sourced its MiMo-V2.5 series of large language models, marking a significant escalation in the company's AI ambitions. The release includes two distinct variants: MiMo-V2.5, a 310B-parameter sparse MoE model with native multimodal capabilities, and MiMo-V2.5-Pro, a 1.02T-parameter architecture designed for complex agentic tasks. Both models are available under the permissive MIT license, allowing commercial deployment without additional authorization.
According to documentation from the company, the models support context windows up to 1 million tokens and utilize hybrid attention architectures that reduce KV-cache storage by nearly 7x during long-context operations. (This is the kind of optimization that actually matters when you're running inference at scale.) The weights, tokenizer, and full model cards are hosted on Hugging Face, where developers can immediately download and deploy locally.
The technical specifications reveal a deliberate architectural choice: sparse activation combined with hybrid attention. MiMo-V2.5-Base features 310B total parameters with 15B active, while MiMo-V2.5-Pro scales to 1.02T total with 42B active parameters. Both employ FP8 mixed precision and incorporate Multi-Token Prediction (MTP) modules that, according to Xiaomi, triple output speed during inference. The hybrid attention mechanism interleaves Sliding Window Attention and Global Attention at a 6:1 ratio, so only one layer in seven attends over the full context. This maintains performance while limiting the quadratic attention cost that typically plagues long-context models.
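The "nearly 7x" KV-cache figure can be sanity-checked from the 6:1 interleaving alone. The sketch below is a back-of-the-envelope model, not Xiaomi's accounting: it assumes every layer stores the same number of bytes per cached token, and the 128-token sliding window is a hypothetical value chosen for illustration, since the article does not state the window size.

```python
def kv_cache_reduction(context_len: int, window: int, swa_per_global: int = 6) -> float:
    """Ratio of full-attention KV-cache size to hybrid KV-cache size.

    Assumes every layer stores the same number of bytes per cached token,
    so layer counts and head dimensions cancel out of the ratio.
    """
    group = swa_per_global + 1  # one repeating group: 6 SWA layers + 1 global layer
    full = group * context_len  # with full attention, every layer caches the whole context
    # SWA layers cache at most `window` tokens; the global layer caches everything.
    hybrid = swa_per_global * min(window, context_len) + context_len
    return full / hybrid


# Hypothetical 128-token window, used only to show the trend.
for ctx in (8_192, 131_072, 1_000_000):
    print(f"{ctx:>9} tokens: {kv_cache_reduction(ctx, window=128):.2f}x smaller")
```

Under these assumptions the saving approaches, but never reaches, 7x as the context grows, which is consistent with the "nearly 7x" wording for long-context operation.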
Performance benchmarks position these models competitively against frontier closed-source alternatives. On the Claw-Eval benchmark for daily agentic tasks, MiMo-V2.5 achieves 62.3 on the general subset. The Pro variant leads open-source models with a 63.8% success rate while consuming approximately 70K tokens per trajectory, roughly 40–60% fewer tokens than Anthropic Claude Opus 4.6 or Google Gemini 3.1 Pro need to reach comparable results.
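To put the token-efficiency claim in absolute terms, the quoted figures can be inverted: if 70K tokens is 40–60% fewer than a baseline, the baseline is 70K divided by one minus that fraction. The helper name below is hypothetical, and the result is only what the article's numbers imply, not a measured value for any competing model.

```python
def implied_baseline_tokens(tokens_used: float, fraction_fewer: float) -> float:
    """Tokens a baseline would need if `tokens_used` is `fraction_fewer` below it."""
    if not 0 <= fraction_fewer < 1:
        raise ValueError("fraction_fewer must be in [0, 1)")
    return tokens_used / (1 - fraction_fewer)


mimo_tokens = 70_000  # approximate tokens per trajectory quoted for MiMo-V2.5-Pro
low = implied_baseline_tokens(mimo_tokens, 0.40)
high = implied_baseline_tokens(mimo_tokens, 0.60)
print(f"Implied baseline: {low:,.0f} to {high:,.0f} tokens per trajectory")
```

In other words, the claim amounts to competing models spending somewhere between about 117K and 175K tokens on the same kind of trajectory.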
Independent reporting from VentureBeat corroborates the efficiency claims, noting the models' positioning near the Pareto frontier of performance and token efficiency. The outlet highlights specific agentic demonstrations: a complete Rust compiler implemented in 4.3 hours across 672 tool calls, and an 8,192-line video editor application built over 11.5 hours with 1,868 tool calls.
Alongside the model release, Xiaomi launched the MiMo Orbit program, which includes a 100 trillion-token creator incentive scheme. The program distributes free tokens over a 30-day window, with applications accepted through a dedicated portal. This subsidy strategy aims to lower barriers for developers building agent frameworks and AI applications on the MiMo infrastructure.
Pricing for API access reflects the competitive positioning. For overseas developers, MiMo-V2.5-Pro costs $1.00 per million input tokens (cache miss) and $3.00 per million output tokens within 256K context windows. The base model starts at $0.40 per million input tokens, placing it in the more affordable third of leading LLMs globally. Cache hits reduce input costs to between $0.20 and $0.40 per million tokens, though how much of that benefit materializes depends on workload patterns.
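A rough cost estimate per request follows directly from these rates. The sketch below uses the Pro prices quoted above for cache-miss input and output; the $0.20 cache-hit rate is an assumption taken from the low end of the quoted range, and the `Pricing` class and `request_cost` function are illustrative names, not part of any Xiaomi SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens."""
    input_miss: float
    input_hit: float
    output: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """Estimated USD cost of one request, given the share of input served from cache."""
    if not 0 <= cache_hit_ratio <= 1:
        raise ValueError("cache_hit_ratio must be in [0, 1]")
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    total = miss * p.input_miss + hit * p.input_hit + output_tokens * p.output
    return total / 1_000_000


# input_miss and output are the quoted Pro rates; input_hit=0.20 is an assumption.
PRO = Pricing(input_miss=1.00, input_hit=0.20, output=3.00)

# Hypothetical trajectory: 60K input tokens and 10K output tokens.
print(f"no cache:      ${request_cost(PRO, 60_000, 10_000):.3f}")
print(f"50% cache hit: ${request_cost(PRO, 60_000, 10_000, cache_hit_ratio=0.5):.3f}")
```

The 60K/10K split is invented for the example; real agent trajectories can have very different input-to-output ratios, which is why the cache-hit share is left as a parameter.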
Chip compatibility was announced on the first day of open-sourcing. The models completed adaptation with seven manufacturers: Alibaba T-Head, Amazon Web Services, AMD, Baidu Kunlun Chip, Enflame Technology, Muxi, and Daysci. Mainstream inference frameworks SGLang and vLLM also received Day 0 support, which matters for developers who need predictable deployment paths rather than experimental setups.
In practice, using these models means working with a 1M-token context window that is meant to hold substantial codebases, documentation, or conversation history without degradation. Load times depend on hardware configuration, but the hybrid attention architecture should reduce memory pressure compared to naive implementations. Developers will still face the familiar friction of configuring inference engines, managing token budgets, and debugging agent trajectories that span thousands of tool calls.
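Token budgeting is the part of that friction that is easiest to illustrate. The sketch below greedily packs documents into a context window while reserving room for the model's output. It is a minimal illustration under loud assumptions: the four-characters-per-token estimate is a generic rule of thumb rather than the MiMo tokenizer, the 32K output reserve is arbitrary, and all function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; a real deployment would use the model's tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def pack_into_context(docs: dict[str, str], context_limit: int = 1_000_000,
                      reserve_for_output: int = 32_000,
                      chars_per_token: float = 4.0) -> tuple[list[str], list[str], int]:
    """Greedily include documents, in order, until the input budget is exhausted.

    Returns (included names, skipped names, estimated tokens used).
    """
    budget = context_limit - reserve_for_output
    included: list[str] = []
    skipped: list[str] = []
    used = 0
    for name, text in docs.items():
        cost = estimate_tokens(text, chars_per_token)
        if used + cost <= budget:
            included.append(name)
            used += cost
        else:
            skipped.append(name)  # too large for the remaining budget
    return included, skipped, used


# Tiny hypothetical budget so the skip path is visible.
docs = {"README.md": "x" * 400, "big_module.py": "x" * 4_000, "notes.txt": "x" * 40}
included, skipped, used = pack_into_context(docs, context_limit=200, reserve_for_output=50)
print(included, skipped, used)
```

A greedy pass keeps the example short; a production agent would more likely rank documents by relevance before packing and count tokens with the actual tokenizer.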
Whether this aggressive pricing and open-source strategy translates to sustained adoption remains uncertain. The AI infrastructure market is crowded, and developers have shown willingness to switch providers based on reliability, not just cost. The 100 trillion-token subsidy creates initial momentum, but long-term retention depends on consistent performance and ecosystem support.
Xiaomi's move signals a broader shift toward open-source competition in the frontier model space. The company is leveraging its hardware ecosystem and manufacturing scale to subsidize AI development, creating a potential flywheel effect if the models prove reliable in production environments. Time will tell if the technical specifications match real-world performance across diverse use cases.
Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference, and no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt