
NVIDIA Unveils Nemotron 3 Open Model Family for Agentic AI

By Artūras Malašauskas, Apr 22, 2026
NVIDIA's Nemotron 3 family introduces open, efficient models with hybrid MoE architecture to power scalable multi-agent AI systems.

NVIDIA today announced the Nemotron 3 family of open models, built on a hybrid mixture-of-experts (MoE) architecture and aimed at the main bottlenecks in multi-agent AI development. The family includes Nano, Super, and Ultra variants; Super is available now and Ultra will follow in the coming months, according to the official announcement.

The Nemotron 3 models directly tackle two key industry pain points: "context explosion" in multi-agent workflows, where agents generate up to 15x more tokens than standard chat systems, and the "thinking tax" of using large models for every subtask. By integrating Mamba layers for sequence efficiency and transformer layers for precision reasoning, Nemotron 3 Super achieves up to 5x higher throughput than its predecessor while maintaining accuracy, as detailed in the NVIDIA blog.
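
To make the two pain points concrete, here is a minimal back-of-the-envelope sketch in Python. Only the 15x token multiplier comes from the figures above; the baseline token count, the per-token prices, the subtask split, and every name in the code are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass

# Illustrative assumptions, not NVIDIA figures, except the 15x multiplier
# quoted in the article.
CHAT_TOKENS_PER_TASK = 2_000                      # assumed chat baseline
AGENT_TOKEN_MULTIPLIER = 15                       # "up to 15x" (article)
COST_PER_MTOK = {"small": 0.10, "large": 1.00}    # hypothetical $ per 1M tokens


@dataclass
class Subtask:
    name: str
    needs_deep_reasoning: bool
    share_of_tokens: float  # fraction of the workflow's total tokens


def workflow_tokens() -> int:
    """Upper-bound token volume for one multi-agent workflow."""
    return CHAT_TOKENS_PER_TASK * AGENT_TOKEN_MULTIPLIER


def cost(subtasks: list[Subtask], route: bool) -> float:
    """Dollar cost of one workflow, with or without per-subtask routing."""
    total = workflow_tokens()
    dollars = 0.0
    for task in subtasks:
        # Without routing, every subtask pays the large-model price
        # (the "thinking tax"); with routing, easy subtasks go to the small model.
        tier = "large"
        if route and not task.needs_deep_reasoning:
            tier = "small"
        dollars += total * task.share_of_tokens * COST_PER_MTOK[tier] / 1_000_000
    return dollars


subtasks = [
    Subtask("plan", True, 0.2),
    Subtask("retrieve", False, 0.5),
    Subtask("summarize", False, 0.3),
]

print(f"tokens per workflow:    {workflow_tokens():,}")
print(f"large model everywhere: ${cost(subtasks, route=False):.4f}")
print(f"routed by difficulty:   ${cost(subtasks, route=True):.4f}")
```

Under these made-up prices, routing the two easy subtasks to the cheaper tier cuts the bill to a fraction of the unrouted cost. The real ratio depends entirely on actual pricing and on how much of a workflow needs deep reasoning.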

Central to Nemotron 3's architecture is its 1-million-token context window, which helps prevent goal drift during long workflows by keeping the full workflow state in context. The model's hybrid Mamba-Transformer MoE backbone combines three innovations: latent MoE (activating four expert specialists for the cost of one), multi-token prediction (up to 3x faster inference), and native NVFP4 training, a 4-bit format that NVIDIA says cuts memory requirements and, on Blackwell, runs inference up to 4x faster than FP8 on NVIDIA Hopper platforms.
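
For readers unfamiliar with MoE, the core idea is that a router scores every expert for each token and only the top few actually run. The sketch below shows generic top-k routing in plain Python. It is not NVIDIA's latent MoE, whose details are not covered here, and the toy experts, logits, and function names are all hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; the others never run."""
    gates = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Four toy "experts" that just scale their input: hypothetical stand-ins
# for the feed-forward blocks of a real model.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.0, 2.0, 1.0, -1.0]  # made-up router scores for one token

gates = top_k_route(logits, k=2)
print("active experts:", sorted(gates))
print("gate weights:", gates)
print("output:", moe_forward(1.0, experts, logits, k=2))
```

With k=2 out of four experts, only half of the expert compute is spent on this token, which is the general mechanism behind MoE efficiency claims.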

Early adopters such as ServiceNow and Perplexity are already integrating Nemotron 3 into their workflows. ServiceNow CEO Bill McDermott emphasized the model's role in "empowering leaders across all industries to fast-track their agentic AI strategy," while Perplexity CEO Aravind Srinivas highlighted how the model enables "directing workloads to the best fine-tuned open models" for optimized token economics.

NVIDIA is releasing the models with open weights under a permissive license, alongside over 10 trillion tokens of pre-training and post-training datasets, 15 reinforcement learning environments, and evaluation recipes. This unusually broad release enables developers to customize, optimize, and deploy the models on their own infrastructure, as stated in the Nemotron 3 White Paper.

Unlike proprietary alternatives, Nemotron 3's transparency allows organizations to align AI systems with their data, regulations, and values. This supports NVIDIA's broader sovereign AI initiative, with European and South Korean enterprises adopting the models to build compliance-aligned systems. The release also includes specialized variants: Nemotron 3 Omni for multimodal reasoning, VoiceChat for real-time conversations, and safety models for content moderation.

Industry analysts note that Nemotron 3's focus on efficiency, summarized in the Super model's technical report as "frontier-level intelligence with 5x throughput efficiency," positions it as an infrastructure layer for the next wave of agentic AI. As multi-agent systems move beyond chatbots into complex workflows, the model's ability to handle context explosion while reducing costs will likely accelerate adoption across sectors from cybersecurity to manufacturing.

On PinchBench, a benchmark for measuring AI agents' performance in OpenClaw environments, Nemotron 3 Super scored 85.6% across the full test suite, making it the best open model in its class. This performance, combined with the model's open ecosystem, creates a compelling alternative to proprietary solutions for developers seeking transparency without sacrificing capability.

Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, and cost-efficient inference, with no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt