
Nvidia has released Nemotron 3 Super, a 120-billion-parameter open AI model designed to improve efficiency and reduce costs for large-scale agentic workloads.
The model uses a Mixture-of-Experts (MoE) architecture that activates just 12.7 billion parameters per forward pass, yielding significant compute savings during multi-step agentic tasks.
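A rough illustration of why the MoE design saves compute: per-token cost scales with the parameters actually activated, not the total count. The parameter figures below come from the announcement; the assumption that FLOPs scale linearly with active parameters is a simplification, not an official Nvidia benchmark.

```python
# Back-of-envelope arithmetic for MoE compute savings (illustrative only).
TOTAL_PARAMS = 120e9    # total parameters reported for Nemotron 3 Super
ACTIVE_PARAMS = 12.7e9  # parameters activated per forward pass

def active_fraction(total: float, active: float) -> float:
    """Fraction of the model's weights exercised on each forward pass."""
    return active / total

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active fraction per token: {frac:.1%}")        # about 10.6%
print(f"Rough dense-equivalent saving: {1 / frac:.1f}x")  # about 9.4x fewer FLOPs
```

Under this simplified model, each token costs roughly a tenth of what a dense 120B model would, which is where the claimed throughput advantage over dense peers originates.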
Nvidia claims Nemotron 3 Super delivers up to 7.5 times the throughput of Qwen3.5-122B-A10B and more than twice that of comparable open models such as GPT-OSS-120B.
Built on a hybrid Mamba-Transformer architecture, the model supports context windows of up to one million tokens, enabling long-form reasoning without the heavy attention and KV-cache memory costs of a pure Transformer.
The system was trained on over 25 trillion tokens and fine-tuned using reinforcement learning across multiple environments to improve performance on complex tasks.
Nemotron 3 Super is fully open under Nvidia’s model licence, with checkpoints and datasets available on Hugging Face and deployment supported across major cloud platforms.
The release highlights Nvidia’s push to lower the cost of deploying advanced AI agents while increasing performance in enterprise and developer use cases.