
Nvidia has released Nemotron 3 Super, a 120-billion-parameter open AI model designed to improve efficiency and reduce costs for large-scale agentic workloads.
The model uses a Mixture-of-Experts (MoE) architecture that activates just 12.7 billion parameters per forward pass, yielding significant compute savings during multi-step agentic tasks.
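A rough illustration of why the MoE design saves compute: per-token cost scales with the parameters actually activated, not the total count. The parameter figures below come from the announcement; the assumption that FLOPs scale linearly with active parameters is a simplification, not an official Nvidia benchmark.

```python
# Back-of-envelope arithmetic for MoE compute savings (illustrative only).
TOTAL_PARAMS = 120e9    # total parameters reported for Nemotron 3 Super
ACTIVE_PARAMS = 12.7e9  # parameters activated per forward pass

def active_fraction(total: float, active: float) -> float:
    """Fraction of the model's weights exercised on each forward pass."""
    return active / total

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active fraction per token: {frac:.1%}")        # about 10.6%
print(f"Rough dense-equivalent saving: {1 / frac:.1f}x")  # about 9.4x fewer FLOPs
```

Under this simplified model, each token costs roughly a tenth of what a dense 120B model would, which is where the claimed throughput advantage over dense peers originates.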
Nvidia claims Nemotron 3 Super delivers up to 7.5 times the throughput of Qwen3.5-122B-A10B and more than twice that of comparable open models such as GPT-OSS-120B.
Built on a hybrid Mamba-Transformer architecture, the model supports context windows of up to one million tokens, enabling long-form reasoning without the heavy attention and KV-cache memory costs of a pure Transformer.
The system was trained on over 25 trillion tokens and fine-tuned using reinforcement learning across multiple environments to improve performance on complex tasks.
Nemotron 3 Super is fully open under Nvidia’s model licence, with checkpoints and datasets available on Hugging Face and deployment supported across major cloud platforms.
The release highlights Nvidia’s push to lower the cost of deploying advanced AI agents while increasing performance in enterprise and developer use cases.