NVIDIA Nemotron 3 Super

120B hybrid Mamba-Transformer MoE with 12B active params, latent MoE routing, multi-token prediction, and 1M token context

Model details

Developed by
NVIDIA
Model family
Nemotron
Use case
Large language model
Version
Super
License
NVIDIA AI Foundation Models Community License Agreement

Example usage

Nemotron 3 Super is a 120B hybrid Mamba-Transformer mixture-of-experts model with 12B active parameters per forward pass, built as the coordination and reasoning layer for multi-agent systems.

Its latent MoE architecture consults 4 experts at the cost of 1, and NVFP4 training on Blackwell delivers peak throughput 3x higher than FP8, placing it in the top quadrant for both intelligence and output speed among open models. With a 1M-token context window and strong tool-calling capabilities, it handles routing, planning, and synthesis across agents while delegating high-volume routine tasks to lighter models such as Nemotron 3 Nano. It is fully open source, with weights, training data, and recipes all available.
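The split described above, with coordination work on the larger model and routine work on a lighter one, can be sketched as a small dispatcher. This is only an illustration: the model identifiers, the `Task` type, and the `pick_model` helper are hypothetical names, not part of any NVIDIA or Baseten API.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; replace with the names your deployments use.
SUPER_MODEL = "nemotron-3-super"
NANO_MODEL = "nemotron-3-nano"

# Task kinds the text assigns to the coordinating model.
COORDINATION_TASKS = {"routing", "planning", "synthesis"}


@dataclass(frozen=True)
class Task:
    kind: str    # e.g. "planning" or "extraction"
    prompt: str


def pick_model(task: Task) -> str:
    """Send coordination work to the larger model, everything else to the lighter one."""
    if task.kind in COORDINATION_TASKS:
        return SUPER_MODEL
    return NANO_MODEL


tasks = [
    Task("planning", "Break this feature request into steps."),
    Task("extraction", "Pull the invoice number out of this email."),
]
assignments = {task.kind: pick_model(task) for task in tasks}
print(assignments)
```

A real system would likely classify tasks by more than a label, but a static mapping like this is enough to show where each kind of request would go.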

Nemotron 3 Super achieves higher tokens per second (TPS) than comparable models in the Qwen, MiniMax, GLM, and gpt-oss families, according to Artificial Analysis. It’s the only model to fully land in the most attractive quadrant for intelligence vs. efficiency.

Input

from openai import OpenAI
import os

model_url = ""  # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Chat completion
response_chat = client.chat.completions.create(
    model="",
    messages=[
        {"role": "user", "content": "Write FizzBuzz."}
    ],
    temperature=0.6,
    max_tokens=512,
)
print(response_chat)

JSON output

{
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "[Model output here]",
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1741224586,
    "model": "",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 145,
        "prompt_tokens": 38,
        "total_tokens": 183,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}
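In practice you usually want only a few fields from this response. The following is a minimal sketch that reads the generated text, the finish reason, any tool calls, and the token counts from a trimmed copy of the sample response above. The `summarize` helper is a hypothetical name introduced for illustration, and the code works on plain JSON so it runs without network access.

```python
import json

# Trimmed copy of the sample response shown above.
raw = """
{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "[Model output here]",
                "role": "assistant",
                "tool_calls": null
            }
        }
    ],
    "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183}
}
"""


def summarize(response: dict) -> dict:
    """Collect the first choice's text, tool calls, and token counts (hypothetical helper)."""
    choice = response["choices"][0]
    message = choice["message"]
    return {
        "text": message["content"],
        "finish_reason": choice["finish_reason"],
        # tool_calls is null when the model answered in plain text
        "tool_calls": message.get("tool_calls") or [],
        "usage": response["usage"],
    }


summary = summarize(json.loads(raw))
usage = summary["usage"]
# Sanity check: prompt and completion tokens add up to the reported total.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
print(summary["text"], summary["finish_reason"], summary["tool_calls"])
```

With the SDK object from the input example, the same helper should work on `response_chat.model_dump()`, assuming a v1 `openai` client whose responses are pydantic models.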