NVIDIA Nemotron 3 Super
120B hybrid Mamba-Transformer MoE with 12B active params, latent MoE routing, multi-token prediction, and 1M token context
Model details
Example usage
Nemotron 3 Super is a 120B hybrid Mamba-Transformer mixture-of-experts model with 12B active parameters per forward pass, built as the coordination and reasoning layer for multi-agent systems.
Its latent MoE architecture consults 4 experts at the cost of 1, and NVFP4 training on Blackwell delivers peak throughput 3x higher than FP8, putting it in the top quadrant for both intelligence and output speed among open models. With a 1M token context window and strong tool-calling capabilities, it handles routing, planning, and synthesis across agents, while delegating high-volume routine tasks to lighter models like Nemotron 3 Nano. It's fully open-source: weights, training data, and recipes.
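The division of labor described above can be sketched as a small dispatch function. This is an illustrative sketch only: the model identifiers, the `Task` type, and the set of task kinds are hypothetical names chosen for the example, not part of any NVIDIA or Baseten API.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute the names from your own deployments.
SUPER_MODEL = "nemotron-3-super"
NANO_MODEL = "nemotron-3-nano"

# Task kinds the description assigns to the coordinating model.
COORDINATION_TASKS = {"routing", "planning", "synthesis"}


@dataclass(frozen=True)
class Task:
    kind: str
    prompt: str


def pick_model(task: Task) -> str:
    """Send coordination work to the larger model and everything else to the lighter one."""
    return SUPER_MODEL if task.kind in COORDINATION_TASKS else NANO_MODEL


tasks = [
    Task("planning", "Break this feature request into steps."),
    Task("extraction", "Pull the invoice number from this email."),
]
for task in tasks:
    print(task.kind, "->", pick_model(task))
```

In a real system the task kind would more likely come from a classifier or from the coordinating model itself; a static set is used here only to keep the sketch short.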
Nemotron 3 Super achieves higher tokens per second (TPS) than comparable models in the Qwen, MiniMax, GLM, and gpt-oss families, according to Artificial Analysis. It's the only model to fully land in the most attractive quadrant for intelligence vs. efficiency.

from openai import OpenAI
import os

model_url = ""  # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Chat completion
response_chat = client.chat.completions.create(
    model="",
    messages=[
        {"role": "user", "content": "Write FizzBuzz."}
    ],
    temperature=0.6,
    max_tokens=512,
)
print(response_chat)

{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
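
In practice you usually want the generated text and token counts rather than the whole response object. The sketch below reads those fields from a plain dictionary shaped like the JSON output above. The helper name `summarize_completion` is hypothetical. It assumes the response has first been converted to a dict, for example with `response_chat.model_dump()` in recent versions of the OpenAI Python client.

```python
from typing import Any


def summarize_completion(payload: dict[str, Any]) -> dict[str, Any]:
    """Pull the commonly used fields out of a chat.completion payload."""
    choice = payload["choices"][0]
    usage = payload.get("usage") or {}  # usage may be absent or null
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # A finish_reason of "length" means generation stopped at max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Sample payload trimmed from the JSON output shown above.
sample = {
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "[Model output here]", "role": "assistant"},
        }
    ],
    "object": "chat.completion",
    "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183},
}

summary = summarize_completion(sample)
print(summary["text"])
print(summary["total_tokens"])
```

If `truncated` comes back true, raising `max_tokens` in the request is the usual first step.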