
NVIDIA Nemotron 3 Super

120B hybrid Mamba-Transformer MoE with 12B active params, latent MoE routing, multi-token prediction, and 1M token context

Model details



Nemotron 3 Super is a 120B hybrid Mamba-Transformer mixture-of-experts model with 12B active parameters per forward pass, built as the coordination and reasoning layer for multi-agent systems.
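The gap between 120B total and 12B active parameters comes largely from sparse routing: each token passes through only a few experts, and the rest of the experts stay idle. The snippet below is a generic top-k MoE gating step in plain Python. It is a minimal sketch under that assumption, not Nemotron 3 Super's actual router (the page describes a latent MoE variant), and `moe_forward`, `k`, and the toy experts are hypothetical names for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_logits: Vector, experts: List[Expert], k: int) -> Vector:
    """Route one token to its top-k experts and mix their outputs.

    Only the k selected experts are evaluated, which is why the active
    parameter count per token is much smaller than the total.
    """
    # Indices of the k highest-scoring experts for this token.
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # unselected experts never run
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out
```

Real MoE layers batch this across tokens and produce `router_logits` with a learned gating network; the loop here only shows the select-then-mix logic.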

Its latent MoE architecture consults 4 experts for the cost of 1, and NVFP4 training on Blackwell delivers peak throughput 3x higher than FP8. Together, these put it in the top quadrant for both intelligence and output speed among open models. With a 1M-token context window and strong tool-calling capabilities, it handles routing, planning, and synthesis across agents while delegating high-volume routine tasks to lighter models like Nemotron 3 Nano. It's fully open source, including weights, training data, and recipes.
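One way to read "4 experts for the cost of 1" is as FLOP arithmetic: if tokens are projected into a latent space a quarter the width of the model dimension before reaching the experts, each expert's weight matrices shrink by 4x, so four latent experts cost about as much as one full-width expert. The sketch below assumes that design, a simple two-matrix FFN expert, and hypothetical dimensions (`D_MODEL=4096`, `D_LATENT=1024`, `D_FF=8192`); none of these are Nemotron 3 Super's published configuration.

```python
def expert_flops(d_in: int, d_ff: int) -> int:
    """Approximate FLOPs per token for a two-matrix FFN expert
    (d_in -> d_ff -> d_in), counting 2 FLOPs per multiply-add."""
    return 2 * (d_in * d_ff + d_ff * d_in)


def latent_moe_flops(d_model: int, d_latent: int, d_ff: int, k: int) -> int:
    """FLOPs per token for k experts that operate in a compressed latent space,
    plus the shared projections into (d_model -> d_latent) and out of
    (d_latent -> d_model) that space. Assumed design, for illustration only."""
    projections = 2 * (d_model * d_latent + d_latent * d_model)
    return projections + k * expert_flops(d_latent, d_ff)


# Hypothetical dimensions, not Nemotron 3 Super's actual configuration.
D_MODEL, D_LATENT, D_FF = 4096, 1024, 8192

one_full_expert = expert_flops(D_MODEL, D_FF)
four_latent_experts = 4 * expert_flops(D_LATENT, D_FF)
latent_total = latent_moe_flops(D_MODEL, D_LATENT, D_FF, k=4)

print(one_full_expert, four_latent_experts, latent_total)
# 134217728 134217728 150994944
```

With these numbers the four latent experts exactly match one full-width expert, and the shared in/out projections add 12.5% on top; the real overhead depends on the model's actual dimensions.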

Nemotron 3 Super achieves higher tokens per second (TPS) than comparable models in the Qwen, MiniMax, GLM, and gpt-oss families, according to Artificial Analysis. It's the only model to land fully in the most attractive quadrant for intelligence vs. efficiency.
Input
from openai import OpenAI
import os

model_url = ""  # Copy in from API pane in Baseten model dashboard

# Authenticate with your Baseten API key, read from the environment
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url=model_url,
)

# Chat completion
response_chat = client.chat.completions.create(
    model="",
    messages=[
        {"role": "user", "content": "Write FizzBuzz."}
    ],
    temperature=0.6,
    max_tokens=512,
)

# Print the full response object; the generated text itself is in
# response_chat.choices[0].message.content
print(response_chat)
JSON output
{
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "[Model output here]",
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1741224586,
    "model": "",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 145,
        "prompt_tokens": 38,
        "total_tokens": 183,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}
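The response above follows the OpenAI `chat.completion` schema, so downstream code (for example, an orchestrator in a multi-agent system) can read it with the standard `json` module. The helper below, a hypothetical `summarize_completion` function, is a minimal sketch that assumes the payload has the shape of the sample: it pulls out the assistant text, the finish reason, any requested tool calls with their parsed arguments, and the token usage.

```python
import json
from typing import Any, Dict, List


def summarize_completion(raw: str) -> Dict[str, Any]:
    """Extract the useful fields from an OpenAI-style chat.completion JSON string."""
    data = json.loads(raw)
    choice = data["choices"][0]
    message = choice["message"]
    usage = data.get("usage") or {}
    # tool_calls is null when the model answered in plain text.
    tool_calls: List[Dict[str, Any]] = message.get("tool_calls") or []
    return {
        "content": message.get("content"),
        "finish_reason": choice.get("finish_reason"),
        # Each tool call's arguments arrive as a JSON-encoded string.
        "tool_calls": [
            (tc["function"]["name"], json.loads(tc["function"]["arguments"] or "{}"))
            for tc in tool_calls
        ],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

With the openai v1 Python SDK, response objects are Pydantic models, so `response_chat.model_dump_json()` should yield a string in this shape; treat that as an assumption to verify against your installed SDK version.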
