Qwen3 Coder 30B

Small mixture-of-experts LLM with advanced coding and reasoning capabilities optimized for fast inference

Model details

Developed by: Qwen
Model family: Qwen
Use case: large language
Version: 3
Variant: Coder
Size: 30B
API: OpenAI SDK
License: Apache 2.0

Example usage

Run Qwen3 Coder 30B (Flash) on an H100 GPU.

Qwen3 has shown strong performance on math and reasoning tasks, but running it in production requires a highly optimized inference stack to avoid excessive latency.

Deployments of Qwen3 are OpenAI-compatible.

Input

from openai import OpenAI
import os

model_url = "" # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Chat completion
response_chat = client.chat.completions.create(
    model="",
    messages=[
        {"role": "user", "content": "Write FizzBuzz."}
    ],
    temperature=0.6,
    max_tokens=100,
)
print(response_chat)
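
Because the deployment is OpenAI-compatible, the same call can be expressed as a plain HTTP request. The sketch below only builds the request and does not send it. It assumes the usual OpenAI conventions that the SDK follows (a POST to the `/chat/completions` route under the base URL, with the key in a Bearer `Authorization` header). The helper name `build_chat_request` and the placeholder URL are hypothetical, so check the API pane in the model dashboard for the real values.

```python
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions route."""
    payload = {
        "model": "",  # left blank, as in the SDK example above
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "max_tokens": 100,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholder URL; the real value comes from the model dashboard.
request = build_chat_request(
    "https://example.invalid/v1",
    os.environ.get("BASETEN_API_KEY", "test-key"),
    "Write FizzBuzz.",
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. This is left out here so the sketch runs without network access or a real key.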

JSON output

{
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "[Model output here]",
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1741224586,
    "model": "",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 145,
        "prompt_tokens": 38,
        "total_tokens": 183,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}
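
In practice only a few fields of this response are usually needed: the generated text, whether generation was cut off, and the token counts. The following is a minimal sketch of pulling those out of a response shaped like the sample above, using a trimmed copy of that JSON. The helper name `summarize` is hypothetical, and treating a `finish_reason` of `"length"` as truncation by `max_tokens` follows the OpenAI chat completions convention.

```python
import json

# Trimmed copy of the sample response shown above.
sample = """
{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "[Model output here]", "role": "assistant"}
        }
    ],
    "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183}
}
"""


def summarize(response: dict) -> dict:
    """Extract the generated text, a truncation flag, and total token usage."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}  # usage may be missing or null
    return {
        "text": choice["message"]["content"],
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize(json.loads(sample))
print(summary)
```

With the SDK, the same fields are available as attributes on the returned object, for example `response_chat.choices[0].message.content`.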