Qwen3 235B 2507
Mixture-of-experts LLM with math and reasoning capabilities
Baseten offers Dedicated Deployments and Model APIs for Qwen3 235B A22B Instruct 2507 powered by the Baseten Inference Stack.
Qwen3 has shown strong performance on math and reasoning tasks, but running it in production requires a highly optimized inference stack to avoid excessive latency.
Deployments of Qwen3 are OpenAI-compatible.
Input
from openai import OpenAI
import os

model_url = "" # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Chat completion
response_chat = client.chat.completions.create(
    model="",
    messages=[
        {"role": "user", "content": "Write FizzBuzz."}
    ],
    temperature=0.6,
    max_tokens=100,
)
print(response_chat)
JSON output
{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
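In practice you usually want only the reply text and the token counts, not the whole response object. The sketch below shows one way to pull those fields out of a payload shaped like the JSON output above. The `summarize` helper and the trimmed `sample` string are illustrative assumptions, not part of the Baseten or OpenAI APIs.

```python
import json

# Trimmed copy of the sample JSON output above (illustrative data only).
sample = """
{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {"content": "[Model output here]", "role": "assistant"}
    }
  ],
  "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183}
}
"""


def summarize(payload: dict) -> dict:
    """Hypothetical helper: extract reply text, stop reason, and token counts."""
    choice = payload["choices"][0]  # first (and here only) completion
    usage = payload["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


summary = summarize(json.loads(sample))
print(summary["text"])
print(summary["finish_reason"], summary["total_tokens"])
```

When working with the client object directly rather than raw JSON, the same fields are available as attributes, for example `response_chat.choices[0].message.content` and `response_chat.usage.total_tokens`. A `finish_reason` of `"length"` rather than `"stop"` indicates the reply was cut off by `max_tokens`.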