
Qwen3 Omni Thinker

An "omni" model that can process text, image, and audio input


Example usage

Qwen3 Omni is compatible with the OpenAI SDK, so you call it through the Chat Completions API. It accepts several input modalities (text, image, and audio), and the "Thinker" variant deployed here returns text only.
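Every request on this page uses the same message shape: a system message, then one user message whose `content` is a list of typed parts. A small helper can assemble that list for any mix of inputs. This is a minimal sketch; `build_messages` and its parameters are hypothetical names, and the part types (`text`, `image_url`, `audio_url`) are simply the ones used in the examples on this page.

```python
from typing import Iterable


def build_messages(
    prompt: str,
    image_urls: Iterable[str] = (),
    audio_urls: Iterable[str] = (),
    system: str = "You are a helpful assistant.",
) -> list[dict]:
    """Build a chat `messages` list mixing text, image, and audio parts.

    Hypothetical helper: it only reproduces the message shape shown in the
    examples on this page and does not call the API.
    """
    # The text prompt always comes first.
    content: list[dict] = [{"type": "text", "text": prompt}]
    # One `image_url` part per image.
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    # One `audio_url` part per audio clip.
    content += [{"type": "audio_url", "audio_url": {"url": u}} for u in audio_urls]
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": content},
    ]
```

The result can be passed as `messages=` to `client.chat.completions.create` in the Python example on this page.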

{
  "model": "qwen3-omni",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see and hear."},
        {
          "type": "image_url",
          "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}
        },
        {
          "type": "audio_url",
          "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}
        }
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
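If you would rather not depend on the OpenAI SDK, the JSON body above can be POSTed directly. The sketch below uses only the Python standard library and builds the request without sending it. It assumes the route is the SDK `base_url` from the Python example plus `/chat/completions` (which is how the OpenAI SDK forms its request URL) and that a `Bearer` authorization header is accepted, since that is what the SDK sends; verify both against your deployment. `make_request` and `MODEL_BASE_URL` are hypothetical names, and `model-xxxxxx` is the placeholder from the original example.

```python
import json
import urllib.request

# Placeholder deployment URL from the Python example; replace model-xxxxxx.
MODEL_BASE_URL = "https://model-xxxxxx.api.baseten.co/environments/production/sync/v1"


def make_request(
    payload: dict, api_key: str, base_url: str = MODEL_BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions route.

    Assumption: the route is `<base_url>/chat/completions` and accepts a
    Bearer token, mirroring what the OpenAI SDK sends.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),  # JSON request body as bytes
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the body with `json.loads`; the answer text is at `["choices"][0]["message"]["content"]`.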
Input
import os

from openai import OpenAI

# Replace model-xxxxxx with your deployment's model ID.
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],  # Baseten API key read from the environment
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1",
)

# One user turn containing a text prompt, an image URL, and an audio URL.
resp = client.chat.completions.create(
    model="qwen3-omni",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image and audio content."},
            {"type": "image_url", "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}},
            {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}}
        ]}
    ],
    max_tokens=2048,
    temperature=0.7,
    stream=False,
)

# The Thinker variant returns text only.
print(resp.choices[0].message.content)
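The example above sets `stream=False`. With `stream=True`, the OpenAI SDK instead yields chunks whose new text is in `choices[0].delta.content`, which can be `None` or empty on some chunks. The sketch below collects that text; it assumes this deployment supports streaming, as OpenAI-compatible endpoints typically do, and `collect_stream` is a hypothetical helper name.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = False) -> str:
    """Concatenate the text deltas from a streamed chat completion.

    Intended for the chunks yielded by
    `client.chat.completions.create(..., stream=True)`; assumes the
    deployment supports streaming.
    """
    parts: list[str] = []
    for chunk in chunks:
        # Some chunks (e.g. a final usage-only chunk) carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        # Skip None or empty deltas, such as role-only or final chunks.
        if delta:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)


# Usage (needs the `client` from the example above and network access):
# stream = client.chat.completions.create(model="qwen3-omni", messages=[...], stream=True)
# text = collect_stream(stream, echo=True)
```

Skipping chunks with no choices keeps the helper working whether or not the server appends a usage-only chunk at the end of the stream.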
JSON output
{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "qwen3-omni",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "I see several parked cars in front of a building and hear a short cough."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 512,
        "completion_tokens": 24,
        "total_tokens": 536
    }
}
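The response follows the standard chat completion shape: the generated text is at `choices[0].message.content`, `finish_reason` says why generation stopped, and `usage` reports token counts. A small sketch that extracts these from a raw JSON body such as the sample above; `summarize_response` is a hypothetical helper name.

```python
import json


def summarize_response(body: str) -> dict:
    """Hypothetical helper: pull answer text, stop reason, and token counts from a chat completion body."""
    data = json.loads(body)
    choice = data["choices"][0]
    usage = data.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # "length" means generation hit max_tokens instead of finishing naturally.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

With the SDK response object from the Python example, the same fields are available as attributes, for example `resp.choices[0].finish_reason` and `resp.usage.total_tokens`.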

More large language models

DeepSeek V3.2 (Model API, LLM): V3.2 - B200
GLM-4.6V (Z AI, LLM): 4.6 - Vision

More Qwen models

Qwen3 Coder 480B (Model API, LLM): 3 - Coder
Qwen 3 32B (LLM): V3 - TRT-LLM - H100