Voxtral Small 24B

Voxtral Small builds on Mistral Small 3 and adds audio input. It excels at speech transcription, translation, and audio understanding.


Example usage

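Voxtral Small 24B accepts text and audio (formatted using mistral_common) through an OpenAI-compatible API. The example under Input, adapted for Baseten from the Voxtral Small 24B model page, sends two audio clips with a text question, then asks a follow-up question about the first clip in a second turn.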
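As background, it can help to see roughly what an audio message looks like once it is in OpenAI chat format. The sketch below builds one with only the Python standard library. It assumes the endpoint accepts OpenAI's input_audio content part (base64 data plus a format string); whether this matches exactly what mistral_common emits is an assumption, and audio_user_message is a hypothetical helper, not part of mistral_common or Baseten's API.

```python
import base64


def audio_user_message(audio_bytes: bytes, question: str, fmt: str = "wav") -> dict:
    """Build an OpenAI-style user message holding one audio clip and a text question.

    Assumption: the server accepts OpenAI's "input_audio" content part.
    Hypothetical helper for illustration only.
    """
    # Audio travels inline as base64 text inside the JSON request body.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "input_audio", "input_audio": {"data": encoded, "format": fmt}},
            {"type": "text", "text": question},
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real file read with open(path, "rb").read().
    msg = audio_user_message(b"placeholder audio bytes", "Transcribe this audio.")
    print(msg["content"][0]["input_audio"]["format"])  # wav
```

Putting the audio part before the text part mirrors the order used in the full example, where the audio chunks precede the question.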

Input
from mistral_common.protocol.instruct.messages import (
    TextChunk,
    AudioChunk,
    UserMessage,
    AssistantMessage,
)
from mistral_common.audio import Audio
from huggingface_hub import hf_hub_download

from openai import OpenAI

# Replace with your Baseten model ID and API key.
model_id = "12345678"
# Environment segment of the endpoint path. "environments/production" is an
# assumption; use the path shown for your deployment in the Baseten dashboard.
deploy_env = "environments/production"

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url=f"https://model-{model_id}.api.baseten.co/{deploy_env}/sync/v1",
)

# Use the ID of the first (typically the only) model served by the deployment.
models = client.models.list()
model = models.data[0].id

# Download two sample audio files from the Hugging Face Hub.
obama_file = hf_hub_download(
    "patrickvonplaten/audio_samples", "obama.mp3", repo_type="dataset"
)
bcn_file = hf_hub_download(
    "patrickvonplaten/audio_samples", "bcn_weather.mp3", repo_type="dataset"
)


def file_to_chunk(file: str) -> AudioChunk:
    """Load an audio file and wrap it in a mistral_common AudioChunk."""
    audio = Audio.from_file(file, strict=False)
    return AudioChunk.from_audio(audio)


# First turn: both audio clips followed by a text question.
text_chunk = TextChunk(
    text="Which speaker is more inspiring? Why? How are they different from each other? Answer in French."
)
user_msg = UserMessage(
    content=[file_to_chunk(obama_file), file_to_chunk(bcn_file), text_chunk]
).to_openai()

response = client.chat.completions.create(
    model=model,
    messages=[user_msg],
    temperature=0.2,
    top_p=0.95,
)
content = response.choices[0].message.content

# Second turn: resend the conversation, including the first answer,
# and ask a follow-up question about the first clip.
messages = [
    user_msg,
    AssistantMessage(content=content).to_openai(),
    UserMessage(
        content="Ok, now please summarize the content of the first audio."
    ).to_openai(),
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.2,
    top_p=0.95,
)
print(response.model_dump_json(indent=4))
JSON output
{
    "id": "chatcmpl-e9ec9328-dbc3-494c-8f99-1a0d79727dd2",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Dans le premier audio, le président Obama prononce son discours d'adieu à Chicago, suivant la tradition des présidents précédents. Il exprime sa gratitude envers les Américains, qu'ils aient été d'accord avec lui ou non, et souligne l'importance de leurs conversations pour le maintenir honnête, inspiré et motivé. Il partage des moments marquants de son mandat, tels que la résilience économique, l'accès aux soins de santé abordables, la reconstruction après des catastrophes, et les réalisations scientifiques. Obama insiste sur l'importance de la participation citoyenne pour préserver la démocratie et améliorer la nation. Il conclut en exprimant son optimisme pour l'avenir du pays et son désir de continuer à servir en tant que citoyen.",
                "refusal": null,
                "role": "assistant",
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "stop_reason": null
        }
    ],
    "created": 1752878421,
    "model": "voxtral-small",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 149,
        "prompt_tokens": 3093,
        "total_tokens": 3242,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}
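The response follows the OpenAI chat completion schema, so the reply text, stop reason, and token usage can be read straight from the JSON. The sketch below does this with only the standard library on an abbreviated copy of the output above (the message content is shortened and most null fields are omitted); read_completion is a hypothetical helper name.

```python
import json

# Abbreviated copy of the JSON output above; the message content is shortened.
RAW_RESPONSE = """
{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"role": "assistant", "content": "Dans le premier audio, ..."}
        }
    ],
    "model": "voxtral-small",
    "usage": {"completion_tokens": 149, "prompt_tokens": 3093, "total_tokens": 3242}
}
"""


def read_completion(payload: str) -> tuple[str, str, dict]:
    """Return (reply text, finish reason, token usage) from a chat completion JSON string."""
    data = json.loads(payload)
    choice = data["choices"][0]
    return choice["message"]["content"], choice["finish_reason"], data["usage"]


text, finish_reason, usage = read_completion(RAW_RESPONSE)
# "stop" means generation ended naturally (or at a stop sequence), not at the token limit.
print(finish_reason)  # stop
# Total tokens are prompt tokens plus completion tokens: 3093 + 149 = 3242.
print(usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"])  # True
```

With the OpenAI client itself, the same fields are available as attributes, for example response.choices[0].finish_reason and response.usage.total_tokens.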