Product, Community: The Baseten Inference Stack at NVIDIA Dynamo Day (Rachel Rapp)
AI engineering: The fastest Whisper — with streaming and diarization (Tianshu Cheng and 4 others)
Model performance: How Baseten achieved 2x faster inference with NVIDIA Dynamo (Abu Qader and 2 others)
Infrastructure: How we built Multi-cloud Capacity Management (MCM) (William Lau and 3 others)
Infrastructure: How Baseten multi-cloud capacity management (MCM) unifies deployments (Rachel Rapp and 1 other)
News: Introducing Baseten Embeddings Inference: The fastest embeddings solution available (Michael Feil and 1 other)
News: Baseten Chains is now GA for production compound AI systems (Marius Killinger and 2 others)
News: New observability features: activity logging, LLM metrics, and metrics dashboard customization (Suren Atoyan and 4 others)
News: Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference (Justin Yi and 3 others)