Philip Kiely

Lead Developer Advocate

Model performance

Day zero benchmarks for Qwen 3 with SGLang on Baseten

Michael Feil

Philip Kiely

Yineng Zhang

2 others

Infrastructure

Accelerating inference with NVIDIA B200 GPUs

Philip Kiely

Community

Building performant embedding workflows with Chroma and Baseten

Philip Kiely

AI engineering

The best open-source embedding models

Philip Kiely

Model performance

How we built BEI: high-throughput embedding, reranker, and classifier inference

Michael Feil, Philip Kiely

Model performance

How multi-node inference works for massive LLMs like DeepSeek-R1

Phil Howes, Philip Kiely

Infrastructure

Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud

Pankaj Gupta, Philip Kiely

AI engineering

Private, secure DeepSeek-R1 in production in US & EU data centers

Amir Haghighat

Philip Kiely

Yineng Zhang

2 others

Model performance

How we built production-ready speculative decoding with TensorRT-LLM

Pankaj Gupta

Philip Kiely

Pankaj Gupta

2 others
