Philip Kiely, Lead Developer Advocate
AI engineering: Streaming real-time text to speech with XTTS V2 (Het Trivedi and 1 other)
Model performance: Continuous vs dynamic batching for AI inference (Matt Howard and 1 other)
Infrastructure: Using fractional H100 GPUs for efficient model serving (Matt Howard and 3 others)
Model performance: Benchmarking fast Mistral 7B inference (Abu Qader and 3 others)
Model performance: 33% faster LLM inference with FP8 quantization (Pankaj Gupta and 1 other)
Model performance: High performance ML inference with NVIDIA TensorRT (Justin Yi and 1 other)
Model performance: FP8: Efficient model inference with 8-bit floating point numbers (Pankaj Gupta and 1 other)
Infrastructure: The benefits of globally distributed infrastructure for model serving (Phil Howes and 1 other)
Model performance: 40% faster Stable Diffusion XL inference with NVIDIA TensorRT (Pankaj Gupta and 2 others)