Model performance: Benchmarking fast Mistral 7B inference, by Abu Qader and 3 others
Model performance: 33% faster LLM inference with FP8 quantization, by Pankaj Gupta and 1 other
Model performance: High performance ML inference with NVIDIA TensorRT, by Justin Yi and 1 other
Model performance: FP8: Efficient model inference with 8-bit floating point numbers, by Pankaj Gupta and 1 other
Infrastructure: The benefits of globally distributed infrastructure for model serving, by Phil Howes and 1 other
Model performance: 40% faster Stable Diffusion XL inference with NVIDIA TensorRT, by Pankaj Gupta and 2 others
Model performance: Why GPU utilization matters for model inference, by Marius Killinger and 1 other
AI engineering: The best open source large language model, by Philip Kiely
Model performance: Unlocking the full power of NVIDIA H100 GPUs for ML inference with TensorRT, by Pankaj Gupta and 1 other