Bryce Dubayah

Engineering

Model performance

How to run LLM performance benchmarks (and why you should)

Alex Ker, Bryce Dubayah

AI engineering

Tool Calling in Inference

Kenzie Amack, Bryce Dubayah

Model performance

How we run GPT OSS 120B at 500+ tokens per second on NVIDIA GPUs

Amir Haghighat, Tri Dao, Abu Qader, Bryce Dubayah, Philip Kiely

News

Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference

Abu Qader, Bryce Dubayah, Justin Yi, and 3 others

Model performance

How to build function calling and JSON mode for open-source and fine-tuned LLMs

Bryce Dubayah, Philip Kiely

News

Introducing function calling and structured output for open-source and fine-tuned LLMs

Bryce Dubayah, Philip Kiely
