Inference engineering and the real-world deployment of LLMs, with Philip Kiely

Patrick McKenzie (patio11) and Philip Kiely, early employee at Baseten, discuss the inference stack: the critical layer of software and hardware that sits between a model’s weights and a user’s prompt. They cover inference engineering, how the intermediate layers are evolving on top of a technical stack that changes every six months, and how sophisticated organizations actually consume LLMs beyond typing questions into chatbot apps.

Full transcript available here: www.complexsystemspodcast.com/inference-engineering-with-philip-kiely/


Presenting Sponsors: Mercury, Meter, & Granola


Complex Systems is presented by Mercury—radically better banking for founders. Mercury offers the best wire experience anywhere: fast, reliable, and free for domestic U.S. wires, so you can stay focused on growing your business. Apply online in minutes at mercury.com.

Networking infrastructure has a way of accumulating technical debt faster than almost anything else in IT. Meter handles the full stack (wired, wireless, and cellular) as a single integrated solution: designed, deployed, and managed end-to-end so there's only one vendor to call when something goes wrong. Visit meter.com/complexsystems to book a demo.


If meetings consistently leave you with hazy action items and lost context, Granola handles the transcription so you can actually participate and gives you searchable notes afterward. Try it free at granola.ai/complexsystems with code COMPLEXSYSTEMS

Timestamps:
(00:00) Intro
(00:30) The AI deployment pipeline
(03:04) Evolution of abstraction layers in engineering
(05:14) Defining inference and model weights
(08:45) Architecture of language and diffusion models
(10:11) AI adoption in the broader economy
(11:30) The shift toward agentic workflows and RL
(14:55) Function calling and real-world actions
(20:10) Sponsors: Mercury | Meter
(22:59) Technologies for agentic tools: MCP and skills
(25:32) The craft of writing a harness
(29:56) Using AI for automated proofreading and tool creation
(34:12) Balancing LLMs with deterministic code
(37:31) Observability and chain of thought reasoning
(39:31) Sponsor: Granola
(41:21) Observability and chain of thought reasoning
(50:45) Speculative decoding and hidden states
(55:37) The value of smaller, task-specific models
(59:55) Internal competencies versus buying solutions
(01:09:27) Self-publishing a technical book in record time
(01:23:20) Wrap
