Linear Digressions

Benchmark Bank Heist

What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation — reasoning that it was being...

6 Apr 12min

Benchmarking AI Models

How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks — the st...

30 Mar 29min

The Hot Mess of AI (Mis-)Alignment

The paperclip maximizer — the classic AI doom scenario where a hyper-competent machine single-mindedly converts the universe into office supplies — might not be the AI risk we should actually lose sle...

23 Mar 22min

The Bitter Lesson

Every AI builder knows the anxiety: you spend months engineering prompts, tuning pipelines, and chaining calls together — then a new model drops and half your work evaporates overnight. It turns out r...

15 Mar 19min

From Atari to ChatGPT: How AI Learned to Follow Instructions

From Atari to ChatGPT: How AI Learned to Follow Instructions by Katie Malone

9 Mar 25min

It's RAG time: Retrieval-Augmented Generation

Today we are going to talk about the feature with the worst acronym in generative AI: RAG, or Retrieval-Augmented Generation. If you've ever used something like "Chat with My Docs," if you have an int...

2 Mar 17min

Chasing Away Repetitive LLM Responses with Verbalized Sampling

One of the things that LLMs can be really helpful with is brainstorming or generating new creative content. They are called Generative AI, after all—not just for summarization and question-and-answer ...

23 Feb 19min

We're Back

It's been (*checks watch*) about five and a half years since we last talked. Fortunately nothing much has happened in the AI/data science world in that time. So let's just pick up where we left off, s...

16 Feb 2min

Episodes (309)
