Scrum Master Toolbox Podcast: Agile storytelling from the trenches
16 Feb

When AI Decisions Go Wrong at Scale—And How to Prevent It With Ran Aroussi

BONUS: When AI Decisions Go Wrong at Scale—And How to Prevent It

We've spent years asking what AI can do. But the next frontier isn't more capability—it's something far less glamorous and far more dangerous if we get it wrong. In this episode, Ran Aroussi shares why observability, transparency, and governance may be the difference between AI that empowers humans and AI that quietly drifts out of alignment.

The Gap Between Demos and Deployable Systems

"I've noticed that I watched well-designed agents make perfectly reasonable decisions based on their training, but in a context where the decision was catastrophically wrong. And there was really no way of knowing what had happened until the damage was already there."

Ran's journey from building algorithmic trading systems to creating MUXI, an open framework for production-ready AI agents, revealed a fundamental truth: the skills needed to build impressive AI demos are completely different from those needed to deploy reliable systems at scale. Coming from the AdTech space, where he handled billions of ad impressions daily and over a million concurrent users, Ran brings a perspective shaped by real-world production demands.

The moment of realization came when he saw that the non-deterministic nature of AI meant that traditional software engineering approaches simply don't apply. While traditional bugs are reproducible, AI systems can produce different results from identical inputs—and that changes everything about how we need to approach deployment.

Why Leaders Misunderstand Production AI

"When you chat with ChatGPT, you go there and it pretty much works all the time for you. But when you deploy a system in production, you have users with unimaginable different use cases, different problems, and different ways of phrasing themselves."

The biggest misconception leaders have is assuming that because AI works well in their personal testing, it will work equally well at scale. When you test AI yourself, your own biases and the limited range of scenarios you can imagine mean you are essentially seeing a curated experience.

Real users bring infinite variation: non-native English speakers constructing sentences differently, unexpected use cases, and edge cases no one anticipated. The input space for AI systems is practically infinite because it's language-based, making comprehensive testing impossible.

Multi-Layered Protection for Production AI

"You have to put in deterministic filters between the AI and what you get back to the user."

Ran outlines a comprehensive approach to protecting AI systems in production:

Model version locking: Just as you wouldn't randomly upgrade Python versions without testing, lock your AI model versions to ensure consistent behavior
Guardrails in prompts: Set clear boundaries about what the AI should never do or share
Deterministic filters: Language firewalls that catch personal information, harmful content, or unexpected outputs before they reach users
Comprehensive logging: Detailed traces of every decision, tool call, and data flow for debugging and pattern detection

The key insight is that these layers must work together—no single approach provides sufficient protection for production systems.
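
As a rough illustration of the "deterministic filter" layer, here is a minimal Python sketch. It is not taken from MUXI or from the episode: the function name filter_output, the two regular expressions, and the blocked phrase are all hypothetical, and a real filter would need a much broader and properly tested rule set. The point is only that this layer is ordinary code, so the same model output always produces the same filtered result.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration only; real PII detection needs far more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Illustrative phrase that should never reach a user.
BLOCKED_PHRASES = ("internal use only",)


def filter_output(text: str) -> tuple[str, list[str]]:
    """Redact matches and return (safe_text, names_of_rules_that_fired)."""
    fired: list[str] = []
    if EMAIL.search(text):
        text = EMAIL.sub("[REDACTED EMAIL]", text)
        fired.append("email")
    if CARD.search(text):
        text = CARD.sub("[REDACTED NUMBER]", text)
        fired.append("card_number")
    if any(phrase in text.lower() for phrase in BLOCKED_PHRASES):
        # Replace the whole response rather than trying to patch it.
        return "Sorry, I can't share that.", fired + ["blocked_phrase"]
    return text, fired
```

The list of fired rules is returned alongside the text so that it can be written to the logs described in the last item above, which is one way the layers can feed each other.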

Observability in Agentic Workflows

"With agentic AI, you have decision-making, task decomposition, tools that it decided to call, and what data to pass to them. So there's a lot of things that you should at least be able to trace back."

Observability for agentic systems is fundamentally different from traditional LLM observability. When a user asks "What do I have to do today?", the system must determine who is asking, which tools are relevant to their role, what their preferences are, and how to format the response.

Each user triggers a completely different dynamic workflow. Ran emphasizes the need for multi-layered access to observability data: engineers need full debugging access with appropriate security clearances, while managers need topic-level views without personal information. The goal is building a knowledge graph of interactions that allows pattern detection and continuous improvement.
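
One way to picture the two access levels is a trace object that stores full detail but exposes a reduced view. The Python sketch below is an assumption-laden illustration, not MUXI's API: the Trace and TraceEvent classes, the field names, and the example topics are invented for this example, and it stops well short of the knowledge graph mentioned above.

```python
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    kind: str    # e.g. "decision" or "tool_call"
    topic: str   # coarse label that is safe to show in aggregate views
    detail: dict # full payload; may contain personal data
    ts: float = field(default_factory=time.time)


@dataclass
class Trace:
    user_id: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def record(self, kind: str, topic: str, **detail) -> None:
        """Append one step of the agent's workflow to the trace."""
        self.events.append(TraceEvent(kind, topic, detail))

    def engineer_view(self) -> dict:
        """Everything, for debugging. Access to this should be restricted."""
        return asdict(self)

    def manager_view(self) -> list:
        """Topic-level summary: no user id and no payloads."""
        return [{"kind": e.kind, "topic": e.topic} for e in self.events]


# Hypothetical trace for the "What do I have to do today?" request.
trace = Trace(user_id="u-123")
trace.record("decision", "daily_agenda", reasoning="user asked for today's tasks")
trace.record("tool_call", "calendar", tool="calendar.list", args={"day": "today"})
```

Here manager_view() returns only the kind and topic of each step, while engineer_view() includes the user id, the tool name, and the arguments passed to it.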

Governance as Human-AI Partnership

"Governance isn't about control—it's about keeping people in the loop so AI amplifies, not replaces, human judgment."

The most powerful reframing in this conversation is viewing governance not as red tape but as a partnership model. Some actions—like answering support tickets—can be fully automated with occasional human review. Others—like approving million-dollar financial transfers—require human confirmation before execution. The key is designing systems where AI can do the preparation work while humans retain decision authority at critical checkpoints. This mirrors how we build trust with human colleagues: through repeated successful interactions over time, gradually expanding autonomy as confidence grows.
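
The split between fully automated actions and actions that wait for a human can be sketched as a small policy table plus a gate. This Python example is only an illustration under stated assumptions: the action names, the POLICY table, and the execute function are hypothetical, and the approve callback stands in for whatever review interface a real system would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Oversight(Enum):
    AUTO = "auto"        # run immediately; humans review a sample afterwards
    CONFIRM = "confirm"  # a human must approve before anything runs


# Hypothetical policy table; the action names are illustrative.
POLICY = {
    "answer_support_ticket": Oversight.AUTO,
    "transfer_funds": Oversight.CONFIRM,
}


@dataclass
class Action:
    name: str
    amount: float = 0.0


def required_oversight(action: Action) -> Oversight:
    # Unknown actions fall back to the stricter level.
    return POLICY.get(action.name, Oversight.CONFIRM)


def execute(action: Action,
            run: Callable[[Action], None],
            approve: Callable[[Action], bool]) -> str:
    """Run the action, asking a human first when the policy requires it."""
    if required_oversight(action) is Oversight.CONFIRM and not approve(action):
        return "rejected"
    run(action)
    return "executed"
```

Defaulting unknown actions to CONFIRM is a design choice in this sketch: the AI can still prepare the work, but anything the policy has not explicitly cleared waits at a human checkpoint.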

Building Trust Through Incremental Autonomy

"Working with AI is like working with a new colleague that will back you up during your vacation. You probably don't know this person for a month. You probably know them for years. The first time you went on vacation, they had 10 calls with you, and then slowly it got to 'I'm only gonna call you if it's really urgent.'"

The path to trusting AI systems mirrors how we build trust with human colleagues. You don't immediately hand over complete control—you start with frequent check-ins, observe performance, and gradually expand autonomy as confidence builds. This means starting with heavy human-in-the-loop interaction and systematically reducing oversight as the system proves reliable. The goal is reaching a state where you can confidently say "you don't have to ask permission before you do X, but I still want to approve every Y."
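
Gradually reducing oversight can be made explicit in code. The Python sketch below is one possible reading of the idea, not something described in the episode: the AutonomyLedger class, the promote_after threshold, and the rule of resetting on any correction are all assumptions chosen for illustration.

```python
from collections import defaultdict


class AutonomyLedger:
    """Tracks human reviews per action type and decides when to stop asking."""

    def __init__(self, promote_after: int = 20, always_confirm=()):
        self.promote_after = promote_after        # hypothetical threshold
        self.always_confirm = set(always_confirm) # the "every Y" category
        self.streak = defaultdict(int)            # consecutive clean approvals

    def record_review(self, action: str, approved_unchanged: bool) -> None:
        # Any rejection or correction resets the streak to zero.
        if approved_unchanged:
            self.streak[action] += 1
        else:
            self.streak[action] = 0

    def needs_approval(self, action: str) -> bool:
        if action in self.always_confirm:
            return True
        return self.streak[action] < self.promote_after
```

With promote_after=3 and always_confirm={"transfer_funds"}, an action such as "draft_reply" stops requiring approval after three consecutive clean reviews and requires it again after a single correction, while "transfer_funds" requires approval no matter how long its streak is.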

In this episode, we refer to Thinking in Systems by Donella Meadows, Designing Machine Learning Systems by Chip Huyen, and Build a Large Language Model (From Scratch) by Sebastian Raschka.

About Ran Aroussi

Ran Aroussi is the founder of MUXI, an open framework for production-ready AI agents. He is also the co-creator of yfinance (with 10 million downloads monthly) and founder of Tradologics and Automaze. Ran is the author of the forthcoming book Production-Grade Agentic AI: From Brittle Workflows to Deployable Autonomous Systems, also available at productionaibook.com.

You can connect with Ran Aroussi on LinkedIn.
