When AI Decisions Go Wrong at Scale—And How to Prevent It With Ran Aroussi

We've spent years asking what AI can do. But the next frontier isn't more capability—it's something far less glamorous and far more dangerous if we get it wrong. In this episode, Ran Aroussi shares why observability, transparency, and governance may be the difference between AI that empowers humans and AI that quietly drifts out of alignment.

The Gap Between Demos and Deployable Systems

"I've watched well-designed agents make perfectly reasonable decisions based on their training, but in a context where the decision was catastrophically wrong. And there was really no way of knowing what had happened until the damage was already done."

Ran's journey from building algorithmic trading systems to creating MUXI, an open framework for production-ready AI agents, revealed a fundamental truth: the skills needed to build impressive AI demos are completely different from those needed to deploy reliable systems at scale. Coming from the AdTech space, where he handled billions of ad impressions daily and over a million concurrent users, Ran brings a perspective shaped by real-world production demands.

The moment of realization came when he saw that the non-deterministic nature of AI means traditional software engineering approaches simply don't apply. While traditional bugs are reproducible, AI systems can produce different results from identical inputs—and that changes everything about how we need to approach deployment.

Why Leaders Misunderstand Production AI

"When you chat with ChatGPT, you go there and it pretty much works all the time for you. But when you deploy a system in production, you have users with unimaginable different use cases, different problems, and different ways of phrasing themselves."

The biggest misconception leaders have is assuming that because AI works well in their personal testing, it will work equally well at scale. When you test AI with your own biases and limited imagination for scenarios, you're essentially seeing a curated experience.

Real users bring infinite variation: non-native English speakers constructing sentences differently, unexpected use cases, and edge cases no one anticipated. The input space for AI systems is practically infinite because it's language-based, making comprehensive testing impossible.

Multi-Layered Protection for Production AI

"You have to put in deterministic filters between the AI and what you get back to the user."

Ran outlines a comprehensive approach to protecting AI systems in production:

  • Model version locking: Just as you wouldn't randomly upgrade Python versions without testing, lock your AI model versions to ensure consistent behavior

  • Guardrails in prompts: Set clear boundaries about what the AI should never do or share

  • Deterministic filters: Language firewalls that catch personal information, harmful content, or unexpected outputs before they reach users

  • Comprehensive logging: Detailed traces of every decision, tool call, and data flow for debugging and pattern detection

The key insight is that these layers must work together—no single approach provides sufficient protection for production systems.
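As an illustration of the deterministic-filter layer, here is a minimal sketch in Python. The patterns, blocked topics, and function name are hypothetical examples for this article, not part of MUXI:

```python
import re

# Hypothetical deterministic "language firewall": fixed regex and keyword
# rules that run on every model response before it reaches the user.
# The patterns and blocked topics below are illustrative assumptions.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d[\d\s().-]{7,}\d\b"),
}
BLOCKED_TOPICS = ("internal_api_key", "salary data")

def filter_response(text: str) -> str:
    """Refuse blocked topics and redact PII; deterministic and testable."""
    for topic in BLOCKED_TOPICS:
        if topic in text.lower():
            return "Sorry, I can't share that information."
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text
```

Because this layer is deterministic, it can be unit-tested exhaustively even though the model upstream cannot—identical inputs always yield identical outputs.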

Observability in Agentic Workflows

"With agentic AI, you have decision-making, task decomposition, tools that it decided to call, and what data to pass to them. So there's a lot of things that you should at least be able to trace back."

Observability for agentic systems is fundamentally different from traditional LLM observability. When a user asks "What do I have to do today?", the system must determine who is asking, which tools are relevant to their role, what their preferences are, and how to format the response.

Each user triggers a completely different dynamic workflow. Ran emphasizes the need for multi-layered access to observability data: engineers need full debugging access with appropriate security clearances, while managers need topic-level views without personal information. The goal is building a knowledge graph of interactions that allows pattern detection and continuous improvement.
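A minimal sketch of what such per-decision tracing could look like, assuming a homegrown recorder (the class and field names here are hypothetical, not MUXI's API):

```python
import json
import time
import uuid

# Hypothetical trace recorder: every decision, tool call, and data flow
# is appended as a structured event so a dynamic workflow can be
# reconstructed after the fact.
class AgentTrace:
    def __init__(self, user_id: str):
        self.trace_id = str(uuid.uuid4())
        self.user_id = user_id
        self.events = []

    def record(self, kind: str, **detail):
        """Append one event: kind is e.g. 'decision', 'tool_call', 'output'."""
        self.events.append({
            "trace_id": self.trace_id,
            "user_id": self.user_id,
            "ts": time.time(),
            "kind": kind,
            **detail,
        })

    def redacted_view(self):
        """Topic-level view for managers: same events, no user identifiers."""
        return [{k: v for k, v in e.items() if k != "user_id"}
                for e in self.events]

    def to_jsonl(self) -> str:
        """One JSON object per line, ready for a log pipeline."""
        return "\n".join(json.dumps(e) for e in self.events)

trace = AgentTrace(user_id="u-123")
trace.record("decision", chosen_tool="calendar", reason="user asked about today")
trace.record("tool_call", tool="calendar", args={"date": "today"})
```

The `redacted_view` method illustrates the multi-layered access idea: engineers read the full events, while managers see the same workflow with personal identifiers stripped.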

Governance as Human-AI Partnership

"Governance isn't about control—it's about keeping people in the loop so AI amplifies, not replaces, human judgment."

The most powerful reframing in this conversation is viewing governance not as red tape but as a partnership model. Some actions—like answering support tickets—can be fully automated with occasional human review. Others—like approving million-dollar financial transfers—require human confirmation before execution. The key is designing systems where AI can do the preparation work while humans retain decision authority at critical checkpoints. This mirrors how we build trust with human colleagues: through repeated successful interactions over time, gradually expanding autonomy as confidence grows.
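One way to encode such checkpoints is a policy table mapping action types to oversight levels. This is an illustrative sketch under assumed action names, not a documented MUXI feature:

```python
# Hypothetical policy table: "auto" runs immediately, "review" runs and is
# logged for later human review, "approve" blocks until a human confirms.
POLICY = {
    "answer_support_ticket": "review",
    "send_newsletter": "auto",
    "transfer_funds": "approve",
}

def execute(action: str, payload: dict, human_approve=None) -> str:
    """Run an action under its oversight level; unknown actions default
    to the safest mode (human approval required)."""
    mode = POLICY.get(action, "approve")
    if mode == "approve":
        if human_approve is None or not human_approve(action, payload):
            return "blocked: awaiting human approval"
    return f"executed {action}"
```

The AI can still do all the preparation work (drafting the transfer, filling the payload), but the human retains decision authority at the checkpoint the policy defines.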

Building Trust Through Incremental Autonomy

"Working with AI is like working with a new colleague that will back you up during your vacation. You probably don't know this person for a month. You probably know them for years. The first time you went on vacation, they had 10 calls with you, and then slowly it got to 'I'm only gonna call you if it's really urgent.'"

The path to trusting AI systems mirrors how we build trust with human colleagues. You don't immediately hand over complete control—you start with frequent check-ins, observe performance, and gradually expand autonomy as confidence builds. This means starting with heavy human-in-the-loop interaction and systematically reducing oversight as the system proves reliable. The goal is reaching a state where you can confidently say "you don't have to ask permission before you do X, but I still want to approve every Y."
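That gradual expansion of autonomy can be sketched as a simple trust ladder; the rung names and promotion threshold here are illustrative assumptions, not a prescribed formula:

```python
# Hypothetical trust ladder: an action type is promoted to more autonomy
# after a streak of human-confirmed successes, and demoted to full
# supervision on any failure.
LADDER = ["approve", "review", "auto"]
PROMOTE_AFTER = 20  # consecutive successes needed to move up one rung

class TrustTracker:
    def __init__(self):
        self.level = 0   # index into LADDER; starts fully supervised
        self.streak = 0

    def record(self, success: bool):
        if success:
            self.streak += 1
            if self.streak >= PROMOTE_AFTER and self.level < len(LADDER) - 1:
                self.level += 1
                self.streak = 0
        else:
            # Any failure drops the action back to full human oversight.
            self.level = 0
            self.streak = 0

    @property
    def mode(self) -> str:
        return LADDER[self.level]
```

The asymmetry is deliberate: trust is earned slowly through repeated success but lost instantly on failure, mirroring how we extend autonomy to a new colleague.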

In this episode, we refer to Thinking in Systems by Donella Meadows, Designing Machine Learning Systems by Chip Huyen, and Build a Large Language Model (From Scratch) by Sebastian Raschka.

About Ran Aroussi

Ran Aroussi is the founder of MUXI, an open framework for production-ready AI agents. He is also the co-creator of yfinance (over 10 million downloads monthly) and the founder of Tradologics and Automaze. Ran is the author of the forthcoming book Production-Grade Agentic AI: From Brittle Workflows to Deployable Autonomous Systems, available at productionaibook.com.

You can connect with Ran Aroussi on LinkedIn.
