Scrum Master Toolbox Podcast: Agile storytelling from the trenches

16 Feb

When AI Decisions Go Wrong at Scale—And How to Prevent It With Ran Aroussi

BONUS: When AI Decisions Go Wrong at Scale—And How to Prevent It

We've spent years asking what AI can do. But the next frontier isn't more capability—it's something far less glamorous and far more dangerous if we get it wrong. In this episode, Ran Aroussi shares why observability, transparency, and governance may be the difference between AI that empowers humans and AI that quietly drifts out of alignment.

The Gap Between Demos and Deployable Systems

"I've noticed that I watched well-designed agents make perfectly reasonable decisions based on their training, but in a context where the decision was catastrophically wrong. And there was really no way of knowing what had happened until the damage was already there."

Ran's journey from building algorithmic trading systems to creating MUXI, an open framework for production-ready AI agents, revealed a fundamental truth: the skills needed to build impressive AI demos are completely different from those needed to deploy reliable systems at scale. Coming from the AdTech space, where he handled billions of ad impressions daily and over a million concurrent users, Ran brings a perspective shaped by real-world production demands.

The moment of realization came when he saw that the non-deterministic nature of AI means traditional software engineering approaches do not carry over directly. Traditional bugs are usually reproducible, whereas an AI system can produce different results from identical inputs, so a failure seen once in production may not show up again when you try to debug it. That changes how deployment has to be approached.

Why Leaders Misunderstand Production AI

"When you chat with ChatGPT, you go there and it pretty much works all the time for you. But when you deploy a system in production, you have users with unimaginable different use cases, different problems, and different ways of phrasing themselves."

The biggest misconception leaders have is assuming that because AI works well in their personal testing, it will work equally well at scale. When you test AI with your own biases and limited imagination for scenarios, you're essentially seeing a curated experience.

Real users bring infinite variation: non-native English speakers constructing sentences differently, unexpected use cases, and edge cases no one anticipated. The input space for AI systems is practically infinite because it's language-based, making comprehensive testing impossible.

Multi-Layered Protection for Production AI

"You have to put in deterministic filters between the AI and what you get back to the user."

Ran outlines a comprehensive approach to protecting AI systems in production:

Model version locking: Just as you wouldn't randomly upgrade Python versions without testing, lock your AI model versions to ensure consistent behavior
Guardrails in prompts: Set clear boundaries about what the AI should never do or share
Deterministic filters: Language firewalls that catch personal information, harmful content, or unexpected outputs before they reach users
Comprehensive logging: Detailed traces of every decision, tool call, and data flow for debugging and pattern detection

The key insight is that these layers must work together—no single approach provides sufficient protection for production systems.
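The layers above can be sketched in a few lines of Python. This is a minimal illustration, not MUXI's implementation: the model identifier `PINNED_MODEL`, the two regular expressions, and the function names `deterministic_filter` and `respond` are all assumptions made for the example, and real detection of personal information needs far broader coverage than two patterns.

```python
import json
import re
from dataclasses import dataclass, field

# Hypothetical pinned model identifier (an assumption, not a real model name).
PINNED_MODEL = "example-model-2024-01-01"

# Illustrative patterns only; production filters need much wider coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


@dataclass
class FilterResult:
    text: str
    redactions: list = field(default_factory=list)


def deterministic_filter(text: str) -> FilterResult:
    """Redact matches with plain regexes, so the same input always gives the same output."""
    redactions = []
    for label, pattern in (("email", EMAIL_RE), ("card", CARD_RE)):
        text, count = pattern.subn(f"[REDACTED {label.upper()}]", text)
        if count:
            redactions.append(label)
    return FilterResult(text, redactions)


def respond(model_output: str, model_version: str, log: list) -> str:
    """Pass model output through the version check, the filter, and the log."""
    # Layer 1: refuse to serve output from a model version that was not tested.
    if model_version != PINNED_MODEL:
        raise ValueError(f"unexpected model version: {model_version}")
    # Layer 2: deterministic filter between the model and the user.
    result = deterministic_filter(model_output)
    # Layer 3: structured log entry for later debugging and pattern detection.
    log.append(json.dumps({"model": model_version, "redactions": result.redactions}))
    return result.text
```

Calling `respond("Contact jane@example.com today", PINNED_MODEL, log)` returns the text with the address replaced by `[REDACTED EMAIL]` and appends one JSON line to `log`. Prompt-level guardrails are not shown, since they live in the prompt text rather than in code.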

Observability in Agentic Workflows

"With agentic AI, you have decision-making, task decomposition, tools that it decided to call, and what data to pass to them. So there's a lot of things that you should at least be able to trace back."

Observability for agentic systems is fundamentally different from traditional LLM observability. When a user asks "What do I have to do today?", the system must determine who is asking, which tools are relevant to their role, what their preferences are, and how to format the response.

Each user triggers a completely different dynamic workflow. Ran emphasizes the need for multi-layered access to observability data: engineers need full debugging access with appropriate security clearances, while managers need topic-level views without personal information. The goal is building a knowledge graph of interactions that allows pattern detection and continuous improvement.
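One way to picture this is a per-request trace with two views over it. The sketch below is an assumption-laden illustration: the `Trace` and `Step` classes, the `topic` field, and the tool names are hypothetical and do not come from the episode or from any particular framework.

```python
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    kind: str     # for example "decision" or "tool_call"
    topic: str    # coarse label that is safe to aggregate
    detail: dict  # full data passed around; may contain personal information


@dataclass
class Trace:
    user_id: str
    request: str
    steps: list = field(default_factory=list)

    def record(self, kind: str, topic: str, **detail) -> None:
        """Append one traceable step of the workflow."""
        self.steps.append(Step(kind, topic, detail))


def engineer_view(trace: Trace) -> dict:
    """Full trace for debugging; access is assumed to be restricted elsewhere."""
    return asdict(trace)


def manager_view(traces: list) -> dict:
    """Topic-level counts only: no user ids, requests, or step details."""
    return dict(Counter(step.topic for t in traces for step in t.steps))


# Usage: one request produces several recorded steps.
trace = Trace("user-42", "What do I have to do today?")
trace.record("decision", "calendar", reason="request mentions today")
trace.record("tool_call", "calendar", tool="list_events", args={"date": "today"})
trace.record("tool_call", "tasks", tool="list_tasks", args={})
print(manager_view([trace]))
```

The design choice here is that the aggregate view is derived from the same records as the debugging view, so both stay consistent while exposing different levels of detail.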

Governance as Human-AI Partnership

"Governance isn't about control—it's about keeping people in the loop so AI amplifies, not replaces, human judgment."

The most powerful reframing in this conversation is viewing governance not as red tape but as a partnership model. Some actions—like answering support tickets—can be fully automated with occasional human review. Others—like approving million-dollar financial transfers—require human confirmation before execution. The key is designing systems where AI can do the preparation work while humans retain decision authority at critical checkpoints. This mirrors how we build trust with human colleagues: through repeated successful interactions over time, gradually expanding autonomy as confidence grows.
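A checkpoint of this kind could look roughly like the following sketch. The action names, the `POLICY` table, and the `execute` function are hypothetical; the one deliberate choice worth noting is that unknown action types default to requiring confirmation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Oversight(Enum):
    AUTO = "auto"        # run immediately, review occasionally
    CONFIRM = "confirm"  # a human must approve before execution


# Hypothetical policy table; a real one would be owned by the business.
POLICY = {
    "answer_support_ticket": Oversight.AUTO,
    "financial_transfer": Oversight.CONFIRM,
}


@dataclass
class Action:
    kind: str
    payload: dict


def execute(
    action: Action,
    run: Callable[[Action], None],
    approve: Optional[Callable[[Action], bool]] = None,
) -> str:
    """Run the action only if policy allows it or a human approves it."""
    # Unknown action kinds fall back to the stricter level.
    level = POLICY.get(action.kind, Oversight.CONFIRM)
    if level is Oversight.CONFIRM:
        if approve is None or not approve(action):
            return "blocked: awaiting human approval"
    run(action)
    return "executed"
```

Here the AI prepares the `Action`, and the `approve` callback stands in for whatever human confirmation step the organization uses.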

Building Trust Through Incremental Autonomy

"Working with AI is like working with a new colleague that will back you up during your vacation. You probably don't know this person for a month. You probably know them for years. The first time you went on vacation, they had 10 calls with you, and then slowly it got to 'I'm only gonna call you if it's really urgent.'"

The path to trusting AI systems mirrors how we build trust with human colleagues. You don't immediately hand over complete control—you start with frequent check-ins, observe performance, and gradually expand autonomy as confidence builds. This means starting with heavy human-in-the-loop interaction and systematically reducing oversight as the system proves reliable. The goal is reaching a state where you can confidently say "you don't have to ask permission before you do X, but I still want to approve every Y."
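The gradual reduction of oversight can be made explicit with a small ledger of reviewed outcomes. This is a sketch under stated assumptions: the class name `AutonomyLedger`, the streak-based rule, and the threshold value are invented for illustration, and a real system would likely weigh more than a simple success streak.

```python
from collections import defaultdict


class AutonomyLedger:
    """Tracks consecutive successful reviews per action type."""

    def __init__(self, threshold: int = 20, always_approve=()):
        self.threshold = threshold
        # Action types that never graduate to autonomy ("approve every Y").
        self.always_approve = set(always_approve)
        self.streak = defaultdict(int)

    def record_review(self, kind: str, ok: bool) -> None:
        """Extend the streak on success; reset it to zero on any failure."""
        self.streak[kind] = self.streak[kind] + 1 if ok else 0

    def needs_approval(self, kind: str) -> bool:
        """True until the action type has earned autonomy."""
        if kind in self.always_approve:
            return True
        return self.streak[kind] < self.threshold


# Usage: support replies earn autonomy, transfers never do.
ledger = AutonomyLedger(threshold=3, always_approve={"financial_transfer"})
for _ in range(3):
    ledger.record_review("answer_support_ticket", ok=True)
print(ledger.needs_approval("answer_support_ticket"))
print(ledger.needs_approval("financial_transfer"))
```

Resetting the streak on a single failure is a conservative choice; it mirrors the idea that trust is rebuilt through repeated successful interactions.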

In this episode, we refer to Thinking in Systems by Donella Meadows, Designing Machine Learning Systems by Chip Huyen, and Build a Large Language Model (From Scratch) by Sebastian Raschka.

About Ran Aroussi

Ran Aroussi is the founder of MUXI, an open framework for production-ready AI agents. He is also the co-creator of yfinance (with 10 million downloads monthly) and founder of Tradologics and Automaze. Ran is the author of the forthcoming book Production-Grade Agentic AI: From Brittle Workflows to Deployable Autonomous Systems, also available at productionaibook.com.

You can connect with Ran Aroussi on LinkedIn.
