Apache Beam with Kenneth Knowles and Pablo Estrada

On the podcast this week, your hosts Stephanie Wong and Mark Mirchandani talk about the data processing tool Apache Beam with guests Pablo Estrada and Kenneth Knowles.

Kenn starts us off with an overview of how Apache Beam began and how Cloud Dataflow was involved. Beam's unified model for batch and stream processing, along with its emphasis on correctness, won early support from developers and continues to attract users. Pablo explains why Beam is a better option for certain projects that need to process large amounts of data. Our guests also describe how Beam can be a better fit than custom microservices, which may become obsolete as a company's needs change.

Next, we step back to look at why combining batch and stream processing is the gold standard of data processing: it balances low latency with the ease of knowing when you are "done" collecting data. Correctness, both of the data itself and of how that data is processed, is a core part of Beam. With good data, processing becomes easier, more reliable, and cheaper. Kenn gives examples of how things can go wrong when data is processed badly. Beam aims for the right combination of low latency, correctness, and affordability. Users can choose where to run Beam pipelines, from other Apache projects such as Flink, Spark, and Samza to Dataflow, which gives them a lot of flexibility. Our guests discuss the pros and cons of some of these options, and we hear examples of how companies use Beam alongside supporting software to solve data processing challenges.

To get started with Beam, check out Beam College or attend Beam Summit 2022.

Kenneth Knowles

Kenn Knowles is chair of the Apache Beam Project Management Committee. Kenn has been working on Google Cloud Dataflow, Google's Beam backend, since 2014. Kenn holds a PhD in programming languages from the University of California, Santa Cruz.

Pablo Estrada

Pablo is a Software Engineer at Google and a member of the Apache Beam Project Management Committee. Pablo enjoys working on open source and has worked across the entire Apache Beam stack.

Cool things of the week
  • Under the sea: Building the world's fiber optic internet video
  • Google Data Cloud Summit site
  • It's official—Google Distributed Cloud Edge is generally available blog
    • GCP Podcast Episode 228: Fastly with Tyler McMullen podcast
  • Save big by temporarily suspending unneeded Compute Engine VMs—now GA blog
Interview
  • Apache Beam site
  • Apache Beam Documentation site
  • Dataflow site
  • Apache Flink site
  • Apache Spark site
  • Apache Samza site
  • Apache Nemo site
  • Spanner site
  • BigQuery site
  • Beam College site
  • Beam College on Github site
  • Beam Developer Mailing List email
  • Beam User Mailing List email
  • Beam Summit site
What's something cool you're working on?

Mark is working on a new Apache Beam video series, Getting Started With Apache Beam.

Hosts

Stephanie Wong and Mark Mirchandani

6 Apr 2022 · 39 min