DoK Community #42 Spark on Kubernetes is Now Generally Available: Why & How to Migrate to It // Jean-Yves Stephan

DoK Community #42 Spark on Kubernetes is Now Generally Available: Why & How to Migrate to It // Jean-Yves Stephan

Abstract of the talk…

Apache Spark natively runs on top of Kubernetes (instead of Hadoop YARN) since 2018, but it's only since Spark 3.1 (released in March 2021) that the integration is now officially generally available & production-ready. What is the high-level architecture of Spark on Kubernetes, how does it compare to alternatives, what does the migration look like? These are some of the questions we will answer together. We will first introduce the core concepts, then go through the stories of customers who migrated, and then give you concrete technical tips to help you be successful with Spark (on Kubernetes). If time permits, I may do a risky live demo. This will be a technical talk with very fresh content - I hope you will like it. I plan to make it short enough to make room for Q&A and improvisations based on your request. So let me know if there's something specific you're interested in.

Bio…

I'm one of the co-founders at Data Mechanics (https://www.datamechanics.co), a Cloud-Native Spark Platform for Data Engineers. We're a YCombinator backed startup. We strive to finally make Apache Spark as developer friendly and cost-effective as it should be.. by automating the infrastructure management side (autoscaling, automated sizing of containers, autotuning of Spark configurations) and building intuitive dashboards to help monitor your data pipelines. Prior to Data Mechanics, I was a software engineer at Databricks, where I led their Spark infrastructure team.

Jaksot(243)

Highly Available Postgres Clusters In Kubernetes // John Long & Jonathan Gonzalez (DoK Day North America 2022)

Highly Available Postgres Clusters In Kubernetes // John Long & Jonathan Gonzalez (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT A practical session about running Highly Available PostgreSQL in Kubernetes. The primary objective will be to demonstr...

2 Marras 202215min

Inter-Cluster PostreSQL on Kubernetes // Julian Fischer (DoK Day North America 2022)

Inter-Cluster PostreSQL on Kubernetes // Julian Fischer (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT In this talk you’ll explore how to run a PostgreSQL cluster across multiple Kubernetes clusters. Learn what challenges ar...

2 Marras 202217min

Open Source Databases on Kubernetes- Best Practices // Peter Zaitsev (DoK Day North America 2022)

Open Source Databases on Kubernetes- Best Practices // Peter Zaitsev (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT So you’re looking to run your Open Source Database on Kubernetes. What best practices should you follow and what pitfall...

2 Marras 202216min

The Kubernetes Native Database // Jeffrey Carpenter (DoK Day North America 2022)

The Kubernetes Native Database // Jeffrey Carpenter (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT In the software industry we’re fond of terms that define major trends, like “cloud native”, “Kubernetes native” and “server...

2 Marras 202216min

Databases on Kubernetes: Why are they important? // With Bhavin Shah, Xing Yang, Gabriele Bartolini & Patrick McFadin (DoK Day North America 2022)

Databases on Kubernetes: Why are they important? // With Bhavin Shah, Xing Yang, Gabriele Bartolini & Patrick McFadin (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT Kubernetes has crossed the chasm, but what about stateful applications and databases? Join us for this panel discussion and...

2 Marras 202234min

Data streaming on Kubernetes // Yaniv Ben Hemo (DoK Day North America 2022)

Data streaming on Kubernetes // Yaniv Ben Hemo (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT I will cover what is the current data streaming on k8s landscape, why it is important, use cases, and what are the challeng...

2 Marras 202213min

Architecting Your First Event Driven Serverless Streaming Applications on K8 // Timothy Spann (DoK Day North America 2022)

Architecting Your First Event Driven Serverless Streaming Applications on K8 // Timothy Spann (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT Once you have built a topic in Apache Pulsar, you will quickly see the need to build event-driven applications. This can r...

2 Marras 202213min

Fybrik - A Kubernetes based platform for governed data use // Flora Gilboa-Solomon, Alexey Roytman, Maryna Strelchuk & Barry Hijkoop (DoK Day North America 2022)

Fybrik - A Kubernetes based platform for governed data use // Flora Gilboa-Solomon, Alexey Roytman, Maryna Strelchuk & Barry Hijkoop (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT Data is the foundation for business value. However, in many enterprises, it is spread across different data stores, public...

1 Marras 202220min