DoK Community #42 Spark on Kubernetes is Now Generally Available: Why & How to Migrate to It // Jean-Yves Stephan

DoK Community #42 Spark on Kubernetes is Now Generally Available: Why & How to Migrate to It // Jean-Yves Stephan

Abstract of the talk…

Apache Spark natively runs on top of Kubernetes (instead of Hadoop YARN) since 2018, but it's only since Spark 3.1 (released in March 2021) that the integration is now officially generally available & production-ready. What is the high-level architecture of Spark on Kubernetes, how does it compare to alternatives, what does the migration look like? These are some of the questions we will answer together. We will first introduce the core concepts, then go through the stories of customers who migrated, and then give you concrete technical tips to help you be successful with Spark (on Kubernetes). If time permits, I may do a risky live demo. This will be a technical talk with very fresh content - I hope you will like it. I plan to make it short enough to make room for Q&A and improvisations based on your request. So let me know if there's something specific you're interested in.

Bio…

I'm one of the co-founders at Data Mechanics (https://www.datamechanics.co), a Cloud-Native Spark Platform for Data Engineers. We're a YCombinator backed startup. We strive to finally make Apache Spark as developer friendly and cost-effective as it should be.. by automating the infrastructure management side (autoscaling, automated sizing of containers, autotuning of Spark configurations) and building intuitive dashboards to help monitor your data pipelines. Prior to Data Mechanics, I was a software engineer at Databricks, where I led their Spark infrastructure team.

Jaksot(243)

Dok Talks #151 - Analytics with Apache Superset and ClickHouse // Vijay Anand Ramakrishnan

Dok Talks #151 - Analytics with Apache Superset and ClickHouse // Vijay Anand Ramakrishnan

https://go.dok.community/slack https://dok.community With: Vijay Anand Ramakrishnan - Database Administrator, ChistaDATA Bart Farrell - Head of Community, Data on Kubernetes Community ABSTRACT OF TH...

23 Syys 202233min

Dok Talks #150 - Building a Simple Postgres Async Streaming Cluster // Julian Fischer

Dok Talks #150 - Building a Simple Postgres Async Streaming Cluster // Julian Fischer

https://go.dok.community/slack https://dok.community With: Julian Fischer - CEO, anynines GmbH Bart Farrell - Head of Community, Data on Kubernetes Community ABSTRACT OF THE TALK In this talk you wi...

23 Syys 20221h 4min

DoK Talks #149 - Overcoming challenges with protecting and migrating data in multi-cloud K8s environments // Sebastian Glab & Martin Phan

DoK Talks #149 - Overcoming challenges with protecting and migrating data in multi-cloud K8s environments // Sebastian Glab & Martin Phan

https://go.dok.community/slack https://dok.community/ With: Sebastian Glab - Cloud Architect, CloudCasa by Catalogic Martin Phan - Field CTO – Americas, CloudCasa by Catalogic Bart Farrell - Head...

16 Syys 202247min

DoK Talks #147 - Evaluating Cloud Native Storage Vendors // Dinesh Majrekar

DoK Talks #147 - Evaluating Cloud Native Storage Vendors // Dinesh Majrekar

https://go.dok.community/slack https://dok.community/ With: Dinesh Majrekar - CTO, Civo Bart Farrell - Head of Community, Data on Kubernetes Community ABSTRACT OF THE TALK In a continuation of ...

5 Syys 20221h

Dok Talks #146 - OpenFeature - Making feature flags a commodity // Oleg Nenashev

Dok Talks #146 - OpenFeature - Making feature flags a commodity // Oleg Nenashev

https://go.dok.community/slack https://dok.community/ With: Oleg Nenashev - Community Builder and Developer Advocate, Dynatrace Bart Farrell - Head of Community, Data on Kubernetes Community AB...

26 Elo 20221h 1min

DoK Talks #145 - Making Hard Things Easy is Hard // Kurt Rinehart

DoK Talks #145 - Making Hard Things Easy is Hard // Kurt Rinehart

https://go.dok.community/slack https://dok.community/ https://youtu.be/6eSWOUzCb4w With: Kurt Rinehart - Director of Information Engineering, Section Bart Farrell - Head of Community, Data on Kube...

19 Elo 202257min

DoK Talks #144 - We will Dok You! - The journey to adopt stateful workloads on k8s // Guy Menahem

DoK Talks #144 - We will Dok You! - The journey to adopt stateful workloads on k8s // Guy Menahem

https://go.dok.community/slack https://dok.community/ https://youtu.be/AjvwG53yLMY With: Guy Menahem - Solution Architect, Komodor Bart Farrell - Head of Community, Data on Kubernetes Community A...

18 Elo 20221h 6min

DoK Talks #142 - Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your Stateful Workload // Peter Schuurman

DoK Talks #142 - Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your Stateful Workload // Peter Schuurman

https://go.dok.community/slack https://dok.community/ ABSTRACT OF THE TALK How do you make sure your Stateful Workloads remain available when your Kubernetes infrastructure updates? This talk wil...

18 Elo 202258min