#37 DoK Community: Running Data Replication Pipelines on Kubernetes with Argo // Stephen Bailey

Abstract of the talk…

Hundreds of data teams have migrated to the ELT pattern in recent years, leveraging SaaS tools like Stitch or Fivetran to reliably load data into their infrastructure. These SaaS offerings are outstanding and can significantly shorten your time to production. However, many teams prefer to roll their own tools. One solution in these cases is to deploy singer.io taps and targets: Python scripts that perform data replication between arbitrary sources and destinations. The Singer specification is the foundation for the popular Stitch SaaS, and it is also used by a number of independent consultants and data projects. Singer pipelines are highly modular. You can pipe any tap to any target to build a data pipeline that fits your needs, which makes them a good fit for containerized workflows. This article walks through the workflow at a high level and provides some example code to get up and running with some shared templates. I also drill into the reasons for choosing the Argo approach over other orchestration tools like Airflow or Dagster, and the implications from a team perspective.
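To make the "pipe any tap to any target" idea concrete, here is a minimal sketch in Python. It is not code from the talk: `toy_tap`, `toy_target`, the `users` stream, and the `last_id` bookmark are hypothetical names chosen for illustration. It assumes only what the Singer specification describes, namely that a tap writes JSON messages of type SCHEMA, RECORD, and STATE, one per line, and that a target reads them. In a real deployment the two sides would be separate processes joined by a shell pipe; here they are joined in-process so the example stays self-contained.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def toy_tap(rows: list[dict]) -> Iterator[str]:
    """Hypothetical tap: emit Singer-style messages, one JSON object per line."""
    # SCHEMA describes the stream before any of its records are sent.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {"properties": {"id": {"type": "integer"},
                                  "name": {"type": "string"}}},
        "key_properties": ["id"],
    })
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": "users", "record": row})
    # STATE carries a bookmark so a later run could resume from this point.
    last_id = rows[-1]["id"] if rows else None
    yield json.dumps({"type": "STATE", "value": {"users": {"last_id": last_id}}})


def toy_target(lines: Iterable[str]) -> dict:
    """Hypothetical target: collect records per stream and keep the last state."""
    loaded: dict[str, list[dict]] = {}
    state = None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            loaded.setdefault(message["stream"], [])
        elif message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return {"loaded": loaded, "state": state}


# "Piping" the tap into the target: the target only sees lines of JSON,
# so any tap that speaks the same message format could be swapped in.
result = toy_target(toy_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]))
```

Because the only contract between the two sides is the line-oriented message stream, each side can live in its own container image, which is what makes the pattern a natural fit for a container-based orchestrator.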

Bio…

Stephen Bailey is Director of Growth Analytics at Immuta, where he strives to implement privacy best practices while delivering business value from data. He loves to teach and learn, on just about any subject. He holds a PhD in educational cognitive neuroscience from Vanderbilt and enjoys reading philosophy.

Episodes (243)

What's New in Kubernetes Storage (DoK Day EU 2022) // Xing Yang

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Kubernetes SIG Storage is responsible for ensuring storage is available for containers in...

28 May 2022, 9 min

What we've learned from running a PostgreSQL managed service on Kubernetes (DoK Day EU 2022) // Oleksii Kliukin

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Kubernetes is an emerging platform of choice for deploying and running PostgreSQL. Deplo...

28 May 2022, 11 min

Weathering The Cloud Storm- Modern Data Management Patterns for Reliability and Availability (DoK Day EU 2022) // Denis Magda

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) “Zero downtime” and “always-on” are illusions. All systems fail sooner or later, whether ...

28 May 2022, 10 min

Using Kubernetes to deliver a “serverless” service (DoK Day EU 2022) // Jim Walker

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Serverless promises to change the way we consume software. It allows us to potentially pa...

28 May 2022, 20 min

The many uses of Kubernetes cross cluster migration of persistent data (DoK Day EU 2022) // Ryan Kaw

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Multiple clusters exist in most Kubernetes environments today, and the number of clusters wil...

28 May 2022, 7 min

The future of data on Kubernetes with Adobe and CNCF (DoK Day EU 2022) // Joseph Sandoval, Xing Yang & Sylvain Kalache

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Some data-intensive workloads are easier to run in Kubernetes than others. Why? What need...

28 May 2022, 17 min

The Data on Kubernetes Landscape (DoK Day EU 2022) // Melissa Logan & Sylvain Kalache

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) We know from the first Data on Kubernetes Report that 90% of respondents believe Kubernet...

27 May 2022, 10 min

Testing the Mettle- Evaluating data solutions for large-scale production to check who stacks up (DoK Day EU 2022) // Dinesh Majrekar

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) The state of the CNCF Storage options has exploded in the past few years, but if you had ...

27 May 2022, 9 min