#37 DoK Community: Running Data Replication Pipelines on Kubernetes with Argo // Stephen Bailey

Abstract of the talk…

Hundreds of data teams have migrated to the ELT pattern in recent years, leveraging SaaS tools like Stitch or Fivetran to reliably load data into their infrastructure. These SaaS offerings are outstanding and can accelerate your time to production significantly. However, many teams prefer to roll their own tools. One solution in these cases is to deploy singer.io taps and targets: Python scripts that can perform data replication between arbitrary sources and destinations. The Singer specification is the foundation for the popular Stitch SaaS, and it is also leveraged by a number of independent consultants and data projects. Singer pipelines are highly modular: you can pipe any tap to any target to build a data pipeline that fits your needs, which makes them a good fit for containerized workflows. This article walks through the workflow at a high level and provides some example code to get up and running with some shared templates. I also drill into reasons for choosing the Argo approach over other orchestration tools like Airflow or Dagster, and the implications from a team perspective.
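To make the "pipe any tap to any target" idea concrete, here is a minimal sketch of the message flow. It is not code from the talk. It assumes only the three message types defined by the Singer specification (SCHEMA, RECORD, STATE), each written as one JSON document per line. The names `tap_users`, `target_memory`, and `run_pipeline`, the `users` stream, and the `last_id` bookmark are hypothetical and exist only for illustration. A real deployment would run the tap and target as separate processes joined by a shell pipe; here a Python generator stands in for that pipe.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional


def tap_users(rows: Iterable[dict]) -> Iterator[str]:
    """Hypothetical tap: yields Singer-style messages, one JSON document per line."""
    # SCHEMA describes the stream before any of its records are sent.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            }
        },
        "key_properties": ["id"],
    })
    last_id: Optional[int] = None
    for row in rows:
        # RECORD carries one row of data for the named stream.
        yield json.dumps({"type": "RECORD", "stream": "users", "record": row})
        last_id = row["id"]
    # STATE is a bookmark that a later run could use to resume (assumed shape).
    yield json.dumps({"type": "STATE", "value": {"users": {"last_id": last_id}}})


def target_memory(lines: Iterable[str]) -> Dict[str, object]:
    """Hypothetical target: collects records per stream and keeps the last state."""
    records: Dict[str, List[dict]] = {}
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            records.setdefault(message["stream"], [])
        elif kind == "RECORD":
            records.setdefault(message["stream"], []).append(message["record"])
        elif kind == "STATE":
            state = message["value"]
    return {"records": records, "state": state}


def run_pipeline(rows: Iterable[dict]) -> Dict[str, object]:
    """Stand-in for the shell pipe `tap | target`."""
    return target_memory(tap_users(rows))


result = run_pipeline([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}])
print(result["state"])
```

Because the tap and target only agree on the line-oriented message format, either side can be swapped independently, which is what makes the pattern a natural fit for running each step in its own container.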

Bio…

Stephen Bailey is Director of Growth Analytics at Immuta, where he strives to implement privacy best practices while delivering business value from data. He loves to teach and learn, on just about any subject. He holds a PhD in educational cognitive neuroscience from Vanderbilt and enjoys reading philosophy.

Episodes (243)

Tech with project RapGOD (DoK Day EU 2022) // Abhijith Ganesh

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) The Rap God project acts as a great entry point to many incoming open-source enthusiasts ...

27 May 2022, 8 min

Serverless Event Streaming Applications as Functions on K8 (DoK Day EU 2022) // Timothy Spann

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) We will walk through how to build serverless event streaming applications as functions ru...

27 May 2022, 8 min

Running Kafka on Kubernetes, across three clouds at Adobe (DoK Day EU 2022) // Adi Muraru

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Adobe runs dozens of Kafka clusters spread across both public (AWS and Azure) and private...

27 May 2022, 16 min

Running a database on local NVMes on Kubernetes (DoK Day EU 2022) // Tomáš Nožička & Maciej Zimnoch

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Running a database on Kubernetes with persistent storage is relatively easy but when it c...

27 May 2022, 9 min

Resilient Redis (DoK Day EU 2022) // Hrittik Roy & Ryan Gray

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Redis is a widely used open-source in-memory data store and cache that has become a key c...

27 May 2022, 7 min

PV TrashCan - Protection against accidental deletion of PVs or Namespaces (DoK Day EU 2022) // Veda Talakad, Aditya Kulkarni & Aditya Dani

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) Accidental PVC delete or namespace delete can cause the Persistent Volume to get deleted....

27 May 2022, 11 min

Protecting data with CSI Volume Snapshots on Kubernetes (DoK Day EU 2022) // Grant Griffiths

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) The container storage interface (CSI) is a contract between different container orchestra...

27 May 2022, 11 min

Operator Lifecycle Management (DoK Day EU 2022) // Julian Fischer

https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) The ability to extend Kubernetes with Custom Resource Definitions and respective controll...

27 May 2022, 15 min