Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal

https://go.dok.community/slack

https://dok.community/

From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)


Many ML pipelines depend on shared filesystems for input, output and intermediate data storage. Standards such as CSI have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet data scientists still have to deal with low-level details of data access in order to execute their pipelines in Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a CustomResourceDefinition that represents a source of data. Datashim takes care of the details of data access, while Kubernetes pods access the data declaratively by referencing a Dataset in their specifications. This talk will describe Datashim and the Dataset object, discuss its use in ML pipelines, and demonstrate how its pluggable architecture is designed for the development of caching, scheduling and governance plugins. Datashim is an incubating project of the LF AI & Data Foundation.
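To make the declarative idea in the abstract concrete, here is a minimal sketch in Python that builds two manifests: a Dataset pointing at an S3-compatible bucket, and a pod that refers to it by label. The API group (`datashim.io/v1alpha1`), the `spec.local` field names, and the `dataset.0.id` / `dataset.0.useas` labels are assumptions based on the project's public README and may differ between releases. The names `training-data`, `trainer`, the bucket, the endpoint, and the helper functions are purely illustrative.

```python
import json


def dataset_manifest(name, bucket, endpoint):
    """Build a Dataset manifest (field names assumed from the Datashim README)."""
    return {
        "apiVersion": "datashim.io/v1alpha1",  # assumed API group/version
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",  # S3-compatible object storage
                "accessKeyID": "<ACCESS_KEY_ID>",  # placeholder, not a real key
                "secretAccessKey": "<SECRET_ACCESS_KEY>",  # placeholder
                "endpoint": endpoint,
                "bucket": bucket,
            }
        },
    }


def pod_manifest(name, image, dataset_name):
    """Build a pod that references the Dataset only through labels (assumed keys)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {
                "dataset.0.id": dataset_name,  # which Dataset to attach
                "dataset.0.useas": "mount",  # expose it as a mounted volume
            },
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


if __name__ == "__main__":
    manifests = [
        dataset_manifest("training-data", "my-bucket", "https://s3.example.com"),
        pod_manifest("trainer", "python:3.11", "training-data"),
    ]
    # kubectl accepts JSON as well as YAML, so each document can be applied as-is.
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

The pod specification contains no volume, credential, or endpoint details; under the assumptions above, the framework would be responsible for injecting them when the pod is created.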


Srikumar Venugopal is a Research Scientist in IBM Research Europe in Dublin, Ireland. His research interests lie in the area of cloud computing and large-scale distributed systems, specifically in the topics of middleware, resource management, and scalability. He is the co-founder and current lead for the Datashim project.

Episodes (243)

#39 DoK Community: A fireside chat with Jérôme Petazzoni // Jérôme Petazzoni

Abstract of the talk… A fireside chat with Jérôme Petazzoni in which we will get to know him up close and personal, ask him about how his personal music projects influence his professional work, and a...

11 Apr 2021, 1h 3min

#38 DoK Community: Patterns to create stateful applications on Kubernetes // Prashant Ghildiyal

Abstract of the talk… In this talk we will discuss the best patterns for creating stateful applications on top of Kubernetes. This will include application layer caching, embeddable database as ...

8 Apr 2021, 1h 10min

Dok en español #2 ¡Suelten el Krake! Trayendo la Energía al Lazo de Cómputo // Juan A. Fraire

Abstract of the talk… ENG: Cloud&Heat has always focused on providing energy-efficient data centers. In the last 8 years, we have developed an innovative water cooling technology for servers, converti...

27 Mar 2021, 57min

#29 DoK Community: How Absa Developed Cloud Native Global Load Balancer for Kubernetes // Yury Tsarev

Abstract of the talk… Global load balancing solutions, commonly referred to as GSLB (Global Server Load Balancing), have typically been the domain of proprietary network software and hardware vendors ...

27 Mar 2021, 54min

DoK en español #1- Nuestros aprendizajes con Kubernetes // Aitor Artola, Miriam González, Raquel López Ruiz e Isidro Nistal

Our learnings from Kubernetes

27 Mar 2021, 1h 6min

#37 DoK Community: Running Data Replication Pipelines on Kubernetes with Argo // Stephen Bailey

Abstract of the talk… Hundreds of data teams have migrated to the ELT pattern in recent years, leveraging SaaS tools like Stitch or Fivetran to reliably load data into their infrastructure. These SaaS...

25 Mar 2021, 1h 2min

My questions about Data on K8s // Kunal Kushwaha

Bio… Junior pursuing Computer Science & Engineering. Co-founder at Code for Cause. CNCF Intern 2020. MLH Coach. Google Summer of Code Mentor. YouTuber. Gold Microsoft Learn Student Ambassador.

21 Mar 2021, 54min

#36 DoK Community: A Snapshot of DevOps // Tiffany Jachja

Abstract of the talk… DevOps is like a camera. We focus on what's important, we capture the good times, we develop from the negatives, and if things don't work out, we take another shot. Many teams es...

20 Mar 2021, 1h 6min