Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal

https://go.dok.community/slack

https://dok.community/

From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)


Many ML pipelines depend on shared filesystems for input, output and intermediate data storage. Standards such as the Container Storage Interface (CSI) have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet data scientists still have to deal with low-level details of data access in order to run their pipelines on Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a custom resource (defined through a CustomResourceDefinition) that represents a source of data. Datashim takes care of the details of data access, while Kubernetes pods access the data declaratively by referencing a Dataset in their specifications. This talk will describe Datashim and the Dataset object, discuss its use in ML pipelines, and demonstrate how its pluggable architecture is designed for the development of caching, scheduling and governance plugins. Datashim is an incubating project of the LF AI & Data Foundation.
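
As a rough illustration of the declarative access described above, the Python sketch below builds the two manifests involved: a Dataset pointing at an S3-compatible bucket, and a Pod that refers to that Dataset only by name. The field names (`spec.local.type`, `endpoint`, `bucket`, `readonly`) and the `dataset.<i>.id` / `dataset.<i>.useas` pod labels follow the examples in the Datashim README as we understand them, and the `apiVersion` has changed between releases, so treat all of them as assumptions to check against the installed CRD. The helpers `dataset_manifest` and `pod_manifest` are hypothetical and not part of Datashim.

```python
import json


def dataset_manifest(name, endpoint, bucket, access_key_id, secret_access_key,
                     readonly=True):
    """Hypothetical helper: a Dataset custom resource for an S3-compatible bucket.

    Field names are assumed from the Datashim README; older releases used the
    apiVersion com.ie.ibm.hpsys/v1alpha1 instead of datashim.io/v1alpha1.
    """
    return {
        "apiVersion": "datashim.io/v1alpha1",  # assumption: check your installed CRD
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",  # S3-compatible object storage
                "endpoint": endpoint,
                "bucket": bucket,
                "accessKeyID": access_key_id,
                "secretAccessKey": secret_access_key,
                # String-valued flag, as in the README example
                "readonly": "true" if readonly else "false",
            }
        },
    }


def pod_manifest(pod_name, image, dataset_names, use_as="mount"):
    """Hypothetical helper: a Pod that references Datasets only via labels.

    Assumed label convention: dataset.<i>.id names the Dataset and
    dataset.<i>.useas says how to expose it (here, as a mount).
    """
    labels = {}
    for i, ds_name in enumerate(dataset_names):
        labels[f"dataset.{i}.id"] = ds_name
        labels[f"dataset.{i}.useas"] = use_as
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "labels": labels},
        # No volumes or credentials here: attaching the data is left to Datashim.
        "spec": {"containers": [{"name": pod_name, "image": image}]},
    }


if __name__ == "__main__":
    dataset = dataset_manifest(
        "example-dataset",
        endpoint="https://s3.example.com",
        bucket="training-data",
        access_key_id="<access-key-id>",
        secret_access_key="<secret-access-key>",
    )
    pod = pod_manifest("trainer", "python:3.11", ["example-dataset"])
    # Wrap both objects in a v1 List so they are emitted as one JSON document.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [dataset, pod]}, indent=2))
```

The point of the design is that the Pod carries no credentials or volume definitions; Datashim's controllers are expected to resolve the labels and attach the data, which is what keeps storage details out of the pipeline author's hands. With the Datashim CRDs installed, the printed List should be accepted by `kubectl apply -f -`.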


Srikumar Venugopal is a Research Scientist at IBM Research Europe in Dublin, Ireland. His research interests lie in cloud computing and large-scale distributed systems, specifically middleware, resource management, and scalability. He is the co-founder and current lead of the Datashim project.

Episodes (243)

Tech with project RapGOD (DoK Day EU 2022) // Abhijith Ganesh

The Rap God project acts as a great entry point to many incoming open-source enthusiasts ...

27 May 2022, 8 min

Serverless Event Streaming Applications as Functions on K8 (DoK Day EU 2022) // Timothy Spann

We will walk through how to build serverless event streaming applications as functions ru...

27 May 2022, 8 min

Running Kafka on Kubernetes, across three clouds at Adobe (DoK Day EU 2022) // Adi Muraru

Adobe runs dozens of Kafka clusters spread across both public (AWS and Azure) and private...

27 May 2022, 16 min

Running a database on local NVMes on Kubernetes (DoK Day EU 2022) // Tomáš Nožička & Maciej Zimnoch

Running a database on Kubernetes with persistent storage is relatively easy but when it c...

27 May 2022, 9 min

Resilient Redis (DoK Day EU 2022) // Hrittik Roy & Ryan Gray

Redis is a widely used open-source in-memory data store and cache that has become a key c...

27 May 2022, 7 min

PV TrashCan - Protection against accidental deletion of PVs or Namespaces (DoK Day EU 2022) // Veda Talakad, Aditya Kulkarni & Aditya Dani

Accidental deletion of a PVC or namespace can cause the Persistent Volume to get deleted....

27 May 2022, 11 min

Protecting data with CSI Volume Snapshots on Kubernetes (DoK Day EU 2022) // Grant Griffiths

The container storage interface (CSI) is a contract between different container orchestra...

27 May 2022, 11 min

Operator Lifecycle Management (DoK Day EU 2022) // Julian Fischer

The ability to extend Kubernetes with Custom Resource Definitions and respective controll...

27 May 2022, 15 min