Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal

https://go.dok.community/slack

https://dok.community/

From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)


Many ML pipelines depend on shared filesystems for input, output and intermediate data storage. Standards such as CSI have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet data scientists still have to deal with low-level details of data access in order to execute their pipelines in Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a custom resource (defined through a CustomResourceDefinition) that represents a source of data. Datashim takes care of the details of data access, while Kubernetes pods access the data declaratively by referencing a Dataset in their specifications. This talk will describe Datashim and the Dataset object, discuss its use in ML pipelines, and demonstrate how its pluggable architecture is designed for the development of caching, scheduling and governance plugins. Datashim is an incubating project of the LF AI & Data Foundation.
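
To make "referencing a Dataset in their specifications" concrete, here is a minimal Python sketch that builds the two Kubernetes manifests involved as plain dictionaries. It assumes the Dataset schema (`spec.local` with `type: "COS"`) and the `dataset.0.id` / `dataset.0.useas` pod labels described in the Datashim README around 2022; the API group `com.ie.ibm.hpsys/v1alpha1` and field names may differ in newer releases, so check the current docs. The function names, dataset name, bucket, endpoint and credential placeholders are all hypothetical.

```python
import json

# Assumption: API group/version as used in the Datashim README circa 2022.
DATASET_API_VERSION = "com.ie.ibm.hpsys/v1alpha1"


def dataset_manifest(name: str, bucket: str, endpoint: str) -> dict:
    """Build a (hypothetical) Dataset custom resource for an S3-compatible bucket."""
    return {
        "apiVersion": DATASET_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",  # S3-compatible object storage
                "bucket": bucket,
                "endpoint": endpoint,
                # Placeholder credentials; never hard-code real keys.
                "accessKeyID": "<ACCESS_KEY_ID>",
                "secretAccessKey": "<SECRET_ACCESS_KEY>",
            }
        },
    }


def pod_manifest(pod_name: str, dataset_name: str, image: str = "nginx") -> dict:
    """Build a Pod that asks Datashim, via labels, to mount a Dataset."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "labels": {
                "dataset.0.id": dataset_name,  # which Dataset to attach
                "dataset.0.useas": "mount",    # expose it as a mounted volume
            },
        },
        "spec": {"containers": [{"name": pod_name, "image": image}]},
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so each document can be piped to `kubectl apply -f -`.
    print(json.dumps(dataset_manifest("example-dataset", "my-bucket", "https://s3.example.com"), indent=2))
    print(json.dumps(pod_manifest("reader", "example-dataset"), indent=2))
```

The point of the design is that the Pod never mentions buckets, endpoints or credentials: it only names the Dataset, and Datashim resolves the rest. As we understand the README, Datashim also creates a PersistentVolumeClaim named after the Dataset, so a pod could alternatively mount it as an ordinary PVC volume.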


Srikumar Venugopal is a Research Scientist at IBM Research Europe in Dublin, Ireland. His research interests lie in the area of cloud computing and large-scale distributed systems, specifically in the topics of middleware, resource management, and scalability. He is the co-founder and current lead for the Datashim project.

Episodes (243)

Dok Talks #123 - Can Data Become a Declarative Resource? // Roey Libfeld, Michael Greenberg & Uri Zaidenwerg

Most K8s users find stateful K8s deployments challenging, to say the least, when persistent data is involved the declarative, ...

17 March 2022, 1h 7min

Dok Talks #122 - Operationalizing a Data Infrastructure Stack on Kubernetes

Kubernetes is massively powerful, but there are still a large number of details that are needed to get right before really lev...

16 March 2022, 36min

Dok Student Sessions - Contributing to Cloud Native Glossary // Kunal Verma

In this session, we'll be talking about a new open source project in the CNCF community i.e. the Cloud Native Glossary. The ma...

16 March 2022, 38min

Dok Talks #121 - Running Stateful Apps in Kubernetes Made Simple // Steve Buchanan

Eventually the time will come to run a stateful app in Kubernetes. This can be a scary thing adding more moving parts to a Kub...

11 March 2022, 1h

Dok Talks #120 - A Gentle Introduction to Building Data Intensive Applications // Joe Karlsson

We all know that data intensive applications have had explosive growth in the past decade. Data now drives significant portion...

9 March 2022, 1h 1min

Dok Talks #119 - Cloud-Native Data Pipelines // Hakan Lofcali

This talk walks you through our stack, architecture, and processes. We develop tools to deploy and run data-driven application...

4 March 2022, 53min

Dok Talks #118 - Troubleshooting ClickHouse Performance // Shiv Lyer

This talk is about how I use several tools, technologies and processes to troubleshoot ClickHouse Performance. I will be talkin...

2 March 2022, 1h 2min

Dok Talks #117 - Why you should care about data mesh // Luke Feeney

Data mesh is a new approach for designing modern data architectures by embracing organizational constructs as well as technolo...

24 February 2022, 58min