Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal

https://go.dok.community/slack

https://dok.community/

From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)


Many ML pipelines depend on shared filesystems for input, output and intermediate data storage. Standards such as CSI have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet data scientists still have to deal with low-level details of data access in order to execute their pipelines in Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a custom resource (defined through a CustomResourceDefinition) that represents a data source. Datashim handles the details of data access, so Kubernetes pods can access the data declaratively by referencing a Dataset in their specifications. This talk will describe Datashim and the Dataset object, discuss its use in ML pipelines, and demonstrate how its pluggable architecture is designed for the development of caching, scheduling and governance plugins. Datashim is an incubating project of the LF AI & Data Foundation.
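A minimal sketch of what "referencing a Dataset" might look like in practice. It is written in Python and only builds the two manifests as plain dictionaries, then prints them as JSON. The field names (the `com.ie.ibm.hpsys/v1alpha1` apiVersion, the `spec.local` keys for an S3-compatible bucket, and the `dataset.0.id` / `dataset.0.useas` pod labels) are assumptions based on the Datashim README of roughly that period, not something stated in the talk description; the bucket, endpoint and pod names are hypothetical placeholders. Check the current Datashim documentation before relying on any of them.

```python
import json

# Assumption: the apiVersion used by Datashim releases around 2022;
# newer releases may use a different API group.
DATASET_API_VERSION = "com.ie.ibm.hpsys/v1alpha1"


def dataset_manifest(name: str, bucket: str, endpoint: str,
                     readonly: bool = True) -> dict:
    """Build a Dataset object pointing at an S3-compatible bucket (assumed schema)."""
    return {
        "apiVersion": DATASET_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",  # S3-compatible object storage
                # Placeholders only: supply real credentials out of band.
                "accessKeyID": "{AWS_ACCESS_KEY_ID}",
                "secretAccessKey": "{AWS_SECRET_ACCESS_KEY}",
                "endpoint": endpoint,
                "bucket": bucket,
                # The README examples pass this flag as a string.
                "readonly": "true" if readonly else "false",
            }
        },
    }


def pod_manifest(pod_name: str, dataset_name: str, index: int = 0,
                 image: str = "nginx") -> dict:
    """Build a Pod that requests a Dataset purely through labels.

    The spec declares no volume itself; Datashim is assumed to act on the
    labels and inject the mount (assumed path: /mnt/datasets/<dataset_name>).
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "labels": {
                f"dataset.{index}.id": dataset_name,
                f"dataset.{index}.useas": "mount",
            },
        },
        "spec": {"containers": [{"name": pod_name, "image": image}]},
    }


def to_kubectl_json(objects: list) -> str:
    """Wrap several objects in a v1 List so a single `kubectl apply` accepts them."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": objects},
                      indent=2)


if __name__ == "__main__":
    ds = dataset_manifest("example-dataset", "my-bucket",
                          "https://s3.example.com")
    pod = pod_manifest("reader", "example-dataset")
    print(to_kubectl_json([ds, pod]))
```

Piping the printed JSON into `kubectl apply -f -` would create both objects, assuming Datashim is already installed in the cluster. The point of the design is that the pod author names the data, not the storage backend or its credentials.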


Srikumar Venugopal is a Research Scientist at IBM Research Europe in Dublin, Ireland. His research interests lie in cloud computing and large-scale distributed systems, specifically middleware, resource management, and scalability. He is the co-founder and current lead of the Datashim project.

Episodes (243)

#28 DoK Community: Getting Started Contributing to Kubernetes // Rin Oliver & Savitha Raghunathan. (Presenter: Bart Farrell)

https://go.dok.community/slack

Abstract of the talk… This talk will walk through how to get started contributing to Kubernetes, combatting imposter syndrome, the many other ways you can get started ...

11 February 2021, 56 min

#27 DoK Community: Cost management for OpenShift, a new SaaS service to understand your Kubernetes costs // Sergio Ocón

Abstract of the talk… For IT decision-makers, this goes above and beyond just keeping infrastructure running and efficient; it is about understanding how your IT budget affects your business, and how ...

4 February 2021, 56 min

#1 DoK Community Brazil: DevOps, kubernetes and data // Rogeria Portilho (Talk in Portuguese)

Abstract of the talk… My experience in this contemporary technology journey of the last 4 years, fears, mistakes, IT paradigms, and agile methodologies impact my goals. Bio… I love working with techn...

31 January 2021, 1 h 2 min

DoK Nederkube Edition #1: Is Kubernetes ready for Data Management? // Michel de Ru, Jeffry Molanus & Arie van den Bos

Abstract of the talk… Kubernetes has become the standard for microservices architectures. But what about handling massive and scalable data management on top of it? Is it possible and what does it mean f...

30 January 2021, 1 h

#26 DoK Community: How to unblock your release pipelines with data // Olaf Molenveld

https://go.dok.community/slack

Abstract of the talk… Even though microservices are becoming a pattern, we still see a lot of "monolithical" deploys and manual reactive actions. This blocks the abilit...

28 January 2021, 1 h

#25 DoK Community: Deconstructing Postgres into a Cloud Native Platform // Álvaro Hernández

https://go.dok.community/slack

Abstract of the talk… Is deploying Postgres in Kubernetes just repackaging it into a container? Can’t Postgres leverage the wide range of Cloud-Native software and integ...

21 January 2021, 1 h 1 min

#1 DoK Community India: "Best practices for overprovisioning in k8s" // Miguel Ángel Mingorance & José Luis Talavera

https://go.dok.community/slack

We will discuss how we can implement an efficient solution to overscale a Kubernetes cluster and therefore always keep enough room in the cluster for applications to ...

15 January 2021, 40 min

#24 DoK Community: The architecture of a distributed database // Jim Walker, Lisa-Marie Namphy & Keith McClellan

Abstract of the talk… Cockroach Labs has built a database architected from the ground up to be distributed. It is a perfect fit for the cloud and Kubernetes as it naturally scales and survives without...

14 January 2021, 1 h 7 min