Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal

Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal

https://go.dok.community/slack

https://dok.community/

From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)


Many ML pipelines depend on shared filesystems for input, output and intermediate data storage. Standards such as CSI have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet, data scientists still have to deal with low-level details of data access in order to execute their pipelines in Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a CustomResourceDefinition that represents a source of data. Datashim takes care of the details of data access while Kubernetes pods can declaratively access the data by referencing a Dataset in their specifications. This talk will describe Datashim and the Dataset object, discuss its use in ML pipelines, and demonstrate how its pluggable architecture is designed for the development of caching, scheduling and governance plugins. Datashim is an incubating project of the Linux Foundation Data and AI Foundation


Srikumar Venugopal is a Research Scientist in IBM Research Europe in Dublin, Ireland. His research interests lie in the area of cloud computing and large-scale distributed systems, specifically in the topics of middleware, resource management, and scalability. He is the co-founder and current lead for the Datashim project.

Jaksot(243)

The Challenges of Data Processing On Kubernetes - A look at Spark, Flink, Dask, and Ray // Holden Karau (DoK Day North America 2022)

The Challenges of Data Processing On Kubernetes - A look at Spark, Flink, Dask, and Ray // Holden Karau (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT This talk will go through both the improvements that have been made in Kubernetes for batch analytic workloads as well as ...

31 Loka 202220min

Scaling our SaaS offering to thousands of clusters // Dax McDonald (DoK Day North America 2022)

Scaling our SaaS offering to thousands of clusters // Dax McDonald (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT Sourcegraph is a code intelligence platform that helps our customers to understand their code better. As we have scaled up...

29 Loka 202221min

Why we decided to migrate our Jaeger storage to ClickHouse on Kubernetes // Arul Jegadish Francis (DoK Day North America 2022)

Why we decided to migrate our Jaeger storage to ClickHouse on Kubernetes // Arul Jegadish Francis (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) Abstract We at OpsVerse provide a DevOps tools platform with fully-managed open source-based tools. One of our key offerings is a ...

28 Loka 202213min

Building a Digital Factory for the Sheet Metal Industry // Elie Assi (From the DoK Day North America 2022)

Building a Digital Factory for the Sheet Metal Industry // Elie Assi (From the DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) Abstract We develop systems to digitize the sheet metal industry with the belief that they should cooperate with each other in an ...

27 Loka 202220min

How we built our Big Data Stack (almost) entirely on top of Kubernetes // Neylson Crepalde (From DoK Day NA 2022)

How we built our Big Data Stack (almost) entirely on top of Kubernetes // Neylson Crepalde (From DoK Day NA 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) Abstract Working with Terabytes of data is a major challenge for organizations both in terms of architecture and cost. In recent ...

26 Loka 202216min

Dok Talks #153 - CRD Panel // Eyar Zilberman & Álvaro Hernández

Dok Talks #153 - CRD Panel // Eyar Zilberman & Álvaro Hernández

https://go.dok.community/slack https://dok.community We are going to speak about CRDs, and discuss considering them as higher level entities that we normally consider them. CRDs normally are kind of a...

14 Loka 202258min

Dok #152-Running PostgreSQL in Kubernetes:from day 0 to day 2 with CloudNativePG // Gabriele Bartolini

Dok #152-Running PostgreSQL in Kubernetes:from day 0 to day 2 with CloudNativePG // Gabriele Bartolini

https://go.dok.community/slack https://dok.community With: Gabriele Bartolini - Vice President/CTO of Cloud Native and Kubernetes, EDB Bart Farrell - Head of Community, Data on Kubernetes Community...

28 Syys 20221h 3min

Dok Talks #148 - Cost and Kubernetes // Chris Love

Dok Talks #148 - Cost and Kubernetes // Chris Love

https://go.dok.community/slack https://dok.community With: Chris Love - Managing Partner, LionKube Bart Farrell - Head of Community, Data on Kubernetes Community ABSTRACT OF THE TALK Using Kuberne...

27 Syys 202245min