Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal

https://go.dok.community/slack

https://dok.community/

From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)


Many ML pipelines depend on shared filesystems for input, output and intermediate data storage. Standards such as CSI have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet data scientists still have to deal with low-level details of data access in order to execute their pipelines in Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a custom resource (defined through a CustomResourceDefinition) that represents a source of data. Datashim takes care of the details of data access, while Kubernetes pods access the data declaratively by referencing a Dataset in their specifications. This talk will describe Datashim and the Dataset object, discuss its use in ML pipelines, and demonstrate how its pluggable architecture is designed for the development of caching, scheduling and governance plugins. Datashim is an incubating project of the Linux Foundation Data and AI Foundation.
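As a rough illustration of what "referencing a Dataset" can look like, the sketch below builds two manifests as plain Python dictionaries and prints them as JSON: a Dataset pointing at an S3-compatible bucket, and a pod that refers to it by name through labels. The `apiVersion`, the `spec.local` field names and the `dataset.0.id` / `dataset.0.useas` labels are assumptions based on the project's public README and are not stated in the talk abstract; they may differ between Datashim releases. The dataset name, pod name, image and the bracketed credential placeholders are hypothetical.

```python
import json

# Hypothetical dataset name, used only for illustration.
DATASET_NAME = "example-dataset"

# Assumed shape of a Datashim Dataset for an S3-compatible bucket.
# Field names follow the project's README as far as we know and may
# vary by release; bracketed values are placeholders, not real settings.
dataset = {
    "apiVersion": "datashim.io/v1alpha1",  # assumption: older releases used a different API group
    "kind": "Dataset",
    "metadata": {"name": DATASET_NAME},
    "spec": {
        "local": {
            "type": "COS",
            "accessKeyID": "<ACCESS_KEY_ID>",
            "secretAccessKey": "<SECRET_ACCESS_KEY>",
            "endpoint": "<S3_ENDPOINT_URL>",
            "bucket": "<BUCKET_NAME>",
        }
    },
}

# The pod declares *which* dataset it wants via labels (assumed convention);
# it carries no volume, credential or endpoint details of its own.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "trainer",  # hypothetical pod name
        "labels": {
            "dataset.0.id": DATASET_NAME,
            "dataset.0.useas": "mount",
        },
    },
    "spec": {
        "containers": [
            {"name": "main", "image": "busybox", "command": ["sleep", "3600"]}
        ]
    },
}

# Emit both objects as a single Kubernetes List in JSON.
manifest = {"apiVersion": "v1", "kind": "List", "items": [dataset, pod]}
print(json.dumps(manifest, indent=2))
```

The point of the sketch is the separation of concerns: storage details live only in the Dataset, and the pod specification names the dataset without repeating them.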


Srikumar Venugopal is a Research Scientist at IBM Research Europe in Dublin, Ireland. His research interests lie in cloud computing and large-scale distributed systems, specifically middleware, resource management, and scalability. He is the co-founder and current lead of the Datashim project.

Episodes (243)

DoK Talks #67- Run Apache APISIX in Kubernetes // Jintao Zhang

Abstract of the talk… Apache APISIX is a dynamic, real-time, high-performance API gateway. You can use Apache APISIX to handle traditional north-south traffic, as well as east-west traffic between ser...

2 Aug 2021, 43 min

DoK #69- To Certify or Not to Certify, is Kubernetes Certification Worth it? // Keith McClellan

Abstract of the talk… As an engineer, should I consider getting a certification? What makes a certification valuable to me or my employer? How do I pick which one to get? Will these really help me bui...

2 Aug 2021, 1 h 9 min

DoK Talks #68- The Kubernetes-native way to providing database services to developers // Adam Sandor

Bio… Adam is a Solutions Architect at Styra, helping companies adopt Cloud Native tech. Coming from a Java-dev background he is most excited about the space where software development and operations m...

28 Jul 2021, 58 min

DoK #66 Crossplane Packages as a Distribution Mechanism // Daniel Mangum

Abstract of the talk… A typical user's journey with Crossplane starts with provisioning infrastructure using the Kubernetes API, then evolves to composing infrastructure into higher level abstractions...

21 Jul 2021, 1 h 5 min

DoK #65 Using Kubernetes and ClickHouse to enable high performance app analytics // Robert Hodges

Abstract of the talk… Embedded analytics are a major source of value to application users. Virtually every SaaS offering has them or is adding them now. This talk shows how to build low latency analyt...

16 Jul 2021, 1 h 6 min

DoK #63 Stranger Danger - Kubernetes Edition // Matt Jarvis

Abstract of the talk… Kubernetes is a powerful set of abstractions, but its flexibility and configurability mean it's pretty insecure by default. In this hands-on talk, I'll show how an attacker can...

13 Jul 2021, 1 h 6 min

DoK #62 Easy Kubernetes Volumes using Longhorn // Saiyam Pathak

Abstract of the talk… Longhorn is a lightweight, reliable, and powerful distributed block storage system for Kubernetes. It is an open source tool that can be installed on any Kubernetes cluster. It h...

13 Jul 2021, 1 h 10 min

DoK #61 Perfecting Machine Learning Workloads on Kubernetes // Lars Suanet

Abstract of the talk… More and more applications are powered by Machine Learning (ML) models. Where the gap between Software Engineers and a Production environment on Kubernetes is already big, the ga...

2 Jul 2021, 1 h 4 min