DoK Talks #141 - Dossier: multi-tenant distributed Jupyter Notebooks // Iacopo Colonnelli & Dario Tranchitella

https://go.dok.community/slack
https://dok.community

ABSTRACT OF THE TALK

When providing data analysis as a service, one must tackle several problems. Data privacy and protection by design are crucial when working on sensitive data. Performance and scalability are fundamental for compute-intensive workloads, e.g. training Deep Neural Networks. User-friendly interfaces and fast prototyping tools are essential to allow domain experts to experiment with new techniques. Portability and reproducibility are necessary to assess the actual value of results.

Kubernetes is the best platform to provide reliable, elastic, and maintainable services. However, Kubernetes alone is not enough to achieve large-scale multi-tenant reproducible data analysis. Out-of-the-box support for multi-tenancy is too coarse, with only two levels of segregation (i.e. the single namespace or the entire cluster). Offloading computation to off-cluster resources is non-trivial and requires manual configuration by the user. Also, Jupyter Notebooks on their own provide little scalability (they execute locally and sequentially) or reproducibility (users can run cells in any order and any number of times).

The Dossier platform allows system administrators to manage multi-tenant distributed Jupyter Notebooks at the cluster level in the Kubernetes way, i.e. through CRDs. Namespaces are aggregated into Tenants, and all security and accountability aspects are managed at that level. Each Notebook spawns into a user-dedicated namespace, subject to all Tenant-level constraints. Users can rely on provisioned resources, either in-cluster worker nodes or external resources like HPC facilities. They can also plug in their own computing nodes in a BYOD fashion. Notebooks are interpreted as distributed workflows, where each cell is a task that can be offloaded to a different location in charge of its execution.
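The idea of a notebook as a distributed workflow can be illustrated with a minimal Python sketch. It is not the Dossier or Jupyter Workflow API: the `Cell` class, the `schedule` function, and the location labels (`in-cluster`, `hpc`) are hypothetical names chosen for illustration. Each cell declares the cells it depends on and the location meant to execute it, and a topological sort yields an execution order that no longer depends on the order in which a user happens to click.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Cell:
    """A notebook cell treated as a workflow task (hypothetical model)."""
    name: str
    location: str  # hypothetical label for where the cell should run
    depends_on: list[str] = field(default_factory=list)


def schedule(cells):
    """Return (cell name, location) pairs in a dependency-respecting order."""
    by_name = {cell.name: cell for cell in cells}
    # Map each cell to the set of cells that must run before it.
    graph = {cell.name: set(cell.depends_on) for cell in cells}
    order = TopologicalSorter(graph).static_order()
    return [(name, by_name[name].location) for name in order]


cells = [
    Cell("load_data", "in-cluster"),
    Cell("train_model", "hpc", depends_on=["load_data"]),
    Cell("plot_results", "in-cluster", depends_on=["train_model"]),
]

plan = schedule(cells)
for name, location in plan:
    print(f"{name} -> {location}")
```

In this sketch the compute-intensive `train_model` cell is the only one sent to the external resource, while data loading and plotting stay on in-cluster nodes.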

BIO

Iacopo Colonnelli is a Computer Science research fellow. He received his Ph.D. with honours in Modeling and Data Science at Università di Torino with a thesis on novel workflow models for heterogeneous distributed systems, and his master’s degree in Computer Engineering from Politecnico di Torino with a thesis on a high-performance parallel tracking algorithm for the ALICE experiment at CERN. His research focuses on both statistical and computational aspects of data analysis at large scale and on workflow modeling and management in heterogeneous distributed architectures.

Dario is an SWE who turned DevOps, and he's regretting this choice day by day. Besides making memes on Twitter that gain more reactions than technical discussions, he leads the development of Open Source projects at CLASTIX, an Open Source-based start-up focusing on Multi-Tenancy in Kubernetes.

KEY TAKE-AWAYS FROM THE TALK

From this talk, people will learn:
- The different requirements of Data analysis as a service
- How to configure for multi-tenancy at the cluster level with Capsule
- How to write distributed workflows as Notebooks with Jupyter Workflows
- How to combine all these aspects into a single platform: Dossier

All the software presented in the talk is open source, so attendees can try it directly and include it in their own experiments with no additional restrictions.
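As a rough illustration of the Capsule take-away, the sketch below builds a Tenant manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The field names (`owners`, `namespaceOptions.quota`) and the `capsule.clastix.io/v1beta2` API version follow the Capsule Tenant resource as commonly documented, but they should be treated as assumptions and checked against the installed Capsule version. The tenant name, owner, and quota are made up for the example, and `tenant_manifest` is a hypothetical helper.

```python
import json


def tenant_manifest(name, owner, namespace_quota):
    """Build a Capsule Tenant manifest (field names assumed, see lead-in)."""
    return {
        "apiVersion": "capsule.clastix.io/v1beta2",  # assumed API version
        "kind": "Tenant",
        "metadata": {"name": name},
        "spec": {
            # The user allowed to create namespaces inside this tenant.
            "owners": [{"name": owner, "kind": "User"}],
            # Upper bound on the number of namespaces in the tenant.
            "namespaceOptions": {"quota": namespace_quota},
        },
    }


# Hypothetical values chosen only for illustration.
manifest = tenant_manifest("research-lab", "alice", 5)
print(json.dumps(manifest, indent=2))
```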


