DoK Talks #141 - Dossier: multi-tenant distributed Jupyter Notebooks // Iacopo Colonnelli & Dario Tranchitella

https://go.dok.community/slack
https://dok.community

ABSTRACT OF THE TALK

When providing data analysis as a service, one must tackle several problems. Data privacy and protection by design are crucial when working on sensitive data. Performance and scalability are fundamental for compute-intensive workloads, e.g. training Deep Neural Networks. User-friendly interfaces and fast prototyping tools are essential to allow domain experts to experiment with new techniques. Portability and reproducibility are necessary to assess the actual value of results.

Kubernetes is the best platform to provide reliable, elastic, and maintainable services. However, Kubernetes alone is not enough to achieve large-scale multi-tenant reproducible data analysis. Out-of-the-box support for multi-tenancy is too coarse, with only two levels of segregation (i.e. the single namespace or the entire cluster). Offloading computation to off-cluster resources is non-trivial and requires manual configuration by the user. Also, Jupyter Notebooks per se provide neither much scalability (they execute locally and sequentially) nor reproducibility (users can run cells in any order and any number of times).

The Dossier platform allows system administrators to manage multi-tenant distributed Jupyter Notebooks at the cluster level in the Kubernetes way, i.e. through CRDs. Namespaces are aggregated in Tenants, and all security and accountability aspects are managed at that level. Each Notebook spawns into a user-dedicated namespace, subject to all Tenant-level constraints. Users can rely on provisioned resources, either in-cluster worker nodes or external resources like HPC facilities. They can also plug in their own computing nodes in a BYOD fashion. Notebooks are interpreted as distributed workflows, where each cell is a task that one can offload to a different location in charge of its execution.
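
To make the "notebook as a distributed workflow" idea concrete, here is a minimal Python sketch. It is not the Dossier or Jupyter Workflow API: the Cell class, its reads/writes/location fields, and the schedule function are all hypothetical names chosen for illustration. The sketch assumes each cell declares the variables it reads and writes, derives dependencies from that, and produces an execution order that no longer depends on the order in which a user happens to click "run".

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Cell:
    """Hypothetical model of a notebook cell treated as a workflow task."""
    name: str
    reads: set = field(default_factory=set)    # variables the cell consumes
    writes: set = field(default_factory=set)   # variables the cell produces
    location: str = "local"                    # assumed label for where it runs


def build_dependencies(cells):
    """Map each cell to the earlier cells that produce the variables it reads."""
    last_writer = {}
    deps = {}
    for cell in cells:
        deps[cell.name] = {last_writer[v] for v in cell.reads if v in last_writer}
        for v in cell.writes:
            last_writer[v] = cell.name
    return deps


def schedule(cells):
    """Return (cell name, location) pairs in a dependency-respecting order."""
    by_name = {c.name: c for c in cells}
    order = TopologicalSorter(build_dependencies(cells)).static_order()
    return [(name, by_name[name].location) for name in order]


cells = [
    Cell("load", writes={"data"}),
    Cell("train", reads={"data"}, writes={"model"}, location="hpc"),
    Cell("plot", reads={"model"}),
]
plan = schedule(cells)
print(plan)  # [('load', 'local'), ('train', 'hpc'), ('plot', 'local')]
```

In this toy plan only the compute-intensive "train" cell is tagged for an external resource, while the other cells stay local. A real system would also need to move data between locations, which the sketch leaves out.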

BIO

Iacopo Colonnelli is a Computer Science research fellow. He received his Ph.D. with honours in Modeling and Data Science at Università di Torino with a thesis on novel workflow models for heterogeneous distributed systems, and his master’s degree in Computer Engineering from Politecnico di Torino with a thesis on a high-performance parallel tracking algorithm for the ALICE experiment at CERN. His research focuses on both statistical and computational aspects of data analysis at large scale and on workflow modeling and management in heterogeneous distributed architectures.

Dario is an SWE that turned DevOps, and he's regretting this choice day by day. Besides making memes on Twitter that gain more reactions than technical discussions, he leads the development of Open Source projects at CLASTIX, an Open Source-based start-up focusing on Multi-Tenancy in Kubernetes.

KEY TAKE-AWAYS FROM THE TALK

From this talk, people will learn:
- The different requirements of data analysis as a service
- How to configure multi-tenancy at the cluster level with Capsule
- How to write distributed workflows as Notebooks with Jupyter Workflows
- How to combine all these aspects into a single platform: Dossier

All the software presented in the talk is open source, so attendees can try it directly and include it in their experiments with no additional restrictions.


