#31 DoK Community: The Data Lifecycle - Where Do We Go From Here // Benjamin Rogojan. (Presenter: Bart Farrell)

#31 DoK Community: The Data Lifecycle - Where Do We Go From Here // Benjamin Rogojan. (Presenter: Bart Farrell)

Abstract of the talk…

Going from raw data to machine learning models successfully in companies of all sizes requires more than just an understanding of programming. Teams need to manage their data products lifecycle, their software as well as the data. Data products like machine learning models aren’t created out of thin air. They are built on layers of best practices that ensure the models are using accurate data, they are outputting reliable numbers and they have some method to interact with the outside world. So how do we get there? The purpose of this talk is to discuss the current state of the data lifecycle as it pertains to creating data products. This could be machine learning models, dashboards and data APIs. We will outline the general architecture that helps take data from raw to some form of machine learning model. In addition, we will discuss some of the concepts that are being applied from DevOps as well as being created in MLOps to help better facilitate your data life cycle.

Bio…

Ben has spent his career focused on all forms of data. He has focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. He has also worked in various industries including transportation, Big Tech, start-ups, insurance, Saas and more. In all of these industries he has helped companies develop their data strategy. Often starting from scratch to develop an end-to-end data solution. Ben privately consults on data science and engineering problems both solo with Seattle Data Guy as well as with a company called Acheron Analytics. He has experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.

Key take-aways from the talk…

- Creating successful data products and models requires more than just programming skills - Best practices from DevOps can help improve data science and ML models maintenance and lifecycle

Jaksot(243)

The Challenges of Data Processing On Kubernetes - A look at Spark, Flink, Dask, and Ray // Holden Karau (DoK Day North America 2022)

The Challenges of Data Processing On Kubernetes - A look at Spark, Flink, Dask, and Ray // Holden Karau (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT This talk will go through both the improvements that have been made in Kubernetes for batch analytic workloads as well as ...

31 Loka 202220min

Scaling our SaaS offering to thousands of clusters // Dax McDonald (DoK Day North America 2022)

Scaling our SaaS offering to thousands of clusters // Dax McDonald (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT Sourcegraph is a code intelligence platform that helps our customers to understand their code better. As we have scaled up...

29 Loka 202221min

Why we decided to migrate our Jaeger storage to ClickHouse on Kubernetes // Arul Jegadish Francis (DoK Day North America 2022)

Why we decided to migrate our Jaeger storage to ClickHouse on Kubernetes // Arul Jegadish Francis (DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) Abstract We at OpsVerse provide a DevOps tools platform with fully-managed open source-based tools. One of our key offerings is a ...

28 Loka 202213min

Building a Digital Factory for the Sheet Metal Industry // Elie Assi (From the DoK Day North America 2022)

Building a Digital Factory for the Sheet Metal Industry // Elie Assi (From the DoK Day North America 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) Abstract We develop systems to digitize the sheet metal industry with the belief that they should cooperate with each other in an ...

27 Loka 202220min

How we built our Big Data Stack (almost) entirely on top of Kubernetes // Neylson Crepalde (From DoK Day NA 2022)

How we built our Big Data Stack (almost) entirely on top of Kubernetes // Neylson Crepalde (From DoK Day NA 2022)

From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) Abstract Working with Terabytes of data is a major challenge for organizations both in terms of architecture and cost. In recent ...

26 Loka 202216min

Dok Talks #153 - CRD Panel // Eyar Zilberman & Álvaro Hernández

Dok Talks #153 - CRD Panel // Eyar Zilberman & Álvaro Hernández

https://go.dok.community/slack https://dok.community We are going to speak about CRDs, and discuss considering them as higher level entities that we normally consider them. CRDs normally are kind of a...

14 Loka 202258min

Dok #152-Running PostgreSQL in Kubernetes:from day 0 to day 2 with CloudNativePG // Gabriele Bartolini

Dok #152-Running PostgreSQL in Kubernetes:from day 0 to day 2 with CloudNativePG // Gabriele Bartolini

https://go.dok.community/slack https://dok.community With: Gabriele Bartolini - Vice President/CTO of Cloud Native and Kubernetes, EDB Bart Farrell - Head of Community, Data on Kubernetes Community...

28 Syys 20221h 3min

Dok Talks #148 - Cost and Kubernetes // Chris Love

Dok Talks #148 - Cost and Kubernetes // Chris Love

https://go.dok.community/slack https://dok.community With: Chris Love - Managing Partner, LionKube Bart Farrell - Head of Community, Data on Kubernetes Community ABSTRACT OF THE TALK Using Kuberne...

27 Syys 202245min