CERN’s Transition to Containerization and Kubernetes with Ricardo Rocha


Some of the highlights of the show include:


  • The challenges that CERN was facing when storing, processing, and analyzing data, and why it pushed them to think about containerization.
  • CERN’s evolution from using mainframes, to physical commodity hardware, to virtualization and private clouds, and eventually to containers. Ricardo also explains how the migration to containerization and Kubernetes was started.
  • Why there was a big push from groups that focus on reproducibility to explore containerization.
  • How end users have responded to Kubernetes and containers. Ricardo talks about the steep Kubernetes learning curve, and how they dealt with frustration and resistance.
  • Some of the top benefits of migrating to Kubernetes, and the impact that the move has had on their end users.
  • Current challenges that CERN is working through, regarding hybrid infrastructure and rising data loads. Ricardo also talks about how CERN optimizes system resources for their scientists, and what it’s like operating as a public sector organization.
  • How CERN handles large data transfers.




Transcript

Emily: Hi everyone. I’m Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product’s value is obvious to end-users. I started this podcast because organizations embark on the cloud-native journey for business reasons, but in general, the industry doesn’t talk about them. Instead, we talk a lot about technical reasons. I’m hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you’ll join me.



Emily: Welcome to the Business of Cloud Native. I'm your host, Emily Omier, and today I'm here with Ricardo Rocha. Ricardo, thank you so much for joining us.



Ricardo: It's a pleasure.



Emily: Ricardo, can you actually go ahead and introduce yourself: where you work, and what you do?



Ricardo: Yeah, yes, sure. I work at CERN, the European Organization for Nuclear Research. I'm a software engineer and I work in the CERN IT department. I've done quite a few different things in the past in the organization, including software development in the areas of storage and monitoring, and also distributed computing. But right now, I'm part of the CERN Cloud Team, and we manage the CERN private cloud and all the resources we have. And I focus mostly on networking and containerization, so Kubernetes and all these new technologies.



Emily: And on a day to day basis, what do you usually do? What sort of activities are you actually doing?



Ricardo: Yeah. So, it's mostly making sure we provide the infrastructure that our physics users and experiments require, and also the people on campus. So, CERN is a pretty large organization. We have around 10,000 people on-site, and many more around the world that depend on our resources. So, we operate private clouds, we basically do DevOps-style work. And we have a team dedicated for the Cloud, but also for other areas of the data center. And it's mostly making sure everything operates correctly; try to automate more and more, so we do some improvements gradually; and then giving support to our users.



Emily: Just so everyone knows, can you tell a little bit more about what kind of work is done at CERN? What kind of experiments people are running?



Ricardo: Our main goal is fundamental research. So, we try to answer some questions about the universe. So, what's dark matter? What's dark energy? Why don't we see antimatter? And similar questions. And for that, we build very large experiments.



So, the biggest experiment we have, which is actually the biggest scientific experiment ever built, is the Large Hadron Collider, and this is a particle accelerator that accelerates two beams of protons in opposite directions, and we make them collide at very specific points where we build these very large physics experiments that try to understand what happens in these collisions and try to look for new physics. And in reality, what happens with these collisions is that we generate large amounts of data that need to be stored, and processed, and analyzed, so of the IT infrastructure that we support, the larger fraction is dedicated to this physics analysis.



Emily: Tell me a little bit more about some of the challenges related to processing and storing the huge amount of data that you have. And also, how this has evolved, and how it pushed you to think about containerization.



Ricardo: The big challenge we have is the amount of data that we have to support. So, these experiments, each of the experiments, at the moment of the collisions, it can generate data in the order of one petabyte a second. This is, of course, not something we can handle, so the first thing we do, we use these hardware triggers to filter this data quite significantly, but we still generate, per experiment, something like a few gigabytes a second, so up to 10 gigabytes a second. And this we have to store, and then we have large farms that will handle the processing and the reconstruction of all of this. So, we've had these sorts of experiments for quite a while, and to analyze all of this, we need a large amount of resources, and with time.



If you come and visit CERN, you can see a bit of the history of computing, kind of evolving with what we used to have in the past in our data center. But it's mostly—we used to have large mainframes, that now it's more in the movies that we see them, but we used to have quite a few of those. And then we transitioned to physical commodity hardware with Linux servers. Eventually introduced virtualization and private clouds to improve the efficiency and the provisioning of these resources to our users, and then eventually, we moved to containers and the main motivation is always to try to be as efficient as possible, and to speed up this process of provisioning resources, and be more flexible in the way we assign compute and also storage.



What we've seen is that in the move from physical to virtualization, the provisioning and maintenance got significantly improved. What we see with containerization is the extra speed also in deployment and update of the applications that run on those resources. And we also see an improvement in resource utilization. We already had the possibility to improve quite a bit with virtualization by doing things like overcommit, but with containers, we can go one step further by doing more efficient resource sharing for the different applications we have to run.
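[Editor's note: To put the data rates Ricardo quotes in perspective, here is a rough back-of-the-envelope sketch. It uses only the two figures from the interview (about one petabyte a second before filtering, up to 10 gigabytes a second stored per experiment). The decimal units and the assumption of a full day at the peak stored rate are ours, for illustration only, and are not CERN's published numbers.]

```python
# Back-of-the-envelope arithmetic using the rates quoted in the interview.
# Assumption: decimal (SI) units; the transcript does not specify.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes
TERABYTE = 10**12  # bytes

raw_rate = 1 * PETABYTE      # ~1 PB/s per experiment at the moment of collisions
stored_rate = 10 * GIGABYTE  # upper bound quoted: up to 10 GB/s per experiment

# How strongly the hardware triggers must reduce the data stream.
reduction_factor = raw_rate // stored_rate

# Hypothetical: volume stored in one day if the peak rate were sustained.
seconds_per_day = 24 * 60 * 60
stored_per_day_tb = stored_rate * seconds_per_day / TERABYTE

print(f"Trigger reduction factor: {reduction_factor:,}x")
print(f"Stored per day at peak rate: {stored_per_day_tb:,.0f} TB per experiment")
```

[Under these assumptions, the filtering cuts the stream by a factor of about 100,000, and a single experiment writing at the peak rate for a full day would still store several hundred terabytes.]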



Emily: Is the amount of data that you're processing stable? Is it steadily increasing, have spikes, a combination?



Ricardo: So, the way it works is, we have what we call ‘beam’, which is when we actually have protons circulating in the accelerator. And during these periods, we try to get as many collisions as ...
