The Network Behind the Network

Bryan and Adam are joined by Oxide colleagues Arjen, Matt, John, and Nathanael to talk about the management network--the brainstem of the Oxide Rack. Just as it ties together so many components, this episode ties together many many (many!) topics we've discussed in other episodes.

We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from May 8th, 2023.

In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleagues Arjen Roodselaar, Matt Keeter, John Gallagher, and Nathanael Huffman.

This built on work described in many previous episodes:

  • Cabling the Backplane Prior to going all-in on a cabled backplane with blind-mated server sleds (i.e. no plugging, unplugging, or mis-plugging network cables), we (Bryan) espoused an "NC-SI or bust" mantra... at least in part to avoid doubling the cable count. With the cabled backplane, the reasons for NC-SI disappeared (which let the many reasons against truly shine).
  • The Pragmatism of Hubris in which we talk about our embedded operating system, Hubris (and its companion debugger, Humility). Hubris runs on the service processors that are the main endpoints on the management network. Matt's work controlling the management network switch (the VSC7448) is in the context of Hubris, as is John's work communicating with the sleds over the management network.
  • The Power of Proto Boards showed and told about the many small boards we've used in development. Several of those were purpose built for controlling and simulating parts of the management network.
  • The Oxide Supply Chain Kate Hicks joined us to talk about the challenges of navigating the supply chain. Mentioned here in the context of "supply-chain-driven design": we designed around the parts we could procure! Tip: stay away from "automotive-quality" parts when the auto industry is soaking them all up.
  • Holistic Boot in which we talked about how (uniquely!) Oxide boots from nothing to its operating system and services. Over the management network, we can drive server recovery by piping in a RAMdisk over the network and then (slowly) through the UART to the CPU.
  • Get You a State Machine for Great Good Andrew joined us to talk about his work on a state-machine driven text-UI and its companion replay debugger. We mentioned this in the context of John replaying the long upload process in seconds rather than hours to fix a UI bug.

Major components of the management network

Matt's VSC7448 dev kit

Matt's remote tuning setup via webcam

Management network debugging
