Beyond Language: Inside a Hundred-Trillion-Token Video Model
AI + a16z, 3 July 2024

In this episode of the AI + a16z podcast, Luma Chief Scientist Jiaming Song joins a16z General Partner Anjney Midha to discuss Jiaming's esteemed career in video models, culminating thus far in Luma's recently released Dream Machine 3D model, which shows an ability to reason about the world across a variety of aspects. Jiaming covers the history of image and video models, shares his vision for the future of multimodal models, and explains why he thinks Dream Machine demonstrates emergent reasoning capabilities. In short: because it was trained on a volume of high-quality video data that, if measured in relation to language data, would amount to hundreds of trillions of tokens.

Here's a sample of the discussion, in which Jiaming explains the "bitter lesson" as applied to training generative models and, in the process, sums up a major reason Dream Machine can do what it does, namely its use of context-rich video data:

"For a lot of the problems related to artificial intelligence, it is often more productive in the long run to use methods that are simpler but use more compute, [rather] than trying to develop priors, and then trying to leverage the priors so that you can use less compute.

"Cases in this question first happened in language, where people were initially working on language understanding, trying to use grammar or semantic parsing, these kinds of techniques. But eventually these tasks began to be replaced by large language models. And a similar case is happening in the vision domain, as well . . . and now people have been using deep learning features for almost all the tasks. This is a clear demonstration of how using more compute and having less priors is good.

"But how does it work with language? Language by itself is also a human construct. Of course, it is a very good and highly compressed kind of knowledge, but it's definitely a lot less data than what humans take in day to day from the real world . . .

"[And] it is a vastly smaller data set size than visual signals. And we are already almost exhausting the . . . high-quality language sources that we have in the world. The speed at which humans can produce language is definitely not enough to keep up with the demands of the scaling laws. So even if we have a world where we can scale up the compute infrastructure for that, we don't really have the infrastructure to scale up the data efforts . . .

"Even though people would argue that the emergence of large language models is already evidence of the scaling law . . . against the rule-based methods in language understanding, we are arguing that language by itself is also a prior in the face of more of the richer data signal that is happening in the physical world."

Learn more:

Dream Machine

Jiaming's personal site

Luma careers

The bitter lesson

Follow everyone on X:

Jiaming Song

Anjney Midha

Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

