Building the howto100m Video Corpus
Data Skeptic, 19 Aug 2019

Video annotation is an expensive and time-consuming process. As a consequence, the available video datasets are useful but small. The availability of machine-transcribed explainer videos offers a unique opportunity to rapidly develop a useful, if dirty, corpus of videos that are "self-annotating", as hosts explain the actions they are taking on the screen.

This episode is a discussion of the HowTo100M dataset, a project that has assembled a video corpus of 136M captioned video clips covering 23k activities.

Related Links

The paper will be presented at ICCV 2019

@antoine77340

Antoine on Github

Antoine's homepage

Episodes (590)

The Death of a Language

USC students from the CAIS++ student organization have created a variety of novel projects under the mission statement of "artificial intelligence for social good". In this episode, Kyle interviews Zane and Leena about the Endangered Languages Project.

1 Jun 2019, 20 min

Neural Turing Machines

Kyle and Linh Da discuss the concepts behind the neural Turing machine.

25 May 2019, 25 min

Data Infrastructure in the Cloud

Kyle chats with Rohan Kumar about hyperscale, data at the edge, and a variety of other trends in data engineering in the cloud.

18 May 2019, 30 min

NCAA Predictions on Spark

In this episode, Kyle interviews Laura Edell at MS Build 2019.  The conversation covers a number of topics, notably her NCAA Final 4 prediction model.

11 May 2019, 23 min

The Transformer

Kyle and Linh Da discuss attention and the transformer, an encoder/decoder architecture that extends the basic ideas of vector embeddings like word2vec into a more contextual use case.

3 May 2019, 15 min

Mapping Dialects with Twitter Data

When Twitter users post with geographic tags, it becomes possible to pose a variety of interesting questions about language, dialects, and location. In this episode, Kyle interviews Bruno Gonçalves about his work studying language in this way.

26 Apr 2019, 25 min

Sentiment Analysis

This is an interview with Ellen Loeshelle, Director of Product Management at Clarabridge.  We primarily discuss sentiment analysis.

20 Apr 2019, 27 min

Attention Primer

A gentle introduction to the very high-level idea of "attention" in machine learning, which will play a major role in some upcoming episodes over the next few weeks.

13 Apr 2019, 14 min
