Building the howto100m Video Corpus
Data Skeptic, 19 Aug 2019

Video annotation is an expensive and time-consuming process. As a consequence, the available video datasets are useful but small. Machine-transcribed explainer videos offer a unique opportunity to rapidly develop a useful, if dirty, corpus of videos that are "self-annotating", because hosts explain the actions they are taking on screen.

This episode is a discussion of the HowTo100M dataset, a project that has assembled a video corpus of 136M captioned video clips covering 23k activities.

Related Links

The paper will be presented at ICCV 2019

@antoine77340

Antoine on GitHub

Antoine's homepage

Episodes (590)

The Death of a Language

USC students from the CAIS++ student organization have created a variety of novel projects under the mission statement of "artificial intelligence for social good". In this episode, Kyle interviews Zane and Leena about the Endangered Languages Project.

1 Jun 2019, 20 min

Neural Turing Machines

Kyle and Linh Da discuss the concepts behind the neural Turing machine.

25 May 2019, 25 min

Data Infrastructure in the Cloud

Kyle chats with Rohan Kumar about hyperscale, data at the edge, and a variety of other trends in data engineering in the cloud.

18 May 2019, 30 min

NCAA Predictions on Spark

In this episode, Kyle interviews Laura Edell at MS Build 2019. The conversation covers a number of topics, notably her NCAA Final Four prediction model.

11 May 2019, 23 min

The Transformer

Kyle and Linh Da discuss attention and the transformer, an encoder/decoder architecture that extends the basic ideas of vector embeddings like word2vec into a more contextual use case.

3 May 2019, 15 min

Mapping Dialects with Twitter Data

When users on Twitter post with geographic tags, they make it possible to ask a variety of interesting questions about language, dialects, and location. In this episode, Kyle interviews Bruno Gonçalves about his work studying language in this way.

26 Apr 2019, 25 min

Sentiment Analysis

This is an interview with Ellen Loeshelle, Director of Product Management at Clarabridge. We primarily discuss sentiment analysis.

20 Apr 2019, 27 min

Attention Primer

A gentle introduction to the very high-level idea of "attention" in machine learning, as it will play a major role in some upcoming episodes over the next few weeks.

13 Apr 2019, 14 min
