Building the HowTo100M Video Corpus
Data Skeptic, 19 Aug 2019

Video annotation is an expensive and time-consuming process. As a consequence, the available video datasets are useful but small. Machine-transcribed explainer videos offer a unique opportunity to rapidly develop a useful, if dirty, corpus of videos that are "self-annotating", because hosts explain the actions they are taking on screen.

This episode is a discussion of the HowTo100M dataset, a project that has assembled a video corpus of 136M captioned video clips covering 23k activities.

Related Links

The paper will be presented at ICCV 2019

@antoine77340

Antoine on GitHub

Antoine's homepage

Episodes (590)

Adversarial Explanations

Walt Woods joins us to discuss his paper Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness with co-authors Jack Chen and Christof Teuscher.

14 Feb 2020, 36 min

ObjectNet

Andrei Barbu joins us to discuss ObjectNet - a new kind of vision dataset. In contrast to ImageNet, ObjectNet seeks to provide images that are more representative of the types of images an autonomous machine is likely to encounter in the real world. Collecting a dataset in this way required careful use of Mechanical Turk to get Turkers to provide a corpus of images that removes some of the bias found in ImageNet. http://0xab.com/

7 Feb 2020, 38 min

Visualization and Interpretability

Enrico Bertini joins us to discuss how data visualization can be used to help make machine learning more interpretable and explainable. Find out more about Enrico at http://enrico.bertini.io/. More from Enrico with co-host Moritz Stefaner on the Data Stories podcast!

31 Jan 2020, 35 min

Interpretable One Shot Learning

We welcome Su Wang back to Data Skeptic to discuss the paper Distributional modeling on a diet: One-shot word learning from text only.

26 Jan 2020, 30 min

Fooling Computer Vision

Wiebe van Ranst joins us to talk about a project in which specially designed printed images can fool a computer vision system, preventing it from identifying a person. The attack targets the popular pre-trained YOLOv2 object detection model and is therefore likely to be widely applicable.

22 Jan 2020, 25 min

Algorithmic Fairness

This episode includes an interview with Aaron Roth, author of The Ethical Algorithm.

14 Jan 2020, 42 min

Interpretability

Machine learning has expanded rapidly into every sector and industry. With increasing reliance on models and rising stakes for their decisions, questions about how models actually work are becoming increasingly important. Welcome to Data Skeptic Interpretability. In this episode, Kyle interviews Christoph Molnar about his book Interpretable Machine Learning. Thanks to our sponsor, the Gartner Data & Analytics Summit, held in Grapevine, TX on March 23-26, 2020. Use discount code: dataskeptic. Music: our new theme song is #5 by Big D and the Kids Table. Incidental music by Tanuki Suit Riot.

7 Jan 2020, 32 min

NLP in 2019

A recap of the year.

31 Dec 2019, 38 min
