Annotator Bias
Data Skeptic · 23 Nov 2019

Modern deep learning approaches to natural language processing are voracious in their demands for large training corpora. Folk wisdom once held that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to use transfer learning to achieve novel results on much smaller corpora.

Thanks to these advancements, an NLP researcher can get value out of fewer examples: transfer learning provides a head start, leaving the model to learn only the nuances of language specific to the task at hand. Small specialized corpora are thus both useful and practical to create.

In this episode, Kyle speaks with Mor Geva, lead author on the recent paper Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets, which explores some unintended consequences of the typical procedure followed for generating corpora.

Source code for the paper available here: https://github.com/mega002/annotator_bias

Episodes (590)

Listener Survey Review

In this episode, Kyle and Linhda review the results of our recent survey. Hear all about the demographic details and how we interpret these results.

11 Aug 2020 · 23 min

Human Computer Interaction and Online Privacy

Moses Namara from the HATLab joins us to discuss his research into the interaction between privacy and human-computer interaction.

27 July 2020 · 32 min

Authorship Attribution of Lennon McCartney Songs

Mark Glickman joins us to discuss the paper Data in the Life: Authorship Attribution in Lennon-McCartney Songs.

20 July 2020 · 33 min

GANs Can Be Interpretable

Erik Härkönen joins us to discuss the paper GANSpace: Discovering Interpretable GAN Controls. During the interview, Kyle references this amazing interpretable GAN controls video and its accompanying codebase found here. Erik mentions the GANSpace Colab notebook, a rapid way to try these ideas out for yourself.

11 July 2020 · 26 min

Sentiment Preserving Fake Reviews

David Ifeoluwa Adelani joins us to discuss Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection.

6 July 2020 · 28 min

Interpretability Practitioners

Sungsoo Ray Hong joins us to discuss the paper Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs.

26 June 2020 · 32 min

Facial Recognition Auditing

Deb Raji joins us to discuss her recent publication Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing.

19 June 2020 · 47 min

Robust Fit to Nature

Uri Hasson joins us this week to discuss the paper Robust-fit to Nature: An Evolutionary Perspective on Biological (and Artificial) Neural Networks.

12 June 2020 · 38 min
