Annotator Bias
Data Skeptic · 23 Nov 2019

Modern deep learning approaches to natural language processing are voracious in their demands for large training corpora. Folk wisdom once held that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to use transfer learning to achieve novel results on much smaller corpora.

Thanks to these advancements, an NLP researcher can get value out of fewer examples: transfer learning provides a head start, so training can focus on the nuances of language specific to the task at hand. Small specialized corpora are thus both useful and practical to create.

In this episode, Kyle speaks with Mor Geva, lead author on the recent paper Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets, which explores some unintended consequences of the typical procedure followed for generating corpora.

Source code for the paper available here: https://github.com/mega002/annotator_bias

Episodes (590)

Defending the p-value

Yudi Pawitan joins us to discuss his paper Defending the P-value.

12 Oct 2020 · 30 min

Retraction Watch

Ivan Oransky joins us to discuss his work documenting the scientific peer-review process at retractionwatch.com.

5 Oct 2020 · 32 min

Crowdsourced Expertise

Derek Lim joins us to discuss the paper Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius Platform.

21 Sep 2020 · 27 min

The Spread of Misinformation Online

Neil Johnson joins us to discuss the paper The online competition between pro- and anti-vaccination views.

14 Sep 2020 · 35 min

Consensus Voting

Mashbat Suzuki joins us to discuss the paper How Many Freemasons Are There? The Consensus Voting Mechanism in Metric Spaces. Check out Mashbat's talk, and many other great ones, at the 13th Symposium on Algorithmic Game Theory (SAGT 2020).

7 Sep 2020 · 22 min

Voting Mechanisms

Steven Heilman joins us to discuss his paper Designing Stable Elections. For a general-interest article, see: https://theconversation.com/the-electoral-college-is-surprisingly-vulnerable-to-popular-vote-changes-141104 Steven Heilman receives funding from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

31 Aug 2020 · 27 min

False Consensus

Sami Yousif joins us to discuss the paper The Illusion of Consensus: A Failure to Distinguish Between True and False Consensus. This work empirically explores how individuals evaluate consensus under different experimental conditions while reviewing online news articles. More from Sami at samiyousif.org. Link to the survey mentioned by Daniel Kerrigan: https://forms.gle/TCdGem3WTUYEP31B8

24 Aug 202033min

Fraud Detection in Real Time

In this solo episode, Kyle gives an overview of fraud detection, using eCommerce as a case study. He discusses some of the techniques and system architectures companies use to fight fraud, focusing on why the problem needs to be approached from a real-time perspective.

18 Aug 2020 · 38 min
