Annotator Bias
Data Skeptic, 23 Nov 2019

Modern deep learning approaches to natural language processing are voracious in their demand for large training corpora. Folk wisdom used to hold that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to use transfer learning to achieve novel results on much smaller corpora.

Thanks to these advances, an NLP researcher can get value out of fewer examples: transfer learning provides a head start, so the task-specific data only needs to teach the nuances of the language relevant to the task at hand. Thus, small specialized corpora are both useful and practical to create.

In this episode, Kyle speaks with Mor Geva, lead author of the recent paper "Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets", which explores some unintended consequences of the typical procedure for generating corpora.

Source code for the paper is available here: https://github.com/mega002/annotator_bias
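
One recommendation from the paper is that test-set annotators be disjoint from training-set annotators, so that a model is not rewarded for learning an individual annotator's habits. The following is a minimal sketch of such a split, not the authors' code: the function name `annotator_disjoint_split` and the `"annotator_id"` field are hypothetical, and it assumes each example records who wrote it.

```python
import random
from collections import defaultdict


def annotator_disjoint_split(examples, test_fraction=0.2, seed=0):
    """Split examples so that no annotator appears in both train and test.

    `examples` is a list of dicts, each with a hypothetical "annotator_id" key.
    Whole annotators are assigned to one side, so the realized test fraction
    is only approximately `test_fraction`.
    """
    # Group examples by the annotator who produced them.
    by_annotator = defaultdict(list)
    for example in examples:
        by_annotator[example["annotator_id"]].append(example)

    # Shuffle annotators reproducibly.
    annotators = sorted(by_annotator)
    random.Random(seed).shuffle(annotators)

    target_test_size = test_fraction * len(examples)
    train, test = [], []
    for annotator in annotators:
        # Send whole annotators to the test set until it reaches the target size.
        bucket = test if len(test) < target_test_size else train
        bucket.extend(by_annotator[annotator])
    return train, test


if __name__ == "__main__":
    # Toy data: 50 examples written by 5 hypothetical crowd workers.
    data = [{"text": f"example {i}", "annotator_id": f"worker_{i % 5}"} for i in range(50)]
    train, test = annotator_disjoint_split(data)
    print(len(train), len(test))
```

Comparing a model's score on this split against its score on an ordinary random split gives a rough indication of how much it relies on annotator-specific cues. With very few annotators the split becomes coarse, and with a single annotator the training set would be empty.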

Episodes (590)

Monetization of YouTube Conspiracy Theorists

Cameron Ballard joins us today to discuss his work on YouTube conspiracy theories. He shares observations about conspiracy theories on YouTube, including that predatory ads are most common on conspiracy theory videos and that YouTube's algorithm subtly works in favor of predatory ads.

1 Aug 2022, 54 min

User Perceptions of Problematic Ads

Eric Zeng joins us to discuss his study on understanding bad ads and the efforts that can be taken to limit bad ads online. He discusses how he and his co-authors scraped a large amount of ad data, applied a machine learning algorithm, and obtained the corresponding statistical results.

25 Jul 2022, 37 min

Political Digital Advertising Analysis

NaLette Brodnax, a political scientist and an Assistant Professor in the McCourt School of Public Policy at Georgetown University, joins us to discuss her work on analyzing digital advertisements for political campaigns. She used data on electoral campaigns on Facebook to answer questions that help us better understand how digital ads affect the outcome of elections. Thanks to our sponsor: https://neptune.ai/ (log, store, query, display, organize, and compare all your model metadata in a single place).

21 Jul 2022, 35 min

Fraud Detection in Crowdfunding Campaigns

18 Jul 2022, 35 min

Artificial Intelligence and Auction Design

11 Jul 2022, 43 min

Privacy Preference Signals

Have you ever wondered what goes on under the hood when you accept a website's cookies? Today, Maximilian Hils, a PhD student in computer science at the University of Innsbruck, Austria, dissects the ad tech industry and the standards put in place to protect users' data. He also shares his thoughts on the use of VPNs and other tools that help shield your data from prying eyes on the internet. Thanks to our sponsor: https://clear.ml/ (ClearML is an open-source MLOps solution users love to customize, helping you easily track, orchestrate, and automate ML workflows at scale).

4 Jul 2022, 33 min

Neural Architecture Search for CTR Prediction

Ravi Krishna joins us today to talk about his recent work on a differentiable neural architecture search (NAS) framework for ads click-through rate (CTR) prediction. He discusses what CTR prediction is and why his NAS framework helps in building neural networks for better ad recommendation. Listen to learn about the methodology, related literature, and his results. Thanks to our sponsor: https://astrato.io (Astrato is a modern BI and analytics platform built for the Snowflake Data Cloud, a next-generation live query data visualization and analytics solution that empowers everyone to make live data decisions).

27 Jun 2022, 28 min

Algorithmic PPC Management

Effectively managing a large pay-per-click (PPC) advertising budget demands software solutions. When spending multi-million dollar budgets on hundreds of thousands of keywords, an effective algorithmic strategy is required to optimize marketing objectives. In this episode, Nathan Janos joins us to share insights from his work in the ad tech industry. Thanks to our sponsor: https://wandb.com/ (the developer-first MLOps platform: build better models faster with experiment tracking, dataset versioning, and model management).

21 Jun 2022, 43 min
