Annotator Bias
Data Skeptic, 23 Nov 2019

Modern deep learning approaches to natural language processing are voracious in their demand for large training corpora. Folk wisdom used to hold that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to use transfer learning to achieve novel results on much smaller corpora.

Thanks to these advancements, an NLP researcher can get value out of fewer examples: transfer learning provides a head start, so training can focus on the nuances of the language specifically relevant to the task at hand. Thus, small specialized corpora are both useful and practical to create.

In this episode, Kyle speaks with Mor Geva, lead author of the recent paper "Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets", which explores some unintended consequences of the typical procedure followed for generating corpora.

Source code for the paper is available here: https://github.com/mega002/annotator_bias
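
The question in the paper's title suggests a simple check: hold out whole annotators, rather than random examples, when building a test set, so that a model cannot score well merely by recognizing the people who wrote the training data. The sketch below shows such an annotator-disjoint split in Python. It is an illustration only and is not taken from the paper's repository; the `annotator_id` field, the function name, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def annotator_disjoint_split(examples, test_fraction=0.2, seed=0):
    """Split examples so that no annotator appears in both train and test.

    `examples` is a list of dicts carrying a hypothetical "annotator_id" key.
    Returns (train, test, held_out_annotators).
    """
    # Group examples by the annotator who produced them.
    by_annotator = defaultdict(list)
    for ex in examples:
        by_annotator[ex["annotator_id"]].append(ex)

    # Shuffle annotators (not examples) reproducibly.
    annotators = sorted(by_annotator)
    random.Random(seed).shuffle(annotators)

    # Hold out at least one annotator for the test set.
    n_test = max(1, round(len(annotators) * test_fraction))
    held_out = set(annotators[:n_test])

    train = [ex for a in annotators[n_test:] for ex in by_annotator[a]]
    test = [ex for a in annotators[:n_test] for ex in by_annotator[a]]
    return train, test, held_out


# Toy data: 5 hypothetical annotators with 4 examples each.
examples = [
    {"annotator_id": f"worker_{i}", "text": f"example {i}-{j}", "label": j % 2}
    for i in range(5)
    for j in range(4)
]
train, test, held_out = annotator_disjoint_split(examples)
```

Comparing accuracy on this split with accuracy on an ordinary random split gives a rough sense of how much a model relies on annotator-specific patterns.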

Episodes (590)

Predicting Stock Prices

Today on the show, Andrea Fronzetti Colladon (@iandreafc), currently working at the University of Perugia and inventor of the Semantic Brand Score, joins us to talk about his work studying human communication and social interaction. We discuss the paper "Look inside. Predicting Stock Prices by Analyzing an Enterprise Intranet Social Network and Using Word Co-Occurrence Networks".

19 July 2021, 34 min

N-Beats

Today on the show, Boris Oreshkin (@boreshkin), a Senior Research Scientist at Unity Technologies, joins us to talk about his work "N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting".

Works mentioned: "N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting" by Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio, https://arxiv.org/abs/1905.10437

12 July 2021, 34 min

Translation Automation

Today we are back with another episode discussing AI in the workplace. AI has facilitated, is facilitating, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role. Other times it may automate a particular part of a role, scaling the worker's effectiveness. Carl Stimson, a freelance Japanese-to-English translator, comes on the show to talk about his work in translation and his perspective on how AI will change translation in the future.

6 July 2021, 36 min

Time Series at the Beach

Shane Ross, Professor of Aerospace and Ocean Engineering at Virginia Tech, comes on today to talk about his work "Beach-level 24-hour forecasts of Florida red tide-induced respiratory irritation".

28 June 2021, 23 min

Automatic Identification of Outlier Galaxy Images

Lior Shamir, Associate Professor of Computer Science at Kansas State University, joins us today to talk about the recent paper "Automatic Identification of Outliers in Hubble Space Telescope Galaxy Images". Follow Lior on Twitter @shamir_lior

21 June 2021, 36 min

Do We Need Deep Learning in Time Series

Shereen Elsayed and Daniela Thyssens, both PhD students at Hildesheim University in Germany, come on today to talk about their work "Do We Really Need Deep Learning Models for Time Series Forecasting?"

16 June 2021, 29 min

Detecting Drift

Sam Ackerman, Research Data Scientist at IBM Research Labs in Haifa, Israel, joins us today to talk about his work "Detection of Data Drift and Outliers Affecting Machine Learning Model Performance Over Time". Check out Sam's IBM statistics/ML blog at: http://www.research.ibm.com/haifa/dept/vst/ML-QA.shtml

11 June 2021, 27 min

Darts Library for Time Series

Julien Herzen, a PhD graduate of EPFL in Switzerland, comes on today to talk about his work with Unit8 and the development of the Python library Darts.

31 May 2021, 25 min
