[MINI] Leakage
Data Skeptic, 1 Jul 2016

If you'd like to make a good prediction, your best bet is to invent a time machine, visit the future, observe the value, and return to the past. Those of us without access to time travel technology need to avoid including information about the future in our training data when building machine learning models. Similarly, any feature whose value would not actually be available in practice at the time you'd want to use the model to make a prediction is a feature that can introduce leakage into your model.
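
The availability rule described here can be pictured with a small sketch. This is not taken from the episode: the function names (`available_features`, `time_ordered_split`), the record layout, and the order/refund example are all hypothetical, chosen only to illustrate the idea that a feature is safe when its value was already known at prediction time, and that evaluation data should come from after the training data.

```python
from datetime import datetime


def available_features(features, prediction_time):
    """Keep only the feature values that were already known at prediction_time.

    `features` maps a feature name to a (value, known_at) pair (assumed layout).
    """
    return {
        name: value
        for name, (value, known_at) in features.items()
        if known_at <= prediction_time
    }


def time_ordered_split(rows, cutoff):
    """Train on rows observed before `cutoff`, evaluate on the rest."""
    train = [row for row in rows if row["observed_at"] < cutoff]
    test = [row for row in rows if row["observed_at"] >= cutoff]
    return train, test


# Hypothetical example: on 1 Jul 2016 we want to predict whether an order
# will be returned. The refund flag is only recorded afterwards, so using it
# as a feature would leak the answer into the model.
prediction_time = datetime(2016, 7, 1)
order = {
    "order_total": (59.90, datetime(2016, 6, 28)),
    "refund_issued": (True, datetime(2016, 7, 9)),  # known only later: leaky
}
safe = available_features(order, prediction_time)
print(sorted(safe))  # ['order_total']
```

In practice the hard part is usually knowing when each value actually became available; the timestamps above are simply assumed to be recorded alongside the data.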

Episodes (601)

Data Provenance and Reproducibility with Pachyderm

Versioning isn't just for source code. Being able to track changes to data is critical for answering questions about data provenance, quality, and reproducibility. Daniel Whitenack joins me this week ...

3 Feb 2017, 40 min

[MINI] Logistic Regression on Audio Data

Logistic Regression is a popular classification algorithm. In this episode, we discuss how it can be used to determine if an audio clip represents one of two given speakers. It assumes an output varia...

27 Jan 2017, 20 min

Studying Competition and Gender Through Chess

Prior work has shown that people's response to competition is in part predicted by their gender. Understanding why and when this occurs is important in areas such as labor market outcomes. A well stru...

20 Jan 2017, 34 min

[MINI] Dropout

Deep learning can be prone to overfit a given problem. This is especially frustrating given how much time and computational resources are often required to converge. One technique for fighting overfit...

13 Jan 2017, 15 min

The Police Data and the Data Driven Justice Initiatives

In this episode I speak with Clarence Wardell and Kelly Jin about their mutual service as part of the White House's Police Data Initiative and Data Driven Justice Initiative respectively. The Police D...

6 Jan 2017, 49 min

The Library Problem

We close out 2016 with a discussion of a basic interview question which might get asked when applying for a data science job. Specifically, how a library might build a model to predict if a book will ...

30 Dec 2016, 35 min

2016 Holiday Special

Today's episode is a reading of Isaac Asimov's Franchise. As mentioned on the show, this is just a work of fiction to be enjoyed and not in any way some obfuscated political statement. Enjoy, and h...

23 Dec 2016, 39 min

[MINI] Entropy

Classically, entropy is a measure of disorder in a system. From a statistical perspective, it is more useful to say it's a measure of the unpredictability of the system. In this episode we discuss how...

16 Dec 2016, 16 min
