Guerrilla Skepticism on Wikipedia with Susan Gerbic
Data Skeptic · 15 Aug 2014


Our guest this week is Susan Gerbic. Susan is a skeptical activist involved in many projects; the one we focus on most in this episode is Guerrilla Skepticism on Wikipedia, an organization working to improve the content and citations of Wikipedia.

During the episode, Kyle recommended Susan's talk at The Amazing Meeting 9, which can be found here.

Noteworthy topics mentioned during the podcast include Neil deGrasse Tyson's endorsement of the Penny for NASA project, the Web of Trust and Rebutr browser plug-ins, and how following the Skeptic Action project on Twitter provides recommendations of sites to visit and rate as you see fit via these tools.

For her benevolent reference, Susan suggested The Odds Must Be Crazy, a fun website that explores the statistical likelihoods of seemingly unlikely situations. For all else, Susan and her various activities can be found via SusanGerbic.com.

Episodes (588)

[MINI] AdaBoost


AdaBoost is a canonical example of the class of AnyBoost algorithms that create ensembles of weak learners. We discuss how a complex problem like predicting restaurant failure (which is surely caused by different problems in different situations) might benefit from this technique.
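As a hedged illustration of the technique discussed (not code from the episode), here is a minimal AdaBoost sketch in Python. The weak learners are one-dimensional decision stumps, and the data and round count are invented for the example: each round upweights the points the previous stump got wrong.

```python
import math

def train_stump(X, y, w):
    """Pick the threshold stump minimizing weighted error on 1-D inputs."""
    best = None
    for t in sorted(set(X)):
        for pol in (1, -1):
            err = sum(wi for wi, x, yi in zip(w, X, y)
                      if (pol if x >= t else -pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, t, pol = train_stump(X, y, w)
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, t, pol))
        # upweight misclassified points so the next stump focuses on them
        w = [wi * math.exp(-alpha * yi * (pol if x >= t else -pol))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

# toy labels no single stump can fit, but three boosted stumps can
X = [1, 2, 3, 4, 5, 6]
y = [-1, -1, 1, 1, 1, -1]
ensemble = adaboost(X, y, rounds=3)
```

The weighted vote of the three stumps classifies every point correctly, even though the best single stump must misclassify at least one.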

4 Nov 2016 · 10 min

Stealing Models from the Cloud


Platform as a service is a growing trend in data science where services like fraud analysis and face detection can be provided via APIs. Such services turn the actual model into a black box to the consumer. But can the model be reverse engineered? Florian Tramèr shares his work in this episode showing that it can. The paper Stealing Machine Learning Models via Prediction APIs is definitely worth your time to read if you enjoy this episode. Related source code can be found in https://github.com/ftramer/Steal-ML.
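The paper's simplest attack, equation-solving extraction against a model that returns confidence scores, can be sketched in a few lines. The toy "API" below is a hypothetical stand-in for a cloud service: because a logistic regression's logit is linear in the input, d + 1 probe queries recover the weights exactly.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# hypothetical black-box service: the attacker sees only query -> probability
SECRET_W, SECRET_B = [2.0, -1.0], 0.5
def api(x):
    z = SECRET_W[0] * x[0] + SECRET_W[1] * x[1] + SECRET_B
    return 1 / (1 + math.exp(-z))

# logit(api(x)) = w . x + b, so d + 1 = 3 probes solve for (w, b)
b_hat = logit(api([0, 0]))
w_hat = [logit(api([1, 0])) - b_hat,
         logit(api([0, 1])) - b_hat]
```

Real services defend against this by rounding or withholding confidence scores, which is exactly the mitigation trade-off the paper explores.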

28 Oct 2016 · 37 min

[MINI] Calculating Feature Importance


For machine learning models created with the random forest algorithm, there is no obvious diagnostic to inform you which features are more important in the output of the model. Some straightforward but useful techniques exist revolving around removing a feature and measuring the decrease in accuracy or Gini values in the leaves. We broadly discuss these techniques in this episode.
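A hedged sketch of the "permute a feature and measure the accuracy drop" idea, with a made-up decision rule standing in for a trained random forest: permuting the feature the model relies on hurts accuracy, while permuting an ignored feature changes nothing.

```python
import random

def accuracy(model, X, y):
    return sum(model(x) == yi for x, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feat, rng):
    """Drop in accuracy after shuffling one feature column."""
    base = accuracy(model, X, y)
    col = [x[feat] for x in X]
    rng.shuffle(col)
    X_perm = [x[:feat] + [v] + x[feat + 1:] for x, v in zip(X, col)]
    return base - accuracy(model, X_perm, y)

# fabricated data: the label depends only on feature 0
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if x0 > 0 else 0 for x0, _ in X]
model = lambda x: 1 if x[0] > 0 else 0   # stand-in for a fitted model

imp0 = permutation_importance(model, X, y, 0, random.Random(1))
imp1 = permutation_importance(model, X, y, 1, random.Random(1))
```

Here `imp0` is large and `imp1` is exactly zero, mirroring the diagnostic described in the episode.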

21 Oct 2016 · 13 min

NYC Bike Share Rebalancing


As cities provide bike sharing services, they must also plan for how to redistribute bicycles as they inevitably build up at more popular destination stations. In this episode, Hui Xiong talks about the solution he and his colleagues developed to rebalance bike sharing systems.
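Hui Xiong's actual optimization is the subject of the interview; as a much-simplified, hypothetical sketch of the bookkeeping involved, one can pair stations holding surplus bikes with stations in deficit greedily, ignoring truck routing and travel costs entirely:

```python
def rebalancing_moves(counts, targets):
    """Greedily pair stations over target with stations under target."""
    surplus = [[s, counts[s] - targets[s]] for s in counts if counts[s] > targets[s]]
    deficit = [[s, targets[s] - counts[s]] for s in counts if counts[s] < targets[s]]
    moves = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        src, dst = surplus[i], deficit[j]
        n = min(src[1], dst[1])            # bikes to move on this trip
        moves.append((src[0], dst[0], n))
        src[1] -= n
        dst[1] -= n
        if src[1] == 0:
            i += 1
        if dst[1] == 0:
            j += 1
    return moves

# hypothetical evening snapshot: A overflows while B and C run dry
moves = rebalancing_moves({"A": 12, "B": 2, "C": 4}, {"A": 6, "B": 6, "C": 6})
```

The hard part of the real problem, which the sketch sidesteps, is choosing routes and timing for the trucks that execute these moves.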

14 Oct 2016 · 29 min

[MINI] Random Forest


Random forest is a popular ensemble learning algorithm which leverages bagging both for sampling and feature selection. In this episode we make an analogy to the process of running a bookstore.
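To make the bagging idea concrete, here is a hedged, toy random forest in Python: each "tree" is just a depth-one stump trained on a bootstrap sample of the rows and a random subset of the features, with a majority vote at prediction time. The data is fabricated for the example.

```python
import random

def train_stump(X, y, feats):
    """Best single-feature threshold split, restricted to a feature subset."""
    best = None
    for f in feats:
        for t in {x[f] for x in X}:
            for pol in (1, -1):
                acc = sum((pol if x[f] >= t else -pol) == yi
                          for x, yi in zip(X, y))
                if best is None or acc > best[0]:
                    best = (acc, f, t, pol)
    _, f, t, pol = best
    return lambda x: pol if x[f] >= t else -pol

def random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]      # bootstrap the rows
        feats = rng.sample(range(d), max(1, d // 2))     # random feature subset
        trees.append(train_stump([X[i] for i in rows],
                                 [y[i] for i in rows], feats))
    return lambda x: 1 if sum(t(x) for t in trees) >= 0 else -1

# fabricated, cleanly separable data: both features track the label
rng = random.Random(1)
vals = [rng.uniform(0.2, 1.0) for _ in range(20)] + \
       [rng.uniform(-1.0, -0.2) for _ in range(20)]
X = [[v, v + rng.uniform(-0.05, 0.05)] for v in vals]
y = [1 if v > 0 else -1 for v in vals]
forest = random_forest(X, y)
```

The two sources of randomness here, bootstrap sampling of rows and subsampling of features, are the bookstore analogy's independent clerks each seeing a different slice of the inventory.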

7 Oct 2016 · 12 min

Election Predictions


Jo Hardin joins us this week to discuss the ASA's Election Prediction Contest. This is a competition aimed at forecasting the results of the upcoming US presidential election. More details are available in Jo's blog post found here. You can find some useful R code for automatically gathering data from 538 via Jo's github, and official contest details are available here. During the interview we also mention Daily Kos and 538.

30 Sep 2016 · 21 min

[MINI] F1 Score


The F1 score is a model diagnostic that combines precision and recall to provide a singular evaluation for model comparison. In this episode we discuss how it applies to selecting an interior designer.
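As a refresher on the formula itself (the labels below are invented for illustration): precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 is their harmonic mean.

```python
def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)   # how many flagged positives were real
    recall = tp / (tp + fn)      # how many real positives were flagged
    return 2 * precision * recall / (precision + recall)

# invented labels: 3 true positives, 1 false positive, 1 false negative
score = f1_score([1, 1, 1, 0, 0, 0, 1], [1, 0, 1, 1, 0, 0, 1])
```

The harmonic mean punishes imbalance, so a model that is all recall and no precision (or vice versa) scores poorly.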

23 Sep 2016 · 9 min

Urban Congestion


Urban congestion affects every person living in a city of any reasonable size. Lewis Lehe joins us in this episode to share his work on downtown congestion pricing. We explore how different pricing mechanisms affect congestion, as well as how data visualization can inform choices. You can find examples of Lewis's work at setosa.io. His paper, which we discussed during the interview, is Distance-dependent congestion pricing for downtown zones. On this episode, we discuss State of California data which can be found at pems.dot.ca.gov.

16 Sep 2016 · 35 min
