
Unstructured Data for Finance
Financial analysis techniques for studying numeric, well-structured data are very mature. While using unstructured data in finance is not a new idea, the area is still largely greenfield. In this episode, Delia Rusu shares her thoughts on the potential of unstructured data and discusses her work analyzing Wikipedia to help inform financial decisions. Delia's talk at PyData Berlin, "Estimating stock price correlations using Wikipedia", can be watched on YouTube. The slides can be found here and all related code is available on GitHub.
11 Nov 2016 · 33 min
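The quantity the talk sets out to estimate is the correlation between stock price movements. As a point of reference only, here is a minimal sketch of the standard way that correlation is measured from historical prices: the Pearson correlation of two series' simple returns. It does not reproduce the Wikipedia-based method from the talk, and the price lists are made up for illustration.

```python
import math


def simple_returns(prices):
    """Convert a price series into period-over-period simple returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def return_correlation(prices_a, prices_b):
    """Correlate returns rather than raw prices, which mostly share a trend."""
    return pearson(simple_returns(prices_a), simple_returns(prices_b))


# Hypothetical closing prices, for illustration only
a = [100.0, 101.0, 99.5, 102.0, 103.5]
b = [50.0, 50.6, 49.8, 51.1, 51.7]
print(round(return_correlation(a, b), 3))
```

Working with returns instead of raw prices avoids the spurious high correlation that any two trending price series tend to show.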
![[MINI] AdaBoost](https://cdn.podme.com/podcast-images/99E5B4C49CC9487AB4880B5C8DF050F0_small.jpg)
[MINI] AdaBoost
AdaBoost is a canonical example of the AnyBoost family of algorithms, which build ensembles of weak learners. We discuss how a complex problem like predicting restaurant failure (which surely has different causes in different situations) might benefit from this technique.
4 Nov 2016 · 10 min
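To make the "ensemble of weak learners" idea concrete, here is a minimal sketch of discrete AdaBoost using one-feature threshold "stumps" as the weak learners. Each round fits the stump with the lowest weighted error, gives it a vote proportional to how accurate it was, and then up-weights the examples it got wrong so the next stump focuses on them. Labels are assumed to be +1/-1; the stump search and all function names are illustrative, not taken from the episode.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak learner: predict `polarity` above the threshold, `-polarity` otherwise."""
    return polarity if x[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = None
    for feature in range(len(X[0])):
        values = sorted({x[feature] for x in X})
        # One threshold below every value (a constant stump) plus each observed value
        for threshold in [values[0] - 1.0] + values:
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        if err >= 0.5:
            break  # no weak learner beats chance on the current weights
        eps = max(err, 1e-10)  # avoid dividing by zero when a stump is perfect
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, feature, threshold, polarity))
        if err == 0:
            break  # training data already fit perfectly
        # Up-weight misclassified examples, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feature, threshold, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


model = adaboost([[1], [2], [3], [4]], [-1, -1, 1, 1])
print(predict(model, [1]), predict(model, [4]))
```

The reweighting step is what lets different weak learners specialize in different "situations", which is the intuition behind applying boosting to a problem like restaurant failure.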

Stealing Models from the Cloud
Platform as a service is a growing trend in data science, where services like fraud analysis and face detection can be provided via APIs. Such services turn the actual model into a black box for the consumer. But can the model be reverse engineered? In this episode, Florian Tramèr shares his work showing that it can. If you enjoy this episode, the paper "Stealing Machine Learning Models via Prediction APIs" is well worth reading. Related source code is available at https://github.com/ftramer/Steal-ML.
28 Oct 2016 · 37 min
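One family of attacks in the paper is "equation solving" against APIs that return confidence scores. Here is a minimal sketch of the idea for the simplest case, assuming the API exposes the exact probability from a plain logistic regression with no rounding. Because the logit of that probability is linear in the input, d + 1 well-chosen queries recover all d weights and the bias. The "API" is simulated locally, and the secret parameters and function names are hypothetical; rounded or label-only outputs would call for the more involved attacks the paper describes.

```python
import math


def make_black_box(weights, bias):
    """Simulate a prediction API that returns only P(y=1 | x) for a hidden model."""
    def api(x):
        z = sum(wi * xi for wi, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return api


def logit(p):
    """Inverse of the sigmoid: recovers z = w . x + b from the returned probability."""
    return math.log(p / (1.0 - p))


def extract_logistic_model(api, dim):
    """Recover (weights, bias) with dim + 1 queries."""
    bias = logit(api([0.0] * dim))  # at x = 0, logit(p) = bias
    weights = []
    for i in range(dim):
        e_i = [0.0] * dim
        e_i[i] = 1.0  # unit vector along feature i
        weights.append(logit(api(e_i)) - bias)  # logit(p) = w_i + bias
    return weights, bias


# Hypothetical secret model behind the "API"
secret_w, secret_b = [0.5, -1.2, 2.0], 0.3
api = make_black_box(secret_w, secret_b)
stolen_w, stolen_b = extract_logistic_model(api, dim=3)
print(stolen_w, stolen_b)
```

The unit-vector queries make the linear system diagonal, so no matrix solve is needed; arbitrary query points would work too, at the cost of solving d + 1 linear equations.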
![[MINI] Calculating Feature Importance](https://cdn.podme.com/podcast-images/0F55914FDD50DA660BA6E1AB7FF4DF27_small.jpg)
[MINI] Calculating Feature Importance
For machine learning models created with the random forest algorithm, there is no obvious diagnostic that tells you which features matter most to the model's output. Some straightforward but useful techniques exist, based on removing a feature and measuring the resulting decrease in accuracy, or on measuring the decrease in Gini impurity at the splits that use each feature. We broadly discuss these techniques in this episode.
21 Oct 2016 · 13 min
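A common variant of the "remove a feature and measure the accuracy drop" idea is permutation importance: instead of retraining without the feature, shuffle that feature's column so it carries no information, and see how much accuracy falls. The sketch below assumes an already trained model given as a plain Python function; `toy_model` is a hypothetical stand-in for a random forest, and the data are synthetic.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where model(row) equals the true label."""
    return sum(model(x) == yi for x, yi in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(x):
    # Hypothetical stand-in for a trained random forest: uses feature 0 only
    return int(x[0] > 0.5)


# Synthetic data: the label depends only on feature 0; feature 1 is noise
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(x[0] > 0.5) for x in X]
imp = permutation_importance(toy_model, X, y)
print(imp)
```

Shuffling avoids retraining the forest once per feature, which is the main cost of the literal "drop the column" approach.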

NYC Bike Share Rebalancing
As cities provide bike sharing services, they must also plan how to redistribute bicycles, which inevitably build up at popular destination stations. In this episode, Hui Xiong talks about the solution he and his colleagues developed to rebalance bike sharing systems.
14 Oct 2016 · 29 min
![[MINI] Random Forest](https://cdn.podme.com/podcast-images/8D34F613EF0312365218B01C071A6E66_small.jpg)
[MINI] Random Forest
Random forest is a popular ensemble learning algorithm that combines bagging (bootstrap sampling of the training data) with random selection of the features each tree may use. In this episode we draw an analogy to running a bookstore.
7 Oct 2016 · 12 min
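The two sources of randomness can be sketched in a few lines. In this simplified version, assumed for illustration, each "tree" is a single-split stump, so choosing a random feature subset per tree is the same as choosing one per split; real random forests grow deeper trees and resample features at every split. Predictions are combined by majority vote. All names and the tiny dataset are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Bagging step: draw len(X) rows uniformly with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, features):
    """One-split tree that may only consider the given feature indices."""
    best = None
    for f in features:
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(v != left_label for v in left)
                      + sum(v != right_label for v in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label


def random_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Each tree sees its own bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        features = rng.sample(range(len(X[0])), max_features)
        trees.append(fit_stump(Xb, yb, features))
    return trees


def forest_predict(trees, x):
    """Majority vote across trees."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


# Toy data: both features carry the label signal (feature 1 is reversed)
X = [[i, 9 - i] for i in range(10)]
y = [int(i >= 5) for i in range(10)]
forest = random_forest(X, y)
print(forest_predict(forest, [0, 9]), forest_predict(forest, [9, 0]))
```

Because every tree sees slightly different rows and features, their individual mistakes tend to differ, and the vote averages them out.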

Election Predictions
Jo Hardin joins us this week to discuss the ASA's Election Prediction Contest, a competition to forecast the results of the upcoming US presidential election. More details are available in Jo's blog post found here. Jo's GitHub has some useful R code for getting started with automatically gathering data from 538, and the official contest details are available here. During the interview we also mention Daily Kos and 538.
30 Sep 2016 · 21 min
![[MINI] F1 Score](https://cdn.podme.com/podcast-images/8FED6905D98C6C952F5E44F53FCA0D51_small.jpg)
[MINI] F1 Score
The F1 score is a model diagnostic that combines precision and recall into a single score for comparing models. In this episode we discuss how it applies to selecting an interior designer.
23 Sep 2016 · 9 min
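The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so it is high only when both are high. A minimal sketch computing all three from true and predicted labels, assuming binary labels with `1` as the positive class and returning 0.0 when a ratio is undefined (a common convention, chosen here for simplicity):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(precision_recall_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Using the harmonic rather than the arithmetic mean penalizes lopsided models: a precision of 1.0 with a recall of 0.5 yields an F1 of about 0.667, not 0.75.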