
MS Connect Conference
Cloud services are now ubiquitous in data science and in technology more broadly. This week, I speak to Mark Souza, Tobias Ternström, and Corey Sanders about various aspects of data at scale. ...
9 Dec 2016 · 42 min

Causal Impact
Today's episode is all about Causal Impact, a technique for estimating the impact of a particular event on a time series. We talk to William Martin about his research into the impact releases have on ...
2 Dec 2016 · 34 min
![[MINI] The Bootstrap](https://cdn.podme.com/podcast-images/7C57C80B6107185A1853EA37AA1F81FC_small.jpg)
[MINI] The Bootstrap
The Bootstrap is a method of resampling a dataset to possibly refine its accuracy and produce useful metrics on the result. The bootstrap is a useful statistical technique and is leveraged in Bagging...
25 Nov 2016 · 10 min
![[MINI] Gini Coefficients](https://cdn.podme.com/podcast-images/0CB79915CADD4FF315AEC0244BFE0624_small.jpg)
[MINI] Gini Coefficients
The Gini Coefficient (as it relates to decision trees) is one approach to choosing the optimal split of your dataset when building a decision tree. To pick the right feature ...
18 Nov 2016 · 15 min

Unstructured Data for Finance
Financial analysis techniques for studying numeric, well-structured data are very mature. While using unstructured data in finance is not necessarily a new idea, the area is still very greenfield. On ...
11 Nov 2016 · 33 min
![[MINI] AdaBoost](https://cdn.podme.com/podcast-images/99E5B4C49CC9487AB4880B5C8DF050F0_small.jpg)
[MINI] AdaBoost
AdaBoost is a canonical example of the class of AnyBoost algorithms that create ensembles of weak learners. We discuss how a complex problem like predicting restaurant failure (which is surely caused ...
4 Nov 2016 · 10 min

Stealing Models from the Cloud
Platform as a service is a growing trend in data science where services like fraud analysis and face detection can be provided via APIs. Such services turn the actual model into a black box to the con...
28 Oct 2016 · 37 min
![[MINI] Calculating Feature Importance](https://cdn.podme.com/podcast-images/0F55914FDD50DA660BA6E1AB7FF4DF27_small.jpg)
[MINI] Calculating Feature Importance
For machine learning models created with the random forest algorithm, there is no obvious diagnostic to inform you which features are more important in the output of the model. Some straightforward bu...
21 Oct 2016 · 13 min

















