
Mining the Social Web with Matthew Russell
This week's episode explores the possibilities of extracting novel insights from the many great social web APIs available. Matthew Russell's Mining the Social Web is a fantastic exploration of the tools and methods, and we explore a few related topics. One helpful feature of the book is its use of a Vagrant virtual machine. Using it, readers can easily reproduce the examples from the book, and there's a short video available that will walk you through setting up the Mining the Social Web virtual machine. The book also has an accompanying GitHub repository which can be found here. A quote from Matthew that particularly resonates with me was "The first commandment of Data Science is to 'Know thy data'." Take a listen for a little more context around this sage advice. In addition to the book, we also discuss some of the work done by Digital Reasoning, where Matthew serves as CTO. One of their products we spend some time discussing is Synthesys, a service that processes unstructured data and delivers knowledge and insight extracted from the data. Some listeners might already be familiar with Digital Reasoning from recent coverage in Fortune Magazine on their cognitive computing efforts. For his benevolent recommendation, Matthew recommends the Hardcore History Podcast, and for his self-serving recommendation, Matthew mentioned that Digital Reasoning is currently hiring for Data Science positions, if any listeners are looking for new opportunities.
7 Nov 2014, 50 min
![[MINI] Is the Internet Secure?](https://cdn.podme.com/podcast-images/99E5B4C49CC9487AB4880B5C8DF050F0_small.jpg)
[MINI] Is the Internet Secure?
This episode explores the basis of why we can trust encryption. Surprisingly, a discussion of looking up a word in the dictionary (binary search) and efficiently going wine tasting (the travelling salesman problem) helps introduce computational complexity as well as the P ?= NP question, which is paramount to the trustworthiness of RSA encryption. With a high-level foundation of computational theory, we talk about NP problems and why prime factorization is a difficult problem, thus making it a great basis for the RSA encryption algorithm, which most of the internet uses to encrypt data. Unlike the encryption scheme Ray Romano used in "Everybody Loves Raymond", RSA has nice theoretical foundations. It should be noted that this episode gives good reason to trust that properly encrypted data, based on well-chosen public/private keys where the private key is not compromised, is safe. However, having safe encryption doesn't necessarily mean that the Internet is secure. Man in the Middle attacks and the Snowden revelations are topics for another day, not for this record-length "mini" episode.
31 Oct 2014, 26 min
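To make the contrast in this episode concrete, here is a minimal Python sketch, not taken from the episode itself. It counts the steps a binary search needs on a sorted word list and the divisions a naive trial-division factoring needs on a product of two primes. The function names, the generated word list, and the small primes are all illustrative assumptions; real RSA moduli are vastly larger and real attacks use far better algorithms than trial division.

```python
def binary_search(sorted_words, target):
    """Return (index, steps); index is -1 if target is absent."""
    lo, hi = 0, len(sorted_words) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_words[mid] == target:
            return mid, steps
        if sorted_words[mid] < target:
            lo = mid + 1   # target sorts after the midpoint
        else:
            hi = mid - 1   # target sorts before the midpoint
    return -1, steps


def trial_division(n):
    """Return (smallest factor, cofactor, divisions tried) for n > 1."""
    tried = 0
    d = 2
    while d * d <= n:
        tried += 1
        if n % d == 0:
            return d, n // d, tried
        d += 1
    return n, 1, tried     # no divisor found, so n is prime


# Hypothetical "dictionary": one million zero-padded, already sorted entries.
words = [f"word{i:06d}" for i in range(1_000_000)]
index, steps = binary_search(words, "word123456")
print(f"binary search: index {index} found in {steps} steps")

# Toy modulus built from two small primes (an assumption for illustration).
modulus = 10007 * 10009
p, q, tried = trial_division(modulus)
print(f"trial division: {modulus} = {p} * {q} after {tried} divisions")
```

The search needs only about log2(n) comparisons, while the naive factoring loop grows with the square root of the modulus, which is exponential in the number of digits.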

Practicing and Communicating Data Science with Jeff Stanton
Jeff Stanton joins me in this episode to discuss his book An Introduction to Data Science, and some of the unique challenges and issues faced by someone doing applied data science. A challenge for any data scientist is making sure they have a good input data set and apply any necessary data munging steps before their analysis. We cover some good advice for how to approach such problems.
24 Oct 2014, 36 min
![[MINI] The T-Test](https://cdn.podme.com/podcast-images/D4F601A907A6AF7DF60138528F39BDDD_small.jpg)
[MINI] The T-Test
The t-test is this week's mini-episode topic. The t-test is a statistical testing procedure used to determine whether the means of two datasets differ by a statistically significant amount. We discuss how a wine manufacturer might apply a t-test to determine if the sweetness, acidity, or some other property of two separate grape vines might differ in a statistically meaningful way. Check out more details and examples in the show notes linked below. https://dataskeptic.com/blog/episodes/2014/t-test
17 Oct 2014, 17 min
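As a rough illustration of the grape vine example, the sketch below computes a two-sample t statistic using only the Python standard library. It uses Welch's variant, which does not assume equal variances; the episode does not say which variant it has in mind, so that choice is an assumption. The sweetness readings and the `welch_t` helper are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical sweetness measurements from two grape vines.
vine_a = [22.1, 23.4, 21.8, 24.0, 22.9, 23.1]
vine_b = [20.5, 21.2, 20.9, 21.7, 20.1, 21.4]

t_stat, dof = welch_t(vine_a, vine_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

The t statistic is then compared against a t distribution with the computed degrees of freedom to obtain a p-value. If SciPy is available, `scipy.stats.ttest_ind(vine_a, vine_b, equal_var=False)` performs the same test and reports the p-value directly.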

Data Myths with Karl Mamer
This week I'm joined by Karl Mamer to discuss the data behind three well-known urban legends. Did a large blackout in New York and surrounding areas result in a baby boom nine months later? Do subliminal messages affect our behavior? Does placing beer alongside diapers generate more revenue than stocking these products in separate locations? Listen as Karl and I explore these claims.
10 Oct 2014, 48 min
![[MINI] Selection Bias](https://cdn.podme.com/podcast-images/99E5B4C49CC9487AB4880B5C8DF050F0_small.jpg)
[MINI] Selection Bias
A discussion about conducting US presidential election polls helps frame a conversation about selection bias.
3 Oct 2014, 14 min
![[MINI] Confidence Intervals](https://cdn.podme.com/podcast-images/400A0A687EBA8533686511190585771C_small.jpg)
[MINI] Confidence Intervals
Commute times and BBQ invites help frame a discussion about the statistical concept of confidence intervals.
26 Sep 2014, 11 min
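For readers who want to try the commute-time idea themselves, here is a small Python sketch of a confidence interval for a mean. The commute times and the `mean_confidence_interval` helper are invented for illustration, and the interval uses a normal approximation, which is an assumption; with a sample this small, a t-based critical value would give a somewhat wider and more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the mean, using a normal approximation."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - margin, centre + margin


# Hypothetical commute times in minutes.
commutes = [31, 28, 35, 42, 30, 29, 38, 33, 36, 27]

low, high = mean_confidence_interval(commutes)
print(f"95% interval for the mean commute: {low:.1f} to {high:.1f} minutes")
```

Raising `confidence` to 0.99 widens the interval, which is the usual trade-off between certainty and precision.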





















