![[MINI] Markov Chains](https://cdn.podme.com/podcast-images/99E5B4C49CC9487AB4880B5C8DF050F0_small.jpg)
[MINI] Markov Chains
This episode introduces the idea of a Markov Chain. A Markov Chain has a set of states describing a particular system, and a probability attached to every valid transition from one state to another. Markov Chains are memoryless, meaning they don't rely on a long history of previous observations. The current state of a system depends only on the previous state and the result of a random outcome. Markov Chains are a useful method for describing non-deterministic systems, in particular the state and transition model of a stochastic system. As examples, we discuss stop light signals, bowling, and text prediction systems in light of whether or not they can be described with Markov Chains.
20 March 2015, 11 min
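As a rough illustration of the stop light example, here is a minimal Python sketch of a Markov Chain. The state names and transition probabilities are made up for illustration and are not taken from the episode; the point is only that the next state is sampled from the current state alone.

```python
import random

# Hypothetical transition probabilities for a stop light, checked once per tick.
# Each row lists the valid next states and must sum to 1.
TRANSITIONS = {
    "green": {"green": 0.8, "yellow": 0.2},
    "yellow": {"yellow": 0.5, "red": 0.5},
    "red": {"red": 0.7, "green": 0.3},
}


def next_state(state, rng):
    """Sample the next state using only the current state (memoryless)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()))[0]


def simulate(start, steps, seed=0):
    """Return the list of visited states, starting from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path
```

Calling `simulate("green", 10)` returns a list of 11 states in which the light never jumps straight from green to red, because that transition has no entry in the table.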

Oceanography and Data Science
Nicole Goebel joins us this week to share her experiences in oceanography studying phytoplankton and other aspects of the ocean, and how data plays a role in that science. We also discuss Thinkful, where Nicole and I are both mentors for the Introduction to Data Science course. Last but not least, check out Nicole's blog Data Science Girl and the videos Kyle mentioned on her YouTube channel, including one on the diversity of phytoplankton and how that changes in time and space.
13 March 2015, 33 min
![[MINI] Ordinary Least Squares Regression](https://cdn.podme.com/podcast-images/99E5B4C49CC9487AB4880B5C8DF050F0_small.jpg)
[MINI] Ordinary Least Squares Regression
This episode explores Ordinary Least Squares, or OLS: a method for fitting a line to a given dataset by minimizing the sum of the squared differences between the observed values and the values the line predicts.
6 March 2015, 18 min
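For listeners who want to see the idea in code, here is a minimal sketch of OLS for a single predictor, using the standard closed-form solution for the slope and intercept. The function name `ols_fit` and the sample data are assumptions for illustration, not something from the episode.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
slope, intercept = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the points will not sit on the fitted line, but no other straight line gives a smaller sum of squared residuals.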

NYC Speed Camera Analysis with Tim Schmeier
New York State approved the use of automated speed cameras within a specific range of schools. Tim Schmeier did an analysis of publicly available data related to these cameras as part of a project at the NYC Data Science Academy. Tim's work leverages several open data sets to ask the question: are the speed cameras succeeding in their intended purpose of increasing public safety near schools? What he found using open data may surprise you. You can read Tim's write-up titled Speed Cameras: Revenue or Public Safety? on the NYC Data Science Academy blog. His original write-up, reproducible analysis, and figures are a great complement to this episode. For his benevolent recommendation, Tim suggests listeners visit Maddie's Fund - a data-driven charity devoted to helping achieve and sustain a no-kill pet nation. And for his self-serving recommendation, Tim Schmeier will very shortly be on the job market. If you, your employer, or someone you know is looking for data science talent, you can reach Tim at his gmail account which is timothy.schmeier at gmail dot com.
27 Feb 2015, 16 min
![[MINI] k-means clustering](https://cdn.podme.com/podcast-images/99E5B4C49CC9487AB4880B5C8DF050F0_small.jpg)
[MINI] k-means clustering
The k-means clustering algorithm assigns each point in an n-dimensional dataset a single label from a given number "k" of clusters. This mini-episode explores how the biological processes of Yoshi, our lilac-crowned amazon, might be a useful way of measuring where she sits when there are no humans around. Listen to find out how!
20 Feb 2015, 14 min
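To make the algorithm concrete, here is a minimal sketch of k-means in plain Python. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. Seeding the centroids with the first k points is a simplifying assumption; practical implementations usually pick starting centroids more carefully, and the result can depend on that choice.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points (tuples of equal length) into k groups."""
    # Assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up 2D observations forming two obvious groups.
labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

In this toy run the first two points end up sharing one label and the last two share the other.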

Shadow Profiles on Social Networks
Emre Sarigol joins me this week to discuss his paper Online Privacy as a Collective Phenomenon. This paper studies data collected from social networks and how the sharing behaviors of individuals can unintentionally reveal private information about other people, including those who have not even joined the social network! For the specific test discussed, the researchers were able to accurately predict the sexual orientation of individuals, even when this information was withheld during the training of their algorithm. The research produces a surprisingly accurate predictor of this private piece of information, and was constructed only with publicly available data from myspace.com found on archive.org. As Emre points out, this is a small shadow of the potential information available to modern social networks. For example, users who install the Facebook app on their mobile phones are (perhaps unknowingly) sharing all their phone contacts. Should a social network like Facebook choose to do so, this information could be aggregated to assemble "shadow profiles" containing rich data on users who may not even have an account.
13 Feb 2015, 38 min
![[MINI] The Chi-Squared Test](https://cdn.podme.com/podcast-images/96FF53360097657F6548787A4FAF096A_small.jpg)
[MINI] The Chi-Squared Test
The Chi-Squared test is a methodology for hypothesis testing. When one has categorical data in the form of frequency counts of observations (e.g. Vegetarian, Pescetarian, and Omnivore), split into two or more groups (e.g. Male, Female), a question may arise such as "Are women more likely than men to be vegetarian?" or, put more accurately, "Does the frequency with which women report being vegetarian differ in a statistically significant way from the frequency with which men report that?"
6 Feb 2015, 17 min
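As a sketch of what the test computes, the Python below calculates the chi-squared statistic and degrees of freedom for a table of counts, comparing each observed count with the count expected if diet were independent of gender. The counts are invented for illustration. Turning the statistic into a p-value requires the chi-squared distribution, which is left out here.

```python
def chi_squared(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical survey counts; columns are Vegetarian, Pescetarian, Omnivore.
observed = [
    [12, 8, 80],   # Male
    [25, 15, 60],  # Female
]
statistic, dof = chi_squared(observed)
```

A larger statistic means the observed counts sit further from what independence would predict; it is compared against the chi-squared distribution with `dof` degrees of freedom to decide whether the difference is statistically significant.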

Mapping Reddit Topics with Randy Olson
My guest this week is noteworthy AI researcher Randy Olson, who joins me to share his work creating the Reddit World Map - a visualization that illuminates clusters in the reddit community based on user behavior. Randy's blog post on creating the reddit world map is well complemented by a more detailed write-up titled Navigating the massive world of reddit: using backbone networks to map user interests in social media. Last but not least, an interactive version of the results (which leverages Gephi) can be found here. For a benevolent recommendation, Randy suggests people check out Seaborn - a Python library for statistical data visualization. For a self-serving recommendation, Randy recommends listeners visit the Data is beautiful subreddit where he's a moderator.
30 Jan 2015, 29 min