[MINI] Sudoku \in NP
Data Skeptic10 Nov 2017

[MINI] Sudoku \in NP

Algorithms with similar runtimes are said to be in the same complexity class. That runtime is measured in the how many steps an algorithm takes relative to the input size.

The class P contains all algorithms which run in polynomial time (basically, a nested for loop iterating over the input). NP are algorithms which seem to require brute force. Brute force search cannot be done in polynomial time, so it seems that problems in NP are more difficult than problems in P. I say it "seems" this way because, while most people believe it to be true, it has not been proven. This is the famous P vs. NP conjecture. It will be discussed in more detail in a future episode.

Given a solution to a particular problem, if it can be verified/checked in polynomial time, that problem might be in NP. If someone hands you a completed Sudoku puzzle, it's not difficult to see if they made any mistakes. The effort of developing the solution to the Sudoku game seems to be intrinsically more difficult. In fact, as far as anyone knows, in the general case of all possible examples of the game, it seems no strategy can do better on average than just random guessing.

This notion of random guessing the solution is where the N in NP comes from: Non-deterministic. Imagine a machine with a random input already written in its memory. Given enough such machines, one of them will have the right answer. If they all ran in parallel, one of them could verify it's input in polynomial time. This guess / provided input is often called a witness string.

NP is an important concept for many reasons. To me, the most reason to know about NP is a practical one. Depending on your goals or the goals of your employer, there are many challenging problems you may attempt to solve. If a problem you are trying to solve happens to be in NP, then you should consider the implications very carefully. Perhaps you'll be lucky and discover that your particular instance of the problem is easy. Sudoku is pretty easy if only 2 remaining squares need to be filled in. The traveling salesman problem is easy to solve if you live in a country where all roads for a ring with exactly one road in and out.

If the problem you wish to solve is not trivial, or if you will face many instances of the problem and expect some will not be trivial, then it's unlikely you'll be able to find the exact solution. Sure, maybe you can grab a bunch of commodity servers and try to scale the heck out of your attempt. Depending on the problem you're solving, that might just work. If you can out-purchase your problem in computing power, then problems in NP will surrender to you. But if your input size ever grows, it's unlikely you'll be able to keep up.

If your problem is intractable in this way, all is not lost. You might be able to find an approximate solution to your problem. Good enough is better than no solution at all, right? Most of the time, probably. However, some tremendous work has also been done studying topics like this. Are there problems which are not even approximable in polynomial time? What approximation techniques work best? Alas, those answers lie elsewhere.

This episode avoids a discussion of a few key points in order to keep the material accessible. If you find this interesting, you should next familiarize yourself with the notions of NP-Complete, NP-Hard, and co-NP. These are topics we won't necessarily get to in future episodes. Michael Sipser's Introduction to the Theory of Computation is a good resource.

Avsnitt(588)

[MINI] Ordinary Least Squares Regression

[MINI] Ordinary Least Squares Regression

This episode explores Ordinary Least Squares or OLS - a method for finding a good fit which describes a given dataset.

6 Mars 201518min

NYC Speed Camera Analysis with Tim Schmeier

NYC Speed Camera Analysis with Tim Schmeier

New York State approved the use of automated speed cameras within a specific range of schools. Tim Schmeier did an analysis of publically available data related to these cameras as part of a project at the NYC Data Science Academy. Tim's work leverages several open data sets to ask the questions: are the speed cameras succeeding in their intended purpose of increasing public safety near schools? What he found using open data may surprise you. You can read Tim's write up titled Speed Cameras: Revenue or Public Safety? on the NYC Data Science Academy blog. His original write up, reproducible analysis, and figures are a great compliment to this episode. For his benevolent recommendation, Tim suggests listeners visit Maddie's Fund - a data driven charity devoted to helping achieve and sustain a no-kill pet nation. And for his self-serving recommendation, Tim Schmeier will very shortly be on the job market. If you, your employeer, or someone you know is looking for data science talent, you can reach time at his gmail account which is timothy.schmeier at gmail dot com.

27 Feb 201516min

[MINI] k-means clustering

[MINI] k-means clustering

The k-means clustering algorithm is an algorithm that computes a deterministic label for a given "k" number of clusters from an n-dimensional datset.  This mini-episode explores how Yoshi, our lilac crowned amazon's biological processes might be a useful way of measuring where she sits when there are no humans around.  Listen to find out how!

20 Feb 201514min

Shadow Profiles on Social Networks

Shadow Profiles on Social Networks

Emre Sarigol joins me this week to discuss his paper Online Privacy as a Collective Phenomenon. This paper studies data collected from social networks and how the sharing behaviors of individuals can unintentionally reveal private information about other people, including those that have not even joined the social network! For the specific test discussed, the researchers were able to accurately predict the sexual orientation of individuals, even when this information was withheld during the training of their algorithm. The research produces a surprisingly accurate predictor of this private piece of information, and was constructed only with publically available data from myspace.com found on archive.org. As Emre points out, this is a small shadow of the potential information available to modern social networks. For example, users that install the Facebook app on their mobile phones are (perhaps unknowningly) sharing all their phone contacts. Should a social network like Facebook choose to do so, this information could be aggregated to assemble "shadow profiles" containing rich data on users who may not even have an account.

13 Feb 201538min

[MINI] The Chi-Squared Test

[MINI] The Chi-Squared Test

The Chi-Squared test is a methodology for hypothesis testing. When one has categorical data, in the form of frequency counts or observations (e.g. Vegetarian, Pescetarian, and Omnivore), split into two or more categories (e.g. Male, Female), a question may arise such as "Are women more likely than men to be vegetarian?" or put more accurately, "Is any observed difference in the frequency with which women report being vegetarian differ in a statistically significant way from the frequency men report that?"

6 Feb 201517min

Mapping Reddit Topics with Randy Olson

Mapping Reddit Topics with Randy Olson

My quest this week is noteworthy a.i. researcher Randy Olson who joins me to share his work creating the Reddit World Map - a visualization that illuminates clusters in the reddit community based on user behavior. Randy's blog post on created the reddit world map is well complimented by a more detailed write up titled Navigating the massive world of reddit: using backbone networks to map user interests in social media. Last but not least, an interactive version of the results (which leverages Gephi) can be found here. For a benevolent recommendation, Randy suggetss people check out Seaborn - a python library for statistical data visualization. For a self serving recommendation, Randy recommends listeners visit the Data is beautiful subreddit where he's a moderator.

30 Jan 201529min

[MINI] Partially Observable State Spaces

[MINI] Partially Observable State Spaces

When dealing with dynamic systems that are potentially undergoing constant change, its helpful to describe what "state" they are in.  In many applications the manner in which the state changes from one to another is not completely predictable, thus, there is uncertainty over how it transitions from state to state.  Further, in many applications, one cannot directly observe the true state, and thus we describe such situations as partially observable state spaces.  This episode explores what this means and why it is important in the context of chess, poker, and the mood of Yoshi the lilac crowned amazon parrot.

23 Jan 201512min

Easily Fooling Deep Neural Networks

Easily Fooling Deep Neural Networks

My guest this week is Anh Nguyen, a PhD student at the University of Wyoming working in the Evolving AI lab. The episode discusses the paper Deep Neural Networks are Easily Fooled [pdf] by Anh Nguyen, Jason Yosinski, and Jeff Clune. It describes a process for creating images that a trained deep neural network will mis-classify. If you have a deep neural network that has been trained to recognize certain types of objects in images, these "fooling" images can be constructed in a way which the network will mis-classify them. To a human observer, these fooling images often have no resemblance whatsoever to the assigned label. Previous work had shown that some images which appear to be unrecognizable white noise images to us can fool a deep neural network. This paper extends the result showing abstract images of shapes and colors, many of which have form (just not the one the network thinks) can also trick the network.

16 Jan 201528min

Populärt inom Vetenskap

p3-dystopia
dumma-manniskor
paranormalt-med-caroline-giertz
svd-nyhetsartiklar
allt-du-velat-veta
rss-vetenskapligt-talat
kapitalet-en-podd-om-ekonomi
dumforklarat
rss-vetenskapspodden
rss-ufobortom-rimligt-tvivel
sexet
rss-vetenskapsradion
rss-i-hjarnan-pa-louise-epstein
rss-vetenskapsradion-2
det-morka-psyket
medicinvetarna
a-kursen
hacka-livet
rss-spraket
rss-personlighetspodden