[MINI] Sudoku \in NP
Data Skeptic10 Nov 2017

[MINI] Sudoku \in NP

Algorithms with similar runtimes are said to be in the same complexity class. That runtime is measured in the how many steps an algorithm takes relative to the input size.

The class P contains all algorithms which run in polynomial time (basically, a nested for loop iterating over the input). NP are algorithms which seem to require brute force. Brute force search cannot be done in polynomial time, so it seems that problems in NP are more difficult than problems in P. I say it "seems" this way because, while most people believe it to be true, it has not been proven. This is the famous P vs. NP conjecture. It will be discussed in more detail in a future episode.

Given a solution to a particular problem, if it can be verified/checked in polynomial time, that problem might be in NP. If someone hands you a completed Sudoku puzzle, it's not difficult to see if they made any mistakes. The effort of developing the solution to the Sudoku game seems to be intrinsically more difficult. In fact, as far as anyone knows, in the general case of all possible examples of the game, it seems no strategy can do better on average than just random guessing.

This notion of random guessing the solution is where the N in NP comes from: Non-deterministic. Imagine a machine with a random input already written in its memory. Given enough such machines, one of them will have the right answer. If they all ran in parallel, one of them could verify it's input in polynomial time. This guess / provided input is often called a witness string.

NP is an important concept for many reasons. To me, the most reason to know about NP is a practical one. Depending on your goals or the goals of your employer, there are many challenging problems you may attempt to solve. If a problem you are trying to solve happens to be in NP, then you should consider the implications very carefully. Perhaps you'll be lucky and discover that your particular instance of the problem is easy. Sudoku is pretty easy if only 2 remaining squares need to be filled in. The traveling salesman problem is easy to solve if you live in a country where all roads for a ring with exactly one road in and out.

If the problem you wish to solve is not trivial, or if you will face many instances of the problem and expect some will not be trivial, then it's unlikely you'll be able to find the exact solution. Sure, maybe you can grab a bunch of commodity servers and try to scale the heck out of your attempt. Depending on the problem you're solving, that might just work. If you can out-purchase your problem in computing power, then problems in NP will surrender to you. But if your input size ever grows, it's unlikely you'll be able to keep up.

If your problem is intractable in this way, all is not lost. You might be able to find an approximate solution to your problem. Good enough is better than no solution at all, right? Most of the time, probably. However, some tremendous work has also been done studying topics like this. Are there problems which are not even approximable in polynomial time? What approximation techniques work best? Alas, those answers lie elsewhere.

This episode avoids a discussion of a few key points in order to keep the material accessible. If you find this interesting, you should next familiarize yourself with the notions of NP-Complete, NP-Hard, and co-NP. These are topics we won't necessarily get to in future episodes. Michael Sipser's Introduction to the Theory of Computation is a good resource.

Avsnitt(588)

Video Game Analytics

Video Game Analytics

This episode discusses video game analytics with guest Anders Drachen. The way in which people get access to games and the opportunity for game designers to ask interesting questions with data has changed quite a bit in the last two decades. Anders shares his insights about the past, present, and future of game analytics. We explore not only some of the innovations and interesting ways of examining user experience in the gaming industry, but also touch on some of the exciting opportunities for innovation that are right on the horizon. You can find more from Anders online at andersdrachen.com, and follow him on twitter @andersdrachen

19 Juni 201531min

[MINI] Anscombe's Quartet

[MINI] Anscombe's Quartet

This mini-episode discusses Anscombe's Quartet, a series of four datasets which are clearly very different but share some similar statistical properties with one another. For example, each of the four plots has the same mean and variance on both axis, as well as the same correlation coefficient, and same linear regression. The episode tries to add some context by imagining each of these datasets as data about a sports team, and why it can be important to look beyond basic summary statistics when exploring your dataset.

12 Juni 20159min

Proposing Annoyance Mining

Proposing Annoyance Mining

A recent episode of the Skeptics Guide to the Universe included a slight rant by Dr. Novella and the rouges about a shortcoming in operating systems. This episode explores why such a (seemingly obvious) flaw might make sense from an engineering perspective, and how data science might be the solution. In this solo episode, Kyle proposes the concept of "annoyance mining" - the idea that with proper logging and enough feedback, data scientists could be provided the right dataset from which they can detect flaws and annoyances in software and other systems and automatically detect potential bugs, flaws, and improvements which could make those systems better. As system complexity grows, it seems that an abstraction like this might be required in order to keep maintaining an effective development cycle. This episode is a bit of a soap box for Kyle as he explores why and how we might track an appropriate amount of data to be able to make better software and systems more suited for the users.

9 Juni 201530min

Preserving History at Cyark

Preserving History at Cyark

Elizabeth Lee from CyArk joins us in this episode to share stories of the work done capturing important historical sites digitally. CyArk is a non-profit focused on using technology to preserve the world's important historic and cultural locations digitally. CyArk's founder Ben Kacyra, a pioneer in 3D capture technology, and his wife, founded CyArk after seeing the need to preserve important artifacts and locations digitally before they are lost to natural disasters, human destruction, or the passage of time. We discuss their technology, data, and site selection including the upcoming themes of locations and the CyArk 500. Elizabeth puts out the call to all listeners to share their opinions on what important sites should be included in The Cyark 500 Challenge - an effort to digitally preserve 500 of the most culturally important heritage sites within the next five years. You can Nominate a site by submitting a short form at CyArk.org Visit http://www.cyark.org/projects/ to view an immersive, interactive experience of many of the sites preserved.

5 Juni 201523min

[MINI] A Critical Examination of a Study of Marriage by Political Affiliation

[MINI] A Critical Examination of a Study of Marriage by Political Affiliation

Linhda and Kyle review a New York Times article titled How Your Hometown Affects Your Chances of Marriage. This article explores research about what correlates with the likelihood of being married by age 26 by county. Kyle and LinhDa discuss some of the fine points of this research and the process of identifying factors for consideration.

29 Maj 201510min

Detecting Cheating in Chess

Detecting Cheating in Chess

With the advent of algorithms capable of beating highly ranked chess players, the temptation to cheat has emmerged as a potential threat to the integrity of this ancient and complex game. Yet, there are aspects of computer play that are measurably different than human play. Dr. Kenneth Regan has developed a methodology for looking at a long series of modes and measuring the likelihood that the moves may have been selected by an algorithm. The full transcript of this episode is well annotated and has a wealth of excellent links to the things discussed. If you're interested in learning more about Dr. Regan, his homepage (Kenneth Regan), his page on wikispaces, and the amazon page of books by Kenneth W. Regan are all great resources.

22 Maj 201544min

[MINI] z-scores

[MINI] z-scores

This week's episode dicusses z-scores, also known as standard score. This score describes the distance (in standard deviations) that an observation is away from the mean of the population. A closely related top is the 68-95-99.7 rule which tells us that (approximately) 68% of a normally distributed population lies within one standard deviation of the mean, 95 within 2, and 99.7 within 3. Kyle and Linh Da discuss z-scores in the context of human height. If you'd like to calculate your own z-score for height, you can do so below. They further discuss how a z-score can also describe the likelihood that some statistical result is due to chance. Thus, if the significance of a finding can be said to be 3σ, that means that it's 99.7% likely not due to chance, or only 0.3% likely to be due to chance.

15 Maj 201510min

Using Data to Help Those in Crisis

Using Data to Help Those in Crisis

This week Noelle Sio Saldana discusses her volunteer work at Crisis Text Line - a 24/7 service that connects anyone with crisis counselors. In the episode we discuss Noelle's career and how, as a participant in the Pivotal for Good program (a partnership with DataKind), she spent three months helping find insights in the messaging data collected by Crisis Text Line. These insights helped give visibility into a number of different aspects of Crisis Text Line's services. Listen to this episode to find out how! If you or someone you know is in a moment of crisis, there's someone ready to talk to you by texting the shortcode 741741.

8 Maj 201534min

Populärt inom Vetenskap

p3-dystopia
dumma-manniskor
paranormalt-med-caroline-giertz
svd-nyhetsartiklar
allt-du-velat-veta
rss-vetenskapligt-talat
kapitalet-en-podd-om-ekonomi
dumforklarat
sexet
rss-ufobortom-rimligt-tvivel
rss-i-hjarnan-pa-louise-epstein
rss-vetenskapsradion
rss-vetenskapsradion-2
rss-vetenskapspodden
det-morka-psyket
hacka-livet
medicinvetarna
rss-spraket
a-kursen
barnpsykologerna