[MINI] Sudoku \in NP
Data Skeptic10 Nov 2017

[MINI] Sudoku \in NP

Algorithms with similar runtimes are said to be in the same complexity class. That runtime is measured in the how many steps an algorithm takes relative to the input size.

The class P contains all algorithms which run in polynomial time (basically, a nested for loop iterating over the input). NP are algorithms which seem to require brute force. Brute force search cannot be done in polynomial time, so it seems that problems in NP are more difficult than problems in P. I say it "seems" this way because, while most people believe it to be true, it has not been proven. This is the famous P vs. NP conjecture. It will be discussed in more detail in a future episode.

Given a solution to a particular problem, if it can be verified/checked in polynomial time, that problem might be in NP. If someone hands you a completed Sudoku puzzle, it's not difficult to see if they made any mistakes. The effort of developing the solution to the Sudoku game seems to be intrinsically more difficult. In fact, as far as anyone knows, in the general case of all possible examples of the game, it seems no strategy can do better on average than just random guessing.

This notion of random guessing the solution is where the N in NP comes from: Non-deterministic. Imagine a machine with a random input already written in its memory. Given enough such machines, one of them will have the right answer. If they all ran in parallel, one of them could verify it's input in polynomial time. This guess / provided input is often called a witness string.

NP is an important concept for many reasons. To me, the most reason to know about NP is a practical one. Depending on your goals or the goals of your employer, there are many challenging problems you may attempt to solve. If a problem you are trying to solve happens to be in NP, then you should consider the implications very carefully. Perhaps you'll be lucky and discover that your particular instance of the problem is easy. Sudoku is pretty easy if only 2 remaining squares need to be filled in. The traveling salesman problem is easy to solve if you live in a country where all roads for a ring with exactly one road in and out.

If the problem you wish to solve is not trivial, or if you will face many instances of the problem and expect some will not be trivial, then it's unlikely you'll be able to find the exact solution. Sure, maybe you can grab a bunch of commodity servers and try to scale the heck out of your attempt. Depending on the problem you're solving, that might just work. If you can out-purchase your problem in computing power, then problems in NP will surrender to you. But if your input size ever grows, it's unlikely you'll be able to keep up.

If your problem is intractable in this way, all is not lost. You might be able to find an approximate solution to your problem. Good enough is better than no solution at all, right? Most of the time, probably. However, some tremendous work has also been done studying topics like this. Are there problems which are not even approximable in polynomial time? What approximation techniques work best? Alas, those answers lie elsewhere.

This episode avoids a discussion of a few key points in order to keep the material accessible. If you find this interesting, you should next familiarize yourself with the notions of NP-Complete, NP-Hard, and co-NP. These are topics we won't necessarily get to in future episodes. Michael Sipser's Introduction to the Theory of Computation is a good resource.

Avsnitt(588)

[MINI] Ant Colony Optimization

[MINI] Ant Colony Optimization

In this week's mini episode, Linhda and Kyle discuss Ant Colony Optimization - a numerical / stochastic optimization technique which models its search after the process ants employ in using random walks to find a goal (food) and then leaving a pheremone trail in their walk back to the nest.  We even find some way of relating the city of San Francisco and running a restaurant into the discussion.

8 Aug 201415min

Data in Healthcare IT with Shahid Shah

Data in Healthcare IT with Shahid Shah

Our guest this week is Shahid Shah. Shahid is CEO at Netspective, and writes three blogs: Health Care Guy, Shahid Shah, and HitSphere - the Healthcare IT Supersite. During the program, Kyle recommended a talk from the 2014 MIT Sloan CIO Symposium entitled Transforming "Digital Silos" to "Digital Care Enterprise" which was hosted by our guest Shahid Shah. In addition to his work in Healthcare IT, he also the chairperson for Open Source Electronic Health Record Alliance, an non-profit organization that, amongst other activities, is hosting an upcoming conference. The 3rd annual OSEHRA Open Source Summit: Global Collaboration in Healthcare IT , which will be taking place September 3-5, 2014 in Washington DC. For our benevolent recommendation, Shahid suggested listeners may benefit from taking the time to read books on leadership for the insights they provide. For our self-serving recommendation, Shahid recommended listeners check out his company Netspective , if you are working with a company looking for help getting started building software utilizing next generation technologies.

1 Aug 201457min

[MINI] Cross Validation

[MINI] Cross Validation

This miniepisode discusses the technique called Cross Validation - a process by which one randomly divides up a dataset into numerous small partitions. Next, (typically) one is held out, and the rest are used to train some model. The hold out set can then be used to validate how good the model does at describing/predicting new data.

25 Juli 20140s

Streetlight Outage and Crime Rate Analysis with Zach Seeskin

Streetlight Outage and Crime Rate Analysis with Zach Seeskin

This episode features a discussion with statistics PhD student Zach Seeskin about a project he was involved in as part of the Eric and Wendy Schmidt Data Science for Social Good Summer Fellowship.  The project involved exploring the relationship (if any) between streetlight outages and crime in the City of Chicago.  We discuss how the data was accessed via the City of Chicago data portal, how the analysis was done, and what correlations were discovered in the data.  Won't you listen and hear what was found?

18 Juli 201433min

[MINI] Experimental Design

[MINI] Experimental Design

This episode loosely explores the topic of Experimental Design including hypothesis testing, the importance of statistical tests, and an everyday and business example.

11 Juli 201415min

The Right (big data) Tool for the Job with Jay Shankar

The Right (big data) Tool for the Job with Jay Shankar

In this week's episode, we discuss applied solutions to big data problem with big data engineer Jay Shankar.  The episode explores approaches and design philosophy to solving real world big data business problems, and the exploration of the wide array of tools available.

7 Juli 201449min

[MINI] Bayesian Updating

[MINI] Bayesian Updating

In this minisode, we discuss Bayesian Updating - the process by which one can calculate the most likely hypothesis might be true given one's older / prior belief and all new evidence.

27 Juni 201411min

Personalized Medicine with Niki Athanasiadou

Personalized Medicine with Niki Athanasiadou

In the second full length episode of the podcast, we discuss the current state of personalized medicine and the advancements in genetics that have made it possible.

20 Juni 201457min

Populärt inom Vetenskap

p3-dystopia
dumma-manniskor
paranormalt-med-caroline-giertz
svd-nyhetsartiklar
allt-du-velat-veta
rss-vetenskapligt-talat
kapitalet-en-podd-om-ekonomi
dumforklarat
rss-vetenskapspodden
rss-ufobortom-rimligt-tvivel
sexet
rss-vetenskapsradion
rss-i-hjarnan-pa-louise-epstein
rss-vetenskapsradion-2
det-morka-psyket
medicinvetarna
a-kursen
hacka-livet
rss-spraket
rss-personlighetspodden