Saturday, May 10, 2014

What is algorithm? Sounds so cheem!

Indeed, the study of algorithm can get even more complex and deep than what the already cheem word 'algorithm' suggests. Personally, it is the study of the art of solving problems by manipulating data and structures to come up with a fast and reliable solution.

Before we dwell too deeply on defining the precise meaning of algorithm and data structures, let's consider the following daily "problem".

Suppose that we have to search for an item from a pile of random stuff. How would we find the item as quickly as possible? Of course one of the most straight forward way is to go through the items in the pile one by one, from the top-most item to the bottom, until we find what we are looking for. If there are 10 items, we can find our item real quick. If we have 20 items, maybe it gets a bit longer. But what if we have 10000 items? Or a million? Our method of scanning the items one by one will take a long time before we get to find the item. Can we do better?

This is the kind of problem that you will often face when dealings with problems and trying to come up with the correct and fast enough algorithm. In this example, there is really not much we can do, given that the stuff are randomly distributed in the pile. The best solution (for now) is really scanning the whole items, and it takes linear time to find our item. Linear time means that the time taken for the algorithm to terminate is proportional to the size of the problem.

However, if we are told that the items are sorted by their weight, e.g. from the lightest to the heaviest, there is a faster solution that beats our linear time method! Let's see the following demonstration.

Suppose the weight of our item is 500kg (wonder what it is huh) and it is inside a pile of items which is sorted by weight:
1 5 40 45 80 90 120 500 670 700 800

If we try to search for the 500kg item linearly from the first item (1kg), we will find our item after 8 steps:
1 5 40 45 80 90 120 500 670 700 800 // nah..
1 5 40 45 80 90 120 500 670 700 800 // not this one..
1 5 40 45 80 90 120 500 670 700 800 // still no..
1 5 40 45 80 90 120 500 670 700 800 // uhh..
1 5 40 45 80 90 120 500 670 700 800 // persevere..
1 5 40 45 80 90 120 500 670 700 800 // cmon..
1 5 40 45 80 90 120 500 670 700 800 // oh god.. why..
1 5 40 45 80 90 120 500 670 700 800 // FOUND IT!
Total: 8 steps

Now let's revise our method by utilising the information that we have. Consider this: If we choose an item from the pile somewhere in the middle, and it turns out to be lighter than 500kg, we can ignore all items that come before it (because they will be lighter that 500kg too!) and start searching on the other half. We effectively eliminate half of our search space in one step! We can repeat the process of choosing the middle item and eliminate one half of the piles for every steps: (the blue colored numbers are the current search space at each step)
1 5 40 45 80 90 120 500 670 700 800 //90 < 500, so 500 must be right
1 5 40 45 80 90 120 500 670 700 800 //670 > 500, so 500 must be on the left
1 5 40 45 80 90 120 500 670 700 800 //120 < 500, 500 must be on the right
1 5 40 45 80 90 120 500 670 700 800 //FOUND IT!
Total: 4 steps

This is called binary search, and this algorithm takes logarithmic time, since the expected number of steps needed is proportionate to the logarithmic curve, which grows much slower than a linear curve. Binary search is an example of the application of a problem solving paradigm called "Divide and Conquer", which divides up its problem space into smaller pieces to work on, one of the most powerful recurring ideas in algorithm study.

Here we can see a huge improvement from 8 steps to 4 steps. I must admit that if we are searching for items of different weight, for example 1kg, the linear method will return faster for this case. However, when the size of the problem becomes really big, e.g 1 million, and the item that we need to find is randomized, the number of steps needed for the linear method may reach 1 million, while for binary search you need at most 20 steps to find the item!

We can see that studying algorithm can give us huge performance boost and impact on the effectiveness of our solutions. Hence, its importance goes without saying, in which its applications can range from remotely useful (such as... almost nothing) to banking, scientific research, DNA sequencing, security, airplanes scheduling, to anything that you can think of!

This simple illustration demonstrates a few common ideas on algorithm and data structure, and there are many more to look forward to.