Key to Optimization: Algorithm and Analysis

Every problem has many possible solutions, but what matters most is finding the one that best fits the scenario. As we dig into algorithms and techniques, we will look at how to analyse their complexity and arrive at the most suitable, optimised solution.

Let us begin with a simple example and work out its time complexity.

Example: We need the sum of the natural numbers starting from ‘a’ and ending at ‘b’.
Solution 1:

public class StupidSum {

    public long sum(long a, long b) {
        long totalSum = 0;
        for (long i = a; i <= b; i++)
            totalSum += i;
        return totalSum;
    }
}

As we can see, the complexity of this solution is O(n). For small ranges this algorithm is fine, but as the numbers grow very large it can easily become a bottleneck in an application.

Now let’s look at the second one:
Solution 2:

public class SmartSum {

    public long sum(long a, long b) {
        long fullRange = b * (b + 1) / 2;      // sum of 1..b
        a--;                                   // so the result is inclusive of a
        long initialRange = a * (a + 1) / 2;   // sum of 1..(a-1)
        return fullRange - initialRange;
    }
}

This solution exploits the fact that the sum of the integers from 0 to n, inclusive, can be calculated with the formula n(n+1)/2.
As we can see, the complexity of this approach is O(1): it does not depend on the size of the range at all, so however large the numbers are, it runs in the same amount of time.

Next, let's compare the two algorithms with a simple program:


public static void main(String[] args) {
    System.out.println("Sum integers between 46 and 891000");

    StupidSum s1 = new StupidSum();
    long time1 = System.currentTimeMillis();
    long stupidsum = s1.sum(46, 891000);
    long time2 = System.currentTimeMillis();
    System.out.println("StupidSum:" + stupidsum + " Time:" + (time2 - time1) + " ms.");

    SmartSum s2 = new SmartSum();
    time1 = System.currentTimeMillis();
    long smartsum = s2.sum(46, 891000);
    time2 = System.currentTimeMillis();
    System.out.println("SmartSum:" + smartsum + " Time:" + (time2 - time1) + " ms.");
}

Output :

Sum integers between 46 and 891000
StupidSum:396940944465 Time:5 ms.
SmartSum:396940944465 Time:0 ms.

Conclusion :
The output of the above code clearly shows why we should focus on choosing a smart, optimised algorithm.

Sorting and Searching techniques
Searching and sorting are among the biggest performance killers in an application. Some of the most common operations on data are search, insertion and deletion.

Example :
The most obvious way to search is a linear scan that compares each element with the key, but that has complexity O(n) and can be improved.
So how do we make our search faster? One efficient method is binary search, which requires the data to be sorted and then uses divide and conquer to locate the key.
Its worst case performance is O(log n),
its best case performance is O(1),
and its average case performance is O(log n).
So in most cases it is advisable to use binary search when the data lives in sorted arrays or lists, as in the sketch below.
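As a quick illustration, here is a minimal iterative binary search in Java. The class name, sample array and method name are mine for the example; java.util.Arrays.binarySearch provides the same functionality out of the box.

import java.util.Arrays;

public class BinarySearchDemo {

    // Returns the index of key in the sorted array, or -1 if it is absent.
    static int binarySearch(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;          // avoids overflow of (lo + hi)
            if (sorted[mid] == key) return mid;
            if (sorted[mid] < key) lo = mid + 1;   // key lies in the right half
            else hi = mid - 1;                     // key lies in the left half
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {3, 8, 15, 23, 42, 57, 91};
        System.out.println(binarySearch(data, 42));          // prints 4
        System.out.println(Arrays.binarySearch(data, 42));   // the JDK's built-in equivalent
    }
}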

Sorting:
When it comes to sorting there are many techniques, but a few with good performance are quick sort, merge sort and heap sort, the choice depending on the data structure used and the amount of data. All three run in O(n log n) time on average; quick sort can degrade to O(n²) in the worst case, while merge sort and heap sort stay at O(n log n).
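To make one of these concrete, here is a minimal sketch of a top-down merge sort (the class and method names are illustrative, not from any particular library); in real code, java.util.Arrays.sort already gives you a highly tuned O(n log n) sort.

import java.util.Arrays;

public class MergeSortDemo {

    static void mergeSort(int[] a) {
        if (a.length < 2) return;
        int mid = a.length / 2;
        int[] left = Arrays.copyOfRange(a, 0, mid);
        int[] right = Arrays.copyOfRange(a, mid, a.length);
        mergeSort(left);        // sort each half recursively
        mergeSort(right);
        merge(a, left, right);  // combine the two sorted halves
    }

    // Merges the sorted arrays left and right back into a.
    static void merge(int[] a, int[] left, int[] right) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) a[k++] = left[i++];
        while (j < right.length) a[k++] = right[j++];
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88, 1};
        mergeSort(data);
        System.out.println(Arrays.toString(data)); // [1, 3, 7, 19, 42, 88]
    }
}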

Hashes:
One of the most important aspects of code optimisation is hashing.
So how do we store data in a hash table?

Dictionary example:
Suppose we want to store the words and their meanings in a dictionary, and we treat the word as the key to find the meaning.
One naive method would be to use an array that stores the meanings, indexed by the word.
To insert a definition into the dictionary, we define a function hashCode() that maps each word (key) to a unique integer. The problem is that even though English has fewer than one million words, we would need an array that long, with most of its slots sitting empty.
Clearly we need a better solution.

Suppose n is the number of keys (words) whose definitions we want to store, and
suppose we use a table of N buckets, where N is perhaps a bit larger than n,
but much smaller than the number of possible keys. A hash table maps a huge set of possible keys into N buckets by applying a compression function to each hash code.

So, we take the word's hashCode() and apply a compression function (hash function) to it to find the bucket where the entry will be stored.

h(hashCode) = hashCode mod N.

With this compression function, no matter how long and varied the keys are, we can map them into a table whose size is not much greater than the actual number of entries we want to store. However, we've created a new problem: several keys are hashed to the same bucket if h(hashCode1) = h(hashCode2). This circumstance is called a collision. To handle it, instead of having each bucket in the table reference a single entry, we have it reference a linked list of entries, called a chain. If several keys are mapped to the same bucket, their definitions all reside in that bucket's linked list.
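To make the scheme concrete, below is a minimal sketch of such a chained hash table for the word-to-definition dictionary. The class name, bucket count and method names are illustrative assumptions; a production implementation would also resize the table as it fills up.

import java.util.LinkedList;

public class ChainedDictionary {

    // A single (word, definition) entry stored in a bucket's chain.
    static class Entry {
        final String word;
        String definition;
        Entry(String word, String definition) { this.word = word; this.definition = definition; }
    }

    private final int N = 101;   // number of buckets, chosen arbitrarily for the sketch
    @SuppressWarnings("unchecked")
    private final LinkedList<Entry>[] buckets = new LinkedList[N];

    // Compression function: h(hashCode) = hashCode mod N, forced non-negative.
    private int bucketFor(String word) {
        return Math.floorMod(word.hashCode(), N);
    }

    public void put(String word, String definition) {
        int b = bucketFor(word);
        if (buckets[b] == null) buckets[b] = new LinkedList<>();
        for (Entry e : buckets[b]) {                   // update in place if the word already exists
            if (e.word.equals(word)) { e.definition = definition; return; }
        }
        buckets[b].add(new Entry(word, definition));   // otherwise append to the chain
    }

    public String get(String word) {
        int b = bucketFor(word);
        if (buckets[b] == null) return null;
        for (Entry e : buckets[b])                     // linear scan of the (short) chain
            if (e.word.equals(word)) return e.definition;
        return null;
    }
}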

Benefits of Hashing:
Insertion, deletion and search all become simpler and take far less time.

So how do we search for a key here? We apply the compression function to the key's hashCode() to find the bucket where the value is stored, which takes O(1) time, and then walk linearly along that bucket's chain to find the desired element. Insertion and deletion work similarly: once the right bucket (and, where needed, the existing entry) has been found, only one more step is required.

This is beneficial because, when handling huge amounts of data, we save a substantial amount of time on search, insertion and deletion.
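In everyday Java code we rarely need to write the table ourselves: java.util.HashMap implements exactly this idea, so the dictionary example can be used as simply as in the snippet below (the sample words and definitions are only placeholders).

import java.util.HashMap;
import java.util.Map;

public class DictionaryDemo {
    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("algorithm", "a step-by-step procedure for solving a problem"); // insertion
        dictionary.put("hash", "a function mapping data to fixed-size values");
        System.out.println(dictionary.get("algorithm"));  // O(1) expected lookup
        dictionary.remove("hash");                        // O(1) expected deletion
    }
}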
