Concept of Hashing

Posted on Updated on

The concept of Hashing is used in Map to store value using its Key. Hashing uses the informational data of key to store the value. A hashcode is generated by using key, which is an unique value. The function which generate the hashcode is defined in class of the Object.The hashcode is thus used as the index position at which value associated with the Key is stored. The hashcode cannot directly index the hashtable to store the value, it is processed with by hash function.

The advantage of hashing is that it allows the execution time of basic operations, such as add( ), contains( ), remove( ), and size( ), to remain constant even for large sets.

Let us understand in detail by an example
Given a problem statement of searching in which we need to search an element.
Now if in case the data is not sorted, then it’ll require to search to iterate throughout the data to find the desired data; however if we consider the case of Sorted data then it’ll require less time as in case of unsorted data. The worst case runtime complexity is given by O(log n).

Hashing gives us magical words by which we can reduce the searching to just one probe and the runtime complexity is constant to O(log n). Is this like magic? I’m sure it is. In this we’ll know the index position of value, the moment we’ve key to find from Map/set.
Hashing-1

The example of address directory can help you understand better. Here in this directory, we’ll have name of person as key and address to be value of it. The hash function will generate the hashcode of key and depending upon the hashcode generated value will stored in the table along with its key.
The function which generate the hashcode is called to be universal hash function. Practically it is impossible to get unique hashcode of each key, there will be scenario where two different objects have same hashcode.
The hash function will have following properties:

  1. It’ll return a number for an object.
  2. Two equal object will always have same number.
  3. Two different object may have same number.

Let us see how a value is stored using hash function in array.

Suppose we’ve an empty array with size N, a hash function hf. The mapping of objects will be done starting index positions 0,1,2…..N-1. Index position for each object will be calculated using hf(object). Such array will be called as hash Table.
Hashing-2

There is one thing we should understand that different function have different implementation for Indexes using hashcode. We can choose the hash function depending upon our requirement. One simple approach is to define your own hash function by use of hashcode() method. The hashcode() will generate a numeric value for every object. Let us see an example:

OUTPUT:

Hashcode for an Integer is 2016
Hashcode for an String is 1537251

The hashcode() is defined in object class and thus it is inherited by each class. However the implementation is different for different classes.The hashcode implementation for string is by below formula:

s.charAt(0) * 31n-1 + s.charAt(1) * 31n-2 + … + s.charAt(n-1)

Ex: “ABC” = ‘A’ * 312 + ‘B’ * 31 + ‘C’ = 65 * 312 + 66 * 31 + 67 = 64578

Collision:

The example of address directory can help you understand better. Here in this directory, we’ll have name of person as key and address to be value of it.  The hash function will generate the hashcode of key and depending upon the hashcode generated value will stored in the table along with its key.  There  will be certain scenarios where hashcode of two different objects might be same.

Such situation is said to be Collision.  Let’s see an example where two different string have same hashcode.
Example:
AaAa → 2019172
BBBB → 2019172

We see that how this will create problem for hashing the values. If location is pre-occupied by the first object then where will be second object stored? To deal problem of Collision, there are several ways by which it can be dealt.
1. Separate Chaining:
In this we keep the key that collide into linked list. A hashtable then is array of list.
2. Linear Probing:
If we cannoit insert at index k, we try the next slot k+1. If that one is occupied, we go to k+2, and so on. This is quite simple approach but it requires new thinking about hash tables.

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.