Tuesday, October 30, 2007

Hash Tables

Applications require a dynamic set that supports only the dictionary operations INSERT, SEARCH, and DELETE. For example, a compiler maintains a symbol table.

Hash functions: a good hash function satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to. It is usually straightforward in an application to devise some method for interpreting each key as a (possibly large) nature number.

The division method: map a key k into one of m slots by taking the remainder of k divided by m, h(k)= k mod m. m, a prime, should not be too close to an exact power of 2.

The multiplication method: h(k)=the floor of (k*A mod 1), A is a constant in the range 0 < A < 1.

Universal hashing: choose the hash function randomly in a way that is independent of the keys that are actually going to be stored.

Perfect hashing: the worst-case number of memory accesses required to perform a search is O(1). The set of keys must be static: once the keys are stored in the table, the set of keys never changes. The basic idea is to use a two-level hashing scheme with universal hashing at each level and there are no collisions at the second-level by choosing the hash function carefully.

R
esolve collision: chaining or open addressing.

Chaining: We put all the elements that hash to the same slot in a linked list.

Open-addressing:
All elements are stored in the hash table itself. The advantage of open addressing is that it avoids pointers altogether. To perform insertion, we successively probe the hash table until we found an empty slot in which to put the key. Deletion from an open-address hash table is difficult (think of the searching procedure).

Three techniques are commonly used to compute the probe sequences: linear probing (only m distinct probe sequences and a famous problem: primary clustering and the average search time increases), quadratic probing, and double hashing (h(k,i) = (h1(k) + i*h2(k)) mod m, the initial position, the offset, or both may vary).


Hash table Vs Binary Search Tree

A hash table can store and retrieve data quickly in O(1) or constant time. However, its uses beyond this are limited. A binary search tree can insert and retrieve in O(log(n)). A binary search tree also maintains its data in sorted order. For some data set such as 10,000 entries, a binary search tree's O(log(n)) will be fast enough.

No comments: