Monday, November 5, 2007

Associative array: Hash Table vs BST

An associative array (also map, mapping, hash, dictionary, finite map, lookup table, and in query-processing an index or index file) is an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding.

Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures.

Representations: Hash Table vs Binary Search Tree (self-balancing)

There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Skip lists are also an alternative, though relatively new and not as widely used. Relative advantages and disadvantages include:

  • Hash tables have faster average lookup and insertion time (O(1)), while some kinds of binary search tree have faster worst-case lookup and insertion time (O(log n) instead of O(n)). Hash tables have seen extensive use in real time systems, but trees can be useful in high-security real time systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have worst-case operation time of O(n), but average-case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
  • Hash tables can have more compact storage for small value types, especially when the values are bits.
  • There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
  • Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
  • Balanced binary trees and skip lists preserve ordering allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently.
  • Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range.
http://en.wikipedia.org/wiki/Associative_array

No comments: