Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures.
Representations: Hash Table vs Binary Search Tree (self-balancing)
There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Skip lists are also an alternative, though relatively new and not as widely used. Relative advantages and disadvantages include:
- Hash tables have faster average lookup and insertion time (O(1)), while some kinds of binary search tree have faster worst-case lookup and insertion time (O(log n) instead of O(n)). Hash tables have seen extensive use in real time systems, but trees can be useful in high-security real time systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have worst-case operation time of O(n), but average-case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
- Hash tables can have more compact storage for small value types, especially when the values are bits.
- There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
- Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
- Balanced binary trees and skip lists preserve ordering — allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently.
- Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range.
No comments:
Post a Comment