Your understanding of hash tables (and who use them) is flawed.

The problem is, hash table is a rather vague term. Under the hood there are many implementations... but first let's talk about the use of BST (Binary Search Trees).

Why does C++ uses a Binary Search Tree ?

~~C++ is designed by commitee, there are many possible implementations of hash tables leading to widely different characteristics while the most popular implementations of BST (Red-Black Tree and AVL Tree) have nearly identical characteristics. Therefore, they did not rejected hash tables outright, they just could not settle on the characteristics to choose and the details to expose to the user.~~

See James Kanze's comment, the proposal arrived too late and James asks an interesting question as to why Stepanov did not proposed it first. I still suspect the number of choices to be the culprit.

Why do databases use Search Trees ?

First of all, let's settle on a database software. I'll pick Oracle because it's both widely documented and so typical of SQL databases. Oracle offers two types of indexes: Bitmap and Search Trees.

*Note: they do not use BINARY Search Trees, but instead use B+Trees which are much more IO and cache friendly*

There is a fundamental difference between a Hash Table and a Search Tree: the latter is sorted. Many databases operations imply sorting:

- get the nth element
- get the top n elements
- get the elements in [a,b]

In all those cases, a Hash Table is useless.

Furthermore, databases need to juggle with huge datasets (in general), which means that they need to organize their data in order to minimize IO (disk read/write). Here, the sorted nature of a Search Tree mean that (in the index) elements that are likely to be accessed together (because they share much) will also be grouped together instead of being scattered to the four corners of the disk.

Finally, *internally* Oracle may use Hash Tables in its execution plan. When you perform an operation that requires the intersection of two sets of rows, the optimization engine may decide that storing the (temporary) sets in Hash Tables is the fastest way to go.

Now, regarding performance.

Indeed, the performance of Search Trees is generally well-known and easy to understand O(log N) is nice and tidy.

On the other hand, as I said, there are many different Hash Tables implementation possible, as well as strategies to handle both growth and shrink... definitely more complicated.

A simple example of structure, a Hash Table may use:

- Open Addressing: the hash table is an array of elements, the hash indicates the slot of the array in which to put the element, if the slot is full there is a strategy to locate another slot. The same strategy is used for searching.
- Buckets: the hash table is an array of pointers to buckets, the hash indicates the slot of the bucket in which to put the elements. It is assumed that the bucket can grow infinitely.

Those two strategies have extremely different characteristics, and the latter characteristics also depend on the buckets implementations (the easy implementation is to use a simple linked-list).

But even if you pick an implementation, its performance is based on the hash function distribution, which varies depending on the input sequence itself!

My personal advice ? To pick between `unordered_map`

and `map`

in C++, I simply ask myself about whether I need sorted elements or not. If I need them to be sorted I use a `map`

, otherwise I use an `unordered_map`

. Most of the times, the performances are just as good anyway, so it's just **semantics**.