

Using squirrel3 with std::unordered_map additionally makes insertions slightly slower and lookups slightly faster.įurther comparisons will be relative to these results, since exact performance numbers for std::unordered_map depend on one’s standard library distribution (here, Microsoft’s). Running these benchmarks on std::unordered_map produces results roughly equivalent to the 100% column, just with marginally slower insertions and faster clears.

However, we should not directly compare memory amplification against the coming open-addressed tables, as storing larger values would improve separate chaining’s ratio. Technically, the reported memory amplification in an underestimate, as it does not include heap allocator overhead. Insertions and erasures, on the other hand, are pretty slow-they involve allocation. These are respectable results, especially lookups at 100%: the CPU is able to hide much of the memory latency. To get our bearings, let’s first implement a simple separate chaining table and benchmark it at various load factors. Time to look up each element by following a Sattolo cycle 6 (2-4x).Several less interesting benchmarks were also run, all of which exhibited identical performance across equal-memory open-addressed tables and much worse performance for separate chaining. I make no claims regarding the quality or robustness of this hash function, but observe that it’s cheap, it produces the expected number of collisions in power-of-two tables, and it passes smhasher when applied bytewise.

Given a hash function drawn from a universal family, inserting \(N\) keys into a table with \(K\) buckets results in an expected bucket size of \(\frac In separate chaining, a hash function is used to map each key to one of \(K\) buckets.Įach bucket holds a linked list, so to retrieve a key, one simply traverses its corresponding bucket. Most people first encounter hash tables implemented using separate chaining, a model simple to understand and analyze mathematically. When prioritizing deterministic performance over memory efficiency, two-way chaining is also a good choice.Ĭode for this article may be found on GitHub. Your default hash table should be open-addressed, using Robin Hood linear probing with backward-shift deletion.
