
When the hash function becomes expensive, it is better to cache hash values. Using a proper randomizing hash function guards against most non-random inputs and doesn't greatly hurt performance. For huge hash tables, cache misses are often the bottleneck. In real applications, it is almost always preferable to use a non-identity hash function for integers. (There was a reddit thread on the topic, but I couldn't find it via a quick Google search.)

There are faster hash table implementations in Rust. The standard one should be fast, but it should not win all the benchmarks. It needs to come out of the box with hostile-input resistance, a general-purpose hash with no domain knowledge that can be exploited, a general-purpose memory strategy, and so on. A general-purpose Rust hash library shouldn't be the absolute blazingly fastest hash library around, because if it is, that means corners have been cut that will bite people. I'm not guaranteeing that has happened here, but I do suspect we're getting close to that frontier. Double-check that you can afford the corners your optimized code cuts, or you'll wish you had been more careful later.

It's great fun and useful to optimize things, but after a while I stop paying much attention, because past a certain point optimization ceases to be a sign that the programmer has carefully programmed for performance and starts to be a sign that corners have been cut to get the number ever lower. A hash resistant against hostile input, in the absolute limit, simply can't be as fast as one that ignores that issue: even mixing a single extra byte into the resistant hash will eventually show up on an aggressive enough benchmark of aggressively enough optimized code. It's very easy to end up benchmarking two things that are more fundamentally different than meets the eye. This points to a danger of these sorts of microbenchmarks as you get into the really hyper-optimized domains.

The default hash function of Rust's HashMap (SipHash, last I checked) is known to be relatively slow, as it's cryptographically secure and optimised for medium-sized strings. If you're literally only benchmarking the hashmap, I wouldn't be surprised if the hash function is in fact the bottleneck, especially if you're comparing against a completely trivial hash. Rustc itself actually uses fnv for most of its hashmaps, as it performs much better for integers and short keys, and the collision resistance of SipHash is not really necessary there.
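
If SipHash really does dominate a profile of an integer-keyed map, the usual fix is to plug a different hasher into `HashMap`. A minimal sketch using a hand-rolled FNV-1a hasher wired in through `BuildHasherDefault` (in practice you would likely reach for the `fnv` or `fxhash` crates instead):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal 64-bit FNV-1a hasher: fast on integers and short keys,
/// but with no resistance to attacker-chosen input.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
}

// A map type that skips SipHash entirely; only appropriate for trusted keys.
type FnvMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut counts: FnvMap<u32, u64> = FnvMap::default();
    for id in [1u32, 2, 2, 3] {
        *counts.entry(id).or_insert(0) += 1;
    }
    assert_eq!(counts[&2], 2);
}
```

The map behaves exactly like a normal `HashMap`; only the `BuildHasher` type parameter changes, which is also how crates like `fnv` and `fxhash` expose their faster hashers.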

Recently, I changed our ZFS implementation for Redox from SipHasher to DJB2 (see the sketch below), and this gave a massive speed-up (2-4x!). If your input is trusted and short - and even more so if it's bounded - and hashmaps actually show up on profiles, changing the hash function to something non-cryptographic and/or type-specific is a known performance improvement. Rust's hashmap assumes the input data is untrusted and attacker-controlled, which is why the relatively slow, string-oriented SipHash is the default; in fact this has a FAQ entry.

Although I tried various hash functions, I don't believe they are the bottleneck.
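
For reference, the DJB2 hash mentioned above is only a few lines. A rough sketch of the classic algorithm (an illustration, not the actual Redox ZFS code):

```rust
/// Classic DJB2 byte hash: hash = hash * 33 + byte, seeded with 5381.
/// Extremely cheap, but trivially collidable by an attacker.
fn djb2(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 5381;
    for &b in bytes {
        hash = hash.wrapping_mul(33).wrapping_add(u64::from(b));
    }
    hash
}

fn main() {
    // Stable, cheap hashes for short, trusted keys.
    assert_eq!(djb2(b"abc"), djb2(b"abc"));
    assert_ne!(djb2(b"abc"), djb2(b"abd"));
}
```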
