Recent hash table development raises questions about an optimal solution.
Posted Dec 2 2025

Hash tables are one of the oldest and simplest data structures for storing elements and supporting deletions and queries. Invented in 1953, they underlie most computational systems. Yet despite their ubiquity, or perhaps because of it, computer scientists continually strive to improve their performance, aiming for an optimal trade-off between time and space.
Recent gains have been particularly significant, with new techniques upsetting previous conjectures about optimal hash tables and providing a platform for further research. Turning theory into practice has also piqued interest, although what works well in theory can be very different in practice.
Said William Kuszmaul, an assistant professor of computer science at Carnegie Mellon University, “The cool thing about hash tables is that they present many often-simple questions that have resisted analysis, some going back to the 1960s and 1970s, others more recent. Some of these questions have simple answers that are still waiting to be discovered, while others may require the development of substantial new techniques before we can solve them.”
The quest for an optimal and achievable trade-off between time and space started in the late 1950s, when IBM engineer W. Wesley Peterson published a paper in the IBM Journal of Research and Development that identified the central conundrum of hash tables: they need to be both fast, so they can quickly retrieve information, and compact, using as little memory as possible.
In 1972, Jeffrey Ullman, Stanford W. Ascherman Professor of Engineering (Emeritus) at Stanford University, conjectured that uniform probing, a greedy open-addressing scheme in which each element probes a random sequence of slots and is placed in the first one found to be available, was optimal.
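As a rough illustration of what "greedy" means here, consider the following sketch. It is a toy, not code from any of the papers discussed, and the pseudo-random probe stream stands in for a real hash family:

```python
import random

def probe_sequence(key, table_size):
    """Stand-in for uniform probing: a per-key pseudo-random
    stream of candidate slot indices (a toy, not a real hash family)."""
    rng = random.Random(hash(key))
    while True:
        yield rng.randrange(table_size)

def greedy_insert(table, key):
    """Greedy open addressing: place the key in the FIRST free slot
    its probe sequence reaches, and report how many probes that took."""
    for probes, slot in enumerate(probe_sequence(key, len(table)), start=1):
        if table[slot] is None:
            table[slot] = key
            return probes
        # Note: a completely full table would loop forever here;
        # real implementations bound the number of probes.

table = [None] * 16
print(greedy_insert(table, "alice"))  # likely 1 probe in a near-empty table
```

A query for a key simply replays its probe stream until it finds the key or hits an empty slot.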
This conjecture remained open for more than a decade before it was proven in 1985 by Andrew Yao, then a professor at Stanford University, in a paper titled “Uniform Hashing is Optimal.” While Ullman framed his conjecture in terms of amortized expected time, which measures the average query time across all the elements in the hash table, Yao proved the analogous statement for worst-case expected time, which bounds how long a hash table can take to access any specific item.
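The distinction between the two measures can be paraphrased as follows (an informal restatement, not the papers' exact definitions), where $x_1, \dots, x_n$ are the elements in the table and $\mathrm{probes}(x_i)$ is the number of probes needed to retrieve $x_i$:

```latex
\text{amortized expected: } \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[\mathrm{probes}(x_i)]
\qquad\qquad
\text{worst-case expected: } \max_{1 \le i \le n}\ \mathbb{E}[\mathrm{probes}(x_i)]
```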
Martín Farach-Colton, chair of the Department of Computer Science and Engineering at New York University Tandon, said, “For some time, computer scientists thought uniform probing was the best that could be done.” But a turning point was approaching as work continued on the development of hash tables, with papers such as “Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once,” published in September 2021.
The Iceberg hashing paper returned to the dilemma of time and space, stating, “Hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties that one desires from a hash table are difficult to achieve simultaneously; thus, many variants offering different trade-offs have been proposed.”
Iceberg hashing provides a hash table that simultaneously offers the strongest known guarantees on a large number of core properties, supporting constant-time operations while improving space efficiency and cache efficiency and maintaining a low failure probability.
While the Iceberg hashing paper was initially theoretical, a revision included a variation of the theory in practice. Said Farach-Colton, “Iceberg hash tables have strong mathematical properties and are faster in practice than previous tables.” He added, “On this basis, the real story started with Iceberg hash tables leading to tiny pointers.”
In November 2021, the authors of the Iceberg hashing paper published “Tiny Pointers,” a paper that would lead to new breakthroughs in the time and space problem. It introduced a new data object, the tiny pointer, which can replace traditional pointers. A pointer stores the address of data in memory and acts as a reference to the location of the data. Talking about tiny pointers, Farach-Colton explained, “Pointers started out at eight bits. Then, as computers got bigger, pointers grew too; 16 bits were needed, then we went to 32 bits and 64 bits. Tiny pointers have a smaller memory footprint than traditional pointers.”
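The core idea can be sketched loosely as follows: if whoever holds a pointer already knows the key it belongs to, the pointer need only encode a small offset relative to where that key hashes, not a full machine address. The bucket layout and class below are illustrative inventions, not the construction from the Tiny Pointers paper:

```python
BUCKET_BITS = 3            # a tiny pointer here is just 3 bits
BUCKET_SIZE = 1 << BUCKET_BITS

class TinyStore:
    """Toy dereference-by-(key, offset) store, loosely inspired by tiny
    pointers; this is NOT the construction from the Tiny Pointers paper."""

    def __init__(self, num_buckets):
        self.slots = [[None] * BUCKET_SIZE for _ in range(num_buckets)]

    def allocate(self, key, value):
        """Store `value` and return a tiny pointer: a BUCKET_BITS-bit
        offset inside the bucket that `key` hashes to."""
        bucket = self.slots[hash(key) % len(self.slots)]
        for offset, slot in enumerate(bucket):
            if slot is None:
                bucket[offset] = value
                return offset
        raise MemoryError("bucket full; the real scheme handles overflow")

    def dereference(self, key, tiny_ptr):
        """The key and the tiny pointer together recover the location."""
        return self.slots[hash(key) % len(self.slots)][tiny_ptr]

store = TinyStore(num_buckets=64)
p = store.allocate("alice", "record-1")       # p fits in 3 bits
assert store.dereference("alice", p) == "record-1"
```

In the real construction the offsets are provably far smaller than a full address, and overflow is handled gracefully rather than treated as a failure.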
Andrew Krapivin, then an undergraduate at Rutgers University working with Farach-Colton as his research advisor (Krapivin has since spent a year at Cambridge University in the U.K. and is heading to Carnegie Mellon to work with Kuszmaul), read the Tiny Pointers paper and recognized a potential answer to the space problem of hash tables. He said, “I was not looking to establish a new theory, but was interested in implementing tiny pointers. When I did this, I realized there was a better way to implement hash tables. I was surprised by how big my work turned out to be. At first, I just thought I had some cute results.”
Farach-Colton thought the Tiny Pointers paper had “tied everything up with a neat bow,” but after checking out Krapivin’s ‘cute results’ and calling in Kuszmaul for a second opinion, he realized he was wrong: Krapivin had discovered a new, better hash table structure, one that disproved Yao’s conjecture, though not Ullman’s, and promised faster searches than previous open-addressed hash tables. The result is significant, although some other types of hash tables that do not use open addressing, such as Iceberg tables, are faster than Krapivin’s structure. It should also be noted that Tiny Pointers considered both the insertion and deletion of data elements, while Krapivin’s structure handles only insertions.
While Krapivin’s findings caused sensational headlines about how an undergraduate had overturned a computer science conjecture that had been upheld for many years, he and his colleagues take a lower-key view and talk more about progress towards the optimal time and space curve, and the potential for further discovery.
Working together, Farach-Colton, Krapivin, and Kuszmaul published “Optimal Bounds for Open Addressing Without Reordering” in January 2025 (updated in February 2025). The paper set out to revisit one of the simplest problems in data structures: the task of inserting elements into an open-addressed hash table so that elements can later be retrieved with as few probes as possible.
The paper delivered two significant results. Kuszmaul said the first, perhaps less significant but widely heralded, result shows Yao’s conjecture is false: there is a “greedy open-addressed hash table that doesn’t do reordering,” yet it outperforms uniform probing on worst-case expected time bounds. The result, he noted, “is a surprise and is also very simple. It uses what is ultimately a standard technique, but it uses it to get an unexpected result.”
The second and more significant result shows that using non-greedy hash tables makes it possible to build a hash table with perfect space efficiency and perfect amortized expected query time: the best of both worlds. The second result also achieves a better worst-case expected time bound than any greedy solution can. The authors are working on a follow-up paper that will probe Ullman’s conjecture.
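In terms of probe counts, the progression reported in the paper for a table that is all but a $\delta$ fraction full looks roughly like this (the name elastic hashing for the non-greedy construction comes from the paper; funnel hashing, the greedy one, is discussed below):

```latex
% worst-case expected probes at load factor 1 - \delta (as this author reads the paper)
\underbrace{\Theta(\delta^{-1})}_{\text{uniform probing}}
\;\longrightarrow\;
\underbrace{O(\log^2 \delta^{-1})}_{\text{funnel hashing (greedy)}}
\;\longrightarrow\;
\underbrace{O(\log \delta^{-1})}_{\text{elastic hashing (non-greedy)}}
```

with elastic hashing simultaneously achieving $O(1)$ amortized expected probes.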
“At a technical level, this result is super-cool; it takes a really different approach to building open-addressed hash tables than has ever been taken before,” said Kuszmaul. “With Andrew Krapivin’s results, we have optimal space-time trade-offs in two new hash table settings: greedy open addressing without reordering, and non-greedy open addressing without reordering.”
From a practical perspective, few hash tables of this type or those using uniform probing make the transition from theory to practice, and any practical solutions are usually a development of the hash table concept or built on hash table techniques. Alex Conway, an assistant professor of computer science at Cornell Tech who stands at the intersection of theoretical and practical problems, cites Mosaic Pages, a virtual memory system that uses techniques from the Tiny Pointers paper.
More often, however, there is no smooth transition from theory to practice. Commenting on Krapivin’s results, Conway said, “While theoretical results like this may not directly result in new state-of-the-art hash table designs, they allow us to understand hash tables and related problems at a fundamental level. This can result in indirect applications, such as with tiny pointers in Mosaic Pages. Theoretical results can also open the door to explaining some of the other long-standing open questions in hash tables.”
The technique Krapivin used to produce the second result replaced the greedy algorithm, which inserts each new element in the first available slot in a table and therefore slows down as the table fills and fewer slots are available, with a non-greedy algorithm.
A non-greedy algorithm prioritizes which slots in the hash table should be filled. For example, a non-greedy algorithm inserting elements into a hash table may see slots still open on the left-hand side of the table and fill these before moving to the right-hand side. This initially sacrifices time, but results in a better structured hash table and improved time to fill further slots in the table.
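A deliberately simplified sketch of that intuition follows. The two-halves layout, the threshold rule, and all names are invented for this illustration; the actual non-greedy construction in the paper (elastic hashing) is considerably more refined:

```python
import random

def probe_seq(key, size, k=4):
    """Fixed per-key probe sequence: k candidate slots in the left half
    of the table, then k in the right half (an illustrative stand-in
    for a hash family)."""
    rng = random.Random(hash(key))
    half = size // 2
    return ([rng.randrange(half) for _ in range(k)] +
            [half + rng.randrange(half) for _ in range(k)])

def nongreedy_insert(table, key, left_cap=0.75):
    """Non-greedy: once the left half is nearly full, SKIP free left
    slots and place the key in the right half, preserving the scarce
    left slots for later. (A toy rule, not the paper's elastic hashing.)"""
    half = len(table) // 2
    left_is_full = sum(s is not None for s in table[:half]) >= left_cap * half
    for slot in probe_seq(key, len(table)):
        if slot < half and left_is_full:
            continue                 # deliberately pass over a free slot
        if table[slot] is None:
            table[slot] = key
            return slot
    return None  # toy code gives up; a real scheme would not

def lookup(table, key):
    """Queries walk the same sequence, 'more likely' (left) spots first."""
    for slot in probe_seq(key, len(table)):
        if table[slot] == key:
            return slot
    return None

table = [None] * 32
nongreedy_insert(table, "alice")
assert lookup(table, "alice") is not None
```

The non-greedy step is the `continue`: the algorithm passes over a free slot that a greedy scheme would have taken, trading a little insertion time now for a layout that keeps later insertions and queries fast.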
Krapivin explained, “The non-greedy approach does not always choose the same order of looking in slots. This was counterintuitive because you might think, ‘why would I not put an item in the first place (that is available) I would look for it?’”
In terms of time, Krapivin added, “A problem with uniform probing is that most data elements are inserted really, really quickly, but some take an incredible amount of time, because essentially if the table fills up, you have no idea where to look. The idea of funnel hashing, the greedy result, is that you structure the table so you know where to look at the end, but there is a limit to how fast you can make the table. As the table fills up, there are a lot of ‘unlikely’ spots you have to check. The non-greedy approach essentially fills up the table and sets up the query sequence so that you look in ‘more likely’ spots first.”
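A compressed sketch of that funnel structure follows; the level sizes, bucket width, and fallback behavior are simplifications chosen for this toy, not the parameters from the paper:

```python
def make_levels(n, min_size=8):
    """Geometrically shrinking sub-tables: the 'funnel'. Halving the
    sizes keeps the total close to n slots in this toy."""
    levels, size = [], n // 2
    while size >= min_size:
        levels.append([None] * size)
        size //= 2
    levels.append([None] * min_size)   # small last-resort level
    return levels

def funnel_insert(levels, key, bucket_width=4):
    """Greedy, no reordering: walk the levels in order; in each, scan
    one small hashed bucket; take the first free slot found. Later
    levels stay emptier, so a key pushed down the funnel still lands
    after only a few probes."""
    for depth, level in enumerate(levels):
        bucket = (hash((key, depth)) % len(level)) // bucket_width
        start = bucket * bucket_width
        for slot in range(start, min(start + bucket_width, len(level))):
            if level[slot] is None:
                level[slot] = key
                return depth, slot
    return None  # toy code gives up; the paper's fallback level does not

levels = make_levels(1024)
print(funnel_insert(levels, "alice"))  # e.g. (0, slot) in a near-empty table
```

Lookups retrace the same level-by-level bucket scan, which is why the table "knows where to look at the end": a key that cascaded down the funnel is found by replaying its short, structured probe sequence.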
While these results mark a breakthrough in the performance of hash tables, they apply only to inserting elements into an open-addressed hash table, not to deleting or reordering items. That said, they also provide a stepping stone to further investigation of optimal hash tables.
Kuszmaul is working on a paper with his students and colleagues on a setting including open addressing, no reordering, and with both insertions and deletions. He said, “With this setting it’s widely believed that uniform probing should do pretty well. It should do just as well as it did in the setting where there were no deletions. Results in progress: uniform probing actually does very badly, even if the hash table is only, say, 50% full. This is another example where a thing that seems obviously true just isn’t.”
Further Reading
Bender, M.A., Conway, A., Farach-Colton, M., et al. “Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once,” September 2021, https://dl.acm.org/doi/10.1145/3625817
Bender, M.A., Conway, A., Farach-Colton, M., et al. “Tiny Pointers,” November 2021, https://arxiv.org/abs/2111.12800
Farach-Colton, M., Krapivin, A., and Kuszmaul, W. “Optimal Bounds for Open Addressing Without Reordering,” January 2025, https://arxiv.org/abs/2501.02305