Why There’s No Single Best Way To Store Information

Just as there’s no single best way to organize your bookshelf, there’s no one-size-fits-all solution to storing information.

Consider the simple situation where you create a new digital file. Your computer needs to rapidly find a place to put it. If you later want to delete it, the machine must quickly find the right bits to erase. Researchers aim to design storage systems, called data structures, that balance the amount of time it takes to add data, the time it takes to later remove it, and the total amount of memory the system needs.

To get a feel for these challenges, imagine you keep all your books in a row on one long shelf. If they’re organized alphabetically, you can quickly pick out any book. But whenever you acquire a new book, it’ll take time to find its proper spot. Conversely, if you place books wherever there’s space, you’ll save time now, but they’ll be hard to find later. This trade-off between insertion time and retrieval time might not be a problem for a single-shelf library, but you can see how it could get cumbersome with thousands of books.
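
The shelf trade-off can be sketched in a few lines of Python. This is only an illustration, assuming a plain list stands in for the shelf; the function names are invented for this example. Note that in a Python list, inserting into the middle also means shifting every later item over, much like sliding books along a shelf to make room.

```python
import bisect

def add_sorted(shelf, title):
    # Keep the shelf alphabetical: find the right spot, then insert there.
    # Later items must shift over to make room.
    bisect.insort(shelf, title)

def find_sorted(shelf, title):
    # Binary search: repeatedly halve the section of shelf being searched.
    i = bisect.bisect_left(shelf, title)
    return i < len(shelf) and shelf[i] == title

def add_unsorted(shelf, title):
    # Put the book wherever there is space: here, at the end.
    shelf.append(title)

def find_unsorted(shelf, title):
    # No order to exploit, so check the books one at a time.
    return title in shelf

sorted_shelf, unsorted_shelf = [], []
for title in ["Emma", "Foundation", "Cat's Eye"]:
    add_sorted(sorted_shelf, title)
    add_unsorted(unsorted_shelf, title)
```

Adding to the unsorted shelf is quick but finding a book means scanning the whole row; the sorted shelf reverses that balance.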

Instead of a shelf, you could set up 26 alphabetically labeled bins and assign books to bins based on the first letter of the author’s last name. Whenever you get a new book, you can instantly tell which bin it goes in, and whenever you want to retrieve a book, you will immediately know where to look. When books are spread fairly evenly across the bins, both insertion and removal can be a lot faster than they would be if you stored items on one long shelf.

Of course, this bin system comes with its own problems. Retrieving books is only instantaneous if you have one book per bin; otherwise, you’ll have to root around to find the right one. In an extreme scenario where all your books are by Asimov, Atwood, and Austen, you’re back to the problem of one long shelf, plus you’ll have a bunch of empty bins cluttering up your living room.
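
A rough sketch of the bin system makes the extreme scenario concrete. It assumes each book is represented only by its author's last name, and the function name is made up for illustration.

```python
from collections import defaultdict

def first_letter_bins(last_names):
    # One bin per starting letter; a bin is created the first time it is used.
    bins = defaultdict(list)
    for name in last_names:
        bins[name[0].upper()].append(name)
    return bins

bins = first_letter_bins(["Asimov", "Atwood", "Austen"])
# Every book lands in bin "A", so finding one still means
# searching through the whole pile.
```

With this collection, the other 25 bins would sit empty while bin "A" behaves like the original long shelf.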

Computer scientists often study data structures called hash tables that resemble more sophisticated versions of this simple bin system. Hash tables calculate a storage address for each item from a known property of that item, called the key. In our example, the key for each book is the first letter of the author’s last name. But that simple key makes it likely that some bins will be much fuller than others. (Few authors writing in English have a last name that starts with X, for example.) A better approach is to start with the author’s full name, replace each letter in the name with the number corresponding to its position in the alphabet, add up all these numbers, and divide the sum by 26. The remainder is some number between zero and 25. Use that number to assign the book to a bin.
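
The letter-summing recipe above might look like the following in Python. This is a minimal sketch rather than how real hash tables compute addresses; the function name is hypothetical, and it assumes spaces and punctuation in the name are simply skipped.

```python
def bin_for_author(full_name, num_bins=26):
    # Replace each letter with its position in the alphabet (a=1, ..., z=26),
    # ignoring anything that is not a letter a-z, and add the positions up.
    total = sum(
        ord(ch) - ord("a") + 1
        for ch in full_name.lower()
        if "a" <= ch <= "z"
    )
    # The remainder after dividing by the number of bins is the bin number.
    return total % num_bins

print(bin_for_author("Isaac Asimov"))     # 112 % 26 -> 8
print(bin_for_author("Margaret Atwood"))  # 161 % 26 -> 5
print(bin_for_author("Jane Austen"))      # 110 % 26 -> 6
```

The three authors who all crowded into bin A under the first-letter scheme now land in three different bins.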
