Disclaimers: This post contains several simplifications to help explain the core database concepts. Specifically, it doesn’t cover the distributed systems required to handle big data. Only read footnotes if you want to dive deeper🤿.
You have data. The best kind of data, the kind that fits nicely into tables: structured data. Your precious users gave it to you so you can provide them value. What should you do with it? How should you store it?
The Great Librarian of Alexandria
A pile of scrolls, courtesy of Nano Banana
It is the third century BC, and you have just been chosen as the first Chief Librarian of the Great Library of Alexandria. You have scrolls upon scrolls of knowledge, and your task is to organize them so that visitors can quickly find the information they need.
You could organize the scrolls by topic, author, or date. How do you choose? How do you deal with new scrolls arriving every day?
Well, you have plenty of experience with libraries, and you know that people usually look for scrolls by author. So you decide to sort the scrolls alphabetically by author name. Then you put all the authors starting with ‘A’ on one shelf, those starting with ‘B’ on another, and so on.
Since you don’t want to move scrolls around too much, you leave some empty space on each shelf to accommodate new scrolls. You also draw a map at the entrance of the library that tells visitors which shelf to go to for each author.
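To make the librarian’s scheme concrete, here is a minimal Python sketch under a few assumptions: a shelf is a sorted list, the map at the entrance is a dictionary from an author’s first letter to that shelf, and the empty space is modeled as a fixed shelf capacity. The names `Scroll`, `Library`, `SHELF_CAPACITY`, and `find_by_author` are hypothetical and exist only for illustration.

```python
import bisect
from dataclasses import dataclass

# Hypothetical: how many scrolls fit on one shelf. Slots left unused are the "empty space".
SHELF_CAPACITY = 4


@dataclass(order=True)
class Scroll:
    # order=True compares (author, title), so a sorted shelf is sorted by author first.
    author: str
    title: str


class Library:
    def __init__(self) -> None:
        # The "map at the entrance": author's first letter -> that letter's shelf.
        # Each shelf is a list kept in sorted order.
        self.directory: dict[str, list[Scroll]] = {}

    def add(self, scroll: Scroll) -> None:
        letter = scroll.author[0].upper()
        shelf = self.directory.setdefault(letter, [])
        if len(shelf) >= SHELF_CAPACITY:
            # No empty space left: the librarian would have to reorganize the shelves.
            raise OverflowError(f"shelf {letter!r} is full")
        bisect.insort(shelf, scroll)  # insert in place, keeping the shelf sorted

    def find_by_author(self, author: str) -> list[Scroll]:
        # Use the map to pick one shelf, then binary-search within it.
        shelf = self.directory.get(author[0].upper(), [])
        # Scroll(author, "") sorts before every scroll by that author.
        i = bisect.bisect_left(shelf, Scroll(author, ""))
        found = []
        while i < len(shelf) and shelf[i].author == author:
            found.append(shelf[i])
            i += 1
        return found


lib = Library()
lib.add(Scroll("Euclid", "Elements"))
lib.add(Scroll("Archimedes", "On the Sphere and Cylinder"))
lib.add(Scroll("Aristotle", "Poetics"))
print([s.title for s in lib.find_by_author("Aristotle")])  # ['Poetics']
```

The `OverflowError` is where the “most of the time” caveat bites: once a shelf fills up, the librarian has to shift scrolls around after all, which is exactly the work the empty space was meant to postpone.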
It’s great! It’s working well. Visitors can find scrolls quickly, and new scrolls can be added easily (most of the time).
But then, one day, a new type of visitor arrives: statisticians. They ask strange questions: How many scrolls arrived last month? How many scrolls are by authors whose names start with ‘C’? What is the average number of lines per scroll?
Your current organization is not optimized for these types of queries. You have to go through each shelf and count the scrolls manually. It takes a lot of time and effort.
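Here is a rough Python sketch of why these questions hurt, reusing the letter-per-shelf layout with invented data. Each scroll is a hypothetical `(author, title, line_count, arrival_day)` tuple, where `arrival_day` counts days since the library opened. The ‘C’ question only needs one shelf, but the arrival and line-count questions have to walk past every scroll on every shelf.

```python
# Hypothetical catalogue using the letter-per-shelf layout. Each scroll is
# (author, title, line_count, arrival_day); every value is invented.
shelves = {
    "A": [("Archimedes", "On Floating Bodies", 300, 12), ("Aristotle", "Poetics", 450, 95)],
    "C": [("Callimachus", "Aetia", 600, 80), ("Ctesias", "Persica", 900, 40)],
    "E": [("Euclid", "Elements", 1200, 99)],
}


def all_scrolls():
    """Visit every scroll on every shelf: a full scan of the library."""
    for shelf in shelves.values():
        yield from shelf


def arrived_since(day: int) -> int:
    # Shelves are ordered by author, not arrival, so every scroll must be checked.
    return sum(1 for _, _, _, arrival in all_scrolls() if arrival >= day)


def count_by_initial(letter: str) -> int:
    # The only question the layout helps with: the map points to a single shelf.
    return len(shelves.get(letter, []))


def average_lines() -> float:
    # Again a full scan: every scroll's line count has to be read.
    counts = [lines for _, _, lines, _ in all_scrolls()]
    return sum(counts) / len(counts)


today = 100  # hypothetical "current day"
print(arrived_since(today - 30))  # 3
print(count_by_initial("C"))      # 2
print(average_lines())            # 690.0
```

With five scrolls the scan is instant; with hundreds of thousands, every one of those loops means a walk through the entire library.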
Databases are magic
Relational Databases (PostgreSQL, MySQL, SQLite, etc.) are like your library but on steroids. They use disk space instead of shelves. To keep things well-ordered, they will ask you to define tables and columns. For your scrolls, you could create a ‘Scroll’ table with the following columns: ‘author’, ‘title’, ‘content’ and ‘creation_date’.
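As a minimal sketch, here is the ‘Scroll’ table in SQLite through Python’s built-in `sqlite3` module. The column types and the choice to store `creation_date` as ISO-8601 text are assumptions (SQLite has no dedicated date type), and the rows are invented placeholders.

```python
import sqlite3

# In-memory SQLite database; the table mirrors the 'Scroll' example from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Scroll (
        author        TEXT NOT NULL,
        title         TEXT NOT NULL,
        content       TEXT,
        creation_date TEXT  -- ISO-8601 'YYYY-MM-DD'; SQLite has no dedicated DATE type
    )
    """
)

# Invented rows; the content and dates are placeholders, not historical.
rows = [
    ("Euclid", "Elements", "Placeholder text.", "2024-01-15"),
    ("Callimachus", "Aetia", "Placeholder text.", "2024-02-03"),
    ("Ctesias", "Persica", "Placeholder text.", "2024-02-20"),
]
conn.executemany("INSERT INTO Scroll VALUES (?, ?, ?, ?)", rows)

# Looking scrolls up by author, like the librarian's visitors do.
by_author = conn.execute(
    "SELECT title FROM Scroll WHERE author = ?", ("Euclid",)
).fetchall()
print(by_author)  # [('Elements',)]

# One of the statisticians' questions, expressed as a single query.
(c_count,) = conn.execute(
    "SELECT COUNT(*) FROM Scroll WHERE author LIKE 'C%'"
).fetchone()
print(c_count)  # 2

conn.close()
```

You describe what you want, and the database decides how to find the matching rows.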