I wanted to analyze baby names. Oops, I built a whole game

Tl;dr: I built a daily name quiz based on ~150 years of US data, and you can play it here.

It’s no secret that I spend a lot of time thinking about names. When I first went through my (admittedly overwrought) naming process, I cut corners and only looked at 15 sample years of historical data rather than the full run back to 1880. But my programming and SQL skills have improved since my kids were born, so I went back and did a full data cleanup to see what I’d been missing.

I downloaded the raw .txt files from the Social Security Administration and wrote a script to combine them into one giant file. As a bonus, I also downloaded the state-by-state files, even though those records are patchy compared to the national data. I uploaded the combined CSVs into Hex and was ready to start analyzing. This was the motherlode! I immediately started whipping up dashboards to answer the questions I’d always wondered about.
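For anyone who wants to try the same thing, here is a minimal sketch of the combining step, not the actual script. It assumes the national files follow the SSA naming pattern `yobYYYY.txt` with one `name,sex,count` row per line; the function name and the output column names are made up for illustration.

```python
import csv
import re
from pathlib import Path

# National files are assumed to be named like yob1880.txt,
# with each row holding name,sex,count.
YEAR_PATTERN = re.compile(r"yob(\d{4})\.txt$")


def combine_name_files(source_dir, output_path):
    """Merge every yobYYYY.txt in source_dir into one CSV with a year column."""
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["year", "name", "sex", "count"])
        for path in sorted(Path(source_dir).glob("yob*.txt")):
            match = YEAR_PATTERN.search(path.name)
            if match is None:
                continue  # not a yearly file, skip it
            year = int(match.group(1))
            with open(path, newline="", encoding="utf-8") as src:
                for row in csv.reader(src):
                    if len(row) != 3:
                        continue  # skip blank or malformed rows
                    name, sex, count = row
                    writer.writerow([year, name, sex, int(count)])
                    rows_written += 1
    return rows_written
```

Adding the year as a column is what makes the combined file easy to query later, since the raw files only carry the year in their filenames.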

One thing I wanted to dig into was gender-neutral names. I already knew girls’ names were more diverse than boys’, but now I could prove it.

I also explored how neutral gender-neutral names actually were.

I found 15 unfortunate girls who’d been given overwhelmingly male names like Emiliano, Rocco, or Rodrigo. On the flip side, 17 boys wound up with overwhelmingly female names like Brielle, Lyla, or Valentina.
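The underlying measure is simple enough to sketch. The snippet below is an illustration rather than the SQL behind the dashboards: it computes, for each name, the share of births recorded as female. The function name is hypothetical and the sample counts are invented, not SSA figures.

```python
from collections import defaultdict


def female_share_by_name(rows):
    """rows: iterable of (name, sex, count). Returns {name: share recorded as F}."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in rows:
        totals[name][sex] += count
    return {
        name: counts["F"] / (counts["F"] + counts["M"])
        for name, counts in totals.items()
        if counts["F"] + counts["M"] > 0
    }


# Invented counts, for illustration only.
sample = [
    ("Rory", "M", 600),
    ("Rory", "F", 400),
    ("Rocco", "M", 985),
    ("Rocco", "F", 15),
]
shares = female_share_by_name(sample)
```

A share near 0.5 suggests a genuinely neutral name, while a share near 0 or 1 flags the handful of kids who landed on the other side of an overwhelmingly gendered name.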

This was just one small example, but I knew in my bones there were weird and wonderful naming treasures buried in that dataset. As you might imagine, it’s a hard sell to convince friends and family to read through your recreational SQL dashboards. I needed a format people actually wanted to interact with. A way to make it fun.

I decided to make a trivia game to bamboozle people into caring about names even half as much as I do.

🧩 Exploring Categories

My first challenge was figuring out the rules. Since I was generating all the questions manually, I needed to find a way to reduce my workload. That nudged me toward a Wordle-style, once-a-day game. I needed to decide what questions to ask, so I crunched numbers to find a handful of interesting themes. I narrowed it down to five categories that consistently surprised me.

Modern Popularity: Which name was more popular in 2024?
Gender Split: Was Rory more common for boys or for girls in 2024?
Decade: In which decade was the name Mortimer most popular?
Spelling: Which spelling of this name is most common?
Throwback Popularity: Which name was most popular in [random year from 1880–1990]?

Once I had my themes, it was time to generate questions. For every category except Spelling, I wrote Python scripts to generate the questions. The scripts used pre-filtered name lists, selected random options and generated both correct and incorrect values. Finally, the scripts shuffled the answers and spit out everything as JSON the game could use.
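The flow described above (pick random options, work out the correct answer, shuffle, emit JSON) could look roughly like this. It is a sketch under assumptions: the function name, the JSON field names, and the counts in the usage example are all invented, and the real scripts may be structured quite differently.

```python
import json
import random


def build_popularity_question(counts_by_name, year, rng, num_options=3):
    """Pick random names and mark the one with the highest count as correct."""
    options = rng.sample(sorted(counts_by_name), num_options)
    correct = max(options, key=lambda name: counts_by_name[name])
    rng.shuffle(options)  # so the correct answer has no fixed position
    return {
        "category": "modern_popularity",  # hypothetical field names
        "prompt": f"Which name was more popular in {year}?",
        "options": options,
        "answer": correct,
    }


# Invented counts, for illustration only.
counts = {"Ada": 120, "Bea": 340, "Cleo": 95, "Dot": 12}
question = build_popularity_question(counts, 2024, random.Random(7))
payload = json.dumps(question)
```

Passing in a seeded `random.Random` keeps a given day's question reproducible, which is handy when regenerating a game file.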

When it came to spelling, things were a bit harder. I knew grouping names manually was impossible, even for a small number of questions. Instead, I turned to Python’s Pronouncing and Jellyfish libraries, which are built for phonetic matching. The goal was to give every name a pronunciation value, which let me group all the spelling variants in SQL.

These libraries are great, but it turns out that name pronunciation is hard. After a bunch of fiddling and testing, I settled on a two-part solution. First, I took each name and used the safe_phones function to convert it into phonetic symbols. But the results were too precise, even for names I knew belonged together.

So I took those phones and normalized them to strip out minor differences, like whether Arianna begins with more of an Ah sound or an Air sound. The downside of this broader grouping is that names I consider different, like Eric and Ericka or Anne and Anna, can end up in the same bucket. Ultimately, I combined both metrics to get plausible groupings, then exported them for manual review. It’s not perfect, but it’s good enough to confidently pick the most common spelling of the three options included in my quiz.
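To make the normalization idea concrete, here is one possible version, which is an assumption and not the rules actually used. It takes ARPAbet-style phone strings (the notation the CMU Pronouncing Dictionary uses), drops the stress digits, and collapses every vowel into a single placeholder so that small vowel differences stop mattering. The phone strings in the example are hand-written for illustration rather than looked up.

```python
import re
from collections import defaultdict

# ARPAbet vowel symbols; everything else is treated as a consonant.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def normalize_phones(phones):
    """Strip stress digits and replace each vowel with the placeholder 'V'."""
    symbols = [re.sub(r"\d", "", symbol) for symbol in phones.split()]
    return " ".join("V" if symbol in VOWELS else symbol for symbol in symbols)


def group_spellings(phones_by_name):
    """Bucket names whose normalized pronunciations match."""
    groups = defaultdict(list)
    for name, phones in phones_by_name.items():
        groups[normalize_phones(phones)].append(name)
    return dict(groups)


# Hand-written phone strings, for illustration only.
sample = {
    "Arianna": "AA2 R IY0 AA1 N AH0",
    "Ariana": "EH2 R IY0 AE1 N AH0",
    "Eric": "EH1 R IH0 K",
}
groups = group_spellings(sample)
```

Collapsing all vowels is deliberately blunt, which is exactly why a broad key like this produces false matches and still needs a manual review pass.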

🎨 Design

With the questions finalized, the next step was turning them into a game. I knew from the start I needed to keep things lightweight. I designed everything to run off static, cached files for when the game blew up to millions of daily players 😉. I avoided logins and stored no PII, and I prioritized design flourishes so it felt like a game rather than a test. I decided to start with vanilla HTML, CSS, and JavaScript to see how far that could take me. I stored game progress and user IDs in LocalStorage, giving me a primitive game state without needing a database.

From early on, I had a vision of a 5x3 grid of questions and answers. My typical design approach is to build something mediocre, then enlist my friend and former design partner to make it less terrible for me. But this time I wanted to do it myself. I bit the bullet, signed up for a Figma account, and started drawing. I iterated through several versions that mostly worked and bounced a few questions off some design tools for feedback.

Once I had the basic layout, I started working on the flourishes. I’d never worked with complex CSS animations, but I found plenty of examples that showed me how to make elements flip and dance around. Revealing different faces of an element caused a few odd bugs, including one where the entire layout appeared mirrored.

People love dark mode, so I also spent time tweaking colors to give the game a more polished look.

Next, I ported the raw HTML/CSS/JS into a basic Rails app and registered a domain. Finally, I shared the game with some friends to get feedback! But even before the feature requests came rolling in, I knew there was more I wanted to do.

🏅 User Performance

One issue I skipped in the design phase was a user feedback mechanism. Some of these questions are quite difficult, but there’s no way for a player to know that. I was reluctant to introduce a leaderboard, since that invites cheating and requires user accounts.

Some early testers suggested a progressive difficulty increase, like how the NY Times Crossword starts easy on Mondays and ramps up difficulty throughout the week. I considered manually assigning difficulties, but it’s incredibly subjective. As I’ve been playing along, I’ve been surprised at how often I’m wrong myself. If you know people with less common names or spellings, it can really skew how you think about a name.

Instead of trusting my gut, I decided to show how other players fared on each question. Since I was already logging basic data upon game completion (user_id, games played, etc.), I simply added per-question tracking. This allowed me to report exactly what percentage of users answered a specific question correctly. I initially hacked together a solution with Redis and an hourly cron job, but eventually caved and spun up a Postgres database when I realized I needed it for other features. I began generating a static JSON file every hour, which the front end uses to show players how they stack up.
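The hourly aggregation boils down to a tally per question. Here is a minimal sketch of that step in Python; the function names, the shape of the logged results, and the JSON keys are assumptions, and the real job reads from Postgres rather than from an in-memory list.

```python
import json
from collections import defaultdict


def percent_correct(results):
    """results: iterable of (question_id, was_correct). Returns {question_id: percent}."""
    tallies = defaultdict(lambda: [0, 0])  # question_id -> [correct, total]
    for question_id, was_correct in results:
        tallies[question_id][0] += int(was_correct)
        tallies[question_id][1] += 1
    return {
        question_id: round(100 * correct / total, 1)
        for question_id, (correct, total) in tallies.items()
    }


def stats_payload(game_date, results):
    """Serialize one day's stats as the JSON a static file could hold."""
    body = {"date": game_date, "percent_correct": percent_correct(results)}
    return json.dumps(body, sort_keys=True)


logged = [("q1", True), ("q1", False), ("q2", True)]
stats = percent_correct(logged)
```

Because the output is a plain JSON string, it can be written to disk and served like any other static asset.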

One wrinkle I had to deal with was time zones. A new game begins at midnight in a user’s local time, meaning two games are always running at the same time. In an abundance of caution, I configured the cron job to generate one file for yesterday, one for today, and one for tomorrow each time it runs. This ensures early risers in New Zealand and late-night players in Hawaii always see the right data.
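The reason three files are enough: time zones run from UTC-12 to UTC+14, so a player's local date is never more than one day away from the UTC date. A sketch of the date selection, assuming the cron job works from the current UTC date (the function name is hypothetical):

```python
from datetime import date, timedelta


def dates_to_generate(today_utc):
    """Return yesterday, today, and tomorrow relative to the UTC date."""
    return [today_utc + timedelta(days=offset) for offset in (-1, 0, 1)]


window = dates_to_generate(date(2025, 12, 10))
```

Using `timedelta` instead of adjusting the day number by hand keeps month and year boundaries correct.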

🤝 Social Sharing & Gamification

Most of my projects are meant for only one user: me. But I spent so much time on this project that I wanted to reach as many players as possible. So I started brainstorming ways to make the game easy and fun to share.

A quick and easy feature was tracking streaks in LocalStorage to incentivize daily play. It’s easy for a user to tweak their settings and game the system, but with no prizes at stake, I wasn’t too worried. That said, I’m tempted to add a feature where an impossibly high streak triggers brutally difficult questions.
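The streak rule itself is tiny. The game keeps this state in LocalStorage on the front end; the sketch below shows one plausible version of the logic in Python, with a hypothetical function name and the assumption that a streak survives only if the previous play was exactly one day earlier.

```python
from datetime import date, timedelta


def update_streak(streak, last_played, today):
    """Return the new streak length after a game is finished on `today`."""
    if last_played == today:
        return streak  # already counted today's game
    if last_played is not None and today - last_played == timedelta(days=1):
        return streak + 1  # played yesterday, so the streak continues
    return 1  # first game ever, or a missed day resets the streak


streak = update_streak(4, date(2025, 12, 9), date(2025, 12, 10))
```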

Streaks could help retain users, but I also needed a way to capture new ones. Another idea I borrowed from Wordle was the social sharing format. I wanted to replicate that grid-based format that instantly shows your score without needing any text.

I considered a version that showed both correct and incorrect answers in a grid. But unlike Wordle, that would give away the answer key to anyone who hasn’t played yet. Since I wasn’t interested in randomizing the options for every player, I scrapped the idea.

Instead, I started playing around with emojis. I picked visual representations for each of my five categories and used ✅ and ❌ to represent the results. This brought me closer to the Wordle approach without spoiling the fun.

After some testing, I hit a snag: emojis render differently on every OS, and multi-line text sharing is notoriously flaky on mobile. Instead, I shifted to the idea of generating *images* that could be shared on social media.

I realized that with five questions and two possible outcomes (correct/incorrect), there are only 32 possible game states (2⁵). I could represent every possible result as a simple binary number.

All correct: 11111 → 31
All incorrect: 00000 → 0
Only the 3rd question correct: 00100 → 4
Alternating: 10101 → 21
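The mapping in the examples above treats the first question as the most significant bit. A small sketch of the encoding and decoding, with hypothetical function names:

```python
def encode_results(results):
    """results: five booleans, question 1 first. Returns an integer from 0 to 31."""
    if len(results) != 5:
        raise ValueError("expected exactly five results")
    score = 0
    for correct in results:
        score = (score << 1) | int(correct)  # shift left, append the next bit
    return score


def decode_results(score):
    """Inverse of encode_results: turn 0..31 back into five booleans."""
    if not 0 <= score <= 31:
        raise ValueError("score must be between 0 and 31")
    return [bool((score >> shift) & 1) for shift in range(4, -1, -1)]


alternating = encode_results([True, False, True, False, True])  # 10101 -> 21
```

Going the other way, `decode_results(6)` reads 6 as 00110, meaning only the third and fourth questions were answered correctly.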

If a million users played, I certainly didn’t want to generate a million unique images. Instead, I built a system that only generates an image when it’s needed. I ended up breaking the whole thing into three parts:

1. I built a small HTML/CSS page that knows how to show the right emojis for a given score. The controller just passes along an integer (like 21), and the page translates that into the 10101 pattern.

2. I used Puppeteer to load that page, take a screenshot, and save the result as a PNG.

3. When someone shares their score, the app checks whether a PNG for that day’s result already exists. If not, it generates one and stores it. If it does exist, we simply reuse it. Then we point the user to a share URL (/sharing/2025-12-10/6) where that PNG is set as the og:image, so platforms know what to show.
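The third step is a classic generate-on-miss cache. Here is a rough sketch of the pattern in Python, not the Rails code: `render` stands in for the Puppeteer screenshot step, and the directory layout and function names are assumptions.

```python
from pathlib import Path


def share_image_path(cache_dir, game_date, score, render):
    """Return the PNG path for (game_date, score), rendering only on a cache miss."""
    path = Path(cache_dir) / game_date / f"{score}.png"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # render is a stand-in for the screenshot step; it returns PNG bytes.
        path.write_bytes(render(game_date, score))
    return path
```

With at most 32 scores per day, the cache tops out at 32 images per date no matter how many people share.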

When everything works, sharing the URL on LinkedIn or Facebook pulls in the right image and unfurls it cleanly.

Finally, anyone clicking through the link will play the game normally. But since that URL parameter (like 6) encodes the friend’s results, the game stores it. When the new player finishes, we display both scores side-by-side for a direct comparison.

I also added fireworks to the sharing button for a little extra razzle-dazzle.

🕵️ Obfuscation

Everything was coming together, but I still felt uneasy about leaving my game data out in the open. You might be surprised to learn that the answer to today’s Wordle is just sitting there in your browser’s network requests.

I realize the stakes here are effectively zero, but I still wanted to make potential cheaters work slightly harder than just opening the network tab. Since the game requests the daily questions, I decided to hide the payload a bit. I authored my game content in JSON, then wrote a rake task to compile that file into a binary blob before it ever hits the frontend. I decided not to obscure the stats, since they don’t actually give away the answers.

I used the same idea for the share URLs. I didn’t want users realizing that 31 always meant a perfect score, because that would let them spoof their results every day. I encoded the date and score into a short token and appended a tiny HMAC signature, producing a slug like frku-ddc6y that I use for both sharing and tracking.
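A signed slug along these lines can be sketched with Python's standard `hmac` module. This is not the scheme used in the game: the payload layout, the base64 body, the four-character signature, and the secret are all assumptions, and the slugs it produces are longer than the one shown above.

```python
import base64
import hashlib
import hmac


def _signature(payload, secret, length=4):
    """Short, lowercase base32 prefix of an HMAC-SHA256 over the payload."""
    digest = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.b32encode(digest).decode("ascii").lower()[:length]


def make_slug(game_date, score, secret):
    """Encode date and score, then append a truncated signature."""
    payload = f"{game_date}:{score}"
    body = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii").rstrip("=")
    return f"{body}-{_signature(payload, secret)}"


def read_slug(slug, secret):
    """Return (game_date, score), or raise ValueError if the slug was altered."""
    body, _, signature = slug.rpartition("-")
    padded = body + "=" * (-len(body) % 4)  # restore stripped base64 padding
    payload = base64.urlsafe_b64decode(padded).decode("utf-8")
    if not hmac.compare_digest(signature, _signature(payload, secret)):
        raise ValueError("bad signature")
    game_date, _, score = payload.rpartition(":")
    return game_date, int(score)


SECRET = b"server-side-secret"  # hypothetical; never sent to the browser
slug = make_slug("2025-12-10", 31, SECRET)
```

The body here is only encoded, not hidden, so the signature is what does the work: without the server-side secret, a player can't mint a valid slug for a score they didn't earn.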

Could someone reverse-engineer it? Sure. But if they’re going to those lengths for a name quiz, I’m fine to let them win that one.

🚧 Next Steps

Overall, I’m proud of how this project evolved from a few silly SQL queries into a real product. And I’m not quite done tinkering. My next major goal is to fully automate the content pipeline. I want to bulk-upload hundreds of questions and have a script stitch them into daily game files. That way the game runs itself and I don’t have to scramble to create new game files every few weeks.

Even when this project is 100% complete, I won’t be done with the data. The Social Security Administration releases the 2025 name stats in a few months, and I’m eager to pull in the new data and see how the leaderboards shift. And I’ve got plenty of other name-related research I plan to share soon.

In the meantime, I hope you enjoy my game! And if you have any burning questions that can only be answered by parsing ~150 years of name data, let me know and I’d be happy to dive in.
