3 min read · Just now
On my walk home from work, I had a strange idea: what if I could use AI to see how different models “think” when judging writing quality? To find out, I decided to compare two models side by side and see how each one ranks Medium articles.
Photo by Marcellin Steinhaus on Unsplash
I started by asking ChatGPT for a base script. My hands were freezing, so I dictated the whole thing using its voice-to-text feature. By the time I got home, all that remained was to refine the code, test it, and get it working.
The latest version of the script scrapes Medium titles, sends them to two models (GPT-4o-mini and GPT-4o), and compares how each one ranks them. It’s a small experiment, but it reveals how subtle the differences between “mini” and “full” models can be when judging creative content. To keep things short, here’s the link to the full code, and here are the full logs.
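If you just want the shape of the idea without opening the full code, here is a minimal sketch of the ranking-and-comparison step. It is not the actual script: the prompt wording, the function names (`build_prompt`, `parse_ranking`, `compare_rankings`, `rank_with`) and the `ask(model, prompt)` callable are all assumptions made for illustration. `ask` stands in for whatever API client you use, and the scraping step is left out entirely.

```python
import re
from typing import Callable, Dict, List


def build_prompt(titles: List[str]) -> str:
    """Build a ranking prompt (hypothetical wording, not the original script's)."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(titles, start=1))
    return (
        "Rank these article titles from best to worst.\n"
        "Reply with the title numbers only, comma-separated.\n\n" + numbered
    )


def parse_ranking(reply: str, n: int) -> List[int]:
    """Read a best-to-worst list of title numbers out of a model reply.

    Assumes the reply contains only title numbers, as the prompt requests.
    """
    seen: List[int] = []
    for token in re.findall(r"\d+", reply):
        k = int(token)
        if 1 <= k <= n and k not in seen:
            seen.append(k)
    if len(seen) != n:
        raise ValueError(f"expected {n} distinct title numbers, got {seen}")
    return seen


def compare_rankings(a: List[int], b: List[int]) -> Dict[int, int]:
    """Map each title number to (position in b) minus (position in a).

    A positive value means ranking b placed the title lower than ranking a.
    """
    pos_b = {title: i for i, title in enumerate(b)}
    return {title: pos_b[title] - i for i, title in enumerate(a)}


def rank_with(ask: Callable[[str, str], str], model: str, titles: List[str]) -> List[int]:
    """Ask one model for a ranking; `ask` is a placeholder for a real API call."""
    return parse_ranking(ask(model, build_prompt(titles)), len(titles))


if __name__ == "__main__":
    # Placeholder titles and canned replies, only to show the flow end to end.
    titles = ["Title A", "Title B", "Title C"]
    canned = {"gpt-4o-mini": "2, 3, 1", "gpt-4o": "3, 2, 1"}
    fake_ask = lambda model, prompt: canned[model]

    mini = rank_with(fake_ask, "gpt-4o-mini", titles)
    full = rank_with(fake_ask, "gpt-4o", titles)
    print(compare_rankings(mini, full))
```

Passing `ask` in as a parameter keeps the comparison logic testable without network access; swapping in a real client only means replacing `fake_ask` with a function that sends the prompt to the named model and returns its text reply.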
gpt-4o-mini’s rankings: