This guest blog post is written by /u/tornmandate, whom we noticed had developed an interesting algorithm. Many thanks to Torn for coming up with this method and writing this blog post.
Introduction
SteamDB informed me they liked my formula, and it seems that many users would agree it produces better results than Wilson’s score interval’s lower bound. So this is a short write-up on how I came up with it, and why it’s probably a better estimate of the score.
If you’ve ever examined the Steam store more closely, you’ve probably noticed that their method of sorting games by review score is inadequate. They just divide the positive reviews by the total reviews to get the rating. A game with a single positive review would be ranked above some other game that has 48 positive reviews and a single negative review. While they do have thresholds at 50 and 500 total reviews—meaning no game with at least 80% positive reviews will rank below a game with fewer than 50 reviews, and no game with at least 95% positive reviews will rank below a game with fewer than 500 reviews—it’s still an inadequate system that leads to volatile rankings based on small sample sizes.
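To see the problem in code, here is a minimal sketch of that naive rating (the names are mine, purely for illustration):

// Steam's naive rating: positive reviews divided by total reviews.
function naiveRating( positiveVotes: number, negativeVotes: number ): number {
    return positiveVotes / ( positiveVotes + negativeVotes );
}

console.log( naiveRating( 1, 0 ) );  // 1.0: a single positive review scores a perfect 100%
console.log( naiveRating( 48, 1 ) ); // ≈0.98: 48 positive and 1 negative ranks below it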
I looked over some alternatives, including the Wilson formula that SteamDB used, but I didn’t quite like any of them. Some sites appeared to have effective sorting formulas, but the exact nature of those formulas was not disclosed. So I sat down to create my own, starting by trying to put into words the rules by which Steam’s games should be sorted.
Core Principles
- Games with more reviews should have an adjusted rating that is closer to their real rating, because the more reviews we have, the more certain we are that the score they give us is correct.
- All ratings should be biased toward the neutral midpoint — 50%.
And that’s all, really. The rules just need to be made a little more precise: "For every 10x the reviews we have, we should be 2x more certain that the rating is correct." Almost good, but I can’t quite pin down what "2x more certain" means. It does sound equivalent to "2x less uncertain", though, and I can work with that.
Clearly, at 0 reviews we’re 100% uncertain as to what the rating should be. Let’s stretch that a bit and say that we’re 100% uncertain at even just 1 review; then we can apply the earlier thought. So at 10 reviews we should be 2x less uncertain, that is, 50% uncertain. At 100 reviews, 25%; at 1000 reviews, 12.5%; and so forth.
So given a game with 100 reviews of which 90% are positive, we’re 25% uncertain that this 90% is the correct rating. So we’re 75% certain that it is the correct rating. In other words, 75% of the final rating is 90%, and the other 25% is the average rating of 50%, which also nicely fits with our second rule. This gives us a final rating of 75% × 90% + 25% × 50% = 80%. This looks good, and these rules can be translated into a formula that gives us an adjusted rating with respect to the number of reviews and the "real rating".
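That reasoning translates directly into code. Here is a small TypeScript sketch of the idealized rule (my own illustrative names, not the reference implementations below), where uncertainty halves for every tenfold increase in reviews:

// Idealized rule: 100% uncertain at 1 review, halving for every 10x reviews.
function uncertainty( totalReviews: number ): number {
    return Math.pow( 2, -Math.log10( totalReviews ) );
}

// Blend the observed score toward the 50% midpoint by the uncertainty.
function adjustedRating( reviewScore: number, totalReviews: number ): number {
    const u = uncertainty( totalReviews );
    return ( 1 - u ) * reviewScore + u * 0.5;
}

console.log( adjustedRating( 0.9, 100 ) ); // 0.75 * 0.9 + 0.25 * 0.5 = 0.8

Note that (1 − u) × score + u × 0.5 is algebraically identical to score − (score − 0.5) × u, which is the form used below. The published formula also adds 1 to the review count inside the logarithm, so that zero reviews is well defined and yields exactly 100% uncertainty, i.e. a 50% rating; its outputs therefore differ very slightly from this idealized sketch.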
The Formula
Total Reviews = Positive Reviews + Negative Reviews
Review Score = Positive Reviews / Total Reviews
Rating = Review Score − (Review Score − 0.5) × 2^(−log10(Total Reviews + 1))
Why This Works
Compared to Wilson’s formula, it’s very short, and the idea behind it is far easier to understand. A fair question is: "Why would this random formula give better results than what a very established mathematician came up with?" I can’t really answer that, but it seems many of you agree that it does indeed produce better results when rating Steam’s games. I can, however, try to give some insight into this.
For one, Wilson’s formula isn’t really meant to be used quite like this. It does output a confidence interval stating "we are X% sure that the score is between x and y", and you could reasonably take the middle of that interval instead of the lower bound, but the real limitation runs deeper: Wilson’s formula has no mechanism for incorporating prior expectations about what the baseline score should be. It cannot account for a strong prior belief like "without any data, I expect the score to be around 50%", and this missing prior is what creates the counterintuitive results we see when it’s used for rating systems.
Secondly, this lack of baseline bias leads to problematic behavior. Something that just came out and gets a single negative review will be scored much lower than an established terrible game with 10 positive and 500 negative reviews, even though our intuition tells us that a single data point shouldn’t be weighted so heavily. This is why one of the two rules I listed was that all ratings should be biased toward the neutral midpoint.
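To make this concrete, here is a small sketch in TypeScript (my own illustrative code, not one of the reference implementations below) of the lower bound of the 95% Wilson score interval, applied to those two games:

// Lower bound of the Wilson score interval at 95% confidence (z ≈ 1.96).
function wilsonLowerBound( positive: number, total: number ): number {
    if ( total === 0 ) {
        return 0;
    }
    const z = 1.96;
    const p = positive / total;
    const centre = p + ( z * z ) / ( 2 * total );
    const margin = z * Math.sqrt( ( p * ( 1 - p ) ) / total + ( z * z ) / ( 4 * total * total ) );
    return ( centre - margin ) / ( 1 + ( z * z ) / total );
}

console.log( wilsonLowerBound( 0, 1 ) );    // 0: brand-new game with one negative review
console.log( wilsonLowerBound( 10, 510 ) ); // ≈0.011: established game, 10 positive / 500 negative

The Wilson lower bound puts the brand-new game (score 0) below the established terrible one (score ≈ 0.011). The formula from this post instead gives the one-negative-review game roughly 40.6% and the 10-positive/500-negative game roughly 9.3%, matching the intuition that a single data point shouldn’t be weighted so heavily.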
Finally, while Wilson’s formula probably gives us a more "precise" rating, so to speak, that isn’t necessarily what we want to see. There’s a lot of mathematics behind why what it does is correct, whereas the numbers 2 and 10 that I picked for my formula are rather arbitrary. Still, I chose them so that the result also rewards a high number of reviews when assigning a good score, which is why you’ll probably notice far fewer games with a low review count among the top games than before.
I think that’s important, because a game that is very popular and very highly rated should be ranked higher than a game that is equally highly rated but less popular. Not because we can be more certain that its rating is correct, but because you, as someone who hasn’t tried the game, are more likely to enjoy it if many other people have as well (assuming it’s not a niche game). And I think this aspect is definitely important and should be accounted for when trying to represent an entire game with just a single number.
Implementation
The SteamDB site now uses this new algorithm on all of its pages. You can see it in action on the top rated games page, and, as a new addition, the browser extension now displays SteamDB ratings on Steam Store game pages. Games with fewer than 500 total votes will show an ❓ unverified icon to signal that there isn’t enough certainty in the rating. You can also find reference implementations in PHP and JavaScript below.
Important note: the SteamDB rating is calculated from reviews of all purchase types, whereas Steam counts only Steam purchases, and Steam may calculate the rating from reviews in your language alone if there are enough of them.
LaTeX
\( \text{Total Reviews} = \text{Positive Reviews} + \text{Negative Reviews} \)
\( \text{Review Score} = \frac{\text{Positive Reviews}}{\text{Total Reviews}} \)
\( \text{Rating} = \text{Review Score} - \left( \text{Review Score} - 0.5 \right) \times 2^{-\log_{10}\left( \text{Total Reviews} + 1 \right)} \)
PHP
function GetRating( int $positiveVotes, int $negativeVotes ): float {
    $totalVotes = $positiveVotes + $negativeVotes;
    // With zero votes the score is maximally uncertain; return the 50% midpoint.
    if ( $totalVotes === 0 ) {
        return 50.0;
    }
    $average = $positiveVotes / $totalVotes;
    // Pull the observed score toward 0.5; the pull halves for every 10x votes.
    $score = $average - ( $average - 0.5 ) * 2 ** -log10( $totalVotes + 1 );
    return $score * 100;
}
JavaScript/TypeScript
function GetRating( positiveVotes: number, negativeVotes: number ): number {
    const totalVotes = positiveVotes + negativeVotes;
    // With zero votes the score is maximally uncertain; return the 50% midpoint.
    if ( totalVotes === 0 ) {
        return 50.0;
    }
    const average = positiveVotes / totalVotes;
    // Pull the observed score toward 0.5; the pull halves for every 10x votes.
    const score = average - ( average - 0.5 ) * Math.pow( 2, -Math.log10( totalVotes + 1 ) );
    return score * 100;
}
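As a quick sanity check, here are a few example calls using the numbers from earlier in this post:

console.log( GetRating( 90, 10 ) ); // ≈80.0: matches the worked example above
console.log( GetRating( 1, 0 ) );   // ≈59.4: a lone positive review no longer scores 100%
console.log( GetRating( 48, 1 ) );  // ≈83.2: now correctly ranked above the one-review game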
This post and formula provided above are dedicated to the public domain. Reference implementations are available under the MIT license. Feel free to use it anywhere for any purpose.