Some Statistical Analysis on the Compo Results

Posted by (twitter: @the_vrld)
September 20th, 2013 3:26 am

… or: A Guide on How To Score High Next Time.*

Recently, ashdnazg posted a handy CSV containing all Compo and Jam entries that received a rating. Being somewhat of a data-nerd, I decided to take a deeper look at the results. In particular, I was interested in one question:

“Do ratings in one category influence the ratings in another category, and if yes, how much?”

One way to answer this question is plotting the ratings in category A against the ratings in category B using a scatter plot. If A depends on B (or vice versa), we will roughly see a ‘line’. The line will become more ‘blurred’ the less dependent the categories are.

You can also put a number on the dependence using MATHS. More specifically: Pearson’s correlation coefficient. The correlation coefficient measures the linear correlation (dependence) between two variables. As a result, it will give you a value between -1 and 1, where values close to -1 or 1 translate to “highly correlated” and 0 means “no linear correlation”.
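For the curious, here is a minimal sketch of how such a coefficient can be computed by hand. The ratings below are made up for illustration and are not taken from the actual CSV, and the function name `pearson` is my own choice for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance, up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Made-up ratings for five hypothetical entries (NOT from the real CSV).
fun     = [4.1, 3.2, 2.5, 3.8, 1.9]
overall = [4.0, 3.0, 2.7, 3.9, 2.1]
humor   = [2.0, 3.5, 1.0, 2.2, 3.1]

r_fun = pearson(fun, overall)
r_humor = pearson(humor, overall)
print(f"fun vs overall:   {r_fun:+.2f}")
print(f"humor vs overall: {r_humor:+.2f}")
```

In this toy data the fun ratings track the overall ratings closely, so the first coefficient comes out near 1, while the humor ratings give a value much closer to 0.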

Before jumping to conclusions, it’s time for a disclaimer: 

“Correlation does not imply causality” – i.e. high correlation of ‘Theme’ and ‘Mood’ can mean that a high rating in ‘Theme’ will lead to a high rating in ‘Mood’ or the reverse. Or it can mean there is no causality at all.


Anyway, here are the results of my analysis (correlation coefficient is shown in the lower right of the plots):

“Overall” Rating

As suspected, the overall rating is highly correlated with fun:

[Plot: compo-Overall-Fun]

Surprisingly, it is less correlated with good graphics or innovation:

Here are the remaining plots. As you can see, the overall rating is almost independent of the humor rating, but surprisingly very correlated with the mood.

[Plot: compo-Overall-Mood]

Audio and Theme have a moderate correlation with the overall rating.

There are two things we can take from this:

1. If you want a high overall rating, make your game (a) fun and (b) pretty. If you have time, try to evoke some intense feelings (mood).

But perhaps more important:

2. The Fun and Overall rating could be turned into one combined rating (“overall”).



This post is too short to show or even analyse all the correlations of each category with each other category. However, I created a zip containing all the graphs (including jam results) so you can draw your own conclusions. You can download the file here.


* Not really though. See the disclaimer.

9 Responses to “Some Statistical Analysis on the Compo Results”

  1. Nice work! These all make sense as far as I can see. Gameplay is the number one thing that should be valued in a game, good to see that others seem to feel that way. And as you mentioned, mood correlates to emotion, and nothing is better at evoking positive feedback from humans than emotion.

  2. caranha says:

    That is a very nice set of graphics you made there! Maybe you could try making correlations between other pairs of variables?

    As for mood, another reason for the high correlation might be that people are just not very sure how to rate mood, and just treat it as a proxy for overall. It would be interesting to see how it correlates to other variables.

  3. @caranha
    Well yeah, a lot of times I found that I didn’t know how to rate mood because the game didn’t give me any feelings whatsoever (which is perfectly fine). In those cases I just left the rating out (I did the same for humour on almost all games).

  4. ashdnazg says:

    It’s nice to see statistics are consistent!

    I’m not certain Fun and Overall can be combined, as there are those few games which are not fun at all and rely on mood. And others like my own have the fun score way higher than overall (17 vs. 120 IIRC).
    I definitely didn’t deserve a top 20 Overall, but I think my game is ludicrously fun.

    • vrld says:

      Seems as though statistics really works 😉

      Good point about not dropping the Fun category. I am still worried about the high correlation though. Maybe it could help to rename “Overall” to “General Impression”?

      • ashdnazg says:

        You look at it from a scientific point of view, where it’d be reasonable that categories should be “tangent” to each other.

        However, by definition “Overall” is dependent on each of the other categories and it’s no surprise some categories will have a greater weight.

        People aren’t computers, and criticism isn’t math, you can’t make a function that gets the separate categories and outputs the overall. (I did try to make one, it wasn’t as accurate as I’d have wished)
        That’s why you need an Overall category even though it’s highly correlated with fun/mood/whatever.

  5. Codexus says:

    I think it’s a problem that people are not taking production values enough into consideration when voting for the Overall category. It makes me sad when I see games that obviously took a lot of work and skills to make beaten by simplistic games.

    • caranha says:

      I think there might be a bit of cognitive bias at work here: If you rate a game low on graphics and sound, but high on fun and innovation (and I know I’ve seen my fair share of those), it might feel “wrong” to give it a low overall score, even if it would be the right thing to do.
