Before I start this post I just want to say that I am not writing this out of spite or bitterness towards the results, and I am not angry about anything. I am completely proud of my entry and what I achieved during this LD (my first entry), so please don’t dismiss this post as a bitter rant about my personal rankings.
But I do want to bring up the question of how fair the results are, and whether the way they are currently calculated is working as intended…
From what I can see and assume, the results are calculated as a simple average of the ratings you were given in each category. So, for example, if you were rated by 10 people, the ratings for each category are tallied up and divided by 10 to give you a score for that category. This seems a bit broken for a competition like this, where each entry gets a wildly different number of votes/ratings.
So in theory if a game was only rated by 1 person, but that person gave the game 5/5 for each category, then it would get a score of 5.0 for each category… This seems wrong to me and allows for wild fluctuation in the results depending on how many people played/rated your game.
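To make that concrete, here is a minimal sketch in Python of the kind of plain per-category mean I am assuming is used (the function name and the example ratings are mine for illustration; I have no insight into how the scores are actually computed):

```python
# A plain per-category mean -- my assumption of how scores are calculated,
# not anything confirmed by the organisers.
def category_score(ratings):
    """Average a list of 1-5 ratings for one category of one entry."""
    return sum(ratings) / len(ratings)

# A game rated 5/5 by a single voter...
print(category_score([5]))              # 5.0
# ...outranks a game rated 5/5 by nine voters and 4/5 by one.
print(category_score([5] * 9 + [4]))    # 4.9
```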
Let’s take a look at some examples and see if what I am saying seems to make sense:
Looking at the top 25 list, it doesn’t take long to go down it and find a game that scored *very* well but, on closer inspection, was only played by a handful of people…
Lonely Hated Rock – Came 5th overall in the competition with scores of #8 in theme, #21 in fun, #30 in innovation, #94 in mood and #128 in graphics. Very good scores all round! I’m impressed! That is, until I look at the spreadsheet and find that this game was only played and rated 16 times…
Pocket Planet – Came 10th overall and joint 1st in fun! Well done… 20 people rated this entry.
Pale Blue Dot – 16th overall and #23 in the fun category… but was only played/rated by 20 people.
Necro Gaia – #16 overall, #11 for graphics, #16 for fun and #50 for mood. This game actually ranked in the top 100 for 6 categories! Wow, that’s amazing. How many times was that game played/rated? 18!! (What’s worse… the author of this game had a coolness rating of 0… yep, according to the data, they never played/rated any other game.)
Using the latest play/rate data from https://docs.google.com/spreadsheet/ccc?key=0Ao74NZQqNUt5dDNvZUJ1UXVqZGkxUGVlVkxlZ3JnM2c#gid=0
I am not bashing any of the entries in my examples, merely using them to point out how having fewer people play your game seems to benefit your final result.
It would seem that using an average score to produce the final result is flawed in a competition such as this. The mean is a great metric, but it should only be used when you can guarantee that each entry gets roughly the same number of data points. If that is not the case, you need to look at alternatives and do a bit more statistics to get fairer results.
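One common alternative (not something I am claiming Ludum Dare does or must adopt, just an example of the kind of statistics I mean) is a Bayesian-style weighted average, similar in spirit to the formula IMDb has used for its Top 250: the fewer ratings an entry has, the more its score is pulled towards the overall average across all entries. A rough sketch, where the 3.5 prior and the 20-vote threshold are invented purely for illustration:

```python
def weighted_score(ratings, prior_mean=3.5, min_votes=20):
    """Bayesian-style weighted average of 1-5 ratings.

    prior_mean -- assumed mean rating across *all* entries (invented here)
    min_votes  -- how many votes an entry needs before its own mean dominates
    """
    v = len(ratings)
    r = sum(ratings) / v
    return (v * r + min_votes * prior_mean) / (v + min_votes)

# One perfect rating no longer beats ten nearly perfect ones:
print(weighted_score([5]))              # ~3.57
print(weighted_score([5] * 9 + [4]))    # ~3.97
```

With something like this, an entry rated by very few people gets a score close to the overall average, and it only climbs towards its raw mean as more votes come in.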
It really depends on your viewpoint, but as I have pointed out above, if the final score is simply the average, the competition and results are open to abuse and unfair outcomes.
Maybe somebody could compile a spreadsheet showing the number of ratings each game got, ordered by the overall ranking of the game… it would be interesting to see how high up some games can get with only a handful of people rating their entry…
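If anyone wants to throw that together, here is a rough sketch of what I mean (the column names "Entry", "Overall Rank" and "Times Rated" are guesses; the real spreadsheet linked above almost certainly labels things differently):

```python
import pandas as pd

# Hypothetical CSV export of the play/rate spreadsheet -- the column names
# will need to be adjusted to whatever the real sheet actually uses.
df = pd.read_csv("ld_playrate_data.csv")

# Sort by overall ranking and show how many ratings each entry received.
table = df.sort_values("Overall Rank")[["Entry", "Overall Rank", "Times Rated"]]
print(table.head(25).to_string(index=False))
```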
EDIT: I put this example in one of my reply comments below, but I am bringing it up into the original post for more exposure.
Here is an extreme example that highlights, in a simple way, how the system doesn’t work for the current dataset:
With the current system, a game that is played and rated by 10 people, all giving it 5/5, would score a 5.0 in the final rankings… but a game that is played by 100 people and rated 5/5 by 50 of them and 4/5 by the other 50 would score only 4.5. Consider this example and ask yourself whether that seems correct. Personally, I think it highlights how the final rankings don’t correctly reflect the ratings a game received.
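Running those two hypothetical games through both the plain mean and the weighted average sketched above (again, the 3.5 prior and the 20-vote threshold are invented numbers) shows how the ordering flips:

```python
game_a = [5] * 10                 # 10 votes, all 5/5
game_b = [5] * 50 + [4] * 50      # 100 votes, half 5/5 and half 4/5

# Plain mean (the current system, as far as I can tell):
mean_a = sum(game_a) / len(game_a)    # 5.0
mean_b = sum(game_b) / len(game_b)    # 4.5 -> game A "wins" despite 10x fewer votes

# Bayesian-style weighted average (illustrative prior and threshold):
def weighted(ratings, prior=3.5, m=20):
    v, r = len(ratings), sum(ratings) / len(ratings)
    return (v * r + m * prior) / (v + m)

print(mean_a, mean_b)                       # 5.0 4.5
print(weighted(game_a), weighted(game_b))   # 4.0 vs ~4.33 -> game B now ranks higher
```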