5 Stars of Ambiguity

Posted by (twitter: @csanyk)
September 12th, 2012 9:31 am

Lots of blog posts this morning about ratings.

I wrote this post last week when I got started with reviewing the LD24 entries.  Mostly this was advice for how to make YOUR game if you want me to rate it higher.  Partly it was advice on how to look at a game when giving it your rating.

I recently posted a comment in Oogby’s blog, where he was talking about his own feelings about rating LD games:

Ludum Dare ratings are highly subjective.  It shouldn’t matter because the weight of averages should (in theory anyway) correct for bad judges, even if everyone is a bad judge. [This works better in theory than in practice, but generally speaking if you really could get all 1400 of us to rate every game, it would work pretty well in practice.]

It’s not a perfect system and it’s not intended to be.  There’s almost nothing at stake beyond how much value you personally place on the rankings, so it’s not a big deal.  It’s a pragmatic solution to find a way to recognize the better efforts among the pack of entrants, and it works well generally although, sure if you wanted to quibble about the #7 game being inferior to the #8 game in some category, I’m sure that happens.  That’s why we all get to apply our own rating — we can rank however we want, secure in the knowledge that we know best.  The overall ranking scores merely tell us what the participants as a whole felt about the games they rated.

Not everyone rates every single game, most in fact do not, and it’s easy to overlook great games.  Most of the best games from LD23 I didn’t even find until after the rankings were published.

It’s best not to take rating (either rating someone else’s game, or how others rate your own game) too seriously.  They’re just opinions, after all.  You can use them to guide how you think about the objective quality of your game if you want, or you can discard them entirely, or rage at them, or anything you want to.

I think that’s worth keeping in mind.

When I was younger, I would have felt very strongly about the ratings meaning something objective and quantifiable.  I now look at this view as naive, and see the ratings as just a tool.  An imperfect tool, and something not to take as written in stone, or objective in any way.  They’re the aggregated consensus of our individually limited opinions.  We can be wrong or right about any particular factor that we base our ratings on.  We can have different tastes.  We can see things others miss, and miss things others see.

Nevertheless, I feel like when I apply a star-rating to a category for some game that I’m rating, I need to know what that rating means.  It may well mean something completely different to everyone else who applies their rating to the same game, but I need to know what I mean by it when I give that rating.  I expect there’s some similarity, as well, of course, but probably quite a bit of variance.

It’d be controversial for me to say “you’re all rating wrong, here’s how to do it.”  I’m not saying that, at all.  I just want to convey my viewpoint and my thoughts, in order to share them.  And you may agree or not, and in any case whether you agree or not is less important and less interesting than whether you choose to engage me with your thoughts and allow me to be shaped in turn by what I think about them and how I choose to respond.  I’m always looking for new ways to think about things, hopefully to improve myself, but also to understand others.

So, then:  What the hell is a “star”?

Does “star” have an inherent Meaning?

Stars sound like good things.  We have five star movies, hotels, and generals, and by and large we think highly of these things, or at least respect them.

But what about other symbols?  We don’t use them, but I could conceive of a strange rating system that would allow me to grant all the Lucky Charms symbols, not just stars.  Maybe a mix of stars and hearts, or maybe all the Zapf Dingbats.  I kind of want to have a few airplanes and boat anchors in my ratings.

That’s absurd, isn’t it?  These things are just tokens, they don’t stand for anything, they’re just a thing to be counted, is my point.  Sure, we could invent meaning:  the stars mean talent, the hearts mean passion, the boat anchors mean something else, I don’t know… but that’s not the point of how the rating system is set up, is it.

We keep it simple so that we can understand what a rating means quickly, at a glance.  If we wanted to take a lot of time to mull over a complex rating, we might as well take that time to experience the game directly.  Simple rating scores sacrifice nuance and detail and precision, and that’s okay. It’s more than OK, it’s a feature.

We add complexity back in by having multiple ratings categories.  That way we can fairly assess specific aspects of our games.  But we don’t want to have so many categories that the ratings become once again too complicated.

Minimum rating?

Even a 1-star general is still a general.  We don’t think they’re a “bad” general.  They’re just a lower rank than a 2-star general.

But when we think of a 1-star hotel, restaurant, or movie, we think they’re to be avoided. A videogame is more like a movie than like a military officer, so it’s tempting to discard the notion that a 1-star rating could mean something is good.  Some people really shy away from using the 1 (or sometimes the 5, or both) star ratings, preferring to reserve these for the ultra rare games that “truly” deserve the highest or lowest rating.  Effectively, they constrain themselves to a 4- or 3-star system, then.  Which, when you realize that, is pretty silly.

Personally, I believe that unless you’re willing to use the full range of the rating system, you’re not using the system correctly.  That means, for me, I have no problem assigning 1 star to a game if I think it deserves 1 star.  I don’t take into account whether the developer never programmed before, or if this was their first game; in fact, I expect those entries to be of inferior quality and to get a lower rating.

I hope this doesn’t discourage anyone from continuing to make games and try to do better each time.  We all start out sucking.  We enjoy what we do and we have some success at it, and we keep doing it, and we get better.  Low ranking shouldn’t be interpreted as “give up, find something else” — it should mean “keep trying, learn and get better.”  Of course, some people maybe will give up, and find other things to do.  But this is a decision that should come from within, not be influenced by what other people think.

I have a sense that some reviewers may be uncomfortable giving a “poor” rating to a game.  I’m probably a bit opposite of that, I am more stingy with my high ratings.  But I do use the full scale.  I expect most games to be fairly low.  After all, we only had 48 hours to develop them, how good can they really be?  (Surprisingly good, in some cases, and these are the ones that get 4 and 5 stars.)  But when I apply ratings, I rate the games as games, not as games that were developed in 48 hours.  My thesis is that a game developed in just 48 hours can be awesome.  There have been a few that I have enjoyed every bit as much as I’ve enjoyed the best games of all time.  That’s really amazing when you think about it, but it’s true.

At its core, a 5-unit ranking system is just a 5-unit ranking system.  How we choose to interpret the numbers can vary. Consider the following 5-value series:

  • 1 2 3 4 5
  • -2 -1 0 1 2
  • F D C B A

They mean different things, don’t they?
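To make the difference concrete, here is a small Python sketch that labels the same five slots in each of the three ways listed above. The names `SCALES` and `label` are invented for this illustration and have nothing to do with how the Ludum Dare site stores ratings.

```python
# Three ways of labelling the same five rating slots (illustrative only).
SCALES = {
    "count":  [1, 2, 3, 4, 5],
    "signed": [-2, -1, 0, 1, 2],
    "letter": ["F", "D", "C", "B", "A"],
}

def label(stars, scale):
    """Return how a 1-5 star rating reads under the named scale."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return SCALES[scale][stars - 1]

# Show the lowest, middle, and highest slot under every scale.
for stars in (1, 3, 5):
    print(stars, [label(stars, name) for name in SCALES])
```

The same single star reads as 1, as -2, or as an F, depending on which row the reviewer has in their head.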

Well, I tend to look at the rating system as 1, 2, 3, 4, 5.

A person who tends to avoid giving out one-star ratings probably interprets the system as -2 -1 0 1 2.  To them, giving a -2 feels negative, and they don’t want to be negative and discourage someone who put their heart into their project, and it was the best they could do, just not very good.  So they subconsciously weight the rating “relative to what I think the developer’s ability must have been”.  And unfortunately, since we are comparing the games against each other, this only skews the rankings.

Someone looking at it like the American primary school letter grade system has yet a different way of looking at it:  A = 100-90%, B = 89-80%, C = 79-70%, D = 69-60%, F <= 59%.  Or perhaps they “grade on a curve” and try to adjust the ratings they give the game relative to the ratings they’ve already given to other games.  They want to “normalize” the numbers so that the “average” score is a C, and there’s a normal distribution of the other letter grades.

My point is that everyone can make some claim of their way of thinking about the 5-tiered rating in one of these ways, or perhaps yet another way, and be at least somewhat justified.

I would, however, strongly advise against trying to “grade on a curve” because it’s impossible to know what the curve should be until you’ve assessed every single game.
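A rough Python sketch of why the curve keeps moving. Everything here is an assumption made for illustration: the 0-100 "private impression" scores are made up, and `curved_grade` uses a simple hypothetical curve where the lowest fifth of the games seen so far get an F, the next fifth a D, and so on.

```python
def curved_grade(score, scores_seen):
    """Grade `score` by its rank among the scores seen so far."""
    # How many of the games seen so far scored strictly lower.
    below = sum(1 for s in scores_seen if s < score)
    # Split the ranking into five equal buckets: F, D, C, B, A.
    bucket = min(below * 5 // len(scores_seen), 4)
    return "FDCBA"[bucket]

early = [40, 50, 55, 60, 70]             # the first five games played
later = early + [80, 85, 90, 95, 98]     # five stronger games turn up later

print(curved_grade(70, early))  # A
print(curved_grade(70, later))  # C
```

The game that scored 70 did not change, but its curved grade dropped from an A to a C once more games had been played. Any ratings submitted early would have been given against a curve that no longer holds.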

Anyhow, based on my way of looking at it, you shouldn’t feel bad if you get a 1 or 2 rating.  That’s still better than zero, right?

Zero Ratings

In some sense, 0 might be the lowest possible rating.  But that’s not really true.  0 really means “not applicable”.

It’s hard to say someone did a bad job at [category] if they weren’t trying for that category.

But it’s still ambiguous.  A zero could also mean “I don’t know” or “I forgot to rate this category.”  So I don’t like to encourage thinking of 0 as “lowest”.  I don’t know how they do the math when they calculate the rankings, but I hope that 0’s don’t get counted against a game.
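How much this matters is easy to see with a quick sketch. This is not how Ludum Dare actually computes its rankings (that is unknown here); `category_average` is a hypothetical helper that compares treating 0 as "not rated" against treating it as the lowest score.

```python
def category_average(ratings, skip_zeros=True):
    """Average star ratings; optionally treat 0 as 'not rated' and skip it."""
    counted = [r for r in ratings if r != 0] if skip_zeros else list(ratings)
    if not counted:
        return None  # nobody actually rated this category
    return sum(counted) / len(counted)

audio = [4, 0, 5, 0, 3]  # made-up ratings; two reviewers left audio blank

print(category_average(audio))                    # 4.0
print(category_average(audio, skip_zeros=False))  # 2.4
```

Two blank ratings are enough to drag a 4-star category down to 2.4 if the zeros are counted as scores.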

My 5-scale

Here’s how I do it:

0 = not applicable.  You didn’t try to do anything for this category, and it wouldn’t be fair to rate the game poorly based on it not having [category].

1 = Well, you did the minimum.  At least there’s something.  OR, the game sorely lacks [category] and needs it.  It’s one thing if you make a silent game because you are using silence as an element of the game.  It’s another if you made a game that really should have sounds, but doesn’t, because you don’t have the ability to fit delivery of the sound features in the time allotted.  But probably all you managed here was a basic implementation, maybe some kind of placeholder content or “hello world” level implementation.  Maybe it’s buggy, or maybe it’s just an idea that didn’t pan out, or maybe it’s something that had potential but needed lots more polish and balancing to make it work well.

2 = It needs something more to feel “finished” or “good”, but it’s more than just “placeholder content.”  For a 48 hour project it’s enough to get by, and is probably all the average hobbyist developer can reasonably manage in that amount of time if they’re not a professional [programmer|designer|artist|musician|sound engineer|whatever].  Definitely don’t feel bad if you get a 2-star rating from me!

3 = Pretty solid, it’s evident that time was put into it, and you have some idea what you’re doing, and are talented or have a knack for good taste or good decision making. There may be flaws, and some additional polish would probably help, but as is, this is pretty good.

4 = Well conceived and well executed. Probably well balanced and consistently high quality across the whole of the game, too.  Genuinely fun.  Starting to not feel like it’s experimental at this point.

5 = Amazing!  Your game stands up well against anything I’ve ever played.  Professional quality.  If I’m rating graphics, this does NOT mean Crysis.  Asteroids has 5-star graphics, too.  It means that your graphical style works very well and as a cohesive whole makes your game look awesome.  There are MANY ways to look awesome.  Just as there are many ways to excel at all of the other categories.

5 Responses to “5 Stars of Ambiguity”

  1. AaronYip says:

    A couple comments caught my eye.

    “When I was younger I would have felt very strongly about the ratings meaning something objective and quantifiable. I now look at this view as naive, and that the ratings are just a tool. An imperfect tool, and something not to take as written in stone, or objective in any way.”

    As medium-makers, game developers are servants. They can be an Emily Dickinson and only privately serve their own muses and friends. Or like chefs, they can serve masses. The ratings, while very much just opinions, reflect the /immediate perception/ of a game’s players. Numbers mean nothing to a modernist, who expects his audience to “get” his painting or find their own experiences in his work–and so, he may be considered the greater artist. But for those honing their craft towards the other goal–actively acting to bridge the gap between their art and their audience’s understanding of it–ratings mean something more tangible. They become an evaluation of proper affordances and feedback; they represent accessibility as well as subjective quality. Agreed, ratings are not an evaluation of a game. It’s the measure of whether any player can immediately understand, appreciate, and enjoy the experience–and for some, that’s the more important tool.

    “We all start out sucking.”

    I’ve followed the next part of this quote: “And the day we forget that we suck is the day we never improve again.” Sucking is relative and a matter of perspective. Instead of being afraid of the notion, a person should realize that “sucking” is merely the measure of how much potential we have to improve. 😛

    • csanyk says:

      AaronYip, I like the way you’re thinking here. To quibble just a bit, I don’t think I ever said that “ratings are not an evaluation of a game.” I don’t know how else to take a rating BUT that. How would I rate a game without evaluating it? I try to evaluate every game that I rate carefully, consistently, and completely, to the extent that is humanly possible. What I /would/ reject is a notion that ratings are absolute, that they are objective, or that they say everything that can be said about a game.

    • johnfn says:

      This comment – specifically the second paragraph of this comment – is amazing.

  2. mdkess says:

    From my post, I didn’t mean to suggest that there was a ton at stake or that the ratings system was so flawed that it was broken. I mean, ultimately LD is just about creating something cool – the ratings seem like more of a way to get people to play games than anything, and create a sense of community. But that doesn’t mean that they can’t be improved. I think your point

    “I would, however, strongly advise against trying to “grade on a curve” because it’s impossible to know what the curve should be until you’ve assessed every single game.”

    really touched at what I was trying to get at – my ratings scale changed based on how many games I had played. My thought, perhaps poorly communicated on my part, is that it could be better normalized by offering a vague guideline to what the ratings mean, which I think would offer more consistent ratings at the end. It looks like some games will end the contest with 20 ratings or less – at which point a few bad reviewers could really skew ratings. Again, not that there’s really anything at stake, but I think it’s worth exploring.

    • csanyk says:

      mdkess, I didn’t necessarily direct my post at your or anyone else’s post; I was just throwing my own thoughts out there. I pretty much agree with what you say here. The purpose of ratings is multi-fold: it gets us to play more games, it helps us to find the games that are worth trying out, it gives us a sense of how well we did, it gives us incentive to try to do better, it encourages us to give each other feedback and to think about what we’re doing. It does a lot of things. I think we’d both agree that if you simply look at it as a means to the end of coming up with rankings and picking “winners”, it misses out on a lot of the value that the ratings system delivers.
