Ratings categories

Posted by csanyk (twitter: @csanyk)
December 30th, 2014 12:16 pm

For the longest time, we’ve had the following categories:

  1. Overall
  2. Innovation
  3. Fun
  4. Theme
  5. Graphics
  6. Audio
  7. Humor
  8. Mood

For LD31, I noticed that if we wanted to remove our game submission from a rating category, we had the option to disable it.  I didn’t see the point, myself, but I suppose if your game really wasn’t trying for one of the categories, there’s no harm in recusing yourself.

I would like to see another category added for future LD events: Controls. I think controls are a critical element of game design, since they are what makes a game interactive, and thus, a game. Not giving controls their own category is an oversight that should have been corrected a long time ago.

<3 this post if you would like to see a Controls rating category added to voting!



38 Responses to “Ratings categories”

  1. pht59x says:

    “there’s no harm in recusing yourself”

    All this is far too complicated and is likely to create biases in the ratings.

  2. csanyk says:

    I’m not sure how removing yourself from ratings in categories you don’t intend to compete in would create bias. Can you explain that?

  3. pht59x says:

    Imagine Beethoven as a game maker, i.e. his music is good but his gameplay is crap.

    If he recuses himself from gameplay, won’t he then have an absolutely wonderful entry?

    • csanyk says:

      No; he will just get rated in the categories he chooses to enter. Ratings in any one category do not affect ratings in any other category.

      I’m still not seeing how opting out of a ratings category hurts the system.

      • pht59x says:

        1) Won’t you then end up comparing games for the blind, for the color-blind, for the deaf…?

        What does the rating/ranking of such different games mean in such circumstances?

        2) [Something different] I don’t think an “overall” category is necessary: the overall mark should result from the fair marking of well-defined, well-behaved sub-categories. In a maths exam, some points are granted for algebra and some for geometry: the overall mark is the sum of both.

        • csanyk says:

          1) I think you’re seeing problems where there are none.

          When rating a game, raters have the option to give a ranking of 0-5 stars. 0 is defined as N/A. The new opt-out of category mechanism allows the submitter to effectively rate themselves as N/A in whatever categories they choose not to be rated in. Raters have always been able to choose to ignore certain categories if they don’t feel that they fit the game in question.

          If someone makes a game that is all text, rating it in the Graphics category doesn’t make sense, so if the rater gives it a 0, or if the submitter opts out of the Graphics category, there’s really no difference.
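
A minimal Python sketch of why a rater's 0 (N/A) and a submitter's opt-out end up equivalent. It assumes, and this is only an assumption, not LD's published code, that each category score is the plain mean of the non-zero star ratings; the helper name `category_average` is hypothetical.

```python
from statistics import mean

NA = 0  # assumption: a 0-star vote means "N/A" and is left out of the average


def category_average(ratings):
    """Mean of the 1-5 star ratings for one category, ignoring N/A (0) votes.

    Returns None when nobody gave a real score, which is what an
    opted-out category effectively looks like.
    """
    scored = [r for r in ratings if r != NA]
    return mean(scored) if scored else None


# A text-only game: every rater marks Graphics as N/A.
graphics_votes = [0, 0, 0]
# Audio: one rater chose N/A, the rest gave stars.
audio_votes = [3, 0, 4, 5]

print(category_average(graphics_votes))  # None, same as opting out
print(category_average(audio_votes))     # 4, the N/A vote does not drag it down
```

Under that assumption, a 0 from a rater and an opt-out by the submitter both simply remove votes from the average, which is the "no difference" described above.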

          Another way to look at it: not all judges rate every single game, either. In fact, that’s the norm. Declining to judge all 1500 games, and only rating the ones you actually have time to play during the judging period, is fine. Declining to rate a game in one particular category is fine. So the submitter choosing to opt out of a category should be fine. That’s how it worked for LD31. Are you saying the ranking system is broken? Considering that it’s impossible to have every judge play and rank every game, I’d say the system isn’t perfect, but it is reasonably good.

          2) I didn’t follow your reasoning at all. There already IS an Overall category. It’s not a calculation based on the others; it’s a subjective judgement of how good the game is, “overall”.

          We’re not debating removing any of the categories, or what the meaning of “Overall” is in the context defined by LD; we’re talking about adding a category: Controls.

          • pht59x says:

            >1) I think you’re seeing problems where there is none.

            >When rating a game, raters have the option to give a ranking of 0-5 stars. 0 is defined as N/A. The new opt-out of category mechanism allows the submitter to effectively rate themselves as N/A in whatever categories they choose not to be rated in. Raters have always been able to choose to ignore certain categories if they don’t feel that they fit the game in question.

            >If someone makes a game that is all text, rating it in the Graphics category doesn’t make sense, so if the rater gives it a 0, or if the submitter opts out of the Graphics category, there’s really no difference.

            If I think I’ve made crap music for a game, you’re saying it’s definitely not in my interest to opt out of the audio category. If there is no difference, what’s the point of this opt-out mechanism? If, as a submitter, I gain nothing when I opt out, why should I do so?

            >Another way to look at it, not all judges rate every single game, either. In fact, that’s the norm. Recusing to judge all 1500 games, and only rating the ones you actually have time to play during the judging period is fine. Opting out of rating a game you rate in one category is fine. So the submitter choosing to opt out of a category should be fine.

            Not clear to me.

            >That’s how it worked for LD31. Are you saying the ranking system is broken? Considering that it’s impossible to have every judge play every game and rank it, I’d say the system isn’t perfect, but it is reasonably good.

            No, not broken, but there are a few unclear aspects:

            + You should have a set of well-defined criteria with as little overlap as possible.
            + When you submit an entry, your game should be assessed against all criteria. This opting-out thing seems like nonsense to me.
            + When you’re given the results back, rank values and marks per criterion are fine. However, it would be better to know how many votes you received in each aspect of your game, together with the total number of votes you received. Comments, of course, are extremely valuable.
            + “Audio” should be split into “Sound” and “Music” (and when people use music, a credit list should be compulsory).
            + “Graphics” should also require credits for their sources.
            + “Mood” should be defined thoroughly.
            + See some other remarks about Coolness further down this thread.

            • csanyk says:

              “If I think I’ve made crap music for a game, you’re saying it’s definitely not my interest to opt-out the audio category. If there is no difference, what’s the point of this opt-out category mechanism? If, as a submitter, I get nothing when I opt-out, why should I do so?”

              Er, I’m not saying that. I’m saying it’s now the submitter’s choice to opt out of categories if they choose to, and this doesn’t actually harm the rating system so far as I can see.

              Personally, I don’t see the point of opting out of most of the categories, most of the time. However, I can appreciate that some submitters may not wish to be rated in a given category.

              To stick with the Audio example, to me, it’s a different matter if your game has no sound or crap sound. If the sound is crap, it deserves a low rating. If you *know* the sound is crap, and don’t want to be rated in that category because you don’t need your ego bruised, maybe just take the audio out entirely. But I also think there’s a difference between not having sound in your game because you lack the time, skill, or tools to implement it, and using silence as a design element, like if you’re making an arthouse game about the experiences of being deaf, say.

              For me personally, I want as much feedback as I can get on my game, so I doubt that I’d remove any categories when submitting my game. I was intrigued that we can now do so, and in thinking about it I don’t see how it harms the rating system, despite my personal choice not to use the option.


              “The code would be used to check that no proprietary black boxes are called upon. No obfuscation should be permitted.”

              That’s exactly why source code is required for all Compo entries. There’s still no “programming” category in the ratings. LD is not a *programming* competition; it’s a *game development* competition. The distinction is important. Game design depends on programming, and better programmers will be able to implement a given design better, but ultimately it’s about the design of the game and its implementation, viewed from a “results” standpoint (e.g. how it plays, not how well programmed it is).

              I wouldn’t be opposed to a Programming category, in any case; I think good programming deserves to be recognized, and should be encouraged. But good programming all by itself does not make for a good game. I also suspect that a majority of judges don’t usually examine source code (perhaps that would change if we started rating it), and maybe only look at it in order to see how the submitter accomplished something, or if they suspect a rules violation (I suspect that the rules are not very strictly enforced, if enforced at all).

        • mortus says:

          2) Having a lot of categories sometimes makes games harder to rate. At first I had a hard time rating games in both Fun and Overall, and I believe I defined Overall for myself in at least 2 or 3 different ways over several LDs. The thing is, I don’t think we have a strict definition of Overall anywhere on the site. Over the last few events, though, I’ve been really happy that the category is there, as I’ve probably finally defined it for myself.

          Sometimes the sound is great, the graphics are great and the game is innovative and fun, but it just doesn’t come together; the pieces don’t fit. You play it and look for one bad thing in it and can’t find any, yet you can’t say it’s a good game OVERALL. There were a few such cases in my experience, and there were games I rated 4 in graphics, audio and fun, but 3 or even 2 in overall.

          On the other hand, sometimes it’s the exact opposite. There’s no audio, the graphics are crap and, when you analyze it, it’s not even fun, but the game is such a great experience OVERALL (I’ve really tried to describe the situation without using that word, but it looks like the category is named perfectly; you can’t avoid it) when you play it that you just can’t rate it lower than 3-4 in Overall. If anything, an “Addictiveness” category could be added, but I guess it would overlap with Overall a lot.

          P.S. Note that I always try to rate each game in every category distinctly. I strongly disapprove of people who leave comments like “WOW! Best as always! 5 stars all categories!!!” (and, I believe, rate accordingly) for games that are really good and deserve to place top 3 in both overall and fun but have two beeps from sfxr as audio (or a horror game rated 5/5 in Humor).

          • pht59x says:

            >…
            >On the other hand, sometimes it’s the exact opposite. There’s no audio, the graphics is crap and when you analyze it, it’s not even fun, but the game is such great experience OVERALL (I’ve really tried to describe the situation without using that word, but looks like the category is named perfectly, you can’t avoid it) when you play it, that you just can’t rate it lower than 3-4 in Overall. If anything, the category “Addictiveness” should be added, but it would confront with Overall a lot, I guess.

            Maybe the distinction should be made clearer between OVERALL as the rater’s subjective perception (one of the criteria) and OVERALL as a calculated index for ranking.

            >P.S. Note that I’m always trying to rate game in every category distinctly, I highly disapprove people who leave comments like “WOW! Best as always! 5 stars all categories!!!” (and I believe rating accordingly) for games that are really good and deserve to place top 3 in both overall and fun but have two beeps from sfxr as audio (or a horror game rated 5/5 in Humor).

            In addition to the minimum of 20 games rated, maybe there should be a MAXIMUM number of games you’re allowed to rate. Today, I’m not even sure how many games you need to rate to get 100% Coolness. Is it 20, 50, 100? I have no idea. Have you? You would have to choose your games carefully before rating them. That would put an end to people rating 200 games in 24 hours, as we saw this year.

            Maybe it should also be made compulsory to leave a comment on each game you rate.

            • PapyPilgrim says:

              You need to rate at least 100 games to get a 100% coolness. The “top coolness” category on the result page seems to be random though: I rated something like 115-120 entries (far from the top raters) but I am listed in the top 5.

          • PapyPilgrim says:

            I 100% agree that the lack of guidelines (or clear definitions of what each category is) really hurts the rating, and leads to this “5/5 all around” you are describing: when looking at some of the top entries, I can’t help but think that some ratings are not deserved (even if the game itself is, indeed, awesome).

            I have participated in many compos and had the same problem: it took me a while to get a clear idea of how I should rate each category if I wanted to be fair. I wrote an article about it a while ago (http://ludumdare.com/compo/2014/05/07/rating-consistency-how-i-rate-games/) but not much came out of it.

            • csanyk says:

              It actually doesn’t matter what guidelines a judge applies when rating games, as long as they apply them consistently when they rate, and enough people rate each game.

              I’ve thought a lot about how the rating system works, and posted my thoughts on it a couple years ago: http://ludumdare.com/compo/?p=180789
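
A small Python sketch of the consistency argument, using made-up scores. The `ranking` helper, which orders games by mean rating, is hypothetical and only an assumption about how results are ordered. A judge who is consistently one star harsher shifts every game they rate by the same amount, so the order survives; when each game gets only one judge, the same harshness can flip the order, which is why enough ratings per game matter.

```python
from statistics import mean


def ranking(scores_by_game):
    """Order game names from best to worst by their mean rating."""
    return sorted(scores_by_game, key=lambda g: mean(scores_by_game[g]), reverse=True)


# Hypothetical Fun ratings from two judges who agree on the ordering
# but use the scale differently: the harsh judge is always 1 star lower.
lenient = {"A": 5, "B": 4, "C": 3}
harsh = {game: stars - 1 for game, stars in lenient.items()}

# Both judges rate every game: the shared offset cancels out in the order.
combined = {game: [lenient[game], harsh[game]] for game in lenient}
print(ranking(combined))  # ['A', 'B', 'C']

# Only one judge per game: A is rated solely by a judge 2 stars harsher,
# so the offset is no longer shared and the order flips.
sparse = {"A": [lenient["A"] - 2], "B": [lenient["B"]]}
print(ranking(sparse))  # ['B', 'A']
```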

              • pht59x says:

                Very useful reading. All this information should be available within a click when you rate a game — maybe through opening a sub-window.

                May I also suggest you replace N/A (not applicable) with 0 (zero)? That would be less confusing.

              • pht59x says:

                Also, best wishes for the new year. It was nice talking to you.

  4. Ulydev says:

    I think Gameplay should be added rather than Controls.

    • csanyk says:

      Can you offer a definition of “Gameplay”? It seems vague, and to me would be close to “Overall”.

      • pht59x says:

        I agree “Gameplay” is really a vague notion.

      • Ulydev says:

        Sorry for not explaining my point of view. Gameplay would cover all aspects of the controls, as well as how easy (or not) the game is to play, independently of the game mechanics. A game can be hard yet easy to control.

        • csanyk says:

          Thanks, Ulydev.

          That definition of “gameplay” is still problematic for me; it conflates factors too much, and is thus, as I had feared, vague. I think some of what you’re talking about is covered by “Fun” and some of it by “Overall”, while the control aspect of your Gameplay idea is best separated into its own category.

          It would seem that “Controls” + “Challenge” would be something close to what you’re talking about with “Gameplay”. Challenge might be another good candidate for an additional category.

          Controls is a pretty well-defined thing. Are they intuitive? Are they responsive? Do you feel like you’re in control of the game, or not?

          Challenge is also pretty well-defined. Is the game easy or difficult? Does it punish the player for failure in a way that is unfair or not fun? Or does it encourage them to try again? Does the game get harder at a rate that complements the rate at which the typical player learns to master the game? etc.

  5. I’ve always felt that Ludum Dare had a bias towards art rather than programming in the score categories.

    To measure design we have:
    Innovation
    Fun
    Theme

    To measure graphics we have:
    Graphics
    Humor/Mood (I can’t believe these are separate categories)

    To measure programming we have:
    Fun.

    Something like controls, or maybe technical proficiency or impressive scope, would be cool. But maybe programming is just too internal to the game, so measuring it would be pointless or not accurately captured by the people who vote. I always feel like there are games that impress me from a gameplay or programming standpoint, and I can’t express that with the score system.

    Or maybe, as a programmer, I care about stuff that isn’t really important to the game as a whole, and I’m just trying to make myself look more important.

    • csanyk says:

      That’s an interesting thought. I think that all the categories are influenced by the programming, but none of them rate the programming itself. To truly rate the programming as its own category, I suppose most judges would take that to mean “look at the source code. Is it beautiful? Is it efficient? Is it clean? Is it expressive? Is it clever?” Some very good games are just horrible messes when you look at the code. LD48 encourages developers to work quickly, not cleanly.

      Sure, the controls are programmed. But this would be a category rating how the controls feel to the user; not their underlying programming. We can probably safely surmise that in most cases well programmed controls will feel good to the player. But that is not necessarily so; there can be controls for which the quality of the programming is fine — no bugs, well documented, no wasted instructions — but there are faults in the design or in the way the various inputs come together.

      • pht59x says:

        >…I think that all the categories are influenced by the programming,

        Audio is not (though there are a few issues here, too. Maybe for another thread?)

        Maybe you meant “most” categories…

        >but none of them rate the programming itself.

        In an action game, “Fun” is related to the programming. But “is related to” does not mean “is a measure of”.

        >To truly rate the programming as its own category, I’d suppose most judges would that to mean “look at the source code. is it beautiful? is it efficient? is it clean? is it expressive? is it clever?” Some very good games are just horrible messes when you look at the code. LD48 encourages developers to work quickly, not cleanly.

        The code would be used to check that no proprietary black boxes are called upon. No obfuscation should be permitted. It would not matter whether the code was an undebuggable mess; in that case the game would not run, or would keep crashing!

        To make the code inspectable and prevent large copy-pastes from previous projects, some limit on the number of lines should be set (something on the order of 1000 lines for 2 days sounds reasonable).

        In the case of games running directly on an OS, the code would also be useful to quickly check that the game is not going to wipe out your system when you run it. In case of doubt, you would simply not rate the game.

        For Web games, a clear distinction should be made between games running in a web player (e.g. Unity…) and HTML/JavaScript games. Most of the top games in LD31 are based on Unity and only run because their authors had zillions of lines of code at their disposal to produce all sorts of juicy effects. I’m not interested in using a particle generator or a library of easing functions: I want to make them! In recent LDs, no points are given for that type of effort! So you end up comparing people driving Ferraris with people walking. That is beyond me!

        >Sure, the controls are programmed. But this would be a category rating how the controls feel to the user; not their underlying programming. We can probably safely surmise that in most cases well programmed controls will feel good to the player.

        I agree.

        >But that is not necessarily so; there can be controls for which the quality of the programming is fine (no bugs, well documented, no wasted instructions) but there are faults in the design or in the way the various inputs come together.

        Faults in the design would not be penalized as such. If you expected them to be fatal at run time, you would not bother running the game.

    • PapyPilgrim says:

      I don’t think people should evaluate the “programming” of a game, but the game itself. Yes, you can do something technically impressive, but at its core a game is supposed to give something to the player (emotions, fun, something to think about… you name it).

      This bias towards art you are talking about is a direct consequence of this. Not of the rating system.

      Take one of the top entries, Orion (http://ludumdare.com/compo/ludum-dare-31/?action=preview&uid=8733).
      The gameplay is basically “OK, find button xxx, press it, and then wait a few seconds to be told which button to press next”.
      Is it fun in itself? No (it is fun for young kids, but an adult is quickly bored). Yet this game is in the top 100 for fun, innovation and overall.
      I don’t want to argue whether or not such results are deserved, but this game (successfully) fulfilled a fantasy: being an astronaut and piloting a rocket. Programming has nothing to do with that.

  6. hexagore says:

    I think “controls” are covered by “fun”. If you fluff the controls of your game your score for fun will suffer directly.

    For what it’s worth I only know this because of personal experience. My most recent entry scored really well in every category except fun. All of the criticisms it received concerned the controls and the feel of the player character. It made the game too hard, or at least it made the difficulty ramp up far too quickly. It destroyed the fun. :)

    • pht59x says:

      >I think “controls” are covered by “fun”.

      You’re probably right.

    • PapyPilgrim says:

      I disagree here.
      For this LD I spent most of the compo working on the controls of my game, and I got very good feedback on them. Yet I knew (again from feedback) that I would get a poor rating in fun: the only consistent criticism I got was that my game was too railroaded and that there were no choices to make.
      As a result, my rating in fun is one of the worst I got in 10 participations. Controls had nothing to do with it.

  7. CosyCave says:

    I’d like to see something that covers how mechanically good the game is. I don’t mean mechanics of the game but how good it feels and controls are a big part of that. I’m not sure what this category should be called, but I feel that Fun as a category is too broad at the moment and a lot of different things fall under it. Fun is also very open to interpretation and up to personal preference.

    Prime examples of how bad controls don’t always stop a game from being fun: Surgeon Simulator and QWOP. In these games the controls are terrible (intentionally so, at least in the case of Surgeon Simulator), and that makes them fun. I’ve seen many LD entries where I didn’t consider the game fun, but it was very well constructed mechanically, and I can’t really reflect that in my scores. I can try to balance it in the fun score, but I don’t think that is always very fitting or helpful to the maker of the game. I can always say it in comments, but why not make a category for it to get a broader picture?

  8. PapyPilgrim says:

    Well, I am glad to see a post that doesn’t suggest a complete reworking of the system and just suggests a small addition. After each LD it seems everyone has an opinion on how the system is broken and how everything should be done “their way”…

    I have indeed suggested in the past (http://ludumdare.com/compo/2013/01/06/dont-get-your-expectations-too-high/) adding a control category, and I still stand by that opinion now: good controls are crucial to the experience of a game, and don’t impact ONLY the fun of the game. The mere existence of a ‘control’ category could even lead people to try something new in that area.

  9. Robber says:

    I think ‘immersion’ would be a good alternative to mood. Although it is a more difficult term, it would be more about how a game feels and how engaging it is, whereas mood is mostly interpreted as overall graphics and sound.

    I vote for ‘gameplay’. It is a well-known term in the game industry, and would stand for how well the game is made, covering controls, challenges, playability, difficulty etc. It would cover a lot more coding aspects than just fun.

    Fun, for me, is more about the idea and the concept: does the game make you laugh? It is a bit subjective.
    Concept and idea are also more or less covered by theme. Yes, fun is an important quality for games, but most of its aspects would be covered by the other categories.

    A well-defined subtitle would be beneficial for all the categories.

    And for the fun of the discussion: ‘bugginess/quirkiness’, where it would be better to have zero stars :)

    • Robber says:

      Also, yes, I would think the overall rating would be all categories combined.

      • pht59x says:

        Absolutely! It should result from some number crunching of all the other categories, as you do for any school work or exam you assess.

        • csanyk says:

          It’s *not*, though. As defined, overall is its own category with its own number which is independent of the other categories. It is not mathematically derived from the other categories.

          How a judge arrives at their Overall score for a game they are rating is up to them, so some may well calculate an average. However, forcing all judges to treat Overall in this way prevents them from rating a game that is “greater than the sum of its parts”.

          I suppose it’s fine to have discussion about what the established categories mean, or what we think they should become, but that’s not really the central topic of the original post.

          • pht59x says:

            Like you, some other people think an entire picture is made up of more than the parts given. This is related to the study of Philosophy. I might need Gestalt therapy, but what you say makes almost no sense to me.

            In itself, this thread shows LD does have a terminology problem at the root of its rating system. That is also because you refuse to eliminate subjectivity as much as possible. Good luck with your handling of “controls”, “gameplay”, “usability”, “overall”… You already had this “debate” about “controls” some time ago. I bet you’ll have it again in 2 years’ time.

            Il faut être un peu plus carré! (Roughly: “You need to be a bit more rigorous!”)

            No, the above does not simply translate into “you must be a little more square”. In French, “il faut être un peu plus carré” has more positive connotations than it has in American xor (exclusive or) English. And playing with all sorts of connotations is exactly what you’re doing. Keep up the good fun.

            Since the central topic of the original post is “ratings categories”, may I suggest you add a 9th category to your already long-ish initial list:

            9. FUZZINESS

            This would nicely come just after HUMOR and MOOD. How funny!

            Again, good luck!

            • csanyk says:

              pht59x, the definition of the Overall category, as defined by LD, is not a calculated value based on the other categories. This is easily demonstrable. If you put a value in all the other categories when rating a game, the Overall category does NOT self-generate a value. Thus, it is not an average or a sum of the other categories.

              If overall *were* a combination of all the other categories, it would make sense to take the Overall category out of our direct control, and instead make it a calculation based on what we enter for the other categories. It isn’t that.
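
To make the distinction concrete, a Python sketch contrasting the two readings of Overall: the value a rater enters directly (as csanyk describes LD's form) and the averaged value pht59x proposes. The `Rating` class, its fields, and `derived_overall` are hypothetical names for illustration, not LD's code; the scores mirror mortus's example of a game that is weak in each category but a great experience overall.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rating:
    """One rater's scores for one game (1-5 stars; 0 = N/A)."""
    categories: dict  # e.g. {"Fun": 2, "Graphics": 1, "Audio": 0}
    overall: int      # entered directly by the rater, independent of the rest

    def derived_overall(self):
        """pht59x's alternative: Overall as the mean of the other categories.

        N/A (0) entries are ignored. This is NOT how Overall works on LD.
        """
        scored = [s for s in self.categories.values() if s != 0]
        return mean(scored) if scored else None


# No audio, weak graphics, not much fun, but a great experience overall.
r = Rating(categories={"Fun": 2, "Graphics": 1, "Audio": 0, "Mood": 3}, overall=4)
print(r.overall)            # 4, the rater's own judgement
print(r.derived_overall())  # 2, what a calculated Overall would give
```

A calculated Overall can never exceed the best individual category, so it cannot express a game that is "greater than the sum of its parts"; a directly entered Overall can.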

              So, you’re wrong about the Overall category. However, it doesn’t actually matter that you’re wrong. The beauty of the rating system is that it allows each judge to “be wrong” — that is, to apply their own definitions and criteria when rating the games.

              The philosophy underlying the ratings categories varies from rater to rater. If it didn’t, everyone would score any given game exactly the same way. But that doesn’t happen. We each get to decide what each of the categories means for us. It doesn’t matter whether we’re right or wrong when we rate games. We rate them as we wish to rate them. They are our subjective ratings. It doesn’t matter if my understanding makes sense to you, or if your understanding makes sense to me, as long as both of us are consistent in how we rate all the games that we rate, and as long as a large enough number of judges rate each game. The wrongness of any individual judge melts into the average.

              In aggregate, the average of all of our subjective ratings may be used to create approximate rankings, which is all the voting results really are. With enough ratings per game, the rankings become approximately fair.

              • pht59x says:


                >The philosophy underlying the ratings categories varies from rater to rater. If they didn’t, everyone would score any given game exactly the same way. But that doesn’t happen.

                Don’t forget the measures, i.e. the numbers used in each more or less subjective category, are both discrete and bounded above. Hence, ties may happen. But I guess that if there are too many ties, you will call upon more subjectivity to sort out a messy situation.

                At least, I think we both agree that putting marks on art work has always been controversial.

  10. klianc09 says:

    Call it “Usability” instead of “Controls” and I’m totally for it.
