Awesome and wicked rating system idea!

Posted by
September 19th, 2014 9:14 am

I’ve heard it mentioned more than once that Ludum Dare runs on the honor system. This works well for friendly competitions, but as the number of participants grows and the chances of making it to the shiny results page diminish, will the competition remain friendly?

One idea is to keep the competition friendly and expand the amount of top spots through a percentile system (Sorceress’ idea).

Alternatively, I’ve heard some say they want the top spots to be prestigious and to not make it easier to get these positions. You can’t have prestige without cutthroat competition to make victory sweeter, so I present to you a cutthroat rating system. [drum-roll]

I described the current rating system to a lawyer friend. Lawyers are trained to be pessimists, so naturally my friend pointed out the potential for abuse by rapidly assigning low ratings in order to not only boost one’s own score but also gain higher visibility for one’s game through the default score. Apparently, the idea of honesty and good-will was lost in the conversation. My friend stated the most fair ranking system is that used in law school.

I haven’t been to law school, but from what I have heard, professors must give out a fixed percentage of each grade (A’s through F’s). This contributes to the tense and competitive atmosphere of law school, where the difference between a pass and a fail can literally come down to whether one student had better punctuation than another.

Likewise, we can adopt a system where participants only have a certain number of 1-through-5 ratings to distribute (in each category). When a person becomes sufficiently cool (by exhausting their votes), a new set of ratings is made available, still in the same percentages.

Of course, a person will have to readjust their ratings for other games in order to give a rating that has been used up, but we all want judging to be thoughtful and deliberate, right?
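A minimal sketch of how such a quota might be tracked; the batch size and per-star counts below are made-up illustrative numbers, not a proposal:

```python
from collections import Counter

# Hypothetical quotas: how many of each star rating a judge may give per
# batch of 20 votes. Evenly split here purely for illustration.
QUOTA = {1: 4, 2: 4, 3: 4, 4: 4, 5: 4}

class QuotaJudge:
    def __init__(self):
        self.used = Counter()

    def remaining(self, stars):
        return QUOTA[stars] - self.used[stars]

    def rate(self, stars):
        if self.remaining(stars) <= 0:
            raise ValueError(
                f"no {stars}-star votes left; adjust an earlier rating first")
        self.used[stars] += 1
        # Once the whole batch is spent, a fresh set becomes available.
        if sum(self.used.values()) == sum(QUOTA.values()):
            self.used.clear()
```

The real proposal ties quotas to each category and to coolness; this only shows the bookkeeping for one category.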

This means those who are star-struck by the heavy hitters in LD will not have the ratings to support a surprise favorite and will have to readjust. Those that are too lenient will have to toughen up, and those that are too harsh will have to chill out. The “shotgun rater” who rapidly assigns low ratings will quickly run out of ammunition.

Of course this will lead to many people giving and receiving the much-hated 1 ratings, but that’s the cost of balance.

This will manage those who distribute ratings unevenly, but what about those who rate randomly? I propose we make all ratings public. To protect the voter from retaliation, the name behind the vote can be replaced with a confidential ID. People should be able to see all of their ratings and query the ID to see that person’s voting pattern of all games. If the voting pattern seems random, then there can be a mechanism to report the voter.

I present this idea with much levity, and mostly for your amusement. I like how the current rating system rewards participation with feedback. But if you think this idea is super-cool, super-lame or super-radical let me know.


33 Responses to “Awesome and wicked rating system idea!”

  1. holgk says:

    Nice examination! I don’t know if there is a plan to change or review the current system. My suggestion would be a minimum amount of time between two ratings. I know that my ratings are saved, so it wouldn’t be hard to store a timestamp and check whether I voted in the last 3 minutes or so. With such a check it’s less likely that I formed an unqualified opinion about a game (rated it without actually playing it). In addition, I think we need a system so no one can vote for games that crash immediately. I saw games that had ratings even though there were comments saying the game crashes, and they were still broken when I tried them.

  2. burgerdare says:

    I think that being able to check a voter’s pattern seems like a cool idea, but I don’t love the idea of having a fixed amount of ratings to distribute on a per-entry basis. It’s a bit different when you’re a professor: you have immediate access to all of the content in front of you, as well as a rough idea of how high its overall quality will be (students who have submitted poor work in the past are likely to submit more poor work), so it’s easier to decide if any given submission is worthy of a top rating relative to other entries. A Ludum Dare rater doesn’t have that luxury; they could rate a game with five stars only to find that the next game they play is vastly more enjoyable, which leaves them two options: go back and rework a bunch of their ratings to figure out which games they REALLY enjoyed the most, or, far worse, decide that it isn’t worth the trouble and end up giving a bad score to a well-made game.
    I think that a good way to encourage fair ratings might be to provide a multiplier to a user’s coolness based on how varied their ratings are. I think that the best way to handle this would be using a formula like this:

    Overall_Coolness = Base_Coolness*(1-( (Most_Used-Least_Used)/(Most_Used) ))

    where Most_Used and Least_Used are the number of times that the most or least used star rating has been assigned. This way, if a participant runs around only leaving negative reviews on people’s pages, the boost in coolness that they would normally receive would very quickly get multiplied down to zero.
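    For concreteness, a sketch of that multiplier (one assumption on my part: a star value that was never used counts as zero uses, which zeroes the multiplier outright):

```python
from collections import Counter

def overall_coolness(base_coolness, ratings):
    """Scale coolness by how evenly a judge spreads their star votes,
    per burgerdare's formula. `ratings` is a list of 1-5 star votes."""
    counts = Counter(ratings)
    # A star value never used counts as 0 uses (assumption).
    uses = [counts.get(star, 0) for star in range(1, 6)]
    most, least = max(uses), min(uses)
    if most == 0:
        return 0.0  # no votes cast yet
    return base_coolness * (1 - (most - least) / most)
```

A judge who hands out only 1s keeps zero coolness, while one who uses every star value equally keeps the full amount.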

  3. Milo says:

    There’s a lot of missing information to know how well the rating system is working (for instance, whether “shotgun raters” actually cause significant bias in the results, or to what degree results are affected by the random selection of people who rate a given game).

    I don’t think the system you propose fixes the problem well. Firstly, having to adjust older ratings would be more annoying than deliberate, and could introduce bias when people forget the qualities of a game they played a while ago or don’t feel like jiggling their results. However, more problematically, from a statistical standpoint, I see two problems that can easily arise:

    1. No game is judged by everyone.
    2. No one judges every game (not a problem in law school).

    Your system normalizes the first problem – every judge has equal weight now. But, where, at the moment, a judge’s opinion is likely to be somewhat independent of other games they’ve played, it would not be in your sample, since the particular sample of games that an individual judge receives now affects his rating of every game (and, worse, the judges do not choose a random sample of games).

    But I think that the “gist” of your system is good; I’d think it better to accomplish a similar effect while translating individual ratings to a final score. Right now, I believe that’s an average with the two extreme points removed. However, it would be easy to normalize each judge’s ratings to be uniformly distributed, then take the average – but more sophisticated schemes might work better at addressing both problems simultaneously. There is a question though: What is the purpose of the rankings? If we seek to recognize the best games, then statistical methods are justified. But, if it is meant more democratically, in a friendly sense of, “I voted 5 because I want this game to win”, then maybe the only improvements to make regard the transparency of the process.
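    A rough sketch of that rank-based normalization; the convention that tied games share the average of their ranks is my own choice, not something the comment specifies:

```python
def normalize_judge(scores):
    """Map one judge's raw scores (game -> stars) to percentile ranks in
    (0, 1], so every judge's votes occupy the same uniform scale.
    Tied raw scores share the average of their ranks."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    ranks = {}
    i = 0
    while i < n:
        # Find the run of games tied at this raw score.
        j = i
        while j < n and scores[ordered[j]] == scores[ordered[i]]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[ordered[k]] = avg_rank / n
        i = j
    return ranks
```

Averaging these normalized values per game would then weigh every judge equally, whether or not they use the full 1-5 range.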

    I do agree that, in any case, it would be good to somehow encourage people to use the full range of ratings – perhaps show judges the distribution of their scores in a given area, or encourage them to choose benchmarks out of the first few games they play to judge other games with respect to.

    • waynaul says:

      Milo,

      I appreciate the insight; could you elaborate more on the statistical concerns? How do the judges not have equal weight before, yet have equal weight after, the normalization? How does this differ from an analogy of judges as professors and games as students?

      • Milo says:

        Well, by “weight”, I mean that if we consider that some people avoid the extremes and almost always vote 2, 3, or 4 and others use the whole range, the latter group’s opinion will be more strongly represented in the results; like if we had three judges, A, B, and C, where the first two avoid voting 1 or 5 and C votes with the whole range. Then, if A and B thought a game was terrible and both voted 2, but C loved it and voted 5, it would rank the same as a game that A and B both loved and voted 4, but which C hated and voted 1, despite this game being preferred by more judges.

        However, if the scores of each judge were normalized to a uniform distribution, every judge would necessarily be equally represented in the result, regardless of whether they tended to vote in a limited range. It gives a vote of a 4 a definite meaning (something like “better than 60-80% of other games”) which doesn’t vary between judges, meaning it addresses the problems due to each game being judged by different people.

        The law school analogy would be similar to a system where every judge assigns ratings to every game in a fixed distribution. That system is fair; every vote has a specific meaning. Even the system of “every judge is presented with a random set of games, which they score according to a distribution” is fair. However, your proposed system is more like: Every judge _chooses_ a set of games to rate. Their ratings over this set are forced to a fixed distribution. This is quite different from a school, wherein the professors have no choice in who their students are.

        The problem is that the judges here are biased towards rating some games more so than others – they’ll be more likely to play games which have attractive thumbnails, are easy to open (e.g. web-based), whose developers have a greater community presence and investment in their games (either in general, by writing a post-mortem, by having lots of coolness, by mention in a different blog post, or by suggesting their game to someone who streams games), whose descriptions appear more interesting, etc.. One would expect that the average judge sees a higher portion of high quality games than actually exist in the competition.

        Consider the extreme case where a judge only sees the best games of the competition – all of which would deserve 5s from a hypothetical judge who’d played every game. Yet, the particular judge would have to give only a few 5s, and would have to give one of them a 1, lowering a good game’s score considerably – and exposing themselves to criticism over their ratings*. This would create extra noise in the highest quality games, as some would randomly be receiving scores of 1 due to the fixed distribution, whereas lesser games rated by judges who played a random sample may receive higher scores.

        I doubt many people fall into this extreme case (though some probably do), but I know that I certainly am not randomly picking games when I rate, or allowing the system to choose for me – and, though I choose most games “randomly” (i.e. randomly, but only if they run on a Mac), it wouldn’t take too many choices to start messing with the system. Ultimately, instead of judges being biased by what scores they give, they would now be biased by what other games they play, and this would tend to jiggle the more important rankings (like the top 25) more so than elsewhere.

        (*Admittedly, this might be justified; it’s not in the spirit of LD to give popular or good developers all the attention and feedback. It’s certainly admirable to ensure that everyone receives substantial feedback, but that doesn’t preclude also finding the most fun games and giving feedback to them)

        • holgk says:

          “The problem is that the judges here are biased towards rating some games more so than others…”

          So you think that there should be a “random” picking system for rating games?
          A system that tells you which game you have to rate.

          Such a system would generate more ratings for “unpopular” or average games. But there has to be an option for choosing your Platform/OS.

          I don’t choose the games randomly. As you said, “attractive thumbnails” and so on are how I choose the next game. So I think a random picking system is not totally wrong, but there should be a way to rate recommended games too. Maybe you unlock that feature after rating 20 randomly picked games or so.

        • waynaul says:

          Let me see if we’re on the same page. Your point is that in the law school analogy, a professor rates all of the students in the class, and the end result is fair because it follows a fixed distribution. The proposed rating scheme does not match the law school analogy because a judge does not rate all games, so only a subset of all games is put into a fixed distribution, which is not fair.

          But professors rate all of the students in their classes, not all of the students in the entire school. Isn’t it the same here, where subsets of the student body are subjected to separate fixed distributions?

          I think that might be a more accurate analogy. The judge is a professor. The games are all students at the school, and the set of games the judge rates is the class.

        • Milo says:

          The main difference is that professors have no choice in who they rate, but people here do. It is therefore unlikely that a professor will have a class of only the best students (even if a professor, consciously or unconsciously would choose that) – rather, a given class will probably be a more or less representative sample of the school. Putting that to a fixed distribution is fair – everyone is judged against a reasonable sample of other students. However, if the professor could choose a class that had uncharacteristically many good students, the system would force him to give some good students poor grades – below the grade they would receive in a random sample of students. LD is more like the second situation, since we have control over the games we rate.

          Strictly speaking, if we want to make our ratings have meaningful, statistical value, having judges rate random games according to a fixed distribution is probably the easiest way to be fair. I’m not advocating for that; I imagine it would make judging less fun and could decrease community interaction (since you couldn’t rate games shared by others), and LD seems more like a place where the system being fun is more important than it being right.

          • waynaul says:

            I’m going to argue that a given class is not representative of the student body at a law school. As sorceress has stated, the make-up of each year is different. The beginning year will have many students with low socioeconomic status, minorities, students with disabilities, etc. Unfortunately many of these students do not make it past their first year. Those whose GPAs do not meet a threshold are cut from the school. The third and final year of students will have a greater make-up of students with advantages over those who were cut: funding from their parents, mentors such as parents or relatives who are lawyers, and connections to get internships.

            Professors choose who is in their class by the subject matter and prerequisites. As already stated, professors teaching subject matter for third year students will have students with many resources. Some professors teach remedial classes specifically for students who are at risk of being cut. Professors teaching a subject considered easy will also receive a greater number of struggling students. Those that teach subjects that require strong speaking skills will get students who want to be litigating attorneys, while those who teach tax law will get those who are mathematically inclined who will go on to be research attorneys.

          • Milo says:

            That’s true; a professor specifying prerequisites, teaching a class appealing to students with certain goals & strengths, teaching an introductory vs. an advanced class, etc. is analogous to the sort of bias that the system would have in the context of the LD. Although, I’d conclude that that means the system’s not fair for the students – then, fairly explicitly, being an average student in a remedial course and being an average student in an advanced class would be considered the same, even though the former implies a lower overall rank than the latter. If grades are intended to reveal a student’s ability, relative to the rest of their school, this system doesn’t really work.

            Actually, the bias there is probably worse than in the LD; at least here, entries don’t get any meaningful say in who judges them. I think sorceress is probably right; the system creates a nice looking distribution, but isn’t necessarily fair to the students.

            In general, democratically choosing the best option is a much harder problem than it initially appears – it’s easy to recognize the flaws of our current system and to fix them, but not without introducing new problems. It’s difficult to weigh the new problems against the current ones. Ultimately, how votes are cast and counted affects the results. In your system, the choice of distribution of votes will affect whether a game which is polarizing – beloved by half of judges, hated by the rest – will rate above a game which is average (i.e. if the median of the distribution is above the average of the extremes, the average game will be higher. Otherwise, the polarizing one will).

            So, how do you choose the distribution if your choice affects the result? It is undesirable that games win because some magic system interpreted the results such that it won. The way voting works imbues the results with different meanings – and mathematical results like Arrow’s impossibility theorem and the Gibbard-Satterthwaite theorem show that there’s no objective way to take a voter’s preferences and figure out who the democratically preferred candidate is (Wikipedia has nice non-technical explanations of both theorems – the gist of both is that they state a few conditions for a voting system to be fair – e.g. “no individual controls the results” – and then prove that no fair voting system exists. They are among the most depressing results in mathematics, I believe).

            Personally, I agree with you that some system of accountability/transparency would be beneficial to the system. Encouraging judges to have similar standards to one another would help too (perhaps have them play a few high ranking and low rankings games from previous LDs or something?), but I’m skeptical that directly influencing what votes are cast via a fixed distribution is an improvement.

  4. PoV says:

    It’s an interesting idea, that’s for sure. I mainly worry about it being confusing. One of the main things that needs to be done is to simplify everything. We’ve collected a number of nuances over the years with the intent of one thing, but the result of another. There are a lot of things that could be a lot simpler.

    I don’t have the time right now to explain my plans for how I’d like to fix this and many related things, but what has to happen first is identifying what the problem is. For example, if the issue is that people don’t like getting low scores, like “2.4”, then maybe the issue is that we’re telling them their score is “2.4”. So one solution is to just not tell them, but to instead only tell them that their game was 200th place for Audio. I’m not saying this is the problem, but just offering an example about fully understanding what the problem is. That, and sometimes ignorance is the kindest solution.

    • sorceress says:

      if 2.4 is a depressing score, then modify scores with this function:

      happyscore(score) = (15 - score) * score / 5

      then happyscore(2.4) = 6.05 \:D/
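      In code, a direct transcription of the joke function above (it maps the 0-5 range onto 0-10, boosting everything below a perfect score):

```python
def happyscore(score):
    # sorceress's tongue-in-cheek rescale: 0-5 stars become 0-10 happiness
    return (15 - score) * score / 5
```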

    • waynaul says:

      Thanks, PoV. Though I recently joined, I appreciate the work you have done in keeping LD going all these years. I’ve been more productive in the 48 hours of the last competition than I have been in a year.

      My post may sound like a demand for drastic changes in the rating system, but the heart of my message is in the first three paragraphs. The rest of the post is light-hearted food for thought.

    • Tim Bumpus says:

      Ignorance is NEVER the kindest solution. Just ask anyone who became a laughing stock in the auditions for American Idol because no one would tell them they couldn’t sing.

      If someone isn’t doing well, you have to tell them. You have to. Because otherwise they’ll have a much harder time learning the difference.

      Short-term satisfaction is no substitute for long-term lessons. I take great issue with hiding the truth from people.

      • PoV says:

        It’s not the truth though. The truth is the rating relative to other people. The score itself is arbitrary, meaningless, without the big picture. It’s more a lie if given as-is, since you don’t know that 500 other people scored just a few fractions of a point more.

  5. sorceress says:

    Consider popular developers, who usually make good games.

    People go play those first, and because those devs usually make good games, they will soak up most of the top ratings early on, leaving judges with only 1-4 star votes to hand out. Thus making it impossible for other games to win.

  6. waynaul says:

    sorceress: I’m going to play devil’s advocate here (I like your percentile idea).

    Supposing people play the games created by the heavy-hitters first and give all of their 5’s away. I’m counting on Milo’s statement that over time their feelings toward a game they played before will settle. If they play a game by an up-and-comer, they may think “I really like this audio. Now that I think of it, it’s better than [popular developer’s] audio, so I’m going to adjust [popular developer’s] audio to a 4 so that I can give the up-and-comer’s audio a 5.”

    I think it’s easy for someone to initially give high marks across the board. Over time it may be good for a person to be forced to reassess their ratings of an earlier game.

    I know that readjusting ratings sounds like a chore, but perhaps there are UI solutions to make it easier.

    • waynaul says:

      I’d say they would be that conscientious given the prospect of their votes being disclosed to the public. A person can be held accountable for any short cuts at any time in the future.

    • Will Edwards says:

      Came here to say exactly this!

      One completely different approach would be self-organization / collaborative filtering, both for weighing final votes and deciding a winner and for suggesting games people would like to play.

  7. Managore says:

    “Lawyers are trained to be pessimists, so naturally my friend pointed out the potential for abuse by rapidly assigning low ratings in order to not only boost one’s own score but also gain higher visibility for one’s game through the default score.”

    Surely the easy solution here is to run a simple script that checks to see whether individuals are giving out extraordinarily unusual ratings (for example, if more than X% of ratings they give are a 1) and nullify all ratings given by such individuals. The exact method used could be kept secret to avoid abuse or exploitation. Essentially all this would be doing would be implementing a spam filter for the ratings.
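    A minimal sketch of what such a script might look like; the threshold and minimum-vote values are placeholders, since the comment suggests keeping the real criteria secret:

```python
from collections import Counter

def flag_suspect_raters(votes_by_user, max_one_star_share=0.8, min_votes=10):
    """Rough 'spam filter' sketch: flag users whose share of 1-star
    votes is implausibly high. Flagged users' ratings would be
    nullified. Thresholds are placeholders, not proposed values."""
    flagged = set()
    for user, votes in votes_by_user.items():
        if len(votes) < min_votes:
            continue  # too few votes to establish a pattern
        ones = Counter(votes)[1]
        if ones / len(votes) > max_one_star_share:
            flagged.add(user)
    return flagged
```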

    “My friend stated the most fair ranking system is that used in law school.”

    Depending on how restrictive this is, it could be crippling and frustrating to work under, and it is still open to abuse.

  8. steve says:

    If you really wanted to use this method it would be easier to just drop the star rating completely and have people order a list of games they’d rated into best < — > worst order, this way there’s no having to go back and adjust ratings, it just happens automatically by dragging the games around into the order you feel is most appropriate.

    That might solve the UI problems of this method, but I think there are still a few fundamental issues with it that other people have brought up that would be difficult to solve.

    I’ll add one of my own: for this LD I rated games over about a week. If I had to decide whether a game I rated at the start of the week was better than one I just played at the end, I wouldn’t be able to do it; there’s no way that first game would be fresh enough in my mind to accurately determine which was better. And if I went back and replayed games to keep them fresh enough in my mind to rate against, it would take me significantly more time to rate games (thus fewer games played, and probably me getting bored of rating much sooner).

    On a side note, there are a couple of things that I think would be interesting to people that we could do right now without much effort at all;

    * display a breakdown of how people voted on your submission in each category

    * display that same breakdown for a user based on how they voted (no seeing who they voted on, just the number of 1/2/3/4/5 star rating they gave out)

    Perhaps both of these charts could be private and only visible to the author of the submission or the user herself to avoid people going on a witch hunt for people with high numbers of 1 star ratings, but just having this information available to someone might make them reconsider their ratings and try to rate in a little more balanced fashion?

    • steve says:

      that should be “into best < — > worst order” I forgot the comment form strips <>

      • DrCicero says:

        I think before doing any magic to the voting system, we should have more data visible. No premature optimization without profiling 😉

        Showing a breakdown of the stars (example, 1 to 5: 5, 3, 6, 10, 5) on every game page would easily give other people valuable feedback on how fairly a game was rated. The developer also gets some more potentially useful feedback.

        Showing a breakdown on the author’s page would also be interesting, and could be used to discourage giving many 1 stars, and encourage fair rating without any ‘manipulation’ of the ratings.

        1) Is there a statistic for how many stars were given how often?

        2) Is the data on how many stars were voted still available, or is it immediately deleted after an LD? Could we show a breakdown of previous LDs?

  9. Tim Bumpus says:

    Milo: “No one judges every game (not a problem in law school).”

    Well, let me just say that this proposed system would work perfectly alongside another popular ratings suggestion: being assigned a list of games you must rate.

    Let’s say when the rating period begins, everyone is assigned a set of games to rate (based on their stated platform preferences and amount of time they have). They are then only allowed to distribute ratings based on an enforced ratio, as you have said.

    This solves the “people will just give highest ratings to the celebs” problem. Steve’s solution could be mixed in here as well.

  10. Josh Riley says:

    I like the thought process, but I’m not sure this idea will get the results we want. My fear is that I will get the best game in the competition after exhausting my 4s and 5s, and be forced to underrate it. Also, what if it’s the last game before I get my new pool of ratings to give out? Will there be any point in rating it at all, since I presumably will be required to give it whatever I’m legally allowed to give? Obviously, I could go back and change my other rankings, but I don’t think the majority of people are going to do that work every time that situation arises.

    Another solution might be to level out people’s ratings relative to all the other ones they give (which the system may already do?). In other words, make all of the rankings a person gives out average to 3, by grading them on a curve in a way. Ex: If someone gives all 1s or all 5s to every game, they would effectively count as all 3s. The math seems to be pretty straightforward (adjustedRating=actualRating+(3-averageRating)) and as far as I can see this would completely eliminate any overly harsh (or overly nice) reviewers.
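    A sketch of that curve, directly from the adjustedRating formula above (one caveat worth noting: adjusted values can land outside the 1-5 range, e.g. a lone 5 among many 1s):

```python
def curve_ratings(ratings):
    """Shift one judge's ratings so their mean becomes 3, via
    adjustedRating = actualRating + (3 - averageRating)."""
    avg = sum(ratings) / len(ratings)
    return [r + (3 - avg) for r in ratings]
```

An all-5s judge contributes the same as an all-3s judge, which is exactly the leveling effect described.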

    This doesn’t fix the problem of people spamming reviews to get coolness (although a minimum timer between games might help with this a bit).

  11. klianc09 says:

    I don’t get why everybody is making such a fuss over the rating system. I am happy with how it currently is. Any rating system is going to be flawed, especially since LD has now grown to such an immense size. And the rating simply doesn’t have to be 100% accurate for me; it’s okay if it fluctuates by 5-10% or so, as long as I have rough feedback about how I did.

    As PoV said, the most important thing is keeping it simple. Anything too complex or unintuitive is just going to confuse and annoy participants, and that’s nothing I want for LD.

  12. waynaul says:

    klianc09: You’re right that LD has grown to an immense size. As the number of participants grows, it presents a problem, because the number of top spots displayed on the results page (let’s call these resources) is fixed, which means they are becoming more scarce. As resources become more scarce, competition for these resources will increase.

    According to Encyclopedia Britannica Online, competition and cooperation are opposites. LD has elements of both with its judging event and community feedback. These are mutually exclusive goals, however, and an increase in one will mean a decrease in the other. Historically, LD has been more about cooperation because it was a small community with plenty of resources. Since it was small, it also was not well known or prestigious. As more people hear about LD and participate, it gains prestige, but competition will increase.

    Will people continue to play fairly when resources become scarce? I say no. Just for fun, here is a story my law school friend told me:

    My friend was in a class where there was a research task that required the school law library. The students that went to the library first ripped out the library book pages (like the movie “Up”) related to the task in order to sabotage the other students. My friend was lucky that another friend worked at a law firm and had access to a separate law library. Other students were not so fortunate.

    With the increasing scarcity of resources in LD, I think there are two directions LD can go. One is to support cooperation by providing more resources. This is Sorceress’ idea where medals are given on a percentile basis, instead of a fixed number. I think it was the top 10% get bronze, the top 5% get silver and top 1% get gold.

    I like this idea, but to pick people’s brains I’m going to argue the opposite and say that we should keep the number of top spots the same to gain prestige.

    Suppose we give gold medals to the top 1% of participants. That means about 15 people will win the 48 hour competition and roughly 11 people will win the Jam. Can you imagine what the website will look like with 26 posts saying “OMG, I can’t believe I won!”? Imagine the flood of posts saying “Did better than I expected!” Not very prestigious. Someone who wants glory in LD wants to say to the world, “I am the winner of LD##!”. They want others to say “I wish I was you…”. They don’t want people to say “Me too!”.

    Now that we’ve established why LD needs to be prestigious, we need to deal with cheating, because it will happen as it gets harder to win. The present system runs on the honor system. It’s not equipped to deal with cheating. My proposed rating system is both cutthroat, to match the increase in competitiveness, and has measures to deal with cheating in judging.

  13. dalbinblue says:

    It sounds like the ultimate goal is to make an individual’s ratings average out to a fixed number, so that situations such as a bad voter gaming the system don’t negatively affect ratings and a conservative voter doesn’t count for less than an extreme voter. At the same time we can’t make it more difficult. Given that, can we curve each individual’s scores so they average 3? If someone gives a lot of high ratings, they curve down, while a stingy voter’s low scores might curve up a bit.

    Second, we can do some statistical analysis comparing an individual’s ratings to the group’s. If someone’s voting difference is statistically significant over a large number of games, they can get flagged for potential abuse.
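
    One hypothetical version of that check (the threshold and function name are made up for illustration): compare each of a voter’s ratings with the game’s average from everyone else, and flag the voter if their mean deviation is large relative to its standard error over many games.

```python
import statistics

def flag_outlier_voter(voter_ratings, group_means, z_threshold=3.0):
    """voter_ratings[i] and group_means[i] refer to the same game."""
    deviations = [v - g for v, g in zip(voter_ratings, group_means)]
    if len(deviations) < 2:
        return False  # not enough data to judge
    mean_dev = statistics.mean(deviations)
    stderr = statistics.stdev(deviations) / len(deviations) ** 0.5
    if stderr == 0:
        # identical deviation on every game: flag only if it's nonzero
        return abs(mean_dev) > 0
    return abs(mean_dev / stderr) > z_threshold

# A shotgun voter who gives every game 1 star gets flagged;
# a voter who tracks the group average does not.
print(flag_outlier_voter([1] * 20, [3.5] * 20))
print(flag_outlier_voter([3, 4, 2, 3, 4, 3], [3.1, 3.9, 2.2, 3.0, 4.1, 2.8]))
```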

    The nice thing about these checks is they don’t change the current voting system, just how the results are interpreted. This means we can test it now using existing data (if the individual voting data is kept after the end of voting).

  14. PapyPilgrim says:

    I have been writing a post about the rating system in the past few days. I guess I will post it here since there is already a discussion going on:

    To me the current rating has several problems:
    – Rating consistency (between users but also for yourself, as I mentioned in a previous post : http://www.ludumdare.com/compo/2014/05/07/rating-consistency-how-i-rate-games/)
    – Shotgun ratings : people who rate quickly/randomly to boost their visibility
    – Popularity : some people just get more (and better) ratings just for being themselves
    – Comments are not constructive enough
    – prevalence of web games over desktop games
    – revenge ratings (you left a bad comment on my game? 1 star overall, that will teach you)

    So this is the system I would propose. Feel free to give your impressions, we all want to have the best possible experience after all.

    I) Game Selection
    – The site offers ONE game you can play. No choice.
    >> The current coolness system can still be used (It would just be hidden), but no more popularity/buddy/revenge effect
    – If you cannot play that game, you have to leave a comment explaining why you can’t play (don’t own the device, crash, …)
    >> Removes any “I only play web games” behaviour, and gives instant feedback to the participant if there is a problem
    – Once you played, the game is added to your “Games I can rate” pool (more on that later).
    – You can still look for a specific user, and access their game via the URL… but playing this game won’t add it to your pool
    >> Because we all have our favourite LD’ers, and because press coverage is always nice you should still be able to play their game. But this should not impact the ratings : we want to keep it fair.

    II) Evaluating
    I am amazed that nobody even mentioned the ELO rating system here… This system is DESIGNED to calculate the relative level of each participant when it is impractical to directly match everyone, and when the result of a direct comparison is uncertain. This system is very well documented (see the wikipedia article : http://en.wikipedia.org/wiki/Elo_rating_system)
    – The site randomly chooses 2 games from your pool, and you just have to compare those 2 games in each category : 2 radio buttons for each category, “which one is better?”, done. Simple.
    >> Removes the stars, and makes it simpler to evaluate the games
    >> Reduces the shotgun ratings impact : if everyone agrees that Game A has better graphics but is less fun than Game B, a random rating saying the contrary won’t impact things much
    – Maintain an elo score for each category, and adjust it after each comparison.
    >> Gives a more fine grained scale to position entries on, and gives latitude in the way it can be represented at the end of the day.
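
    The per-category update could be sketched as follows (the standard Elo formulas from the Wikipedia article linked above; K=32 and the 1500 starting score are conventional chess values, not numbers proposed in the comment):

```python
def elo_update(winner, loser, k=32):
    """Return updated (winner, loser) scores after one pairwise comparison."""
    # Expected probability that the winner would win, given current scores.
    expected_win = 1 / (1 + 10 ** ((loser - winner) / 400))
    delta = k * (1 - expected_win)  # surprise wins move scores more
    return winner + delta, loser - delta

# Two fresh entries compared on, say, Graphics:
a, b = elo_update(1500, 1500)
print(round(a), round(b))  # 1516 1484
```

    Each entry would keep one such score per category, updated after every comparison it appears in.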

    III) Comments
    – Each game should have a “public” comment section, and a “people who rated this game” comment section
    >> Because the comment section is an important part of an entry page, I don’t think it should be altered. So I suggest adding a special type of comments to the mix.
    – When you are comparing 2 games, have a textbox for each category in which you can leave a comment regarding this category.
    >> Promotes constructive feedback by restricting it to specific topics
    – You can always see what you said about a specific game (by browsing your “played game” pool, or when you have to compare it) and modify it
    >> Because sometimes you are not inspired to leave a comment, or you think you should add something
    – Get a notification when someone responds to one of your comments (even better would be to have a thread for each comment).
    >> Makes it easier to give feedback and ask questions

  15. holgk says:

    As I said before, an easy way to reduce the amount of shotgun ratings is to implement a timer. Your suggestion of only one rateable game is simple to combine with a timer that only gives you a new game every 3–5 minutes, so those who actually play the games need at least that much time to play one properly.

    But I think there should be a punishment for those who rate games that are actually not working. I have seen on multiple occasions that broken games still had ratings. It would be nice to have a checkbox that says something like “game broken”, with a comment field next to it, and that comment is sent to you via email or message or both, so that people know that and how their game is broken.

    However, your suggestion to write a comment on each game you cannot play is bad: “If you cannot play that game, you have to leave a comment explaining why you can’t play (don’t own the device, crash, …)”. I think it’s better to have a selection system for platforms and OS, and the next game to rate has to fit it. That way your game doesn’t get a dozen comments like “I don’t use linux…” and/or “Who makes an entry for IOS? Nobody will play this!” etc. I agree that I would like to have more comments, but they have to be of a supportive kind, not comments left because they are mandatory.
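
    The timer idea amounts to a per-user cooldown; roughly (an assumed server-side check, not an actual LD feature — the names and the 3-minute value are illustrative):

```python
import time

COOLDOWN_SECONDS = 3 * 60  # the comment suggests 3-5 minutes per game

last_served = {}  # user id -> timestamp when their last game was served

def next_game_allowed(user_id, now=None):
    """True if enough time has passed for this user to be served a new game."""
    now = time.time() if now is None else now
    last = last_served.get(user_id)
    if last is None or now - last >= COOLDOWN_SECONDS:
        last_served[user_id] = now
        return True
    return False
```

    A user asking for a second game one minute after the first would simply be told to keep playing the current one.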

    • holgk says:

      OH! Sorry! This comment was a reply to PapyPilgrim’s comment.

    • Dagnarus says:

      “your suggestion to write a comment on each game you can not play is bad”

      While reading PapyPilgrim’s post it actually didn’t occur to me that those comments could be displayed on the entry page. It would indeed be bad to flood the page, but I disagree with your suggestion to choose which platform you want to play on. Web games have a huge advantage compared to other games just because they are played more, and it is imo a much bigger problem than the “shotgun ratings” everyone talks about but nobody can prove.
