We made it! And we found the bug!

Posted by (twitter: @mikekasprzak)
August 21st, 2011 9:09 pm

Thanks everyone for your patience. It looks like we finally have the performance problems solved. Way to help us break in the new host!

I think we are ready to run a Ludum Dare now! (Oops!)

As you can imagine, Phil and I had an INSANE couple of days pounding the website into submission. Here’s a re-telling of what happened.

In the days leading up to the event, we got the first notice from our $10/mo shared webhost that traffic was getting intense. No problem, at their request we installed caching. Overall we were running slower, but we made sure to point everyone to IRC and Twitter to get the theme.

The caching and redirection of traffic helped us get through the theme announcement (503 people on IRC), but come Saturday, our shared host informed us we were using way too much CPU (25% of an 8->16 core server… oops!), and shut the /compo/ site down.

We scrambled, panicked, and got the site up and running on a $60/mo VPS by working through the night. That still wasn’t doing it though, but it was 5/6 AM, and we both needed a recharge. We left the keys to the car with Seth, and crashed.

Some 3 hours of sleep later, Phil got up and switched us over to a $200/mo VPS. I was still asleep during this time, missed the memo, and had my own little site panic moment. But all was well, we just needed some tweaking. :)

Yet performance was still degrading, and fast! The submission system had nearly 100 submissions in it, so the site was certainly working well enough for people to send stuff in. Slowly.

Then came and went the mad rush of the final 5 hours… What was going on!? It just seemed unreal, how the heck were we bringing a $200/mo server to its knees already!

Hey! We’re on a (semi) dedicated server now, with 90% dynamic and generated content… AXE THE CACHE. Duh! Regenerating cached pages EVERY TIME someone posts is INTENSE! We disabled all caching, and it was bliss. A brief period of classic Ludum Dare site performance, like the days of yore, as we browsed. This fairy tale ended quickly though, leading me to have to track down the right settings to get browsers correctly caching images locally (by default, this isn’t enabled). After much tweaking, and a couple of crashes, we got this sorted.
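For the curious, here is roughly what "getting browsers to cache images locally" boils down to. This is only a sketch in Python, not the settings we actually used (those lived in the web server configuration); the function name `cache_headers`, the extension list, and the one-week lifetime are all made up for illustration. The idea is simply that static images get a long-lived `Cache-Control`/`Expires` pair, while dynamic pages do not.

```python
import time
from email.utils import formatdate

# Hypothetical list of static image types worth caching in the browser.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def cache_headers(path, max_age_seconds=7 * 24 * 3600, now=None):
    """Return response headers for `path` (illustrative helper, not a real API).

    Images may be cached by the browser for `max_age_seconds`; everything
    else (the dynamic pages) must be revalidated on each request.
    """
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return {"Cache-Control": "no-cache"}
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

With headers like these, a returning visitor’s browser stops re-requesting every image on every page view, which takes a chunk of pointless requests off the server.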

How else can we lighten the load? The sidebar! Good idea! Kill that! How’s that? Not enough! Okay, non-essential plugins off too!

But it just wasn’t enough. GRR!!

It took us installing an SQL query profiler on WordPress to finally discover the killing issue: The Submission System. Remember when I said we had nearly 100 entries already? Well this kept going up as time passed. I was recording some submission metrics for my own amusement before I noticed it. Some 300-ish entries, and WHY THE HECK ARE THERE 760 SQL QUERIES BEING EXECUTED FOR THE VIEW ENTRIES PAGE!?

That there, friends, was our zero hour site killer.

I’d like to think there was some crazy light-show going on at Phil’s place, as he coded as fast as he could, on 3 hours of sleep, to take the submission site code from executing O(n) queries down to just O(1).
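If "O(n) queries versus O(1)" sounds abstract, here is a minimal sketch of the pattern in Python with an in-memory SQLite database. To be clear, this is not the actual submission system code: the `entries` and `authors` tables, the `CountingDB` wrapper, and the function names are all hypothetical, chosen only to show how a listing page ends up issuing one extra query per entry, and how a single JOIN avoids it.

```python
import sqlite3


class CountingDB:
    """Thin wrapper around a sqlite3 connection that counts queries issued."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_entries):
    """Build a hypothetical schema with one author per entry."""
    db = CountingDB()
    db.conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)"
    )
    db.conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    for i in range(1, n_entries + 1):
        db.conn.execute("INSERT INTO authors VALUES (?, ?)", (i, f"author{i}"))
        db.conn.execute("INSERT INTO entries VALUES (?, ?, ?)", (i, f"game{i}", i))
    return db


def list_entries_slow(db):
    # One query for the list, then one more per entry: O(n) queries.
    rows = db.query("SELECT id, title, author_id FROM entries ORDER BY id")
    result = []
    for _id, title, author_id in rows:
        (name,) = db.query("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        result.append((title, name))
    return result


def list_entries_fast(db):
    # A single JOIN fetches everything: O(1) queries regardless of entry count.
    return db.query(
        "SELECT e.title, a.name FROM entries e "
        "JOIN authors a ON a.id = e.author_id ORDER BY e.id"
    )
```

Both functions return the same list, but with 300 entries the slow version issues 301 queries and the fast one issues exactly 1. The real page was doing something in the same spirit (760 queries for 300-ish entries), which is why it got slower with every game submitted.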

And there you have it. We still needed a new server, but it was the query count that made performance get worse and worse as more and more games got finished. Really, we couldn’t have solved this without you guys, braving the super-slow website to get your entries in. Sometimes it takes 760 queries in your face to see these sorts of things. 😀

Trial by fire. We got burned, but we hope the rest of you turned out alright. :)

Wow. Maybe I should be entering the LD website in the compo? Oh wait, it was a team effort, so that goes in the Jam.

Thanks everyone for your patience!

PS: Yes, the server costs have increased 20-fold overnight. We’ll be looking into how to deal with that, once we have more rest behind us.

45 Responses to “We made it! And we found the bug!”

  1. Heroic. Thank you guys for making the effort and keeping us all calm and collected with the constant updates via the soc-nets. I’m confident the funding issues will be easily resolved, too.

    Now, about that judging time extension… 😉

  2. lectvs says:

    Yay! Glad you guys got everything sorted out! I still have another day since I’m doing the jam! 😉

  3. Elegwa says:

    Wow, Thanks for tanking through this guys. Keeping us updated and making games while you guys are having panic attacks about server usage, then wallet usage. I’m sure all of us are thankful, I know I am!

  4. gimblll says:

    Thanks for the superb work guys!

  5. hamster_mk_4 says:

    Huzza, Ludum Dare is back on line. I was worried when 6:45 rolled by and I could not get to the submit page. I appreciate all the work you guys put into making this event work.

  6. CabinClyde says:

    It was Minecraft’s creator, Notch who brought the site down. Because he was live-streaming him developing his entry.

  7. johanp says:

    Thanks for the extra effort guys! Where’s the donate button? 😉

    • PoV says:

      I’ll open a discussion during the week about this. Things are a bit more complicated, now that our monthly bills have gone up. :)

      • wallacoloo says:

        Well the increased cost is only temporary, right? You didn’t sign up for a yearlong contract or anything, did you?

        • PoV says:

          No contract, but the one thing Ludum Dare always does is get bigger. We went from 288 entries last event, to nearly 500, and we’re still not done! The heavy traffic time has passed for now, but we still have spikes ahead of us including Jam submission time, and results time where we tend to get mainstream gaming press coverage. December was typically a smaller event, but last December we smashed prior records with nearly 200 (??), and we can’t be sure we won’t again.

          April 2012 is the 10 year anniversary … And Notch is a regular … What gaming press website could resist that story? :)

  8. TomK32 says:

    My php isn’t exactly great anymore as I moved on to ruby and js some years ago, but if you guys would just push the whole (submission) code up to github we could have a look at what’s so bad about the code.

  9. laremere says:

    Perhaps a new site running on something like google app engine would work? Most of the time the site gets little traffic, so it should stay within the free limits, and when the site does get a lot of traffic google’s servers shouldn’t sweat.

  10. Jigxor says:

    Thank you guys! We all really appreciate everything you do to keep this running!

  11. ratking says:

    Where can I donate money? :-)

  12. Edsploration says:

    Heroically Done! Thanks for all the hard work, and being here 😀

  13. ZenithSal says:

    You guys are awesome, remember that

  14. barigorokarl says:

    Thank you guys! You’re the best! :)

    I have an idea for the judging system that might tweak the performance, so you can perhaps downgrade at least one contract:
    If I remember correctly, every time you go to the judging-site every entry is shown. Why not build a “random”-button and show a random entry you didn’t see before. That would also be more “democratic”, because all entries have the same chance to be shown and don’t have to rely on standing exactly on the top/bottom/middle of the list.

    Your story reminds me a little bit of the starting days of a big fashion online shop I worked on for a while, 16 hour shifts and hoping our sql-code wouldn’t shoot our poor little sql-server down, so yeah, that seemed familiar … :)

    And do that donation thingy … Build it, and they will come :))

  15. Fenmar says:

    Thank you! Thank you so much for making this great event possible.

  16. TJ says:

    Consider hosting it on a Cloud service, where you can adjust the server power depending on load…

  17. Bleck says:

    Just make Notch pay for the new host and stop worrying about ‘clouds’ or whatever. I hear he can afford it.

  18. tecfreak says:

    Awesome! Just add the donate button and we’ll help with all what we can!

  19. recursor says:

    Just thought I’d add my thanks also. I really appreciate what you guys do, and did to keep everything going this time!

  20. G4JC says:

    This is also why they make CDN’s. I suggest you guys invest in one. :-O
    Content Distribution Systems can kill a load of traffic at the core, my personal preference is cloudflare but there’s others too (just not free). 😉

    • PoV says:

      CDN’s distribute static content. This weekend’s troubles are all from the generation/addition of dynamic content. Our $10/mo host kicked us off for CPU usage, not storage, not bandwidth.

  21. Holy crap, what a nightmare! D= I’m glad I submitted mine early…

  22. Elegwa says:

    Anyone who gets this far down, I’ve had an idea (I’m sure I’m not the first one to have this one) but if you wish you had money to give to LD48, well we have prototypes for games now don’t we? Get your compo/jam entry cleaned up and ready for a release, sell it, throw up ads, or whatever you want! Then donate all that money to our favourite website, ludumdare.com. I’m certainly going to try! “All profit goes to ludumdare.com”

  23. callidus says:

    you guys rock, thanks for holding the fort so we could complete another successful LD!

  24. Milo says:

    You should enforce an $800 entrance fee… for Notch.

  25. siebharinn says:

    Nice job troubleshooting and getting it fixed. Most people haven’t experienced that kind of pressure. You guys did good.

  26. ExciteMike says:

    At this point it seems entirely appropriate to set up a donate button in a conspicuous spot. It’s clearly costing you guys non-trivial amounts of money and time! I certainly appreciate it enough to toss some money your way!

    (for the love of God don’t make it an entrance fee, though!)

  27. cornedor says:

    Wow, very nice.

  28. SLiV says:

    Thanks guys! Heroically done.

  29. The first thing I noticed before was the lack of pagination, I noticed that you have added it now, nice idea.

  30. debarra says:

    hey. im debarra, ive been making games as a hobby for a while. but this was the first Ludum Dare i was going to enter. but when i logged in through Word Press and returned to Ludum Dare it said i was signed out and couldnt submit my game. i was wondering would this have anything to do with the problem you suffered

  31. rom016 says:

    I think as a solution you could get 2/3 $10 server sites and have the entries spread across them. Possibly each one based in a country with a high amount of entrants.
