All that glitters
GameChanger and MongoDB: a case study in MySQL conversion

When I’m headed out to meet someone for the first time, I usually send a text or email to the tune of “I’ll be the guy in the GameChanger shirt & blue hat who’ll be about 5 minutes late.”  I’m just that guy.  I’m either late, sprinting, or both.

So Mongo.  Jerry Hsu and I presented at MongoNYC a few months back, and I promised to post some of the nuggets of what we talked about there.

The intro to that talk was simply how/why we made the decision to move from MySQL to MongoDB.

First off, I’ve been known to be pretty aggressive when making my tech decisions, but I was very skeptical of MongoDB at first.  Sure, the promise was awesome: a JSON-based database designed from the ground up to interoperate well with the web, offer blazing speed, and hit a sweet spot in the realm between relational databases and document/key-value stores.  I mean, JSON documents with indexing, sharding, and replication?  But I was also starting a new company, and with all the crazy risks that entailed, I didn’t want to build on something that might have me up at all hours dealing with production outages, with minimal documentation to lean on.

But at GameChanger we’re tackling this hairy problem: we synchronize data across mobile devices and all kinds of wacky web end-points, and the whole thing is built on a RESTy JSON-based HTTP API.  Everything, including our website itself, consumes our API.

Initially, it was built on pure LAMP (where P=Python) as a couple of Django applications.  The API application exposed essentially a fancy CRUD API, and served as a translation layer between the JSON data format (which conveniently maps beautifully onto Python native dict & list structures) and Django Model objects, which in turn were mapped onto tables in MySQL via reams of auto-generated SQL.  We’re doing Python/Django, but it’s not that different if you’re using Hibernate in Java, or ActiveRecord in Rails, or your own home-grown ORM (admit it, you’ve built one): this is how it works.

And this is how it falls on its face, every time: first, we naively build views.  Yay!  It takes us minutes to construct our first models, generate a SQL schema, and start manipulating objects and writing business code.  It feels powerful.  A few months in, you start putting some users on it, and it’s a little slow.  Tolerable, though.  But some calls take 5-10 seconds to return.  You start to investigate.  Ah!  Of course, a particular lookup through the ORM is loading a bunch of ancillary data, and there’s a missing index to boot.  No matter!  You specify some lazy-loading rules, add that index, and move on.

You build out more and more features, and you start to get bigger amounts of data and more load.  More things are starting to get slow.  Basically, you’re now living in ORM-optimization land: tweaking some form of annotations, hinting query structures, hard-coding little bits of SQL (it’s just a little bit, and it makes it so much better!).  At one point on GameChanger, I looked at a particular API call, and a single view resulted in 16,000 (sixteen THOUSAND) database queries.  That it took only 10 seconds, and not 10 minutes, seemed like the miracle.
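To make the query explosion concrete, here is a minimal sketch of the pattern behind it.  It is not our actual code: in-memory SQLite stands in for MySQL, a hand-rolled counter stands in for the ORM's query log, and the game and play tables are made up for illustration.  It fetches the same data twice, once the lazy way an ORM does by default and once with a single JOIN.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE play (id INTEGER PRIMARY KEY, game_id INTEGER, description TEXT);
""")
conn.executemany("INSERT INTO game (id, name) VALUES (?, ?)",
                 [(i, "game %d" % i) for i in range(1, 51)])
conn.executemany("INSERT INTO play (game_id, description) VALUES (?, ?)",
                 [(g, "play %d" % p) for g in range(1, 51) for p in range(3)])

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, roughly what an ORM debug log shows."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def plays_lazy():
    """One query for the games, then one more per game: the N+1 pattern."""
    result = {}
    for game_id, _name in run("SELECT id, name FROM game"):
        rows = run("SELECT description FROM play WHERE game_id = ? ORDER BY id",
                   (game_id,))
        result[game_id] = [row[0] for row in rows]
    return result


def plays_joined():
    """The same data fetched with a single JOIN."""
    result = {}
    rows = run("SELECT g.id, p.description FROM game g "
               "JOIN play p ON p.game_id = g.id ORDER BY g.id, p.id")
    for game_id, description in rows:
        result.setdefault(game_id, []).append(description)
    return result


query_count = 0
lazy = plays_lazy()
lazy_queries = query_count      # one query for games plus one per game

query_count = 0
joined = plays_joined()
joined_queries = query_count    # a single query
```

With 50 games the lazy version issues 51 queries and the joined version issues one, for identical results.  Nest a few more relationships under each play and the multiplication gets you into the thousands quickly.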

And so with a day or so of optimization, I reduced it to ONLY about 400 queries.  And I mean, it did a lot- constructed this big document of play data, zillions of joins, etc.  And it only took a couple of seconds to return after that.  Only.

So what’s the next step?  Well, aside from endlessly tweaking relationships in your ORM (wow, doesn’t seem so simple anymore, does it), the next step is to start changing your data model.  You probably started out with something very normalized (every object gets its own table, you have pretty lookups for status codes, etc).  You start to denormalize, squashing object-groupings into flatter structures, and more and more of your ORM lookups become primary-key queries that return partial sets of fields from a single (gigantic) row in a table.

So we were staring down this path, with our big JSON-translation-engine, trying to simplify our application code while complexifying our database layer.  For a single “play” object in our system (designed to be cross-sport suitable), we had something like 9 tables involved.  It was no wonder the ORM was choking.

It was at about this point that I saw MongoDB hit version 1.1.  Here’s what went through my head:

  • I can scrap the whole JSON->dict->Model->SQL->Table overhead
  • My underlying datastore can more exactly mirror my API structure
  • Denormalization is an inherent part of the Document model
  • I’ll get this whole scalability roadmap for free
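Here is a rough sketch of what the first three points look like in practice.  Everything in it is an assumption for illustration: the field names are invented, not our real play schema, and a plain dict keyed by _id stands in for a MongoDB collection so the example runs without a server.  The point is only that the JSON the API speaks and the document that gets stored are the same shape, with the nested events living inside the play instead of in their own tables.

```python
import json

# A hypothetical "play" as an API client might send it; field names are made up.
api_payload = """
{
  "game_id": "g42",
  "sport": "baseball",
  "sequence": 17,
  "events": [
    {"type": "pitch", "result": "ball"},
    {"type": "pitch", "result": "in_play"},
    {"type": "advance", "runner": "p9", "to_base": 2}
  ]
}
"""

# A dict keyed by _id stands in for a MongoDB collection.
plays = {}


def save_play(collection, raw_json, play_id):
    """Parse the API payload and store it as-is: no Model layer, no SQL."""
    doc = json.loads(raw_json)
    doc["_id"] = play_id
    collection[play_id] = doc
    return doc


def load_play(collection, play_id):
    """Read the stored document back and serialize it for the API response."""
    doc = dict(collection[play_id])   # shallow copy so the stored doc keeps _id
    doc.pop("_id")
    return json.dumps(doc, sort_keys=True)


save_play(plays, api_payload, "play-17")
round_tripped = load_play(plays, "play-17")
```

One stored document comes back as the same JSON that went in.  In the relational version, that round trip crossed something like nine tables.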

We never looked back.  It took 6 months to frankenstein our way from MySQL onto MongoDB entirely (I don’t believe in big-bang rewrites, so we built cross-DB bridging code, and migrated our data system-by-system across).
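For the curious, the bridging idea can be sketched in a few lines.  This is a simplified illustration and not our actual bridge: the DictStore and Bridge classes and the system names are hypothetical, and in-memory dicts stand in for both databases.  Application code talks only to the bridge, which routes each subsystem to whichever store currently owns its data, so systems can be flipped over one at a time.

```python
class DictStore:
    """In-memory stand-in for a real backend (MySQL or MongoDB)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def get(self, key):
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value

    def keys(self):
        return list(self.rows)


class Bridge:
    """Routes each subsystem to whichever store currently owns its data."""

    def __init__(self, old, new):
        self.old = old
        self.new = new
        self.migrated = set()

    def _store(self, system):
        return self.new if system in self.migrated else self.old

    def get(self, system, key):
        return self._store(system).get((system, key))

    def put(self, system, key, value):
        self._store(system).put((system, key), value)

    def migrate(self, system):
        """Copy one subsystem's records across, then flip its routing."""
        for key in self.old.keys():
            if key[0] == system:
                self.new.put(key, self.old.get(key))
        self.migrated.add(system)


mysql = DictStore("mysql")
mongo = DictStore("mongo")
bridge = Bridge(mysql, mongo)

bridge.put("teams", 1, {"name": "Tigers"})
bridge.put("sessions", "abc", {"user": 7})

bridge.migrate("teams")                      # teams now live in the new store
bridge.put("teams", 2, {"name": "Owls"})     # new writes follow the routing
```

After the migrate call, team reads and writes go to the new store while sessions stay where they were.  A real migration also has to deal with writes that land during the copy, which this sketch ignores.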

A couple of weeks ago, we moved our production systems from Slicehost to Amazon EC2.  We still had a small chunk of our data stored in MySQL (Django sessions, and our user auth info), but I wasn’t going to install MySQL (our single point of failure in the old architecture, I might add) on EC2.  So we finally moved every last bit of data onto Mongo (Mango was crucial in doing the last bit).

Some things are still slow, and still require careful optimization (hey, this is software, people), but on the whole MongoDB has been screaming fast, and feels thoroughly modern.  The lack of schema in Mongo let us model out our “generic” data types with ease, where they were very complex in MySQL.  I’m positively glad we did it, and I’ll tell anyone, I’m not going back: MongoDB has replaced MySQL as my default storage engine of choice, and I expect that to be true for future endeavors as well.

More to come: how we use MongoDB for intelligent caching, lessons learned in scaling MongoDB, and maybe some other bits on Replication and Sharding, as we get deeper into that world.

(disclosure: I know the 10gen founding crew well, and that did influence my initial decision to “jump in” a little, but my conclusions are my own.)

aurum posted this