Warning: The API has evolved since the initial release - make sure to check out this post before diving in!
Today I’m releasing v0.1 of Mongoid Alizé. From MongoSF, no less!
Mongoid Alize is a Ruby gem that allows Mongoid-based applications to easily denormalize data between one-to-one, one-to-many, and many-to-many relations.
While this blog post is an announcement of the release, it’s also intended to be a primer on denormalization and how to use it to make your applications faster.
(If you’re already familiar with the concept of denormalization, you might skip the following and head directly to mongoid_alize on Github. Otherwise, read on!)
Denormalization: Why Fat is Fast
Database denormalization “is the process of attempting to optimise the read performance of a database by adding redundant data or by grouping data.”
The skinny: to cut down on trips to the database, you store copies of data in multiple tables/collections. In doing so, you avoid the JOINs and lookups otherwise needed to stitch together related pieces of data.
In sum: you’re storing more data - but your overall application can get much faster!
Real World analogy
Airlines make money when butts are in seats. It’s more than twice as profitable to fly 1 flight with 300 people than it is to fly 2 flights with 150 people each (all other things equal).
Chances are, there are a lot of empty seats (i.e. fields) on the roundtrips between your application and your database. And because of that, you’re probably making a lot of extra roundtrips. If you could just get certain “passengers” to fly together, the number of roundtrips could be significantly reduced.
Because many applications block on I/O (new roundtrips can’t start until the previous roundtrip returns), response time tends to increase linearly with the number of roundtrips taken. So just like the airlines, reducing this number by flying at capacity is essential for a cost-efficient operation.
Code Example: Users and Posts
Your blogging app has Posts. Users write the Posts. A common use case for your app is to display a list of posts with the post’s title and the authoring user’s name.
Assume this is your implied data structure in a no-sql database:
[posts]
- user_id
- title

[users]
- id
- name
Sadly, fetching the list of posts with user names will require 1+n trips to the database, where n is the number of posts. If you have 10 posts, for example, you need 1 lookup to get the post record(s), and then 1 lookup for each post to get the user’s name - that’s 11 database roundtrips!
A typical, unrefined HAML view for this loops over Post.all and, for each post, prints post.title and post.user.name.
post.user.name is a cloaked dagger! This snippet looks like it will hit the database exactly once (Post.all), but in fact it does so Post.count more times, once on each invocation of post.user.
This is Not Good.
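To make the 1+n pattern concrete, here’s a minimal plain-Ruby sketch. It doesn’t touch Mongoid at all: FakeDB, all_posts, find_user and render_posts are hypothetical stand-ins that simply count how many times the "database" gets asked for something.

```ruby
# In-memory stand-in for a database that counts roundtrips.
# FakeDB and its methods are hypothetical, not part of Mongoid.
class FakeDB
  attr_reader :roundtrips

  def initialize(users, posts)
    @users = users
    @posts = posts
    @roundtrips = 0
  end

  # One roundtrip returns every post (plays the role of Post.all).
  def all_posts
    @roundtrips += 1
    @posts
  end

  # One roundtrip per user lookup (plays the role of post.user).
  def find_user(id)
    @roundtrips += 1
    @users.fetch(id)
  end
end

# Mirrors the view: one line per post, with the author's name looked up each time.
def render_posts(db)
  db.all_posts.map do |post|
    "#{post[:title]} by #{db.find_user(post[:user_id])[:name]}"
  end
end

users = { 1 => { name: "Ann" }, 2 => { name: "Bob" } }
posts = (1..10).map { |i| { title: "Post #{i}", user_id: i.odd? ? 1 : 2 } }
db = FakeDB.new(users, posts)

puts render_posts(db)
puts "roundtrips: #{db.roundtrips}" # 1 for the posts + 10 for the users = 11
```

The loop body looks innocent, which is exactly the problem: the cost is hidden behind a method call that reads like a plain attribute.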
Don’t cache out yet
Finding and addressing occurrences like these is a common first step in performance tuning a Rails application.
In the ActiveRecord days, we had the :include option, which would eager-load associations up front (via JOINs or a batched second query).
This was very helpful, but we often don’t have an analogue in the no-sql world.
Nowadays, once the response times start climbing, many developers turn immediately to caching. Dropping in record caches, partial caches, page caches, tango and caches, etc. And caching Done Right can help here. Caches are Good Things.
However, a mature, scaling app will typically employ caching in many forms, and a multitude of caching tiers can quickly lead to Real Tears.
So when possible, it’s better to achieve the intent of caching via something you Already Have. (ok I’m getting a little Carried Away)
No JOINS, no caching, no problem!
While you don’t get JOINs with most no-sql architectures, you do get a lot of flexibility with record structure. This is well-suited to denormalizing data. When denormalizing, you’re adding additional, conceptually auxiliary, attributes to related collections. Not worrying about the maintenance of an explicit schema makes this a much more practical solution than it might be otherwise.
Let’s take a look. I’ve updated the implied data model, now showing user.name denormalized into posts:

[posts]
- user_id
- title
- user_name # populated from user.name

[users]
- id
- name
Now our HAML snippet can print post.user_name instead of post.user.name.
1 database hit later and we are winning at denormalizing.
user_name is a real field stored inside the posts collection and retrieved along with the post itself.
Those other 10 lookups Never Happened!
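Here’s the same kind of plain-Ruby sketch for the denormalized read. Again, nothing here is Mongoid’s API: PostsCollection and render_denormalized are hypothetical names, and the counter exists only to show how many roundtrips the view needs once user_name lives on the post.

```ruby
# Hypothetical in-memory posts collection that counts roundtrips;
# not Mongoid's API.
class PostsCollection
  attr_reader :roundtrips

  def initialize(docs)
    @docs = docs
    @roundtrips = 0
  end

  # One roundtrip returns every post, denormalized fields included.
  def all
    @roundtrips += 1
    @docs
  end
end

# The view only reads fields that already sit on each post document.
def render_denormalized(collection)
  collection.all.map { |post| "#{post[:title]} by #{post[:user_name]}" }
end

posts = PostsCollection.new(
  (1..10).map { |i| { title: "Post #{i}", user_id: 1, user_name: "Ann" } }
)

puts render_denormalized(posts)
puts "roundtrips: #{posts.roundtrips}" # a single read, however many posts there are
```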
Now we just need to make sure that the user_name attribute of the Post is populated. That’s where Mongoid::Alize comes in.
Let’s do just that by adding one line of code to our Post model, declaring which relation and field to denormalize.
Here we’re telling Alize: “copy the name field from the user relation into the post record itself.” (Note: you can specify as many fields as you like, or specify none to denormalize all non-internal fields.)
Alize then defines a new field, user_name, and adds callbacks to the Post and User models to populate the field when the value changes. C’est facile, mon ami!
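If "defines a field and adds callbacks" sounds like magic, here’s a toy version of the idea in plain Ruby. To be loud about it: this is not how Mongoid::Alize is implemented, and Denormalizes and denormalize are hypothetical names invented for this sketch. The save method stands in for a before-save callback.

```ruby
# Toy illustration only: `Denormalizes` and `denormalize` are hypothetical
# names, not the Mongoid::Alize API.
module Denormalizes
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # The [relation, field] pairs declared on this class.
    def denormalized
      @denormalized ||= []
    end

    # Defines a "<relation>_<field>" attribute and remembers the pair
    # so that #save can refresh it.
    def denormalize(relation, *fields)
      fields.each do |field|
        attr_accessor "#{relation}_#{field}"
        denormalized << [relation, field]
      end
    end
  end

  # Stands in for a before-save callback: copy each declared field
  # from the related object into this record.
  def save
    self.class.denormalized.each do |relation, field|
      related = public_send(relation)
      public_send("#{relation}_#{field}=", related && related.public_send(field))
    end
    self
  end
end

User = Struct.new(:name)

class Post
  include Denormalizes
  attr_accessor :title, :user
  denormalize :user, :name # the "one line" in this toy version

  def initialize(title, user)
    @title = title
    @user = user
  end
end

post = Post.new("Hello", User.new("Ann")).save
puts post.user_name
```

The point of the sketch is the shape: one declaration generates the extra attribute, and the copy happens as a side effect of saving, so the rest of the app never has to think about it.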
The kitchen Sync
Here’s the question that makes denormalization sound scary at first. “If I store multiple copies of my data everywhere, how will I ever keep them all fresh/in sync?”
I’m glad you asked! The ActiveModel spec (and Mongoid’s implementation of it) has two classy properties that make this possible.
- A relation and its inverse are well-defined. If a model has_many :foos, it’s easy to find out who Foo is.
- Callbacks, sweet callbacks. Once you know who Foo is, it’s easy to do something when a Foo changes.
Alize takes advantage of these concepts to keep data in sync. Bi-directional syncing is fully supported for one-to-one, one-to-many, many-to-one, and many-to-many relations. This includes save operations and destroys.
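Walking through it step by step (assume users and posts as above w/ alize included): the post picks up the user’s name when the two are associated, follows along when that name changes, and lets go of it when the user is destroyed.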
The denormalized user data included in the post stays in sync from create to destroy.
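Here’s a rough plain-Ruby sketch of that lifecycle. BlogStore and its methods are hypothetical and stand in for the callbacks; none of this is the Mongoid or Mongoid::Alize API. Setting the copy to nil on destroy is an assumption made for the sketch.

```ruby
# Hypothetical in-memory store; the method names are illustrative and
# are not Mongoid or Mongoid::Alize APIs.
class BlogStore
  attr_reader :users, :posts

  def initialize
    @users = {}
    @posts = {}
  end

  def create_user(id, name)
    @users[id] = { name: name }
  end

  # On create, copy the user's name into the post.
  def create_post(id, title, user_id)
    user = @users[user_id]
    @posts[id] = { title: title, user_id: user_id, user_name: user && user[:name] }
  end

  # On user update, push the new name to every post that references the user.
  def rename_user(id, name)
    @users.fetch(id)[:name] = name
    posts_of(id).each { |post| post[:user_name] = name }
  end

  # On user destroy, clear the denormalized copy (assumed behavior).
  def destroy_user(id)
    @users.delete(id)
    posts_of(id).each { |post| post[:user_name] = nil }
  end

  private

  # Uses the inverse side of the relation: which posts point at this user?
  def posts_of(user_id)
    @posts.values.select { |post| post[:user_id] == user_id }
  end
end

store = BlogStore.new
store.create_user(1, "Ann")
store.create_post(10, "Hello", 1)
puts store.posts[10][:user_name]   # copy made on create
store.rename_user(1, "Anna")
puts store.posts[10][:user_name]   # copy follows the rename
store.destroy_user(1)
p store.posts[10][:user_name]      # copy cleared on destroy
```

Notice that every write on the user side has to know how to find the affected posts. That’s the "well-defined inverse" property doing the heavy lifting.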
The tip of the honeyburg
Here’s an example that shows off Alize’s support for many relations on the source side (i.e. the side of the relation where the denormalized attributes go).
The use case here - you’d like to show a user and all of their post titles using just 1 database call.
Here the denormalized post data lives on the user record itself, in a posts_fields field, so rendering a user along with their post titles takes a single read.
(Note: _id is also stored in posts_fields. It’s needed to locate records when propagating updates.)
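To show why that _id matters, here’s a small plain-Ruby sketch of the "many" side. Only the posts_fields name and the stored _id come from the description above; UserDoc, its method names, and the array-of-hashes layout are assumptions made for illustration, not the gem’s internals.

```ruby
# Hypothetical in-memory sketch of the "many" side; only the posts_fields
# name and the stored _id come from the post, the rest is assumed.
class UserDoc
  attr_reader :name, :posts_fields

  def initialize(name)
    @name = name
    @posts_fields = []
  end

  # Called when a post is created: append its denormalized fields.
  def post_created(post)
    @posts_fields << { "_id" => post["_id"], "title" => post["title"] }
  end

  # Called when a post changes: use _id to find the matching entry.
  def post_updated(post)
    entry = @posts_fields.find { |fields| fields["_id"] == post["_id"] }
    entry["title"] = post["title"] if entry
  end

  # Called when a post is destroyed: drop its entry.
  def post_destroyed(post)
    @posts_fields.reject! { |fields| fields["_id"] == post["_id"] }
  end

  # Everything needed to render comes from this one document.
  def post_titles
    @posts_fields.map { |fields| fields["title"] }
  end
end

user = UserDoc.new("Ann")
first  = { "_id" => 1, "title" => "Fat is Fast" }
second = { "_id" => 2, "title" => "The Kitchen Sync" }
user.post_created(first)
user.post_created(second)

first["title"] = "Why Fat is Fast"
user.post_updated(first)   # located by _id, not by title

p user.post_titles
```

Without the _id there would be no reliable way to tell which entry to rewrite when two posts happen to share a title.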
Ready to try it out?
If using Bundler, add the mongoid_alize gem to your Gemfile.
Or clone it: mongoid_alize on Github. The repository has a full set of specs, examples, and documentation.
Mongoid Alize was born from within a real-life Rails/Mongoid/JSON application with lots of nooks and crannies. It’s also very young, so if you try it out I’d love to hear about your experiences and suggestions. Bug reports and pull requests are very welcome!
For next time…
This denormalization pattern scales well up the development stack. JSON serialization becomes easy and fast when you don’t have to bounce around populating object graphs. And it’s easy, for example, to use denormalized fields to populate nested Backbone JS models. More on that soon!