Dealing with N+1 Queries in Rails
One of the most common problems in Rails applications is N+1 queries. As an example, let’s use a simple blogging application, with a Post
model and a Tag
model. When you visit the application you are presented with a snippet for the 10 most recent blog posts.
In our view we show the title for each blog post, as well as the tags associated with that post.
This code is problematic - we made a single query for all the posts, but for each post we render we are making an additional query to get the tags for that post. This is the N+1 problem: we made a single query (the 1 in N+1) that returned somethig (of size N) that we iterate over and perform another database query on (N of them).
If you look at your logs you will most likely see something like this:
(I’m assuming in this example that we have a many-to-many relationship between posts and tags, which is why we would need the posts_tags
table)
Solving N+1 Queries with Includes
A quick Google search will probably tell you to use includes
- this tells ActiveRecord to load all the tags as part of loading the posts.
Now your logs will look something like this:
Much more efficient! This has solved the N+1 problem for this scenario.
N+1 Queries for Count
Let’s continue with this example and build out a tag cloud for our blog. This page will show a simple alphabetical list of all the tags, as well as the number of posts associated with each tag.
Predictably, this will lead to another N+1 problem. The only difference is that the SQL query is a COUNT
.
We can try the same fix as before and use includes
.
Let’s look at the generated SQL.
We might be tempted to look at the SQL and conclude that we have solved the N+1 problem. Unfortunately, we have actually solved the N+1 problem only to introduce another problem - memory bloat! We are loading all the Post
objects (or at least all Post
objects with at least one tag) into memory and then never using them, except to get the size of the associated collection for each tag. This will slow down this page and require an enormous amount of memory - keep in mind that there is no limit on the amount of Post
objects we’re loading here.
Before we get into the solution I want to point out another nuance with this code - note that in the template I am using the size
function.
The posts
association has both a size
and a count
function. If I had used the count
function instead Rails would have ignored the loaded posts
collection and made a COUNT
SQL query. Put another way - calling count
on an ActiveRecord association will always query the database, even if the association has already been loaded. If I had used includes
in the controller and the count
function in the template I would have incurred the overhead of both the original N+1 query as well as the memory bloat of loading all the posts.
Solving N+1 Queries with Counter Cache
Showing the size of an association is such a common problem that Rails has built-in solution - counter cache. The idea is that we create an extra column in our tags
database table and store the number of posts associated with each tag. We then no longer need to load or query the posts associated with each tag - the Tag
model will have an attribute with the number of associated posts. Rails will do all the work of making sure this counter column is kept in sync as new posts are created or deleted.
Let’s start with adding the database column - by convention we will call this column posts_count
.
Once we tell Rails that we are using a counter cache column it will keep the column updated as new posts are created and removed, but we need to manually seed the count. Rails has a built-in function called reset_counters
to do this automatically, but it’s not recommended that we use this inside migrations. To see why, let’s see what that would look like.
There are 2 problems with using reset_counters
inside this migration. Firstly, the migration will be much slower than a simple SQL query, potentially slowing down the next deployment. (‘Slower’ in this context usually means minutes instead of seconds) Secondly, we’re referencing the Tag
model in our migration, which goes against the best practices for migrations - if we ever decide to rename the Tag
model to Hashtag
our migration would fail to run.
Now that we have the database column, we need to tell Rails to keep it in sync.
Now that we have this in place, solving the N+1 problem is trivial.
Solving N+1 Queries with Regular SQL
The counter cache column does solve the N+1 problem, but it’s not without drawbacks. Since the counter cache column is kept in sync by the ActiveRecord code, we have to make sure to always make updates through ActiveRecord - if we make updates directly to the database we would need to refresh the counter cache.
Another drawback is that counter cache doesn’t work with scopes. Let’s introduce the concept of draft posts - each Post
will have a published_at
attribute, and we will only display posts where this attribute is populated. We also want to update our tags page to show the number of published posts for each tag. This means we can no longer rely on the counter cache column (since the counter will include both published and draft posts) - we are effectively back to the original N+1 and memory bloat problems. We can solve this problem by using plain SQL, since SQL already has a solution for this problem - GROUP BY
!
The SQL query for getting the number of published posts for each tag looks like this:
This will give us the number of published posts for each tag (although tags without any published posts won’t show up at all). We can recreate this query in ActiveRecord and make this data available to our view.
This will create a simple hash, mapping each tag id to the number of posts for that tag. All with a single, efficient query!
We’re not completely over line though, since the tag_usage
hash will not contain values for tags without any published posts. We can take care of this by using a default in our view.
This is not a great solution though, since our view is now aware of a very specific implementation detail (or implementation flaw) in how we construct the hash. An alternative solution would be to modify our SQL query to do a LEFT OUTER
join which will ensure all tags are present in the resulting hash, but this would require us to write some custom SQL which is less than ideal. A better solution would be to wrap the tag usage in a view model.
Now both the controller and view and straightforward, and the TagUsage
view model can easily be tested.
This solves the N+1 problem and avoids any potential memory bloat. As an additional bonus it make the code easy to read and test. I prefer this to the counter cache solution, even when there is no scope on the association. Happy coding.