Dealing with N+1 Queries in Rails

One of the most common problems in Rails applications is N+1 queries. As an example, let’s use a simple blogging application, with a Post model and a Tag model. When you visit the application you are presented with a snippet for the 10 most recent blog posts.

class HomeController < ApplicationController
  def index
    @posts = Post.order(published_at: :desc).limit(10)
  end
end

In our view we show the title for each blog post, as well as the tags associated with that post.

<% @posts.each do |post| %>
  <%= link_to(post.slug, post.title) %>
  <%= post.snippet %>
  Tags: <%= post.tags.map(&:name).join(', ') %>
<% end %>

This code is problematic - we made a single query for all the posts, but for each post we render we are making an additional query to get the tags for that post. This is the N+1 problem: we made a single query (the 1 in N+1) that returned somethig (of size N) that we iterate over and perform another database query on (N of them).

If you look at your logs you will most likely see something like this:

SELECT "posts".* FROM "posts" ORDER BY "posts"."published_at" DESC LIMIT 10
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 15
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 14
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 13
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 11
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 9
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 8
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 7
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 6
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 5
SELECT "tags".* FROM "tags" INNER JOIN "posts_tags" ON "posts_tags"."tag_id" = "tags".id" WHERE "posts_tags"."post_id" = 4

(I’m assuming in this example that we have a many-to-many relationship between posts and tags, which is why we would need the posts_tags table)

Solving N+1 Queries with Includes

A quick Google search will probably tell you to use includes - this tells ActiveRecord to load all the tags as part of loading the posts.

class HomeController < ApplicationController
  def index
    @posts = Post.includes(:tags).order(published_at: :desc).limit(10)
  end
end

Now your logs will look something like this:

SELECT "posts".* FROM "posts" ORDER BY "posts"."published_at" DESC LIMIT 10
SELECT "posts_tags".* FROM "posts_tags" WHERE "posts_tags"."post_id" IN (...)
SELECT "tags".* FROM "tags" WHERE "tags"."id" IN (...)

Much more efficient! This has solved the N+1 problem for this scenario.

N+1 Queries for Count

Let’s continue with this example and build out a tag cloud for our blog. This page will show a simple alphabetical list of all the tags, as well as the number of posts associated with each tag.

class TagsController < ApplicationController
  def index
    @tags = Tag.order(:name)
  end
end
<% @tags.each do |tag| %>
  <%= link_to(tag, tag.name) %>
  Number of Posts: <%= tag.posts.size %>
<% end %>

Predictably, this will lead to another N+1 problem. The only difference is that the SQL query is a COUNT.

SELECT "tags".* FROM "tags" ORDER BY "tags"."name" ASC
SELECT COUNT(*) FROM "posts" INNER JOIN "posts_tags" ON "posts_tags"."post_id" = "posts".id" WHERE "posts_tags"."tag_id" = 2
SELECT COUNT(*) FROM "posts" INNER JOIN "posts_tags" ON "posts_tags"."post_id" = "posts".id" WHERE "posts_tags"."tag_id" = 3
SELECT COUNT(*) FROM "posts" INNER JOIN "posts_tags" ON "posts_tags"."post_id" = "posts".id" WHERE "posts_tags"."tag_id" = 5
SELECT COUNT(*) FROM "posts" INNER JOIN "posts_tags" ON "posts_tags"."post_id" = "posts".id" WHERE "posts_tags"."tag_id" = 6
SELECT COUNT(*) FROM "posts" INNER JOIN "posts_tags" ON "posts_tags"."post_id" = "posts".id" WHERE "posts_tags"."tag_id" = 9

We can try the same fix as before and use includes.

class TagsController < ApplicationController
  def index
    @tags = Tag.includes(:posts).order(:name)
  end
end

Let’s look at the generated SQL.

SELECT "tags".* FROM "tags" ORDER BY "tags"."name" ASC
SELECT "posts_tags".* FROM "posts_tags" WHERE "posts_tags"."tag_id" IN (...)
SELECT "posts".* FROM "posts" WHERE "posts"."id" IN (...)

We might be tempted to look at the SQL and conclude that we have solved the N+1 problem. Unfortunately, we have actually solved the N+1 problem only to introduce another problem - memory bloat! We are loading all the Post objects (or at least all Post objects with at least one tag) into memory and then never using them, except to get the size of the associated collection for each tag. This will slow down this page and require an enormous amount of memory - keep in mind that there is no limit on the amount of Post objects we’re loading here.

Before we get into the solution I want to point out another nuance with this code - note that in the template I am using the size function.

Number of Posts: <%= tag.posts.size %>

The posts association has both a size and a count function. If I had used the count function instead Rails would have ignored the loaded posts collection and made a COUNT SQL query. Put another way - calling count on an ActiveRecord association will always query the database, even if the association has already been loaded. If I had used includes in the controller and the count function in the template I would have incurred the overhead of both the original N+1 query as well as the memory bloat of loading all the posts.

Solving N+1 Queries with Counter Cache

Showing the size of an association is such a common problem that Rails has built-in solution - counter cache. The idea is that we create an extra column in our tags database table and store the number of posts associated with each tag. We then no longer need to load or query the posts associated with each tag - the Tag model will have an attribute with the number of associated posts. Rails will do all the work of making sure this counter column is kept in sync as new posts are created or deleted.

Let’s start with adding the database column - by convention we will call this column posts_count.

class AddPostsCountToTags < ActiveRecord::Migration
  def up
    add_column :tags, :posts_count, :integer, null: false, default: 0

    execute <<-SQL
      UPDATE tags
      SET posts_count = (
        SELECT COUNT(1)
        FROM posts_tags
        WHERE posts_tags.tag_id = tags.id
      )
    SQL
  end

  def down
    remove_column :tags, :posts_count
  end
end

Once we tell Rails that we are using a counter cache column it will keep the column updated as new posts are created and removed, but we need to manually seed the count. Rails has a built-in function called reset_counters to do this automatically, but it’s not recommended that we use this inside migrations. To see why, let’s see what that would look like.

class AddPostsCountToTags < ActiveRecord::Migration
  def up
    add_column :tags, :posts_count, :integer, null: false, default: 0

    # Don't do this in a migration
    Tag.find_each { |tag| Tag.reset_counters(tag.id, :posts) }
  end

  def down
    remove_column :tags, :posts_count
  end
end

There are 2 problems with using reset_counters inside this migration. Firstly, the migration will be much slower than a simple SQL query, potentially slowing down the next deployment. (‘Slower’ in this context usually means minutes instead of seconds) Secondly, we’re referencing the Tag model in our migration, which goes against the best practices for migrations - if we ever decide to rename the Tag model to Hashtag our migration would fail to run.

Now that we have the database column, we need to tell Rails to keep it in sync.

class Tag < ActiveRecord::Base
  has_many :posts, counter_cache: true
end

Now that we have this in place, solving the N+1 problem is trivial.

class TagsController < ApplicationController
  def index
    @tags = Tag.order(:name)
  end
end
<% @tags.each do |tag| %>
  <%= link_to(tag, tag.name) %>
  Number of Posts: <%= tag.posts_count %>
<% end %>

Solving N+1 Queries with Regular SQL

The counter cache column does solve the N+1 problem, but it’s not without drawbacks. Since the counter cache column is kept in sync by the ActiveRecord code, we have to make sure to always make updates through ActiveRecord - if we make updates directly to the database we would need to refresh the counter cache.

Another drawback is that counter cache doesn’t work with scopes. Let’s introduce the concept of draft posts - each Post will have a published_at attribute, and we will only display posts where this attribute is populated. We also want to update our tags page to show the number of published posts for each tag. This means we can no longer rely on the counter cache column (since the counter will include both published and draft posts) - we are effectively back to the original N+1 and memory bloat problems. We can solve this problem by using plain SQL, since SQL already has a solution for this problem - GROUP BY!

The SQL query for getting the number of published posts for each tag looks like this:

SELECT tags.id, COUNT(1)
FROM tags
INNER JOIN posts_tags ON tags.id = posts_tags.tag_id
INNER JOIN posts ON posts.id = posts_tags.post_id
WHERE posts.published_at IS NOT NULL
GROUP BY tags.id

This will give us the number of published posts for each tag (although tags without any published posts won’t show up at all). We can recreate this query in ActiveRecord and make this data available to our view.

class TagsController < ApplicationController
  def index
    @tags = Tag.order(:name)
    @tag_usage = Post.published.joins(:tags).group(:tag_id).count
  end
end

This will create a simple hash, mapping each tag id to the number of posts for that tag. All with a single, efficient query!

<% @tags.each do |tag| %>
  <%= link_to(tag, tag.name) %>
  Number of Posts: <%= @tag_usage[tag.id] %>
<% end %>

We’re not completely over line though, since the tag_usage hash will not contain values for tags without any published posts. We can take care of this by using a default in our view.

Number of Posts: <%= @tag_usage[tag.id] || 0 %>

This is not a great solution though, since our view is now aware of a very specific implementation detail (or implementation flaw) in how we construct the hash. An alternative solution would be to modify our SQL query to do a LEFT OUTER join which will ensure all tags are present in the resulting hash, but this would require us to write some custom SQL which is less than ideal. A better solution would be to wrap the tag usage in a view model.

class TagUsage
  def initialize(tags)
    @tags = tags
  end

  def number_of_published_posts(tag)
    published_posts[tag.id] || 0
  end

  private

  def published_posts
    @published_posts ||= Post.published
      .joins(:tags)
      .where(tags: { id: @tags })
      .group(:tag_id)
      .count
  end
end

Now both the controller and view and straightforward, and the TagUsage view model can easily be tested.

class TagsController < ApplicationController
  def index
    @tags = Tag.order(:name)
    @tag_usage = TagUsage.new(@tags)
  end
end
<% @tags.each do |tag| %>
  <%= link_to(tag, tag.name) %>
  Number of Posts: <%= @tag_usage.number_of_published_posts(tag) %>
<% end %>

This solves the N+1 problem and avoids any potential memory bloat. As an additional bonus it make the code easy to read and test. I prefer this to the counter cache solution, even when there is no scope on the association. Happy coding.