Patterns for Data Migrations in Rails
In my experience with developing Rails applications I have found migrations to be one of the most error-prone areas. As I result I have come up with a few loose patterns on what to do (and what not to do) when doing migrations.
Rule #1 – Never modify an existing migration
This is probably obvious to anyone who has worked with Rails past a cursory glance, but it is also the first rule I broke as a new Rails developer. Once your migration has been pushed it cannot be changed – environments where your migration has already run will simply ignore the change. This will lead to data inconsistencies (since some environments will run the older version of your migration and some the newer) and general pain in your development cycle.
Rule #2 – Migrations should not interact with models
Let’s look at a sample migration.
This migration simply adds a new ‘status’ column to the ‘users’ table, sets all the existing users to be ‘active’ and then changes the column to not allow null values. This will also run without any issues.
The reason why you don’t want to do this is because of the dependency on the User model. At a later point we might rename the ‘User’ model to something else (‘Person’ for example) in which case our migration will simply blow up.
What then is the alternative?
Pattern #1 - Execute custom SQL statements
Let’s change the previous example to not depend on the User model.
This is the most common pattern I use for migrations - write custom SQL and use the various methods on ActiveRecord::Base.connection (see the API docs) to query and manipulate the data.
Remember to use the quote method to properly escape any input you’re passing into SQL, for example we could rewrite this example like this:
Pattern #2 - Using migration models
Let’s take a look at a more complicated problem, which illustrates the need for migration models. Let’s assume our User model has many comments and we now want to add a last_comment_id field on the User. What would this migration look like?
Even with this simple example that’s quite a bit of custom SQL we need to write. If we were using STI (single table inheritance) this example would have gotten even more complicated. Of course, with the built-in ActiveRecord models this would have been really simple (and readable).
This is where migration models come in - we basically declare a model purely for the purpose of the migration. Let’s modify our example to use a migration model.
This migration will run without any issues. Unfortunately it can also have some unintended side-effects - if our User model has timestamps (meaning created_at and updated_at columns) Rails will automatically update those columns as part of the update, effectively doing a reset on all the timestamp information on our users.
Another very serious issue that can show up with this pattern is when the name of the migration model’s class is used - for example, in polymorphic associations or in single table inheritance. Our migration models are namespaced to the migration class, so you end up seeing values like ‘AddLastCommentIdToUsers::Comment’ in the type field.
This is all to say that we cannot use this pattern in this form. The best way to get around all of these issues is to use ActiveRecord for doing queries, but manual SQL for the updates. So instead of doing update_attribute in this case, we can simply write a manual SQL statement.
(The ‘execute’ method is always available in migrations - it simply executes the given SQL and doesn’t return any values.)
Summary
I think it’s important to keep in mind that data migrations are complicated - even when the code is very simple there is often hidden complexities that we are unaware of. So which of these 2 patterns should we choose? I usually end up writing custom SQL statements unless the query part is really complicated, in which case I will use migration models but still do the update portion with manual SQL.
If there are other patterns for writing data migrations in Rails I would love to hear about them. Happy coding.