For the logic of data migration, it is very convenient to use the code of models and the tools of the ORM ActiveRecord language instead of SQL.
But this can lead to the following problems:
The model class can be renamed or deleted. Then a nonexistent constant error will be received.
Validations can be added to models that will not allow changes to be made.
The model may contain callbacks with side effects that the author of the migration code does not expect.
For these situations, a “crutch” was invented with redefining the model class right in the migration. This is a duplication of knowledge, and cannot be considered an exemplary solution.
I will quote an example of such a “crutch” from the official Rails manual (up to version 4.2):
# db/migrate/20100513121110_add_flag_to_product.rb class AddFlagToProduct < ActiveRecord::Migration class Product < ActiveRecord::Base end def change add_column :products, :flag, :Boolean Product.reset_column_information Product.all.each do |product| product.update_attributes!(:flag => false) end end end
Personally, I don’t want to have something like this in the codebase.
By the way, instead of each, you should use find_each with batch processing.
It was curious to notice that, starting from version 4.2, this example was completely removed from the manual.
Maintenance Issues When Using SQL in Migration
If, in an effort to get away from using models in migrations, we want to use SQL commands directly, then we face the following disadvantages of this approach:
Logic is expressed more complicated than through model code. It is more difficult, because it is less concise, at a lower level of abstraction, in another language (SQL), which we use relatively rarely.
If there are JOINs, this is already a serious duplication of knowledge expressed in model relationships.
With long processing, it is impossible to track progress and it is impossible to understand whether processing is still in progress or if a deadlock has already occurred.
This section is optional and can be skipped.
The Thoughtbot article makes this point: schema migrations are not business logic, so it should be possible to drop them at any time and load the entire schema via DDL. This will allow you to quickly prepare the environment and run tests for CI. When you remove migrations from a project, the data migration logic they contain is lost.
I don’t see a problem here. Data migration logic, just like schemas, is not business logic and can potentially be thrown out of the project. Namely, when this logic is applied and rollbacks to the pre-migration state are unlikely. If we load a ready-made data schema, then we are not interested in any intermediate transformations of existing data.
For example, in a data migration, values can be set for fields that were previously empty and have become required. But if there is no data, then there is nothing to specify values for.
Let’s say there is data and we throw out old migrations. But the old data migrations have already brought the database to the desired state, and they are still irrelevant.
While nurturing the idea of the article, I had high hopes for this argument, but alas. Please write in the comments what I can miss here.
Exceptions when it is acceptable to do data migrations in schema migrations
It is acceptable to include a data migration in a schema migration if a rollback migration is not possible without it.
For example, turning a nullable field into a field with a default value or vice versa.
Since the migrations are supposed to be rollback, we have to ensure that the conversion is reversed, and updating the value is indispensable.
The request will be trivial, like:
UPDATE table SET field = ‘f’ WHERE field IS NULL
The whole migration might look like this:
class ClientDemandsMakeApprovedNullable < ActiveRecord::Migration def up change_column_null :client_demands, :approved, true change_column_default :client_demands, :approved, nil end def down execute("UPDATE client_demands SET approved = 'f' WHERE approved IS NULL") change_column_null :client_demands, :approved, false change_column_default :client_demands, :approved, false end end
Generally speaking, with a large amount of data in the table, you should not do this and you need to resort to more sophisticated tricks. For example, do not perform the migration on the production, but make all the changes manually and then replace the migration file and version in the database. This technique is described in detail in the Dan Mayer article Managing DB Schema & Data Changes in the Modifying Large Tables section.
Refusal of the decision due to the meager volume of the application or data
Up to a certain point of growth, you can turn a blind eye to mixing problems. These old “sins” will not harm you in any way. You can quit at any time and even act according to the situation. Please tell me in the comments if I’m wrong.
The use of a single mechanism for schema and data migrations, provided that high-quality rollback migrations are written, will ensure the possibility of continuous delivery.
But if the application, base, or team grows, then it is better to come to a stricter discipline in order to reduce losses.
Transferring data migrations from the code base to the ticket system
Since data migrations within schema migrations are the biggest concern, getting them out of there is paramount. We can agree not to consider data migrations as part of the codebase. They can be debugged in the REPL on the staging server and saved in a ticket for manual use in production.
This would be an improvement, but there are significant downsides to this approach:
- The data migrations code is not found when searching for the model name;
- There is no thought process of development through the test;
- No continuous delivery.
Pragmatic philosophy calls not to trust anyone and especially yourself. You cannot trust a script that was debugged on staging data. These data may not be complete. Designing code through a test produces the highest quality results I know of.
I admit that there may be projects where this approach will be justified, but I cannot recommend it for projects in which I myself participate.
Transferring data migrations to Rake tasks
A more reliable, accessible, and efficient means is to create Rake tasks for data migrations. Here it is convenient to cover them with tests directly. While writing a data migration test, I often have insights about requirements and manage to solve potential business problems.
Recently, the project had a data migration task that was not covered by the Rake test. At the code review, no one noticed that instead of adding an element to the array, the entire array was overwritten. A typo led to data corruption and the need to restore data from a backup in a manual selective mode. In the process of writing a test, such a logical typo could not have been made. So tests are our great helpers in the thought process.