TL;DR Move data migration code into Rake tasks or use full-fledged gems modeled on schema migrations. Cover this logic with tests.
I work as a backend developer at FunBox. In a number of projects, we write the backend in Ruby on Rails. We strive to build sound development processes, so when we face a problem, we try to understand it and work out methodological recommendations. That is what happened with data migrations. Once I implemented a data migration as a separate Rake task covered with tests, and the team asked: “Why not in a schema migration?” I raised the question in the internal dev chat and, much to my surprise, opinions were divided. It became clear that the question has no obvious answer and deserves a thoughtful analysis and an article. I will consider my most ambitious goal for this article achieved when someone links to it in a code review in response to the question of why a particular data migration was moved out of a schema migration or, conversely, left in it.
Lyrical digression
I undertook to write this article to reduce the pain and increase the productivity of teamwork. At first, I hoped to find hard evidence of the harm done by abusing schema migrations for data migrations. At the same time, I was reading Nikolai Berdyaev's book “The Meaning of Creativity. The Experience of Human Justification.” From it I learned the concept of the “conciliar spirit.”
In the world of programming and IT, the prevailing desire is to give every activity a scientific character, with an evidence base for everything. When I came to the Ruby world, I felt something very different. Yukihiro Matsumoto created a language to make it easier for people to communicate through code, and this has produced a special community of humane people. It seems to me that it is precisely the conciliar spirit that is felt in this community: everyone shares similar values, has similar intuitions, and treats each other with love in the gospel sense of the word. This means they do not need proof since, according to Berdyaev, proof is needed only where intuitions differ and are hostile to one another.
Discovering the concept of the conciliar spirit inspired me to write the article at a point when I had already realized that conclusive proof is hardly possible. My goal is to gather arguments that will resonate with developers and build the intuition that mixing schema migrations and data migrations is inefficient because it can lead to operational and maintenance problems.
Mixing data and schema migrations
Definitions of key concepts of the article
A data schema is a collection of tables with their columns, views, indexes, and stored procedures used to store and manipulate business entity data. It is an essential part of business logic.
Data schema migration is the logic of changing the data schema (adding or deleting tables, columns, indexes, etc.) that is needed to add new functionality to the product. A good migration also defines the reverse actions so that it is possible to roll back to the previous version of the product. The set of schema migrations is not an integral part of the business logic and can theoretically be replaced by a single migration that includes the logic of all of them. On CI, where the database is always created from scratch, it is possible (and necessary) to load the entire structure from the SQL script that is generated when migrations are run.
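The idea of reversible increments can be illustrated without Rails at all. What follows is a minimal plain-Ruby sketch, not the Rails implementation: the schema is modeled as a hash of table names to column lists, and AddColumn, migrate, and rollback are hypothetical names invented for this illustration. It shows that each increment defines its own reverse action and that replaying every migration gives the same result as loading a single snapshot.

```ruby
# Toy model: a schema is a Hash of table name => array of column names.
# AddColumn, migrate and rollback are hypothetical names for this sketch.
AddColumn = Struct.new(:table, :column) do
  # Forward action: return a new schema with the column added.
  def up(schema)
    schema.merge(table => schema.fetch(table, []) + [column])
  end

  # Reverse action: return a new schema with the column removed.
  def down(schema)
    schema.merge(table => schema.fetch(table, []) - [column])
  end
end

# Apply migrations in order, oldest first.
def migrate(schema, migrations)
  migrations.reduce(schema) { |current, migration| migration.up(current) }
end

# Undo migrations in reverse order, newest first.
def rollback(schema, migrations)
  migrations.reverse.reduce(schema) { |current, migration| migration.down(current) }
end

initial = { users: [:id] }
migrations = [AddColumn.new(:users, :email), AddColumn.new(:users, :nickname)]

migrated = migrate(initial, migrations)

# The snapshot plays the role of the generated structure dump:
# loading it directly is equivalent to replaying all migrations.
snapshot = { users: [:id, :email, :nickname] }

puts migrated == snapshot
puts rollback(migrated, migrations) == initial
```

In this model, nothing outside the migrations themselves depends on their history, which is why the whole set can be collapsed into one snapshot.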
Data migration is the logic of changing the data itself in the tables, that is, everything done through SQL DML operations such as UPDATE. It is the main subject of this article. It is not an integral part of the business logic.
Continuous Delivery is a property of the development process that makes it possible to deploy any version of the product automatically with a single command.
The official Rails documentation says that migrations are for schema migration, meaning they are limited to DDL queries. But the lack of a ready-made solution for data migrations leads to the abuse of schema migrations for data transformation. It seems that this problem is specific to Rails and similar omakase backend development frameworks. When there is no out-of-the-box solution for schema migrations, there is nothing to abuse.
Benefits of mixing data and schema migrations
There is a positive side to handling data transformations in the same way as schema transformations, that is, as increments of change between versions that can be applied forward and backward. From a continuous delivery point of view, it should be possible to deploy any version of the system in such a way that the schema and the state of the data are correct and consistent. It is also convenient to see all the increments in a single list in the file system and to handle them in the same way while operating the system.
Problems of mixing data and schema migrations
Data migrations differ from schema migrations and create a different load profile when executed. This causes problems that are widely discussed in the English-language blogosphere. I have collected the most common (probably all) of the arguments and grouped them into operational problems, maintenance issues, and debatable issues.
Operational problems
Data migrations take longer than schema migrations, which increases deployment downtime. With large data volumes, a migration can run longer than the timeout set for migrations, and manual intervention is then required.
Long data migration transactions increase the likelihood of deadlocks in the database.
To prevent these operational problems, you can use static code analysis tools at the development stage, for example, the Zero Downtime Migrations and Strong Migrations gems.
Maintenance issues
Violation of the Single Responsibility Principle
Schema migrations are a Ruby DSL (Domain Specific Language) for SQL DDL constructs and wrappers around them. As long as we stay within the DSL, reasonable quality is ensured by manually checking that the migration succeeds in both the forward and backward directions. If we make a mistake in the logic of a migration, we will not be able to continue development and will fix it immediately.
As soon as we go beyond the DSL to manipulate data, we violate the Single Responsibility Principle (SRP). The consequence of this violation is an increased risk of errors. To reduce that risk, we would want to cover migrations with tests, but …
No tests (at least adequate, cheap ones)
To test data migrations, the author of the article “Ruby On Rails Data Migration” runs the preceding migrations and then checks that the target migration makes the required data changes. In a large application, this takes a monstrously long time and increases the cognitive load on the team, who have to read and write such tests. Keeping data migration logic inside Rails migration code, where it is so hard to test, is undesirable. I will explain where to place this logic in the section on solutions.
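Ahead of that section, here is a minimal sketch of the general direction: keep the data migration logic in a plain Ruby object that a Rake task merely calls, so that it can be tested directly. The backfill itself (normalizing email addresses), the NormalizeEmails class, and the in-memory array of hashes standing in for table rows are all hypothetical illustrations, not part of Rails or of any project mentioned here. In a real application, the object would work with models and the checks would live in the regular test suite.

```ruby
# Hypothetical data migration kept outside of schema migrations.
# Rows are plain hashes here; in a Rails app they would be model records.
class NormalizeEmails
  def initialize(rows)
    @rows = rows
  end

  # Normalizes emails in place and returns the number of changed rows,
  # so a repeated run is safe and reports 0 changes.
  def call
    @rows.count do |row|
      next false if row[:email].nil?

      normalized = row[:email].strip.downcase
      next false if normalized == row[:email]

      row[:email] = normalized
      true
    end
  end
end

rows = [
  { id: 1, email: "  Alice@Example.COM " },
  { id: 2, email: "bob@example.com" }
]

first_run = NormalizeEmails.new(rows).call
second_run = NormalizeEmails.new(rows).call

puts "changed on first run: #{first_run}"
puts "changed on second run: #{second_run}"
```

With this shape, the Rake task contains little more than a call to the object, and a test exercises the object without applying or rolling back any schema migrations.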