LHM to the Railscue

This article aims to give visibility to tools meant to solve the migration issues encountered by Rails applications, along with a step-by-step example.

Alexandre Overtus
3 min read · Jan 9, 2022

rails db:migrate

Using Ruby on Rails, we have all run this command, which behind the scenes executes our pending ActiveRecord::Migration classes.

At first, migrations are easy to write and seamless to run. Our project is small: a few tables, hundreds of rows.

But then comes the tipping point and our project grows. Hundreds of rows quickly become thousands and later millions. Running our initial rails command no longer seems trivial, as migrations get stuck running for minutes, hours or even days.

Depending on which database engine we use and which kind of migration we attempt to run, the table can be locked while the engine modifies its structure.

Each lock comes with its own set of limitations, which can lead to timeouts or even downtime until the migration completes successfully.

Doctolib has published a great article with lock examples for PostgreSQL migrations.

Large Hadron Migrator

LHM is an online schema migration tool. Initially created by SoundCloud, it is now supported by Shopify.

The idea behind the tool is to break migrations down into multiple steps to mitigate risks and prevent locks from happening.

Running through an example

Goal: Add an index on the resubmitted column

Lhm.change_table :vrp_transactions do |m|
  m.add_index :resubmitted
end

1. Create a shadow copy

Shadow copy of the table with added index
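
To make this step more concrete, here is a minimal sketch of the kind of SQL a shadow copy involves, assuming a MySQL database. The helper name `shadow_copy_statements` and the index naming scheme are hypothetical and only serve the illustration; the statements LHM actually generates may differ.

```ruby
# Hypothetical helper: builds the SQL for a shadow copy of a table plus
# the new index. Illustration only, not LHM's actual implementation.
def shadow_copy_statements(table, indexed_column)
  shadow = "lhmn_#{table}" # shadow table name, as used in this article
  index_name = "index_#{table}_on_#{indexed_column}" # assumed naming scheme
  [
    # Empty table with the same structure as the original (MySQL syntax).
    "CREATE TABLE `#{shadow}` LIKE `#{table}`",
    # The schema change is applied to the empty copy, so the live table
    # is not touched while the index is created.
    "CREATE INDEX `#{index_name}` ON `#{shadow}` (`#{indexed_column}`)"
  ]
end

puts shadow_copy_statements("vrp_transactions", "resubmitted")
```

The point of the design is that the expensive change happens on a table nobody reads or writes yet.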

2. Add triggers to forward updates

Triggers are created so that every write to the current vrp_transactions table is replicated to the lhmn_vrp_transactions table (the shadow copy).
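
As a rough illustration of what such triggers can look like, the sketch below builds MySQL trigger definitions as strings. The helper name `forwarding_triggers`, the trigger names, and the column list (`id`, `resubmitted`) are assumptions made for this example, not LHM's exact output.

```ruby
# Hypothetical helper: builds MySQL triggers that mirror every write on
# `table` into its shadow copy. Assumes an `id` primary key.
def forwarding_triggers(table, columns)
  shadow = "lhmn_#{table}"
  cols = columns.map { |c| "`#{c}`" }.join(", ")
  new_values = columns.map { |c| "NEW.`#{c}`" }.join(", ")
  {
    # Inserts and updates both overwrite the shadow row with the new values.
    insert: "CREATE TRIGGER `#{table}_forward_insert` AFTER INSERT ON `#{table}` " \
            "FOR EACH ROW REPLACE INTO `#{shadow}` (#{cols}) VALUES (#{new_values})",
    update: "CREATE TRIGGER `#{table}_forward_update` AFTER UPDATE ON `#{table}` " \
            "FOR EACH ROW REPLACE INTO `#{shadow}` (#{cols}) VALUES (#{new_values})",
    # Deletes remove the matching shadow row, if it was already copied.
    delete: "CREATE TRIGGER `#{table}_forward_delete` AFTER DELETE ON `#{table}` " \
            "FOR EACH ROW DELETE FROM `#{shadow}` WHERE `#{shadow}`.`id` = OLD.`id`"
  }
end

triggers = forwarding_triggers("vrp_transactions", %w[id resubmitted])
triggers.each_value { |sql| puts sql }
```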

3. Copy source data in small chunks

INFO -- : Starting run of class=Lhm::Throttler::Time 
INFO -- : 0.07% (2026/2796715) complete
INFO -- : Starting run of class=Lhm::Throttler::Time
INFO -- : 0.14% (4026/2796715) complete
INFO -- : Starting run of class=Lhm::Throttler::Time
INFO -- : 0.22% (6026/2796715) complete
INFO -- : Starting run of class=Lhm::Throttler::Time
INFO -- : 0.29% (8026/2796715) complete
INFO -- : Starting run of class=Lhm::Throttler::Time
INFO -- : 0.36% (10026/2796715) complete
INFO -- : Starting run of class=Lhm::Throttler::Time
INFO -- : 100% complete
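
The log above comes from a throttled, chunked copy. The following is a small in-memory simulation of that idea, not LHM's code: two hashes stand in for the tables, and the method name `copy_in_chunks` and the `stride` and `delay` parameters are hypothetical names chosen for this sketch.

```ruby
# In-memory stand-ins for the two tables, keyed by primary key.
source = (1..10).to_h { |id| [id, { id: id, resubmitted: id.even? }] }
# Row 3 was already forwarded by a trigger with a newer value.
shadow = { 3 => { id: 3, resubmitted: true } }

# Copies rows in fixed-size id ranges, pausing between chunks so the
# database can keep serving regular traffic. Returns the rows copied.
def copy_in_chunks(source, shadow, stride:, delay:)
  return 0 if source.empty?

  first_id, last_id = source.keys.minmax
  copied = 0
  first_id.step(last_id, stride) do |low|
    high = [low + stride - 1, last_id].min
    (low..high).each do |id|
      row = source[id]
      next if row.nil?        # gap in the id sequence
      next if shadow.key?(id) # keep what the triggers already wrote
      shadow[id] = row.dup
      copied += 1
    end
    done = source.keys.count { |id| id <= high }
    puts format("%.2f%% (%d/%d) complete", 100.0 * done / source.size, done, source.size)
    sleep(delay) # the throttle: give the database room between chunks
  end
  copied
end

copied = copy_in_chunks(source, shadow, stride: 4, delay: 0)
```

Skipping rows that already exist in the shadow table matters: a row written by a trigger is at least as recent as the one the copy would bring over.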

4. Rename target table

vrp_transactions becomes a legacy table kept as a backup.
lhmn_vrp_transactions becomes the main vrp_transactions table, and the triggers are dropped.
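
A minimal sketch of this final switch, assuming MySQL, could look like the following. The helper name `switch_statements`, the archive table name, and the trigger names are hypothetical; they stand in for whatever names the tool picks.

```ruby
# Hypothetical helper: builds the SQL for the final switch. `archive` is
# the name the old table keeps as a backup (an assumption in this sketch).
def switch_statements(table, archive:, trigger_names:)
  shadow = "lhmn_#{table}"
  [
    # In MySQL, a multi-table RENAME TABLE is atomic: clients see either
    # the old table or the new one under the original name.
    "RENAME TABLE `#{table}` TO `#{archive}`, `#{shadow}` TO `#{table}`",
    # The forwarding triggers are no longer needed after the switch.
    *trigger_names.map { |name| "DROP TRIGGER IF EXISTS `#{name}`" }
  ]
end

statements = switch_statements(
  "vrp_transactions",
  archive: "vrp_transactions_legacy",
  trigger_names: %w[vrp_transactions_forward_insert vrp_transactions_forward_delete]
)
puts statements
```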

Conclusion

LHM, and online schema migration tools in general, are meant to solve the lock problem during long migrations, but backfilling the source data will still take a lot of time.

In 2018, Michael Rook presented the idea of decoupling online migrations from deployments: two steps that are closely related but can be performed separately. Migrations can run ahead of a deployment as long as we keep database changes non-destructive, which means avoiding operations such as remove_column, change_column …

The underlying database is fragile. It is the dear, deep memory of our application, and we have to be careful about how we change it.
