
I have several tables with id columns that are declared to be serial (i.e. integer type with default nextval()).

Converting them to bigint seems to be quite straightforward:

ALTER TABLE my_table ALTER COLUMN id TYPE bigint; 

But that might take a lot of time and lock the whole table. It's an okay-ish solution for cases when the whole site is already down due to integer overflow, but not so good as a preventive change made in advance.

How would one do that concurrently?

  • Use logical replication perhaps? Commented Dec 25, 2023 at 13:51

2 Answers

1

The general idea is: create a new column, fill it with values, swap the names with two renames so that the new column ends up with the old column's name, then drop the old one.

Let's assume we have a table named transaction and we want to change the type of its id column.

  1. New column

Here we create a new column and a trigger that will fill in values for new rows and for existing rows when they are updated.

begin;
set local statement_timeout to '100ms';
alter table public.transaction add column new_id bigint;
CREATE FUNCTION public."transaction_migr_f"() returns trigger as $$
begin
    new.new_id := new.id;
    return new;
end $$ language plpgsql;
CREATE TRIGGER "transaction_migr_t"
    before insert or update on public.transaction
    for each row execute function public."transaction_migr_f"();
commit;
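As a quick sanity check (not part of the original steps), you can confirm the trigger is in place by listing the user triggers on the table:

select tgname, tgenabled
    from pg_trigger
    where tgrelid = 'public.transaction'::regclass
      and not tgisinternal;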
  2. Updating existing rows

Here we need to update all rows in the table; how you split the work is up to you. It is necessary to do it in small batches so that concurrent updates and deletes can proceed instead of waiting on row locks. On modern versions of PostgreSQL (14+) you can generate queries like

update public.transaction set new_id = id
    where id is distinct from new_id and ctid >= '(0,0)' and ctid < '(10000,0)';
update public.transaction set new_id = id
    where id is distinct from new_id and ctid >= '(10000,0)' and ctid < '(20000,0)';

and go from 0 up to the value of select relpages from pg_class where relname = 'transaction'. Batching by ctid like this is slightly more efficient in terms of IO.
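A rough sketch of generating those statements with generate_series (the 10000-page step is just an example, adjust it to your batch size; relpages is an estimate, so re-run the verify query from step 3 afterwards):

-- sketch: emits one batched UPDATE per 10000-page ctid range
select format(
        'update public.transaction set new_id = id where id is distinct from new_id and ctid >= ''(%s,0)'' and ctid < ''(%s,0)'';',
        start_page, start_page + 10000)
    from generate_series(
        0,
        (select relpages from pg_class where relname = 'transaction'),
        10000) as start_page;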

It is not important to divide the work into exactly equal numbers of rows. If one batch updates 10k rows in 1 second and another 50k rows in 5 seconds, that is not a problem as long as the duration of each transaction stays small. I recommend keeping the transaction duration around 10-15 seconds per batch; reduce or increase the ctid ranges for your database accordingly.

You can also split the work by any other criterion, such as by id itself or by some timestamp column in day-sized ranges.
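For example, batching by id itself could look like this (a sketch; the 100000-wide ranges are arbitrary):

update public.transaction set new_id = id
    where id >= 0 and id < 100000 and id is distinct from new_id;
update public.transaction set new_id = id
    where id >= 100000 and id < 200000 and id is distinct from new_id;
-- ... and so on, up to max(id)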

  3. Verify

Check to see if anything was missed by mistake:

select count(*) cnt from public.transaction where id is distinct from new_id;
  4. Indexes

Check the table definition. Changing the type directly via alter table would take care of this for us, but with this approach we need to reproduce all constraints/indexes/sequences/etc. manually.
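One way to see what has to be reproduced is \d public.transaction in psql, or a catalog query along these lines (just a sketch):

-- list constraints defined on the table
select conname, pg_get_constraintdef(oid) as definition
    from pg_constraint
    where conrelid = 'public.transaction'::regclass;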

A column named id is typically a primary key backed by a sequence, so we need to preserve the primary key index (it will be used in a later step) and the not null constraint.

CREATE UNIQUE INDEX CONCURRENTLY "3ce47f41b86aaaadec955388734f8f98"
    ON public.transaction USING btree (new_id);

We create a unique index; later we will declare it as the new primary key.

begin;
set local statement_timeout = '1s';
alter table public.transaction
    add constraint transaction_new_id_not_null check (new_id is not null) not valid;
commit;

alter table public.transaction validate constraint transaction_new_id_not_null;

This is how SET NOT NULL can be achieved without a long lock. I wrote more details here.

Here you also need to create, via create index concurrently, any other indexes in which the id column participates, including multi-column or partial ones. Use new_id instead of id in the index definitions.
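For example, if there were a hypothetical multi-column index on (account_id, id), it would be rebuilt against the new column roughly like this (the index and column names are only illustrative, not from the original schema):

-- illustrative only: account_id and the index name are assumptions
CREATE INDEX CONCURRENTLY transaction_account_new_id_idx
    ON public.transaction USING btree (account_id, new_id);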

  5. Swap columns

Many different actions, but none of them hold a long lock:

BEGIN;
set local statement_timeout to '10s';
lock table public.transaction in access exclusive mode;
alter table public.transaction drop constraint transaction_pkey; -- name of the current primary key constraint
alter table public.transaction alter id drop default;
alter table public.transaction alter id drop not null;
-- actually mark the new column as not null
alter table public.transaction alter new_id set not null;
-- and drop the temporary check constraint
alter table public.transaction drop constraint if exists transaction_new_id_not_null;
-- the two commands above could have been done in the previous step,
-- but here we still need a short-term table lock, so I combine them in one transaction
alter table public.transaction rename id to old_id;
alter table public.transaction rename new_id to id;
ALTER TABLE public.transaction ALTER id SET DEFAULT nextval('transaction_id_seq'::regclass); -- name of the sequence used in the old default expression
alter sequence transaction_id_seq owned by public.transaction.id;
alter sequence transaction_id_seq as bigint; -- note: the sequence has a type too
-- transform the unique index into the new primary key
alter table public.transaction add constraint transaction_pkey primary key using index "3ce47f41b86aaaadec955388734f8f98";
-- cleanup, we don't need this temporary trigger anymore (created in step 1)
drop trigger "transaction_migr_t" on public.transaction;
drop function public."transaction_migr_f"();
-- create new foreign keys NOT VALID if needed
-- drop foreign keys which refer to old_id
COMMIT;
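To illustrate the foreign-key comments at the end of that block: if a hypothetical table payment referenced transaction.id, its constraint could be recreated as NOT VALID and validated later without a long lock (the table and constraint names here are assumptions, not from the original answer):

-- hypothetical child table and constraint names, shown only as an example
alter table public.payment
    add constraint payment_transaction_id_fkey
    foreign key (transaction_id) references public.transaction (id) not valid;

-- later, outside the short locking transaction:
alter table public.payment validate constraint payment_transaction_id_fkey;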

After the commit, the application will start working with the bigint column. At this point you need to check the application itself; in my experience of many such migrations, most applications do not react to the change of column type at all and continue to work properly.

Some applications may need to be restarted. In very rare cases the application hardcodes that the column is strictly of type int and returns an error when it starts receiving bigint values.

  6. Cleanup

All that remains is to drop the indexes that mention the old column old_id, and then old_id itself.
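A quick way to find them, as a sketch against pg_indexes:

select indexname, indexdef
    from pg_indexes
    where schemaname = 'public'
      and tablename = 'transaction'
      and indexdef like '%old_id%';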

drop index concurrently index1;
drop index concurrently index2;
-- and so on

begin;
set local statement_timeout to '100ms';
-- drop the column itself
alter table public.transaction drop column old_id;
commit;

-- if in the previous step we created some not valid constraints, validate them here
alter table public.transaction validate constraint ...;

Done.

0

There is no good way to do this without substantial downtime.

You would need to add a new column, change the code to populate (but not yet rely on) this new column when rows are inserted or updated, then backfill the new column by copying from the old column with conversion, in chunks (to avoid locking problems, and possibly to throttle resource usage), then change the code to rely on the new column, and finally drop the old column. It is not fun.

The system could do this work for you if it knew how long the field was in storage. Then it could just up-convert from int to bigint on the fly when encountering old data with the shorter length. But I don't think PostgreSQL stores the field length (on a per-row basis) for fixed-length fields like int and bigint, so there is no way for it to reliably detect this.
