MySql::To::PostgreSQL Migrating a Perl App from MySQL to PostgreSQL
Who I am Jérôme Étévé, full-stack application developer at Broadbean
Our app Medium size: 24 tables, 3 ‘large’ ones. Uses DBIx::Class + some raw SQL (w/ DBI). A few DB-based features. A couple of accidentally convenient MySQL idiosyncrasies.
Prereqs
- Run your app in stricter mode (MySQL TRADITIONAL sql_mode + DBI RaiseError; see the sketch below)
- App DateTimes do not use the floating time zone
- Have a good test suite: unit, functional and regression
- Data integrity and consistency
- Know your app functionally
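A minimal sketch of that stricter mode with plain DBI (the DSN and credentials are placeholders):
use DBI;
my $dbh = DBI->connect(
    'dbi:mysql:database=yourdb;host=localhost', 'user', 'password',
    { RaiseError => 1, AutoCommit => 1 },      # failures become exceptions
);
$dbh->do(q{SET sql_mode = 'TRADITIONAL'});     # reject bad data instead of mangling it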
Steps - Schema migration - Code compatibility - Data migration
Schema migration Can it be automated? Plenty of scripts there: https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL DDL: Less expressive (MySQL) to more expressive (Pg)
Schema migration Can it be automated? Sort of. Do it by hand! (keeping the column order)
Schema migration Work in a branch mysqldump --no-data --compatible=postgresql mysql > schema_mysql.sql cp schema_mysql.sql schema_pg.sql
Schema migration
1 - Wrap schema_pg.sql in a BEGIN; ... ROLLBACK; (as sketched below)
2 - Adapt and reorder DDL statements
3 - Run against the PostgreSQL DB
4 - Read the error messages
5 - GOTO 2
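A sketch of the wrapper; every run is rolled back, so you can iterate on the error messages without leaving partial DDL behind (the table is a placeholder):
BEGIN;
-- the adapted DDL statements go here, e.g.:
CREATE TABLE example_check ( id INTEGER );
ROLLBACK; -- switch to COMMIT once the whole file runs clean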
Schema migration - DDL Mysql: Character set at column level Pg: Encoding is per database; pick one that makes sense for your data (usually UTF8)
Schema migration - DDL Mysql: Collation at column level. Trick to make a string case sensitive: COLLATE utf8_bin Pg: Not needed, unless you really need a specific collation in its own right.
Schema migration - DDL Mysql: TINYINT(1), a sort of MySQL boolean. Pg: Use BOOLEAN if your app really treats it as a boolean; if other values sneak in (0, 1 .. 2 ..), use SMALLINT. (Sketch below.)
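A before/after sketch (hypothetical column definition):
-- MySQL
viewed TINYINT(1) NOT NULL DEFAULT 0
-- PostgreSQL, when the app really means true/false
viewed BOOLEAN NOT NULL DEFAULT FALSE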
Schema migration - DDL Mysql: INT(11), INT(25), INT(33), INT(7), ... (the number is only a display width) Pg: INTEGER
Schema migration - DDL Mysql: INT(*) UNSIGNED Q: Semantic UNSIGNED or cargo-cult UNSIGNED? Look at your app and at the data. Pg: Semantic: add a CHECK >= 0 (sketch below). Cargo cult: INTEGER if the data allows, BIGINT otherwise.
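A sketch of the semantic case (hypothetical column):
-- MySQL
retry_count INT UNSIGNED NOT NULL
-- PostgreSQL: the same intent, expressed as a constraint
retry_count INTEGER NOT NULL CHECK ( retry_count >= 0 )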
Schema migration - DDL Mysql: BIGINT(*) UNSIGNED Semantic: CHECK >= 0 Data: Do you really need integers above 9223372036854775807 (9.2 x 10^18)? If yes then use NUMERIC (not sequence-able)
Number of cat hairs on earth: 2.4 x 10^16 Source: Yahoo Answers, Ask.com
Schema migration - DDL Mysql: DATETIME Pg: TIMESTAMP WITH TIME ZONE DateTime->now() is in UTC
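A before/after sketch (hypothetical column):
-- MySQL
created_at DATETIME NOT NULL
-- PostgreSQL: stores the UTC instant faithfully
created_at TIMESTAMP WITH TIME ZONE NOT NULL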
Schema migration - DDL Mysql: JSON blobs, often stored as TEXT Pg: JSONB, which is indexable (or plain JSON if you need to preserve the exact original text)
Schema migration - DDL Mysql: IP addresses as VARCHAR Pg: Use INET (compatible with V4 and V6 IPs)
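A sketch of what INET buys you (hypothetical table and subnet):
CREATE TABLE login_event ( client_ip INET NOT NULL );
SELECT * FROM login_event WHERE client_ip << '10.0.0.0/8'; -- native subnet containment, v4 and v6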
Schema migration - DDL Mysql: CHI::Driver::DBI tables chi_..(key VARCHAR(300), value BLOB) Pg: chi_..(key BYTEA NOT NULL, value BYTEA)
Schema migration - DDL Mysql: ENUM Pg: Define an ENUM type and use it as a column type:
CREATE TYPE bla_type AS ENUM('v1', 'v2');
... one_bla bla_type, ...
Schema migration - DDL Mysql: Trigger bodies are in the trigger definition. Postgres: Define a special function, then CREATE TRIGGER. Example:
CREATE TRIGGER trigger_watchdog_resultcount_update
    AFTER UPDATE ON watchdog_result
    FOR EACH ROW
    WHEN ( NEW.viewed <> OLD.viewed )
    EXECUTE PROCEDURE watchdog_subscriptions_count_maintain();
Schema - AUTO_INCREMENT Mysqldump: Does not output the AUTO_INCREMENT property. Postgres: Install the sequences yourself with an SQL script (below).
Schema - AUTO_INCREMENT
DROP FUNCTION IF EXISTS install_auto_increment(varchar, varchar) CASCADE;
CREATE FUNCTION install_auto_increment(t_name varchar, c_name varchar)
RETURNS void AS $end_of_function$
DECLARE
    sequence_name varchar;
    max_id BIGINT;
BEGIN
    sequence_name = t_name || '_' || c_name || '_seq';
    EXECUTE 'ALTER TABLE ' || quote_ident(t_name)
         || ' ALTER COLUMN ' || quote_ident(c_name) || ' DROP DEFAULT';
    EXECUTE 'DROP SEQUENCE IF EXISTS ' || quote_ident(sequence_name);
    EXECUTE 'SELECT COALESCE( max( ' || quote_ident(c_name) || ' ), 0 ) + 1 FROM '
         || quote_ident(t_name)
        INTO max_id;
    EXECUTE 'CREATE SEQUENCE ' || quote_ident(sequence_name)
         || ' INCREMENT BY 1 START WITH ' || max_id;
    EXECUTE 'ALTER TABLE ' || quote_ident(t_name)
         || ' ALTER COLUMN ' || quote_ident(c_name)
         || ' SET DEFAULT nextval(''' || quote_ident(sequence_name) || ''')';
    RETURN;
END;
$end_of_function$ LANGUAGE plpgsql;
Schema - AUTO_INCREMENT Then call install_auto_increment('table', 'id_column') for each of your AUTO_INCREMENT tables (examples below).
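For instance (the id column names are hypothetical):
SELECT install_auto_increment('tag', 'tag_id');
SELECT install_auto_increment('watchdog_result', 'watchdog_result_id');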
Application features Goal: Adapt your application so it works just fine on Postgresql too. Even if we use DBIx::Class, we will need a function to make the switch.
Application features Add that to your DBIx::Class schema:
sub with_db {
    my ($self, $pg, $mysql) = @_;
    if( $self->storage()->sqlt_type() eq 'PostgreSQL' ){
        return &$pg();
    }
    return &$mysql();
}
Application features Make sure you use DBIx::Class the right way. Some things are easy to overlook..
Application features Some things not to be overlooked: Formatting DateTime in DBIx::Class queries.
$schema->storage->datetime_parser->format_datetime( a DateTime instance )
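For example, to search on a date range (the resultset and column are hypothetical; per the prereqs, DateTime->now() is in UTC):
my $cutoff = $schema->storage->datetime_parser
                    ->format_datetime( DateTime->now()->subtract( days => 7 ) );
my @recent = $schema->resultset('WatchdogResult')
                    ->search({ created_at => { '>=' => $cutoff } });
The parser formats the value the way the connected DB expects, so this works unchanged on MySQL and PostgreSQL.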
Application features Some things not to be overlooked: Escaping the % and _ characters in your LIKE queries
Application features
use DBI::Const::GetInfoType;
sub escape_for_like {
    my ($self, $string) = @_;
    $string //= '';
    my $escape_char = '';
    $self->storage->dbh_do(sub{
        my ( $storage, $dbh ) = @_;
        if( my $new_escape = $dbh->get_info( $GetInfoType{SQL_SEARCH_PATTERN_ESCAPE} ) ){
            $escape_char = $new_escape;
        }
    });
    $string =~ s/([_%])/$escape_char$1/g;
    return $string;
}
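Typical usage, assuming escape_for_like lives on your schema class like with_db does ($user_input is a placeholder):
my $safe = $schema->escape_for_like( $user_input );
my @tags = $schema->resultset('Tag')
                  ->search({ tag_name => { -like => $safe . '%' } });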
Application features The case of case insensitivity MySQL’s default collation makes VARCHAR matching case insensitive -> Easy: LIKE ‘Bla%’ == LIKE ‘bla%’ -> Painful and bug-prone: VARCHARs as KEYs..
Application features The case of case insensitivity Postgres: Use a functional index with a good operator class.
CREATE INDEX tag_name_ci ON tag( LOWER(tag_name) text_pattern_ops );
SELECT * FROM tag WHERE LOWER(tag_name) LIKE LOWER('AB') || '%';
Application features The case of case insensitivity
EXPLAIN SELECT * FROM tag WHERE lower(tag_name) LIKE LOWER('AB') || '%';
Index Scan using tag_name_ci on tag (cost=0.14..8.17 rows=1 width=566)
  Index Cond: ((lower((tag_name)::text) ~>=~ 'ab'::text) AND (lower((tag_name)::text) ~<~ 'ac'::text))
  Filter: (lower((tag_name)::text) ~~ 'ab%'::text)
Application features The case of case insensitivity In your DBIx::Class app: Use the method with_db(..) to do the right thing for PostgreSQL.
Application features Database-wide advisory locking -> Make sure you implement your locking through ‘with_db’ (sketch below).
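A minimal sketch using with_db; the lock key, name and timeout are placeholders (pg_advisory_lock takes an integer key, GET_LOCK a name and a timeout in seconds):
$schema->with_db(
    sub { # PostgreSQL
        $schema->storage->dbh_do(sub{
            my ($storage, $dbh) = @_;
            $dbh->selectrow_array('SELECT pg_advisory_lock(?)', undef, 42);
        });
    },
    sub { # MySQL
        $schema->storage->dbh_do(sub{
            my ($storage, $dbh) = @_;
            $dbh->selectrow_array('SELECT GET_LOCK(?, ?)', undef, 'my_lock', 10);
        });
    },
);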
Application features If you use CHI::Driver::DBI: there is a bug when running with the DBI RaiseError option. This fixes it: https://github.com/jonswar/perl-chi-driver-dbi/pull/5
Application features Does your DBIx::Class application use PostgreSQL reserved words as relation names, like ‘user’? $schema->storage->sql_maker->quote_names(1);
So you’ve tested your application? But the upstream schema has changed :(
$ mysqldump --no-data --compatible=postgresql yourdb > schema_mysql.sql
$ git diff schema_mysql.sql
Then edit schema_pg.sql.. and test again
Time for Data migration Google will point you to plenty of tools.. The mysqldump way, a few ideas..
Time for Data migration Skip the cruft: --compact --no-create-info --skip-triggers
Time for Data migration Avoid character set issues: --default-character-set=utf8 --result-file=data.sql
Time for Data migration Using CHI::Driver::DBI? Skip chi_* tables --ignore-table=db.chi_bla
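Putting those flags together, a sketch of the full data dump (database and cache table names are placeholders):
$ mysqldump --compact --no-create-info --skip-triggers \
    --default-character-set=utf8 \
    --ignore-table=yourdb.chi_bla \
    --result-file=data.sql \
    yourdb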
Time for Data migration cat header.sql data.sql footer.sql > data_pg.sql
Header:
SET standard_conforming_strings = 'off'; -- Compatible with mysqldump strings
SET backslash_quote = 'on';              -- that are escaped with backslashes
SET client_min_messages TO ERROR;        -- No warnings for those
UPDATE pg_cast SET castcontext='a' WHERE casttarget = 'boolean'::regtype; -- Mysql does not have boolean
DO $$
DECLARE
    tables CURSOR FOR SELECT tablename FROM pg_tables
                      WHERE schemaname = 'public' ORDER BY tablename;
BEGIN
    FOR table_record IN tables LOOP
        -- run as a superuser
        EXECUTE 'ALTER TABLE ' || table_record.tablename || ' DISABLE TRIGGER ALL';
    END LOOP;
END$$;
Time for Data migration cat header.sql data.sql footer.sql > data_pg.sql
Footer:
SET standard_conforming_strings = 'on';
SET backslash_quote = 'off';
UPDATE pg_cast SET castcontext='e' WHERE casttarget = 'boolean'::regtype;
DO $$
DECLARE
    tables CURSOR FOR SELECT tablename FROM pg_tables
                      WHERE schemaname = 'public' ORDER BY tablename;
BEGIN
    FOR table_record IN tables LOOP
        EXECUTE 'ALTER TABLE ' || table_record.tablename || ' ENABLE TRIGGER ALL';
    END LOOP;
END$$;
A timeline
1 - Reach prereq state
2 - Fork the code and migrate the schema
3 - Test and adapt your app, write data migration scripts
4 - Rebase and GOTO 3
5 - Schema freeze
6 - Full data migration test
7 - Live goes read-only or down
8 - Migrate the live data for good
9 - Re-install the sequences
10 - Switch the application!
11 - Fix and support
Conclusion PostgreSQL’s richer DDL allows stronger validation and consistency. No magic automatic migration is possible; decisions have to be taken by humans.
Conclusion, continued - A very good occasion to review and fix your schema - A very good occasion to make your app DB-agnostic - Most of the steps have to be tailored to the specific application - Application expertise needed
Thanks! Questions?
