0

We do a dump of huge database (~ 2 TB) via pg_dump command:

pg_dump -h 192.168.0.132 -d vodka_factory -U vodka_user -F d -j 10 -v -f /backups/vodka_factory_2021-03-26-23-56.dump/ 

pg_dump process successfully (return code 0, no errors), but on database we can see ~8 COPY processes and one "select" in 'idle in transaction' which never finish: enter image description here

No locks. More detailed information for some processes:

postgres=# select * from pg_stat_activity where pid=358683; -[ RECORD 1 ]----+--------------------------------------------------------------- datid | 16404 pid | 358683 application_name | pg_dump client_addr | 10.136.131.164 client_hostname | client_port | 57732 backend_start | 2021-03-26 23:56:43.311535+03 xact_start | 2021-03-26 23:56:43.321329+03 query_start | 2021-03-26 23:57:07.682549+03 state_change | 2021-03-26 23:57:07.683027+03 wait_event_type | Client wait_event | ClientRead state | idle in transaction backend_xid | backend_xmin | 3802998627 query | SELECT last_value, is_called FROM public.waterfall_task_id_seq backend_type | client backend postgres=# select * from pg_stat_activity where pid=358919; -[ RECORD 1 datid | 16404 pid | 358919 application_name | pg_dump client_addr | 10.136.131.164 client_hostname | client_port | 57744 backend_start | 2021-03-26 23:57:07.836322+03 xact_start | 2021-03-26 23:57:07.844714+03 query_start | 2021-03-27 00:29:19.240355+03 state_change | 2021-03-27 00:30:10.458445+03 wait_event_type | Client wait_event | ClientRead state | idle in transaction backend_xid | backend_xmin | 3802998627 query | COPY public.slr_child_care (id, benefit_type_id, child_count, child_index, start_month, region_index, ent_id, created, modified, external_id, begin, "end", candidate_id, hospital_json, calcs_json, years_replace, first_year_replace, second_year_replace, excluded_days, old_end, compensation, alg_code, state, correction_id, ref_number, date_of_issue, regional_factor, share_by) TO stdout; backend_type | client backend 

Databse version: PostgreSQL 11.6 on x86_64-pc-linux-gnu pg_dump version: pg_dump (PostgreSQL) 11.8

Could you please advice how we can investigate the issue? The iisue happens every time...

1 Answer 1

1

It is slightly surprising that pg_dump returned sucessfully with a return code of 0. The only theory I can come up with is that some of the 10 jobs got done early and started to idle, whereupon a misconfigured firewall cut the connection. The backends should die after around 2 hours due to TCP keepalive.

What you could do is reduce the keepalive idle time and see if that improves things:

pg_dump -d 'host=... port=... dbname=... user=... keepalives_idle=300' -f dumpfile 
0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.