
I have a use case where I need to process a file containing messages generated for multiple customers. These messages need to be stored in a database and forwarded to the respective customers in batches.

Once I receive an acknowledgment (an individual response sent by the customer for each message), I will match it to its message using the CORRELATION_ID and delete the corresponding message from the db. If no acknowledgment is received, I will retry the same message every 10 minutes until I either get the acknowledgment or the message expires. (Let's say expiry = 10 days: I will retry for up to 10 days, and if no acknowledgment is received by then, I will delete that message from the db.)
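The retry selection described above could be expressed as a query like this (a sketch against the CUSTOMER_MESSAGES schema shown further down; the 10-minute and 10-day values are the examples from the text):

```sql
-- Messages that are still unacknowledged (i.e. still in the table),
-- were last sent more than 10 minutes ago, and have not yet expired.
SELECT CUSTOMER_ID, MESSAGE_ID, CORRELATION_ID, MESSAGE
FROM CUSTOMER_MESSAGES
WHERE UPTS_TS < now() - interval '10 minutes'
  AND CRTE_TS > now() - interval '10 days';
```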

We typically receive acknowledgments for 70-80% of the messages on the first attempt. For the remaining messages we cannot predict the behaviour: either we receive the acknowledgment after multiple retries, or the message expires.

Additionally, each customer has a configuration (customerConfig) loaded during application startup. It has an expiry field for each message type, and it is dynamic, meaning customers can change these values at any moment. Sample below:

[
  {
    "customer": "524AI9KS",
    "messages": [
      { "message_type": "Billing", "expiry": "30" },      // expiry in days
      { "message_type": "Promotional", "expiry": "10" },
      { "message_type": "Support", "expiry": "15" }
    ]
  },
  {
    "customer": "695AS5JW96",
    "messages": [
      { "message_type": "Billing", "expiry": "40" },
      { "message_type": "Promotional", "expiry": "12" },
      { "message_type": "Support", "expiry": "11" }
    ]
  }
]

Below is the current database script:

CREATE TABLE CUSTOMER_MESSAGES (
    CUSTOMER_ID    VARCHAR(50) NOT NULL,
    MESSAGE_ID     VARCHAR(50) NOT NULL,
    MESSAGE_TYPE   VARCHAR(20) NOT NULL,
    CORRELATION_ID VARCHAR(50) NOT NULL,
    RETRY_COUNT    NUMERIC(10, 0) NOT NULL,
    CRTE_TS        TIMESTAMP NOT NULL,
    UPTS_TS        TIMESTAMP NOT NULL,
    MESSAGE        VARCHAR,
    CONSTRAINT CUSTOMER_MESSAGES_PK PRIMARY KEY (CUSTOMER_ID, MESSAGE_ID)
) PARTITION BY HASH (CUSTOMER_ID);

CREATE TABLE CUSTOMER_MESSAGES_P0 PARTITION OF CUSTOMER_MESSAGES FOR VALUES WITH (MODULUS 10, REMAINDER 0);
CREATE TABLE CUSTOMER_MESSAGES_P1 PARTITION OF CUSTOMER_MESSAGES FOR VALUES WITH (MODULUS 10, REMAINDER 1);
CREATE TABLE CUSTOMER_MESSAGES_P2 PARTITION OF CUSTOMER_MESSAGES FOR VALUES WITH (MODULUS 10, REMAINDER 2);
...
CREATE TABLE CUSTOMER_MESSAGES_P9 PARTITION OF CUSTOMER_MESSAGES FOR VALUES WITH (MODULUS 10, REMAINDER 9);

Message and acknowledgment pairing is done based on CORRELATION_ID.

The above partitioning strategy was chosen when the message volume was small (less than 50,000 per day) and there were only 20 customers. Now the message volume has grown to around 15-20 million per day, and there are about 10,000 customers.

The problems with the current design are:

  1. It creates a lot of dead space ("bloat") that leads to fragmentation in the db and will ultimately impact query performance.
  2. One partition can end up holding a disproportionate number of messages, since the distribution is based on the customer. A single customer may account for 30-45% of the volume in a given period.

We also considered partitioning by CRTE_TS (instead of CUSTOMER_ID) so that we can drop partitions after the retention period. However, since the retention period varies for each customer and for each message type, it can be challenging to implement.
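One way to reconcile the varying retention periods with time-based partitioning is to keep coarse time partitions and handle the per-customer, per-message-type expiry with targeted deletes. A sketch, where the bind parameter names are illustrative and the values would come from customerConfig:

```sql
-- Delete messages of one type for one customer once their configured
-- expiry (in days, from customerConfig) has elapsed.
DELETE FROM CUSTOMER_MESSAGES
WHERE CUSTOMER_ID  = :customer_id
  AND MESSAGE_TYPE = :message_type
  AND CRTE_TS < now() - make_interval(days => :expiry_days);
```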

I am using PostgreSQL 13, and the backend application is in Spring Boot.

Can someone suggest which partitioning strategy I should go with (given that PostgreSQL allows partitioning by RANGE, HASH, or LIST), and on which column the partition should be created?

1 Answer

Range partitioning by a timestamp so that you can drop partitions with old data seems to be the right way to proceed. You will still end up with empty space in the tables after updates or deletes, but at least you will get rid of old partitions and their bloat regularly, so it cannot become too bad.
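As a concrete sketch of that approach (the partition names, the daily granularity, and the maintenance schedule are illustrative assumptions, not part of the question):

```sql
-- Sketch: the same table, range-partitioned by creation timestamp.
-- Note: in PostgreSQL, a primary key on a partitioned table must include
-- the partition key, so CRTE_TS is added to the PK here.
CREATE TABLE CUSTOMER_MESSAGES (
    CUSTOMER_ID    VARCHAR(50) NOT NULL,
    MESSAGE_ID     VARCHAR(50) NOT NULL,
    MESSAGE_TYPE   VARCHAR(20) NOT NULL,
    CORRELATION_ID VARCHAR(50) NOT NULL,
    RETRY_COUNT    NUMERIC(10, 0) NOT NULL,
    CRTE_TS        TIMESTAMP NOT NULL,
    UPTS_TS        TIMESTAMP NOT NULL,
    MESSAGE        VARCHAR,
    CONSTRAINT CUSTOMER_MESSAGES_PK
        PRIMARY KEY (CUSTOMER_ID, MESSAGE_ID, CRTE_TS)
) PARTITION BY RANGE (CRTE_TS);

-- One partition per day (a scheduled job would create these ahead of time):
CREATE TABLE CUSTOMER_MESSAGES_2024_01_01 PARTITION OF CUSTOMER_MESSAGES
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');

-- Once a partition is older than the longest configured expiry,
-- drop it as a whole instead of deleting row by row:
ALTER TABLE CUSTOMER_MESSAGES DETACH PARTITION CUSTOMER_MESSAGES_2024_01_01;
DROP TABLE CUSTOMER_MESSAGES_2024_01_01;
```

Rows with shorter per-customer expiries still need targeted DELETEs before the partition ages out; the partition drop only bounds the worst case.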
