To achieve exactly-once processing when messages are consumed from a queue with at-least-once delivery, many sources (e.g. here and here and here) suggest attaching a unique ID to each message in the producer, which consumers can then use to deduplicate messages.
I'm curious about how this works in practice. Consider a message consumer responsible for issuing refunds, where it's critical that each refund is processed exactly once. If we use a database to manage deduplication, the consumer implementation might resemble the following pseudocode:
1. Receive refund message
2. Insert DB record with key message.unique_id
3. If insertion fails due to unique constraint:
    3.1. Ignore this message and exit
4. Call payment gateway API to issue refund
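The steps above might look like the following sketch. SQLite with an in-memory table stands in for the deduplication database, and `issue_refund` is a hypothetical placeholder for the real payment gateway call:

```python
import sqlite3

def issue_refund(refund_id):
    # Hypothetical stand-in for the payment gateway API call (step 4).
    print(f"gateway: refund issued for {refund_id}")

def handle_message(db, message):
    """Deduplicate first, then act. Returns True if the refund was issued,
    False if the message was recognized as a duplicate and skipped."""
    try:
        # Step 2: record the unique ID; the PRIMARY KEY enforces uniqueness.
        db.execute("INSERT INTO processed (id) VALUES (?)",
                   (message["unique_id"],))
        db.commit()
    except sqlite3.IntegrityError:
        # Step 3: duplicate delivery, ignore this message and exit.
        return False
    # Step 4: if the consumer crashes right before this line, the refund is
    # lost forever, because the dedup record above is already committed.
    issue_refund(message["unique_id"])
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (id TEXT PRIMARY KEY)")

msg = {"unique_id": "refund-42"}
handle_message(db, msg)  # issues the refund
handle_message(db, msg)  # redelivery: skipped as a duplicate
```

The comment before step 4 marks exactly the crash window described next.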
A problem arises if the consumer crashes between steps 3 and 4. Even if the message is redelivered, the refund will never be issued, because the committed DB record causes every retry to be skipped as a duplicate. This gives at-most-once processing.
An alternative could involve a transaction to ensure the DB insert rolls back if the refund fails:
1. Receive refund message
2. Start DB write transaction
3. Insert DB record with key message.unique_id
4. If insertion fails due to unique constraint:
    4.1. Ignore this message and exit
5. Call payment gateway API to issue refund
6. Commit transaction
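The transactional variant can be sketched the same way. Again SQLite and `issue_refund` are illustrative stand-ins; the `crash_before_commit` flag is an artificial knob to simulate a crash between steps 5 and 6:

```python
import sqlite3

def issue_refund(refund_id):
    # Hypothetical stand-in for the payment gateway API call (step 5).
    print(f"gateway: refund issued for {refund_id}")

def handle_message_tx(db, message, crash_before_commit=False):
    """Hold the dedup insert open in a transaction until the refund succeeds."""
    try:
        # Steps 2-3: the INSERT implicitly opens a transaction in sqlite3,
        # which stays uncommitted until step 6.
        db.execute("INSERT INTO processed (id) VALUES (?)",
                   (message["unique_id"],))
    except sqlite3.IntegrityError:
        # Step 4: duplicate delivery, ignore this message and exit.
        db.rollback()
        return False
    issue_refund(message["unique_id"])  # step 5: the refund goes out
    if crash_before_commit:
        # Simulated crash between steps 5 and 6: the refund has already been
        # issued, but the rollback erases the dedup record, so a retry will
        # call the gateway a second time.
        db.rollback()
        raise RuntimeError("consumer crashed before commit")
    db.commit()  # step 6
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (id TEXT PRIMARY KEY)")
```

Running `handle_message_tx` with `crash_before_commit=True` and then retrying the same message demonstrates the duplicate refund described next.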
But now if the consumer crashes between steps 5 and 6, the insert rolls back, so the retry will pass the duplicate check and call the payment gateway a second time, issuing a duplicate refund. This gives at-least-once processing.
How is it possible to achieve exactly-once processing in this scenario?