1

I’m building a real-time chat application using NestJS, Postgresql (Main DB), Redis and Socket.IO.

My database schema (simplified) looks like this:

-- Chats table chats ( id serial primary key, name varchar(255) not null -- other fields... ); -- Users table users ( id serial primary key, login varchar(255) not null -- other fields... ); -- Many-to-many relation between users and chats table chat_members ( id serial primary key, user_id int not null references users(id), chat_id int not null references chats(id), unread_messages_count int not null default 0 ); 

For unread messages, every time someone sends a message in a chat, I increment unread_messages_count for all chat members who are not currently in that chat.

This works functionally, but it feels very inefficient and resource-heavy, especially when a chat has many members (e.g., group chats). Each message triggers multiple updates to chat_members.

Is this a good approach for implementing unread message counters, or is there a more efficient / standard pattern for this kind of feature?

Specifically, how can I avoid updating many rows on every new message while still being able to show an accurate unread counter like other chat apps do?

2 Answers 2

5

Storing an unread message counter in the database is a bad approach. Instead what you want to do is store the id of the last read message for each user. Then it doesn't matter how many new messages you add, that last read message id only updates when the user reads their messages. It also allows you to point the user to their first new message. And you can calculate how many messages are unread for that user with a simple COUNT(*) .. WHERE id > last_read_id.

Sign up to request clarification or add additional context in comments.

4 Comments

While last_read_message_id deserves a +1, I don't agree the db-side counter on its own is bad approach. The way OP coupled its upkeep to new message writes is the problem - if they just swap out the counter for a bookmark without changing how it's handled, the problem won't go away, only hide elsewhere.
It's not the worst thing you can do, no, but it has great potential to become out of sync with the actual state. One aborted transaction for whatever reason where the +=1 failed to be saved, and you have no idea anymore what the count should be. You at least need to pair it with something like a last_read_id, so you're able to reproduce the correct value at any time. If you "cache" that count value separately for performance, that's fine.
Kind of my point, the sync part being the problem. Cached counts are nice, bookmarks are nice, having both is even nicer, but overdoing how and how often either of these sync up/generate is why performance suffers. On the lost +=1 thing: OP's setup prevents that anomaly, it's actually one of very few benefits of triggers in this scenario. Failed trigger brings down the whole message-inserting transaction with it, making the +=1 no longer needed, leaving behind fully consistent state. And probably (hopefully) a "Failed to send. Click to try sending again."
I'm not sure OP is really using database triggers. They're only using the word "triggers", fairly casually, which makes me think they're not talking about database triggers. But, sure, if they are, then that's one less problem at least.
1

I increment unread_messages_count for all chat members who are not currently in that chat.

This optimises the app for inactive (non-)users, affecting the experience of actually present, online, active users. Looking at it from that angle should make it obvious you want the opposite.

Precomputed counts (tally tables) aren't inherently a bad idea, just not necessarily the way you do it, or for this particular use case. It's great to have those already prepared for when a user comes back, but it should stay out of the way of active users' traffic. Also, consider how much lag that really saves you - if it isn't perceivably faster than checking on demand, it's not worth it.


Is this a good approach for implementing unread message counters

You didn't really show or describe your implementation in enough detail. There's nothing wrong with it conceptually, it's done all the time. If "message triggers multiple updates" means actual triggers hold back every insert until it updates the counts in chat_members, that's bad.


is there a more efficient / standard pattern for this kind of feature?

avoid updating many rows on every new message while still being able to show an accurate unread counter

As already proposed by @deceze♦: just don't do it.

You only really need it when a user comes back. Save a bookmark where they left off and count whatever appeared after that.

create table chat_members ( id serial primary key, user_id int not null references users(id), chat_id int not null references chats(id), last_read_message_id int not null references messages(id) ); 
select chat_id, count(*) from users as u join chat_members as cm on cm.user_id=u.id join messages as m using(chat_id) where user_id = :userid and m.id > cm.last_read_message_id --also dictates our min(m.sent_at) group by chat_id order by min(m.sent_at)--fetch most recently visited threads first limit 10;--you can fetch the rest after loading the initial view 

Problem with that is, if you keep maintaining last_read_message_id the same way you maintained unread_messages_count, it just shifts and spreads the problem, or even risks amplifying it. You'd trade one big update of N users per incoming message, for N separate updates, one for each user, each read, and those updates also get in the way of online, active readers. Before, offline participants slowed down those online, now everyone online would slow down everyone else online.

You need to decouple one from the other:

  • Collect the last_read_message_id when the user leaves a given chat (in a detectible way closes, minimises, switches away from it). If it's hard to detect they're leaving the currently active chat, it can be dictated by scroll and view time sensitivity, reported in intervals or batches, or even assumed based on received heartbeats - you can simply move it up to the timestamp of the last signal from them. Point is not to move their read head on each message individually - the specifics are up to how smart you want to be with your strategy. Which, by the way, could be adjustible to your current load.
  • Keep it entirely app-side - each time there's a push/publish/notify event, the app can increment the given chat's counter internally. If its memory/cache gets wiped, you can re-establish some estimates based on last login timestamp. Multiple clients of the same user need to sync up, so this last_read_message_id "read head" is still useful.
  • Even if you move it app-side, you might still want to see chats going cold, without having to fetch anything from potentially unreachable endpoints. You can make them periodically report that, which again justifies keeping track of their read head position.
  • You probably need to paginate over long chats anyways, so you could always start by simply reporting back your pagination head, limiting the number of last_read_message_id updates to one per page fetched. All first-time chat visits probably want to start at most recent - in those cases and whenever user finishes catching up, you're back to the heartbeats/batches. But all future re-visits, should start from last read and let the user scroll, fetching X messages in each page at a time, giving you an easy way to bundle up and reduce multiple updates down to one.

It could also make sense to differentiate between long unread and recent - a chat blowing up with 900 messages overnight isn't the same as one that accumulated 900 over a year since you've lost interest in it. On the other hand, an unresponded message from yesterday might be more urgent than a new one from a minute ago. Measuring relative to last_read_message_id you can figure out

  • how many messages behind you are, but also
  • over how long,
  • how many different people spoke out (or were those just two people arguing publicly),
  • how much text that really is (as opposed to emojis, reactions) etc,

Compared to the unread_messages_count, it just has more uses.


like other chat apps do

You've probably seen examples of chat bubbles not showing any numbers, showing numbers that you never really cared to check and see if they were really correct: using estimates is just more efficient.

For only a few messages, the user probably wants to know if there's anything new at all, not how much exactly - no numbers, just an indicator. If they're far behind, they probably want to know that too, but still not exactly by how much - that's a "20+". Removing the need for exact counters lets you use built-in pg_stat histograms to get them for free, in constant time, or periodically precompute your own from pg_cron, or loosen up how often you configure your clients to report their pagination heads, heartbeats etc.

1 Comment

I removed the unread_message_count column from the chat_members table and replaced it with last_read_message_id. Now the unread count is calculated only once—when the user first opens the app—and capped at a maximum of 100 messages.So performance is not affected a lot as i taught. Thanks a lot to all!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.