How should I store user activities in ElasticSearch and figure out popular searches?

Question

I have a Java application logging user activities to Fluentd, which then writes these activities into an Elasticsearch index.

Every user activity is recorded, some include:

User1 follows user2
User1 likes article1
User1 creates article
User1 searches for tag
user2 signs up
More...

Now for each of these activities I'm storing a user object. For example, a CREATED activity would look like this:

    {
        activity: "CREATED",
        user: {
            userId: X,
            userName: xxxx
        },
        article: {
            title: XOXO,
            description: "LOTS OF MARKUP",
            date: DATE_CREATED,
            ...more data on the article, including locations, coords, etc...
            tags: [
                { tagName: "relatedTagToArticle", tagId: 1 },
                { tagName: "relatedTag2ToArticle", tagId: 2 }
            ]
        }
    }

The document could become larger for different activities. But by storing this sort of information I'd be able to select * activities where activities.user = [list of the user's followers] and process the results with some sort of algorithm.

Is it fine to keep storing activities like this? Should I avoid it and, if so, why?

I'm also wondering how I should figure out and store popular tags and searches?

Should I have a program that runs every X minutes and calculates the number of unique SEARCH activities and stores that information in a Redis list?
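A minimal sketch of what such a periodic job could compute, assuming the activities are already loaded as dictionaries. The field names `activity`, `tag`, and `timestamp`, the function `popular_searches`, and the 10-minute window are all hypothetical and not taken from the real index. Writing the result to Redis is left out.

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_searches(activities, now, window_minutes=10, top_n=3):
    """Count SEARCH activities inside the last window and return the top terms."""
    cutoff = now - timedelta(minutes=window_minutes)
    counts = Counter(
        a["tag"]
        for a in activities
        if a["activity"] == "SEARCH" and a["timestamp"] >= cutoff
    )
    # most_common returns (term, count) pairs, highest count first
    return counts.most_common(top_n)


# Hypothetical sample data: one search is too old, one activity is not a search
now = datetime(2016, 9, 9, 6, 30)
sample = [
    {"activity": "SEARCH", "tag": "java", "timestamp": now - timedelta(minutes=1)},
    {"activity": "SEARCH", "tag": "java", "timestamp": now - timedelta(minutes=2)},
    {"activity": "SEARCH", "tag": "redis", "timestamp": now - timedelta(minutes=3)},
    {"activity": "SEARCH", "tag": "old", "timestamp": now - timedelta(minutes=60)},
    {"activity": "LIKE", "tag": "java", "timestamp": now - timedelta(minutes=1)},
]
top = popular_searches(sample, now)
print(top)
```

A job like this rescans the raw events on every run, so its cost grows with the number of activities in the window.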

Edit

One of the main reasons I'd add all this extra information (storing whole objects, such as an article) is so I can query the ES index and get activities from a user's followers!

So say I wanted to build a news feed in my app, I could query es with something like this (pseudo code):

    {
        select all from user_activities where {
            activity_type: [LIKE, COMMENT, CREATED, FOLLOW]
            userId: [LIST_OF_FOLLOWING_IDS]
            date > XX AND date < YY
            <SOMETHING HERE TO DETERMINE POPULAR ACTIVITIES>
        }
    }
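The filtering part of that pseudo code could be sketched as an Elasticsearch query DSL body, built here as a plain Python dictionary. The field names `activity_type`, `userId`, and `date` are copied from the pseudo code and are assumed to be mapped as exact-value fields. The function name `build_feed_query` is hypothetical, and the popularity placeholder is deliberately left out because the question does not define it.

```python
import json


def build_feed_query(following_ids, date_from, date_to):
    """Build a query body that filters activities by type, author, and date."""
    return {
        "query": {
            "bool": {
                # Filter clauses only include or exclude; they do not score
                "filter": [
                    {"terms": {"activity_type": ["LIKE", "COMMENT", "CREATED", "FOLLOW"]}},
                    {"terms": {"userId": following_ids}},
                    {"range": {"date": {"gt": date_from, "lt": date_to}}},
                ]
            }
        },
        # Newest activities first, as a feed would usually show them
        "sort": [{"date": {"order": "desc"}}],
    }


body = build_feed_query([2, 3, 5], "2016-09-01", "2016-09-09")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request against the activity index.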

I'm not sure if this is a good approach or whether it will scale or not! Doing it this way, the data stored could become stale, for instance if I displayed a CREATED article from the above query, the article could've been updated! Although I'm not really worried about that just yet.

@PrasannaKRao why would I use Google Analytics? That just means I have to configure another service, right? I'd rather just use es for storing user activities and somehow count the most popular tags, locations, searches, etc.!
– James111, Commented Sep 9, 2016 at 6:23
Ok. It's best to store in es only the data that is required for search. I don't see you using that data for search. One common technique is to boost the quality of a document itself, based on such user activities. If you store such info in GA (also), building reports like top 10 searches or top 10 countries/departments from which the system is used is very easy. If you do want to store these in es, why store the whole doc?
– Prasanna K Rao, Commented Sep 9, 2016 at 6:40
Say I wanted to curate a feed based on a user's followers. I could simply query the user activity index, which would return data that could be returned directly to the app and displayed. I'm not sure if that's a common practice or not? If I didn't do this I'd store ids and then query mysql (this would probably take a lot longer). @PrasannaKRao
– James111, Commented Sep 9, 2016 at 6:43

talex · Accepted Answer · 2025-09-25 10:27:21Z

The problem with storing every event in a database and then querying and aggregating them is that you can end up with too many events, and the queries can become slow.

The term you are looking for is OLAP (OnLine Analytical Processing). There are existing tools that can do it for you.
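To illustrate the general idea of pre-aggregation behind such tools, here is a minimal sketch that keeps one small counter per hour instead of scanning raw events at query time. It is not the API of any particular OLAP product. The class `SearchRollup`, the hourly bucket size, and the sample terms are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime


class SearchRollup:
    """Keeps per-hour term counts so reads never touch raw events."""

    def __init__(self):
        self.buckets = defaultdict(Counter)  # hour -> term counts

    def record(self, term, timestamp):
        # Truncate the timestamp to the hour to pick the bucket
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        self.buckets[hour][term] += 1

    def top(self, hours, n=10):
        # Merge only the requested hourly buckets, then rank the terms
        total = Counter()
        for hour in hours:
            total.update(self.buckets.get(hour, Counter()))
        return total.most_common(n)


rollup = SearchRollup()
rollup.record("java", datetime(2016, 9, 9, 6, 5))
rollup.record("java", datetime(2016, 9, 9, 6, 40))
rollup.record("redis", datetime(2016, 9, 9, 7, 10))
rollup.record("java", datetime(2016, 9, 9, 7, 20))

hours = [datetime(2016, 9, 9, 6), datetime(2016, 9, 9, 7)]
result = rollup.top(hours, n=2)
print(result)
```

The trade-off is that the dimensions to aggregate on (here, search term per hour) have to be chosen up front, while the raw events remain the only source for questions nobody anticipated.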
