I've got a Java application logging user activities to Fluentd, and Fluentd then writes these activities into an Elasticsearch index.
Every user activity is recorded. Some examples:
- User1 follows User2
- User1 likes article1
- User1 creates an article
- User1 searches for a tag
- User2 signs up
- More...
Now for each of these activities I'm storing a user object. For example, a CREATED activity would look like this:

    {
      activity: "CREATED",
      user: {
        userId: X,
        userName: xxxx
      },
      article: {
        title: XOXO,
        description: "LOTS OF MARKUP",
        date: DATE_CREATED,
        ...more data on the article, including locations, coords, etc...
        tags: [
          {tagName: "relatedTagToArticle", tagId: 1},
          {tagName: "relatedTag2ToArticle", tagId: 2}
        ]
      }
    }

The document could become larger for different activities. But by storing this sort of information I'd be able to select all activities where activities.user is in [list of the user's followers] and process the results with some sort of algorithm.
Is it fine to keep storing activities like this? Should I avoid it, and if so, why?
I'm also wondering how I should work out and store popular tags and searches.
Should I have a program that runs every X minutes and calculates the number of unique SEARCH activities and stores that information in a Redis list?
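To make the idea concrete, here is a rough Java sketch of the counting step such a periodic job might run. It works on an in-memory list rather than on real Elasticsearch results, and the names `PopularSearches`, `SearchActivity`, `countTerms` and `topTerms` are made up for illustration. Writing the result to Redis is left out, and in practice the job would be driven by a scheduler instead of `main`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PopularSearches {

    /** Minimal stand-in for a SEARCH activity document (hypothetical shape). */
    static final class SearchActivity {
        final String term;
        final Instant at;

        SearchActivity(String term, Instant at) {
            this.term = term;
            this.at = at;
        }
    }

    /** Counts how often each term was searched at or after `since`. */
    static Map<String, Integer> countTerms(List<SearchActivity> activities, Instant since) {
        Map<String, Integer> counts = new HashMap<>();
        for (SearchActivity a : activities) {
            if (!a.at.isBefore(since)) {
                counts.merge(a.term, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to `n` terms, most searched first; ties are broken alphabetically. */
    static List<String> topTerms(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<SearchActivity> recent = Arrays.asList(
                new SearchActivity("java", now.minusSeconds(30)),
                new SearchActivity("redis", now.minusSeconds(90)),
                new SearchActivity("java", now.minusSeconds(120)));
        // Look at the last 10 minutes, i.e. "X minutes" = 10 in this example.
        Map<String, Integer> counts = countTerms(recent, now.minusSeconds(600));
        System.out.println(topTerms(counts, 10));
    }
}
```

The list returned by `topTerms` is what I imagine pushing into the Redis list on each run.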
Edit
One of the main reasons I'd add all this extra information (storing whole objects, such as an article) is so I can query the ES index and get activities from a user's followers.
So say I wanted to build a news feed in my app, I could query ES with something like this (pseudo-code):

    {
      select all from user_activities where {
        activity_type: [LIKE, COMMENT, CREATED, FOLLOW]
        userId: [LIST_OF_FOLLOWING_IDS]
        date > XX AND date < YY
        <SOMETHING HERE TO DETERMINE POPULAR ACTIVITIES>
      }
    }

I'm not sure whether this is a good approach or whether it will scale. Doing it this way, the stored data could become stale: for instance, if I displayed a CREATED article from the above query, the article could have been updated since. I'm not really worried about that just yet, though.
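For reference, this is roughly how I picture the pseudo-code above as an Elasticsearch query body: a `bool` query with `terms` and `range` filters, sorted newest first. It's only a sketch that builds the JSON string by hand. The field names `activity`, `user.userId` and a top-level `date` are assumptions about my mapping (the `terms` filters also assume those fields are indexed as exact values), `FeedQuery` is a made-up helper, and the "popular activities" part is still left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FeedQuery {

    /** Renders strings as a JSON array of string literals, e.g. ["a","b"]. */
    static String jsonArray(List<String> values) {
        return values.stream()
                .map(v -> "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    /**
     * Builds a search body matching the given activity types and user ids,
     * restricted to from < date < to. Field names are assumed, not confirmed.
     */
    static String build(List<String> activityTypes, List<String> followingIds,
                        String from, String to) {
        return "{\"query\":{\"bool\":{\"filter\":["
                + "{\"terms\":{\"activity\":" + jsonArray(activityTypes) + "}},"
                + "{\"terms\":{\"user.userId\":" + jsonArray(followingIds) + "}},"
                + "{\"range\":{\"date\":{\"gt\":\"" + from + "\",\"lt\":\"" + to + "\"}}}"
                + "]}},\"sort\":[{\"date\":{\"order\":\"desc\"}}]}";
    }

    public static void main(String[] args) {
        String body = build(
                Arrays.asList("LIKE", "COMMENT", "CREATED", "FOLLOW"),
                Arrays.asList("7", "9"),          // ids of the users being followed
                "2024-01-01", "2024-01-31");      // example date bounds
        System.out.println(body);
    }
}
```

The resulting string would be sent as the body of a search request against the activities index. With a long following list, the `user.userId` filter grows with it, which is part of what makes me unsure about scaling.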