1

I have a user table which has columns user_id, update_time, etc and an action table, which has columns user_id, action and create_time table.

 class User(models.Model): user_id = models.CharField(db_index = True, max_length = 255, unique = True, null = False) update_time = models.DateTimeField(db_index = True, default = timezone.now, null = True, blank = True) class Action(models.Model): user_id = models.CharField(db_index = True, max_length = 255, null = False) action = models.CharField(db_index = True, max_length = 15, unique = False, null = False) create_time = models.DateTimeField(db_index = True, auto_now_add = True, null = True) 

I want to save user's last active time in update_time column. I am getting many actions by many users per day. So I don't update user table update_time column while inserting an action. I update the update_time column by a background job, which finds the max of create_time of all actions corresponding to a user_id and update his update_time column in the user table.

The background job runs below code/query for this purpose. But the performance of this piece of code not so good. Can anyone help me optimize it, either a better MySQL query or even in a format of Django ORM query or any different strategy to do this overall thing?

 days_limit = datetime.now() - timedelta(1) query = "UPDATE user a JOIN (SELECT user_id, MAX(create_time) AS last_create_time FROM user_action WHERE create_time >= %s GROUP BY user_id) b ON a.user_id = b.user_id SET a.update_time = last_create_time WHERE a.update_time < last_create_time" cursor = connection.cursor() print cursor.execute(query, [str(days_limit)]) 
3
  • Thanks. But none are useful for me 1. user_id can't be integer. It's device id of Android app 2. Can't use foreign key because sometimes action row is created before user 3. Updating single row that's referred to by its primary key will be fast but updating it every time is redundant 4. multicolumn index on user_id and create_time seems decent solution but action table is too big to perform alter query without sufficient downtime. Will try 4th when we add more databases Commented May 17, 2017 at 19:05
  • If you look very closely at my answer, it was based on what was given here in your question. You have failed to create a minimal veryfialble complete example. Commented May 17, 2017 at 23:45
  • If you are not bothering with data integrity why are you using SQL? Why aren't you using something like mongo?? Commented May 17, 2017 at 23:45

1 Answer 1

1

Problem #1

class User(models.Model): user_id = models.CharField(db_index = True, max_length = 255, 

If this is a numeric user_id and not a user name, this column doesn't make any sense. It should be removed. One of the critical ways of speeding up mysql queries is to reduce the size of the data and the indices on disk. This column seems to be redundant of if it needs to be preserved it should be int.

Problem #2

class Action(models.Model): user_id = models.CharField(db_index = True, max_length = 255, null = False) 

As above but what you should really be having here is

class Action(models.Model): user_id = models.ForeignKey(User) 

because without a foreign key, you don't have a relationship between the two models and that's why you are forced to use raw queries rather than an ORM query

Problem #3

I am getting many actions by many users per day. So I don't update user table update_time column while inserting an action

If you had the correct table structure, doing many many updates each day wouldn't be slow at all. Updating a single row that's referred to by it's primary key will in fact be very fast. Much faster than the complex join query you are using now.

If you are really concerned about the few milliseconds that such an update would take, you can use celery to run it in the background.

Problem #4

 UPDATE user a JOIN (SELECT user_id, MAX(create_time) AS last_create_time FROM user_action WHERE create_time >= %s GROUP BY user_id) b ON a.user_id = b.user_id SET a.update_time = last_create_time WHERE a.update_time < last_create_time" 

To speed this up you need a multicolumn index on user_id and create_time but the user_id column is redundant ....

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks. But none are useful for me 1. user_id can't be integer. It's device id of Android app 2. Can't use foreign key because sometimes action row is created before user 3. Updating single row that's referred to by its primary key will be fast but updating it every time is redundant 4. multicolumn index on user_id and create_time seems decent solution but action table is too big to perform alter query without sufficient downtime. Will try 4th when we add more databases

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.