I would like to seek some help regarding the query below.

Running this script causes the system to time out. The query is so slow that it took 5 minutes to return just 22 records. I believe this has something to do with the NOT IN clause. I have already looked for answers here on Stack Overflow; some suggest using LEFT OUTER JOIN or WHERE NOT EXISTS, but I can't work out how to incorporate either into this query.

    SELECT a.UserId,
           COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
    FROM [UserActivityLog] a WITH(NOLOCK)
    WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
      AND a.ID NOT IN (
            SELECT DISTINCT(COALESCE(a.activitylogid, 0))
            FROM [CustomerNoteInteractions] a WITH(NOLOCK)
            WHERE a.reason IN ('20', '36')
              AND CAST(a.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
              AND a.UserId IN (
                    SELECT b.Id
                    FROM [User] b
                    WHERE b.UserType = 'EpicUser'
                      AND b.IsEpicEmployee = 1
                      AND b.IsActive = 1)
          )
      AND a.UserId IN (
            SELECT b.Id
            FROM [User] b
            WHERE b.UserType = 'EpicUser'
              AND b.IsEpicEmployee = 1
              AND b.IsActive = 1)
    GROUP BY a.UserId

  • Could you add a tag for the database you are using? Commented Oct 1, 2015 at 0:53
  • Yes, the problem is the NOT IN. It contains a subquery within a subquery, so three levels of query have to be executed for the NOT IN condition alone, and that can take a long time. Commented Oct 1, 2015 at 0:54
  • You need to use INNER JOIN and LEFT JOIN to replace the subquery in your NOT IN. Commented Oct 1, 2015 at 0:56
  • @DyrandzFamador I've added the tag. I'm using SQL Server 2008 for this. Commented Oct 1, 2015 at 1:10

2 Answers

Here is what should be an equivalent query using EXISTS and NOT EXISTS:

    SELECT a.UserId,
           COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
    FROM [UserActivityLog] a WITH(NOLOCK)
    WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
      AND EXISTS (
            SELECT *
            FROM [User] b
            WHERE b.Id = a.UserId
              AND b.UserType = 'EpicUser'
              AND b.IsEpicEmployee = 1
              AND b.IsActive = 1)
      AND NOT EXISTS (
            SELECT *
            FROM [CustomerNoteInteractions] b WITH(NOLOCK)
            JOIN [User] c
              ON c.Id = b.UserId
             AND c.UserType = 'EpicUser'
             AND c.IsEpicEmployee = 1
             AND c.IsActive = 1
            WHERE b.activitylogid = a.ID
              AND b.reason IN ('20', '36')
              AND CAST(b.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30')
    GROUP BY a.UserId

Obviously, it's hard to understand what will truly help your performance without understanding your data. But here is what I expect:

  • I think the EXISTS/NOT EXISTS version of the query will help.
  • I think your conditions on UserActivityLog.ActivityDateTime and CustomerNoteInteractions.datecreated are a problem. Why are you casting? Is it not a date type? If not, why not? You would probably get big gains if you could take advantage of an index on those columns. But with the cast, I don't think you can use an index there. Can you do something about it?
  • You'll also probably benefit from indexes on User.Id (probably the PK anyways), and CustomerNoteInteractions.ActivityLogId.
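
As a rough sketch of the indexing suggestions above, something along these lines might be a starting point. The index names are hypothetical, and the key and included columns are assumptions based only on the columns the query touches; check the existing indexes and the actual execution plan before creating anything.

```sql
-- Hypothetical index names; column choices are assumptions drawn from the query.

-- Supports the date filter on the activity log and covers the columns
-- the outer query reads (UserId, CustomerId).
CREATE NONCLUSTERED INDEX IX_UserActivityLog_ActivityDatetime
    ON [UserActivityLog] (ActivityDatetime)
    INCLUDE (UserId, CustomerId);

-- Supports the correlated lookup by activitylogid and covers the
-- remaining columns filtered in the NOT EXISTS subquery.
CREATE NONCLUSTERED INDEX IX_CustomerNoteInteractions_ActivityLogId
    ON [CustomerNoteInteractions] (activitylogid)
    INCLUDE (reason, datecreated, UserId);
```

The INCLUDE columns are there so each lookup can be answered from the index alone; whether that is worth the extra write cost depends on how heavily these tables are updated.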

Also, I'm not a big fan of using WITH (NOLOCK) to improve performance (see "Bad habits: Putting NOLOCK everywhere").

EDIT

If your date columns are of type DateTime as you mention in the comments, and so you are using the CAST to eliminate the time portion, a much better alternative for performance is to not cast, but instead modify the way you filter the column. Doing this will allow you to take advantage of any index on the date column. It could make a very big difference.

The query could then be further improved like this:

    SELECT a.UserId,
           COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
    FROM [UserActivityLog] a WITH(NOLOCK)
    WHERE a.ActivityDatetime >= '2015-09-28'
      AND a.ActivityDatetime < dateadd(day, 1, '2015-09-30')
      AND EXISTS (
            SELECT *
            FROM [User] b
            WHERE b.Id = a.UserId
              AND b.UserType = 'EpicUser'
              AND b.IsEpicEmployee = 1
              AND b.IsActive = 1)
      AND NOT EXISTS (
            SELECT *
            FROM [CustomerNoteInteractions] b WITH(NOLOCK)
            JOIN [User] c
              ON c.Id = b.UserId
             AND c.UserType = 'EpicUser'
             AND c.IsEpicEmployee = 1
             AND c.IsActive = 1
            WHERE b.activitylogid = a.ID
              AND b.reason IN ('20', '36')
              AND b.datecreated >= '2015-09-28'
              AND b.datecreated < dateadd(day, 1, '2015-09-30'))
    GROUP BY a.UserId
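
To see that the half-open range (>= start, < day after end) selects the same rows as the CAST ... BETWEEN filter, here is a small self-contained check. The table variable, its sample rows, and the @StartDate/@EndDate variable names are made up for illustration; the syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data: timestamps just outside and just inside the range.
DECLARE @Log TABLE (ID int, ActivityDatetime datetime);

INSERT INTO @Log (ID, ActivityDatetime) VALUES
    (1, '2015-09-27T23:59:59'),  -- one second before the range
    (2, '2015-09-28T00:00:00'),  -- first instant of the range
    (3, '2015-09-30T23:59:59'),  -- late on the last day
    (4, '2015-10-01T00:00:00');  -- first instant after the range

DECLARE @StartDate date = '2015-09-28';
DECLARE @EndDate   date = '2015-09-30';

-- Original style: strip the time portion, then compare dates.
SELECT ID
FROM @Log
WHERE CAST(ActivityDatetime AS DATE) BETWEEN @StartDate AND @EndDate;

-- Half-open range: compare the raw column against the boundaries.
SELECT ID
FROM @Log
WHERE ActivityDatetime >= @StartDate
  AND ActivityDatetime < DATEADD(day, 1, @EndDate);
```

Both SELECT statements should return IDs 2 and 3. Using variables for the boundaries also keeps the two date filters in the main query from drifting apart when the reporting period changes.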

3 Comments

Thank you very much! I ran the script a while ago and it's quite fast now: from almost 7 minutes down to 1 minute! As for your question about UserActivityLog.ActivityDateTime, the reason I'm casting it is that the column uses the DateTime type. Unfortunately I can't do anything about that, since the system is quite big now. Moreover, the database is not normalized or properly referenced, which is why the query tends to be slow.
I'm glad it helped. I think you can get even better performance if you index your 2 date columns (if they are not already indexed), and if you get rid of the cast. I added an edited query so you can see what I mean.
Thank you very much for your insight. It is very helpful. I don't know how to thank you. :) I'll take note of everything that you said here. :) :)

This should get you pretty close, or may work exactly as is:

    SELECT a.UserId,
           COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
    FROM [UserActivityLog] a WITH (NOLOCK)
    INNER JOIN [User] b WITH (NOLOCK)
            ON a.UserId = b.Id
           AND b.UserType = 'EpicUser'
           AND b.IsEpicEmployee = 1
           AND b.IsActive = 1
    -- The notes-to-users join is nested so that only notes written by
    -- active Epic users exclude an activity row, as in the original NOT IN.
    LEFT OUTER JOIN (
            [CustomerNoteInteractions] c WITH (NOLOCK)
            INNER JOIN [User] d WITH (NOLOCK)
                    ON c.UserId = d.Id
                   AND d.UserType = 'EpicUser'
                   AND d.IsEpicEmployee = 1
                   AND d.IsActive = 1
         )
            ON a.ID = c.activitylogid
           AND c.reason IN ('20', '36')
           AND CAST(c.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
    WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
      AND c.activitylogid IS NULL
    GROUP BY a.UserId
