0

Given login/logout time of all users for a particular website in the form: (userId, login time, logout time). Store this data, to query the total number of users who logged in and logged out in a given time range.

What data structure should I use? and how to implement it?

4
  • A naive solution would bethe following. Have the data sorted by loginTime and a copy of the data sorted by logoutTime. To compute the users that logged in between A and B and logged out between C and D, first compute users S that logged in between A and B and then users T that logged out between C and D. Return the intersection of S and T. Commented Jun 7, 2016 at 6:30
  • 1
    In an interview, the solution is probably not the only important thing. You also need to demonstrate how you think by asking questions to refine this. How much data do you need to do this with is probably most important as it has serious impacts on what a good solution will be. The characteristics of the data - estimated number of users / frequency of login/logout for a single user also impact good solutions. Commented Jun 7, 2016 at 7:38
  • probably you can look into multi-index containers of boost library? Commented Jun 7, 2016 at 7:39
  • 2
    For a start, the question needs clarification. Say user A logs in at 9am, out at 11am, while for B it's 10am to 12pm. For a time range of 9:30 to 11:30, the " total number of users who logged in and logged out" is 0 if you mean users who've done both inside the time window, or you might want an answer of "1 log in, 1 log out", or simply the "total" 2. Say another user logs in a 8am and out at 2pm, do they count at all? Until you decide these things, you can't make good implementation choices. Commented Jun 7, 2016 at 7:56

3 Answers 3

2

The data structure you're looking for is called an interval tree and it basically has a binary-search tree like format with the start of the interval (login time) as the values (ordering as in BST).

This DS has the following time complexities:

  • Add an interval (login-logout): O(logN)
  • Remove an interval: O(logN)
  • Given an interval [start-finish], find overlapping intervals: O(logN)
Sign up to request clarification or add additional context in comments.

Comments

0

If you want to do this within a program:

Just create a simple array containing class User type variable.

User class should have attributes: userId, loginTime, logoutTime.

Checking the total number of users who logged in and logged out in a given time range will be something like:

for (user in userArray) if (user.loginTime > inputLoginTime && user.logoutTime < inputLogoutTime) count++; 

You can check for the total number of users in O(n) time.

If you want to do it in Server, eg. MySQL.

Create a table User with userId, loginTime, logoutTime as column.

SELECT COUNT(*) FROM User WHERE User.loginTime > inputLoginTime AND User.logoutTime < inputLogoutTime; 

Comments

0

I don't think there is any single data structure which would give you better complexities than below mentioned approach :-

  1. Prepare two sorted list one for login times and other for logout times.
    • O(N logN)
  2. For each query, do a binary search on both lists to count the number of logins and number of logouts prior to given time.
    • O(log N)
  3. Then the number of logged in user count would be (logins - logouts).
    • O(1)

1 Comment

Unfortunately for point 2. total number of users is slightly different to total number of logins (and logouts), if a single user has logged in and out multiple times, they're still a single user. I don't think that there is going to be any good way to avoid scanning the entire range to filter out the duplicate users.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.