Data Structure Interview

Question

Given login/logout time of all users for a particular website in the form: (userId, login time, logout time). Store this data, to query the total number of users who logged in and logged out in a given time range.

What data structure should I use? and how to implement it?

A naive solution would bethe following. Have the data sorted by loginTime and a copy of the data sorted by logoutTime. To compute the users that logged in between A and B and logged out between C and D, first compute users S that logged in between A and B and then users T that logged out between C and D. Return the intersection of S and T. — blazs
– blazs, Commented Jun 7, 2016 at 6:30
In an interview, the solution is probably not the only important thing. You also need to demonstrate how you think by asking questions to refine this. How much data do you need to do this with is probably most important as it has serious impacts on what a good solution will be. The characteristics of the data - estimated number of users / frequency of login/logout for a single user also impact good solutions. — moreON
– moreON, Commented Jun 7, 2016 at 7:38
probably you can look into multi-index containers of boost library? — DAG
– DAG, Commented Jun 7, 2016 at 7:39
For a start, the question needs clarification. Say user A logs in at 9am, out at 11am, while for B it's 10am to 12pm. For a time range of 9:30 to 11:30, the " total number of users who logged in and logged out" is 0 if you mean users who've done both inside the time window, or you might want an answer of "1 log in, 1 log out", or simply the "total" 2. Say another user logs in a 8am and out at 2pm, do they count at all? Until you decide these things, you can't make good implementation choices. — Tony Delroy
– Tony Delroy, Commented Jun 7, 2016 at 7:56

Tamas Ionut · Accepted Answer · 2016-06-07 08:42:20Z

The data structure you're looking for is called an interval tree and it basically has a binary-search tree like format with the start of the interval (login time) as the values (ordering as in BST).

This DS has the following time complexities:

Add an interval (login-logout): O(logN)
Remove an interval: O(logN)
Given an interval [start-finish], find overlapping intervals: O(logN)

seokhoonlee · Accepted Answer · 2016-06-07 07:16:04Z

If you want to do this within a program:

Just create a simple array containing class User type variable.

User class should have attributes: userId, loginTime, logoutTime.

Checking the total number of users who logged in and logged out in a given time range will be something like:

for (user in userArray) if (user.loginTime > inputLoginTime && user.logoutTime < inputLogoutTime) count++;

You can check for the total number of users in O(n) time.

If you want to do it in Server, eg. MySQL.

Create a table User with userId, loginTime, logoutTime as column.

SELECT COUNT(*) FROM User WHERE User.loginTime > inputLoginTime AND User.logoutTime < inputLogoutTime;

ravi · Accepted Answer · 2016-06-07 07:24:37Z

I don't think there is any single data structure which would give you better complexities than below mentioned approach :-

Prepare two sorted list one for login times and other for logout times.
- O(N logN)
For each query, do a binary search on both lists to count the number of logins and number of logouts prior to given time.
- O(log N)
Then the number of logged in user count would be (logins - logouts).
- O(1)

Unfortunately for point 2. total number of users is slightly different to total number of logins (and logouts), if a single user has logged in and out multiple times, they're still a single user. I don't think that there is going to be any good way to avoid scanning the entire range to filter out the duplicate users.

Collectives™ on Stack Overflow

Data Structure Interview

3 Answers 3

Comments

Comments

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

1 Comment

Related