I was asked an interview question in which we're looking for duplicate social security numbers, i.e. SSNs that appear in both array A and array B. My immediate thought was to iterate over each array, storing the count of each SSN in a HashMap. Then, of course, all keys with a count greater than 1 are duplicates. My understanding was that this would be an O(n + m) operation, where n and m are the number of elements in the two arrays. I was told that this is "very inefficient" with regard to time complexity, and that the time complexity is actually O(n * m). How is this correct?
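A minimal sketch of the approach I had in mind (the class and method names are just illustrative); each array is deduplicated first so that a repeat within a single array doesn't inflate a key's count:

```java
import java.util.*;

public class SsnCount {
    // Counts how many of the two arrays each SSN appears in; any key with
    // count > 1 is present in both. One pass per array: O(n + m) on average.
    static List<String> inBoth(String[] a, String[] b) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ssn : new HashSet<>(Arrays.asList(a))) {
            counts.merge(ssn, 1, Integer::sum);
        }
        for (String ssn : new HashSet<>(Arrays.asList(b))) {
            counts.merge(ssn, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) result.add(e.getKey());
        }
        return result;
    }
}
```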
- Yeah, that's definitely O(n+m). – Joe C, Feb 5, 2017 at 22:14
- "I was told..." Told by whom? Why does it matter? It sounds pretty strange. It's linear on average and quadratic in the worst case. – kraskevich, Feb 5, 2017 at 22:15
- @JoeC That's what I was thinking. How do you correct an interviewer without sounding like a complete tool? – Elroy Jetson, Feb 5, 2017 at 22:15
- The right tack to take is to ask the interviewer why he believes he is right. Either he will realize his mistake and correct himself, or you will have an opportunity to clarify your answer. – Joe C, Feb 5, 2017 at 22:17
- The quadratic worst case would be if the hash code were absolutely useless, in that it put all of the strings in the same hash bucket. – Joe C, Feb 5, 2017 at 22:19
1 Answer
First of all, O(n+m) is faster than O(n*m). A time complexity of O(n*m) arises when you have two nested loops like this:

    for x in [1..n]
        for y in [1..m]
            ...

So the term "very inefficient" doesn't apply at all, unless the extra space that the O(n+m) approach requires is a concern. In that case, you can ask the interviewer to clarify which factor matters more: time complexity or space complexity.
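For contrast, this is a sketch of the kind of code that actually is O(n * m): a naive pairwise comparison with two nested loops (the class name is illustrative):

```java
import java.util.*;

public class NaiveCompare {
    // For every SSN in a, scan all of b: n * m comparisons in total,
    // regardless of input. This is the genuinely quadratic approach.
    static Set<String> inBothQuadratic(String[] a, String[] b) {
        Set<String> result = new LinkedHashSet<>();
        for (String x : a) {        // n iterations
            for (String y : b) {    // m iterations each
                if (x.equals(y)) {
                    result.add(y);
                }
            }
        }
        return result;
    }
}
```

The hash-based approach trades O(n) extra space for this quadratic scan, which is the only sense in which it could be called less efficient.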