I have been practicing some algorithm questions for interviews and have stumbled upon various questions that relate to sorting data that comes from an infinite stream and designing a data structure to search billions of records
Describe how to sort integers read one at a time from an infinite stream
Search through a tremendous number of elements is a search space. I.E. You're asked to design a storage structure and search algorithm to search through 100 billion data records. You can have multiple servers and multiple threads.
Here is what I think, please correct me if I am wrong or if there is a better solution
For sorting integers read one at a time from infinite stream, could we use insertion sort? Worst case of insertion sort is O(n2) to sort an unsorted list but in this case the run time could be lowered down to O(logn). When the new element is to be inserted into the already sorted stream, we could just perform a binary search for the new element and insert it in logn time. But we would need to shift all the items to the right by 1 which would result in O(N). I am still not sure if this is right though
We would use a Balanced BST which would have the worst case for insertion and searching to be logN or we could just use a HashMap which would ideally perform lookup in O(1) and insertion in O(1). However as we are dealing with 100 billion records, our worst case lookup for HashMap would be O(N) with a linked list implementation.
I still don't have a clear answer for these questions. If someone could provide some more insight, it would be great!
thanks!