I'm working with multiple ordered lists, each containing tuples of the form (key, value), where each list is sorted in ascending order by key. I want to merge these lists into a single ordered list that includes every distinct key from the input lists. The merged output should be a list of tuples in the form:
(key, value_from_list1, value_from_list2, ..., value_from_listK) with the following behavior:
- For each key present in the union of all keys, the output tuple should contain the "current" value from each list.
- If an input list does not have an entry at that key, it should carry forward the most recent value from that list (i.e., use the value from the latest key less than or equal to the current key).
For example, given two lists:
A = [(1, a1), (2, a2), (10, a3)] B = [(1, b1), (8, b2), (10, b3)] the desired output is:
C = [ (1, a1, b1), // At key 1, both lists provide values. (2, a2, b1), // At key 2, list A updates to a2; list B still uses b1. (8, a2, b2), // At key 8, list B updates to b2; list A remains at a2. (10, a3, b3) // At key 10, both lists update. ] This merging behavior should extend to more than two lists as well (e.g., merging 5 lists into tuples with 6 elements, where the first element is the key).
All input lists are assumed to contain the element (0, null), but we omit this element (i.e. (0, null, null, ..., null)) from the final output. This ensures that, when merging, if a list doesn't have an entry for a given key, it carries forward the most recent value (initially null). For example, merging A = [(1, a1)] and B = [(2, b1)] yields [(1, a1, null), (2, a1, b1)].
My Questions:
- Algorithm Efficiency: What is the most efficient algorithm for performing this merge?
I've considered a solution using a sweeping line algorithm with multiple pointers—one for each list—to keep track of the most recent value as I iterate through the union of keys. However, I'm looking for insights or more efficient methods, especially in the general case with k lists.
Edit 2025-03-13
Pseudocode
I´ve created pseudocode based on the suggestion from @Smylic.
function mergeLists(lists): k = number of lists pointers = array of size k, all initialized to 0 lastValues = array of size k, initially set to None (or some default) output = empty list while true: minKey = None // Find the minimum key among the current pointers in all lists. for i from 0 to k-1: if pointers[i] < length(lists[i]): currentKey = lists[i][pointers[i]].key if minKey is None or currentKey < minKey: minKey = currentKey // If no minimum key was found, all lists are exhausted. if minKey is None: break // Update lastValues for each list that has the current minKey for i from 0 to k-1: if pointers[i] < length(lists[i]) and lists[i][pointers[i]].key == minKey: lastValues[i] = lists[i][pointers[i]].value pointers[i] = pointers[i] + 1 // Append the merged tuple (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1]) output.append( (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1]) ) return output