Dan D.

If the input is sorted (or can be sorted) by its first column, rows sharing a first-column value are adjacent, so one could do this, which only needs to keep one value in memory:

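```python
import sys

# read_row() / write_row() are the row reader and writer this answer assumes;
# read_row() returns None once the input is exhausted.
r = read_row()
if r is None:
    sys.exit()  # os.exit() does not exist; sys.exit() ends the script
last = r[0]
write_row(r)
while True:
    r = read_row()
    if r is None:
        sys.exit()
    if r[0] != last:  # first row of a new first-column value
        write_row(r)
        last = r[0]
```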
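A self-contained sketch of the sorted-input idea, written as a generator over any iterable of rows rather than the `read_row()`/`write_row()` helpers. The name `dedupe_sorted` and the sample rows are made up for illustration. It swaps the explicit `last` variable for `itertools.groupby`, which likewise only remembers the current key, so it assumes rows with equal first-column values are adjacent:

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Iterable, Iterator, Sequence


def dedupe_sorted(rows: Iterable[Sequence[Any]]) -> Iterator[Sequence[Any]]:
    """Yield the first row of each run of equal first-column values.

    Assumes rows are sorted (or at least grouped) by column 0. groupby only
    tracks the current key, so memory use does not grow with the input.
    """
    for _key, run in groupby(rows, key=itemgetter(0)):
        yield next(run)  # keep the first row of the run; groupby skips the rest


rows = [["a", "1"], ["a", "2"], ["b", "3"], ["c", "4"], ["c", "5"]]
print(list(dedupe_sorted(rows)))  # [['a', '1'], ['b', '3'], ['c', '4']]
```

On unsorted input this only collapses consecutive duplicates, which is why the set-based version is needed otherwise.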

Otherwise:

What I'd do is keep a set of the first-column values I have already seen and drop any row whose first-column value is already in that set.

```python
S = set()  # first-column values seen so far
while True:
    r = read_row()
    if r is None:  # end of input
        break
    if r[0] not in S:  # first time this value appears
        write_row(r)
        S.add(r[0])
```
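The same set-based filter as a generator, so it can be tried without the `read_row()`/`write_row()` helpers. `dedupe_first_column` and the sample data are illustrative names, not from any library:

```python
from typing import Any, Iterable, Iterator, Sequence


def dedupe_first_column(rows: Iterable[Sequence[Any]]) -> Iterator[Sequence[Any]]:
    """Yield each row whose first-column value has not been seen before.

    Works on unsorted input; `seen` grows with the number of distinct
    first-column values, not with the number of rows.
    """
    seen = set()
    for row in rows:
        key = row[0]
        if key not in seen:
            seen.add(key)
            yield row


rows = [["b", "1"], ["a", "2"], ["b", "3"], ["c", "4"], ["a", "5"]]
print(list(dedupe_first_column(rows)))  # [['b', '1'], ['a', '2'], ['c', '4']]
```

The first occurrence of each value wins and input order is preserved; the first-column values must be hashable.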

This streams over the input using memory proportional to the number of distinct values in the first column, not to the number of rows.
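To make the streaming concrete, here is a sketch that assumes the rows are CSV text (the answer doesn't say what format they are in); `dedupe_csv` is a hypothetical name. Each row is written as soon as it is read, and only the `seen` set is kept in memory:

```python
import csv
import io
from typing import TextIO


def dedupe_csv(src: TextIO, dst: TextIO) -> int:
    """Copy CSV rows from src to dst, keeping the first row per first-column value.

    Rows are processed one at a time; only the set of seen keys is held in
    memory. Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    seen = set()
    written = 0
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        if row[0] not in seen:
            seen.add(row[0])
            writer.writerow(row)
            written += 1
    return written


demo_out = io.StringIO()
dedupe_csv(io.StringIO("x,1\ny,2\nx,3\nz,4\n"), demo_out)
print(repr(demo_out.getvalue()))  # 'x,1\r\ny,2\r\nz,4\r\n'
```

For real data you would pass `sys.stdin`/`sys.stdout` or files opened with `newline=''`, as the `csv` module documentation recommends. Note that `csv.writer` ends lines with `\r\n` by default.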
