If the input is sorted by the first column (or can be sorted), then one could do the following, which only needs to keep one value in memory:
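```python
import sys

r = read_row()
if r is None:
    sys.exit()  # empty input: nothing to write
last = r[0]
write_row(r)
while True:
    r = read_row()
    if r is None:
        break  # end of input
    if r[0] != last:
        # first row of a new first-column value: keep it
        write_row(r)
        last = r[0]
```

Otherwise: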
What I'd do is keep a set of the first column values that I have already seen and drop the row if it is in that set.
```python
S = set()  # first-column values already written
while True:
    r = read_row()
    if r is None:
        break  # end of input
    if r[0] not in S:
        write_row(r)
        S.add(r[0])
```

This streams over the input using memory proportional to the number of distinct values in the first column.
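Both approaches can also be written as generators over any iterable of rows. This is a minimal sketch, assuming rows arrive as sequences (for example from `csv.reader`) instead of through the `read_row`/`write_row` helpers above; the function names `dedupe_sorted` and `dedupe_unsorted` are hypothetical. The demo shows why the first version needs sorted input.

```python
from collections.abc import Hashable, Iterable, Iterator, Sequence

Row = Sequence[Hashable]


def dedupe_sorted(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield the first row of each run of equal first-column values.

    Assumes rows are sorted (or at least grouped) by row[0];
    only the previous key is held in memory.
    """
    sentinel = object()  # never equal to a real key, so the first row is always kept
    last = sentinel
    for row in rows:
        if row[0] != last:
            yield row
            last = row[0]


def dedupe_unsorted(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield the first row seen for each distinct first-column value.

    Works in any order; memory grows with the number of distinct keys.
    """
    seen: set = set()
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            yield row


rows = [["a", 1], ["a", 2], ["b", 3], ["a", 4]]
print(list(dedupe_unsorted(rows)))  # [['a', 1], ['b', 3]]
print(list(dedupe_sorted(rows)))    # [['a', 1], ['b', 3], ['a', 4]]: unsorted input leaks a duplicate
```

For CSV data you could pass `csv.reader(sys.stdin)` as `rows` and write the result with `csv.writer(sys.stdout).writerows(...)`.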