
I know that I can copy an iterator using

x1, x2 = itertools.tee(x)

Then, in order to get two generators, I could filter:

filter(..., x1); filter(..., x2)

However, then I would run the same computation twice, i.e. go through x in x1 and x2.
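For concreteness, the tee-plus-filter version described above looks like this (the even/odd predicate is just a stand-in for the real condition):

```python
import itertools

def is_even(n):
    return n % 2 == 0

x = iter(range(10))        # the underlying one-shot iterator
x1, x2 = itertools.tee(x)  # two buffered copies of it

# every element of x is pushed through a predicate twice, once per copy
evens = list(filter(is_even, x1))
odds = list(filter(lambda n: not is_even(n), x2))
print(evens, odds)  # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9]
```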

Instead, I would like to do something more efficient, like this:

x1, x2 = divert(into x1 if ... else x2, x)

Does anything like this exist in python 3?

  • At the risk of saying something dumb - do you want/require them to stay in generators? You could simply create the two lists in a regular old for-loop; but I'm guessing you don't want to expand x unnecessarily? Commented May 23, 2016 at 13:34
  • No comment is dumb if it clarifies the question :-) Yes, I want to maintain the generator approach in order to keep memory usage as low as possible. Commented May 23, 2016 at 13:58

2 Answers


There's no built-in tool written in Python that I know of. It's a bit tricky to get working, because there is no guarantee on the call order of the two iterators you would produce.

For example, x could produce an x1 value followed by an x2 value, but your code could iterate over x1 until it produces a signal value, then iterate over x2 until it produces a signal value... So basically the code would have to hold all the x2 values until an x1 value is generated, which can happen arbitrarily late.

If that's really what you want to do, here is a quick idea for how to implement this buffer. Warning: it's not tested at all and assumes x is an endless generator. Plus, you have to write two actual iterator classes that implement __next__ and refer to this general iterator, one with category == True and the other with category == False.

    class SeparatedIterator:
        def __init__(self, iterator, filter):
            self.it = iterator
            self.f = filter
            # The buffer contains pairs of (value, filterIsTrue)
            self.valueBuffer = []

        def generate(self):
            value = next(self.it)
            filtered = self.f(value)
            self.valueBuffer.append((value, filtered))

        def nextValue(self, category):
            # search in stored values
            for i in range(len(self.valueBuffer)):
                value, filtered = self.valueBuffer[i]
                if filtered == category:
                    del self.valueBuffer[i]
                    return value
            # else, if none of the category is found,
            # generate until a value of the category is made
            self.generate()
            while self.valueBuffer[-1][1] != category:
                self.generate()
            # pop the value and return it
            value, _ = self.valueBuffer.pop()
            return value
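The two per-category wrapper classes the answer mentions could be sketched like this. The wrapper name and the even/odd predicate are hypothetical, and the class body repeats the answer's buffering logic with the missing self parameters filled in:

```python
import itertools

class SeparatedIterator:
    """Buffering splitter from the answer above (self parameters filled in)."""

    def __init__(self, iterator, predicate):
        self.it = iterator
        self.f = predicate
        self.valueBuffer = []  # pairs of (value, predicate_result)

    def generate(self):
        value = next(self.it)
        self.valueBuffer.append((value, self.f(value)))

    def nextValue(self, category):
        # first look for an already-buffered value of the right category
        for i, (value, filtered) in enumerate(self.valueBuffer):
            if filtered == category:
                del self.valueBuffer[i]
                return value
        # otherwise pull from the source until one appears
        self.generate()
        while self.valueBuffer[-1][1] != category:
            self.generate()
        return self.valueBuffer.pop()[0]

class CategoryIterator:
    """Hypothetical per-category wrapper implementing __next__."""

    def __init__(self, separated, category):
        self.separated = separated
        self.category = category

    def __iter__(self):
        return self

    def __next__(self):
        return self.separated.nextValue(self.category)

sep = SeparatedIterator(itertools.count(), lambda n: n % 2 == 0)
evens = CategoryIterator(sep, True)
odds = CategoryIterator(sep, False)
print(next(evens), next(evens), next(odds))  # 0 2 1
```

Note that values of the "other" category accumulate in valueBuffer until someone asks for them, which is exactly the unbounded-buffering caveat described above.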

Else if you have more control on the iterator call order, you have to use that knowledge to implement a more customized and optimized way to switch between iterators values.


2 Comments

Thanks for the idea — I might do it in a similar but different way. Anyway, I am surprised that there is no pre-built solution to that not so uncommon case.
The problem is not that it's rare (though it's not common either); the problem is that there is no obvious solution. My solution above uses a list to store the values and can have horrible time performance, since it iterates over the full buffer on each new value request. Using one queue per category should alleviate that problem, but not the generate-until-a-value-of-the-right-category-is-found problem.
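The one-queue-per-category variant mentioned in the comment could be sketched as follows (the helper name and predicate are hypothetical). Each item is enqueued and dequeued in O(1) instead of being scanned in a list, though the queue for the unconsumed side can still grow without bound:

```python
from collections import deque

def partition_buffered(predicate, iterable):
    """Split iterable into (true_items, false_items) generators, buffering with two queues."""
    it = iter(iterable)
    queues = {True: deque(), False: deque()}

    def side(category):
        while True:
            if queues[category]:
                yield queues[category].popleft()
            else:
                try:
                    value = next(it)
                except StopIteration:
                    return
                queues[predicate(value)].append(value)

    return side(True), side(False)

trues, falses = partition_buffered(lambda n: n % 2 == 0, range(6))
print(list(trues), list(falses))  # [0, 2, 4] [1, 3, 5]
```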

To chime in on an old question and expand on the previous answer:

  1. The existing approach of OP

    copy an iterator using

    x1, x2 = itertools.tee(x)

    Then, in order to get two generators, I could filter:

    filter(..., x1); filter(..., x2)

    could be reworded as the following single line

    map(filter, [filter_funcs], itertools.tee(x, n)) 

    Figuratively speaking, if you were to cast this to a list of lists, the result would have shape "at most" (n_filter, length(x)).

  2. If you cast the return of the first part of the approach

    Thus, I would do something more efficient like that:

    x1, x2 = divert(into x1 if ... else x2, x)

    i.e. filtering x element-wise by the filter_funcs, it would have shape "at most" (length(x), n_filter).

zip could then be used to get you from the second result to the first "most of the way" by transposing it.
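Both shapes and the zip transpose can be checked concretely; the two predicates and the None fillvalue below are stand-ins:

```python
import itertools

filter_funcs = [lambda n: n % 2 == 0, lambda n: n > 5]

# approach 1: shape "at most" (n_filter, length(x))
tees = itertools.tee(range(8), len(filter_funcs))
per_filter = [list(filter(f, t)) for f, t in zip(filter_funcs, tees)]
print(per_filter)  # [[0, 2, 4, 6], [6, 7]]

# approach 2: element-wise, shape "at most" (length(x), n_filter),
# padded with a fillvalue so every row has n_filter entries
per_item = [tuple(n if f(n) else None for f in filter_funcs) for n in range(8)]

# zip transposes approach 2 back into the orientation of approach 1,
# fillvalues included ("most of the way" there)
transposed = list(zip(*per_item))
print(transposed[0])  # (0, None, 2, None, 4, None, 6, None)
```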

Addressing the point in the previous answer about possible interactions or cross-referencing between the iterators and my usage of "most":
The OP could be asking for two different answers:

  1. That the resulting iterators keep a reference to the position of each returned item in the original iterator. This would allow for interaction or cross-referencing, and would lead to a list of lists with the full shape if achieved by inserting fillvalues.
  2. That the iterators only return the items.

Delegating the buffering to itertools.tee, the following allows for both outputs:

    import functools
    import itertools
    import operator

    # get an iterator of tuples of applied filters
    def apply_filter_tuple(filter_funcs, iterable, fillvalue=None):
        for item in iterable:
            yield tuple(item if filter_func(item) else fillvalue
                        for filter_func in filter_funcs)

    def _reduce_helper(ind, tee_inst):
        return map(operator.itemgetter(ind), tee_inst)

    def full_answer(filter_funcs, iterable, fillvalue=None, _remove_fillvalue=False):
        _x = apply_filter_tuple(filter_funcs, iterable, fillvalue=fillvalue)
        _x = itertools.tee(_x, len(filter_funcs))
        # reduce each iterator to the single result needed;
        # this could also be incorporated directly in the __next__ methods
        # of a variant of the '_tee' class mentioned in the 'itertools.tee' documentation
        _x = itertools.starmap(_reduce_helper, enumerate(_x))
        if _remove_fillvalue:
            def _is_not_fillvalue(x):
                return x != fillvalue
            return tuple(map(functools.partial(filter, _is_not_fillvalue), _x))
        else:
            return tuple(_x)
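To see the two output modes concretely, here is a condensed, self-contained restatement of that sketch; the predicates are stand-ins and some helper names are simplified:

```python
import itertools
import operator

def apply_filter_tuple(filter_funcs, iterable, fillvalue=None):
    for item in iterable:
        yield tuple(item if f(item) else fillvalue for f in filter_funcs)

def full_answer(filter_funcs, iterable, fillvalue=None, remove_fillvalue=False):
    tupled = apply_filter_tuple(filter_funcs, iterable, fillvalue=fillvalue)
    tees = itertools.tee(tupled, len(filter_funcs))
    # one iterator per filter, each projecting out its own tuple slot
    its = [map(operator.itemgetter(i), t) for i, t in enumerate(tees)]
    if remove_fillvalue:
        # drop the placeholders: items only, no positional cross-referencing
        return tuple(filter(lambda v: v != fillvalue, it) for it in its)
    return tuple(its)

filter_funcs = [lambda n: n % 2 == 0, lambda n: n > 5]
evens_it, big_it = full_answer(filter_funcs, range(8), remove_fillvalue=True)
print(list(evens_it), list(big_it))  # [0, 2, 4, 6] [6, 7]
```

With remove_fillvalue left as False, each returned iterator yields one entry per original item (fillvalues included), which preserves the positional alignment described in output 1.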

The caveat on memory usage of the previous answer remains:
To quote the itertools docs on itertools.tee

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

so the list-of-lists approach using zip might outperform this one.

