5

As far as I know, erasing elements during iteration of a collection should break the iteration or cause you to skip elements. Why does calling std::for_each with a predicate which erases not cause this to happen? (It works).

Code snip:

#include <iostream> #include <map> #include <algorithm> using namespace std; int main() { map<int,long long> m; m[1] = 5000; m[2] = 1; m[3] = 2; m[4] = 5000; m[5] = 5000; m[6] = 3; // Erase all elements > 1000 std::for_each(m.begin(), m.end(), [&](const decltype(m)::value_type& v){ if (v.second > 1000) { m.erase(v.first); } }); for (const auto& a: m) { cout << a.second << endl; } return 0; } 

it prints out

1 2 3 

EDIT: I now see that if it actually increments the iterator before calling the function then it could work. But does this count as compiler specific/undefined behavior?

5
  • If you're actually doing this, I suggest std::remove_if. Commented Feb 24, 2014 at 2:14
  • That won't work for associative containers (like map) :( Remove_if returns an iterator for a position to start deleting from (it moves all the ones to be removed to the back of the collection) Commented Feb 24, 2014 at 2:16
  • Oh, I see what you're getting at. My bad. Commented Feb 24, 2014 at 2:19
  • Yea, more info here on that specifically: stackoverflow.com/questions/9515357/map-lambda-remove-if Commented Feb 24, 2014 at 2:21
  • About your actual question, I would say it's implementation-defined and not guaranteed. There aren't any specific points for or against it from what I can tell. Commented Feb 24, 2014 at 2:32

4 Answers 4

2

It's undefined behaviour, and won't work reliably. After adding a line to print keys and values inside your erasing lambda function, I see:

1=5000 2=1 3=2 4=5000 2=1 // AGAIN!!! 3=2 // AGAIN!!! 5=5000 6=3 

With my Standard library's map implementation, after erasing the element with key 4, iteration returns to the node with key 2! It then revisits the node with key 3. Because your lambda happily retested such nodes (v.second > 1000) and returned without any side effects, the broken iteration wasn't affecting the output.

You might reasonably ask: "but isn't it astronomically unlikely that it'd have managed to continue iteration (even if to the wrong next node) without crashing?"

Actually, it's quite likely.

Erasing a node causes delete to be called for the memory that node occupied, but in general the library code performing the delete will just:

  • invoke the destructor (which has no particular reason to waste time overwriting the left-child-, right-child- and parent-pointers), then

  • modify its records of which memory regions are allocated vs. available.

It's unlikely to "waste" time arbitrarily modifying the heap memory being deallocated (though some implementations will in memory-usage debugging modes).

So, the erased node probably sits there intact until some other heap allocation's performed.

And, when you erase an element in a map, the Standard requires that none of the container's other elements be moved in memory - iterators, pointers and references to other elements must remain valid. It can only modify nodes' left/right/parent pointers that maintain the binary tree.

Consequently, if you continue to use the iterator to the erased element, it is likely to see pointers to the left/right/parent nodes the erased element linked to before erasure, and operator++() will "iterate" to them using the same logic it would have employed if the erased element were still in the map.

If we consider an example map's internal binary tree, where N3 is a node with key 3:

 N5 / \ N3 N7 / \ / N1 N4 N6 

The way iteration is done will likely be:

  • initially, start at the N1; the map must directly track where this is to ensure begin() is O(1)

  • if on a node with no children, repeat { Nfrom = where you are, move to parent, if nullptr or right != Nfrom break} (e.g. N1->N3, N4->N3->N5, N6->N7->N5->nullptr)

  • if on a node with right-hand child, take it then any number of left-hand links (e.g. N3->N4, N5->N7->N6)

So, if say N4 is removed (so N3->right = nullptr;) and no rebalancing occurs, then iteration records NFrom=N4 then moves to the parent N3, then N3->right != Nfrom, so it will think it should stop on the already-iterated-over N3 instead of moving on up to N5.

On the other hand, if the tree has been rebalanced after the erase, all bets are off and the invalidated iterator could repeat or skip elements or even iterate "as hoped".

This is not intended to let you reason about behaviour after an erase - it's undefined and shouldn't be relied on. Rather, I'm just showing that a sane implementation can account for your unexpected observations.

Sign up to request clarification or add additional context in comments.

Comments

2

std::for_each is defined in the C++ draft standard (25.2.4) as a non-modifying sequence operation. The fact that modifying the sequence using your implementation of the function works is probably just luck.

This is definitely implementation-defined, and you shouldn't be doing it. The standard expects that you don't modify the container inside the function object.

Comments

2

This happened to work for you, but I wouldn't count on it -- it's likely undefined behavior.

Specifically, I'd be concerned that erasing a map element while running std::for_each would attempt to increment an invalid iterator. For example, it looks like libc++ implements std::for_each like so:

template<typename _InputIterator, typename _Function> _Function for_each(_InputIterator __first, _InputIterator __last, _Function __f) { // concept requirements __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>) __glibcxx_requires_valid_range(__first, __last); for (; __first != __last; ++__first) __f(*__first); return _GLIBCXX_MOVE(__f); } 

If calling __f ends up performing an erase, it seems likely that __first would be invalidated. Attempting to subsequently increment an invalid iterator would then be undefined behavior.

Comments

0

I find this operation pretty common, so to avoid the undefined behaviour above, I wrote a container based algorithm.

void remove_erase_if( Container&&, Test&& ); 

to deal with both associative and not containers, I tag dispatch on a custom trait class is_associative_container -- the associative goes to a manual while loop, while the others go to a remove_if-erase version.

In my case I just hard code the 4 associative containers in the trait -- you could duck type it, bit it is a higher level concept, so you would be just pattern matching anyhow.

5 Comments

remove_erase_if( Container&&, Test&& ); - why would you take the container by &&?
@tonyd perfect forwarding style allowing both const and non const arguments.
yes, I appreciate what it can do - question was why. C++ disallowed binding a normal reference to a temporary because it's generally considered to be more likely to introduce bugs when done accidentally than be useful, and in this case modifying a temporary container has few potential uses.
@TonyD and if remove_erase_if returned Container? My actual version doesn't return void.
and there we have enough information for readers to see what you've done and why, then they can decide whether they'd rather a const reference or to rely on moves... all good. Cheers.