2

The List<T>.RemoveAll is a quite useful method, that allows to remove efficiently multiple items from a list. Unfortunately in some scenarios I needed some extra features that the method doesn't have, and some guarantees that the documentation doesn't provide. It also has a questionable behavior in case the match predicate fails, that causes me anxiety. So in this question I am asking for an implementation of the same method, in the form of an extension method, with these features and characteristics:

  1. Instead of a Predicate<T> it accepts a Func<T, int, bool> delegate, where the int is the zero-based index of the T item.
  2. It guarantees that the predicate will be invoked exactly once for each item, in a stricly ascending order.
  3. In case the predicate returns true for some items and then fails for another item, the items that have been elected for removal are removed from the list before the propagation of the exception.

Here is the signature of the extension method that I am trying to implement:

public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate); 

It returns the number of elements that were removed.

I attempted to implement it using as starting point the existing implementation, but it has some performance optimizations that make it quite complex, and injecting the desirable "exceptional" behavior is not obvious. I am interested for an implementation that is simple and reasonably efficient. Using LINQ in the implementation is not desirable, because it implies memory allocations that I would like to avoid.


Context: I should demonstrate the behavior of the built-in List<T>.RemoveAll method, and explain why I don't like it. In case the match predicate fails for an item in the middle of the list, the items that have already been elected for removal are either not removed, or they are replaced with duplicates of other elements. In all cases the list retains its original size. Here is a minimal demo:

List<int> list = new(Enumerable.Range(1, 15)); Console.WriteLine($"Before RemoveAll: [{String.Join(", ", list)}]"); try { list.RemoveAll(item => { if (item == 10) throw new Exception(); bool removeIt = item % 2 == 1; if (removeIt) Console.WriteLine($"Removing #{item}"); return removeIt; }); } catch (Exception ex) { Console.WriteLine(ex); } finally { Console.WriteLine($"After RemoveAll: [{String.Join(", ", list)}]"); } 

The list has 15 numbers, and the intention is to remove the odd numbers from the list. The predicate fails for the 10th number.

Output:

Before RemoveAll: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] Removing #1 Removing #3 Removing #5 Removing #7 Removing #9 System.Exception: Exception of type 'System.Exception' was thrown. at Program.<>c.<Main>b__0_0(Int32 item) at System.Collections.Generic.List`1.RemoveAll(Predicate`1 match) at Program.Main() After RemoveAll: [2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] 

Online demo.

As you can see the numbers 1 and 3 have been removed, the 5, 7 and 9 are still there, and the numbers 6 and 8 have been duplicated (there are two occurrences of each). On the contrary the output that I expected to see is:

After RemoveAll: [2, 4, 6, 8, 10, 11, 12, 13, 14, 15] 

This would be a reasonable and predictable behavior I could count on. It keeps the levels of danger in a manageable level. I am not risking, for example, duplicating items in a virtual shopping cart, or printing twice some PDF documents from a selection. The existing behavior stretches a bit too much my comfort levels.

I have reported this behavior to Microsoft, and the feedback that I've got is that in case of failure the outcome is undefined. From their point of view there is no difference between the two above outputs (the actual and the expected). Both are equally corrupted, because both represent a state that is neither the original nor the final/correct state after a successful execution. So they don't think that there is any bug that needs to be fixed, and they are not keen on doing changes that could affect negatively the performance of successful executions. They also believe that the existing behavior is not surprising or unexpected, so there is no reason to document it.

3
  • I would debate your requirement #3: removing some items but then throwing also leaves the list in some intermediate state (though not as corrupt as the current RemoveAll behaviour). Wouldn't it be cleaner to not touch the list in this case, so you get sort of transactional behaviour? Commented Jan 6, 2023 at 7:56
  • What about splitting into two methods: int[] GetIndexes(this List<T> list, Func<T, int, bool> predicate); (does not modify the state of the list) and void RemoveAllIndexes(this List<T> list, int[] indexes); Commented Jan 6, 2023 at 7:59
  • @KlausGütter in order to implement transactional behavior you'll have to take a snapshot of the list before starting to remove items. This would be a big allocation for large lists, and a significant performance hit. If you want you could post a transactional implementation as an answer, and I will likely upvote it, but it's unlikely that I'll accept it. Commented Jan 6, 2023 at 8:29

4 Answers 4

1

So they don't think that there is any bug that needs to be fixed. They also believe that this behavior is not surprising or unexpected, so there is no need to document it.

They're correct. The method is documented as:

Removes all the elements that match the conditions defined by the specified predicate.

This supports two scenarios: the predicate returning true, removing an element, or false for leaving it as-is. A predicate throwing an exception is not a use case intended to be supported.

If you want to be able to pass a predicate that may throw, you could wrap it like this:

public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate) { Exception? caught = null; int index = 0; int removed = 0; list.RemoveAll(item => { // Ignore the rest of the list once thrown if (caught != null) return false; try { var remove = predicate(item, index); if (remove) { removed++; } return remove; } catch (Exception e) { caught = e; return false; } index++; }); if (caught != null) { throw caught; } return removed; } 
Sign up to request clarification or add additional context in comments.

10 Comments

Thanks CodeCaster for the answer. Catching and deferring the throw of the exception is quite clever, I don't know why I didn't thought about it. What I found problematic in this solution is that it is based on the assumption that the built-in RemoveAll invokes the match once for each element, in an ascending order. As far as I can see this behavior is not documented explicitly, so in theory it could change in a future version of the .NET runtime. So it doesn't satisfy the 2nd requirement in my question. Also a minor flaw is that the stack trace of the rethrown exception is lost.
"They're correct." -- I know that my question provokes the expression of opinions, and you are free to express yours, but if I had asked peoples' opinions about the behavior of the built-in API, my question most likely would be closed as opinion-based. It's a pity that Microsoft has the policy of freezing automatically all GitHub issues one month after they are closed and inactive, so people can't express their opinions without going through the hassle of opening new follow-up issues. IMHO this is a bad policy, because it results in Microsoft receiving less feedback about their products.
Right on account of the code criticism, the index exposes an implementation detail and doesn't guarantee what you want - but you can't, without iterating yourself, losing the built-in optimization. You can just use the try-catch clause without the extension method if you're willing to drop the index requirement. For the throw, see stackoverflow.com/questions/57383. For the opinion: yes, that last paragraph smells of frustration, so I thought I'd respond to that.
I might prove you wrong if/when I have my implementation ready. :-)
@Theodor whoops. Looks like that should be in a finally block, I'll edit later.
|
1

This solution is based on the idea to separate the selection of the items to be removed from the removal itself.

This has the following advantages:

  • If during the selection process, an exception occurs, the list will be left untouched
  • The removal process can only fail in catastrophic cases (OutOfMemoryException etc.)

But of course also some disadantages:

  • it requires extra memory to store the intermediate selection result
  • some optimizations might not be as effective

Because of the mentioned optimizations, I chose to base the selection result on ranges instead of individual indexes, so we can use List.RemoveRange which if more effective than individual RemoveAt calls (assumed that there are in fact ranges with more than one element).

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, Func<T, int, bool> predicate) { var result = new List<(int start, int count)>(); int start = -1; for (var i = 0; i < list.Count; i++) { // see note 1 below bool toBeRemoved = predicate(list[i], i); if (toBeRemoved) { if (start < 0) start = i; // new range starts } else if (start >= 0) { // range finished result.Add((start, i - start)); start = -1; } } if (start >= 0) { // orphan range at the end result.Add((start, list.Count - start)); } return result; } public static int RemoveIndexRanges<T>(this List<T> list, List<(int start, int count)> ranges) { var removed = 0; foreach (var range in ranges) { // the "- removed" is there to take into account // that deletion moves the indexes. list.RemoveRange(range.start - removed, range.count); removed += range.count; } return removed; } 

Usage:

var ranges = list.GetIndexRanges((item, index) => { //if (item == 10) throw new Exception(); return item % 2 == 1; }); // See note 2 below list.RemoveIndexRanges(ranges); 

Note 1: As is, an exception in the predicate would just be propagated during the selection process, with no change to the ecollection. To give the caller more control over this, the following could be done: extend GetIndexRanges to still return everything collected so far, and in addition also return any exception as out parameter:

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, Func<T, int, bool> predicate, out Exception exception) { var result = new List<(int start, int count)>(); int start = -1; for (var i = 0; i < list.Count; i++) { bool toBeRemoved = false; try { toBeRemoved = predicate(list[i], i); } catch (Exception e) { exception = e; break; // omit this line to continue with the selection process } if (toBeRemoved) { if (start < 0) start = i; // new range starts } else if (start >= 0) { // range finished result.Add((start, i - start)); start = -1; } } if (start >= 0) { // orphan range at the end result.Add((start, list.Count - start)); } return result; } var ranges = list.GetIndexRanges((item, index) => { if (item == 10) throw new Exception(); return item % 2 == 1; }, out var exception); // to fulfil requirement #3, we remove the ranges collected so far // even in case of an exception list.RemoveIndexRanges(ranges); // and then throw the exception afterwards if (exception != null) ExceptionDispatchInfo.Capture(exception).Throw(); 

Note 2: As this is now a two-step process, it will fail if the list changes between the calls.

11 Comments

This doesn't do what OP wants (if predicate() throws, no ranges-to-remove are returned at all), and the operation isn't atomic anymore (one could modify the list between calling GetIndexRanges() and RemoveIndexRanges()).
Thank you for the comment. I extended the answer regarding the exception handling. You are right regarding atomicity.
Also, multiple RemoveRange() calls incur multiple moves/resizes, possibly inefficient.
@CodeCaster As far as I can see, there is no way to avoid this inefficiency when we can only use the public API of List<T>.
Thanks Klaus for the answer. What is the point of splitting the RemoveAll to GetIndexRanges+RemoveIndexRanges? Do you anticipate that any of those can be usefull independently? If not, then why aren't you just combine them to one? Also if I understand correctly your suggestion try { toBeRemoved = predicate(list[i], i); } catch { } will swallow the exception, and partial ranges will be returned, without the caller being informed about it. This is not what I want. Also the worse-case performance of this solution is probably far worse than the built-in RemoveAll.
|
0

I think that I've managed to come up with an implementation that satisfies all three requirements:

/// <summary> /// Removes all the elements that match the conditions defined by the specified /// predicate. In case the predicate fails for some element, the list is left /// in a state recognizable as the result of successful individual Remove calls. /// </summary> public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate) { ArgumentNullException.ThrowIfNull(list); ArgumentNullException.ThrowIfNull(predicate); Span<T> span = CollectionsMarshal.AsSpan(list); int i = 0, j = 0; try { for (; i < span.Length; i++) { if (predicate(span[i], i)) continue; if (j < i) span[j] = span[i]; j++; } } finally { if (j < i) { for (; i < span.Length; i++, j++) span[j] = span[i]; list.RemoveRange(j, span.Length - j); } } return i - j; } 

For better performance it uses the CollectionsMarshal.AsSpan method (.NET 5) to get a Span<T> out of the list. The algorithm works just as well by using the indexer of the list instead of the span, and replacing the span.Length with list.Count.

Online demo.

I haven't benchmark this implementation, but I expect it to be only marginally slower than the native implementation.

Comments

-3

I don't know microsoft is how to wrote this method.

I tried some code block. And i found case.

Actually problem is your throw new Exception(). If you dont this code that time yo code will run perfect. Exception trigger some another case. But i dont know what is that.

if (item >= 10) return false; bool removeIt = item % 2 == 1; if (removeIt) Console.WriteLine($"Removing #{item}"); return removeIt; 

I found this. EDIT

Actually Func<T, int, bool> property is not deleted some item. It return boolean. As if return true he succesful deleted from list. If return false. it is not deleted from list.

7 Comments

Thanks Emin for your answer, but the code in my question has demonstration purposes. In the actual code that uses the RemoveAll method I am calling API's that are not under my direct control, and so I can't prevent them from throwing exceptions. Even if I catch the exception inside the match predicate, what value should I return, true or false? Non of them makes sense. The exception should be propagated.
yes. it is correct. You cant use throw exception.
@TheodorZoulias personally, I wouldn't call APIs and do complex logic inside the match predicate, rather, pre-fetch all the data required and then process it once you've successfully done that
@MindSwipe would you also do the same if the List<T>.RemoveAll had the behavior that I am asking in my question, and that behavior was documented?
The point of OP's code is that they have code that does sometimes throw an exception. You can't say "don't throw exceptions", because that invalidates the premise of this question.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.