3

This is an example: originalList is a list of objects

var subList = originalList.Where(x => x.number < 0).ToList();
originalList.RemoveAll(x => x.number < 0);

I will use the subList later. In this example, originalList is iterated twice. This function is called billions of times, and originalList is a large List.

Is there an easy way to improve the performance?


One important thing: the value of an object's number can change between two calls of this function.

4
  • This feels like it might be an XY problem - do you have to use a List? A better data structure could be a better fit. Commented Jan 31, 2022 at 19:58
  • For example, using a LinkedList instead of a List, I get a RemoveAllAndReturn method that runs 50 to 200 times faster than a List version, depending on the frequency of removal. Commented Jan 31, 2022 at 20:23
  • No, I don't have to use a List. Thanks very much, I will test it now. Commented Jan 31, 2022 at 20:57
  • Note that LinkedList takes more space - if this is critical, or time is really critical, it might be more efficient to implement a singly linked list instead. Commented Jan 31, 2022 at 20:58
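The LinkedList-based RemoveAllAndReturn mentioned in the comments is not shown there; a minimal sketch of what it could look like (the method name comes from the comment, the body is an assumption) is:

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListExtensions
{
    // Removes every node whose value matches the predicate and returns the
    // removed values. Each unlink is O(1): no elements have to shift, which
    // is where this can beat List<T>.RemoveAll for frequent removals.
    public static List<T> RemoveAllAndReturn<T>(this LinkedList<T> list, Predicate<T> match)
    {
        var removed = new List<T>();
        var node = list.First;
        while (node != null)
        {
            var next = node.Next;      // capture before a possible removal
            if (match(node.Value))
            {
                removed.Add(node.Value);
                list.Remove(node);     // O(1) unlink
            }
            node = next;
        }
        return removed;
    }
}
```

For example, on a LinkedList holding { -1, 2, -3, 4 }, calling list.RemoveAllAndReturn(x => x < 0) returns [-1, -3] and leaves [2, 4] in the list. The trade-off noted in the last comment stands: each node costs extra memory for the two link references and the object header.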

3 Answers

1

An efficiency improvement (though still ultimately O(n)) is to batch the removals together. My testing shows that depending on the frequency of removal, this can be the same speed or over 4 times faster. Here is the function as an extension method:

public static List<T> RemoveAllAndReturn<T>(this List<T> input, Func<T, bool> condition)
{
    List<T> result = new List<T>();
    var removeCt = 0;
    for (int i = input.Count - 1; i >= 0; --i)
    {
        if (condition(input[i]))
        {
            result.Add(input[i]);
            ++removeCt;
        }
        else if (removeCt > 0)
        {
            input.RemoveRange(i + 1, removeCt);
            removeCt = 0;
        }
    }
    if (removeCt > 0)
        input.RemoveRange(0, removeCt);
    return result;
}


0

This method removes all elements fulfilling a condition and returns a list of the removed elements. It only iterates once.

public static List<T> RemoveAll<T>(List<T> input, Func<T, bool> condition)
{
    List<T> removedEntries = new List<T>();
    int offset = 0;
    for (int i = 0; i < input.Count - offset; i++)
    {
        // Collect every matching element; offset counts how many have been removed so far.
        while (i < input.Count - offset && condition.Invoke(input[i + offset]))
        {
            removedEntries.Add(input[i + offset]);
            offset++;
        }
        // Shift the next kept element left over the removed ones.
        if (i < input.Count - offset)
        {
            input[i] = input[i + offset];
        }
    }
    // Trim the now-unused tail in a single operation.
    input.RemoveRange(input.Count - offset, offset);
    return removedEntries;
}

We loop over the list and check each element against the condition. Matching elements are recorded and skipped, and every kept element is copied left over them, so after the loop the first Count - offset positions hold all elements that don't fulfill the condition, and the tail holds leftovers. In the last step, that tail is removed in one RemoveRange call.

It might be wise to give an initial capacity to the removedEntries list. By default, a list has a capacity of 4, which is doubled every time it is exceeded. If you have 100 elements to be removed, the capacity has to be extended 5 times, and each extension is an O(n) reallocation. If you can estimate that you will delete about 10% of the elements, you might write

List<T> removedEntries = new List<T>(input.Count / 10); 

This might save you some time, but on the other hand, if you don't need the full initial capacity of the list, you waste some memory.

Online demo: https://dotnetfiddle.net/dlthkH

Comments

List.RemoveAt is an O(n) operation, where n = Count - index.
The OP's code may therefore be more efficient, depending on which / how many elements are being removed.
I do not think thread safe collections will help much unless you are adding/removing objects from the list concurrently. The degree of synchronization needed will depend on what guarantees the code need to fulfill, and that is not described clearly in the OP.
Why are you doing condition.Invoke(input[i]) instead of condition(input[i])?
Running some tests, this is faster than even using a LinkedList when the removal percentage is 13% or over; under 13% removal, a LinkedList appears to be faster.
0

You could consider doing this hack:

List<SomeType> subList = new();
originalList.RemoveAll(item =>
{
    bool shouldBeRemoved = item.Number < 0;
    if (shouldBeRemoved) subList.Add(item);
    return shouldBeRemoved;
});

The Predicate<T> passed to the RemoveAll is not pure: it has the side-effect of inserting matched elements in the subList. Based on the implementation of the RemoveAll method, this hack should work as expected. The documentation though does not make the explicit guarantee that the predicate will be invoked only once per element:

The elements of the current List<T> are individually passed to the Predicate<T> delegate, and the elements that match the conditions are removed from the List<T>.

That's why I call it a hack. It wouldn't be a hack if the current behavior of the RemoveAll method was documented and guaranteed. If you want to be ultra safe you could use a custom RemoveAll implementation that has well defined behavior, like the one found in this answer.

You could also make it an extension method:

public static int RemoveAll<T>(this List<T> source, Predicate<T> match, out List<T> removed)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(match);
    List<T> removedLocal = removed = new();
    int removedCount = source.RemoveAll(item =>
    {
        bool shouldBeRemoved = match(item);
        if (shouldBeRemoved) removedLocal.Add(item);
        return shouldBeRemoved;
    });
    Debug.Assert(removedCount == removed.Count);
    return removedCount;
}

Usage example:

originalList.RemoveAll(x => x.number < 0, out var subList); 

