21

I've a std::vector<int> and I need to remove all elements at given indexes (the vector usually has high dimensionality). I would like to know, which is the most efficient way to do such an operation having in mind that the order of the original vector should be preserved.

Although, I found related posts on this issue, some of them needed to remove one single element or multiple elements where the remove-erase idiom seemed to be a good solution. In my case, however, I need to delete multiple elements and since I'm using indexes instead of direct values, the remove-erase idiom can't be applied, right? My code is given below and I would like to know if it's possible to do better than that in terms of efficiency?

bool find_element(const vector<int> & vMyVect, int nElem){ return (std::find(vMyVect.begin(), vMyVect.end(), nElem)!=vMyVect.end()) ? true : false; } void remove_elements(){ srand ( time(NULL) ); int nSize = 20; std::vector<int> vMyValues; for(int i = 0; i < nSize; ++i){ vMyValues.push_back(i); } int nRandIdx; std::vector<int> vMyIndexes; for(int i = 0; i < 6; ++i){ nRandIdx = rand() % nSize; vMyIndexes.push_back(nRandIdx); } std::vector<int> vMyResult; for(int i=0; i < (int)vMyValues.size(); i++){ if(!find_element(vMyIndexes,i)){ vMyResult.push_back(vMyValues[i]); } } } 
4
  • 2
    The problem is, that the indices won't be valid anymore after the first element is erased, same with iterators (you can get an iterator from an index with vec.begin() + index). Commented Jul 7, 2011 at 11:11
  • @Georg, the code does what it should. The idea is to remove the element that is at a given position. In my code, an element is represented by vMyValues and the position by the vMyIndexes. Commented Jul 7, 2011 at 11:39
  • I think i had the same blind spot as Andy while reading... your current code doesn't remove in place so that issue isn't there ;) Commented Jul 7, 2011 at 11:45
  • If efficiency is an issue, and if you are doing a lot of deleting-by-position (and maybe also inserting), consider using other container than std::vector --- sdt::list (linked-list container) or std::set (container where the key is the values themselves) Commented Jul 7, 2011 at 11:50

6 Answers 6

22

I think it could be more efficient, if you just just sort your indices and then delete those elements from your vector from the highest to the lowest. Deleting the highest index on a list will not invalidate the lower indices you want to delete, because only the elements higher than the deleted ones change their index.

If it is really more efficient will depend on how fast the sorting is. One more pro about this solultion is, that you don't need a copy of your value vector, you can work directly on the original vector. code should look something like this:

... fill up the vectors ... sort (vMyIndexes.begin(), vMyIndexes.end()); for(int i=vMyIndexes.size() - 1; i >= 0; i--){ vMyValues.erase(vMyValues.begin() + vMyIndexes[i]) } 
Sign up to request clarification or add additional context in comments.

7 Comments

+1 for elegant solution. I'm not sure about its efficiency, because erasing from highest to lowest you'll move tail elements many times
@Andy T.: no matter how you erase, you will always move all the elements "after" the removed element. By erasing from the end, you minimize the number of elements to move for each "index", and therefore it is the most efficient "in-place" solution.
@Matthieu: right, order doesn't matter, but I don't agree about the most efficient in-place solution, please see my post
this method is very slow. I did not recommend to use this.
Yes, agree with tqjustc, this method is very very slow compared to others and it becomes very important when working on large datasets. For instance this takes nearly 8 mins to run where other methods run in 0.03 secs.
|
7

to avoid moving the same elements many times, we can move them by ranges between deleted indexes

// fill vMyIndexes, take care about duplicated values vMyIndexes.push_back(-1); // to handle range from 0 to the first index to remove vMyIndexes.push_back(vMyValues.size()); // to handle range from the last index to remove and to the end of values std::sort(vMyIndexes.begin(), vMyIndexes.end()); std::vector<int>::iterator last = vMyValues.begin(); for (size_t i = 1; i != vMyIndexes.size(); ++i) { size_t range_begin = vMyIndexes[i - 1] + 1; size_t range_end = vMyIndexes[i]; std::copy(vMyValues.begin() + range_begin, vMyValues.begin() + range_end, last); last += range_end - range_begin; } vMyValues.erase(last, vMyValues.end()); 

P.S. fixed a bug, thanks to Steve Jessop that patiently tried to show me it

12 Comments

@Andy T, I'm not sure if it works properly in situations where the indexes to be removed are contiguous, e.g. (13,14,15,16)
@Peter: it should, since std::copy will be passed two iterators that are equal, and hence will copy nothing. I'm worried about the first time through the loop, though. We copy the second range that we're supposed to keep, from indexes vMyIndexes[0]+1 to vMyIndexes[1], so I think we lose some values at the front if the first value in vMyIndexes is not 0. Possibly we should put a -1 at the start of vMyIndexes, or equivalently add vMyIndexes[0] to last before we start.
@Steve: I put vMyValues.size() at the end, it's similar to what you propose
Actually, my idea of sticking in a -1 is not valid, since last isn't allowed to be in the range [vMyValues.begin() + range_begin, vMyValues.begin() + range_end) according to the rules of std::copy. So last needs to not start at the beginning.
@Andy T: indeed, you've solved the problem at the end but you haven't solved the equivalent problem at the start.
|
7

Erase-remove multiple elements at given indices

Update: after the feedback on performance from @kory, I've modified the algorithm not to use flagging and move/copy elements in chunks (not one-by-one).

Notes:
  • indices need to be sorted and unique
  • uses std::move (replace with std::copy for c++98):

Github Live example

Code:
template <class ForwardIt, class SortUniqIndsFwdIt> inline ForwardIt remove_at( ForwardIt first, ForwardIt last, SortUniqIndsFwdIt ii_first, SortUniqIndsFwdIt ii_last) { if(ii_first == ii_last) // no indices-to-remove are given return last; typedef typename std::iterator_traits<ForwardIt>::difference_type diff_t; typedef typename std::iterator_traits<SortUniqIndsFwdIt>::value_type ind_t; ForwardIt destination = first + static_cast<diff_t>(*ii_first); while(ii_first != ii_last) { // advance to an index after a chunk of elements-to-keep for(ind_t cur = *ii_first++; ii_first != ii_last; ++ii_first) { const ind_t nxt = *ii_first; if(nxt - cur > 1) break; cur = nxt; } // move the chunk of elements-to-keep to new destination const ForwardIt source_first = first + static_cast<diff_t>(*(ii_first - 1)) + 1; const ForwardIt source_last = ii_first != ii_last ? first + static_cast<diff_t>(*ii_first) : last; std::move(source_first, source_last, destination); // std::copy(source_first, source_last, destination) // c++98 version destination += source_last - source_first; } return destination; } 
Usage example:
std::vector<int> v = /*...*/; // vector to remove elements from std::vector<int> ii = /*...*/; // indices of elements to be removed // prepare indices std::sort(ii.begin(), ii.end()); ii.erase(std::unique(ii.begin(), ii.end()), ii.end()); // remove elements at indices v.erase(remove_at(v.begin(), v.end(), ii.begin(), ii.end()), v.end()); 

3 Comments

This runs fairly quick, 0.47 secs on my dataset of millions, others are slower around 5-10 mins and only Andriy Tylychko's is faster at half of the time.
@kory I've updated the algorithm in answer to avoid flagging and move/copy elements in chunks (not one-by-one)
yes, everything here is working and this is now the fastest overall algorithm, congrats!
2

What you can do is split the vector (actually any non-associative container) in two groups, one corresponding to the indices to be erased and one containing the rest.

template<typename Cont, typename It> auto ToggleIndices(Cont &cont, It beg, It end) -> decltype(std::end(cont)) { int helpIndx(0); return std::stable_partition(std::begin(cont), std::end(cont), [&](typename Cont::value_type const& val) -> bool { return std::find(beg, end, helpIndx++) != end; }); } 

you can then delete from (or up to) the split point to erase (keep only) the elements corresponding to the indices

std::vector<int> v; v.push_back(0); v.push_back(1); v.push_back(2); v.push_back(3); v.push_back(4); v.push_back(5); int ar[] = { 2, 0, 4 }; v.erase(ToggleIndices(v, std::begin(ar), std::end(ar)), v.end()); 
  • If the 'keep only by index' operation is not needed you can use remove_if insted of stable_partition (O(n) vs O(nlogn) complexity)
  • To work for C arrays as containers the lambda function should be [&](decltype(*(std::begin(cont))) const& val) -> bool { return std::find(beg, end, helpIndx++) != end; } but then the .erase() method is no longer an option

1 Comment

This is very slow in my testing, taking around 5 mins to run compared to the 0.24 secs for Andriy Tylychko's answer.
1

If you want to ensure that every element is only moved once, you can simply iterate through each element, copy those that are to remain into a new, second container, do not copy the ones you wish to remove, and then delete the old container and replace it with the new one :)

Comments

1

This is an algorithm based on Andriy Tylychko's answer so that this can make it easier and faster to use the answer, without having to pick it apart. It also removes the need to have -1 at the beginning of the indices list and a number of items at the end. Also some debugging code to make sure the indices are valid (sorted and valid index into items).

template <typename Items_it, typename Indices_it> auto remove_indices( Items_it items_begin, Items_it items_end , Indices_it indices_begin, Indices_it indices_end ) { static_assert( std::is_same_v<std::random_access_iterator_tag , typename std::iterator_traits<Items_it>::iterator_category> , "Can't remove items this way unless Items_it is a random access iterator"); size_t indices_size = std::distance(indices_begin, indices_end); size_t items_size = std::distance(items_begin, items_end); if (indices_size == 0) { // Nothing to erase return items_end; } // Debug check to see if the indices are already sorted and are less than // size of items. assert(indices_begin[0] < items_size); assert(std::is_sorted(indices_begin, indices_end)); auto last = items_begin; auto shift = [&last, &items_begin](size_t range_begin, size_t range_end) { std::copy(items_begin + range_begin, items_begin + range_end, last); last += range_end - range_begin; }; size_t last_index = -1; for (size_t i = 0; i != indices_size; ++i) { shift(last_index + 1, indices_begin[i]); last_index = indices_begin[i]; } shift(last_index + 1, items_size); return last; } 

Here is an example of usage:

template <typename T> std::ostream& operator<<(std::ostream& os, std::vector<T>& v) { for (auto i : v) { os << i << " "; } os << std::endl; return os; } int main() { using std::begin; using std::end; std::vector<int> items = { 1, 3, 6, 8, 13, 17 }; std::vector<int> indices = { 0, 1, 2, 3, 4 }; std::cout << items; items.erase( remove_indices(begin(items), end(items), begin(indices), end(indices)) , std::end(items) ); std::cout << items; return 0; } 

Output:

1 3 6 8 13 17 17 

The headers required are:

#include <iterator> #include <vector> #include <iostream> // only needed for output #include <cassert> #include <type_traits> 

And a Demo can be found on godbolt.org.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.