2

I have sequence of numbers (unsorted, no duplicate) with the target to sort them.

Approach 1: insert into vector. O(n), use sort algo and sort. O(nlogn)

Approach 2: insert into set. o(nlogn)

Which approach will be faster?

I feel set will be faster as every insertion in vector has to allocate complete array element and copy it an then delete it which may be expensive. But I read over web most of the place vector is over set.

Can anyone suggests me which one is faster with proper logic?

EDIT: if we don't know the no of element in advance which one will be faster set or vector (for both no of element are smaall andd no of elementt is large? note: if no of element is large set is better option it seems but is it good for small also? dont know)

8
  • 3
    Try them both, over different sizes of elements. Plot a chart. The result shows what? (To simply obtain a sorted result, I would expect the vector+sort to be faster as the sort employed can do less book-keeping which could result in a small C - or wall-clock time.) Commented Jul 26, 2014 at 6:34
  • It depends on how you read the result. set will be faster if you read it from start to end, while vector support random access. btw, insert into set take O(logn) time, insert 1 by 1 from empty set=log1+log2+log3+...+logn = O(n). Commented Jul 26, 2014 at 6:35
  • 1
    @tom87416: Why would reading a set from start to end be faster than reading a vector from start to end? Iterating through a set requires following node pointers up and down a tree, while iterating through a vector is just incrementing a pointer. Also, your math is wrong, Suri's is right. Commented Jul 26, 2014 at 6:40
  • Of course profile. But my instinct is that reserved vector will faster. Why? Less dynamic allocations. That's a big cost. Commented Jul 26, 2014 at 6:50
  • 1
    @Suri : I edited my answer to reflect that. Commented Jul 26, 2014 at 7:14

3 Answers 3

6
  • If you know in advance of many elements to expect, you can reserve() the space in your vector to avoid reallocations, making the first choice very interesting (fast insertions, single sort).

    If you need to do it once, go for the std::vector<>. If other insertions will occur later in the program, the std::set<> might be more interesting.

  • If you don't know the expected size in advance, then reallocations may occur with the vector, and std::set<> is a good choice (better theoretical average complexity).

    O(n) + O(n * log(n)) for the vector vs O(n * log(n)) for the set

  • If the number of elements is very small, you still can reserve some space (e.g. if you expect 10 elements, you might reserve 100 to be safe), and go with the std::vector

Anyway, profiling both solutions is always a good practice, the actual result will depend (among other things) of the initial sorting state of the input, and on the quality of your implementation for each container.

Note:

  • Remember that std::set has a bigger memory foot-print, if it matters for you.
Sign up to request clarification or add additional context in comments.

Comments

3

It depends on your use cases.

For set:

+Fast data insert O(logn)

+You do not need to care about sorting

-Since set is implemented as a tree, it has memory overhead for each element.

-Data can be spread across memory heap, so CPU cache is not working very well.

For vector:

+The data is contained in continuous chunk of memory. So, your CPU cache is wokring better.

+Your searches are same O(logn)

+You can reserve the memory for it.

-You need to sort after every change.

So, if you have many elements and do a "once insert, many searches", i'd prefer vector. If you making many inserts/search queries, its better to stick with set.

Thinking in O() terms i'd say vector insert cost is O(nlogn), but it can be called once after all inserts. set insert costs are O(logn) and called each insert. So, if you need insertions after vector is sorted, you will pay num_insertions*O(nlogn) for vector and O(log (n+num_insertions)) for set, which is really cheaper.

6 Comments

"[std:set]s are containers that store unique elements following a specific order." - ie sorted. The question asks about which method of obtaining a sorted result is "faster", but neither imposes (nor asks about) other access.
Set is not a linked-list.
Set is not so much a linked list but rather a node-based tree, but the conclusion is the same.
You don't have to sort after each insertion if the vector is already sorted. You just have to find the insertion location (O(log n)), and insert (O(n)).
How about using std::set with a custom allocator that reserves a single large block (or a few large blocks) of memory for all elements?
|
2

I thought I'd look into this using a profiler. My code:

//g++ -std=c++11 -Wall vectorvsset.cpp -g -pg -o vectorvsset #include <time.h> #include <iostream> #include <vector> #include <set> #include <algorithm> using namespace std; void GenRandom(vector<int32_t> &original) { original.clear(); for(size_t i=0; i<10000; i++) original.push_back(rand()); } void TestSet(const vector<int32_t> &original) { set<int32_t> testset; testset.insert(original.begin(), original.end()); } void TestVector(const vector<int32_t> &original) { vector<int32_t> testvec; testvec.insert(testvec.end(), original.begin(), original.end()); sort(testvec.begin(), testvec.end()); } void TestVectorConvertToSet(const vector<int32_t> &original) { vector<int32_t> testvec; testvec.insert(testvec.end(), original.begin(), original.end()); sort(testvec.begin(), testvec.end()); std::set<int32_t> testsec(testvec.begin(), testvec.end()); } int main() { srand(time(NULL)); cout << "RAND_MAX " << RAND_MAX << endl; vector<int32_t> original; for(size_t i=0; i<100; i++) { GenRandom(original); TestSet(original); TestVector(original); TestVectorConvertToSet(original); } return 0; } 

Using gprof on g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 my (abbreviated) results are:

Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 0.00 1.42 0.00 100 0.00 3.36 TestVector(std::vector<int, std::allocator<int> > const&) 0.00 1.42 0.00 100 0.00 6.86 TestVectorConvertToSet(std::vector<int, std::allocator<int> > const&) 0.00 1.42 0.00 100 0.00 3.57 TestSet(std::vector<int, std::allocator<int> > const&) 

So it is faster to use a vector, but there is not much difference.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.