0

I Have to design an order-book data structure that allows me to query for the highest price of an order which has been inserted and not yet deleted. Insert and delete operations are given upfront in a file in which each operation looks like one of the following two:

  • TIMESTAMP insert ID PRICE
  • TIMESTAMP delete ID

where ID is an integer identifier of a order, timestamp are always in increasing order and each ID appears exactly twice: once in a insert and once in a delete operation, in this order.

From this list of operations, I need to output the time-weighted average of the highest price.


As an example, let's say we have the following input: 10 I 1 10 20 I 2 13 22 I 3 13 24 E 2 25 E 3 40 E 1 We can say that after the ith operation, the max is 10, 13, 13, 13, 10 and the time-weigthed average is 10*(20-10) + 13*(22-20) + 13*(24-22)+13*(25-24)+10*(40-25) = 10.5 because 10 is the max price between timestamps [10-20] and [25,40] while 13 in the rest.

I was thinking to use an unordered_map<ID,price> and a multiset<price> for supporting:

  • insert in O(log(n))
  • delete in O(log(n))
  • getMax in O(1)

Here is an example of what I come up with:

struct order { int timestamp, id; char type; double price; }; unordered_map<uint, order> M; multiset<double> maxPrices; double totaltime = 0; double avg = 0; double lastTS = 0; double getHighest() { return !maxPrices.empty() ? *maxPrices.rbegin() : std::numeric_limits<double>::quiet_NaN(); } void update(const uint timestamp) { const double timeLeg = timestamp - lastTS; totaltime += timeLeg; avg += timeLeg * getHighest(); lastTS = timestamp; } void insertOrder(const order& ord) { if (!maxPrices.empty()) { if (ord.price >= getHighest()) { // we have a new maxPrice update(ord.timestamp); } } else // if there are not orders this is the mex for sure lastTS = ord.timestamp; M[ord.id] = ord; maxPrices.insert(ord.price); } void deleteOrder( const uint timestamp, const uint id_ord) { // id_ord is assumed to exists in both M and maxPrices order ord = M[id_ord]; if (ord.price >= getHighest()) { update(timestamp); } auto it = maxPrices.find(ord.price); maxPrices.erase(it); M.erase(id_ord); } 

This approach has a complexity of nlogn where n is the number of active orders.

Is there any faster asymptotic and/or more elegant approach to solving this problem?

10
  • does the algorithm have to be online or offline can work ? Commented Sep 8, 2017 at 14:29
  • with some bookkeeping you can get getMax in O(1) easily and for inserting and removing elements there are better containers than a map Commented Sep 8, 2017 at 14:29
  • @marvel308 offline. You have the whole input up front. Commented Sep 8, 2017 at 14:31
  • @tobi303 I already have getMax with O(1). map keep its element sorted, so I can pick the maximum in O(1). Regarding alternatives to map, I am having trouble coming up with a better alternative. Can you suggest me something I can take a look at? Commented Sep 8, 2017 at 14:35
  • Search the internet for "database normal forms". Commented Sep 8, 2017 at 14:35

1 Answer 1

2

I recommend you take the database approach.

Place all your records into a std::vector.

Create an index table, std::map</* key type */, size_t>, which will contain a key value and the index of the record in the vector. If you want the key sorted in descending order, also supply a comparison functor.

This strategy allows you to create many index tables without having to re-sort all of your data. The map will give good search times for your keys. You can also iterate through the map to list all the keys in order.

Note: with modern computers, you may need a huge amount of data to provide a significant timing improvement between a binary search (map) and an linear search (vector).

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.