Skip to main content
26 of 47
added 708 characters in body

Summary

 Type BST (*) Heap Insert average log(n) 1 Insert worst log(n) log(n) Find any worst log(n) n Find max worst 1 (**) 1 Create worst n log(n) n Delete worst log(n) log(n) 

All average times on this table are the same as their worst times except for Insert.

  • *: everywhere in this answer, BST == Balanced BST, since unbalanced sucks asymptotically
  • **: using a trivial modification explained in this answer

Advantages of binary heap over a BST

Advantage of BST over binary heap

  • search for arbitrary elements is O(log(n)). This is the killer feature of BSTs.

    For heap, it is O(n) in general, except for the largest element which is O(1).

"False" advantage of heap over BST

Average binary heap insert is O(1)

Sources:

Intuitive argument:

  • bottom tree levels have exponentially more elements than top levels, so new elements are almost certain to go at the bottom
  • heap insertion starts from the bottom, BST must start from the top

In a binary heap, increasing the value at a given index is also O(1) for the same reason. But if you want to do that, it is likely that you will want to keep an extra index up-to-date on heap operations https://stackoverflow.com/questions/17009056/how-to-implement-ologn-decrease-key-operation-for-min-heap-based-priority-queu e.g. for Dijkstra. Possible at no extra time cost.

BST cannot be efficiently implemented on an array

Heap operations only need to bubble up or down a single tree branch, so O(log(n)) worst case swaps, O(1) average.

Keeping a BST balanced requires tree rotations, which can change the top element for another one, and would require moving the entire array around (O(n)).

Heaps can be efficiently implemented on an array

Parent and children indexes can be computed from the current index as shown here.

There are no balancing operations like BST.

Delete min is the most worrying operation as it has to be top down. But it can always be done by "percolating down" a single branch of the heap as explained here. This leads to an O(log(n)) worst case, since the heap is always well balanced.

If you are inserting a single node for every one you remove, then you lose the advantage of the asymptotic O(1) average insert that heaps provide as the delete would dominate, and you might as well use a BST. Dijkstra however updates nodes several times for each removal, so we are fine.

Dynamic array heaps vs pointer tree heaps

Heaps can be efficiently implemented on top of pointer heaps: https://stackoverflow.com/questions/19720438/is-it-possible-to-make-efficient-pointer-based-binary-heap-implementations

The dynamic array implementation is more space efficient. Suppose that each heap element contains just a pointer to a struct:

  • the tree implementation must store three pointers for each element: parent, left child and right child. So the memory usage is always 4n (3 tree pointers + 1 struct pointer).

    Tree BSTs would also need further balancing information, e.g. black-red-ness.

  • the dynamic array implementation can be of size 2n just after a doubling. So on average it is going to be 1.5n.

You might still want to use the tree heaps however if the latency of extending the dynamic array is too serious for you as mentioned here.

Philosophy

  • BSTs maintain a global property between a parent and all descendants (left smaller, right bigger).

    The top node of a BST is the middle element, which requires global knowledge to maintain (knowing how many smaller and larger elements are there).

    This global property is more expensive to maintain (log n insert), but gives more powerful searches (log n search).

  • Heaps maintain a local property between parent and direct children (parent > children).

    The top note of a heap is the big element, which only requires local knowledge to maintain (knowing your parent).

Doubly-linked list

A doubly linked list can be seen as subset of the heap where first item has greatest priority, so let's compare them here as well:

  • insertion:
    • position:
      • doubly linked list: the inserted item must be either the first or last, as we only have pointers to those elements.
      • binary heap: the inserted item can end up in any position. Less restrictive than heap.
    • time:
      • doubly linked list: O(1) worst case we have pointers to the items, and the update is really simple
      • binary heap: O(1) average, thus worse. Tradeoff for having more general insertion position.
  • search: O(n) for both

An use case for this is when the key of the heap is the current timestamp: in that case, new entries will always go to the beginning of the list. So we can even forget the exact timestamp altogether, and just keep the position in the list as the priority.

This can be used to implement an LRU cache. Just like for heap applications like Dijkstra, you will want to keep an additional hashmap from the key to the corresponding node of the list, to find which node to update quickly.

Failed benchmark C++ stlib experimental attempt

I tried to benchmark the C++ std::set (BST-based ?) and std::priority_queue (heap based?) with this code and this plot script to see if I was right, but I cannot interpret the std::set results which are basically constant, maybe someone can enlighten me why I don't see a logarithm instead:

enter image description here

Heap is constant as expected, and we can even see the dynamic array resize points.

Ubuntu 18.04, GCC 7.3, i7-7820HQ CPU, DDR4 2400 MHz RAM, Lenovo Thinkpad P51.

See also

Similar question on SO: https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst