Skip to main content
added 2305 characters in body
Source Link
Per
  • 2.6k
  • 14
  • 18

Theoretically speaking, a van Emde Boas tree is your best bet, with O(log log M)-time operations where M isTheoretically speaking, a van Emde Boas tree is your best bet, with O(log log M)-time operations where M is the size of the range. The space usage is quite large, though there are more efficient variants.

Actually the sizetheoretical state of the range. The space usageart is quite largedescribed in the paper On Range Reporting in One Dimension, though there are more efficient variantsby Mortensen, Pagh, and Patrascu.

I'm not sure if the existing lower bounds rule out O(1), but M won't be large enough to make the distinction matter. Instead of the vEB structure, I would just use a k-ary trie with k a power of two like 32 or 64.

EDIT: here's one way to do range search with a trie.

Let's assume each datum is a bit pattern (easy enough; that's how the CPU think of it). Each subtree consists of all of the nodes with a certain prefix. For example, {0000, 0011, 0101, 1001} is represented by the following 4-ary trie, where X denotes a null pointer.

+---+---+---+---+ |00\|01\|10\|11X| +--|+--|+--|+---+ | | | | | +----------------------------+ +--+ | | | +------------+ | | | | v v v +---+---+---+---+ +---+---+---+---+ +---+---+---+---+ |00\|01X|10X|11\| |00X|01\|10X|11X| |00X|01\|10X|11X| +--|+---+---+--|+ +---+--|+---+---+ +---+--|+---+---+ | | | | v v v v 0000 0011 0101 1001 

A couple optimizations quickly become apparent. First, if all of the bit patterns are the same length, then we don't need to store them at the leaves—they can be reconstructed from the descent path. All we need is the bitmap, which if k is the number of bits in a machine word, fits nicely where the pointer from the previous level used to be.

+--------+--------+--------+--------+ |00(1001)|01(0100)|10(0100)|11(0000)| +--------+--------+--------+--------+ 

In order to search the trie for a range like [0001, 1000], we start at the root, determine which subtrees might intersect the range and recurse on them. In this example, the relevant children of the root are 00, 01, and 10. The relevant children of 00 are the subtrees representing the prefixes 0001, 0010, and 0011.

For k fixed, reporting from a k-ary trie is O(log M + s), where M is the number of bit patterns and s is the number of hits. Don't be fooled though—when k is medium, each node occupies a couple cache lines but the trie isn't very high, so the number of cache misses is pretty small.

Theoretically speaking, a van Emde Boas tree is your best bet, with O(log log M)-time operations where M is the size of the range. The space usage is quite large, though there are more efficient variants.

I'm not sure if the existing lower bounds rule out O(1), but M won't be large enough to make the distinction matter. Instead of the vEB structure, I would just use a k-ary trie with k a power of two like 32 or 64.

Theoretically speaking, a van Emde Boas tree is your best bet, with O(log log M)-time operations where M is the size of the range. The space usage is quite large, though there are more efficient variants.

Actually the theoretical state of the art is described in the paper On Range Reporting in One Dimension, by Mortensen, Pagh, and Patrascu.

I'm not sure if the existing lower bounds rule out O(1), but M won't be large enough to make the distinction matter. Instead of the vEB structure, I would just use a k-ary trie with k a power of two like 32 or 64.

EDIT: here's one way to do range search with a trie.

Let's assume each datum is a bit pattern (easy enough; that's how the CPU think of it). Each subtree consists of all of the nodes with a certain prefix. For example, {0000, 0011, 0101, 1001} is represented by the following 4-ary trie, where X denotes a null pointer.

+---+---+---+---+ |00\|01\|10\|11X| +--|+--|+--|+---+ | | | | | +----------------------------+ +--+ | | | +------------+ | | | | v v v +---+---+---+---+ +---+---+---+---+ +---+---+---+---+ |00\|01X|10X|11\| |00X|01\|10X|11X| |00X|01\|10X|11X| +--|+---+---+--|+ +---+--|+---+---+ +---+--|+---+---+ | | | | v v v v 0000 0011 0101 1001 

A couple optimizations quickly become apparent. First, if all of the bit patterns are the same length, then we don't need to store them at the leaves—they can be reconstructed from the descent path. All we need is the bitmap, which if k is the number of bits in a machine word, fits nicely where the pointer from the previous level used to be.

+--------+--------+--------+--------+ |00(1001)|01(0100)|10(0100)|11(0000)| +--------+--------+--------+--------+ 

In order to search the trie for a range like [0001, 1000], we start at the root, determine which subtrees might intersect the range and recurse on them. In this example, the relevant children of the root are 00, 01, and 10. The relevant children of 00 are the subtrees representing the prefixes 0001, 0010, and 0011.

For k fixed, reporting from a k-ary trie is O(log M + s), where M is the number of bit patterns and s is the number of hits. Don't be fooled though—when k is medium, each node occupies a couple cache lines but the trie isn't very high, so the number of cache misses is pretty small.

Source Link
Per
  • 2.6k
  • 14
  • 18

Theoretically speaking, a van Emde Boas tree is your best bet, with O(log log M)-time operations where M is the size of the range. The space usage is quite large, though there are more efficient variants.

I'm not sure if the existing lower bounds rule out O(1), but M won't be large enough to make the distinction matter. Instead of the vEB structure, I would just use a k-ary trie with k a power of two like 32 or 64.