Return to Answer

added 540 characters in body

edited Dec 13, 2014 at 9:20

25k
3
48
101

Let's assume for the moment that the sort keys are all distinct, and think about the information theory of sorting.

An unsorted array is a permutation of the sorted array. There are $n!$ possible permutations. The job of sorting, in an information theoretic sense, is to discover precisely which permutation it is.

To transmit a number between $1$ and $m$ requires transmitting $\log_2 m$ bits of information. To transmit a permutation of $n$ elements, therefore, requires $\log_2 n!$ bits of information. By Stirling's approximation, this turns out to be $n \log_2 n + O(\hbox{low order})$ bits.

A binary comparison operation discovers one bit of information. It follows that any sorting algorithm which only uses a binary comparison operation must perform $O(n \log n)$ comparisons.

A radix sort beats this by discovering more than one bit per query. If you can discover $\Theta(\log n)$ bits per query, for example, then you can sort using $O(n)$ queries.

EDIT The key word in that last sentence is "if".

Suppose that we have $n$ distinct integers, and we wish to radix sort them using $k$ buckets.

Clearly, each query will discover $\log_2 k$ bits of information. It follows that any such sort must perform $\frac{n \log_2 n}{\log_2 k}$ queries (ignoring the low-order term).

If $k = 2$, then this has the same complexity as a comparison-based sort. However, if $k = \Theta(\log n)$, then sorting will be linear. In most cases, $k$ will be somewhere in between these two values.

Let's assume for the moment that the sort keys are all distinct, and think about the information theory of sorting.

An unsorted array is a permutation of the sorted array. There are $n!$ possible permutations. The job of sorting, in an information theoretic sense, is to discover precisely which permutation it is.

To transmit a number between $1$ and $m$ requires transmitting $\log_2 m$ bits of information. To transmit a permutation of $n$ elements, therefore, requires $\log_2 n!$ bits of information. By Stirling's approximation, this turns out to be $n \log_2 n + O(\hbox{low order})$ bits.

A binary comparison operation discovers one bit of information. It follows that any sorting algorithm which only uses a binary comparison operation must perform $O(n \log n)$ comparisons.

A radix sort beats this by discovering more than one bit per query. If you can discover $\Theta(\log n)$ bits per query, for example, then you can sort using $O(n)$ queries.

Let's assume for the moment that the sort keys are all distinct, and think about the information theory of sorting.

An unsorted array is a permutation of the sorted array. There are $n!$ possible permutations. The job of sorting, in an information theoretic sense, is to discover precisely which permutation it is.

To transmit a number between $1$ and $m$ requires transmitting $\log_2 m$ bits of information. To transmit a permutation of $n$ elements, therefore, requires $\log_2 n!$ bits of information. By Stirling's approximation, this turns out to be $n \log_2 n + O(\hbox{low order})$ bits.

A binary comparison operation discovers one bit of information. It follows that any sorting algorithm which only uses a binary comparison operation must perform $O(n \log n)$ comparisons.

A radix sort beats this by discovering more than one bit per query. If you can discover $\Theta(\log n)$ bits per query, for example, then you can sort using $O(n)$ queries.

EDIT The key word in that last sentence is "if".

Suppose that we have $n$ distinct integers, and we wish to radix sort them using $k$ buckets.

Clearly, each query will discover $\log_2 k$ bits of information. It follows that any such sort must perform $\frac{n \log_2 n}{\log_2 k}$ queries (ignoring the low-order term).

If $k = 2$, then this has the same complexity as a comparison-based sort. However, if $k = \Theta(\log n)$, then sorting will be linear. In most cases, $k$ will be somewhere in between these two values.

answered Dec 12, 2014 at 5:16

25k
3
48
101

Let's assume for the moment that the sort keys are all distinct, and think about the information theory of sorting.

An unsorted array is a permutation of the sorted array. There are $n!$ possible permutations. The job of sorting, in an information theoretic sense, is to discover precisely which permutation it is.

To transmit a number between $1$ and $m$ requires transmitting $\log_2 m$ bits of information. To transmit a permutation of $n$ elements, therefore, requires $\log_2 n!$ bits of information. By Stirling's approximation, this turns out to be $n \log_2 n + O(\hbox{low order})$ bits.

A binary comparison operation discovers one bit of information. It follows that any sorting algorithm which only uses a binary comparison operation must perform $O(n \log n)$ comparisons.

A radix sort beats this by discovering more than one bit per query. If you can discover $\Theta(\log n)$ bits per query, for example, then you can sort using $O(n)$ queries.