In Mathematical Snapshots (second edition, 1950), pp. 38-39, Hugo Steinhaus discusses the problem of ordering $n$ objects by pairwise comparisons with respect to a transitive relation. The book, addressed to a general audience, presents the problem of ordering weights with a balance scale, and then that of ranking $n$ tennis players, thus generalising a problem Steinhaus himself had proposed in a mathematical seminar during the 1929-1930 academic year, namely to find the minimum number of matches needed to determine the best and second-best player in a tennis tournament (the problem was solved in 1932 by Schreier, but his proof of optimality was flawed; the first correct proof was published only in 1964 by Kislitsyn). In effect, Steinhaus describes binary insertion and shows that the problem can be solved with $M(n)=1+nS(n)-2^{S(n)}$ comparisons, where $S(n)=1+\lfloor\log_2(n)\rfloor$. In the volume, Steinhaus states that it "has not been proved that there is no shorter proceeding possible, but we rather think it to be true".
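To see the formula in action, here is a minimal Python sketch (the helper names are mine, not Steinhaus's) that checks the closed form against a direct count of the comparisons binary insertion needs in the worst case, namely $\lceil\log_2 k\rceil$ to insert the $k$-th element among the $k-1$ already sorted ones:

```python
def S(n: int) -> int:
    """S(n) = 1 + floor(log2(n)), computed exactly with integer arithmetic."""
    return n.bit_length()

def M(n: int) -> int:
    """Steinhaus's closed form for the worst case of binary insertion."""
    return 1 + n * S(n) - 2 ** S(n)

def binary_insertion_worst_case(n: int) -> int:
    """Direct count: inserting the k-th item into k-1 sorted items costs
    ceil(log2(k)) comparisons in the worst case, and ceil(log2(k)) == (k-1).bit_length()."""
    return sum((k - 1).bit_length() for k in range(2, n + 1))

for n in range(2, 100):
    assert M(n) == binary_insertion_worst_case(n)
```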
Several years later, in Some remarks about tournaments. Calcutta Math. Soc. Golden Jubilee Commemoration Vol. (1958/59), Part II, pp. 323-327, Steinhaus reports that two of his colleagues, Stanisław Trybuła and Czen Ping, disproved his conjecture by determining the exact minimum number of comparisons needed to sort $n$ items, for $n\leq 11$ (apparently without publishing their results). From this fact, Knuth, in TAOCP, second edition, volume 3, p. 186, argues that Trybuła and Ping may have independently discovered the method known as merge-insertion, proposed by Lester R. Ford, Jr. and Selmer M. Johnson in A tournament problem. The American Mathematical Monthly, 66(5):387-389, 1959. In this paper, the authors introduce the so-called decision-tree model and prove the lower bound $\lceil\log_2(n!)\rceil$, which is $\Theta(n\log n)$.
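The gap between the two quantities is easy to exhibit numerically. The short sketch below (helper names mine) computes the lower bound exactly and sets it against Steinhaus's $M(n)$; the first discrepancy appears at $n=5$, where the bound is $7$ but binary insertion needs $8$ comparisons, which is precisely where Steinhaus's conjecture breaks down:

```python
from math import factorial

def lower_bound(n: int) -> int:
    """Information-theoretic lower bound ceil(log2(n!)), computed exactly."""
    return (factorial(n) - 1).bit_length()

def M(n: int) -> int:
    """Steinhaus's binary-insertion worst case, as in the previous snippet."""
    return 1 + n * n.bit_length() - 2 ** n.bit_length()

for n in range(2, 12):
    print(n, lower_bound(n), M(n))
# The first gap appears at n = 5: lower bound 7, binary insertion 8.
```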
Merge-insertion sort, or the Ford-Johnson algorithm, uses fewer comparisons in the worst case than the best previously known methods, binary insertion sort and merge sort. Merge-insertion sort achieves the minimum possible number of comparisons for $n\leq 22$ items, and it uses the fewest comparisons known for $n \leq 46$. Only in 1979 did Glenn Manacher publish another sorting algorithm that uses even fewer comparisons, for sufficiently large inputs.
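Knuth (TAOCP, volume 3, section 5.3.1) gives the worst-case comparison count of merge-insertion as $F(n)=\sum_{k=1}^{n}\lceil\log_2(3k/4)\rceil$. Assuming that formula, the sketch below (helper names mine) tabulates it against the information-theoretic bound and binary insertion: for $n\leq 11$, $F(n)$ meets the bound, which settles optimality there immediately, while optimality for $12\leq n\leq 22$ required separate, largely computer-assisted arguments.

```python
from math import factorial

def lower_bound(n: int) -> int:
    return (factorial(n) - 1).bit_length()          # ceil(log2(n!))

def binary_insertion(n: int) -> int:
    return sum((k - 1).bit_length() for k in range(2, n + 1))

def merge_insertion(n: int) -> int:
    # F(n) = sum_{k=1..n} ceil(log2(3k/4)); exactly,
    # ceil(log2(3k/4)) = ceil(log2(3k)) - 2 = (3k - 1).bit_length() - 2.
    return sum((3 * k - 1).bit_length() - 2 for k in range(1, n + 1))

print(" n  bound  F-J  bin-ins")
for n in range(2, 13):
    print(f"{n:2d} {lower_bound(n):5d} {merge_insertion(n):4d} {binary_insertion(n):7d}")
# F(n) equals the lower bound for every n <= 11; at n = 12 it needs 30 against a bound of 29.
```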
Unfortunately, at this point the story gets a little complicated. As Georg Essl correctly pointed out, a permutation-counting argument had already appeared in Demuth's thesis Electronic Data Sorting (1956). This was "perhaps the first publication to deal in any detail with questions of computational complexity. Demuth considered several abstract models for sorting devices, and established lower and upper bounds on the mean and maximum execution times achievable with each model." [Knuth, cit., p. 246] "It considered three abstract models of the sorting problem, using cyclic, linear, and random-access memories [... A]lthough no practical consequences flowed immediately from Demuth's thesis, it established important ideas about how to link theory with practice" [Knuth, cit., p. 388]. "[...] one of Demuth's main results was that in a certain sense bubble sorting is the optimum way to sort. [...] It turns out that [all other sorting methods studied] are always better in practice, in spite of the fact that Demuth has proved the optimality of bubble sorting on a certain peculiar type of machine." [Knuth, The Dangers of Computer Science Theory, pp. 190-191, emphasis mine]
Knuth thus suggests that Demuth's work not only fails to take into account certain peculiarities of the implementation of the algorithms (but the same holds for the Ford-Johnson algorithm, as "their approach has fortunately never been used by programmers, because the complex method used to decide what comparison to make costs much more time than the comparisons themselves", Knuth, The Dangers of Computer Science Theory, p. 191), but that it is also too constrained by particular models of computation. Knuth, again, describes an optimal way to order five elements and states that this method "was first found by H. B. Demuth [in his] thesis, [pp.] 41-43" and that "a pleasant generalization [...] has been discovered by Lester Ford, Jr. and Selmer Johnson" [TAOCP, volume 3, p. 184].
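As an illustration of what such an optimal scheme for five elements looks like, here is a hedged Python sketch: it is my own rendering of the standard seven-comparison procedure (essentially merge-insertion for $n=5$), not a transcription of Demuth's pages; note that $7=\lceil\log_2 5!\rceil$ is the information-theoretic minimum.

```python
def sort5(a, b, c, d, e):
    """Sort five comparable items with at most 7 comparisons."""
    comparisons = 0

    def less(x, y):
        nonlocal comparisons
        comparisons += 1
        return x < y

    # Comparisons 1-2: order two disjoint pairs, so that a <= b and c <= d.
    if less(b, a): a, b = b, a
    if less(d, c): c, d = d, c
    # Comparison 3: order the pairs by their larger elements, giving a <= b <= d and c <= d.
    if less(d, b): a, b, c, d = c, d, a, b
    # Comparisons 4-5: binary-insert e into the 3-chain a <= b <= d.
    if less(e, b):
        chain = [e, a, b, d] if less(e, a) else [a, e, b, d]
    else:
        chain = [a, b, e, d] if less(e, d) else [a, b, d, e]
    # Comparisons 6-7: binary-insert c among the (at most 3) elements below d,
    # which suffices because c <= d is already known.
    below = chain[:chain.index(d)]
    lo, hi = 0, len(below)
    while lo < hi:
        mid = (lo + hi) // 2
        if less(c, below[mid]):
            hi = mid
        else:
            lo = mid + 1
    result = below[:lo] + [c] + below[lo:] + chain[chain.index(d):]
    return result, comparisons
```

For instance, `sort5(3, 1, 4, 1, 5)` returns the sorted list together with the number of comparisons actually used, which never exceeds seven.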
Since I cannot (at least for now) access Demuth's thesis in its entirety, I will allow myself to formulate my own hypothesis (based on section 5.5 of TAOCP, Summary, History, and Bibliography, pp. 380-391, which I recommend reading if possible). In the second half of the 1940s, the problem of sorting large amounts of data began to be tackled extensively, and many internal sorting techniques were already more or less known, but without any real theory having been developed. In 1952 Daniel Goldenberg published Time analyses of various methods of sorting data, Digital Computer Laboratory memo M-1680, in which he coded five different sorting methods and carried out best-case and worst-case analyses of each. Then, in 1956, Demuth's thesis laid the foundations of computational complexity theory and arguably stated the permutation-counting argument. His thesis probably remained inaccessible and circulated only in relatively restricted circles, but thanks to a number of surveys published on the subject in the years 1956-1959, his ideas were disseminated, and Ford and Johnson finally made them known to a wider audience.