General issues with stating comparison-based algorithms as a pure maths problem
Check out the discussion of comparison-based algorithms (a class of algorithms that includes most sorting algorithms) in [0]. It is a good starting point for formalisation, even though the definition of these algorithms in the literature remains too vague to satisfy a pure mathematician.
Your question on what constitutes an algorithm and what the basic operations are is answered in [0] and [4]. For imperative programming languages, the basic operations are typically the lower-level operations carried out by higher-level constructs (such as loops, conditionals etc.). These can be specified from the outset when specifying a language.
A common ``definition" is that comparison-based algorithms are algorithms for which each basic operation (swap, assignment etc.) must be based on a prior comparison (or sequence of comparisons). This definition of comparison-based algorithms has a gap (not the only one). What really matters is that the basic operations are decided on the basis of the transitive closure of the outcome of prior comparisons.
A second issue to look into is the relationship between comparison-based algorithms and partial order production (this will help with your question on lower bounds). This is rarely spelled out in textbooks. Partial order production is (among other aspects) an approach to establishing more general lower bounds for comparison-based algorithms that need not be sorting algorithms [1]. Partial order production involves the topological sort model discussed below.
A third issue (for mathematicians) is the specification of the operations involved. The tightest presentation would be Knuth's volume III on sorting and searching (The Art of Computer Programming, Vol III), an excellent resource already mentioned upstream. Even at that level of precision a pure mathematician will want to abstract away from the detail, which is a challenge in its own right. See below on the tension between code and maths.
A fourth issue is the distinction between implicit data structures [2] and explicit ones. In the case of heap sort, the list (array) data structure is typically used. However, the implicit data structure is that of a heap, which is in the programmer's mind when writing the code, even though the heap data structure is not explicitly used in the code (only arrays are used).
Finally there is the status of algorithm analysis in general, which remains a mixture of pure and applied maths. Many techniques are tailored to the algorithm under consideration. There is no unified approach. Knuth's TAOCP Vol III has the most detailed information on a formal approach (albeit with methods still tailored to the various algorithms). Flajolet's work on algorithm analysis can be viewed as a step towards unification, though the operations used at times veer away from the operations commonly used in computer science in order to ensure that generating functions can be applied; this, as always, creates a tension between computer science and maths due to shoe-horning code to fit the maths. I wrote a book on the topic focused on a unifying theory, remaining closer to standard computation [3]. It comes with the disclaimer (from this pure mathematician and computer scientist) that it will still not fully satisfy a pure mathematician.
A lot of work remains to be done to bring the theory to the right level of abstraction to meet a pure mathematician's expectations (your original question, I believe). The game is to balance clarity and precision and to achieve unification while respecting the actual goals of the discipline (CS in this case).
It seems that your questions also relate to what happens at the hardware level. I would recommend A Guide to Experimental Algorithmics [4]. This area is a halfway house between purely experimental algorithmic analysis (too machine dependent) and theoretical analysis (machine independent, but too far removed from practice in some ways). It uses techniques similar to those of a controlled lab environment, applied in Computer Science.
The above sketches why defining comparison-based algorithms formally is challenging. It can be done. For instance, [3] does it for a restricted class (specifically aimed at realising modular average-case timing). [1] takes the view that all comparison-based algorithms have computations that (for lists of fixed size $n$) can be modelled by decision trees. Hence [1] identifies the algorithms with the trees (i.e. the algorithms for partial order production in [1] are simply taken to be decision trees with specific properties). Once that leap is made, the pure maths context arises, based on decision trees and topological sorts. This sidesteps your question of a formal framework in which general comparison-based computation is formulated in a way that satisfies a pure mathematician (or, say, a theoretical computer scientist). I believe it certainly can be done, i.e. a logic should be feasible in which comparison-based computation is captured. I don't know of a reference that does this (and would be interested if anyone has one). As far as I know, the hand-waving definition is used throughout the literature (all basic operations must be based on prior comparisons).
A starting point to address the pure maths formulation [1,3,5]
The following approach can be regarded as a pure maths formulation of the problem. It is related to Yao's approach in [1] as well as to the approach in [3], both of which use topological sorts to model comparison-based computation. Rather than focusing on the code, the focus is on the input-output functions associated with the operations performed in the code (a semantics-style approach).
The designer's rule of engagement: transitive closure
A common definition of comparison-based algorithms is that each basic operation (swap, assignment etc.) must be based on a prior comparison or sequence of comparisons. This definition has an unfortunate gap. What really matters is that the basic operations are decided on the basis of the transitive closure of the outcomes of prior comparisons.
For a sorting algorithm such as bubble sort, each swap between list elements is the direct result of a comparison between these two elements. For quicksort, two elements $a, b$ may each be compared to a pivot element $p$, and may be swapped on the basis that $a > p$ and $p > b$ even though $a$ and $b$ have never been compared directly. The swap is executed on the basis of the transitive closure of the information gained on the ordering of elements up to that point of the computation. At that stage of the computation the algorithm designer knows all order relations between compared pairs of labels as well as any legitimately inferred order relations, i.e. those obtained through transitive closure. It is this collection of relations, and only this collection, which supports the designer's decisions to execute basic code-operations.
The design of every comparison-based algorithm requires that, following each comparison made in the computation, a swap (or other basic computational step) is made on the basis of the order established via the prior comparisons. This order is the transitive closure of all pairs for which the correct order has been determined. This sets the rules of engagement for the algorithm designer, who can never make the algorithm carry out a basic operation outside this context.
Note that there is no requirement on the designer to make swaps consistent with a given order. The designer needs to ensure that the algorithm produces the correct outputs. Whether executed swaps bring the algorithm closer to this goal or not is a decision to be made, but not a decision that is enforced. The only requirement is that a "target" order is achieved in the end (given as part of the specifications for the algorithm). To impose a target order which must be achieved in the end, the topological sort model is introduced.
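To make these rules of engagement concrete, here is a minimal Python sketch (the function and variable names are my own, not taken from the references): it stores the filtered pairs, computes their transitive closure, and only admits a swap of two labels when their relative order is implied by that closure.

```python
# Minimal sketch of the "rule of engagement" for a comparison-based algorithm.
# A filtered pair (a, b) records that a <= b was established by a comparison;
# a basic operation on labels a and b is only admissible if their relative
# order lies in the transitive closure of the filtered pairs.

def transitive_closure(pairs):
    """Return the transitive closure of a set of ordered pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def swap_admissible(a, b, filtered_pairs):
    """A swap of labels a and b is admissible iff their relative order is
    implied by the transitive closure of the prior comparisons."""
    closure = transitive_closure(filtered_pairs)
    return (a, b) in closure or (b, a) in closure

# Quicksort-style situation: a and b were each compared to a pivot p,
# establishing a > p and p > b, but a and b were never compared directly.
filtered = {("p", "a"), ("b", "p")}          # p <= a and b <= p
print(swap_admissible("a", "b", filtered))   # True: b <= a follows by transitivity
print(swap_admissible("a", "c", filtered))   # False: no order information on c
```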
Filtering order-info and permuting data accordingly
Each comparison made during a comparison-based computation determines whether two labels occur in order or out of order. Every comparison filters out one additional bit of information on the data: 0 for an out-of-order pair and 1 for an in-order pair.
When a comparison has been made between two labels $a$ and $b$ establishing that $a \leq b$ in the linear order on labels we refer to the pair $(a,b)$ as a filtered pair. In case $a \geq b$ is established, we refer to the pair $(b,a)$ as a filtered pair.
The established input-order
For any execution of a comparison-based algorithm on an input $I$ and any comparison made in this execution, the established input-order (up to that point in the computation) is the partial order obtained by taking the transitive-reflexive closure of the filtered pairs obtained during the computation up to and including this comparison.
The input-order is the partial input order reached when producing the output for $I$, i.e. the established input-order for the last comparison made.
In the execution of a comparison-based algorithm, two labels can be swapped (or moved using other basic operations). A swap is a permutation that constitutes a single transposition on the input data, exchanging two labels.
Comparison-based algorithms execute by:
- (implicitly) filtering the data via comparisons and
- (explicitly) permuting data using only swaps (or other basic operations) on pairs included in the established input-order.
The filtering process
The filtering process implicitly gathers more information on an input's linear order on labels. This information is captured by the established input-order. The filtering process also partitions the inputs into sets satisfying the established input-order.
Consider any comparison-based sort and any input list $L$ of size $n$ storing, for the sake of simplicity, distinct elements. Once a first comparison is made determining the order between two list elements $L[i]$ and $L[j]$ with $i < j$, the set of possible inputs consistent with this outcome is halved (from $n!$ to $\frac{n!}{2}$ lists). These inputs now comprise only the lists for which $L[i] < L[j]$ if this was the outcome of the comparison, or only the lists for which $L[j] < L[i]$ otherwise.
Comparison-based sorting algorithms gradually gather information on pair-orderings, captured in the established input-order for a list's elements. At any point of the computation, i.e. after each comparison, this order can be interpreted to filter out the input lists that do not satisfy the order established up to that point.
The process of filtering out inputs that do not satisfy the established order is implicit. It is not part of the code. It is however part of the programmer's intended design for the algorithm.
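A small brute-force sketch (my own illustration; the helper name consistent_inputs is hypothetical, not from the references) makes this implicit filtering explicit: starting from all $n!$ permutations, each established constraint on a pair of positions discards the inputs that contradict it.

```python
# Sketch: how comparisons implicitly filter the input space.  A constraint
# (i, j) means that the element originally at position i was established to be
# smaller than the element originally at position j.
from itertools import permutations

def consistent_inputs(n, filtered_position_pairs):
    """All input lists (permutations of 1..n) whose contents satisfy every
    established constraint L[i] < L[j]."""
    return [perm for perm in permutations(range(1, n + 1))
            if all(perm[i] < perm[j] for (i, j) in filtered_position_pairs)]

n = 3
print(len(consistent_inputs(n, [])))                 # 6 = 3!: no comparisons made yet
print(len(consistent_inputs(n, [(0, 1)])))           # 3 = 3!/2: halved by one comparison
print(len(consistent_inputs(n, [(0, 1), (1, 2)])))   # 1: the input is fully determined
```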
The permutation process
The permutation process is executed by each comparison-based algorithm as determined by its code. For any input, a comparison-based algorithm can execute a series of swaps (or, more generally, basic operations) on pairs of labels. The composition of these swaps forms a permutation which transforms the original input into an output. At this stage we focus solely on comparison-based sorting algorithms.
Following each comparison, every comparison-based sorting algorithm has the option to permute data-labels to fit the intended sorted order of the output lists. For instance, if list elements $L[i]$ and $L[j]$ are determined to occur out of order by a comparison-based sort, i.e. $L[i] > L[j]$ where $i<j$, then a swap may be made to impose their relative order consistent with the sorted output to be computed.
For any comparison-based sort the combined process of input-filtering and input-permuting has the following effect:
By the time an output has been computed, the possible inputs have been filtered down to a single list.
The permutation-process executed by the algorithm on this input, i.e. the composition of the collection of swaps carried out up to that point on the input, produces the sorted output.
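The combined effect can be observed directly on a small instrumented sort. The sketch below (a bubble sort, chosen purely for illustration; the trace format is mine) records both the filtered pairs gathered by the comparisons and the swaps whose composition is the permutation applied to the input.

```python
# Sketch: a bubble sort instrumented to expose the filtering process
# (the order information gathered by comparisons) and the permutation process
# (the swaps whose composition transforms the input into the sorted output).

def bubble_sort_trace(lst):
    a = list(lst)
    filtered_pairs = []   # (smaller, larger) pairs established by comparisons
    swaps = []            # positions swapped, in order of execution
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] <= a[j + 1]:
                filtered_pairs.append((a[j], a[j + 1]))
            else:
                filtered_pairs.append((a[j + 1], a[j]))
                swaps.append((j, j + 1))
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, filtered_pairs, swaps

output, pairs, swaps = bubble_sort_trace([3, 1, 2])
print(output)  # [1, 2, 3]: the composition of the swaps sorts the input
print(pairs)   # [(1, 3), (2, 3), (1, 2)]: order information filtered out
print(swaps)   # [(0, 1), (1, 2)]: the swaps applied to the input
```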
This leads naturally to the following mathematical model capturing "source" and "target" orders for comparison-based computation.
The topological sort model
The mathematical model
The mathematical model for comparison-based algorithms [1] and [3] consists of two parts:
(a) modelling data structures as partially-ordered finite sets;
(b) modelling data on these by topological sorts.
In this view, an abstract specification of a sorting algorithm has input state given by any possible permutation of a finite set of elements (represented, according to (a) and (b), by a discrete partially-ordered set together with its topological sorts given by all permutations) and output state a sorted list of elements (represented, again according to (a) and (b), by a linearly-ordered finite set with its unique topological sort).
Example 1: The (unordered) input lists of size 2 of a sorting algorithm are modelled by the topological sorts of a discrete order of size 2 over the set of elements $\{x_1,x_2\}$: $$\{(x_1: 1,x_2:2),(x_1:2,x_2:1)\}$$
The function values $s(x_1) = 2$ and $s(x_2) = 1$ of, say, the topological sort $s = (x_1:2,x_2:1)$ (over the discrete order of size 2) are referred to as labels and correspond, in the case of the list data structure, to the list's elements. The location $i$ of a label $a$ for which $s(x_i) = a$ is referred to as the label's index and corresponds to the location of an element in a list. The list $(2,1)$ contains the element 2 in position 1 (i.e. the index of $x_1$) and the element 1 in position 2 (i.e. the index of $x_2$). We say that the index of label 2 is 1 and the index of label 1 is 2 for this topological sort.

The 2! topological sorts $\{(x_1: 1,x_2:2),(x_1:2,x_2:1)\}$ consist of 2 permutations representing the unordered lists $(1,2)$ and $(2,1)$. These topological sorts form the root states that lists of size 2 (with distinct elements) can occur in. Indeed, a list of size 2 is either sorted, represented by $(1,2)$, or reverse sorted, represented by $(2,1)$. Together, these topological sorts form the state space.

A state space intuitively serves to represent the uniform distribution over the data: each of the infinitely many possible input lists $(a,b)$ of size 2 (with distinct elements) is assumed to occur with equal probability in one of the two root states of the state space. This interpretation serves to underpin the traditional complexity analysis of algorithms, which has a uniform distribution as its starting point.
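A brute-force enumeration of a poset's state space illustrates this example (the helper state_space and the encoding of relations are my own, hypothetical choices): the discrete order of size 2 has the two topological sorts listed above, while the linear order $x_1 < x_2$ has exactly one.

```python
# Sketch: enumerate the state space of a finite poset on x_1..x_n, i.e. all its
# topological sorts with labels 1..n.  A sort is encoded as a tuple perm where
# perm[i-1] is the label of x_i; a relation (i, j) means x_i is below x_j.
from itertools import permutations

def state_space(n, relations):
    return [perm for perm in permutations(range(1, n + 1))
            if all(perm[i - 1] < perm[j - 1] for (i, j) in relations)]

print(state_space(2, set()))       # [(1, 2), (2, 1)]: the 2! root states of the discrete order
print(state_space(2, {(1, 2)}))    # [(1, 2)]: the unique topological sort of the linear order
```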
Example 2: Trivial sort of lists of size 2
Computations will transform topological sorts into new topological sorts. All computations will be based on comparisons. For instance, consider a sorting algorithm, albeit a very primitive one, that operates only over lists of size 2 and executes a single comparison on the two elements of the list (the labels of the corresponding topological sort). This comparison is followed by a swap of the labels in case these labels are out of order. Such an algorithm leaves the topological sort $(x_1: 1,x_2:2)$ unchanged and transforms the topological sort $(x_1:2,x_2:1)$ via a single swap into the topological sort $(x_1:1,x_2:2)$. Sorting, in this model, produces the unique topological sort over the linear order.
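A few lines of Python capture this trivial sort as a map on topological sorts (the dictionary encoding is my own choice of representation):

```python
# Sketch: the single compare-and-swap sort acting on topological sorts of the
# discrete order of size 2, encoded as dictionaries x_i -> label.
def trivial_sort_2(s):
    if s["x1"] > s["x2"]:                     # the single comparison
        s["x1"], s["x2"] = s["x2"], s["x1"]   # swap the labels if out of order
    return s

print(trivial_sort_2({"x1": 1, "x2": 2}))  # unchanged: {'x1': 1, 'x2': 2}
print(trivial_sort_2({"x1": 2, "x2": 1}))  # one swap:  {'x1': 1, 'x2': 2}
```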
Part (a) of the model description stipulates that we use finite partial orders to represent data structures. This order may be implicitly or explicitly represented in the output data structure, depending on the implementation. For instance, in the case of heap-formation from an unordered input list, the computation can establish the heap explicitly by transforming the input list into a binary tree data structure satisfying the heap property, which constitutes a de facto heap. Alternatively, the algorithm may take an input list and retain the list data structure for its outputs. Elements of the input list will be reorganised in place, i.e. a new list will be produced for which the elements satisfy a heap structure. The tree-structure underlying this heap remains an ``implicit" part of the implementation. For all intents and purposes the algorithm makes use of the heap-structure intended by the programmer, but the data structure remains a list at all times during the computation. This is for instance the case for the Heapify process in traditional (in-place) Heapsort. We refer to the heap-structure in that case as the implicit data structure and to the list data structure as the explicit data structure. These may or may not coincide, depending on the implementation. In our context, partial orders model the implicit data structure.
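To illustrate the implicit/explicit distinction, here is a standard in-place heapify written out for concreteness (a textbook max-heap sift-down, not code taken from [3]): the explicit data structure is the Python list, while the heap exists only in the parent/child index arithmetic the programmer has in mind.

```python
# Sketch: in-place heapify.  The explicit data structure is the list `a`;
# the binary tree (the implicit data structure) lives only in the index
# arithmetic 2*i+1 / 2*i+2 for the children of position i.

def sift_down(a, i, n):
    """Restore the max-heap property below position i in a[0:n]."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]   # swap, justified by the comparisons above
        i = largest

def heapify(a):
    """Rearrange the list a, in place, so that its contents satisfy the heap property."""
    for i in range(len(a) // 2 - 1, -1, -1):
        sift_down(a, i, len(a))

data = [2, 7, 1, 6, 5, 3, 4]
heapify(data)
print(data)  # [7, 6, 4, 2, 5, 3, 1]: still a list; no tree object is ever built
```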
A basic example of the computational model: sorting algorithms
We illustrate (a), (b) of the model for a sorting algorithm operating over lists of size $n$.
For sorting algorithms, inputs are list data structures, represented by finite discrete orders. The elements of the order are labelled with positive integers drawn from a linearly ordered label set $\mathcal{L}$, which in this case is the usual linear order on the positive integers. The only requirement on this labeling is that its combination with the discrete order forms a topological sort.
It is possible to deal with lists that have repeated elements. These would need to be modelled by topological sorts for which the conditions are relaxed to allow for repeated labels. See [3] for a discussion of how repeated labels can be handled through the assignment of random tie-breakers. It is standard practice in algorithmic analysis to undertake the analysis in the first instance for lists without duplicate elements, an approach adopted here.
Sorting algorithms hence transform topological sorts of the discrete order (permutations) into the unique topological sort of the linear order (the sorted list).
Sorting algorithms are a restricted type of comparison-based algorithm. The inputs are unordered lists of size $n$ that occur in $n!$ possible states. The outputs occur in exactly one state, the sorted output list $[1,2, 3, \ldots, n]$.
We consider more general comparison-based algorithms, operating over inputs and outputs that need not be list data structures.
Source and target of an algorithm
Consider a sequence $\alpha = (\alpha_n)_n$ of finite posets of size $n$ and a sequence $\beta = (\beta_n)_n$ of refinements of $(\alpha_n)_n$, i.e. $\beta_n$ refines the order $\alpha_n$ for every $n$. The pair $(\alpha,\beta)$ is called a refinement with source $\alpha =(\alpha_n)_n$ and target $\beta = (\beta_n)_n$.
An algorithm $A$ realises the refinement $(\alpha,\beta)$ in case:
- the inputs of size $n$, considered as a set, form the state space $\Sigma(\alpha_n)$ over the source order $\alpha_n$; here, for any finite poset $\alpha$, the state space $\Sigma(\alpha)$ is the set of all topological sorts of $\alpha$ with labels in $\{1,\ldots,n\}$, where $n = |\alpha|$;
- the corresponding outputs, considered as a set, form the state space $\Sigma(\beta_n)$ over the target order $\beta_n$.
$\Sigma(\alpha_n)$ is referred to as the source-space (for inputs of size $n$).
$\Sigma(\beta_n)$ is referred to as the target-space (for inputs of size $n$).
With some abuse of terminology: if $\alpha = (\alpha_n)_n$ is the source of such a comparison-based algorithm $A$ and we consider inputs of size $n$, then $\alpha_n$ is also referred to as ``the source".
One can consider a comparison-based algorithm as an algorithm which executes basic operations solely based on the transitive closure of prior comparisons, filtering and permuting at each stage, such that for a given source $\alpha$ and a refining target $\beta$, the algorithm realises the refinement. Alternatively, one can abstract away from code and focus entirely on decision trees [1].
Example: For a comparison-based sorting algorithm $A$ the source is the sequence $\Delta = (\Delta_n)_n$ of discrete orders of size $n$ and the target is the sequence $L = (L_n)_n$ of linear orders of size $n$.
For the comparison-based algorithm heapify which transforms unordered lists of size $n$ to heaps, the source is the sequence of discrete orders $\Delta = (\Delta_n)_n$. The target is the sequence $H = (H_n)_n$ where $H_n$ is the poset determined by the Hasse diagram of a heap of size $n$.
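The target poset can be made concrete with a brute-force sketch (the helper names heap_relations and count_topological_sorts are mine): it builds $H_n$ from the usual parent/child index relations of a heap and counts its topological sorts, recovering for $n = 7$ the $\frac{7!}{63} = 80$ perfect heaps used in the lower-bound example further below.

```python
# Sketch: the heap-shaped poset H_n on positions 1..n (parent below its children,
# children of position i at positions 2i and 2i+1), and a brute-force count of
# its topological sorts.
from itertools import permutations
import math

def heap_relations(n):
    return [(i, child)
            for i in range(1, n + 1)
            for child in (2 * i, 2 * i + 1)
            if child <= n]

def count_topological_sorts(n, relations):
    """Count the label assignments 1..n to positions 1..n respecting every relation."""
    return sum(all(perm[i - 1] < perm[j - 1] for (i, j) in relations)
               for perm in permutations(range(1, n + 1)))

print(count_topological_sorts(7, heap_relations(7)))  # 80 topological sorts of H_7
print(math.factorial(7) // 63)                        # 80 = 7!/63
```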
The topological sort model underpins complexity lower-bounds for general comparison-based algorithms (i.e. algorithms that need not be sorts)
The entropic ratio
If $A$ is a comparison-based algorithm with inputs of size $n$ occurring in $k$ states and outputs occurring in $l$ states, then the source-entropy (i.e. the entropy of the inputs) is $H^{s}_{A}(n) = \log_2(k)$ and the target-entropy is $H^{t}_{A}(n) = \log_2(l)$.
Entropic Ratio (ER): The source states as well as the target states are assumed to occur with equal likelihood, i.e. to be uniformly distributed.
Consider a comparison-based algorithm $A$ with inputs with root states over the source $\alpha_n$ and outputs with root states over the target $\beta_n$. Let $k = |\Sigma(\alpha_n)|$ denote the source-space size and $l = |\Sigma(\beta_n)|$ the target-space size.
The entropic ratio is the difference between the source and target entropies, denoted by $\Delta H_{A}$: $$\Delta H_{A}(n) = H^{s}_{A}(n) - H^{t}_{A}(n) = \log_{2}\left(\frac{k}{l}\right),$$ i.e. the $\log_2$ of the ratio of the source-space size to the target-space size.
Theorem (the ER lower bound; see [1], [5]): Consider a comparison-based algorithm $A$ with source $\alpha$ and target $\beta$, with inputs of size $n$ occurring in $k$ root states and outputs occurring in $l$ root states, i.e. $A$ has entropic ratio $\Delta H_{A}(n) = \log_2\left(\frac{k}{l}\right)$.
The following lower-bounds on worst-case and average-case comparison time hold for every comparison-based algorithm $A$ of this nature: $$T^{W}_{A}(n) \geq \Delta H_{A}(n) \mbox{ and } \overline{T}_{A}(n) \geq \Delta H_{A}(n)$$
Lower-bound for comparison-based sorting algorithms $A$
The lower-bound is $$\Delta H_{A}(n) = \log_{2}\left(\frac{n!}{1}\right) = \log_2(n!)$$ The ER in this case is the usual information-theoretic lower-bound, i.e. the entropy of the inputs, providing a lower-bound for the worst-case and average-case time.
Lower-bound for constructing perfect heaps of size $7$: Consider a comparison-based algorithm $H^{*}$ constructing perfect heaps from unordered lists of size $7$. There are $\frac{7!}{63}$ perfect heaps of size $7$. Hence $$\Delta H_{H^{*}}(7) = \log_{2}\left(\frac{7!}{\frac{7!}{63}}\right) = \log_2(63) \approx 5.98$$
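Numerically (a short sketch; the helper entropic_ratio is my own naming), the two bounds above come out as $\log_2(7!) \approx 12.3$ comparisons for sorting lists of size 7 and $\log_2(63) \approx 5.98$ comparisons for constructing perfect heaps of size 7.

```python
# Sketch: the ER lower bound evaluated for the two examples above.
import math

def entropic_ratio(k, l):
    """log2 of (source-space size k) / (target-space size l)."""
    return math.log2(k / l)

# Sorting lists of size 7: 7! input states, a single output state.
print(entropic_ratio(math.factorial(7), 1))                        # ~12.30
# Constructing perfect heaps of size 7: 7! input states, 7!/63 = 80 output states.
print(entropic_ratio(math.factorial(7), math.factorial(7) // 63))  # log2(63) ~ 5.98
```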
The topological sort model of [1] is the closest approach to a pure math problem statement for comparison-based computation. It does not delve into the syntax (code) of the algorithms, but identifies comparison-based algorithms with decision trees modelling their computation paths.
[0] Knuth, D. E.; The Art of Computer Programming, Vol. III: Sorting and Searching, Addison-Wesley, 1998.
[1] Yao, Andrew Chi-Chih; "On the complexity of partial order productions", SIAM Journal on Computing, Vol. 18, No. 4, pp. 679-689, 1989.
[2] Munro, J. Ian; Suwanda, Hendra; "Implicit data structures for fast search and update", Journal of Computer and System Sciences, 21(2): 236-250, 1980.
[3] Schellekens, M.; A Modular Calculus for the Average Cost of Data Structuring, Springer, 2008.
[4] McGeoch, Catherine C.; A Guide to Experimental Algorithmics, Cambridge University Press, 2012.
[5] Schönhage, A.; "The production of posets", Astérisque 38-39: 229-246, 1976.