Okay, I'll go first. This is not an answer per se, but more an invitation to write fast sorting code for machine reals. That way we can get some sense of what might be feasible (showing timings of existing implementations would also be useful; I leave that for others). In Mathematica, using Compile, of course. The point is to illustrate a few things.
(1) It's not easy. (2) My code might be somewhat on the lame side. (3) Data locality is extremely important. (4) Code generated by Compile with a C target might be less optimized than one would like.
I'll show implementations of heapsort and mergesort; for the latter I have both a recursive and a nonrecursive variant. We'll see that heapsort, at least my take on it, is hopelessly slow. I suspect that has as much to do with data nonlocality as with my own coding of the algorithm. The "R" in the names stands for "reals". Also, for reasons I no longer recall, I started by implementing sorts from large to small, then switched at the end to the "usual" ascending order, so some of these results need reversing before comparing against the built-in Sort.
heapSortR = Compile[{{ll, _Real, 1}},
   Module[{n = Length[ll], heap, j2, jj, res},
    heap = ConstantArray[0., n];
    (* build a max-heap, sifting each new element up *)
    Do[
     heap[[j]] = ll[[j]];
     {jj, j2} = {j, Floor[j/2]};
     While[j2 >= 1 && heap[[j2]] < heap[[jj]],
      heap[[{j2, jj}]] = heap[[{jj, j2}]];
      jj = j2;
      j2 = Floor[j2/2];
      ];
     , {j, n}];
    res = ConstantArray[0., n];
    (* repeatedly extract the maximum and sift the new root down,
       so res comes out from largest to smallest *)
    Do[
     res[[j]] = heap[[1]];
     heap[[1]] = heap[[n - j + 1]];
     jj = 1;
     j2 = 2;
     While[j2 <= n - j + 1,
      If[j2 <= n - j && heap[[j2]] < heap[[j2 + 1]], j2++];
      If[heap[[j2]] < heap[[jj]], Break[]];
      heap[[{j2, jj}]] = heap[[{jj, j2}]];
      jj = j2;
      j2 = 2*jj;
      ];
     , {j, n}];
    res
    ],
   CompilationOptions -> {"ExpressionOptimization" -> True},
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];

mergeSortR = Compile[{{ll, _Real, 1}},
   Module[{n = Length[ll], n2l, n2r, left = {0.}, right = {0.}, jl = 1, jr = 1},
    n2l = Floor[n/2];
    If[n <= 2, Return[Reverse[Sort[ll]]]];
    (* recursively sort the two halves (descending) *)
    left = mergeSortR[Take[ll, n2l]];
    right = mergeSortR[Drop[ll, n2l]];
    n2r = Length[right];
    (* merge: take the larger of the current heads until one side is exhausted *)
    Table[
     If[jl <= n2l && jr <= n2r,
      If[left[[jl]] > right[[jr]],
       jl++; left[[jl - 1]],
       jr++; right[[jr - 1]]],
      If[jr > n2r,
       jl++; left[[jl - 1]],
       jr++; right[[jr - 1]]]
      ], {n}]
    ],
   CompilationOptions -> {"ExpressionOptimization" -> True},
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
--- edit remark ---
The version below is a (very) small improvement over the one I had originally posted. It is still around 5-6 times slower than built-in Sort.
--- end remark ---
mergeSortRnonrecursive = Compile[{{ll, _Real, 1}},
   Module[{n = Length[ll], m = 1, n2, jmid, jl, jr, jlnew, jrnew, lr,
     j1, j2, j, merged, stack, work, work2},
    n2 = Floor[n/2];
    If[n <= 2, Return[Reverse[Sort[ll]]]];
    work = ConstantArray[0., n];
    work2 = ConstantArray[0., n];
    (* explicit stack of {side, first index, last index};
       side -1 marks a left half that still awaits a merge with its right partner *)
    stack = ConstantArray[0, {Ceiling[n/2], 3}];
    stack[[{m, m + 1, m + 2}]] = {{-1, 1, n}, {-1, 1, n2}, {1, n2 + 1, n}};
    m += 3;
    While[m > 1,
     {lr, jl, jr} = stack[[m - 1]];
     If[jr - jl <= 2,
      (* base case: runs of length <= 3 go to the built-in Sort *)
      work[[Range[jl, jr]]] = Sort[ll[[jl ;; jr]]];
      m--;
      While[lr == -1 && m > 1, (* merge with partner on right *)
       m--;
       {lr, jlnew, jrnew} = stack[[m]];
       j1 = jl; j2 = jr + 1;
       Do[
        If[j1 <= jr && j2 <= jrnew,
         If[work[[j1]] < work[[j2]],
          j1++; work2[[j]] = work[[j1 - 1]],
          j2++; work2[[j]] = work[[j2 - 1]]],
         If[j2 > jrnew,
          j1++; work2[[j]] = work[[j1 - 1]],
          j2++; work2[[j]] = work[[j2 - 1]]]
         ];
        , {j, jl, jrnew}];
       work[[Range[jl, jrnew]]] = work2[[jl ;; jrnew]];
       jr = jrnew;
       ];
      Continue[];
      ];
     (* otherwise split the current range and push both halves *)
     jmid = Floor[(jr + jl)/2];
     stack[[{m, m + 1}]] = {{-1, jl, jmid}, {1, jmid + 1, jr}};
     m += 2;
     ];
    work
    ],
   CompilationOptions -> {"ExpressionOptimization" -> True},
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
Here is a basic test.
SeedRandom[1111];
rvals = RandomReal[{-10, 10}, 10^6];

Timing[svalsH = heapSortR[rvals];]
Timing[svalsM = mergeSortR[rvals];]
Timing[svalsMnr = mergeSortRnonrecursive[rvals];]
svalsH === svalsM === Reverse[svalsMnr] === Reverse[Sort[rvals]]

(*
{6.804000, Null}
{5.136000, Null}
{1.228000, Null}
True
*)
It should be clear that none of these is competitive in speed with the built-in Sort, even though they do run C code under the hood. So while I do not dismiss the possibility that Sort is needlessly slow, I will point out that it might take serious effort to write something considerably faster.
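For reference, the built-in Sort can be timed on the same rvals from the test above. I have not quoted a figure here since it varies with machine and version, but this is the baseline the comparisons refer to (svalsBuiltin is just a throwaway name for this check):

(* baseline: built-in Sort on the same 10^6 machine reals;
   the ascending result should equal svalsMnr, per the equality check above *)
Timing[svalsBuiltin = Sort[rvals];]
svalsMnr === svalsBuiltin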
--- edit ---
One other thing to note is that the speed depends on a "lowest level" sorter, the one that gets used below a given length threshold. In the code above I use the built-in Sort on lists of length 3 or less. If we raise that threshold to 16 then the speed improves by a factor of 2.5 or so. But now we use Sort on substantially larger lists, and that, arguably, is cheating. To make it fair I'd need to write a pedestrian (perhaps O(n^2)) sorting routine that is fast for small lists. Offhand I do not know if that would be in the realm of possibility (and do not have time or patience to find out); a sketch of one candidate is given after this edit block.
--- end edit ---
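For what it's worth, here is a rough sketch of what such a small-list sorter could look like: a straightforward compiled insertion sort, O(n^2) but with little overhead on short runs. This is not part of the code above and I have not benchmarked it; the name insertionSortR and the ascending order are my own choices (flip the comparison to get the descending order used by the first two routines).

insertionSortR = Compile[{{ll, _Real, 1}},
   Module[{vals = ll, n = Length[ll], key = 0., j = 0},
    Do[
     key = vals[[i]];
     j = i - 1;
     (* walk left, shifting larger entries one slot to the right *)
     While[j >= 1,
      If[vals[[j]] <= key, Break[]];
      vals[[j + 1]] = vals[[j]];
      j--;
      ];
     vals[[j + 1]] = key;
     , {i, 2, n}];
    vals
    ],
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];

A quick sanity check: with small = RandomReal[{-10, 10}, 12], insertionSortR[small] === Sort[small] should give True. Whether substituting this for the built-in Sort at a threshold of 16 keeps the factor-of-2.5 gain is exactly the part I have not tried.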