I recently stumbled on an algorithm while trying to write a program that estimates the median of a randomized group of non-repeating numbers, with a known element count, without sorting. In my tests, performance is reliably about N/3 operations on average for positive numbers and positive sums, but it falls apart with negative numbers. The closest algorithm I've come across is the binning technique for finding the median of medians, but I couldn't find anything else like it.
I have tested this on no more than 10,000 numbers, as I couldn't find a random number generator that produced more than 10,000 non-repeating numbers.
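(If anyone wants to try it on larger inputs: something like Python's random.sample should generate arbitrarily many non-repeating numbers. This is just a sketch with placeholder sizes, not the generator I actually used.)

```python
import random

# Placeholder sizes: 100,000 non-repeating values drawn from 1..1,000,000.
# random.sample never repeats an element, so this scales well past 10,000.
numbers = random.sample(range(1, 1_000_001), k=100_000)
```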
First, pick a pivot point, e.g. with 10,000 elements the pivot point is 5,000.
pseudomedian = (sum of numbers / 2) / ((number of elements) * (multiplier))
Count the number of elements that are less than this pseudomedian. If that count doesn't fall under the pivot point, deduct the largest number from the sum and repeat the calculation. Iterate until the median is found.
I used the digit count of the largest number as the multiplier to speed up the search, e.g. if the largest number has 6 digits, the multiplier is 6. My reasoning was that it wouldn't blow past the actual median.
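Here is a rough Python sketch of how I read those steps. The stopping rule ("falls under the pivot point") and the meaning of "deduct the largest number" (I take it as the largest remaining number) are my own guesses, and the loop is capped so it always terminates:

```python
def estimate_median_positive(nums):
    """Sketch of the positive-number procedure (my reading of it).

    Assumptions: positive integers, pivot = half the element count,
    multiplier = digit count of the largest number, and "deduct the
    largest number" = the largest remaining number.
    Returns (estimate, number of full counting sweeps).
    """
    n = len(nums)
    pivot = n // 2                        # e.g. 10,000 elements -> 5,000
    multiplier = len(str(max(nums)))      # 6-digit largest number -> 6
    pool = sorted(nums)                   # largest remaining value sits at the end
    running_sum = sum(nums)

    passes = 0
    estimate = (running_sum / 2) / (n * multiplier)
    for _ in range(n):                    # cap the loop so it always terminates
        below = sum(1 for x in nums if x < estimate)   # one full counting sweep
        passes += 1
        if below <= pivot or not pool:    # my reading of "falls under the pivot point"
            break
        running_sum -= pool.pop()         # deduct the largest remaining number
        estimate = (running_sum / 2) / (n * multiplier)
    return estimate, passes
```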
For negative numbers, I made this modification:
pseudomedian = (sum of numbers) / ((number of elements) * (multiplier))
Count the number of elements that are less than this pseudomedian. If that count doesn't fall under the pivot point, add the smallest number to the sum and repeat the calculation. Iterate until the median is found.
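And the matching sketch of the negative-number variation, with the same caveats; here I've assumed the multiplier is the digit count of the largest magnitude and that "add the smallest number" means folding the smallest remaining (most negative) value into the running sum:

```python
def estimate_median_negative(nums):
    """Sketch of the negative-number variation (my reading of it).

    Assumptions: multiplier = digit count of the largest magnitude, and
    "add the smallest number" = fold the smallest remaining value into
    the running sum. Returns (estimate, number of full counting sweeps).
    """
    n = len(nums)
    pivot = n // 2
    multiplier = len(str(max(abs(x) for x in nums)))   # digits of the largest magnitude
    pool = sorted(nums, reverse=True)                  # smallest value sits at the end
    running_sum = sum(nums)

    passes = 0
    estimate = running_sum / (n * multiplier)          # no divide-by-two here
    for _ in range(n):                                 # cap the loop so it always terminates
        below = sum(1 for x in nums if x < estimate)   # one full counting sweep
        passes += 1
        if below <= pivot or not pool:                 # my reading of "falls under the pivot point"
            break
        running_sum += pool.pop()                      # add the smallest remaining number
        estimate = running_sum / (n * multiplier)
    return estimate, passes
```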
Performance is on average about 4·N operations for negative numbers and negative sums. Based on the readouts I got, it appears to fall apart the closer it gets to the actual median, with the same pseudomedian value being repeated exactly 3 times across consecutive iterations.
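The kind of readout I'm describing can be approximated with something like this, using the negative-number sketch above and Python's statistics.median as the ground truth (the sizes are placeholders, not my actual test data):

```python
import random
import statistics

# Placeholder test data: 10,000 non-repeating, all-negative values.
data = random.sample(range(-1_000_000, 0), k=10_000)

estimate, sweeps = estimate_median_negative(data)
true_median = statistics.median(data)

# "sweeps" counts full passes over the data, so sweeps around 4 would
# line up with the 4*N figure above.
print(f"estimate={estimate:.1f}  true median={true_median}  sweeps={sweeps}")
```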
Just a disclaimer: I am absolutely TERRIBLE at math. I was just messing around to see if I could get a better result than the QuickSelect algorithm. I'm a programming student and I'm curious why it works the way it does. It's probably a fluke anyway, but I thought I'd try asking about it.