1
$\begingroup$

I recently stumbled on an algorithm while trying to write a program to estimate a median without sorting, a randomized group of non-repeating numbers with a known number of elements. Performance is reliably on average 1/3 O(N) for positive numbers and positive sums, but falls apart with negative numbers. The closest algorithm I've come across is the binning technique to find the median of medians but I couldn't find anything else like it.

I have tested this on no more than 10,000 numbers as I couldn't find a random number generator that did more than 10,000 non-repeating numbers.

First pick a pivot point eg 10,000 elements = 5,000 as a pivot point.

(Sum of numbers / 2) / ((number of elements) * (multiplier)).

Count the number of elements that are less than this pseudomedian. If the number of elements don't fall under the pivot point, deduct the largest number from the sum and repeat the calculation. Iterate until the median is found.

I used the largest number of digits as a multiplier to speed up the search. eg if the largest number has 6 digits, the multiplier is 6. My reasoning was that it wouldn't blow past the

For negative numbers, I made this modification:

(Sum of numbers) / ((number of elements) * (multiplier)).

Count the number of elements that are less than this median. If the number of elements don't fall under the pivot point, add the smallest number from the sum and repeat the calculation. Iterate until the median is found.

Performance is on average 4 * O(N) for negative numbers and negative sums. Based on the readouts I got, it appears to fall apart the closer it gets to the actual median, with the pseudomedian being repeated over each iteration exactly 3 times.

Just a disclaimer. I am absolutely TERRIBLE at math. I was just messing around to see if I could get a better result than the QuickSelect algorithm. I'm a programming student and curious as to why it works the way it does. It's probably a fluke anyway but just thought I'd try and ask about it.

$\endgroup$

1 Answer 1

2
$\begingroup$

Let me start by applauding you for trying new stuff. It's unlikely you'll discover a brilliant new algorithm just by messing around, but you'll never discover anything until you start trying things, and learning what are fruitful approaches and what are not. (I'll also make a pitch here for "If you're terrible at math, it's time to fix that. It's like trying to become a chef if you have no sense of smell: math is simply one of the tools of the trade for CS.)

How do you find "the largest number" --- the one you need to delete --- quickly? It seems as if in each "adjustment" to your pseudomedian, you need to make a pass through the numbers you've got and delete one -- that's about N operations. And you have to do this a bunch of times.

Also: $\frac13 O(N)$ is still $O(N)$.

If you're using a random number generator to test this, then it may not be nearly as good an algorithm as it looks. The big challenge in these things is to handle the oddball cases rather than the generic one.


Your explicit questions:

  • Is there a name for this?

Not that I know of, because it's probably not as fast as you think it is.

  • Why does it work?

It doesn't seem to work. I coded up your description in matlab, and for 1000 random numbers uniformly distributed between 0 and 1000, the "pivot" usually turned out to be about 80. Because this was less than halfway, I terminated. (Your instructions said to eliminate the biggest number if the pivot was too high, but mine never was.) For those 1000 random numbers, you expect the mean and median each to be about 500. Your pivot estimate gives something like

500,000

for the sum, divided by 2 to get 250,000, divided by 1000 to get 250, and then divided by the largest number of digits, which is 3...so your estimated pivot is about 83 for an average run, when it should be more like 500.

$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.