Find majority element in an array given constant external space

Question

Given constant external space (yes no HashMap), how would you improve the following code (code-wise or complexity-wise with stress on the latter) to find the majority element in an array?

/** * Created by mona on 5/27/16. */ import java.util.Arrays; public class FinaMajority { public static int findMajority(int[] a) { Arrays.sort(a); int maxCount = Integer.MIN_VALUE; int maxIdx = Integer.MIN_VALUE; for (int i=0; i<a.length; i++) { int count = 0; while ( (i+count) < a.length && a[i] == a[i+count]) { count++; if (count == a.length/2) { return a[i]; } if (count > maxCount) { maxCount = count; maxIdx = i; } } } return a[maxIdx]; } public static void main(String[] args) { int[] a = {3,1,2,2,3,2,2,4,4,3,5,2,3,1,3,4,3,3,3}; System.out.println(findMajority(a)); } }

Are numbers in the input array limited to single digit (0-9)? — Psycho Punch
– Psycho Punch, Commented May 28, 2016 at 5:09
So the objective is to not allocate any auxiliary data structures? — coderodde
– coderodde, Commented May 28, 2016 at 6:41
right! you can allocate if auxiliary data structure (space complexity) is constant! — Mona Jalal
– Mona Jalal, Commented May 28, 2016 at 6:51
Strictly, speaking, Arrays.sort uses O(log n) stack space and O(n) heap space (due to being a merge sort), so your solution does not qualify. — meriton
– meriton, Commented May 28, 2016 at 13:54

janos · Accepted Answer · 2016-05-28 08:51:49Z

What is a "majority element" ?

It would be good to define this, as there could be multiple interpretations:

It could be the element that occurs more than any other in the array. This seems the most natural, intuitive interpretation
It could be the element that occurs more than n / 2 times, as in this online challenge. I think they made a mistake, the common name for this is leader element instead of "majority".

Off-by-one error

The program gives incorrect result for the input [3, 2, 3]. The culprit is here:

int count = 0; while ( (i+count) < a.length && a[i] == a[i+count]) { count++; if (count == a.length/2) { return a[i]; }

The sorted array contains [2, 3, 3]. At i=0, since (i+0) < a.length and a[i] == a[i+0], count is incremented to 1 on the next line, and since a.length/2 is 1, the method prematurely returns 2.

The fix is to move count++ to the end of the while loop.

Notice that starting from count = 0 is pointless. It would be better to start from count = 1.

Index out of bounds for singleton array

The program will crash for the input [1]. In this case maxIdx will never change, its value remains at Integer.MIN_VALUE, and so it will crash on the line return a[maxIdx].

Slowness

The program is too slow for large arrays, because after counting the consecutive numbers, it does not skip those, and continues checking from the next value of i. Due to this, the time complexity of the loop degenerates to \$O(n^2)\$.

You can improve by adding to i the number of elements that can be skipped.

int count = 1; while ((i + count) < a.length && a[i] == a[i + count]) { if (count == a.length / 2) { return a[i]; } if (count > maxCount) { maxCount = count; maxIdx = i; } count++; } i += count - 1;

Alternative implementation

If you're really looking for the majority element and not the leader element, then your implementation can be written slightly simpler without a nested loop:

public int findMajority(int[] nums) { Arrays.sort(nums); int maxCount = 0; int maxIndex = 0; int count = 1; for (int i = 1; i < nums.length; i++) { if (nums[i - 1] == nums[i]) { count++; if (count > maxCount) { maxIndex = i; maxCount = count; } } else { count = 1; } } return nums[maxIndex]; }

Finding the leader element

If you're actually looking for the leader element, the element that occurs more than n / 2 times, then an \$O(n)\$ solution is possible:

track a candidate and its count
increment the count when the same candidate is seen
decrement the count when a different value is seen
replace the candidate when the count is 0

Like this:

public int majorityElement(int[] nums) { int candidate = 0; int count = 0; for (int num : nums) { if (count == 0) { count = 1; candidate = num; } else if (candidate == num) { count++; // as a further optimization, you can add a condition here to return early if the required count is reached } else { count--; } } return candidate; }

This works when the leader element is guaranteed to exist. If it's not guaranteed to exist, then you need another \$O(n)\$ pass to verify the count of the candidate is indeed greater than half of the number of all elements.

I'm not sure if your solution works. If you have the input sorted into something like [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3], won't you end up with 3 as the answer instead of 2? — Psycho Punch
– Psycho Punch, Commented May 28, 2016 at 7:34
It was not included in the problem description, but I presumed the question is about a classic problem that defines the majority element as a value that occurs in more than half of the elements. In your example there is no such element, and this is what I meant that an additional counting step is required to decide if the candidate really occurs in more than half of the elements. My proposed algorithm only works for that case, and doesn't work in the more interpretation of majority. Thanks for pointing this out, I will update my post to mention this assumption. — janos
– janos, Commented May 28, 2016 at 7:48
Yeah, it's not explicitly stated even if the original code in the question kind of hints on this. — Psycho Punch
– Psycho Punch, Commented May 28, 2016 at 7:58
Is it possible to produce a O(n) solution for the Majority Element (not leader element) solution? I was asked to produce a O(n) solution in an interview, as the sorting solution was not a good enough answer. — Eternalcode
– Eternalcode, Commented May 30, 2017 at 18:25

Psycho Punch · Accepted Answer · 2016-05-28 08:36:02Z

Here's my solution written in Groovy (for quick coding), which shouldn't be very different from Java. The actual finding of majority element runs in linear time, so roughly speaking, the code runs in \$f(n) + O(n)\$, where \$f(n)\$ is the time complexity of the sorting algorithm. This solution assumes that it's ok to write into the array, since there's limited space anyway. However it uses 2 more variables than your original solution.

int findMajorityElement(int[] numbers) { Arrays.sort(numbers) int index = 0 int currentNumber = numbers[index] int majorityElement = currentNumber numbers[index] = 1 int maxCount = numbers[index] for (index = index + 1; index < numbers.length; index++) { if (currentNumber == numbers[index]) { numbers[index] = numbers[index - 1] + 1 } else { currentNumber = numbers[index] numbers[index] = 1 } if (numbers[index] > maxCount) { maxCount = numbers[index] majorityElement = currentNumber } } return majorityElement }

What happens here is that, since the array can be sorted, each iteration will increment the count of each value by 1 as long as it is the same value. When it finds a different one, it starts again. While these are happening it checks if a new majority element candidate gets discovered.

There's a flaw in Psycho Punch's code, I think (int maxCount = numbers[index]), but he's essentially got it. — tpdi
– tpdi, Commented May 28, 2016 at 12:49
I think you missed the line: numbers[index] = 1. Whatever the first number is, maxCount is always initialized to 1. — Psycho Punch
– Psycho Punch, Commented May 29, 2016 at 17:38

tpdi · Accepted Answer · 2016-05-28 13:16:19Z

As Pyscho Punch suggests, this is pretty simple if you can first sort. I'd add to his answer, that given the constraint of constant external space, you want to use a sort that obeys that constraint, e.g., Heapsort.

One the array is sorted, you just need to walk it fro start to end, finding the length of each run of of repeated values (they'll be in a contiguous run, because you've sorted the array).

As each run length is found, it's compared to the longest (maximum) run-length found so far, and if the newly found run length is longer, its length replaces that maximum, and its element value replaces the majority element.

In pseudo-code:

 Heapsort(array) maxRun <- 0 majorityElement <- None index <- 0 runStart <- 0 while index < length(array): while index < length(array) and array[index] == array[runStart]: index <- index + 1 if index - runStart > maxRun: maxRun <- index - runStart majorityElement <- array[runStart] runStart <- index return majorityElement

What's the space complexity? It's constant (in the pseudo code above, it's four variables), as long as Heapsort is correctly implemented as constant space.

Wha's the time complexity? Heapsort is O(n log n) in the worst case, walking the array is O(n). So the time complexity is O(n log n + n) = O(n log n). The sort dominates the work done.

Stack Exchange Network

Find majority element in an array given constant external space

3 Answers 3

What is a "majority element" ?

Off-by-one error

Index out of bounds for singleton array

Slowness

Alternative implementation

Finding the leader element

You must log in to answer this question.

Hot Network Questions

Find majority element in an array given constant external space

3 Answers 3

What is a "majority element" ?

Off-by-one error

Index out of bounds for singleton array

Slowness

Alternative implementation

Finding the leader element

You must log in to answer this question.

Related

Hot Network Questions