4

IntStream may be a the easiest way but I can only pick up smallest M numbers as below:

public class Test { private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; public static void main(String[] args) throws Exception { System.out.println(Arrays.asList(IntStream.of(arr).sorted().limit(5).boxed().toArray())); } } 

btw, considering algorithm complexity and assuming N >> M, a "sorted + limit" approach just have a complexity of O(N log(N)).

I think the best complexity may reach to O(N log(M)) but I do not know whether Java 8 has this kind of stream methods or collectors.

5
  • 4
    IntStream.of(arr).sorted().skip(N-M).boxed() should do it. Commented Jun 11, 2015 at 4:03
  • Related (from yesterday): stackoverflow.com/questions/30740471/… Commented Jun 11, 2015 at 4:18
  • Find another related: stackoverflow.com/questions/19227698/… But seem no existing method in Java. Commented Jun 11, 2015 at 4:34
  • The quickselect algorithm, or related selection algorithms, do partial sorting in O(N) time. en.wikipedia.org/wiki/… Commented Jun 11, 2015 at 4:52
  • I think more important than the algorithmic time complexity is the memory complexity. with N >> M you would only need M instead if N intermediate storage. Commented Jun 11, 2015 at 5:25

5 Answers 5

5

If you must use Streams:

IntStream.of(arr).sorted().skip(N-M) 

Otherwise use a PriorityQueue and write yourself an inverting Comparator. Insertion will be O(N(log(N)) and removal of M elements will be O(M(log(N)). Not what you asked for, but maybe close enough.

Sign up to request clarification or add additional context in comments.

Comments

3

EJP has it right, I tested it - yields 8 and 9 when given an input of 2.

import java.util.stream.IntStream; public class Test { private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; public static void main(String[] args) throws Exception { int n = Integer.parseInt(args[0]); System.out.println("Finding "+n+" largest numbers in arr"); IntStream.of(arr).sorted().skip(arr.length-n).boxed().forEach(big -> System.out.println(big)); } } 

2 Comments

Thanks. But how to reduce algorithm complexity?
I'm not sure how you can reduce the complexity and still be using java-8, I think one of the big selling features of lambda expressions is that the underlying implementation isn't specified leaving room for compiler optimizations. At least that is my understanding.
2

If you are already using google guava in your project, you can take advantage of MinMaxPriorityQueue:

Collection<..> min5 = stream.collect( toCollection(MinMaxPriorityQueue.maximumSize(5)::create) ); 

Comments

2

It's possible to create a custom collector using the JDK PriorityQueue to solve your task:

public static <T> Collector<T, ?, List<T>> maxN(Comparator<? super T> comparator, int limit) { BiConsumer<PriorityQueue<T>, T> accumulator = (queue, t) -> { queue.add(t); if (queue.size() > limit) queue.poll(); }; return Collector.of(() -> new PriorityQueue<>(limit + 1, comparator), accumulator, (q1, q2) -> { for (T t : q2) { accumulator.accept(q1, t); } return q1; }, queue -> new ArrayList<>(queue)); } 

Usage:

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.naturalOrder(), 2))); // [8, 9] System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.reverseOrder(), 3))); // [3, 1, 2] 

It might be faster for big data sets and small limits as it does not sort. If you want a sorted result, you can add the sorting step to the finisher.

2 Comments

Great code. But combiner seems useless (i.e. (q1, q2) -> null is ok) because PriorityQueue is not thread safe.
@auntyellow: it's not necessary for Collector to be a thread-safe when using parallel streams (unless the Collector has CONCURRENT characteristic). Existing collectors work fine in parallel with HashMap, ArrayList, etc.
2

You can achieve your complexity goal by creating a histogram of the values:

public static IntStream maxValues(IntStream source, int limit) { TreeMap<Integer,Integer> m=new TreeMap<>(); source.forEachOrdered(new IntConsumer() { int size, min=Integer.MIN_VALUE; public void accept(int value) { if(value<min) return; m.merge(value, 1, Integer::sum); if(size<limit) size++; else m.compute(min=m.firstKey(), (k,count)->count==1? null: count-1); } }); if(m.size()==limit)// no duplicates return m.keySet().stream().mapToInt(Integer::valueOf); return m.entrySet().stream().flatMapToInt(e->{ int value = e.getKey(), count = e.getValue(); return count==1? IntStream.of(value): IntStream.range(0, count).map(i->value); }); } 

It creates a map from int values to their corresponding number of occurrences but limits its contents to the desired number of values, hence, it’s operation has a O(log(M)) complexity (worst case, if no duplicates) and, since the operation is performed once for each value, it’s overall complexity is O(N×log(M)) as you wished.

You may test it with your original array as

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; maxValues(Arrays.stream(arr), 3).forEach(System.out::println); 

but to test some corner cases, you may use an array containing duplicates like

int[] arr = {8, 5, 3, 4, 2, 2, 9, 1, 7, 9, 8, 6}; // note that the stream of three max elements contains one of the two eights 

If you strive for maximum performance, replacing the boxing treemap with an adequate data structure using primitive data types may be feasible but that would be a minor performance optimization as this solution already solved the complexity problem.

By the way, this solution works for arbitrary streams, i.e. doesn’t need to know the value of N.

2 Comments

compare to PriorityQueue, TreeMap seems more complicated
A PriorityQueue might be a working kludge as long as you have no duplicates but when you have duplicates, it requires storage space for each of them instead of simply counting the number of occurrences as the map based approach does. Anyway, the accepted answer’s solution is even simpler and it’s not clear whether the difference in complexity (what you originally asked for) really make a difference in performance.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.