1

I've got a List of Objects, this list could potentially contain thousands of elements.

I want to obtain 10, 20, 34, 56 (whatever size portion the user chooses) subset of those, this subset must be randomly selected and I can't have duplicates.

Would Collections.shuffle() be sufficient for large Lists of POJOs? Or is there a more efficient/safer way of doing this?

Taking my example here, if myStrings had 50,000 Strings in it, would calling Collections.shuffle() be a good idea if you only wanted 5 items?

public class ShuffleMe { public static void main(String[] args) { int NUM_OF_ELEMENTS_TO_PICK = 3; List<String> myStrings = new ArrayList<String>(); myStrings.add("A"); myStrings.add("B"); myStrings.add("C"); myStrings.add("D"); myStrings.add("E"); myStrings.add("F"); Collections.shuffle(myStrings); for (int i = 0; i < NUM_OF_ELEMENTS_TO_PICK; i++) { System.out.println(myStrings.get(i)); } } } 
3
  • 3
    possible duplicate of best way to pick a random subset from a collection? Commented Aug 5, 2011 at 20:33
  • Can you clarify, is the user picking the size of a single subset, or the number of subsets to generate? If the latter, what size should the subsets be? Commented Aug 5, 2011 at 20:34
  • The reason why I don't want to use Collections.sort() is because I don't want to have to retrieve all items from a database call if I'll only actually use a few of them, does this alter any answers? Commented Aug 5, 2011 at 20:48

3 Answers 3

4

Shuffling the whole list would be a bit of a waste of resources if what you want is significantly smaller. I'd personally just pick n unique random numbers between 0..size and use the objects at those indexes for the randomised subset.

If you're talking about picking a random subset very close to the size of the whole collection, then chances are you're better of just calling Collections.shuffle() and choosing the first n entries. But if we're talking about ~5 / 50,000, definitely use the above approach.

Sign up to request clarification or add additional context in comments.

Comments

4

If the number of items you want is much less than the size of the collection, just draw them at random:

Set<Integer> randSubSet = new HashSet<Integer>(); while(randSubSet.size() < NUM_OF_ELEMENTS_TO_PICK) { randSubSet.add((int)(Math.random()*myStrings.size())); } for (int i : randSubSet) { System.out.println(myStrings.get(i)); } 

Comments

1

Use a Fisher-Yates shuffle, but only run it far enough to pick the number of elements you need.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.