0

I have to to compare two large arrays of Integers(approximatly 150,000 values) to see which values are unique and which are not. I want the output to be stored in three data structures, UniqueTo_A, UniqueTo_B & SharedElements. I would ordinary use something like an ArrayList for this sort of operation, as Values can be added ad-hoc, however I am aware the overheads for add() and remove() are rather large for ArrayLists. So my question is:- In Java, what is the most memory efficient way to store large lists, that items can be added dynamically, performance is the key. All help or comments will be much appreciated.

Edit: Thanks for you're input's people. TheLostMind, I will need to add to datasets but the Hashset will facilitate this, so I'm going to go ahead and use a Hashset. Nafas + NeplatnyUdaj thanks for examples. Eckles, I should get to grips with collections I'll study this to use another time. Implementation to follow....

6
  • will you be adding only at the end?.. Commented Feb 27, 2015 at 11:54
  • 3
    Use a HashSet. When you add an item with Set.add method, the return value will tell you if it was there already or not. I'm not sure if it's most memory efficient, but 150000 numbers is nothing and it will be pretty fast. docs.oracle.com/javase/7/docs/api/java/util/Set.html#add%28E%29 Commented Feb 27, 2015 at 11:58
  • 5
    The standard Java HashSet uses a underlying map and is therefore not the best option if you really care about memory consumption. I would look for add-on libraries which support collections for simple data types instead. And if you need to safe more memory you might want to use some more compression methods. Commented Feb 27, 2015 at 12:16
  • Are you more concerned with memory consumption or speed of the operation? Commented Feb 27, 2015 at 12:20
  • @eckes I agree with you. but Set would remove so much of the complexities, as a result less of Carbage Collection and other internal computaion and memory usage. in practice using Set like implementation is going to use less memeory overall Commented Feb 27, 2015 at 12:24

2 Answers 2

3

use Set, because, it adds in constant time, it removes multiple values in constant time . I use set in daily basis with over millions of stored Objects. and removeAll still in miliseconds

Set<Integer> setA= new HashSet<Integer>(); Set<Integer> setB= new HashSet<Integer>(); //add stuff to setA and setB by add() method Set<Integer> uniqueToA=new HashSet<Integer>(setA); Set<Integer> uniqueToB=new HashSet<Integer>(setB); Set<Integer> shared=new HashSet<Integer>(); shared.addAll(setA); shared.addAll(setB); uniqueToA.removeAll(setB); uniqueToB.removeAll(setA); shared.removeAll(uniqueToA); shared.removeAll(uniqueToB); System.out.println(uniqueToA); //unique to A System.out.println(uniqueToB); //unique To B System.out.println(shared); //shared 
Sign up to request clarification or add additional context in comments.

1 Comment

But you loose ordering when you use a set. Something that a list provides.
1

I don't think lists are a very good way to do that. Do you need to preserve the order of elements? Can a single list contain duplicate entries? If not, then I would go with HashSets like this:

 //initialization Random r = new Random(); Set<Integer> aSet = new HashSet<Integer>(); Set<Integer> bSet = new HashSet<Integer>(); for (int i = 0; i< 150000; i++){ aSet.add(r.nextInt()); bSet.add(r.nextInt()); } //Computation Set<Integer> aUnique = new HashSet<Integer>(); Set<Integer> bUnique = new HashSet<Integer>(bSet); //we will remove duplicate entries later Set<Integer> shared = new HashSet<Integer>(); for (Integer aval: aSet){ if (bSet.contains(aval)){ shared.add(aval); }else{ aUnique.add(aval); } } bUnique.removeAll(shared); 

In the end, you get three sets as requested(aUnique, bUnique and shared)

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.