
I have a question about data structures in Java. While solving a typical hashing problem, I was using the HashSet data structure, which worked fine until duplicate objects (objects with equal contents) appeared. Since HashSet does not support inserting duplicates, my logic was failing.

I replaced the HashSet with a typical ArrayList, since methods such as .add(), .contains(), and .remove() are supported by both, and my logic then worked perfectly.

But does this necessarily mean ArrayList is the logical choice over HashSet when duplicates are involved? HashSet should have some time-complexity advantages over ArrayList, right? Can someone give me some insight into this?

EDIT: What would be the ideal data structure for hashing when duplicates are involved, i.e. when duplicates should not be ignored but inserted as well?

  • What do you mean by "do hashing"? Commented Jul 12, 2015 at 6:11
  • Adding objects such as Strings into a hash-based data structure and checking for their presence later on. If the same string occurs twice, it should be inserted again. Commented Jul 12, 2015 at 6:13

4 Answers


It's not clear what you mean by a "hashing problem," but maybe you're looking for a multiset. From the Guava docs:

A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.

Elements of a multiset that are equal to one another are referred to as occurrences of the same single element. The total number of occurrences of an element in a multiset is called the count of that element (the terms "frequency" and "multiplicity" are equivalent, but not used in this API).

No such thing exists in the JDK.
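The JDK has no Multiset, but the counting idea is easy to sketch with a plain HashMap that maps each element to its occurrence count. This is a minimal illustration using only standard java.util.Map methods (merge, getOrDefault), not a full Multiset implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class MultisetSketch {
    public static void main(String[] args) {
        // Each duplicate bumps the element's count instead of being dropped
        Map<String, Integer> counts = new HashMap<>();
        counts.merge("apple", 1, Integer::sum);
        counts.merge("apple", 1, Integer::sum);
        counts.merge("banana", 1, Integer::sum);

        System.out.println(counts.getOrDefault("apple", 0));  // 2
        System.out.println(counts.getOrDefault("banana", 0)); // 1
        System.out.println(counts.getOrDefault("pear", 0));   // 0
    }
}
```

Membership checks stay hash-based (constant time on average), and no duplicate is ever silently ignored.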



  • When you use a HashMap, putting a duplicate key replaces the original value with the new one.
  • When you use a HashSet, subsequent duplicates are ignored (not inserted).
  • When you use an ArrayList, it simply appends the duplicate to the end of the list.

It all depends on what you need given your requirements.
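The three behaviors above can be checked directly. A small standalone sketch using only java.util:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateBehavior {
    public static void main(String[] args) {
        // HashMap: putting the same key again replaces the old value
        Map<String, Integer> map = new HashMap<>();
        map.put("x", 1);
        map.put("x", 2);
        System.out.println(map.get("x")); // 2

        // HashSet: the second add is ignored and returns false
        Set<String> set = new HashSet<>();
        System.out.println(set.add("x")); // true
        System.out.println(set.add("x")); // false
        System.out.println(set.size());   // 1

        // ArrayList: the duplicate is simply appended
        List<String> list = new ArrayList<>();
        list.add("x");
        list.add("x");
        System.out.println(list.size()); // 2
    }
}
```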

4 Comments

But HashSet has some time-complexity advantages over ArrayList when it comes to hashing requirements, right? Otherwise why would you need HashSet when you can achieve the same using ArrayList?
For simplicity... if you need to ensure that duplicates are not allowed, why would you ever use an ArrayList?
See my edit. I WANT the duplicates to be inserted; they should not be ignored.
By hashing, I am guessing you want to be able to identify an object by its hash value? Regardless, I suspect what you want is an ArrayList

ArrayList is not the logical choice if you don't want duplicates; HashSet is not the logical choice if you do. Different tools for different use cases.

You would use a Set in areas where duplicates wouldn't make sense, for example, a set of students. A List allows duplicates.

4 Comments

So what would be the ideal data structure when you want to do hashing when duplicates are involved?
@SoulRayder Why would you want such a thing? Lists do not use buckets to store data, it wouldn't make sense. What are you trying to do? If you're worried about access time of an ArrayList, get(int) is constant time, as mentioned in the docs
But .contains() is linear time in ArrayList, right? Or is it constant in both cases?
@SoulRayder HashSet.contains is constant time on average, since it hashes straight to a bucket; ArrayList.contains is linear, since the whole list may have to be scanned. If you showed what you were trying to do, I could edit my answer to better fit
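Regarding the contains() cost question in this thread: ArrayList.contains scans the list (O(n)), while HashSet.contains hashes straight to a bucket (O(1) on average). A rough standalone sketch; the printed timings are indicative only and vary by JVM and input size:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        // ArrayList.contains scans from the front: the last element is the worst case
        long t0 = System.nanoTime();
        boolean inList = list.contains(n - 1);
        long listNanos = System.nanoTime() - t0;

        // HashSet.contains computes the hash and checks one bucket: no scan
        long t1 = System.nanoTime();
        boolean inSet = set.contains(n - 1);
        long setNanos = System.nanoTime() - t1;

        System.out.println(inList + " " + inSet); // true true
        System.out.printf("list: %d ns, set: %d ns%n", listNanos, setNanos);
    }
}
```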

If you specifically need a HashSet that handles duplicates, a HashMap will do the job. If you just need a count of the number of objects added (with quick lookup etc.), a HashMap<T, Integer> is ideal, where T is the type of your object. If you actually need to keep references to the duplicate objects you've added, go with HashMap<T, List<T>>. That way you can look up via HashMap's .containsKey(T t) and iterate through all of the equal-hashing objects in the resulting list. For example, you could create this class:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.List;

    public class HashSetWithDuplicates<T> {
        private HashMap<T, List<T>> entries;
        private int size;

        public HashSetWithDuplicates() {
            entries = new HashMap<>();
            size = 0;
        }

        public HashSetWithDuplicates(Collection<? extends T> col) {
            this();
            for (T t : col) {
                add(t);
            }
        }

        public boolean contains(T t) {
            return entries.containsKey(t);
        }

        public List<T> get(T t) {
            return entries.get(t);
        }

        public void add(T t) {
            if (!contains(t)) entries.put(t, new ArrayList<>());
            entries.get(t).add(t);
            size++;
        }

        public void remove(T t) {
            if (!contains(t)) return;
            entries.get(t).remove(t);
            if (entries.get(t).isEmpty()) entries.remove(t);
            size--;
        }

        public int size() {
            return size;
        }

        public boolean isEmpty() {
            return size() == 0;
        }
    }

Add functionality to your needs.

1 Comment

I don't understand why somebody downvoted this answer without comment. It seems to be exactly what SoulRayder needs. Duplicate objects rarely make sense outside of special cases such as threads (and even there I have my doubts).
