
I have a question about data structures in Java. While solving a typical hashing problem, I was using the HashSet data structure, which worked fine until duplicate objects (objects with equal contents) appeared. Since HashSet does not support inserting duplicates, my logic was failing.

I replaced the HashSet with a typical ArrayList, since methods such as .add(), .contains(), and .remove() are supported by both, and my logic then worked perfectly.

But does this necessarily mean ArrayList is the logical choice over HashSet when duplicates are involved? HashSet should have some time-complexity advantages over ArrayList, right? Can someone give me some insight into this?

EDIT: What would be the ideal data structure for hashing when duplicates are involved, i.e. when duplicates should not be ignored but inserted as well?

  • What do you mean by "do hashing"? Commented Jul 12, 2015 at 6:11
  • Adding objects such as Strings into a hash-based data structure and checking for their presence later on. If the same string occurs twice, it should be inserted again. Commented Jul 12, 2015 at 6:13

4 Answers


It's not clear what you mean by a "hashing problem," but maybe you're looking for a multiset. From the Guava docs:

A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.

Elements of a multiset that are equal to one another are referred to as occurrences of the same single element. The total number of occurrences of an element in a multiset is called the count of that element (the terms "frequency" and "multiplicity" are equivalent, but not used in this API).

No such thing exists in the JDK.
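The JDK has no Multiset, but the counting idea is easy to sketch with a plain HashMap that maps each element to its occurrence count. This is a minimal illustration using only standard java.util.Map methods (merge, getOrDefault), not a full Multiset implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class MultisetSketch {
    public static void main(String[] args) {
        // Each duplicate bumps the element's count instead of being dropped
        Map<String, Integer> counts = new HashMap<>();
        counts.merge("apple", 1, Integer::sum);
        counts.merge("apple", 1, Integer::sum);
        counts.merge("banana", 1, Integer::sum);

        System.out.println(counts.getOrDefault("apple", 0));  // 2
        System.out.println(counts.getOrDefault("banana", 0)); // 1
        System.out.println(counts.getOrDefault("pear", 0));   // 0
    }
}
```

Membership checks stay hash-based (constant time on average), and no duplicate is ever silently ignored.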



  • When you use a HashMap, putting a duplicate key replaces the original value with the new one.
  • When you use a HashSet, subsequent duplicates are ignored (not inserted).
  • When you use an ArrayList, it simply appends the duplicate to the end of the list.

It all depends on what you need given your requirements.
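The three behaviors above can be checked directly. A small standalone sketch using only java.util:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateBehavior {
    public static void main(String[] args) {
        // HashMap: putting the same key again replaces the old value
        Map<String, Integer> map = new HashMap<>();
        map.put("x", 1);
        map.put("x", 2);
        System.out.println(map.get("x")); // 2

        // HashSet: the second add is ignored and returns false
        Set<String> set = new HashSet<>();
        System.out.println(set.add("x")); // true
        System.out.println(set.add("x")); // false
        System.out.println(set.size());   // 1

        // ArrayList: the duplicate is simply appended
        List<String> list = new ArrayList<>();
        list.add("x");
        list.add("x");
        System.out.println(list.size()); // 2
    }
}
```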

4 Comments

But HashSet has some time-complexity advantages over ArrayList when it comes to hashing requirements, right? Otherwise why would you need HashSet when you can achieve the same using ArrayList?
For simplicity... if you need to ensure that duplicates are not allowed, why would you ever use an ArrayList?
See my edit. I WANT the duplicates to be inserted; they should not be ignored.
By hashing, I am guessing you want to be able to identify an object by its hash value? Regardless, I suspect what you want is an ArrayList

ArrayList is not the logical choice if you don't want duplicates; HashSet is not the logical choice if you do. Different tools for different use cases.

You would use a Set in areas where duplicates wouldn't make sense, for example, a set of students. A List allows duplicates.

4 Comments

So what would be the ideal data structure when you want to do hashing when duplicates are involved?
@SoulRayder Why would you want such a thing? Lists do not use buckets to store data, it wouldn't make sense. What are you trying to do? If you're worried about access time of an ArrayList, get(int) is constant time, as mentioned in the docs
But .contains() is linear time in ArrayList, right? Or is it constant in both cases?
@SoulRayder HashSet.contains is constant time on average, since it hashes straight to a bucket; ArrayList.contains is linear, since the whole list may have to be scanned. If you showed what you were trying to do, I could edit my answer to better fit
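Regarding the contains() cost question in this thread: ArrayList.contains scans the list (O(n)), while HashSet.contains hashes straight to a bucket (O(1) on average). A rough standalone sketch; the printed timings are indicative only and vary by JVM and input size:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        // ArrayList.contains scans from the front: the last element is the worst case
        long t0 = System.nanoTime();
        boolean inList = list.contains(n - 1);
        long listNanos = System.nanoTime() - t0;

        // HashSet.contains computes the hash and checks one bucket: no scan
        long t1 = System.nanoTime();
        boolean inSet = set.contains(n - 1);
        long setNanos = System.nanoTime() - t1;

        System.out.println(inList + " " + inSet); // true true
        System.out.printf("list: %d ns, set: %d ns%n", listNanos, setNanos);
    }
}
```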

If you specifically need a HashSet that handles duplicates, a HashMap will do the job. If you just need a count of the number of objects added (with quick lookup etc.), a HashMap<T, Integer> is ideal, where T is the type of your object. If you actually need to keep references to the duplicate objects you've added, go with HashMap<T, List<T>>. That way you can look up via HashMap's .containsKey(T t) and iterate through all of the equal-hashing objects in the resulting list. For example, you could create this class:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.List;

    public class HashSetWithDuplicates<T> {
        private HashMap<T, List<T>> entries;
        private int size;

        public HashSetWithDuplicates() {
            entries = new HashMap<>();
            size = 0;
        }

        public HashSetWithDuplicates(Collection<? extends T> col) {
            this();
            for (T t : col) {
                add(t);
            }
        }

        public boolean contains(T t) {
            return entries.containsKey(t);
        }

        public List<T> get(T t) {
            return entries.get(t);
        }

        public void add(T t) {
            if (!contains(t)) entries.put(t, new ArrayList<>());
            entries.get(t).add(t);
            size++;
        }

        public void remove(T t) {
            if (!contains(t)) return;
            entries.get(t).remove(t);
            if (entries.get(t).isEmpty()) entries.remove(t);
            size--;
        }

        public int size() {
            return size;
        }

        public boolean isEmpty() {
            return size() == 0;
        }
    }

Add functionality to your needs.

1 Comment

I don't understand why somebody downvoted this answer without comment. It seems to be exactly what SoulRayder needs. Duplicate objects rarely make sense outside of special cases such as threads (and even there I have my doubts).
