Optimize Hashing Java
Source Link
Yojimbo
  • 209
  • 1
  • 7

I have a large number of big lists of objects, and each object has a unique ID. The data looks something like this:

List a = {obj1, obj2, obj3}
List b = {obj3, obj4, obj5}
List c = {obj1, obj2, obj3}
// up to 100 million of them

Now I'd like to remove "List c" since it has the same content as "List a" in order to save memory.

For this purpose I simply add them all to a HashMap and check whether the key already exists. The objects are actually references in a large network graph; if even one key is wrong, the whole application crashes. Because it is critical that two different lists never end up with the same key, I don't use the default

List.hashCode() 

method, but do this instead:

StringBuilder sb = new StringBuilder();
for (List list : myList)
    sb.append(list.getId());
return Hashing.sha256().hashString(sb.toString(), Charsets.US_ASCII).toString();

This works perfectly fine; it is just very slow. Is there any way to achieve the same result in less time?
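For reference, here is a minimal, self-contained sketch of the deduplication described above, using only the JDK (java.security.MessageDigest in place of Guava's Hashing). The Item class, getId(), and the separator between IDs are assumptions added for illustration, not part of the original code; the separator guards against ambiguous concatenation, where e.g. IDs (1, 23) and (12, 3) would otherwise produce the same string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

// Hypothetical stand-in for the graph objects in the question: each has a unique id.
class Item {
    private final long id;
    Item(long id) { this.id = id; }
    long getId() { return id; }
}

class ListDeduper {
    // Build a collision-resistant key from the ids, as the question does,
    // but with the JDK's MessageDigest instead of Guava's Hashing.
    // Note the separator between ids: plain concatenation would let
    // ids (1, 23) and (12, 3) collide on the string "123".
    static String keyOf(List<Item> list) {
        StringBuilder sb = new StringBuilder();
        for (Item item : list)
            sb.append(item.getId()).append(',');
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(sb.toString().getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is guaranteed to exist in the JDK", e);
        }
    }

    // Keep only the first list seen for each distinct id sequence.
    static List<List<Item>> dedupe(List<List<Item>> lists) {
        Map<String, List<Item>> seen = new LinkedHashMap<>();
        for (List<Item> list : lists)
            seen.putIfAbsent(keyOf(list), list);
        return new ArrayList<>(seen.values());
    }
}
```

With the three example lists above, dedupe would drop the third list because its key matches the first.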
