
I have two files with different names but the same content. When I create streams for these files and call FileInputStream.hashCode() to get a hash value, I receive different values.

Can somebody point me to the correct API, if one exists, for a hash method in Java that returns the same hash value for files with the same content?

  • I don't think such a function exists. For this to happen, you'd have to read all of the content of both files, which is obviously an expensive operation. Commented Apr 7, 2013 at 20:54
  • Are you sure that FileInputStream.hashCode()'s implementation is guaranteed to be equal for two input streams over the same content? Keep in mind that this would require the input stream to read all the way to the end, which might be impractical. Commented Apr 7, 2013 at 20:55
  • Unless the file name is somehow part of the hash algorithm, why would it matter whether you read to the end? Commented Apr 7, 2013 at 20:58
  • It is my task to create a hash function that returns the same value for files with different names and the same content. Commented Apr 7, 2013 at 21:04
  • Edit your post to explain: 'I have to write a function hash( Stream...'. It will be clearer! Commented Apr 7, 2013 at 21:09

3 Answers


It sounds like a cryptographic hash function will meet your needs.

The Apache Commons Codec library has a utility class for creating cryptographic hash values (a.k.a., message digests) called DigestUtils. For example, the sha256 method takes an InputStream and returns a SHA-256 message digest as a byte array.
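DigestUtils comes from Apache Commons Codec. If you would rather not add a dependency, the JDK's own java.security.MessageDigest can produce the same kind of SHA-256 digest. The following is a minimal sketch under that assumption; the class FileDigest, its helper methods and the temp-file names are hypothetical, chosen only for illustration. Only the bytes read from the stream are fed into the digest, so the file name cannot affect the result.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileDigest {

    /** Streams the file through SHA-256; only the content is hashed, never the file name. */
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            // Feed the file in chunks so large files never have to fit in memory.
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return md.digest();
    }

    /** Lower-case hex encoding, convenient for printing or storing the digest. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical example files: different names, identical content.
        Path a = Files.createTempFile("first", ".txt");
        Path b = Files.createTempFile("second", ".txt");
        try {
            Files.write(a, "same content".getBytes(StandardCharsets.UTF_8));
            Files.write(b, "same content".getBytes(StandardCharsets.UTF_8));
            byte[] da = sha256(a);
            byte[] db = sha256(b);
            System.out.println(toHex(da));
            System.out.println(MessageDigest.isEqual(da, db)); // true
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
        }
    }
}
```

Since SHA-256 is deterministic, DigestUtils.sha256(InputStream) from Commons Codec should return the same bytes for the same content; the manual loop above only avoids the extra library.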


FileInputStream does not override hashCode(); it inherits the general Object.hashCode() method, whose documentation says:

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

1 Comment

Yes, you are right, but I think we understand each other. Thanks!

FileInputStream.hashCode() is inherited from Object, which is typically derived from the object's identity (often its internal address). It does not take the content into account.
Also, if you want to compare two files for equality, why would you use a hash function?
Two different strings can have the same hash code because of collisions, and the same is true for the contents of files.
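A concrete collision, as a small sketch: "Aa" and "BB" are a well-known pair of unequal strings with the same String.hashCode() value. The class and method names below are illustrative only.

```java
public class HashCollision {

    /** True when two strings share a hash code but are not equal, i.e. a collision. */
    static boolean isCollision(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        // String.hashCode() is s[0]*31^(n-1) + ... + s[n-1],
        // so "Aa" gives 65*31 + 97 and "BB" gives 66*31 + 66; both equal 2112.
        System.out.println("Aa".hashCode());         // 2112
        System.out.println("BB".hashCode());         // 2112
        System.out.println(isCollision("Aa", "BB")); // true
    }
}
```

Because an int has only 2^32 possible values, any function that maps arbitrary file contents to an int must collide for some inputs: a different hash proves the contents differ, but an equal hash only suggests they are the same.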

If you can use a third-party library (Commons IO), you can use FileUtils.contentEquals(file1, file2).
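Without Commons IO, a direct byte-by-byte comparison is straightforward with the JDK alone. This is a minimal sketch, not the Commons IO implementation; the class ContentCompare and its sameContent method are hypothetical names. It checks the sizes first, then streams both files in parallel and stops at the first differing byte. (On Java 12 or later, java.nio.file.Files.mismatch(Path, Path) returns -1 when two files are identical and does the same job.)

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentCompare {

    /** Returns true if both files contain exactly the same bytes; names are ignored. */
    static boolean sameContent(Path p1, Path p2) throws IOException {
        if (Files.size(p1) != Files.size(p2)) {
            return false; // files of different length can never be equal
        }
        try (InputStream in1 = new BufferedInputStream(Files.newInputStream(p1));
             InputStream in2 = new BufferedInputStream(Files.newInputStream(p2))) {
            int b1;
            while ((b1 = in1.read()) != -1) {
                if (b1 != in2.read()) {
                    return false; // first differing byte
                }
            }
            return in2.read() == -1; // both streams must end together
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical files: two with the same bytes, one with different bytes.
        Path a = Files.createTempFile("file1", ".txt");
        Path b = Files.createTempFile("file2", ".txt");
        Path c = Files.createTempFile("file3", ".txt");
        try {
            Files.write(a, "hello".getBytes(StandardCharsets.UTF_8));
            Files.write(b, "hello".getBytes(StandardCharsets.UTF_8));
            Files.write(c, "hellp".getBytes(StandardCharsets.UTF_8));
            System.out.println(sameContent(a, b)); // true
            System.out.println(sameContent(a, c)); // false
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
            Files.deleteIfExists(c);
        }
    }
}
```

Unlike a hash comparison, this gives an exact answer with no chance of a false match, at the cost of reading both files every time they are compared.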

6 Comments

It is my task to use a hash function, and the problem is that all the algorithms I've checked return different values for equal streams.
Did you try loading the contents into memory, e.g. into a String, and comparing the hashCode of the String?
I don't like this hashCode function. The hash it produces is sometimes different for the same file if you compile and run the program a few times.
@Noran: If two Strings have the same contents, they must have the same hashCode. Not sure what you mean.
I mean that I was testing hashCode on files. For file.jpg, the value returned by hashCode was not always the same: after the 1st compilation the hash was (let's say) 1234, and after the 5th compilation it was 5123, for the same file with the same name, content and everything else. That is also why I don't like this hashCode() function: I can't be 100% sure that file2.jpg won't get the same hash value.
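The run-to-run variation described in these comments is what the Object.hashCode() contract quoted earlier permits: an identity-based hash "need not remain consistent from one execution of an application to another". A hash computed from the bytes themselves avoids that. Below is a minimal sketch, assuming the files are small enough to read fully into memory; java.util.Arrays.hashCode(byte[]) is specified purely in terms of the array elements, so equal content always gives the same int in every run. The class name ContentHashCode and the file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ContentHashCode {

    /** Content-based int hash: the same bytes give the same value in every JVM run. */
    static int contentHash(Path file) throws IOException {
        // Reads the whole file into memory; fine for small files, not for very large ones.
        return Arrays.hashCode(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical files with different names but identical bytes.
        Path a = Files.createTempFile("file", ".jpg");
        Path b = Files.createTempFile("file2", ".jpg");
        try {
            byte[] data = {1, 2, 3};
            Files.write(a, data);
            Files.write(b, data);
            System.out.println(contentHash(a));                   // 30817
            System.out.println(contentHash(a) == contentHash(b)); // true
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
        }
    }
}
```

This is still a 32-bit value, so two different files can collide, which is exactly the file2.jpg worry; a cryptographic digest such as SHA-256 makes accidental collisions practically negligible.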
