32

I am trying to write a C program that proves SHA1 is nearly collision free, but I cannot figure out how to actually create the hash for my input values. I just need to create the hash, and store the hex value into an array. After some Google searches, I've found OpenSSL documentation directing me to use this:

    #include <openssl/sha.h>

    unsigned char *SHA1(const unsigned char *d, unsigned long n, unsigned char *md);
    int SHA1_Init(SHA_CTX *c);
    int SHA1_Update(SHA_CTX *c, const void *data, unsigned long len);
    int SHA1_Final(unsigned char *md, SHA_CTX *c);

I believe I should be using either unsigned char *SHA1 or SHA1_Init, but I am not sure what the arguments would be, given x is my input to be hashed. Would someone please clear this up for me? Thanks.

7
  • what are your input values: in-memory strings, or file contents? Commented Feb 14, 2012 at 21:31
  • I am writing a birthday attack that should create a new hash and add it to the end of the array every time I finish scanning through it. I was just going to keep it simple and hash the value of i. Quick answer: in-memory strings. Commented Feb 14, 2012 at 21:34
  • 2
    What do you mean by 'proving that SHA1 is nearly collision-free'? SHA1 is a 160-bit hash, so there are 2^160 possible values, but there are far more than 2^160 possible strings (say shorter than 1MB), so there are tons of collisions. If you just want to test whether you get collisions from a number of randomly generated strings, the number of strings needed for a halfway reliable answer is infeasibly high (unless you happen to find a collision early, but SHA1 is tested well enough to assign that a negligibly small probability). Commented Feb 14, 2012 at 21:41
  • I realize there are plenty of possible collisions, but the goal is to prove it would take a significant amount of time to find a collision (about 2^80) and take even more time to find a collision that matches a specific hash. Commented Feb 14, 2012 at 21:44
  • 2
    But realistically you cannot test more than 2^34 strings or so. Even if SHA1 were skewed in a way that you'd only need 2^50 strings for a collision, you almost certainly won't see it. Commented Feb 14, 2012 at 22:10
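The numbers in the comments above can be checked with the usual birthday approximation: hashing n distinct inputs with an ideal b-bit hash gives a collision probability of roughly n^2 / 2^(b+1), provided that value is much smaller than 1. The sketch below works in base-2 logarithms so it needs only integer arithmetic; the helper names are hypothetical, and the approximation assumes an ideal hash.

```c
#include <stdio.h>

/*
 * Birthday approximation: hashing n = 2^log2_n distinct inputs with an ideal
 * hash of `bits` output bits gives a collision probability of roughly
 * n^2 / 2^(bits + 1), as long as that value is much smaller than 1.
 * Returns log2 of that estimate. (Hypothetical helper name.)
 */
int log2_collision_probability(int log2_n, int bits)
{
    return 2 * log2_n - (bits + 1);
}

/* Print the estimate for one scenario. */
void print_estimate(const char *label, int log2_n, int bits)
{
    printf("%s: about 2^%d\n", label,
           log2_collision_probability(log2_n, bits));
}
```

Calling print_estimate("ideal SHA-1, 2^34 inputs", 34, 160) reports about 2^-93. Modelling a skewed hash that needs only about 2^50 inputs as a 100-bit hash, print_estimate("skewed, 2^34 inputs", 34, 100) still reports about 2^-33, which is why such a test would almost certainly see no collision.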

7 Answers

56

If you have all of your data at once, just use the SHA1 function:

    // The data to be hashed
    char data[] = "Hello, world!";
    size_t length = strlen(data);

    unsigned char hash[SHA_DIGEST_LENGTH];
    // SHA1 takes a const unsigned char *, so cast the char buffer
    SHA1((const unsigned char *)data, length, hash);
    // hash now contains the 20-byte SHA-1 hash

If, on the other hand, you only get your data one piece at a time and you want to compute the hash as you receive that data, then use the other functions:

    // Error checking omitted for expository purposes

    // Object to hold the current state of the hash
    SHA_CTX ctx;
    SHA1_Init(&ctx);

    // Hash each piece of data as it comes in:
    SHA1_Update(&ctx, "Hello, ", 7);
    ...
    SHA1_Update(&ctx, "world!", 6);
    // etc.
    ...

    // When you're done with the data, finalize it:
    unsigned char hash[SHA_DIGEST_LENGTH];
    SHA1_Final(hash, &ctx);

6 Comments

I tried using the sha1 function, but when I compile in the terminal it says Undefined reference to SHA1. I don't get any complaints about anything else. Any idea what I'm missing?
You need to link with the OpenSSL runtime library. Assuming you're using gcc, add -lcrypto to your linker command line.
how would one generate hmacsha1?
No error handling needed? For me SHA1_Final crashes but have no clue why. Is there any way to print the error?
How to get the hash value? I'm getting �*���qE)DDF:� �4� with printf("<<sha1=%s>>\n", hash);
12

They're two different ways to achieve the same thing.

Specifically, you either call SHA1_Init, then SHA1_Update as many times as necessary to pass your data through, and then SHA1_Final to get the digest, or you call SHA1 once.

The reason for the two modes is that large files are usually read in chunks, because loading the whole file would use a lot of memory. Keeping track of the SHA_CTX (the SHA context) as you go lets you hash such data piece by piece. The algorithm also fits this model internally: data is processed one block at a time.

The one-shot SHA1 function should be fairly straightforward. The other approach works like this:

    unsigned char md[SHA_DIGEST_LENGTH];
    SHA_CTX context;
    int i;

    SHA1_Init(&context);
    for (i = 0; i < numblocks; i++) {
        // pointer_to_data and data_length describe the current block
        SHA1_Update(&context, pointer_to_data, data_length);
    }
    SHA1_Final(md, &context);

Crucially, at the end md will contain the binary digest, not a hexadecimal representation - it's not a string and shouldn't be used as one.
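To store or print the digest as text, each byte has to be expanded into two hex digits. A minimal sketch of that conversion follows; digest_to_hex is a hypothetical helper name, not an OpenSSL function, and it uses only the standard library.

```c
#include <stddef.h>

/*
 * Write the lowercase hex form of `len` digest bytes into `out`.
 * `out` must have room for 2 * len + 1 chars (the +1 is the terminating NUL).
 * (digest_to_hex is a hypothetical helper name, not an OpenSSL function.)
 */
void digest_to_hex(const unsigned char *digest, size_t len, char *out)
{
    static const char hex_digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = hex_digits[digest[i] >> 4];   /* high nibble */
        out[2 * i + 1] = hex_digits[digest[i] & 0x0f]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```

For a SHA-1 digest, len is SHA_DIGEST_LENGTH (20), so the output buffer needs 41 bytes, for example char hex[2 * SHA_DIGEST_LENGTH + 1] followed by digest_to_hex(md, SHA_DIGEST_LENGTH, hex).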

3 Comments

how would one generate hmacsha1?
@Clustermagnet hmacsha1 is an HMAC algorithm that uses SHA1 as the hash. It's the same idea as in my answer here (see here), but for the EVP_MD argument specific to HMAC you specify EVP_sha1().
@Cmag - see EVP Signing and Verifying | HMAC on the OpenSSL wiki. Also see Using HMAC vs EVP functions in OpenSSL on Stack Overflow.
4

I believe I should be using either unsigned char *SHA1 or SHA1_Init ...

For later versions of the OpenSSL library, like 1.0.2 and 1.1.0, the project recommends using the EVP interface. An example of using EVP Message Digests with SHA256 is available on the OpenSSL wiki:

    #include <stdlib.h>
    #include <openssl/evp.h>

    #define handleErrors abort

    EVP_MD_CTX *ctx;
    if ((ctx = EVP_MD_CTX_create()) == NULL)
        handleErrors();

    if (1 != EVP_DigestInit_ex(ctx, EVP_sha256(), NULL))
        handleErrors();

    unsigned char message[] = "abcd .... wxyz";
    // Note: sizeof also counts the terminating NUL of the string literal
    unsigned int message_len = sizeof(message);
    if (1 != EVP_DigestUpdate(ctx, message, message_len))
        handleErrors();

    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int digest_len = sizeof(digest);
    if (1 != EVP_DigestFinal_ex(ctx, digest, &digest_len))
        handleErrors();

    EVP_MD_CTX_destroy(ctx);

4

Adam Rosenfield's answer is fine, but use strlen rather than sizeof; otherwise the hash will be calculated over the null terminator as well. That is probably fine in this case, but not if you need to compare your hash with one generated by another tool.

    // The data to be hashed
    char data[] = "Hello, world!";
    size_t length = strlen(data);

    unsigned char hash[SHA_DIGEST_LENGTH];
    // SHA1 takes a const unsigned char *, so cast the char buffer
    SHA1((const unsigned char *)data, length, hash);
    // hash now contains the 20-byte SHA-1 hash

3

The first function (SHA1()) is the higher-level one, and it's probably the one you want. The documentation is pretty clear on the usage: d is the input, n is its size in bytes, and md is where the result is placed (you allocate it, with room for SHA_DIGEST_LENGTH bytes).

As for the other 3 functions - these are lower level and I'm pretty sure they are internally used by the first one. They are better suited for larger inputs that need to be processed in a block-by-block manner.

2

Calculate the hash like this:

    // Object to hold the current state of the hash
    SHA_CTX ctx;
    SHA1_Init(&ctx);

    // Hash each piece of data as it comes in:
    SHA1_Update(&ctx, "Hello, ", 7);
    ...
    SHA1_Update(&ctx, "world!", 6);
    // etc.
    ...

    // When you're done with the data, finalize it:
    unsigned char tmphash[SHA_DIGEST_LENGTH];
    SHA1_Final(tmphash, &ctx);

Finally, you can convert the binary hash to a human-readable hex string with code like this:

    // Two hex digits per byte, plus one byte for the terminating NUL
    char hash[SHA_DIGEST_LENGTH * 2 + 1];
    int i = 0;
    for (i = 0; i < SHA_DIGEST_LENGTH; i++) {
        sprintf(&hash[i * 2], "%02x", tmphash[i]);
    }
    // And print to stdout
    printf("Hash: %s\n", hash);

0

Let the code speak

The SQLite source tree contains the source for the DBHASH tool, which applies SHA1 to whole databases. It comes complete with a SHA1 implementation.

You might find it useful to study that code.

P.S. There is also a SHA1 implementation written as an SQLite user-defined function.
