0

I was following a page on Wikipedia about the implementation of MD5. The pseudocode is directly embedded in the link.

However, while I was translating the pseudocode into python, it gives a pretty weird output that has nothing similar to the one shown in the demo.

md5.py

import math def leftrotate(x, c): return (x << c) or (x >> (32 - c)) # All variables are unsigned 32 bit and wrap modulo 2^32 when calculating s = [None] * 64 K = [None] * 64 # s specifies the per-round shift amounts s[0:16] = [7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22] s[16:32] = [5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20] s[32:48] = [4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23] s[48:64] = [6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21] # Use binary integer part of the sines of integers (Radians) as constants: K = [math.floor(math.pow(2, 32) * math.fabs(math.sin(i + 1))) for i in range(64)] # Initialize variables a0 = 0x67452301 b0 = 0xefcdab89 c0 = 0x98badcfe d0 = 0x10325476 msg0 = "The quick brown fox jumps over the lazy dog" # Converting String to binary msg = ''.join(format(ord(i), 'b') for i in msg0) # Pre-processing: adding a single bit # The input bytes are considered as bits strings, # where the first bit is the most significant bit of the byte message = list(msg) message.append("1") # Pre-processing: padding with zeros for i in range(447 - len(msg)): message.append("0") for i in range(64): message.append("64") msg = "".join(message) M = [] for i in range(0, 512, 32): M.append("".join(message[i:i + 32])) # Initialize hash value for this chunk A = a0 B = b0 C = c0 D = d0 for i in range(64): F = 0 g = 0 if 0 <= i <= 15: F = D ^ (B and (C ^ D)) g = i elif 16 <= i <= 31: F = C ^ (D and (B ^ C)) g = (5 * i + 1) % 16 elif 32 <= i <= 47: F = B ^ C ^ D g = (3 * i + 5) % 16 elif 48 <= i <= 63: F = C ^ (B or (not D)) g = (7 * i) % 16 # Be wary of the below definitions of a, b, c, d F = F + A + K[i] + int(M[g]) A = D D = C C = B B = B + leftrotate(F, s[i]) a0 += A b0 += B c0 += C d0 += D print(a0 + b0 + c0 + d0) 

output

(input = msg0 = "The quick brown fox jumps over the lazy dog") 64753135551430969455044517516606091785810184138594829859667366632969211689605539951611300227364936455768076694404884226395355419 03996976374438250472707827439981815374596773986313857734975864212023704601385676211761628136173288119161166663698645808505752643 4 

In detail, may somebody please explain this line to me:

break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15

I was pretty confused at this point, having believed that the message should be inputted as a string but converting it to binary doesn't work. I thought that, on the page, the phrase bit means binary or am I wrong?

Any help is appreciated.

1
  • 1
    What do you think "binary" means? Because a "bit" is a single digit in binary data ("bit" is short for "Binary digIT"), but they're not synonyms. Commented Feb 28, 2020 at 3:17

1 Answer 1

4

First obvious error:

K = [math.floor(math.pow(2, 32) * math.fabs(math.sin(i + 1))) for i in range(63)] 

The pseudo-code is using inclusive bounds notation; range is exclusive on the upper bound, so you just made a K that is 63 elements long, when it's supposed to be 64 elements long.

Same issue with your final loop:

for i in range(63): 

Probably ought to check your other bounds to make sure you haven't made similar errors elsewhere.

This is also dead wrong:

msg = ''.join(format(ord(i), 'b') for i in msg0) 

because it doesn't pad the input bytes the way it should, so your bits are all a jumble. There's a lot of ways to do this (you could probably make it work with a format string of '08b' assuming all your characters have ordinal values below 256, though it's a rather inefficient solution).

Also wrong:

for i in range(64): message.append("64") 

You're supposed to append the original length in bits mod 2**64, not the number 64 repeated 64 times (I have no idea where you came up with that).

For your final question, to explain "break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15", "32 bit words" means "integers comprised of 32 binary bits". For a simple example, breaking 0b100100001111 into three four bit words would produce 0b1001, 0b0000, 0b1111.

Unfortunately for you, in Python, ints are infinite precision; they don't drop overflowing bits when a computation produces a result with more than 32 bits, and MD5 calculations assume they do. So you're going to need to simulate it with a lot of masking, by bitwise and-ing via & 0xffffffff.

In short implementing MD5 in plain Python from the pseudocode is a royal pain in the butt, because the pseudocode relies on the limitations of a low level language like C. And you're clearly not being careful enough implementing even the bits that do translate naturally to Python.

Sign up to request clarification or add additional context in comments.

2 Comments

Should I still convert the initial input to an array, then?
@cpy24: You didn't convert anything to an array, you made a Python list (the distinction is going to be important, keep reading). I'd suggest looking at the Python modules that give you access to C-like data types, e.g. array (which unlike list, actually stores raw C level data types), struct and/or ctypes, to let you simulate the limitations of C in Python.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.