Implementation of MD5 in Python

Question

I was following a page on Wikipedia about the implementation of MD5. The pseudocode is directly embedded in the link.

However, while I was translating the pseudocode into python, it gives a pretty weird output that has nothing similar to the one shown in the demo.

md5.py

import math def leftrotate(x, c): return (x << c) or (x >> (32 - c)) # All variables are unsigned 32 bit and wrap modulo 2^32 when calculating s = [None] * 64 K = [None] * 64 # s specifies the per-round shift amounts s[0:16] = [7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22] s[16:32] = [5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20] s[32:48] = [4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23] s[48:64] = [6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21] # Use binary integer part of the sines of integers (Radians) as constants: K = [math.floor(math.pow(2, 32) * math.fabs(math.sin(i + 1))) for i in range(64)] # Initialize variables a0 = 0x67452301 b0 = 0xefcdab89 c0 = 0x98badcfe d0 = 0x10325476 msg0 = "The quick brown fox jumps over the lazy dog" # Converting String to binary msg = ''.join(format(ord(i), 'b') for i in msg0) # Pre-processing: adding a single bit # The input bytes are considered as bits strings, # where the first bit is the most significant bit of the byte message = list(msg) message.append("1") # Pre-processing: padding with zeros for i in range(447 - len(msg)): message.append("0") for i in range(64): message.append("64") msg = "".join(message) M = [] for i in range(0, 512, 32): M.append("".join(message[i:i + 32])) # Initialize hash value for this chunk A = a0 B = b0 C = c0 D = d0 for i in range(64): F = 0 g = 0 if 0 <= i <= 15: F = D ^ (B and (C ^ D)) g = i elif 16 <= i <= 31: F = C ^ (D and (B ^ C)) g = (5 * i + 1) % 16 elif 32 <= i <= 47: F = B ^ C ^ D g = (3 * i + 5) % 16 elif 48 <= i <= 63: F = C ^ (B or (not D)) g = (7 * i) % 16 # Be wary of the below definitions of a, b, c, d F = F + A + K[i] + int(M[g]) A = D D = C C = B B = B + leftrotate(F, s[i]) a0 += A b0 += B c0 += C d0 += D print(a0 + b0 + c0 + d0)

output

(input = msg0 = "The quick brown fox jumps over the lazy dog") 64753135551430969455044517516606091785810184138594829859667366632969211689605539951611300227364936455768076694404884226395355419 03996976374438250472707827439981815374596773986313857734975864212023704601385676211761628136173288119161166663698645808505752643 4

In detail, may somebody please explain this line to me:

break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15

I was pretty confused at this point, having believed that the message should be inputted as a string but converting it to binary doesn't work. I thought that, on the page, the phrase bit means binary or am I wrong?

Any help is appreciated.

What do you think "binary" means? Because a "bit" is a single digit in binary data ("bit" is short for "Binary digIT"), but they're not synonyms. — ShadowRanger
– ShadowRanger, Commented Feb 28, 2020 at 3:17

ShadowRanger · Accepted Answer · 2020-02-28 03:12:37Z

First obvious error:

K = [math.floor(math.pow(2, 32) * math.fabs(math.sin(i + 1))) for i in range(63)]

The pseudo-code is using inclusive bounds notation; range is exclusive on the upper bound, so you just made a K that is 63 elements long, when it's supposed to be 64 elements long.

Same issue with your final loop:

for i in range(63):

Probably ought to check your other bounds to make sure you haven't made similar errors elsewhere.

This is also dead wrong:

msg = ''.join(format(ord(i), 'b') for i in msg0)

because it doesn't pad the input bytes the way it should, so your bits are all a jumble. There's a lot of ways to do this (you could probably make it work with a format string of '08b' assuming all your characters have ordinal values below 256, though it's a rather inefficient solution).

Also wrong:

for i in range(64): message.append("64")

You're supposed to append the original length in bits mod 2**64, not the number 64 repeated 64 times (I have no idea where you came up with that).

For your final question, to explain "break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15", "32 bit words" means "integers comprised of 32 binary bits". For a simple example, breaking 0b100100001111 into three four bit words would produce 0b1001, 0b0000, 0b1111.

Unfortunately for you, in Python, ints are infinite precision; they don't drop overflowing bits when a computation produces a result with more than 32 bits, and MD5 calculations assume they do. So you're going to need to simulate it with a lot of masking, by bitwise and-ing via & 0xffffffff.

In short implementing MD5 in plain Python from the pseudocode is a royal pain in the butt, because the pseudocode relies on the limitations of a low level language like C. And you're clearly not being careful enough implementing even the bits that do translate naturally to Python.

@cpy24: You didn't convert anything to an array, you made a Python list (the distinction is going to be important, keep reading). I'd suggest looking at the Python modules that give you access to C-like data types, e.g. array (which unlike list, actually stores raw C level data types), struct and/or ctypes, to let you simulate the limitations of C in Python.

Collectives™ on Stack Overflow

Implementation of MD5 in Python

1 Answer 1

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Related