My assignment asks me to compute a hash from a video file. So I'm guessing what I need to do is somehow read the video file as binary data, and then do the hashing shenanigans with that. The problem is, I only know how to read from and write to .txt files; video files are completely new to me. So my questions are:

How do I take a file and read it as binary data?

How do I handle this data? I mean, should I just stick it into a string or should I use an array of some sort? I imagine the amount of numbers is going to be huge and I wouldn't like my computer to crash because I handled the data in some horribly inefficient way :D.

Also, I am not entirely sure what I am talking about when I say "binary data", as I have limited experience with that kind of stuff. I mean, it's not just a string of 1s and 0s, right? So I would also appreciate a crash course on "binary data" :D

  • Binary data just means the raw data ... In order to read binary data you need to open the file in binary mode: open(fname, "rb"). Commented Jul 16, 2013 at 20:52
  • Text data is binary data, but when reading a 'text file' the reader looks for things such as UTF-encoded bytes, newlines, etc. By using "b" in the open mode, you bypass this and tell Python to just hand you the raw, untouched data. Commented Jul 16, 2013 at 20:59

1 Answer

At the storage level there is really no difference between text data and binary data: both are just a sequence of bytes. In text, each byte value, or group of several byte values, corresponds to a character. Because of this we can read and store binary data (a sequence of bytes) much like a string; in Python 3 it comes back as a bytes object rather than a str. The only difference is that the bytes we read from a binary file will probably not be readable by humans.
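To make the "sequence of bytes" idea concrete, here is a small sketch, assuming Python 3. The helper name `text_to_byte_values` is made up for illustration; it only shows that text becomes numbers between 0 and 255 once encoded, and that arbitrary bytes need not be valid text at all.

```python
def text_to_byte_values(text, encoding="utf-8"):
    """Return the numeric value (0-255) of every byte that encodes `text`."""
    return list(text.encode(encoding))

# A plain ASCII character is one byte; "é" takes two bytes in UTF-8.
print(text_to_byte_values("hi"))  # [104, 105]
print(text_to_byte_values("é"))   # [195, 169]

# Arbitrary bytes, like those in a video file, are just numbers too.
# They do not have to correspond to readable characters.
raw = bytes([0, 255, 16, 128])
print(list(raw))  # [0, 255, 16, 128]
```

So "binary data" is not a string of 1s and 0s in your program; it is a sequence of small integers that you can loop over.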

Open the file in "rb" (read binary) mode to avoid text line-ending problems. To handle large files, you could read a small number of bytes at a time and update the hash as you go, instead of loading the whole file into memory.

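    started = False
    hash_val = 0
    with open("video", "rb") as file:
        while True:
            byte = file.read(1)  # read one byte; returns an empty result at end of file
            if not byte:
                break
            byte_val = ord(byte)  # convert the single byte into a number (0-255)
            if not started:
                hash_val = byte_val  # seed the hash with the first byte
                started = True
            # this is a basic hash (hash_val * 31 + byte_val); Python integers
            # never overflow, so mask to 32 bits to keep the value bounded
            hash_val = ((hash_val << 5) - hash_val + byte_val) & 0xFFFFFFFF
    print(hash_val)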
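If the assignment allows a standard hash rather than a hand-rolled one, the same read-as-you-go idea can be sketched with Python's built-in `hashlib` module. This is a different technique from the basic hash above, not a replacement the assignment necessarily wants; the function name `file_digest_hex`, the SHA-256 default, and the 64 KiB chunk size are all assumptions chosen for illustration.

```python
import hashlib

def file_digest_hex(path, algorithm="sha256", chunk_size=64 * 1024):
    """Hash a file chunk by chunk so it never has to fit in memory at once."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b"", which signals end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Calling `file_digest_hex("video")` would return the digest as a hexadecimal string, and memory use stays at roughly one chunk regardless of how large the video is.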
2 Comments

Ok, that explains it quite well! I'll see what I can do now.
@NorsulRonsul I have no idea how reliable that particular hash is in Python, since you cannot rely on wrapping behavior etc. It should give you an idea of how hashes like that work, though. An optimization might be to read 64 or 128 bytes at a time, instead of just one, and iterate over each byte.
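A rough sketch of that optimization, assuming Python 3 (where iterating over a bytes object yields integers directly, so `ord` is not needed). The function name `basic_hash`, the 128-byte default, and the 32-bit mask that emulates wrapping are illustrative choices; this version also starts from 0 rather than seeding with the first byte, so its results will differ from the answer's code.

```python
def basic_hash(path, chunk_size=128, mask=0xFFFFFFFF):
    """Multiply-by-31 hash of a file, read `chunk_size` bytes at a time."""
    hash_val = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            for byte_val in chunk:  # each item is an int from 0 to 255
                # (hash_val << 5) - hash_val is hash_val * 31;
                # the mask keeps the result within 32 bits.
                hash_val = ((hash_val << 5) - hash_val + byte_val) & mask
    return hash_val
```

The chunk size only affects how often the file is read, not the result: the same file gives the same hash whether it is read 1 byte or 128 bytes at a time.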