How to cipher UTF-8 beyond A-Z in python?

Question

Many years ago, I made a program in C# on Windows which "encrypts" text files using (what I thought was) a Caesar cipher.

Back then I wanted more characters than just A-Z,0-9 and made it possible but never thought about the actual theory behind it.

Looking at some of the files, and comparing it to this website, it seems like the UTF-8 is being shifted.

I started up a Windows VM (because I'm using Linux now) and typed this: abcdefghijklmnopqrstuvwxyz

It generated text that looks like this in hexadecimal (shifted by 15):

70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f c280 c281 c282 c283 c284 c285 c286 c287 c288 c289

How can I shift the hexadecimal values back so they look like this?

61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 78 79 7a

Or are there any easier/better methods of doing this?

UPDATE

I'm using Python 3.5.3, and this is the code I have so far:

    import sys

    arguments = sys.argv[1:]
    file = ""
    for arg in arguments:
        if arg[0] != "-":
            file = arg

    lines = []
    with open(file) as f:
        lines = f.readlines()

    for line in lines:
        result = 0
        for value in list(line):
            #value = "0x"+value
            temp=value.encode('utf-8').hex()
            temp+=15
            if(temp>0x7a):
                temp-=0x7a
            elif(temp<=0):
                temp+=0x7a
            #result = result + temp
        print (result)

Unfortunately, I don't have the C# source code available at the moment. I can try to find it.

I'm not sure that I understand what the underlying problem is. Why can't you just subtract the value 15 from each character?
– andreihondrari, Commented Apr 22, 2019 at 13:20

wovano · Accepted Answer · 2019-04-22 15:34:41Z

Assuming your input is ASCII text, the simplest solution is to encode/decode as ASCII and use the built-in methods ord() and chr() to convert from character to byte value and vice versa.

Note that the temp value cannot be less than 0, so the second if-statement can be removed.

NB: This is outside the scope of the question, but I also noticed that you're doing argument parsing yourself. I highly recommend using argparse instead, since it's very easy and gives you a lot of extras for free (e.g. it performs error checking and prints a nice help message if you start your application with the '--help' option). See the example code below:

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(dest='filenames', metavar='FILE', type=str, nargs='+',
                        help='file(s) to encrypt')
    args = parser.parse_args()

    for filename in args.filenames:
        with open(filename, 'rt', encoding='ascii') as file:
            lines = file.readlines()

        for line in lines:
            result = ""
            for value in line:
                temp = ord(value)  # character to int value
                temp += 15
                if temp > 0x7a:
                    temp -= 0x7a
                result += chr(temp)  # int value to character
            print(result)
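The hex dump in the question is consistent with the Unicode code points being shifted rather than the individual bytes: 0x71 ('q') plus 15 is 0x80, which UTF-8 encodes as c2 80. A minimal sketch under that assumption is shown here, using ord() and chr() without restricting the input to ASCII. The names SHIFT and shift_text are made up for this illustration, and it is only a guess at what the original C# program did.

```python
SHIFT = 15  # the shift observed in the question's hex dump (assumption)


def shift_text(text, shift):
    """Shift every character's Unicode code point by `shift`.

    No wrap-around is applied, so chr() raises ValueError if a shifted
    code point falls outside the valid range.
    """
    return "".join(chr(ord(ch) + shift) for ch in text)


plain = "abcdefghijklmnopqrstuvwxyz"
encrypted = shift_text(plain, SHIFT)
print(encrypted.encode("utf-8").hex())  # UTF-8 bytes of the shifted text

decrypted = shift_text(encrypted, -SHIFT)
print(decrypted)
```

Shifting by -15 undoes the shift, so the same function works in both directions as long as the file is read and written as UTF-8.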

MyNameIsCaleb · Accepted Answer · 2019-04-22 13:20:59Z

You can convert between hex strings and integers using int() and hex(). However, hex() only works on integers, so first you need to convert the hex string to an integer using int() with base 16.

    hex_int = int(hex_str, 16)
    cipher = hex_int - 15
    hex_cipher = hex(cipher)

Now apply that in a loop and you can shift your results left or right as desired. And you could of course condense the code as well.

    result = hex(int(hex_string, 16) - 15)

    # in a loop
    hexes = ['70', '71', 'c280']
    ciphered = []
    for n in hexes:
        ciphered.append(hex(int(n, 16) - 15))
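One caveat with subtracting from the raw hex value: for a multi-byte value such as c280, int('c280', 16) - 15 gives 0xc271, and the bytes c2 71 are not valid UTF-8. A possible workaround, sketched here on the assumption that each hex string holds exactly one UTF-8 encoded character, is to decode the bytes to a character first and shift its code point. The helper name unshift_hex is hypothetical.

```python
def unshift_hex(hex_str, shift=15):
    """Decode one UTF-8 character given as hex and shift its code point back."""
    ch = bytes.fromhex(hex_str).decode("utf-8")  # e.g. 'c280' -> '\x80'
    return chr(ord(ch) - shift)


hexes = ["70", "71", "c280"]
decoded = [unshift_hex(h) for h in hexes]
print(decoded)  # ['a', 'b', 'q']
```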

Ali Nuri Şeker · Accepted Answer · 2019-04-29 08:28:31Z

You can use int('somestring'.encode('utf-8').hex(),16) to get the exact values shown on that website. To apply the same rule to every character, process the string one character at a time, for example with:

    import codecs

    def myencode(character,diff):
        temp=int(character.encode('utf-8').hex(),16)
        temp+=diff
        if(temp>0x7a):
            temp-=0x7a
        elif(temp<=0):
            temp+=0x7a
        result=codecs.decode(hex(temp)[2:],"hex").decode("utf-8")
        return result

diff should be the shift for the cipher (an integer). encode('utf-8') converts the string to a bytes object and .hex() returns those bytes as a hex string. You should feed this function only one character of a string at a time, so that each character is shifted on its own.

After shifting, you need to turn the result back into a character. The codecs library converts the hex string back to bytes, and decode("utf-8") then converts those bytes back to a string.

Edit: Updated, now it works.
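For reference, the round trip between a character and the integer value of its UTF-8 bytes can be sketched as follows. The helper names char_to_int and int_to_char are hypothetical, and int.to_bytes is used here in place of codecs.decode(hex(temp)[2:], "hex") because hex() drops leading zeros, which leaves an odd-length hex string for values below 0x10.

```python
def char_to_int(character):
    """Integer value of the character's UTF-8 bytes, e.g. '\u0080' -> 0xc280."""
    return int(character.encode("utf-8").hex(), 16)


def int_to_char(value):
    """Inverse of char_to_int: integer -> UTF-8 bytes -> one-character string."""
    length = max(1, (value.bit_length() + 7) // 8)  # number of bytes needed
    return value.to_bytes(length, "big").decode("utf-8")


print(hex(char_to_int("a")))       # 0x61
print(hex(char_to_int("\u0080")))  # 0xc280
print(int_to_char(0xc280) == "\u0080")  # True
```

Note that int_to_char raises UnicodeDecodeError when the integer does not correspond to valid UTF-8, which can happen after arithmetic on multi-byte values.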

What should the diff be? I would assume it is an integer (the shift), but I get an error message saying: TypeError: cannot concatenate 'str' and 'int' objects. Also, should it be .hex() or .encode("hex") ?
@Typewar The diff is the shift as you guessed. I am updating my answer to explain it in more detail.
Now I get this error TypeError: Can't convert 'int' object to str implicitly. I tried adding 0x in front of the string like this temp=str("0x"+character).encode('utf-8').hex() but it didn't help. I can see that by adding 0x in front, the int value changes from 70 to 307870
You should not need to add 0x at the beginning as a string. Whatever this function returns is already hexadecimal.
Also, please note that I designed this cipher for the range between 0 and 0x7a. You might want to tweak that too (by changing the 0 and 0x7a values).
