0

I have this text.ucs file which I am trying to decode using Python.

file = open('text.ucs', 'r')
content = file.read()
print content

My result is

\xf\xe\x002\22

I tried decoding it with utf-16 and utf-8:

content.decode('utf-16') 

and got this error:

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 32-33: illegal encoding

Please let me know if I am missing anything or if my approach is wrong.

Edit: A screenshot was requested, so I have added one.

6
  • @Rakesh Sorry, can't post that here Commented May 7, 2018 at 10:33
  • Where do you get text.ucs from? Commented May 7, 2018 at 10:34
  • Do you know what alphabet it's supposed to be? Commented May 7, 2018 at 10:35
  • It contains load balancer information Commented May 7, 2018 at 10:38
  • Try encoding='utf_16_be' (stackoverflow.com/a/14488478/1388292) Commented May 7, 2018 at 10:42

4 Answers

1

The string is encoded as UTF-16-BE (big-endian). This works:

content.decode("utf-16-be") 
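
As a rough illustration of why the byte order matters, here is a minimal sketch. The sample string is hypothetical, since the real contents of text.ucs are not shown in the question. It encodes text as UTF-16-BE and then decodes the same bytes with both byte orders.

```python
# Hypothetical sample; the real contents of text.ucs are not shown in the question.
sample = u"load balancer"

# UTF-16-BE stores each of these characters as two bytes, high byte first,
# and writes no byte order mark (BOM).
raw = sample.encode("utf-16-be")

# Decoding with the matching byte order recovers the text.
decoded_be = raw.decode("utf-16-be")

# Decoding the same bytes as little-endian does not fail for this sample,
# but each byte pair is read in swapped order, so the characters differ.
decoded_le = raw.decode("utf-16-le")

print(repr(decoded_be))
print(repr(decoded_le))
```

If the decoded text looks like unrelated CJK or symbol characters, a swapped byte order is one likely cause.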

3 Comments

@JacquesGaudin Unfortunately neither works. Also, per the Python docs, I see '-' and not '_'.
@cyborg I executed this just now on the bytes that you provided and it worked fine. The names with dashes and underscores are equivalent, see the first paragraph of docs.python.org/3/library/codecs.html#standard-encodings
>>> content.decode("utf_16_be") Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Python27\lib\encodings\utf_16_be.py", line 16, in decode return codecs.utf_16_be_decode(input, errors, True) UnicodeDecodeError: 'utf16' codec can't decode byte 0x5c in position 64: truncated data
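
The two points in these comments can be checked with a short sketch. It assumes, hypothetically, that the "truncated data" error comes from an odd number of bytes (UTF-16 needs two bytes per code unit); the byte string below is made up for illustration and is not the OP's data.

```python
import codecs

# Both spellings resolve to the same codec.
same_codec = codecs.lookup("utf-16-be").name == codecs.lookup("utf_16_be").name

# Hypothetical data: two complete 2-byte units plus one stray trailing
# backslash byte (0x5c), giving an odd total length.
odd_bytes = u"ab".encode("utf-16-be") + b"\\"

try:
    odd_bytes.decode("utf-16-be")
    error_reason = None
except UnicodeDecodeError as exc:
    # The decoder cannot pair the last byte with anything.
    error_reason = exc.reason

# Dropping the incomplete tail lets the rest decode.
even_length = len(odd_bytes) - (len(odd_bytes) % 2)
recovered = odd_bytes[:even_length].decode("utf-16-be")

print(same_codec, error_reason, recovered)
```

If the stray byte is a literal backslash, it may also mean the bytes were copied from an escaped representation rather than read from the file, which would be worth checking before trimming anything.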
1

As I understand it, you are using Python 2.x, but as far as I know the encoding parameter of open() was only added in Python 3.x. I am not an expert in Python 2.x, but you can look up io.open. For example, try:

import io
file = io.open('text.ucs', 'r', encoding='utf-8')
content = file.read()
print content

Note that io.open requires importing the io module first.
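
A self-contained sketch of the io.open approach, which accepts encoding= on both Python 2.6+ and Python 3. The temporary file and its contents are hypothetical stand-ins for text.ucs, and the encoding used here is UTF-16-BE (as suggested elsewhere in this thread) rather than UTF-8.

```python
import io
import os
import tempfile

# Hypothetical stand-in for text.ucs, written as UTF-16-BE without a BOM.
path = os.path.join(tempfile.mkdtemp(), "text.ucs")
with io.open(path, "wb") as f:
    f.write(u"load balancer\n".encode("utf-16-be"))

# io.open decodes while reading, so content is already a unicode string.
with io.open(path, "r", encoding="utf-16-be") as f:
    content = f.read()

print(repr(content))
```

Opening the file through io.open avoids a separate content.decode(...) call, since the decoding happens as the file is read.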

Comments

0

You can specify which encoding to use with the encoding argument (supported by the built-in open() in Python 3):

with open('text.ucs', 'r', encoding='utf-16') as f:
    text = f.read()
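
One caveat with the generic "utf-16" codec: it relies on a byte order mark (BOM) at the start of the data, and without one it assumes the platform's native byte order, which is little-endian on typical Windows machines. The sketch below uses a hypothetical helper, decode_utf16, that uses the BOM when present and otherwise falls back to an assumed default; the big-endian fallback is an assumption based on the other answer in this thread.

```python
import codecs


def decode_utf16(raw, default="utf-16-be"):
    """Decode UTF-16 bytes, using the BOM when present and `default` otherwise.

    Hypothetical helper for illustration; `default` is an assumption.
    """
    if raw.startswith(codecs.BOM_UTF16_BE) or raw.startswith(codecs.BOM_UTF16_LE):
        # The generic codec reads the BOM, picks the byte order, and strips it.
        return raw.decode("utf-16")
    return raw.decode(default)


# Made-up sample data with and without a BOM.
with_be_bom = codecs.BOM_UTF16_BE + u"abc".encode("utf-16-be")
with_le_bom = codecs.BOM_UTF16_LE + u"abc".encode("utf-16-le")
without_bom = u"abc".encode("utf-16-be")

print(decode_utf16(with_be_bom), decode_utf16(with_le_bom), decode_utf16(without_bom))
```

Read the file in binary mode ('rb') before passing its bytes to a helper like this, so that no newline translation alters the data.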

5 Comments

The error message I got: Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'encoding' is an invalid keyword argument for this function
Maybe you forgot that when you use with open() you need to give the file a name, for example: with open('text.ucs', 'r', encoding='utf-16') as file
I did it, I am aware of with usage :)
I really don't know why it doesn't work. I tested it 20 seconds ago and this construct raised no errors. My full code is with open(source, 'r', encoding='utf-8') as csvfile:...
I have added a screenshot to the question, please check
0

Your string needs to be decoded with the utf-8 encoding. Here is what I did to decode it:

import io
f = io.open('text.ucs', 'r', encoding='utf-8')
print f.read()

2 Comments

A few words to explain why you think this fixes the OP issue would be a good idea. stackoverflow.com/help/how-to-answer
@JacquesGaudin thanks for that i will be more attentive about my answer next time
