
I am trying to decode a byte string that contains the Unicode character EN DASH (U+2013) to get the proper Unicode string.

The code below runs fine on Windows with Python 3.6:

decode_header_sequence = [(b'Excel to csv \xe2\x80\x93 Conversion .csv', 'utf-8')]
print(decode_header_sequence[0][0].decode('utf-8'))

which gives me the string 'Excel to csv – Conversion .csv'.
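As a quick check that the decoding step itself does not depend on the platform, here is a minimal sketch: decoding with an explicit 'utf-8' codec turns the three bytes E2 80 93 into the single character U+2013. The index 13 is simply where the dash sits in this sample string.

```python
import unicodedata

raw = b'Excel to csv \xe2\x80\x93 Conversion .csv'

# Decoding with an explicit codec gives the same str on every platform;
# it does not consult the locale or sys.getfilesystemencoding().
text = raw.decode('utf-8')

# The three bytes E2 80 93 decode to one character, U+2013 EN DASH.
dash = text[13]
print(hex(ord(dash)), unicodedata.name(dash))  # prints: 0x2013 EN DASH
```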

But when I execute the same lines on Linux, the code fails with a UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-18: ordinal not in range(128)
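The message says "can't encode", not "can't decode", which points at the print() call rather than at .decode(). A rough way to reproduce the same kind of failure without involving the terminal is to encode the decoded string as ASCII yourself, which is approximately what printing does when the output stream's encoding is ASCII. `ascii_encode_error` is a hypothetical helper written only for this sketch.

```python
def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by encoding `text` as ASCII, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


# Decoding succeeds everywhere...
text = b'Excel to csv \xe2\x80\x93 Conversion .csv'.decode('utf-8')

# ...but turning the str back into ASCII bytes (roughly what print() must do
# on an ASCII-only stdout) fails at the en dash.
failure = ascii_encode_error(text)
print(failure.encoding, failure.start, failure.end)  # prints: ascii 13 14
```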

I have tried almost everything I found in threads like this, but with no luck. Can anyone help me solve this? I really don't know why this is happening.

  • The decoding goes fine. The problems happen when you try to print. Commented Apr 3, 2020 at 12:58
  • @user2357112 supports Monica, what problem is happening there? Can you please elaborate? Commented Apr 3, 2020 at 13:06
  • Possibly useful stackoverflow.com/a/57224678/5320906, stackoverflow.com/a/54599110/5320906. Commented Apr 3, 2020 at 13:47
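Following the first comment, one way to sidestep the stream's encoding is to encode the text yourself and write the bytes to the binary layer underneath sys.stdout. This is a sketch under the assumption that the failing Linux environment reports an ASCII stdout encoding; `write_utf8` is a hypothetical helper. Alternatives that keep using print() are setting the environment variable PYTHONIOENCODING=utf-8 before starting Python, or, on Python 3.7+, calling sys.stdout.reconfigure(encoding='utf-8').

```python
import io
import sys


def write_utf8(text, binary_stream):
    """Encode `text` as UTF-8 ourselves and write the bytes plus a newline,
    so the result does not depend on a text stream's configured encoding."""
    binary_stream.write(text.encode('utf-8'))
    binary_stream.write(b'\n')
    binary_stream.flush()


text = b'Excel to csv \xe2\x80\x93 Conversion .csv'.decode('utf-8')

# Worth comparing on both machines: print() uses the stdout encoding,
# which is not necessarily the same thing as the filesystem encoding.
print('filesystem encoding:', sys.getfilesystemencoding())
print('stdout encoding:', sys.stdout.encoding)

# Works with any binary stream, e.g. an in-memory buffer.
captured = io.BytesIO()
write_utf8(text, captured)

# sys.stdout.buffer is the underlying binary stream (absent on some
# replacement stdout objects, hence the guard).
buffer = getattr(sys.stdout, 'buffer', None)
if buffer is not None:
    sys.stdout.flush()  # flush pending text before writing raw bytes
    write_utf8(text, buffer)
```

Whether the dash then displays correctly still depends on the terminal itself being set to UTF-8.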

1 Answer


On Windows I found that sys.getfilesystemencoding() returned UTF-8, while on Linux it returned ascii. That explains why the UTF-8 characters in the input string were handled fine on Windows but raised the error on Linux. I got rid of the error by ignoring the non-ASCII characters in the input string, decoding it as below:

# errors='ignore' silently drops the non-ASCII bytes (the en dash); the result is a str
ascii_string = input_string.decode('ascii', errors='ignore')
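For comparison, here is a small sketch of what errors='ignore' does to the sample string, next to two standard error handlers ('replace' and 'backslashreplace') that leave a visible trace of the character instead of dropping it. These alternatives are added for illustration; they are not part of the original answer.

```python
raw = b'Excel to csv \xe2\x80\x93 Conversion .csv'

# The answer's approach: drop every non-ASCII byte. The en dash disappears,
# leaving two adjacent spaces.
ignored = raw.decode('ascii', errors='ignore')

# Alternative: decode as UTF-8 first, then replace each character that
# ASCII cannot represent with '?'.
replaced = raw.decode('utf-8').encode('ascii', errors='replace').decode('ascii')

# Alternative: keep the code point visible as a \uXXXX escape.
escaped = raw.decode('utf-8').encode('ascii', errors='backslashreplace').decode('ascii')

# All three results are pure ASCII, so printing their reprs is safe even on
# an ASCII-only stdout.
for label, value in (('ignore', ignored), ('replace', replaced),
                     ('backslashreplace', escaped)):
    print(label, repr(value))
```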