I was trying to encode a plus-minus symbol in Python 2.7 between two numbers (e.g. 10 ± 8.9).

From looking at the Python documentation, I found I needed to encode the plus-minus symbol in UTF-8 rather than the standard ASCII.

Here is a short example highlighting the issue I found, taking the Unicode value for plus-minus from Wikipedia:

    >>> plusminus = u'\u00b1'
    >>> print(plusminus)  # All seems fine, but this is in ASCII format
    ±
    >>> plusminus.encode('utf-8')  # Two symbols are output. This is strange!
    '\xc2\xb1'
    >>> print(plusminus.encode('utf-8'))  # Yep, two symbols were encoded from one Unicode character
    ±
    >>> print(u'\xb1')  # Partial solution is to accept the latter symbol
    ±

Even though I have solved the issue (sort of) by taking the latter symbol, it seems strange that the encoding would output two symbols. I assume I am doing something wrong here, but I can't find any other examples of this happening.

Here are the questions I have:

1) What am I doing wrong here?

2) Is there a way to encode symbols (e.g. plus-minus) in UTF-8 directly without the additional symbols?

  • UTF-8 is a multibyte encoding. Unicode code points in the range U+0080 to U+07FF will be encoded in two bytes. Commented Oct 22, 2018 at 17:12
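To make the comment above concrete, here is a minimal sketch (not from the original post) that encodes the same code point two ways and prints the resulting bytes as hex. The variable names are illustrative, and the snippet is written to run on both Python 2.7 and Python 3.

```python
plusminus = u'\u00b1'  # a single code point, U+00B1

# UTF-8 needs two bytes for anything in U+0080..U+07FF
utf8_bytes = plusminus.encode('utf-8')
# ISO-8859-15 is a single-byte encoding; plus-minus sits at 0xB1
latin_bytes = plusminus.encode('iso-8859-15')

# bytearray yields integers on both Python 2.7 and Python 3
utf8_hex = ['%02x' % b for b in bytearray(utf8_bytes)]
latin_hex = ['%02x' % b for b in bytearray(latin_bytes)]

print(len(plusminus))  # 1
print(utf8_hex)        # ['c2', 'b1']
print(latin_hex)       # ['b1']
```

So the two "symbols" are the two bytes of one correctly encoded character. They only look like separate characters when something reads those bytes with a single-byte encoding.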

1 Answer

I found the root cause of my issue: the terminal I was using was set to the 'iso-8859-15' encoding. Changing the encoding used in Python to match the terminal encoding fixed the issue, and a ± was output correctly.
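The answer does not say exactly how the encoding was changed, so the following is only a sketch of one way to do it. It assumes the terminal's encoding is reported by `sys.stdout.encoding`, and `encode_for_terminal` is a hypothetical helper name, not something from the original post.

```python
import sys


def encode_for_terminal(text, encoding=None, fallback='utf-8'):
    """Hypothetical helper: encode text with the terminal's encoding.

    If no encoding is given, use whatever sys.stdout reports, falling
    back to UTF-8 when stdout reports nothing (e.g. when piped).
    """
    if encoding is None:
        encoding = getattr(sys.stdout, 'encoding', None) or fallback
    # 'replace' avoids an exception if the terminal cannot show a character
    return text.encode(encoding, 'replace')


value = u'10 \u00b1 8.9'

# One byte for the symbol on an ISO-8859-15 terminal, two bytes in UTF-8
as_latin = encode_for_terminal(value, 'iso-8859-15')
as_utf8 = encode_for_terminal(value, 'utf-8')

print(repr(as_latin))
print(repr(as_utf8))
```

Passing the encoding explicitly, as in the last lines, makes the difference easy to see. Leaving it out lets the helper pick up whatever the current terminal reports.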

1 Comment

Hi! Are you using OS X? I have the same problem, but the encodings seem to be set correctly.