This is what I found on Wikipedia about the byte 0xff, which is part of the byte order mark (BOM) used in UTF-16.
UTF-16: In UTF-16, a BOM (U+FEFF) may be placed as the first character of a file or character stream to indicate the endianness (byte order) of all the 16-bit code units of the file or stream. If the 16-bit units are represented in big-endian byte order, this BOM character will appear in the sequence of bytes as 0xFE followed by 0xFF. This sequence appears as the ISO-8859-1 characters þÿ in a text display that expects the text to be ISO-8859-1. If the 16-bit units use little-endian order, the sequence of bytes will have 0xFF followed by 0xFE. This sequence appears as the ISO-8859-1 characters ÿþ in a text display that expects the text to be ISO-8859-1. Programs expecting UTF-8 may show these or error indicators, depending on how they handle UTF-8 encoding errors. In all cases they will probably display the rest of the file as garbage (a UTF-16 text containing ASCII only will be fairly readable).
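To check whether a file really starts with one of the byte sequences described in that quote, you can look at its first two bytes. This is only a rough sketch: the helper name `detect_utf16_bom` is made up for illustration, and a UTF-32 little-endian file also begins with 0xFF 0xFE, so a match is a strong hint rather than proof.

```python
import codecs

def detect_utf16_bom(data):
    """Return 'utf-16-be', 'utf-16-le', or None based on the leading bytes.

    Hypothetical helper for illustration; it only inspects the BOM.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes 0xFE 0xFF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes 0xFF 0xFE
        return "utf-16-le"
    return None

# Python's "utf-16" codec prepends a BOM in the machine's native byte order.
sample = u"<root>hi</root>".encode("utf-16")
print(detect_utf16_bom(sample))

# The same two bytes viewed as ISO-8859-1 are the odd characters from the quote.
print(repr(sample[:2].decode("iso-8859-1")))
```

For a real file you would pass in the result of `open(file_path, "rb").read(2)` instead of `sample`.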
So I have two thoughts here:
(1) The file may need to be treated as UTF-16 instead of UTF-8.
(2) The error happens because you are trying to print the whole soup to the screen. Whether that works then depends on whether your IDE's console (Eclipse/PyCharm) can display those Unicode characters.
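For thought (2), one way to test the idea is to make the text safe for whatever encoding the console reports before printing it. The following is a sketch, not a fix for any particular IDE: `safe_for_console` is a hypothetical helper, and it assumes the console's encoding is what `sys.stdout.encoding` says it is.

```python
import sys

def safe_for_console(text, encoding=None):
    """Replace characters the target encoding cannot represent with escapes.

    Hypothetical helper; falls back to ASCII if the console reports no encoding.
    """
    enc = encoding or sys.stdout.encoding or "ascii"
    return text.encode(enc, errors="backslashreplace").decode(enc)

print(sys.stdout.encoding)  # what the console claims to support
print(safe_for_console(u"caf\u00e9 \ufeff", "ascii"))
```

If printing the escaped text works while printing the raw text fails, the problem is the console's encoding rather than the parsing.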
If I were you, I would move on without printing the whole soup and extract only the piece you need, then see whether you have any problem reaching that step. If there is no problem there, it hardly matters that you cannot print the whole soup to the screen.
If you really want to print the soup to the screen, try:
print(soup.prettify(encoding='utf-16'))
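If thought (1) is the cause, another option is to decode the file as UTF-16 when reading it, before the text ever reaches the parser. This is a minimal sketch using only the standard library; the temporary file stands in for your real `file_path` and its contents are invented for the example.

```python
import io
import os
import tempfile

# Create a small UTF-16 file (with BOM) as a stand-in for the real input.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(u"<root>hi</root>".encode("utf-16"))
    file_path = tmp.name

# The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
with io.open(file_path, encoding="utf-16") as f:
    text = f.read()

os.remove(file_path)
print(text)
```

The decoded `text` can then be handed to BeautifulSoup in place of the raw file object.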
BeautifulSoup(open(file_path), "xml") in Eclipse. The exact same code works in IPython Notebook! Both use Anaconda Python 3.6.