This is my first time using BeautifulSoup.
Basically, I use BeautifulSoup to extract data. I am trying to build a CSV table from a web table, and an example row of my table looks like this:
[<td>1</td>, <td> Chief executives and senior officials</td>, <td>£120,830</td>, <td>-3.8</td>]
Now, the problem is that when I use .text.encode('utf8'), the output becomes:
('1', ' Chief executives and senior officials', '\xc2\xa3120,830', '-3.8')
The figure £120,830 becomes \xc2\xa3120,830, and I have no idea what kind of encoding this is. Is there a way to get the proper output £120,830 rather than this crazy encoding?
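For reference, \xc2\xa3 is the UTF-8 encoding of the pound sign (U+00A3): .encode('utf8') turns the text into bytes, and a printed tuple shows the repr of those bytes rather than the glyph. A minimal Python 3 sketch, assuming the cell text is already a Unicode string (as .text returns it); the helper name utf8_bytes is hypothetical:

```python
# The pound sign U+00A3 takes two bytes in UTF-8: 0xC2 0xA3.
cell_text = "\u00a3120,830"  # same as "£120,830", a str (Unicode text)


def utf8_bytes(text: str) -> bytes:
    """Hypothetical helper: encode text to UTF-8, as .encode('utf8') does."""
    return text.encode("utf-8")


raw = utf8_bytes(cell_text)
print(repr(raw))                         # b'\xc2\xa3120,830'
print(raw.decode("utf-8") == cell_text)  # True: decoding restores the text
```

So the bytes are not corrupted; they only need to be decoded (or never encoded in the first place) to display as £120,830.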
Alternatively, is there a way to make this crazy encoded thing \xc2\xa3120,830 look like £120,830 in my CSV? Does anyone know how to deal with this kind of problem?
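Assuming Python 3, the csv module can write the text directly if the file is opened with an explicit encoding, so no .encode() call is needed. In this sketch the names write_csv, read_csv and salaries.csv are hypothetical. The "utf-8-sig" encoding is one choice: it prepends a byte-order mark, which Excel generally relies on to recognise UTF-8. Plain "utf-8" would also produce a valid file.

```python
import csv
import os
import tempfile

# Example row kept as str (Unicode); nothing is encoded by hand.
rows = [("1", " Chief executives and senior officials", "\u00a3120,830", "-3.8")]


def write_csv(path, rows):
    """Hypothetical helper: write rows of text to a UTF-8 CSV file."""
    # newline="" lets the csv module control line endings, as its docs advise.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


def read_csv(path):
    """Hypothetical helper: read the file back; "utf-8-sig" strips the BOM."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return [tuple(r) for r in csv.reader(f)]


path = os.path.join(tempfile.mkdtemp(), "salaries.csv")
write_csv(path, rows)
print(read_csv(path) == rows)  # True
```

Because "£120,830" contains a comma, the writer quotes that field, and the reader removes the quotes again.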
Another alternative is to remove the <td> tags and keep only the content, but how can I do that in Python? Is there an efficient way of getting rid of these tags? Any help will be appreciated. Thanks
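Within BeautifulSoup, calling .get_text() (or .text) on each <td> already returns the cell contents without the tags, as a Unicode string. As a dependency-free illustration, the sketch below swaps in the standard library's html.parser.HTMLParser instead of BeautifulSoup. It assumes Python 3, and the names CellTextParser and cell_texts are hypothetical:

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Hypothetical parser: collect the text inside each <td>, dropping tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &pound; -> £
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # keep only the text between the tags


def cell_texts(html):
    """Return the text of every <td> in the given HTML fragment."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


row_html = ("<tr><td>1</td><td> Chief executives and senior officials</td>"
            "<td>\u00a3120,830</td><td>-3.8</td></tr>")
cells = cell_texts(row_html)
print(len(cells))  # 4
```

The resulting strings can be passed straight to a CSV writer opened with a UTF-8 encoding.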