I am crawling a particular URL from google.com, but I get the following error:

'utf8' codec can't decode byte 0xc3 in position 72: invalid continuation byte 

Code:

    import re
    import os
    import MySQLdb
    import codecs
    import requests
    import base64
    import random
    import gzip
    import time
    from multiprocessing.pool import Pool
    import datetime
    import time
    import sys

    reload(sys)
    sys.setdefaultencoding('utf-8')

    def proxy_mesh():
        while True:
            try:
                data = requests.get('google.com')
                print data.text.encode('utf-8')
            except Exception, e:
                print e
                print "Trying again"
                time.sleep(3)

    proxy_mesh()

What is the fix, and how can I overcome this error?

  • In other words, you're trying to decode using utf-8 while the encoding was done differently. Commented Mar 23, 2016 at 1:33
  • Can you give the traceback? This could be occurring implicitly in several places. Commented Mar 23, 2016 at 1:37
  • @Mounarajan as suggested in the link I provided, you need to use a different encoding. I can't tell you which one without more information. Commented Mar 23, 2016 at 1:41

1 Answer

Keep it simple and it works. The requests module has already decoded the response body, so data.text is a Unicode string and needs no further decoding.

    import requests

    data = requests.get('https://www.whoisxmlapi.com/whoisserver/WhoisService?domainName=http://N%E2%94%[email protected]&outputFormat=json')
    print data.text
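To see where this kind of error comes from, here is a minimal sketch written in Python 3 syntax (the question's code is Python 2). The sample bytes and the helper name decode_body are made up for illustration, and latin-1 is only an assumed fallback; the real encoding of your data may be something else.

```python
# Hypothetical sample: 0xc3 opens a two-byte UTF-8 sequence, but the next
# byte (0x20, a space) is not a valid continuation byte.
raw = b"caf\xc3 latte"


def decode_body(raw_bytes, fallback="latin-1"):
    """Try UTF-8 first; use the assumed fallback encoding if that fails."""
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this never raises,
        # but the text is only correct if the data really is latin-1.
        return raw_bytes.decode(fallback)


# Decoding with the fallback keeps every byte.
text = decode_body(raw)

# Alternative: stay with UTF-8 and substitute U+FFFD for the bad byte.
lossy = raw.decode("utf-8", errors="replace")

print(repr(text))
print(repr(lossy))
```

If you do have to decode raw bytes yourself, choosing between a fallback encoding and errors="replace" depends on whether you know the source encoding; guessing wrong gives garbled text rather than an exception.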

Since it is a JSON response, you may also want to process it:

    import json

    print json.loads(data.text)
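As a rough sketch of that processing step with some error handling, again in Python 3 syntax: the payload and the helper name parse_json_body are hypothetical, and the sample bytes merely stand in for a real response body.

```python
import json

# Hypothetical response body, as raw bytes.
body = b'{"domainName": "example.com", "status": "ok"}'


def parse_json_body(raw_bytes, encoding="utf-8"):
    """Decode the bytes and parse them as JSON; return None on failure."""
    try:
        return json.loads(raw_bytes.decode(encoding))
    except ValueError:
        # Both UnicodeDecodeError and json.JSONDecodeError are
        # subclasses of ValueError, so one clause covers both cases.
        return None


record = parse_json_body(body)
broken = parse_json_body(b"\xc3 not json")

print(record)
print(broken)
```

With requests itself, the Response object also offers a json() method that performs the decoding and parsing in one call.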