
The following two lines of code hang forever:

import urllib2
urllib2.urlopen('https://www.5giay.vn/', timeout=5)

This is with Python 2.7, and I have no http_proxy or any other proxy-related environment variables set. Any other website works fine, and I can also wget this site without any issue. What could be the problem?

  • I see this on both Linux (Amazon AMI) and Mac OS. It also doesn't seem related to DNS, because even this hangs: urllib2.urlopen('210.245.123.158', timeout=1)

1 Answer


If you run

import urllib2
url = 'https://www.5giay.vn/'
urllib2.urlopen(url, timeout=1.0)

wait for a few seconds, and then use C-c to interrupt the program, you'll see

 File "/usr/lib/python2.7/ssl.py", line 260, in read return self._sslobj.read(len) KeyboardInterrupt 

This shows that the program is hanging on self._sslobj.read(len).

SSL timeouts raise socket.timeout.

You can control the delay before socket.timeout is raised by calling socket.setdefaulttimeout(1.0).

For example,

import urllib2
import socket

socket.setdefaulttimeout(1.0)
url = 'https://www.5giay.vn/'
try:
    urllib2.urlopen(url, timeout=1.0)
except IOError as err:
    print('timeout')

% time script.py
timeout

real    0m3.629s
user    0m0.020s
sys     0m0.024s

Note that the requests module succeeds here although urllib2 did not:

import requests
r = requests.get('https://www.5giay.vn/')
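Even so, it is worth passing requests' own timeout argument. Note that, like urllib2's, it bounds the connection attempt and each individual read rather than the entire download, so it does not by itself cap the total call time:

import requests

# timeout applies to the connect and to each read, not the whole transfer
r = requests.get('https://www.5giay.vn/', timeout=5)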

How to enforce a timeout on the entire function call:

socket.setdefaulttimeout only affects how long Python waits before raising an exception when the server has not responded. It applies per socket operation, so a server that keeps trickling data (say, one byte per second) can keep the connection alive indefinitely without ever tripping the timeout.

Neither it nor urlopen(..., timeout=...) enforces a time limit on the entire function call.

To do that, you could use eventlet, as shown here.
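A minimal sketch of the eventlet approach, assuming the eventlet package is installed (the 5-second deadline and the monkey-patching call are illustrative; eventlet.Timeout raises itself when the deadline expires):

import eventlet
eventlet.monkey_patch()  # make socket operations cooperative so the timer can preempt them

import urllib2

url = 'https://www.5giay.vn/'
try:
    # eventlet.Timeout bounds the whole block, not just one socket read
    with eventlet.Timeout(5):
        response = urllib2.urlopen(url)
        print(response.read())
except eventlet.Timeout:
    print('timeout')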

If you don't want to install eventlet, you could use multiprocessing from the standard library, though this solution will not scale as well as an asynchronous solution such as the one eventlet provides.

import urllib2
import socket
import multiprocessing as mp

def timeout(t, cmd, *args, **kwds):
    # run cmd in a single worker process and wait at most t seconds for it
    pool = mp.Pool(processes=1)
    result = pool.apply_async(cmd, args=args, kwds=kwds)
    try:
        retval = result.get(timeout=t)
    except mp.TimeoutError as err:
        # kill the worker so the hung call cannot linger
        pool.terminate()
        pool.join()
        raise
    else:
        return retval

def open(url):
    response = urllib2.urlopen(url)
    print(response)

url = 'https://www.5giay.vn/'
try:
    timeout(5, open, url)
except mp.TimeoutError as err:
    print('timeout')

Running this will either succeed or timeout in about 5 seconds of wall clock time.
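On Unix, a cruder standard-library alternative (not part of the answer above, and only usable from the main thread of a process) is signal.alarm, which interrupts the blocked call with a signal after a fixed number of seconds:

import signal
import urllib2

def _handler(signum, frame):
    # raised inside whatever code is executing when SIGALRM arrives
    raise IOError('total deadline exceeded')

signal.signal(signal.SIGALRM, _handler)
signal.alarm(5)              # deliver SIGALRM in 5 seconds
try:
    response = urllib2.urlopen('https://www.5giay.vn/')
    print(response.read())
except IOError as err:       # note: urllib2's own errors are also IOErrors
    print('timeout: %s' % err)
finally:
    signal.alarm(0)          # cancel the alarm if we finished early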


2 Comments

Thanks for the investigation. With a timeout of 1, it does time out. But if you set timeout=5.0, it hangs forever. Strange!
Thanks. In this case the web server was misconfigured and was sending one character every second, so the per-read timeout never fired, yet the request still took forever.
