
I am trying to write Python code that downloads web pages using separate threads. Here is an example of my code:

    import urllib2
    from threading import Thread
    import time

    URLs = ['http://www.yahoo.com/',
            'http://www.time.com/',
            'http://www.cnn.com/',
            'http://www.slashdot.org/']

    def thread_func(arg):
        t = time.time()
        page = urllib2.urlopen(arg)
        page = page.read()
        print time.time() - t

    for url in URLs:
        t = Thread(target=thread_func, args=(url,))
        t.start()
        t.join()

When I run the code, the threads seem to execute serially, if I'm not mistaken: each download time is printed to the console one after another, separated by noticeable delays, rather than all at roughly the same time. Am I coding this correctly?

1 Answer


The call to t.join() blocks the calling thread until the target thread finishes. Because you call it immediately after creating each thread, you never have more than one downloader thread running at a time.

Change your code to this:

    threads = []
    for url in URLs:
        t = Thread(target=thread_func, args=(url,))
        t.start()
        threads.append(t)

    # All threads started, now wait for them to finish
    for t in threads:
        t.join()
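The timing effect of where join() is placed can be demonstrated without touching the network. This is a minimal Python 3 sketch in which time.sleep stands in for a download; fake_download, serial_version, and parallel_version are illustrative names, not part of the original code:

```python
import time
from threading import Thread

def fake_download(seconds):
    # Stand-in for urlopen().read(): just block for a while.
    time.sleep(seconds)

def serial_version(delays):
    # join() immediately after start(): only one thread runs at a time,
    # so total time is roughly the SUM of the delays.
    t0 = time.time()
    for d in delays:
        t = Thread(target=fake_download, args=(d,))
        t.start()
        t.join()
    return time.time() - t0

def parallel_version(delays):
    # Start all threads first, then join them all: total time is
    # roughly the MAX of the delays.
    t0 = time.time()
    threads = [Thread(target=fake_download, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t0

delays = [0.2, 0.2, 0.2, 0.2]
print("serial:   %.2fs" % serial_version(delays))
print("parallel: %.2fs" % parallel_version(delays))
```

On a typical run the serial version takes about 0.8 s and the parallel version about 0.2 s, which matches the behavior described in the question and the fix above.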

Comments

Thanks. Just out of curiosity, I'm getting output like this: 3.60282206535 4.05780601501 5.74620199203 9.5616710186... It looks like each download is taking longer than the last, instead of them all taking about the same time. Is this the correct behavior? ...
The threads all start at roughly the same time, but they compete for your network bandwidth. A few threads immediately start making requests, while the others wait until bandwidth frees up, so the later downloads report longer elapsed times.
