
I am trying to write Python code that downloads web pages using separate threads. Here is an example of my code:

    import urllib2
    from threading import Thread
    import time

    URLs = ['http://www.yahoo.com/',
            'http://www.time.com/',
            'http://www.cnn.com/',
            'http://www.slashdot.org/']

    def thread_func(arg):
        t = time.time()
        page = urllib2.urlopen(arg)
        page = page.read()
        print time.time() - t

    for url in URLs:
        t = Thread(target=thread_func, args=(url,))
        t.start()
        t.join()

When I run the code, the threads seem to execute serially, if I'm not mistaken: each download time is printed to the console one after another, separated by noticeable delays, rather than all at roughly the same time. Am I coding this correctly?

1 Answer


The call to t.join() blocks the calling thread until the target thread finishes. Because you call it immediately after creating each thread, you never have more than one downloader thread running at a time.

Change your code to this:

    threads = []
    for url in URLs:
        t = Thread(target=thread_func, args=(url,))
        t.start()
        threads.append(t)

    # All threads started, now wait for them to finish
    for t in threads:
        t.join()
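The timing effect of where join() is placed can be demonstrated without touching the network. This is a minimal Python 3 sketch in which time.sleep stands in for a download; fake_download, serial_version, and parallel_version are illustrative names, not part of the original code:

```python
import time
from threading import Thread

def fake_download(seconds):
    # Stand-in for urlopen().read(): just block for a while.
    time.sleep(seconds)

def serial_version(delays):
    # join() immediately after start(): only one thread runs at a time,
    # so total time is roughly the SUM of the delays.
    t0 = time.time()
    for d in delays:
        t = Thread(target=fake_download, args=(d,))
        t.start()
        t.join()
    return time.time() - t0

def parallel_version(delays):
    # Start all threads first, then join them all: total time is
    # roughly the MAX of the delays.
    t0 = time.time()
    threads = [Thread(target=fake_download, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t0

delays = [0.2, 0.2, 0.2, 0.2]
print("serial:   %.2fs" % serial_version(delays))
print("parallel: %.2fs" % parallel_version(delays))
```

On a typical run the serial version takes about 0.8 s and the parallel version about 0.2 s, which matches the behavior described in the question and the fix above.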

Comments

Thanks. Just out of curiosity, I'm getting output like this: 3.60282206535 4.05780601501 5.74620199203 9.5616710186... It looks like each download is taking longer than the last, instead of them all taking about the same time. Is this the correct behavior? ...
The threads all start at roughly the same time, but they compete for your network bandwidth. A few threads immediately start making requests, while the others wait until bandwidth frees up, so the later downloads report longer elapsed times.
