I was reading an article on Python multithreading using queues and have a basic question.

Based on the print statements, 5 threads are started as expected. So how does the queue work?

1. A thread is started initially. When the queue is populated with an item, does the thread get restarted and start processing that item?
2. If we use the queue and the threads process the queue item by item, how is there an improvement in performance? Isn't that similar to serial processing, i.e. one by one?

    import Queue
    import threading
    import urllib2
    import datetime
    import time

    hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
             "http://ibm.com", "http://apple.com"]

    queue = Queue.Queue()

    class ThreadUrl(threading.Thread):
        def __init__(self, queue):
            threading.Thread.__init__(self)
            print 'threads are created'
            self.queue = queue

        def run(self):
            while True:
                #grabs host from queue
                print 'thread starting to run'
                now = datetime.datetime.now()
                host = self.queue.get()

                #grabs urls of hosts and prints first 20 bytes of page
                url = urllib2.urlopen(host)
                print 'host=%s ,threadname=%s' % (host, self.getName())
                print url.read(20)

                #signals to queue job is done
                self.queue.task_done()

    start = time.time()

    if __name__ == '__main__':
        #spawn a pool of threads, and pass them queue instance
        print 'program start'
        for i in range(5):
            t = ThreadUrl(queue)
            t.setDaemon(True)
            t.start()

        #populate queue with data
        for host in hosts:
            queue.put(host)

        #wait on the queue until everything has been processed
        queue.join()
        print "Elapsed Time: %s" % (time.time() - start)

1 Answer

A queue is similar to a list container, but with internal locking to make it a thread-safe way to communicate data.
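To make the thread-safety point concrete, here is a minimal sketch, assuming Python 3 (where the `Queue` module is named `queue`). The function `fill_from_threads` and its parameters are made up for illustration. Several threads put items on one shared queue without any explicit lock, and no item is lost:

```python
import queue
import threading


def fill_from_threads(num_threads=4, items_per_thread=1000):
    """Let several threads put items on one shared queue with no explicit lock."""
    q = queue.Queue()

    def producer(thread_id):
        for n in range(items_per_thread):
            q.put((thread_id, n))  # Queue.put() does its own locking internally

    threads = [threading.Thread(target=producer, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every producer has finished

    return q.qsize()  # exact here, because no thread is touching the queue anymore


print(fill_from_threads())  # 4000
```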

When you start your threads, they all block on the self.queue.get() call, waiting to pull an item from the queue. They are not restarted: when your main thread puts an item into the queue, one of the waiting threads is unblocked and receives the item. It processes the item, then loops back to self.queue.get() and blocks again until another item arrives.
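The blocking behaviour can be sketched roughly like this, again assuming Python 3's `queue` module. The helper `run_workers` and the item values are hypothetical. The workers are started once, sit idle in `get()` while the queue is empty, and are still the same live threads when the items arrive:

```python
import queue
import threading
import time


def run_workers(items, num_workers=3):
    q = queue.Queue()
    handled = []                      # (worker name, item) pairs
    handled_lock = threading.Lock()

    def worker():
        while True:
            item = q.get()            # blocks here until an item is available
            with handled_lock:
                handled.append((threading.current_thread().name, item))
            q.task_done()             # tell the queue this item is finished

    workers = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for w in workers:
        w.start()

    time.sleep(0.1)                   # queue is still empty: workers wait in q.get()
    alive_while_idle = all(w.is_alive() for w in workers)

    for item in items:
        q.put(item)                   # each put lets one waiting worker proceed
    q.join()                          # returns once task_done() was called per item

    return alive_while_idle, handled


alive, handled = run_workers(["a", "b", "c", "d"])
print(alive)                          # True: nothing was restarted
for name, item in handled:
    print(name, "handled", item)
```

Which worker receives which item is up to the scheduler, so the names printed next to each item can differ between runs.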

All of your threads can run concurrently because each one is able to receive items from the queue. This is where you see the improvement in performance: if urlopen and read take time in one thread and it is waiting on I/O, another thread can do work in the meantime. The queue object's job is simply to manage the locked access and hand items off to the callers.
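A rough way to see the difference without hitting the network is to replace the download with a short sleep. This is only a sketch under that assumption: `fake_fetch` is a made-up stand-in for `urlopen(host).read()`, the 0.2 second delay is arbitrary, and the code uses Python 3's `queue` module.

```python
import queue
import threading
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]


def fake_fetch(host, delay=0.2):
    """Hypothetical stand-in for a download: just waits, like a thread blocked on I/O."""
    time.sleep(delay)
    return host


def fetch_serial(hosts):
    start = time.perf_counter()
    for host in hosts:
        fake_fetch(host)              # each wait happens one after the other
    return time.perf_counter() - start


def fetch_threaded(hosts, num_workers=5):
    q = queue.Queue()

    def worker():
        while True:
            host = q.get()
            fake_fetch(host)          # the waits of different workers overlap
            q.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    start = time.perf_counter()
    for host in hosts:
        q.put(host)
    q.join()
    return time.perf_counter() - start


serial_time = fetch_serial(hosts)
threaded_time = fetch_threaded(hosts)
print("serial:   %.2f s" % serial_time)
print("threaded: %.2f s" % threaded_time)
```

With five workers and five hosts, the serial version waits for the delays one after another, while the threaded version waits for them at roughly the same time. Exact timings depend on the machine.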
