
Let's say I have a function:

    from time import sleep

    def doSomethingThatTakesALongTime(number):
        print number
        sleep(10)

and then I call it in a for loop:

    for number in range(10):
        doSomethingThatTakesALongTime(number)

How can I set this up so that it only takes 10 seconds TOTAL to print out:

$ 0123456789 

instead of taking 100 seconds? If it helps, I'm going to use the information you provide to do asynchronous web scraping: I have a list of sites I want to visit, but I want to visit them simultaneously rather than wait for each one to complete.

  • Guaranteed 10 seconds? You would need an RTOS for that. If you just want to do async I/O, look into the threading package (a minimal sketch follows after these comments) or into an async/eventlet solution like fedosov mentioned. Commented Jul 12, 2012 at 18:46
  • @jessh, did any of the proposed solutions help you? Commented Jul 13, 2012 at 12:27
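
For reference, here is a minimal sketch of the threading approach suggested in the comments, using only the standard library. Each call runs in its own thread, so the ten 10-second sleeps overlap and the whole loop finishes in roughly 10 seconds:

    from time import sleep
    from threading import Thread

    def doSomethingThatTakesALongTime(number):
        print number
        sleep(10)

    # Start all ten threads first, then wait for each one; the sleeps
    # run concurrently, so the batch takes about 10 seconds total.
    threads = [Thread(target=doSomethingThatTakesALongTime, args=(n,))
               for n in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()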

4 Answers


Try Eventlet; the first example in its documentation shows how to implement simultaneous URL fetching:

    import eventlet
    from eventlet.green import urllib2

    urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
            "https://wiki.secondlife.com/w/images/secondlife.jpg",
            "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

    def fetch(url):
        return urllib2.urlopen(url).read()

    pool = eventlet.GreenPool()
    for body in pool.imap(fetch, urls):
        print "got body", len(body)

I can also advise looking at Celery for a more flexible solution.
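
To give a sense of what the Celery route looks like, here is a minimal sketch (not from the original answer): the broker URL and the "tasks" module name are illustrative, and it assumes a local Redis broker and a running worker:

    import urllib2
    from celery import Celery

    # Illustrative broker/backend URLs; assumes a local Redis server.
    app = Celery('tasks', broker='redis://localhost:6379/0',
                 backend='redis://localhost:6379/0')

    @app.task
    def fetch(url):
        return urllib2.urlopen(url).read()

With a worker running (celery -A tasks worker), the fetches can be dispatched concurrently and the bodies collected afterwards:

    results = [fetch.delay(url) for url in urls]
    bodies = [result.get() for result in results]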



asyncoro supports asynchronous, concurrent programming. It includes an asynchronous (non-blocking) socket implementation. If your implementation does not need urllib/httplib etc. (which don't have asynchronous completions), it may fit your purpose, and it is easy to use, as it is very similar to programming with threads. Your problem above with asyncoro:

    import asyncoro

    def do_something(number, coro=None):
        print number
        yield coro.sleep(10)

    for number in range(10):
        asyncoro.Coro(do_something, number)



Take a look at the Scrapy framework. It's designed specifically for web scraping and is very good. It is asynchronous and built on the Twisted framework.

http://scrapy.org/
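
As a rough illustration of what a Scrapy spider looks like, a minimal sketch (the class name and URL are placeholders; older Scrapy releases spelled the base class BaseSpider):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"
        # Scrapy's Twisted-based downloader fetches these concurrently.
        start_urls = ["http://scrapy.org/"]

        def parse(self, response):
            self.log("got body %d" % len(response.body))

Run it with scrapy runspider; the downloader keeps many requests in flight at once, so a list of sites is visited simultaneously rather than one after another.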



Just in case, here is exactly how to apply green threads to your example snippet:

    from eventlet.green.time import sleep
    from eventlet.greenpool import GreenPool

    def doSomethingThatTakesALongTime(number):
        print number
        sleep(10)

    pool = GreenPool()
    for number in range(100):
        pool.spawn_n(doSomethingThatTakesALongTime, number)

    import timeit
    print timeit.timeit("pool.waitall()", "from __main__ import pool")
    # yields: 10.9335260363

