
I am working on an implementation of a very small library in Python that has to be non-blocking.

In some production code, at some point, a call to this library will be made and it needs to do its own work; in its simplest form it would be a callable that needs to pass some information to a service.

This "passing information to a service" is a non-intensive task, probably sending some data to an HTTP service or something similar. It also doesn't need to be concurrent or to share information, however it does need to terminate at some point, possibly with a timeout.

I have used the threading module before and it seems the most appropriate thing to use, but the application where this library will be used is so big that I am worried about hitting the thread limit.

In local testing I was able to hit that limit at around ~2500 spawned threads.

There is a good possibility (given the size of the application) that I could hit that limit easily. It also makes me wary of using a Queue, given the memory implications of placing tasks in it at a high rate.

I have also looked at gevent, but I couldn't see an example of spawning something that does some work and terminates without joining. The examples I went through were calling .join() on a spawned Greenlet or on an array of greenlets.

I don't need to know the result of the work being done! It just needs to fire off and try to talk to the HTTP service and die with a sensible timeout if it didn't.

Have I misinterpreted the guides/tutorials for gevent? Is there any other way to spawn a callable in a fully non-blocking fashion that can't hit a ~2500 limit?

This is a simple example using threading that works as I would expect:

from threading import Thread
import time

class Synchronizer(Thread):
    def __init__(self, number):
        self.number = number
        Thread.__init__(self)

    def run(self):
        # Simulating some work
        time.sleep(5)
        print self.number

for i in range(4000):  # totally doesn't get past 2,500
    sync = Synchronizer(i)
    sync.setDaemon(True)
    sync.start()
    print "spawned a thread, number %s" % i

And this is what I've tried with gevent, where it obviously blocks at the end to see what the workers did:

import gevent

def task(pid):
    """ Some non-deterministic task """
    gevent.sleep(1)
    print('Task', pid, 'done')

for i in range(100):
    gevent.spawn(task, i)

EDIT: My problem stemmed from my lack of familiarity with gevent. While the Thread code was indeed spawning threads, it also prevented the script from terminating while it did some work.

gevent doesn't really do that in the code above unless you add a .join(). All I had to do to see the gevent code do some work with the spawned greenlets was to make it a long-running process. This definitely fixes my problem, as the code that needs to spawn the greenlets runs within a framework that is itself a long-running process.
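For the record, here is a minimal sketch of the pattern that ended up working for me, assuming the spawning code lives inside a long-running process; the send_to_service and handle_event names are placeholders for illustration, not part of any real API:

import gevent
from gevent import Timeout

def send_to_service(payload):
    # Placeholder for the real "talk to an HTTP service" call,
    # wrapped in a Timeout so the greenlet gives up on its own.
    try:
        with Timeout(5):            # sensible timeout in seconds
            gevent.sleep(1)         # simulate the HTTP round trip
            print('sent', payload)
    except Timeout:
        pass                        # fire-and-forget: drop it silently

def handle_event(payload):
    # Called from the long-running framework; returns immediately.
    gevent.spawn(send_to_service, payload)

# Simulating the long-running host process:
for i in range(10):
    handle_event(i)
gevent.sleep(10)  # the framework's own loop would keep running here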

2 Comments
  • I haven't looked that closely at the code but, please say that you are not continually creating/terminating threads.. '~2500 threads spawned' - I fear the worst.. 'gevent.joinall(workers)' !! Commented Jul 7, 2012 at 18:33
  • Of course not :) I am forcing an extreme high load that tries to spawn as many threads as possible. The actual use case is a single spawn for every event. Commented Jul 7, 2012 at 23:42

3 Answers


Nothing requires you to call join in gevent, if you're expecting your main thread to last longer than any of your workers.

The only reason for the join call is to make sure the main thread lasts at least as long as all of the workers (so that the program doesn't terminate early).
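A minimal sketch of that point (my own illustration, not code from this answer): the spawned greenlets run to completion without any join, as long as the main greenlet yields and stays alive at least as long as they do.

import gevent

def task(pid):
    gevent.sleep(1)
    print('Task', pid, 'done')

workers = [gevent.spawn(task, i) for i in range(5)]

# No join needed: the main greenlet just has to outlive the workers,
# e.g. because it is a long-running server loop. Simulated with a sleep:
gevent.sleep(2)

# If the program exited immediately instead of sleeping (or joining),
# the workers would never get a chance to run.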


1 Comment

I updated my examples but could not get the task to complete even if the process stayed around, care to expand on your response to get gevent to replicate the behavior of Threading?

Why not spawn a subprocess with a connected pipe or similar and then, instead of a callable, just drop your data on the pipe and let the subprocess handle it completely out of band?
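One way to read this suggestion, sketched with the standard multiprocessing module (my own illustration, not the answerer's code; the worker function and the None sentinel are assumptions):

import multiprocessing as mp

def worker(conn):
    # Runs in a separate process: drains whatever the caller drops
    # on the pipe and handles it completely out of band.
    while True:
        payload = conn.recv()
        if payload is None:              # sentinel: shut down
            break
        print('would send %s' % payload)

if __name__ == '__main__':
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=worker, args=(child_conn,))
    proc.daemon = True
    proc.start()

    # The calling code never blocks on the actual work; it just
    # drops its data on the pipe and carries on.
    for i in range(5):
        parent_conn.send(i)
    parent_conn.send(None)
    proc.join(timeout=5)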



As explained in Understanding Asynchronous/Multiprocessing in Python, the asyncoro framework supports asynchronous, concurrent processes. You can run tens or hundreds of thousands of concurrent processes; for reference, running 100,000 simple processes takes about 200 MB. If you want to, you can mix threads in the rest of the system with coroutines in asyncoro (provided threads and coroutines don't share variables, but use coroutine interface functions to send messages, etc.).

