77

The sequence I would like to accomplish:

  1. A user clicks a button on a web page
  2. Some functions in model.py start to run, for example gathering some data by crawling the internet
  3. When the functions are finished, the results are returned to the user.

Should I open a new thread inside of model.py to execute my functions? If so, how do I do this?

3
  • 1
    What are you trying to accomplish? Maybe you can do that with frontend technologies like AJAX, WebSocket, magic pony... Commented Jul 11, 2013 at 19:27
  • What is magic pony? Can't find it on Google... Commented Jul 11, 2013 at 19:59
  • Possible duplicate of Multithreading for Python Django Commented Apr 29, 2018 at 7:02

3 Answers

86

As shown in this answer, you can use the threading package to perform an asynchronous task. Everyone seems to recommend Celery, but it is often overkill for simple but long-running tasks. I think it's actually easier and more transparent to use threading.

Here's a simple example of running a crawler asynchronously:

# views.py
import threading

from django.http import JsonResponse

from .models import Crawl

def startCrawl(request):
    task = Crawl()
    task.save()
    t = threading.Thread(target=doCrawl, args=[task.id])
    t.setDaemon(True)
    t.start()
    return JsonResponse({'id': task.id})

def checkCrawl(request, id):
    task = Crawl.objects.get(pk=id)
    return JsonResponse({'is_done': task.is_done, 'result': task.result})

def doCrawl(id):
    task = Crawl.objects.get(pk=id)
    result = ...  # Do crawling, etc.
    task.result = result
    task.is_done = True
    task.save()

Your front end can make a request to startCrawl to start the crawl, then poll it with an Ajax request to checkCrawl, which will return true and the result when it's finished.
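The start/poll pattern can be sketched outside Django as well. Below is a minimal stand-alone illustration where a plain dict stands in for the Crawl model and the function names are hypothetical, not part of the original answer:

```python
import threading
import time

tasks = {}  # stands in for the Crawl table: task id -> record

def do_crawl(task_id):
    time.sleep(0.1)  # simulate a long-running crawl
    tasks[task_id]['result'] = 'crawled data'
    tasks[task_id]['is_done'] = True

def start_crawl():
    task_id = len(tasks) + 1
    tasks[task_id] = {'is_done': False, 'result': None}
    t = threading.Thread(target=do_crawl, args=[task_id], daemon=True)
    t.start()
    return task_id  # handed back to the client immediately

def check_crawl(task_id):
    record = tasks[task_id]
    return {'is_done': record['is_done'], 'result': record['result']}

task_id = start_crawl()
print(check_crawl(task_id))  # most likely still {'is_done': False, ...}
time.sleep(0.5)
print(check_crawl(task_id))  # now the finished record with its result
```

The key point is that start_crawl returns before the work is done; the client learns the outcome only by polling check_crawl.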


Update for Python3:

The documentation for the threading library recommends passing the daemon property as a keyword argument rather than using the setter:

t = threading.Thread(target=doCrawl, args=[task.id], daemon=True)
t.start()

Update for Python <3.7:

As discussed here, this bug can cause a slow memory leak that can overflow a long running server. The bug was fixed for Python 3.7 and above.
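On Python 3 another option, not from the original answer, is concurrent.futures: a module-level ThreadPoolExecutor reuses a fixed set of worker threads instead of spawning a new thread per request. A minimal sketch with a placeholder task:

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool for the whole process; workers are reused across requests.
executor = ThreadPoolExecutor(max_workers=4)

def do_crawl(task_id):
    # Placeholder for the real crawling work.
    return task_id * 2

# A view would call submit() and return at once; the Future
# can be resolved later (or the result written to the database).
future = executor.submit(do_crawl, 21)
print(future.result())  # 42
```

Calling future.result() blocks, so in a view you would instead let the task persist its own result, exactly as doCrawl does above.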


10 Comments

wouldn't the process created for serving the web request run until the thread is finished?
@SandeepBalagopal That's a good point, and you're probably right, but you still return a response to the user before that process and the daemon process exit. Since the maximum number of processes is an OS level issue I suppose your architecture will determine the limit of the feasibility of this solution. A messaging queue is more robust in that sense or maybe you could use the queue library docs.python.org/3.7/library/queue.html
I'm using that exact method for my webpage and there haven't been any issues (yet).
@Flimm that's a good question. Thread safety concerns accessing values in memory from separate threads. Your question is more related to concurrent database access. Django is inherently built for handling concurrent requests on multiple threads and/or processes that all may access the database. So it seems to me that the ORM should be able to handle the concurrency of the threading library as well.
The Django team introduced asynchronous support in v4.0. Would this be relevant, in regards to the async_to_sync etc. methods when handling databases? Asynchronous Support
36
  1. Yes it can multi-thread, but generally one uses Celery to do the equivalent. You can read about how in the celery-django tutorial.
  2. It is rare that you actually want to force the user to wait for the website; while possible, it risks a timeout.

Here's an example of what you're describing.

User sends request
Django receives => spawns a thread to do something else
main thread finishes && other thread finishes
... (later, upon completion of both tasks)
response is sent to user as a package

Better way:

User sends request
Django receives => lets Celery know "hey! do this!"
main thread finishes
response is sent to user
... (later) user receives balance of transaction
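The shape of this flow, hand the job to a separate consumer and return at once, can be illustrated without Celery itself (which needs a broker). A stand-in sketch using the stdlib queue module and a worker thread; all names here are illustrative:

```python
import queue
import threading

jobs = queue.Queue()
results = {}

def worker():
    # Stands in for the Celery worker process.
    while True:
        job_id, payload = jobs.get()
        results[job_id] = payload.upper()  # the "long" task
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# The view enqueues the job and returns immediately ("hey! do this!").
jobs.put((1, 'transaction'))

jobs.join()  # only for this demo; a real view would not wait
print(results[1])  # TRANSACTION
```

Celery adds persistence, retries, and separate worker processes on top of this basic producer/consumer idea.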

13 Comments

Celery is overkill for many purposes. Please stop recommending it as the magic bullet for anything that needs to not block request/response. It's like recommending an RDBMS whenever anyone asks how to store a line of text.
@andybak Feel free to suggest an alternative. To me, this sounds like a legit use.
depends on the specifics but you can just spawn a thread and poll for completion, you can use a simple cron job that checks for tasks, or if you do need more features, you can use one of several 'not as complex as celery' projects such as huey or django-background-tasks.
Celery is too heavyweight in many cases and should not be the fallback position for requests involving async work. If an async transaction is going to kill a minute of CPU time, fine, go Celery. When a user logs in, I want to pull certain user data to memcache so that I can access it quickly as they navigate my system. For this, Celery sucks. I don't want the user login page to block while that caching takes place, though. Django is great for some things, but if you depend on sequential, external RPCs (ORM, memcache, etc.), it will flush cycles/memory down the toilet with reckless abandon.
(If you have other suggestions, then please make them answers. I recommended something which I had seen work in the past, it may be outdated, and it may be a battle-axe for a hangnail, but it happened to work. One of the major reasons we have this site is so that people can propose alternate answers and not just one-off on comments).
-3

If you don't want to add some overkill framework to your project, you can simply use subprocess.Popen:

import subprocess

from django.http import HttpResponse

def my_command(request):
    command = '/my/command/to/run'  # Can even be 'python manage.py somecommand'
    subprocess.Popen(command, shell=True)
    command = '/other/command/to/run'
    subprocess.Popen(command, shell=True)
    return HttpResponse(status=204)

[edit] As discussed in the comments, Popen itself does not wait for the commands: it launches both in parallel and the HttpResponse is returned without blocking on them (call .wait() on each process if you do need to block), though some WSGI setups reportedly keep the request open until the children exit.
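For reference, this is how Popen's blocking behavior works in isolation, outside any WSGI server. The child command here just sleeps via the Python interpreter so the sketch is self-contained:

```python
import subprocess
import sys
import time

start = time.monotonic()
# Popen returns as soon as the child process is spawned.
p = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)'])
spawn_time = time.monotonic() - start
print(spawn_time < 1)  # True: we did not wait for the child

p.wait()  # this is the call that actually blocks
total = time.monotonic() - start
print(total >= 1)  # True: now the full second has elapsed
```

So whether the view blocks depends on whether anything in the request path waits on the child processes, not on Popen itself.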

2 Comments

This doesn't work (at least on my fairly standard setup of django + uwsgi + nginx) to quickly return the HTTP Response while launching a long-running task to churn in the background. It instead launches the subprocess, but will not return the HTTP Response until after the subprocess terminates (even if you add '&' at the end of the command). Further, if the webserver times out it kills the process which will not finish. E.g., try the command with /bin/sleep 15 (will take 15 seconds) or /bin/sleep 60 or /bin/sleep 900 && echo 'hello' > /tmp/tmptest123 (will timeout and not finish).
Indeed, but that is not what OP asked for. subprocess will let you run multi-threaded functions and return the http response after those are completed.
