I am building a Celery + Django + Selenium application and running Selenium-based browsers in separate processes with the help of Celery. Versions:
celery==5.2.6
redis==3.4.1
selenium-wire==5.1.0
Django==4.0.4
djangorestframework==3.13.1

I found out that after several hours the application generates thousands of zombie processes. I also found out that the problem is related to the Celery Docker container, because after

sudo /usr/local/bin/docker-compose -f /data/new_app/docker-compose.yml restart celery

I have 0 zombie processes.
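For reference, the zombie count inside the container can be tracked with a small helper like the one below. This is only a monitoring sketch and assumes psutil is available in the worker image (it is not part of the requirements listed above):

import psutil

def count_zombies() -> int:
    """Count processes currently in the zombie (defunct) state."""
    zombies = 0
    for proc in psutil.process_iter(attrs=["status"]):
        try:
            if proc.info["status"] == psutil.STATUS_ZOMBIE:
                zombies += 1
        except psutil.NoSuchProcess:
            continue  # process exited between listing and inspection
    return zombies

if __name__ == "__main__":
    print(f"zombie processes: {count_zombies()}")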
My code
from rest_framework.decorators import api_view

@api_view(['POST'])
def periodic_check_all_urls(request):
    # web-service endpoint
    ...
    check_urls.delay(parsing_results_ids)  # call celery task

Celery task code:
import traceback
from datetime import datetime
from typing import List

from celery import shared_task

@shared_task()
def check_urls(parsing_result_ids: List[int]):
    """
    Run the Selenium-based parser;
    the parser extracts data and saves it to the database.
    """
    try:
        logger.info(f"{datetime.now()} Start check_urls")
        parser = Parser()  # open selenium browser
        parsing_results = ParsingResult.objects.filter(
            pk__in=parsing_result_ids
        ).exclude(status__in=["DONE", "FAILED"])
        parser.check_parsing_result(parsing_results)
    except Exception as e:
        full_trace = traceback.format_exc()
    finally:
        if 'parser' in locals():
            parser.stop()

Selenium browser stop function and destructor:
class Parser():
    def __init__(self):
        """Prepare parser"""
        if not USE_GUI:
            self.display = Display(visible=0, size=(800, 600))
            self.display.start()
        """
        Replaced with Firefox:
        self.driver = get_chromedriver(proxy_data)
        """
        proxy_data = { ... }
        self.driver = get_firefox_driver(proxy_data=proxy_data)

    def __del__(self):
        self.stop()

    def stop(self):
        try:
            self.driver.quit()
            logger.info("Selenium driver closed")
        except:
            pass
        try:
            self.display.stop()
            logger.info("Display stopped")
        except:
            pass

I also tried several settings to limit the Celery task's resources and running time, but it didn't help with the zombie processes.
My Celery settings in Django settings.py:
# celery settings (document generation)
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER", "redis://redis:6379/0")
CELERY_RESULT_BACKEND = os.environ.get("CELERY_BROKER", "redis://redis:6379/0")
CELERY_IMPORTS = ("core_app.celery",)
CELERY_TASK_TIME_LIMIT = 10 * 60

My Celery settings in docker-compose:
celery:
  build: ./project
  command: celery -A core_app worker --loglevel=info --concurrency=15 --max-memory-per-child=1000000
  volumes:
    - ./project:/usr/src/app
    - ./project/media:/project/media
    - ./project/logs:/project/logs
  env_file:
    - .env
  environment:
    # environment variables declared in the environment section override env_file
    - DJANGO_ALLOWED_HOSTS=localhost 127.0.0.1 [::1]
    - CELERY_BROKER=redis://redis:6379/0
    - CELERY_BACKEND=redis://redis:6379/0
  depends_on:
    - django
    - redis

I read Django/Celery - How to kill a celery task? but it didn't help.
I also read Celery revoke leaving zombie ffmpeg process, but my task already contains a try/except/finally block.
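The kind of extra cleanup I could add to the task's finally block would look roughly like this. It is only a sketch: it assumes psutil is installed, kill_leftover_children is a hypothetical helper that is not in my code yet, and I do not know whether it would actually prevent the zombies:

import psutil

def kill_leftover_children() -> None:
    """Terminate any child processes the worker still owns after parser.stop()."""
    current = psutil.Process()
    children = current.children(recursive=True)
    for child in children:
        try:
            child.terminate()  # ask politely first
        except psutil.NoSuchProcess:
            pass
    # wait_procs() also reaps direct children that are already defunct
    gone, alive = psutil.wait_procs(children, timeout=5)
    for child in alive:
        try:
            child.kill()  # force-kill anything that ignored terminate()
        except psutil.NoSuchProcess:
            pass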
Example of zombie processes
ps aux | grep 'Z'
root     32448  0.0  0.0      0     0 ?        Z    13:45   0:00 [Utility Process] <defunct>
root     32449  0.0  0.0      0     0 ?        Z    13:09   0:00 [Utility Process] <defunct>
root     32450  0.0  0.0      0     0 ?        Z    11:13   0:00 [sh] <defunct>
root     32451  0.0  0.0      0     0 ?        Z    13:44   0:00 [Utility Process] <defunct>
root     32452  0.0  0.0      0     0 ?        Z    10:12   0:00 [Utility Process] <defunct>
root     32453  0.0  0.0      0     0 ?        Z    09:52   0:00 [sh] <defunct>
root     32454  0.0  0.0      0     0 ?        Z    10:40   0:00 [Utility Process] <defunct>
root     32455  0.0  0.0      0     0 ?        Z    09:52   0:00 [Utility Process] <defunct>
root     32456  0.0  0.0      0     0 ?        Z    10:13   0:00 [sh] <defunct>
root     32457  0.0  0.0      0     0 ?        Z    10:51   0:00 [Utility Process] <defunct>
root     32459  0.0  0.0      0     0 ?        Z    14:01   0:00 [Utility Process] <defunct>
root     32460  0.0  0.0      0     0 ?        Z    13:16   0:00 [Utility Process] <defunct>
root     32461  0.0  0.0      0     0 ?        Z    10:40   0:00 [Utility Process] <defunct>
root     32462  0.0  0.0      0     0 ?        Z    10:12   0:00 [Utility Process] <defunct>
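The defunct entries look like browser helper processes ([Utility Process]) and shells ([sh]). To see which process they are still parented to, something like the following can be run inside the container (again only a sketch that assumes psutil is installed):

import psutil

# List every zombie together with its parent, to check whether the Celery
# worker (or PID 1 in the container) is the process that fails to reap them.
for proc in psutil.process_iter(attrs=["pid", "name", "status"]):
    if proc.info["status"] != psutil.STATUS_ZOMBIE:
        continue
    parent = proc.parent()  # may be None if the parent has already exited
    parent_info = f"{parent.pid} ({parent.name()})" if parent else "unknown"
    print(f"zombie {proc.info['pid']} ({proc.info['name']}) <- parent {parent_info}")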