A custom App Engine flexible environment fails to start up, and it seems to be due to failing health checks. The app has a few custom dependencies (e.g. PostGIS, GDAL), so it adds a few layers on top of the App Engine base image. It builds successfully and runs locally in a Docker container, but the deployment fails with:
    ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.

The Dockerfile looks as follows (note: there is no CMD, as the entrypoint is defined in docker-compose.yml and app.yaml):
    FROM gcr.io/google-appengine/python
    ENV PYTHONUNBUFFERED 1
    ENV DEBIAN_FRONTEND noninteractive
    RUN apt -y update && apt -y upgrade \
        && apt-get install -y software-properties-common \
        && add-apt-repository -y ppa:ubuntugis/ppa \
        && apt -y update \
        && apt-get -y install gdal-bin libgdal-dev python3-gdal \
        && apt-get autoremove -y \
        && apt-get autoclean -y \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*
    ADD requirements.txt /app/requirements.txt
    RUN python3 -m pip install -r /app/requirements.txt
    ADD . /app/
    WORKDIR /app

This unfortunately creates an image of a whopping 1.58 GB, but the original gcr.io Python image starts at 1.05 GB, so I don't think the size of the image would or should be a problem.
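As a quick sanity check that the custom GDAL layer actually works, I run something like the following inside the built container (a minimal sketch; check_gdal.py is a throwaway name, and it assumes the osgeo bindings installed via python3-gdal are importable by the interpreter that will run the app):

    # check_gdal.py - throwaway sanity check that the GDAL bindings load in the image.
    # Assumes python3-gdal (or a GDAL package from requirements.txt) provides osgeo
    # to the interpreter that gunicorn will use.
    from osgeo import gdal

    print("GDAL version:", gdal.VersionInfo("RELEASE_NAME"))
    print("Drivers registered:", gdal.GetDriverCount())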
Running this locally with the following docker-compose.yml config beautifully spins up a container in no time:
version: "3" services: web: build: . command: gunicorn gisapplication.wsgi --bind 0.0.0.0:8080 So, I would have expected the following app.yaml would do the trick:
    runtime: custom
    env: flex
    entrypoint: gunicorn -b :$PORT gisapplication.wsgi
    beta_settings:
      cloud_sql_instances: <sql-db-connection>
    runtime_config:
      python_version: 3

No luck. As per the error above, it seemed to have something to do with the readiness check, so I tried increasing the timeout for the app to start (to 15 minutes!). There seem to have been some issues with health checks previously, and rolling back to legacy health checks is not a solution as of September 2019.
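For reference, gisapplication.wsgi, which both the compose command and the app.yaml entrypoint hand to gunicorn, is the stock Django WSGI module (a sketch assuming the default project layout; the settings module name is an assumption):

    # gisapplication/wsgi.py - the module gunicorn loads; standard Django boilerplate.
    import os

    from django.core.wsgi import get_wsgi_application

    # Settings module name assumed from the default Django project layout.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "gisapplication.settings")

    application = get_wsgi_application()

If that import chain failed on the instance, gunicorn couldn't boot a worker and the readiness check would never see a 200.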
The health-check section of app.yaml, with the bumped-up app_start_timeout_sec, looks like this:

    readiness_check:
      path: "/readiness_check"
      check_interval_sec: 10
      timeout_sec: 10
      failure_threshold: 3
      success_threshold: 3
      app_start_timeout_sec: 900

    liveness_check:
      path: "/liveness_check"
      check_interval_sec: 60
      timeout_sec: 4
      failure_threshold: 3
      success_threshold: 2
      initial_delay_sec: 30

Split health checks are definitely on. The output from gcloud beta app describe is:
    authDomain: gmail.com
    codeBucket: staging.proj-id-000000.appspot.com
    databaseType: CLOUD_DATASTORE_COMPATIBILITY
    defaultBucket: proj-id-000000.appspot.com
    defaultHostname: proj-id-000000.ts.r.appspot.com
    featureSettings:
      splitHealthChecks: true
      useContainerOptimizedOs: true
    gcrDomain: asia.gcr.io
    id: proj-id-000000
    locationId: australia-southeast1
    name: apps/proj-id-000000
    servingStatus: SERVING

That didn't work, so I also tried increasing the resources available to the instance, allocating the maximum amount of memory for 1 CPU (6.1 GB):
    resources:
      cpu: 1
      memory_gb: 6.1
      disk_size_gb: 10

Just to be on the safe side, I added health-check endpoints to the app (the legacy health check and the split health checks). It's a Django app, so this went into the project's urls.py:
    path(r'_ah/health/', lambda r: HttpResponse("OK", status=200)),
    path(r'readiness_check/', lambda r: HttpResponse("OK", status=200)),
    path(r'liveness_check/', lambda r: HttpResponse("OK", status=200)),

When I dive into the logs, there is a successful request to /liveness_check from a curl user agent, but the subsequent requests to /readiness_check from the GoogleHC user agent return a 503 (Service Unavailable).
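To compare against the local container, what the health checker does can be approximated with a small probe script (a sketch; probe.py is a throwaway name, and it assumes the compose service's port 8080 is published on localhost):

    # probe.py - throwaway script that hits the health-check endpoints roughly the
    # way the flex health checker would. Assumes the local container is reachable
    # on localhost:8080 (e.g. via a ports mapping in docker-compose.yml).
    import time
    import urllib.error
    import urllib.request

    BASE = "http://localhost:8080"

    for path in ("/liveness_check", "/readiness_check", "/_ah/health"):
        try:
            with urllib.request.urlopen(BASE + path, timeout=10) as resp:
                print(path, resp.status)
        except urllib.error.HTTPError as exc:
            print(path, exc.code)  # e.g. 503 when the app behind it is not serving
        except urllib.error.URLError as exc:
            print(path, "unreachable:", exc.reason)
        time.sleep(1)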
Shortly after (after 8 failed readiness requests - why 8?) a shutdown trigger seems to be sent:
    2020-07-05 09:00:02.603 AEST Triggering app shutdown handlers.

Any ideas what is going on here? I think I've pretty much exhausted the options for fixing this, and I wonder whether the time wouldn't have been better invested in getting things up and running on Compute Engine/EC2.
ADDENDUM:
In addition to the SO issue linked, I've gone through issues on Google's issue tracker (here and here).
