If your containers are not responding to HTTP traffic, the health check fails.
These health checks are called Release Health Checks.
There are several reasons why the health check might fail, each with its own fix:
If your app crashes immediately upon start up, it's not healthy. In this case, Aptible Deploy will indicate that your Containers exited, and report their Container Command and exit code.
You'll need to identify why your Containers are exiting immediately. There are usually two possible causes:
- There's a bug and your container is crashing. If this is the case, it should be obvious from the logs. To proceed, fix the issue, and try again.
- Your container is starting a program that immediately daemonizes. In this case, your container will appear to have exited from Aptible Deploy's perspective. To proceed, make sure the program you're starting stays in the foreground and does not daemonize, then try again.
If your app is listening on localhost (127.0.0.1), then Aptible Deploy cannot connect to it, so the health check won't pass.
Your app runs inside Containers, so if it listens on 127.0.0.1, it is only routable from within those Containers. In particular, it is not routable from the Endpoint.
To solve this issue, you need to make sure your app is listening on all interfaces. Most application servers let you do so by binding to 0.0.0.0.
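As a rough illustration of binding to all interfaces, here is a minimal sketch using Python's standard-library HTTP server. The names HealthHandler and make_server are hypothetical and exist only for this example; your own application server will have its own bind setting.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers every GET with a 200 so an HTTP check gets a response."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port, host="0.0.0.0"):
    # "0.0.0.0" binds every interface. "127.0.0.1" would only accept
    # connections that originate inside the Container itself.
    return HTTPServer((host, port), HealthHandler)
```

Calling make_server(8000).serve_forever() would then serve requests in the foreground on port 8000 (an example value).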
If your Containers are listening on a given port, but the Endpoint is trying to connect to a different port, the health check can't pass.
There are two possible scenarios here:
- Your Image does not expose the port your app is listening on.
- Your Image exposes multiple ports, but your Endpoint and your app are using different ports.
In either case, to solve this problem, you should make sure that:
- The port your app is listening on is exposed by your image. For example, if your app listens on port 8000, your Dockerfile must include the directive EXPOSE 8000.
- Your Endpoint is using the same port as your app. By default, Aptible Deploy HTTP(S) Endpoints automatically select the lexicographically lowest port exposed by your image (e.g. if your image exposes ports 80 and 443, then the default is 443), but you can select the port Aptible Deploy should use when creating the Endpoint, and modify it at any time.
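Lexicographic ordering compares ports as strings, not as numbers, which is why 443 sorts before 80. The sketch below only illustrates that ordering rule; default_endpoint_port is a hypothetical helper, not Aptible Deploy's actual implementation.

```python
def default_endpoint_port(exposed_ports):
    """Return the lexicographically lowest port, comparing ports as strings."""
    if not exposed_ports:
        raise ValueError("the image exposes no ports")
    # key=str makes "443" sort before "80", because "4" < "8".
    return min(exposed_ports, key=str)
```

With this rule, an image exposing 80 and 443 gets 443 by default, so an app listening on 80 needs the Endpoint port set explicitly.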
It's possible that your app Containers are simply taking longer to finish booting up and start accepting traffic than Aptible Deploy is willing to wait.
By default, Aptible Deploy waits for up to 3 minutes for your app to respond. However, you can increase that timeout by setting the RELEASE_HEALTHCHECK_TIMEOUT Configuration variable on your app.
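To reproduce this kind of wait locally, a polling loop along the following lines can help. This is a sketch under assumptions, not how Aptible Deploy performs the check: wait_for_http and its parameters are hypothetical, and it assumes that any HTTP response, even an error status, counts as the app responding.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_seconds=180.0, interval_seconds=1.0):
    """Poll url until it answers over HTTP or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_seconds):
                return True
        except urllib.error.HTTPError:
            # Assumption: an error status still proves the app speaks HTTP.
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet.
            time.sleep(interval_seconds)
    return False
```

Running wait_for_http("http://127.0.0.1:8000/") against a locally started Container gives a rough idea of whether your app comes up within the default 3 minutes.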
There is one particular error case worth mentioning here:
When starting a Python app using Gunicorn as your application server, the health check might fail with a repeated set of [CRITICAL] WORKER TIMEOUT errors.
These errors are generated by Gunicorn when your worker processes fail to boot within Gunicorn's timeout. When that happens, Gunicorn terminates the worker processes, then starts over.
By default, Gunicorn's timeout is 30 seconds. This means that if your app needs e.g. 35 seconds to boot, Gunicorn will repeatedly time out, then restart it from scratch.
As a result, even though Aptible Deploy gives you 3 minutes to boot up (configurable with RELEASE_HEALTHCHECK_TIMEOUT), an app that needs 35 seconds to boot will time out on the Release Health Check, because Gunicorn is repeatedly killing then restarting it.
30 seconds might seem like a long time for your app to boot up, but with a large app and a small Container on a Stack enforcing CPU Limits, hitting this timeout is fairly common. Besides, you might have configured the timeout with a lower value (via the --timeout option).
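The arithmetic behind this restart loop can be sketched with a deliberately simplified model. It assumes every boot attempt takes the same time and that a killed worker restarts immediately; simulate_boot is a hypothetical helper and not part of Gunicorn or Aptible Deploy.

```python
def simulate_boot(boot_seconds, worker_timeout=30, healthcheck_timeout=180):
    """Return (booted, attempts) for a worker that restarts on every timeout."""
    elapsed = 0
    attempts = 0
    while elapsed < healthcheck_timeout:
        attempts += 1
        if boot_seconds <= worker_timeout:
            # This attempt finishes before the worker timeout fires.
            elapsed += boot_seconds
            return elapsed <= healthcheck_timeout, attempts
        # The worker is killed at the timeout and boots again from scratch.
        elapsed += worker_timeout
    return False, attempts
```

With the defaults, a 35-second boot is killed on each of the 6 attempts that fit into 3 minutes, whereas raising the worker timeout to 60 seconds lets the first attempt finish.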
There are two recommended strategies to address this problem:
- If you are using a synchronous worker in Gunicorn (the default), use Gunicorn's --preload flag. This option will cause Gunicorn to load your app before starting worker processes. As a result, when the worker processes are started, they don't need to load your app, and they can immediately start listening for requests instead (which won't time out).
- If you are using an asynchronous worker in Gunicorn, increase your timeout using Gunicorn's --timeout flag.
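Both strategies can also be expressed in a Gunicorn configuration file, which is itself Python. The fragment below is a sketch: the 60-second value is an assumed example, and in practice you would normally keep only the setting that matches your worker type.

```python
# gunicorn.conf.py: Gunicorn reads its settings from module-level names.

# Synchronous workers: load the app in the master process before forking
# workers (the config-file equivalent of the --preload flag).
preload_app = True

# Asynchronous workers: raise the worker timeout above the 30-second
# default (the config-file equivalent of the --timeout flag).
# 60 is an illustrative value; pick one that covers your app's boot time.
timeout = 60
```

Gunicorn can be pointed at this file with its -c option, for example gunicorn -c gunicorn.conf.py followed by your application module.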
If neither of the options listed above satisfies you, you can also reduce your worker count using Gunicorn's --workers flag, or scale up your Container to make more resources available to them.
We don't recommend these options to address boot-up timeouts because they affect your app beyond the boot-up stage, respectively by reducing the number of available workers and increasing your bill.
That said, you should definitely consider making changes to your worker count or Container size if your app is performing poorly or Metrics are reporting you're undersized: just don't do it only for the sake of making the Release Health Check pass.
HTTP(S) Endpoints expect your app to be listening for HTTP traffic. If you need to expose an app that's not expecting HTTP traffic, you shouldn't be using an HTTP(S) Endpoint.