Zero-downtime application updates and restarts
for Meteor apps on Passenger + Nginx
Passenger has excellent support for zero-downtime application updates and restarts in its Enterprise edition.
Normally, when you instruct Passenger to restart an application after you have deployed an application update, Passenger performs a blocking restart. Passenger shuts down all processes for that application and spawns a new one. Spawning a new application process can take a while, and any requests that come in during this time are blocked until this first application process has spawned, which users may experience as "downtime".
If your organization follows an agile development methodology, then you will want to deploy application updates very often. For example, GitHub was reported to deploy updates six times per day at one point. Deploying updates quickly means that you receive feedback about them more quickly, which aids the development process. The more often you deploy, the less desirable blocking restarts are.
To solve this problem, Passenger Enterprise provides zero-downtime restarts, also known as "rolling restarts". This mechanism prevents visitors from experiencing any delays when you are restarting your application.
How it works
A zero-downtime restart works as follows:
1. Spawn new process Y
   - Passenger Enterprise spawns a new process Y in the background and waits for it to be ready
2. Gracefully shut down old process X
   - Passenger Enterprise stops sending new requests to X and waits until X has handled all of its current requests
   - The shutdown signal is sent to X and Passenger Enterprise waits for X to exit
3. Replace & repeat
   - X is now gone, Y is added to the pool as its replacement
   - Repeat steps 1-3 until all processes have been replaced
This is a very resource-friendly way of performing a zero-downtime restart, because the memory and spawn load profile remains pretty much the same as during normal server operation. It would be faster to respawn all processes in parallel, but that might result in the deployment suddenly requiring twice as much memory, and in the disk and CPU being hit with parallel spawns. This is why we have chosen not to use the parallel approach with Passenger Enterprise, especially since restarts are a vulnerable moment anyway.
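The one-by-one procedure described above can be sketched as a toy simulation. This is only an illustration of the ordering, not Passenger's actual implementation: it does not talk to Passenger at all, and the PIDs, variable names, and the `rolling_restart` function are made up for the example.

```shell
#!/usr/bin/env bash
# Toy simulation of a rolling restart. Nothing here talks to Passenger;
# the PIDs and helper names are hypothetical.

pool=(18334 18339 18341)   # hypothetical PIDs of the old processes
next_pid=29843             # hypothetical PID for the first new process
max_alive=${#pool[@]}      # highest number of processes alive at once

rolling_restart() {
  local i old new alive
  for i in "${!pool[@]}"; do
    old=${pool[$i]}
    new=$((next_pid++))

    # Step 1: spawn the replacement in the background and wait until ready.
    echo "spawned $new"
    alive=$(( ${#pool[@]} + 1 ))
    if (( alive > max_alive )); then
      max_alive=$alive
    fi

    # Step 2: stop routing to the old process, let it drain, shut it down.
    echo "drained and stopped $old"

    # Step 3: the new process takes the old one's slot in the pool.
    pool[$i]=$new
  done
}

rolling_restart
echo "pool after restart: ${pool[*]}"
echo "peak process count: $max_alive"
```

In this sketch the pool never holds more than one extra process at a time, which reflects why the sequential approach keeps memory usage close to normal operation, whereas a parallel respawn of the same pool would briefly double it.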
Caveats
Zero-downtime restarts have a few caveats that you should be aware of:
- Upgrading an application sometimes involves upgrading the database schema. With zero-downtime restarts, there may be a point in time during which processes belonging to the previous version and processes belonging to the new version both exist at the same time. Any database schema upgrades you perform must therefore be backwards-compatible with the old application version.
- Because there's no telling which process will serve a request, users may not see changes brought about by the new version until all processes have been restarted. It is for this reason that you should not use rolling restarts in development, only in production.
- If your application takes a long time to shut down, the entire restart procedure will also take much longer because Passenger needs to wait for each application process in your pool to stop, one by one. Optimizing your application shutdown procedure will make restarts much faster too.
Enabling zero-downtime restarts
Passenger provides several ways to restart your application. Each way has its own mechanism for enabling zero-downtime restarts.
- passenger-config restart-app - If you use passenger-config restart-app to restart your application, then pass the --rolling-restart command line argument.
- restart.txt - If you use restart.txt to restart your application, then enable the passenger_rolling_restarts on configuration option.
- Restarting Nginx - Normally, Passenger restarts all your applications when you restart Nginx. If you want to prevent this from happening, please use the Flying Passenger feature.
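As a rough illustration of the first two methods, the commands might look like the sketch below. The application root is a hypothetical placeholder, and the script only prints the commands instead of executing them, so it is safe to run on a machine without Passenger. The restart.txt variant assumes the usual convention of touching tmp/restart.txt inside the application directory.

```shell
#!/usr/bin/env bash
# Hypothetical application root; replace it with your own.
APP_ROOT=/var/www/myapp/current

# Method 1: restart through passenger-config, asking for a rolling restart.
restart_cmd=(passenger-config restart-app "$APP_ROOT" --rolling-restart)

# Method 2: restart through restart.txt. This only results in a rolling
# restart if the Nginx configuration contains:
#   passenger_rolling_restarts on;
touch_cmd=(touch "$APP_ROOT/tmp/restart.txt")

# Print the commands rather than running them.
echo "${restart_cmd[*]}"
echo "${touch_cmd[*]}"
```

To actually perform the restart, run the printed command of the method you use, for example by replacing the `echo` line with `"${restart_cmd[@]}"`.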
Automation scripts
If you followed the Automating deployments of application updates guide, and you want to enable zero-downtime restarts there, then here's how you can do that.
Open deploy/work.sh and look for this line:
RESTART_ARGS=
Change it to:
RESTART_ARGS=--rolling-restart
Tracking rolling restart progress
You can track the progress of a rolling restart using passenger-status. That tool tells you which processes have already been restarted, which processes have not yet been restarted, and which process is currently being restarted.
Here is an example:
$ sudo passenger-status
Version : 5.0.13
Date    : 2015-07-06 12:18:38 +0200
Instance: bdVuBLEf (nginx/1.8.0 Phusion_Passenger/5.0.13)

----------- General information -----------
Max pool size : 9
App groups    : 1
Processes     : 2
Requests in top-level queue : 0

----------- Application groups -----------
/var/www/phusion_blog/current/public:
  App root: /var/www/phusion_blog/current
  Requests in queue: 0
  * PID: 29843   Sessions: 0       Processed: 1       Uptime: 0h 0m 14s
    CPU: 0%      Memory  : 60M     Last used: 4s ago
  * PID: 18334   Sessions: 0       Processed: 2873    Uptime: 5h 53m 26s
    CPU: 0%      Memory  : 96M     Last used: 29s ago
    Rolling restarting...
  * PID: 18339   Sessions: 0       Processed: 2873    Uptime: 5h 53m 26s
    CPU: 0%      Memory  : 96M     Last used: 29s ago

The "Rolling restarting..." indicator tells you which process is currently being rolling restarted.
Processes that have already been rolling restarted have a low uptime. In this example, you can see that process 29843 has an uptime of 14 seconds, which is evidence that it has recently been rolling restarted.
Processes that have not yet been rolling restarted have a high uptime. Process 18339 in this example has not yet been rolling restarted.
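If you want to apply these rules of thumb automatically, a small filter along the following lines could classify each process. This is only a sketch: `classify_processes` and `sample_status` are hypothetical helper names, the 60-second uptime threshold is an arbitrary assumption, and the parser assumes that the process lines are laid out as in the example above ("* PID:" lines carrying an "Uptime:" field, optionally followed by a "Rolling restarting..." line).

```shell
#!/usr/bin/env bash
# Classify processes from passenger-status output read on stdin.
# $1 is an uptime threshold in seconds (an assumption, default 60):
# processes younger than this are treated as already restarted.
classify_processes() {
  awk -v threshold="${1:-60}" '
    function emit() {
      if (pid == "") return
      if (restarting) state = "restarting"
      else if (uptime < threshold) state = "restarted"
      else state = "pending"
      print pid, state
    }
    BEGIN { mult["d"] = 86400; mult["h"] = 3600; mult["m"] = 60; mult["s"] = 1 }
    /\* PID:/ {
      emit()                       # report the previous process, if any
      pid = $3; uptime = 0; restarting = 0
      for (i = 1; i <= NF; i++) {
        if ($i != "Uptime:") continue
        # Sum up fields such as "5h", "53m", "26s" into seconds.
        for (j = i + 1; j <= NF && $j ~ /^[0-9]+[dhms]$/; j++)
          uptime += ($j + 0) * mult[substr($j, length($j))]
      }
    }
    /Rolling restarting/ { restarting = 1 }
    END { emit() }
  '
}

# Sample input copied from the process lines of the example above.
sample_status() {
  cat <<'EOF'
  * PID: 29843   Sessions: 0       Processed: 1       Uptime: 0h 0m 14s
    CPU: 0%      Memory  : 60M     Last used: 4s ago
  * PID: 18334   Sessions: 0       Processed: 2873    Uptime: 5h 53m 26s
    CPU: 0%      Memory  : 96M     Last used: 29s ago
    Rolling restarting...
  * PID: 18339   Sessions: 0       Processed: 2873    Uptime: 5h 53m 26s
    CPU: 0%      Memory  : 96M     Last used: 29s ago
EOF
}

sample_status | classify_processes 60
```

For the sample input this reports 29843 as restarted, 18334 as restarting, and 18339 as pending. On a real server you would pipe the live output instead, for example `sudo passenger-status | classify_processes 60`, and pick a threshold that is shorter than the time since the previous restart.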
Deployment error resistance
A related feature is deployment error resistance. This feature reduces the chance of visitors experiencing errors if the deployment of an application update fails. If you use zero-downtime restarts, then we recommend enabling deployment error resistance as well.
Error handling and troubleshooting
If Passenger Enterprise could not rolling restart a process (let's call it 'A') because it is unable to spawn a new process (let's call it 'B'), then Passenger Enterprise will give up trying to rolling restart that particular process 'A'. What happens next depends on whether deployment error resistance is enabled:
- If deployment error resistance is disabled (the default), then Passenger Enterprise will proceed with trying to restart the remaining processes.
- If deployment error resistance is enabled, then Passenger Enterprise will give up rolling restarting immediately. The application group will be put into Deployment Error Resistance Mode.
In any case, all errors encountered during a rolling restart process are logged. If you suspect that something is wrong, please read the log file.