Request load balancing
At its core, Passenger is a process manager and HTTP request router. In order to minimize response times, and to distribute load over multiple CPU cores for optimal performance, Passenger load balances requests over processes in an intelligent manner. This article explains the implications and details of our request load balancing mechanism.
Introduction
Node.js applications normally execute in a single thread/process, using a single CPU core. Passenger enables running multiple instances of the application (multiple processes) using pooled application groups, distributing requests to the process that is currently handling the fewest requests.
In this way, load can be distributed across multiple cores without using the cluster module. Assuming a properly written asynchronous app, incoming requests will be handled concurrently (i.e. the next request can already start being handled while the previous is still being handled), and the Passenger request queue will remain empty.
Maximum process concurrency
A core concept in the load balancing algorithm is that of the maximum process concurrency. This is the maximum number of concurrent requests that a particular process can handle.
For Node.js applications, the maximum process concurrency is assumed to be unlimited. This is because Node.js uses evented I/O for concurrency which is very lightweight, so it can handle a virtually unlimited number of requests concurrently.
Having said that, as a Node.js process handles more concurrent requests, it is normal for performance to degrade. The practical concurrency is not really unlimited, and the limit varies for every application and every workload.
For this reason, load balancing requests between multiple processes is beneficial.
App Generations
Another core concept in the load balancing algorithm is that of app generations. Every time Passenger is asked to restart an app group, a counter is incremented, and processes created after the restart are labeled with the generation taken from this counter. Passenger usually performs a blocking restart, which means that only one generation of processes is alive at a time. However, if you initiate a rolling restart, multiple generations can be alive at the same time, and Passenger takes this into account when load balancing requests.
Intelligent routing
Algorithm summary
Passenger keeps a list of application processes. For each application process, Passenger keeps track of: the app generation that resulted in it being started, when it was started, and how many requests it is currently handling. When a new request comes in, Passenger routes the request to the process that meets the following criteria:
- is not completely busy,
- is part of the newest available app generation [1] meeting the previous condition,
- is the oldest process meeting the previous conditions,
- is the least busy process meeting the previous conditions.
[1] Unless blocked by Deployment Error Resistance.
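The criteria above can be sketched as a selection function. The following is a simplified model, not Passenger's actual internals; the names and types are my own, and a `maxConcurrency` of 0 is used here to mean "unlimited", as with Node.js apps:

```typescript
// Hypothetical model of Passenger's routing choice (illustrative only).
interface Proc {
  name: string;
  generation: number;     // app generation counter at spawn time
  spawnOrder: number;     // lower = started earlier
  sessions: number;       // requests currently being handled
  maxConcurrency: number; // 0 means unlimited (the Node.js assumption)
}

function notCompletelyBusy(p: Proc): boolean {
  return p.maxConcurrency === 0 || p.sessions < p.maxConcurrency;
}

// Pick: not completely busy -> newest generation -> oldest -> least busy.
function route(procs: Proc[]): Proc | null {
  const eligible = procs.filter(notCompletelyBusy);
  if (eligible.length === 0) return null; // request waits in the queue
  eligible.sort(
    (a, b) =>
      b.generation - a.generation ||  // prefer the newest generation
      a.spawnOrder - b.spawnOrder ||  // then the oldest process
      a.sessions - b.sessions         // then the least busy one
  );
  return eligible[0];
}

const procs: Proc[] = [
  { name: "A", generation: 3, spawnOrder: 1, sessions: 1, maxConcurrency: 1 },
  { name: "B", generation: 3, spawnOrder: 2, sessions: 0, maxConcurrency: 1 },
  { name: "C", generation: 3, spawnOrder: 3, sessions: 0, maxConcurrency: 1 },
];
console.log(route(procs)?.name); // "B": A is full, and B is older than C
```

Note that the "least busy" comparison only kicks in when generation and spawn order tie; this matches its role as a tie breaker in the sections below.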
Process with the highest app generation in the list has highest priority
If there are multiple processes that are not completely busy, then Passenger will pick one from the list with the highest app generation:
For example, suppose that there are 3 application processes:
Process A: gen 1, spawned 1s ago, handling 0 requests
Process B: gen 1, spawned 1s ago, handling 0 requests
Process C: gen 2, spawned 1s ago, handling 0 requests
Process C will be chosen for the next incoming request.
This speeds up the rate at which traffic shifts to the newer generation of app processes.
Oldest available process in a generation has highest priority
If there are multiple processes from the newest generation that are not completely busy, then Passenger will pick the one that was started first.
For example, suppose that there are 3 application processes:
Process A: gen 2, spawned 30s ago, handling 0 requests
Process B: gen 3, spawned 20s ago, handling 0 requests
Process C: gen 3, spawned 10s ago, handling 0 requests
Process B will be chosen for the next incoming request.
This property is used by the dynamic process scaling algorithm. Dynamic process scaling works by shutting down processes that haven't received requests for a while (processes that are "idle"). By routing to the oldest process that is not completely busy (instead of, say, a random one, using round-robin), Passenger gives other processes the chance to become idle and thus eligible for shutdown.
Another advantage of picking the oldest process is that it improves caching. Since the oldest process is the most likely candidate for load balancing, it will have the most chance to keep its cache warm, and therefore process requests more quickly.
Least busy process as tie breaker
If there are multiple processes that have met all the previous conditions, then Passenger will pick the least busy one. For example, suppose that there are 3 application processes:
Process A: gen 3, spawned 10s ago, handling 1 request
Process B: gen 3, spawned 10s ago, handling 0 requests
Process C: gen 3, spawned 10s ago, handling 2 requests
Process B will be chosen for the next incoming request.
Traffic may appear unbalanced between processes
Because Passenger prefers to route to the oldest process in the generation, traffic may appear unbalanced between processes. Here is an example from passenger-status:
/var/www/phusion_blog/current/public:
App root: /var/www/phusion_blog/current
Requests in queue: 0
* PID: 18334 Sessions: 0 Processed: 4595 Uptime: 5h 53m 29s
CPU: 0% Memory : 99M Last used: 4s ago
* PID: 18339 Sessions: 0 Processed: 2873 Uptime: 5h 53m 26s
CPU: 0% Memory : 96M Last used: 29s ago
* PID: 18343 Sessions: 0 Processed: 33 Uptime: 5m 8s
CPU: 0% Memory : 96M Last used: 1m 4s ago
* PID: 18347 Sessions: 0 Processed: 16 Uptime: 2m 16s
CPU: 0% Memory : 96M Last used: 2m 9s ago
As you can see, there are 4 processes, but the "Processed" field doesn't look balanced at all. This is completely normal; it is supposed to look like this. Passenger does not use round-robin load balancing, for the reasons explained in the sections above. Despite not looking balanced, Passenger is in fact balancing requests between processes just fine.
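A toy simulation makes the skew easy to see. In this sketch (illustrative, not Passenger's code), each process is pretended to handle one request at a time, one request arrives per time unit, and request durations alternate between short and long; routing always picks the oldest idle process:

```typescript
// Toy simulation: oldest-first routing skews "Processed" counts
// toward older processes, leaving newer ones idle.
const busyUntil = [0, 0, 0, 0];  // 4 processes, in spawn order
const processed = [0, 0, 0, 0];  // per-process request counts

const durations = [0.5, 1.5];    // alternating request lengths
for (let t = 0; t < 12; t++) {   // one request arrives per time unit
  const d = durations[t % 2];
  const i = busyUntil.findIndex((u) => u <= t); // oldest idle process
  busyUntil[i] = t + d;
  processed[i] += 1;
}
console.log(processed); // [ 7, 5, 0, 0 ]: the two oldest did all the work
```

A round-robin router would have spread these 12 requests evenly; oldest-first instead leaves the two newest processes completely idle, making them candidates for shutdown.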
Example with maximum concurrency 1
Suppose that you have 3 application processes, spawned in order, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:
Process A [ ]
Process B [ ]
Process C [ ]
When a new request comes in (let's call this α), Passenger will decide to route the request to process A. A will then reach its maximum concurrency:
Process A [α]
Process B [ ]
Process C [ ]
Suppose that, while α is still in progress, a new request comes in (which we call β). That request will be load balanced to process B because it is the oldest not-completely-busy process:
Process A [α]
Process B [β]
Process C [ ]
Suppose α finishes. The situation will look like this:
Process A [ ]
Process B [β]
Process C [ ]
If another request comes in (which we call ɣ), that request will be routed to A, not C:
Process A [ɣ]
Process B [β]
Process C [ ]
This keeps process A's caches warm, and may allow process C to become idle and be shut down sooner.
Example with maximum concurrency 4
Suppose that you have 2 application processes, and the processes' maximum concurrency is configured to 4. When the application is idle, none of the processes are handling any requests:
Process A [ ]
Process B [ ]
When a new request comes in (which we call α), Passenger will decide to route the request to process A:
Process A [α ]
Process B [ ]
Suppose that, while α is still in progress, 1 more request comes in (which we call β). That request will be load balanced to process A because it is the oldest not-completely-busy one:
Process A [αβ ]
Process B [ ]
Suppose that another request comes in (which we call ɣ). That will be load balanced to process A again, not to B:
Process A [αβɣ ]
Process B [ ]
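The walkthrough above can be replayed in a few lines of code (a toy model, not Passenger's implementation), with two processes whose maximum concurrency is 4 and requests routed to the oldest not-completely-busy process:

```typescript
// Toy replay of the max-concurrency-4 example.
interface P { name: string; sessions: number; max: number; }
const a: P = { name: "A", sessions: 0, max: 4 };
const b: P = { name: "B", sessions: 0, max: 4 };

// The process list is kept in spawn order, so find() returns the oldest.
function oldestNotFull(procs: P[]): P | undefined {
  return procs.find((p) => p.sessions < p.max);
}

const chosen: string[] = [];
for (const req of ["α", "β", "ɣ"]) {
  const p = oldestNotFull([a, b])!;
  p.sessions += 1; // request is now in progress on this process
  chosen.push(`${req}->${p.name}`);
}
console.log(chosen.join(" ")); // α->A β->A ɣ->A: B never receives a request
```

Only once process A reaches 4 concurrent requests would the next request spill over to process B.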
Customized balancing through additional request queues
By default, all requests for an application are balanced in a fair way. You may want to adjust this behavior; for example if your app consists of pages for visitors as well as some kind of web service, you probably want to make sure the visitor requests don't get stuck behind web service requests when under heavy load.
This can be solved by configuring two application groups for your app: one for the web service URL and one for the visitor pages, each with their own request queue. The requests for the visitor pages will keep being handled even if there is an overload of requests for the web service. See the app group name option (Apache and Nginx only, not available on Standalone).
The two application groups will still compete with each other for server resources. You can control how many instances are allowed per application group, for example allowing more instances to serve the visitor pages than the web service URL. See the max instances option.
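For the Nginx integration, such a split might look like the sketch below. The host name and paths are illustrative, the instance counts are arbitrary, and per-group instance limits (the max instances option) may require Passenger Enterprise; consult the configuration reference for your edition:

```nginx
# Illustrative sketch: one app served as two application groups,
# each with its own request queue and instance limit.
server {
    listen 80;
    server_name example.com;            # hypothetical host
    root /var/www/myapp/public;         # hypothetical app path
    passenger_enabled on;

    # Visitor pages: their own group, allowed more processes.
    passenger_app_group_name myapp_pages;
    passenger_max_instances 6;

    # The web service under /api gets a separate group and queue.
    location /api {
        passenger_enabled on;
        passenger_app_group_name myapp_api;
        passenger_max_instances 2;
    }
}
```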
Request queue overflow
If a request arrives for an application group, and all its processes are busy, and the application pool is full, the request remains in the Passenger request queue. If this keeps happening (e.g. due to a flood of requests) the queue will eventually overflow its limit see: max request queue size. Overflowing requests will no longer be balanced, but simply dropped (with a HTTP 503 error).