Request load balancing
At its core, Passenger is a process manager and HTTP request router. In order to minimize response times, and to distribute load over multiple CPU cores for optimal performance, Passenger load balances requests over processes in an intelligent manner. This article explains the implications and details of our request load balancing mechanism.
Introduction
Ruby applications can normally handle only 1 request at a time. Passenger improves on this by running multiple processes of the application, organized into pooled application groups. For each request in the incoming request queue, Passenger selects an application process that is not completely busy (a free process) to handle it.
For thread-safe Ruby apps it is also possible to enable multithreading, which allows each application process to handle multiple requests concurrently, up to the number of threads configured. In this case, Passenger forwards the request to the process that meets the criteria described below.
If all application processes and threads are completely busy, Passenger spawns a new process, up to the process limit for the group. The combined number of processes across all groups may also not exceed the limit for the application pool. If either limit is reached, the request remains in the queue (which has its own limit).
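The decision described above can be sketched as a small function. This is a simplified illustration, not Passenger's implementation: GroupState, decide, and all parameter names are hypothetical, and the sketch ignores details such as what happens to a request while a new process is still starting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupState:
    sessions: List[int]   # in-flight requests per process in this group
    max_concurrency: int  # max concurrent requests per process
    max_processes: int    # process limit for this group


def decide(group: GroupState, pool_processes: int, pool_limit: int) -> str:
    """Return 'route', 'spawn', or 'queue' for one incoming request."""
    # 1. Any process with spare capacity can take the request right away.
    if any(n < group.max_concurrency for n in group.sessions):
        return "route"
    # 2. Otherwise spawn, but only if neither the group limit nor the
    #    pool limit has been reached.
    if len(group.sessions) < group.max_processes and pool_processes < pool_limit:
        return "spawn"
    # 3. A limit has been reached: the request waits in the queue.
    return "queue"


busy_group = GroupState(sessions=[1, 1], max_concurrency=1, max_processes=3)
print(decide(busy_group, pool_processes=2, pool_limit=6))  # spawn
print(decide(busy_group, pool_processes=6, pool_limit=6))  # queue
```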
Maximum process concurrency
A core concept in the load balancing algorithm is that of the maximum process concurrency. This is the maximum number of concurrent requests that a particular process can handle.
For Ruby applications, the maximum process concurrency is assumed to be 1. This means that Passenger assumes each process can handle 1 request at a time.
This can be changed by setting the concurrency model to thread and by setting the thread count. If you do that, the assumed maximum process concurrency will equal the number of configured threads. This reflects the fact that each thread can handle 1 request at a time.
Because a single process can only handle a limited number of concurrent requests, load balancing requests between multiple processes is beneficial.
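As a rough illustration of this rule, the following sketch computes the assumed concurrency of one process and the resulting capacity of a group. The function and parameter names are hypothetical; they only mirror the concurrency model and thread count settings mentioned above.

```python
def max_process_concurrency(concurrency_model: str = "process",
                            thread_count: int = 1) -> int:
    """Assumed number of concurrent requests one process can handle."""
    return thread_count if concurrency_model == "thread" else 1


def group_capacity(process_count: int,
                   concurrency_model: str = "process",
                   thread_count: int = 1) -> int:
    """Requests a group can handle at once before requests start to queue."""
    return process_count * max_process_concurrency(concurrency_model, thread_count)


print(group_capacity(3))               # 3 processes x 1 request each -> 3
print(group_capacity(2, "thread", 4))  # 2 processes x 4 threads each -> 8
```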
App generations
Another core concept in the load balancing algorithm is that of app generations. Every time Passenger is asked to restart an app group, a counter is incremented. Processes created after the restart are labeled with a generation taken from this counter. Passenger usually performs a blocking restart, which means that only one generation of processes is alive at a time. However, if you initiate a rolling restart, multiple generations can be alive at the same time, and Passenger takes this into account when load balancing requests.
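The generation counter can be modelled in a few lines. This is a toy model under stated assumptions: the AppGroup class and its methods are hypothetical, the counter is assumed to start at 0, and a rolling restart is simplified to "old processes stay alive until something else removes them".

```python
class AppGroup:
    def __init__(self):
        self.generation = 0  # incremented on every restart
        self.processes = []  # (name, generation) pairs

    def spawn(self, name):
        # New processes are labeled with the current counter value.
        self.processes.append((name, self.generation))

    def restart(self, rolling=False):
        self.generation += 1
        if not rolling:
            # Blocking restart: no old-generation process survives.
            self.processes.clear()

    def live_generations(self):
        return sorted({gen for _, gen in self.processes})


group = AppGroup()
group.spawn("A")
group.spawn("B")
group.restart(rolling=True)
group.spawn("C")
print(group.live_generations())  # [0, 1]: two generations alive at once
group.restart()
group.spawn("D")
print(group.live_generations())  # [2]
```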
Intelligent routing
Algorithm summary
Passenger keeps a list of application processes. For each application process, Passenger keeps track of: the app generation that resulted in it being started, when it was started, and how many requests it is currently handling. When a new request comes in, Passenger routes the request to the process that meets the following criteria:
- is not completely busy,
- is part of the newest available app generation [1] meeting the previous condition,
- is the oldest process meeting the previous conditions,
- is the least busy process meeting the previous conditions.
[1] Unless blocked by Deployment Error Resistance.
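One way to express these criteria is a filter followed by a lexicographic sort key. The sketch below is only an illustration: Proc, pick_process, and the field names are hypothetical, and the Deployment Error Resistance exception from the footnote is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proc:
    name: str
    generation: int
    spawned_at: float  # smaller value = started earlier = older
    sessions: int      # requests currently being handled
    max_concurrency: int = 1


def pick_process(procs: List[Proc]) -> Optional[Proc]:
    # Criterion 1: only processes that are not completely busy qualify.
    free = [p for p in procs if p.sessions < p.max_concurrency]
    if not free:
        return None  # the request has to wait (or a process is spawned)
    # Criteria 2-4: newest generation, then oldest process, then least busy.
    return min(free, key=lambda p: (-p.generation, p.spawned_at, p.sessions))


procs = [
    Proc("A", generation=2, spawned_at=0.0, sessions=0),
    Proc("B", generation=3, spawned_at=10.0, sessions=1),  # completely busy
    Proc("C", generation=3, spawned_at=20.0, sessions=0),
]
print(pick_process(procs).name)  # C
```

Encoding the priorities as one tuple keeps the order of the criteria explicit: a later element is only consulted when all earlier elements tie.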
Process with the highest app generation in the list has highest priority
If there are multiple processes that are not completely busy, then Passenger will pick one with the highest app generation.
For example, suppose that there are 3 application processes:
Process A: gen 1, spawned 1s ago, handling 0 requests
Process B: gen 1, spawned 1s ago, handling 0 requests
Process C: gen 2, spawned 1s ago, handling 0 requests
Process C will be chosen for the next incoming request.
As a result, the newer generation of app processes takes over request handling more quickly.
Oldest available process in a generation has highest priority
If there are multiple processes from the newest generation that are not completely busy, then Passenger will pick the one that was started first.
For example, suppose that there are 3 application processes:
Process A: gen 2, spawned 30s ago, handling 0 requests
Process B: gen 3, spawned 20s ago, handling 0 requests
Process C: gen 3, spawned 10s ago, handling 0 requests
Process B will be chosen for the next incoming request.
This property is used by the dynamic process scaling algorithm. Dynamic process scaling works by shutting down processes that haven't received requests for a while (processes that are "idle"). By routing to the oldest process that is not completely busy (instead of, say, a random one or the next one in round-robin order), Passenger gives other processes the chance to become idle and thus eligible for shutdown.
Another advantage of picking the oldest process is that it improves caching. Since the oldest process is the most likely candidate for load balancing, it will have the most chance to keep its cache warm, and therefore process requests more quickly.
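The effect on idle detection can be shown with a small comparison between oldest-first routing and round-robin. This is a sketch under simplifying assumptions: every request finishes before the next one arrives, all processes were started at time 0, and the idle threshold of 300 seconds is an illustrative value. All names are hypothetical.

```python
def last_used_times(policy, n_procs, arrivals):
    """Return the time each process last received a request."""
    last_used = [0] * n_procs  # index 0 is the oldest process
    for k, t in enumerate(arrivals):
        # Oldest-first always finds process 0 free in this light traffic;
        # round-robin cycles through all processes regardless of load.
        i = 0 if policy == "oldest_first" else k % n_procs
        last_used[i] = t
    return last_used


def idle_candidates(last_used, now, max_idle):
    """Indexes of processes that have been idle longer than max_idle."""
    return [i for i, t in enumerate(last_used) if now - t > max_idle]


arrivals = range(0, 600, 10)  # one request every 10 seconds for 10 minutes

oldest = last_used_times("oldest_first", 4, arrivals)
robin = last_used_times("round_robin", 4, arrivals)
print(idle_candidates(oldest, now=600, max_idle=300))  # [1, 2, 3]
print(idle_candidates(robin, now=600, max_idle=300))   # []
```

With round-robin every process keeps getting touched, so none of them ever looks idle even though one process could handle all of the traffic.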
Least busy process as tie breaker
If there are multiple processes that have met all the previous conditions, then Passenger will pick the least busy one. For example, suppose that there are 3 application processes:
Process A: gen 3, spawned 10s ago, handling 1 request
Process B: gen 3, spawned 10s ago, handling 0 requests
Process C: gen 3, spawned 10s ago, handling 2 requests
Process B will be chosen for the next incoming request.
Traffic may appear unbalanced between processes
Because Passenger prefers to load balance to the oldest process in the generation, traffic may appear unbalanced between processes. Here is an example from passenger-status:
/var/www/phusion_blog/current/public:
  App root: /var/www/phusion_blog/current
  Requests in queue: 0
  * PID: 18334 Sessions: 0 Processed: 4595 Uptime: 5h 53m 29s
    CPU: 0% Memory : 99M Last used: 4s ago
  * PID: 18339 Sessions: 0 Processed: 2873 Uptime: 5h 53m 26s
    CPU: 0% Memory : 96M Last used: 29s ago
  * PID: 18343 Sessions: 0 Processed: 33 Uptime: 5m 8s
    CPU: 0% Memory : 96M Last used: 1m 4s ago
  * PID: 18347 Sessions: 0 Processed: 16 Uptime: 2m 16s
    CPU: 0% Memory : 96M Last used: 2m 9s ago
As you can see, there are 4 processes, but the "Processed" field doesn't look balanced at all. This is completely normal and is how it is supposed to look: Passenger does not use round-robin load balancing, for the reasons explained in the sections above. Despite not looking balanced, Passenger is in fact balancing requests between processes just fine.
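A skewed "Processed" column like this falls out naturally from oldest-first routing. The following toy simulation uses made-up traffic (one request per second, with occasional small bursts) and assumes a maximum concurrency of 1 and a single generation. The simulate function and the traffic pattern are hypothetical and are not derived from the output above.

```python
def simulate(arrivals, duration, n_procs):
    """Count requests per process when routing to the oldest free process."""
    busy_until = [0.0] * n_procs  # index 0 is the oldest process
    processed = [0] * n_procs
    for t in arrivals:
        for i in range(n_procs):
            if busy_until[i] <= t:  # first (oldest) free process wins
                busy_until[i] = t + duration
                processed[i] += 1
                break
        # No queue is modelled; the traffic below never saturates 4 processes.
    return processed


arrivals = []
for t in range(100):
    arrivals.append(t)      # steady traffic: one request per second
    if t % 10 == 0:
        arrivals.append(t)  # small burst: a second simultaneous request
    if t % 50 == 0:
        arrivals.append(t)  # rare burst: a third simultaneous request

print(simulate(arrivals, duration=0.5, n_procs=4))  # [100, 10, 2, 0]
```

The younger processes only see traffic during bursts, which is exactly when extra capacity is needed.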
No head-of-line blocking problem
Many web servers and load balancers use independent queues, which causes the problem of head-of-line blocking, whereby HTTP requests to your app are queued behind slow or long-running requests. Passenger avoids this problem by using a single (per application group) shared queue.
Imagine a supermarket with a number of (independent) checkout lanes and a queue in each of them. If someone in one of the queues has trouble with the checkout, everyone already in that queue will be delayed, which is especially unfair if the other queues keep moving fast.
Instead, the Passenger implementation can be compared to using a single (shared) queue and sending only 1 person per checkout lane at a time, thereby preventing head-of-line blocking.
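The supermarket comparison can be put into numbers with a short sketch. It assumes all requests arrive at time 0, each worker handles 1 request at a time, and the independent-queue setup assigns requests to lanes in round-robin order. The function names and the durations are hypothetical.

```python
import heapq


def waits_independent(durations, n_workers):
    """Waiting time per request with one private queue per worker."""
    free_at = [0] * n_workers
    waits = []
    for i, d in enumerate(durations):
        lane = i % n_workers  # the request is committed to a lane up front
        waits.append(free_at[lane])
        free_at[lane] += d
    return waits


def waits_shared(durations, n_workers):
    """Waiting time per request with a single shared FIFO queue."""
    free_at = [0] * n_workers
    heapq.heapify(free_at)
    waits = []
    for d in durations:
        start = heapq.heappop(free_at)  # whichever worker frees up first
        waits.append(start)
        heapq.heappush(free_at, start + d)
    return waits


durations = [10, 1, 1, 1, 1, 1]  # one slow request followed by fast ones
print(waits_independent(durations, 2))  # [0, 0, 10, 1, 11, 2]
print(waits_shared(durations, 2))       # [0, 0, 1, 2, 3, 4]
```

With independent queues, two fast requests are stuck behind the slow one for 10 and 11 time units. With the shared queue, no request waits longer than 4.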
Example with maximum concurrency 1
Suppose that you have 3 application processes, spawned in order, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:
Process A [ ]
Process B [ ]
Process C [ ]
When a new request comes in (let's call this α), Passenger will decide to route the request to process A. A will then reach its maximum concurrency:
Process A [α]
Process B [ ]
Process C [ ]
Suppose that, while α is still in progress, a new request comes in (which we call β). That request will be load balanced to process B because it is the oldest not-completely-busy process:
Process A [α]
Process B [β]
Process C [ ]
Suppose α finishes. The situation will look like this:
Process A [ ]
Process B [β]
Process C [ ]
If another request comes in (which we call ɣ), that request will be routed to A, not C:
Process A [ɣ]
Process B [β]
Process C [ ]
This keeps process A's caches warm, and may allow process C to become idle and be shut down sooner.
Example with maximum concurrency 4
Suppose that you have 2 application processes, and you configured the number of threads to 4, causing each process's maximum concurrency to be 4. When the application is idle, none of the processes are handling any requests:
Process A [ ]
Process B [ ]
When a new request comes in (which we call α), Passenger will decide to route the request to process A:
Process A [α ]
Process B [ ]
Suppose that, while α is still in progress, 1 more request comes in (which we call β). That request will be load balanced to process A because it is the oldest not-completely-busy one:
Process A [αβ ]
Process B [ ]
Suppose that another request comes in (which we call ɣ). That will be load balanced to process A again, not to B:
Process A [αβɣ ]
Process B [ ]
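Both walkthroughs can be replayed with a minimal router. This sketch assumes that all processes belong to the same generation and are listed oldest first, so only the "oldest process that is not completely busy" rule is needed. The route and show helpers are hypothetical, and the requests are spelled out as alpha, beta, and gamma.

```python
def route(processes, max_concurrency, request):
    """Give the request to the oldest process that still has a free slot."""
    for running in processes:  # processes are ordered oldest first
        if len(running) < max_concurrency:
            running.append(request)
            return True
    return False  # every process is completely busy; the request would queue


def show(processes):
    for i, running in enumerate(processes):
        print("Process %s [%s]" % (chr(ord("A") + i), " ".join(running)))


# Maximum concurrency 4, two processes: everything lands on process A.
threaded = [[], []]
for name in ("alpha", "beta", "gamma"):
    route(threaded, 4, name)
show(threaded)

# Maximum concurrency 1, three processes: gamma goes back to A, not C.
single = [[], [], []]
route(single, 1, "alpha")
route(single, 1, "beta")
single[0].remove("alpha")  # alpha finishes
route(single, 1, "gamma")
show(single)
```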
Customized balancing through additional request queues
By default, all requests for an application are balanced in a fair way. You may want to adjust this behavior; for example if your app consists of pages for visitors as well as some kind of web service, you probably want to make sure the visitor requests don't get stuck behind web service requests when under heavy load.
This can be solved by configuring two application groups for your app: one for the web service URL and one for the visitor pages, each with its own request queue. The requests for the visitor pages will keep being handled even if there is an overload of requests for the web service. See the app group name option (Apache and Nginx only, not available on Standalone).
The two application groups will still compete with each other for server resources. You can control how many instances are allowed per application group, for example allowing more instances to serve the visitor pages than the web service URL. See the max instances option.
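The isolation between the two groups can be illustrated with a toy model. Everything here is hypothetical: the GroupQueue class, the /api/ URL prefix used to tell the web service apart, and the instance limits of 4 and 2. Each instance is assumed to handle 1 request at a time, and requests never finish during the example.

```python
from collections import deque


class GroupQueue:
    """One application group with its own instance limit and request queue."""

    def __init__(self, max_instances):
        self.max_instances = max_instances
        self.in_flight = 0
        self.queue = deque()

    def submit(self, request):
        if self.in_flight < self.max_instances:
            self.in_flight += 1
            return "handled"
        self.queue.append(request)
        return "queued"


groups = {
    "visitors": GroupQueue(max_instances=4),
    "web_service": GroupQueue(max_instances=2),
}


def submit(path):
    # Hypothetical rule: web service requests live under /api/.
    name = "web_service" if path.startswith("/api/") else "visitors"
    return groups[name].submit(path)


flood = [submit("/api/report") for _ in range(10)]
print(flood.count("queued"))  # 8 web service requests are waiting
print(submit("/home"))        # handled: visitors have their own queue
```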
Request queue overflow
If a request arrives for an application group while all of its processes are busy and the application pool is full, the request remains in the Passenger request queue. If this keeps happening (e.g. due to a flood of requests), the queue will eventually reach its limit (see the max request queue size option). Requests that overflow the queue are no longer balanced, but simply dropped with an HTTP 503 error.
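The overflow behavior amounts to a bounded queue. A minimal sketch, assuming a hypothetical admit helper and an illustrative limit of 3:

```python
from collections import deque


def admit(queue, request, max_queue_size):
    """Queue the request, or return 503 if the queue is already full."""
    if len(queue) >= max_queue_size:
        return 503  # the request is dropped instead of being balanced
    queue.append(request)
    return None  # the request waits for a free process


queue = deque()
results = [admit(queue, n, max_queue_size=3) for n in range(5)]
print(results)     # [None, None, None, 503, 503]
print(len(queue))  # 3
```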