Request load balancing
At its core, Passenger is a process manager and HTTP request router. In order to minimize response times, and to distribute load over multiple CPU cores for optimal performance, Passenger load balances requests over processes in an intelligent manner. This article explains the implications and details of our request load balancing mechanism.
Introduction
Node.js applications normally execute in a single thread/process, using a single CPU core. Passenger enables running multiple instances of the application (multiple processes) using pooled application groups, distributing requests to the process that is currently handling the fewest requests.
In this way, load can be distributed across multiple cores without using the cluster module. Assuming a properly written asynchronous app, incoming requests will be handled concurrently (i.e. the next request can already start being handled while the previous is still being handled), and the Passenger request queue will remain empty.
Maximum process concurrency
A core concept in the load balancing algorithm is that of the maximum process concurrency. This is the maximum number of concurrent requests that a particular process can handle.
For Node.js applications, the maximum process concurrency is assumed to be unlimited. This is because Node.js uses evented I/O for concurrency which is very lightweight, so it can handle a virtually unlimited number of requests concurrently.
Having said that, as a Node.js process handles more concurrent requests, it is normal for performance to degrade. The practical concurrency is not really unlimited, and the limit varies for every application and every workload.
For this reason, load balancing requests between multiple processes is beneficial.
App Generations
Another core concept in the load balancing algorithm is that of app generations. Every time Passenger is asked to restart an app group, a counter is incremented, and processes created after the restart are labeled with the generation taken from this counter. Passenger usually performs a blocking restart, which means that only one generation of processes is alive at a time. However, if you initiate a rolling restart, multiple generations can be alive at the same time, and Passenger takes this into account when load balancing requests.
Intelligent routing
Algorithm summary
Passenger keeps a list of application processes. For each application process, Passenger keeps track of: the app generation that resulted in it being started, when it was started, and how many requests it is currently handling. When a new request comes in, Passenger routes the request to the process that meets the following criteria:
- is not completely busy,
- is part of the newest available app generation [1] meeting the previous condition,
- is the oldest process meeting the previous conditions,
- is the least busy process meeting the previous conditions.
[1] Unless blocked by Deployment Error Resistance.
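The criteria above can be sketched as a selection function. The following is a simplified model, not Passenger's actual internals; the names and types are my own, and a `maxConcurrency` of 0 is used here to mean "unlimited", as with Node.js apps:

```typescript
// Hypothetical model of Passenger's routing choice (illustrative only).
interface Proc {
  name: string;
  generation: number;     // app generation counter at spawn time
  spawnOrder: number;     // lower = started earlier
  sessions: number;       // requests currently being handled
  maxConcurrency: number; // 0 means unlimited (the Node.js assumption)
}

function notCompletelyBusy(p: Proc): boolean {
  return p.maxConcurrency === 0 || p.sessions < p.maxConcurrency;
}

// Pick: not completely busy -> newest generation -> oldest -> least busy.
function route(procs: Proc[]): Proc | null {
  const eligible = procs.filter(notCompletelyBusy);
  if (eligible.length === 0) return null; // request waits in the queue
  eligible.sort(
    (a, b) =>
      b.generation - a.generation ||  // prefer the newest generation
      a.spawnOrder - b.spawnOrder ||  // then the oldest process
      a.sessions - b.sessions         // then the least busy one
  );
  return eligible[0];
}

const procs: Proc[] = [
  { name: "A", generation: 3, spawnOrder: 1, sessions: 1, maxConcurrency: 1 },
  { name: "B", generation: 3, spawnOrder: 2, sessions: 0, maxConcurrency: 1 },
  { name: "C", generation: 3, spawnOrder: 3, sessions: 0, maxConcurrency: 1 },
];
console.log(route(procs)?.name); // "B": A is full, and B is older than C
```

Note that the "least busy" comparison only kicks in when generation and spawn order tie; this matches its role as a tie breaker in the sections below.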
Process with the highest app generation in the list has highest priority
If there are multiple processes that are not completely busy, then Passenger will pick one from the list with the highest app generation:
For example, suppose that there are 3 application processes:
Process A: gen 1, spawned 1s ago, handling 0 requests
Process B: gen 1, spawned 1s ago, handling 0 requests
Process C: gen 2, spawned 1s ago, handling 0 requests
Process C will be chosen for the next incoming request.
This speeds up the rate at which traffic shifts to the newer generation of app processes.
Oldest available process in a generation has highest priority
If there are multiple processes from the newest generation that are not completely busy, then Passenger will pick the one that was started first.
For example, suppose that there are 3 application processes:
Process A: gen 2, spawned 30s ago, handling 0 requests
Process B: gen 3, spawned 20s ago, handling 0 requests
Process C: gen 3, spawned 10s ago, handling 0 requests
Process B will be chosen for the next incoming request.
This property is used by the dynamic process scaling algorithm. Dynamic process scaling works by shutting down processes that haven't received requests for a while (processes that are "idle"). By routing to the oldest process that is not completely busy (instead of, say, a random one, using round-robin), Passenger gives other processes the chance to become idle and thus eligible for shutdown.
Another advantage of picking the oldest process is that it improves caching. Since the oldest process is the most likely candidate for load balancing, it will have the most chance to keep its cache warm, and therefore process requests more quickly.
Least busy process as tie breaker
If there are multiple processes that have met all the previous conditions, then Passenger will pick the least busy one. For example, suppose that there are 3 application processes:
Process A: gen 3, spawned 10s ago, handling 1 request
Process B: gen 3, spawned 10s ago, handling 0 requests
Process C: gen 3, spawned 10s ago, handling 2 requests
Process B will be chosen for the next incoming request.
Traffic may appear unbalanced between processes
Because Passenger prefers to route to the oldest process in the generation, traffic may appear unbalanced between processes. Here is an example from passenger-status:
/var/www/phusion_blog/current/public:
App root: /var/www/phusion_blog/current
Requests in queue: 0
* PID: 18334 Sessions: 0 Processed: 4595 Uptime: 5h 53m 29s
CPU: 0% Memory : 99M Last used: 4s ago
* PID: 18339 Sessions: 0 Processed: 2873 Uptime: 5h 53m 26s
CPU: 0% Memory : 96M Last used: 29s ago
* PID: 18343 Sessions: 0 Processed: 33 Uptime: 5m 8s
CPU: 0% Memory : 96M Last used: 1m 4s ago
* PID: 18347 Sessions: 0 Processed: 16 Uptime: 2m 16s
CPU: 0% Memory : 96M Last used: 2m 9s ago
As you can see, there are 4 processes, but the "Processed" field doesn't look balanced at all. This is completely normal; it is supposed to look like this. Passenger does not use round-robin load balancing, for the reasons explained in the sections above. Despite not looking balanced, Passenger is in fact balancing requests between processes just fine.
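A toy simulation makes the skew easy to see. In this sketch (illustrative, not Passenger's code), each process is pretended to handle one request at a time, one request arrives per time unit, and request durations alternate between short and long; routing always picks the oldest idle process:

```typescript
// Toy simulation: oldest-first routing skews "Processed" counts
// toward older processes, leaving newer ones idle.
const busyUntil = [0, 0, 0, 0];  // 4 processes, in spawn order
const processed = [0, 0, 0, 0];  // per-process request counts

const durations = [0.5, 1.5];    // alternating request lengths
for (let t = 0; t < 12; t++) {   // one request arrives per time unit
  const d = durations[t % 2];
  const i = busyUntil.findIndex((u) => u <= t); // oldest idle process
  busyUntil[i] = t + d;
  processed[i] += 1;
}
console.log(processed); // [ 7, 5, 0, 0 ]: the two oldest did all the work
```

A round-robin router would have spread these 12 requests evenly; oldest-first instead leaves the two newest processes completely idle, making them candidates for shutdown.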
Example with maximum concurrency 1
Suppose that you have 3 application processes, spawned in order, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:
Process A [ ]
Process B [ ]
Process C [ ]
When a new request comes in (let's call this α), Passenger will decide to route the request to process A. A will then reach its maximum concurrency:
Process A [α]
Process B [ ]
Process C [ ]
Suppose that, while α is still in progress, a new request comes in (which we call β). That request will be load balanced to process B because it is the oldest not-completely-busy process:
Process A [α]
Process B [β]
Process C [ ]
Suppose α finishes. The situation will look like this:
Process A [ ]
Process B [β]
Process C [ ]
If another request comes in (which we call ɣ), that request will be routed to A, not C:
Process A [ɣ]
Process B [β]
Process C [ ]
This keeps process A's caches warm, and may allow process C to become idle and be shut down sooner.
Example with maximum concurrency 4
Suppose that you have 2 application processes, and the processes' maximum concurrency is configured to 4. When the application is idle, none of the processes are handling any requests:
Process A [ ]
Process B [ ]
When a new request comes in (which we call α), Passenger will decide to route the request to process A:
Process A [α ]
Process B [ ]
Suppose that, while α is still in progress, 1 more request comes in (which we call β). That request will be load balanced to process A because it is the oldest not-completely-busy one:
Process A [αβ ]
Process B [ ]
Suppose that another request comes in (which we call ɣ). That will be load balanced to process A again, not to B:
Process A [αβɣ ]
Process B [ ]
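The walkthrough above can be replayed in a few lines of code (a toy model, not Passenger's implementation), with two processes whose maximum concurrency is 4 and requests routed to the oldest not-completely-busy process:

```typescript
// Toy replay of the max-concurrency-4 example.
interface P { name: string; sessions: number; max: number; }
const a: P = { name: "A", sessions: 0, max: 4 };
const b: P = { name: "B", sessions: 0, max: 4 };

// The process list is kept in spawn order, so find() returns the oldest.
function oldestNotFull(procs: P[]): P | undefined {
  return procs.find((p) => p.sessions < p.max);
}

const chosen: string[] = [];
for (const req of ["α", "β", "ɣ"]) {
  const p = oldestNotFull([a, b])!;
  p.sessions += 1; // request is now in progress on this process
  chosen.push(`${req}->${p.name}`);
}
console.log(chosen.join(" ")); // α->A β->A ɣ->A: B never receives a request
```

Only once process A reaches 4 concurrent requests would the next request spill over to process B.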
Customized balancing through additional request queues
By default, all requests for an application are balanced in a fair way. You may want to adjust this behavior; for example if your app consists of pages for visitors as well as some kind of web service, you probably want to make sure the visitor requests don't get stuck behind web service requests when under heavy load.
This can be solved by configuring two application groups for your app: one for the web service URL and one for the visitor pages, each with their own request queue. The requests for the visitor pages will keep being handled even if there is an overload of requests for the web service. See the app group name option (Apache and Nginx only, not available on Standalone).
The two application groups will still compete with each other for server resources. You can control how many instances are allowed per application group, for example allowing more instances to serve the visitor pages than the web service URL. See the max instances option.
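For the Nginx integration, such a split might look like the sketch below. The host name and paths are illustrative, the instance counts are arbitrary, and per-group instance limits (the max instances option) may require Passenger Enterprise; consult the configuration reference for your edition:

```nginx
# Illustrative sketch: one app served as two application groups,
# each with its own request queue and instance limit.
server {
    listen 80;
    server_name example.com;            # hypothetical host
    root /var/www/myapp/public;         # hypothetical app path
    passenger_enabled on;

    # Visitor pages: their own group, allowed more processes.
    passenger_app_group_name myapp_pages;
    passenger_max_instances 6;

    # The web service under /api gets a separate group and queue.
    location /api {
        passenger_enabled on;
        passenger_app_group_name myapp_api;
        passenger_max_instances 2;
    }
}
```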
Request queue overflow
If a request arrives for an application group, and all its processes are busy, and the application pool is full, the request remains in the Passenger request queue. If this keeps happening (e.g. due to a flood of requests) the queue will eventually overflow its limit see: max request queue size. Overflowing requests will no longer be balanced, but simply dropped (with a HTTP 503 error).