Setting up a Node.js Cluster

We all know Node.js is great at handling lots of events asynchronously, but what a lot of people don't know is that all of this is done on a single thread. Node.js actually is not multi-threaded, so all of these requests are just being handled in the event loop of a single thread.

So why not get the most out of your quad-core processor by using a Node.js cluster? This will start up multiple instances of your code to handle even more requests. This may sound a bit difficult, but it is actually pretty easy to do with the cluster module, which was introduced in Node.js v0.8.

Obviously this is helpful to any app that can divide work between different processes, but it's especially important to apps that handle many IO requests, like a website.

Unfortunately, due to the complexities of parallel processing, clustering an application on a server isn't always straight-forward. What do you do when you need multiple processes to listen on the same port? Recall that only one process can access a port at any given time. The naive solution here is to configure each process to listen on a different port and then set up Nginx to load balance requests between the ports.

This is a viable solution, but it requires a lot more work setting up and configuring each process, and not to mention configuring Nginx. With this solution you're just adding more things for yourself to manage.

Instead, you can fork the master process in to multiple child processes (typically having one child per processor). In this case, the children are allowed to share a port with the parent (thanks to inter-process communication, or IPC), so there is no need to worry about managing multiple ports.

This is exactly what the cluster module does for you.

Working with The Cluster Module

Clustering an app is extremely simple, especially for web server code like Express projects. All you really need to do is this:

var cluster = require('cluster');  
var express = require('express');  
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {  
    for (var i = 0; i < numCPUs; i++) {
        // Create a worker
        cluster.fork();
    }
} else {
    // Workers share the TCP connection in this server
    var app = express();

    app.get('/', function (req, res) {
        res.send('Hello World!');
    });

    // All workers use this port
    app.listen(8080);
}

The functionality of the code is split up in to two parts, the master code and the worker code. This is done in the if-statement (if (cluster.isMaster) {...}). The master's only purpose here is to create all of the workers (the number of workers created is based on the number of CPUs available), and the workers are responsible for running separate instances of the Express server.

When a worker is forked off of the main process, it re-runs the code from the beginning of the module. When the worker gets to the if-statement, it returns false for cluster.isMaster, so instead it'll create the Express app, a route, and then listens on port 8080. In the case of a quad-core processor, we'd have four workers spawned, all listening on the same port for requests to come in.

But how are requests divided up between the workers? Obviously they can't (and shouldn't) all be listening and responding to every single request that we get. To handle this, there is actually an embedded load-balancer within the cluster module that handles distributing requests between the different workers. On Linux and OSX (but not Windows) the round-robin (cluster.SCHED_RR) policy is in effect by default. The only other scheduling option available is to leave it up to the operating system (cluster.SCHED_NONE), which is default on Windows.

The scheduling policy can be set either in cluster.schedulingPolicy or by setting it on the environment variable NODE_CLUSTER_SCHED_POLICY (with values of either 'rr' or 'none').

You might also be wondering how different processes can be sharing a single port. The difficult part about running so many processes that handle network requests is that traditionally only one can have a port open at once. The big benefit of cluster is that it handles the port-sharing for you, so any ports you have open, like for a web-server, will be accessible for all children. This is done via IPC, which means the master just sends the port handle to each worker.

Thanks to features like this, clustering is super easy.

cluster.fork() vs child_process.fork()

If you have prior experience with child_process's fork() method then you may be thinking that cluster.fork() is somewhat similar (and they are, in many ways), so we'll explain some key differences about these two forking methods in this section.

There are a few main differences between cluster.fork() and child_process.fork(). The child_process.fork() method is a bit lower-level and requires you to pass the location (file path) of the module as an argument, plus other optional arguments like the current working directory, the user that owns the process, environment variables, and more.

Another difference is that cluster starts the worker execution from the beginning of the same module from which it ran. So if your app's entry point is index.js, but the worker is spawned in cluster-my-app.js, then it'll still start its execution from the beginning at index.js. child_process is different in that it spawns execution in whatever file is passed to it, and not necessarily the entry point of the given app.

You might have already guessed that the cluster module actually uses the child_process module underneath for creating the children, which is done with child_process's own fork() method, allowing them to communicate via IPC, which is how port handles are shared among workers.

To be clear, forking in Node is very different than a POISIX fork in that it doesn't actually clone the current process, but it does start up a new V8 instance.

Although this is one of the easiest ways to multi-thread, it should be used with caution. Just because you're able to spawn 1,000 workers doesn't mean you should. Each worker takes up system resources, so only spawn those that are really needed. The Node docs state that since each child process is a new V8 instance, you need to expect a 30ms startup time for each and at least 10mb of memory per instance.

Error Handling

So what do you do when one (or more!) of your workers die? The whole point of clustering is basically lost if you can't restart workers after they crash. Lucky for you the cluster module extends EventEmitter and provides an 'exit' event, which tells you when one of your worker children dies.

You can use this to log the event and restart the process:

cluster.on('exit', function(worker, code, signal) {  
    console.log('Worker %d died with code/signal %s. Restarting worker...', worker.process.pid, signal || code);
    cluster.fork();
});

Now, after just 4 lines of code, it's like you have your own internal process manager!

Performance Comparisons

Okay, now to the interesting part. Let's see how much clustering actually helps us.

For this experiment, I set up a web-app similar to the example code I showed above. But the biggest difference is that we're simulating work being done within the Express route by using the sleep module and by returning a bunch of random data to the user.

Here is that same web-app, but with clustering:

var cluster = require('cluster');  
var crypto = require('crypto');  
var express = require('express');  
var sleep = require('sleep');  
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {  
    for (var i = 0; i < numCPUs; i++) {
        // Create a worker
        cluster.fork();
    }
} else {
    // Workers share the TCP connection in this server
    var app = express();

    app.get('/', function (req, res) {
        // Simulate route processing delay
        var randSleep = Math.round(10000 + (Math.random() * 10000));
        sleep.usleep(randSleep);

        var numChars = Math.round(5000 + (Math.random() * 5000));
        var randChars = crypto.randomBytes(numChars).toString('hex');
        res.send(randChars);
    });

    // All workers use this port
    app.listen(8080);
}

And here is the 'control' code from which we'll make our comparisons. It is essentially the same exact thing, just without cluster.fork():

var crypto = require('crypto');  
var express = require('express');  
var sleep = require('sleep');

var app = express();

app.get('/', function (req, res) {  
    // Simulate route processing delay
    var randSleep = Math.round(10000 + (Math.random() * 10000));
    sleep.usleep(randSleep);

    var numChars = Math.round(5000 + (Math.random() * 5000));
    var randChars = crypto.randomBytes(numChars).toString('hex');
    res.send(randChars);
});

app.listen(8080);  

To simulate a heavy user load, we'll be using a command line tool called Siege, which we can use to make a bunch of simultaneous requests to the URL of our choice.

Siege is also nice in that it tracks performance metrics, like availability, throughput, and the rate of requests handled.

Here is the Siege command we'll be using for the tests:

$ siege -c100 -t60s http://localhost:8080/

After running this command for both versions of the app, here are some of the more interesting results:

Type Total Requests Handled Requests/second Average response time Throughput
No clustering 3467 58.69 1.18 secs 0.84 MB/sec
Clustering (4 processes) 11146 188.72 0.03 secs 2.70 MB/sec

As you can see, the clustered app has around a 3.2x improvement over the single-process app for just about all of the metrics listed, except for average response time, which has a much more significant improvement.