A Random Process

I've recently been reading «Interactive Computer Graphics: A Top-Down Approach with WebGL» and came across a very interesting geometry problem. It goes roughly like this:

  1. Given an equilateral triangle in the plane, with vertices V1, V2, V3.
  2. Pick an arbitrary point P inside the triangle.
  3. Randomly choose one of V1, V2, V3, call it Vx; take the midpoint of Vx and P, call it Q, and mark Q on the plane.
  4. Treat Q as the new point P and go back to step 3.

The question: after repeating the steps above N times, what distribution does the set of marked points form on the plane?

Random Process

When I first read this description, the process looked so random that I couldn't imagine any pattern emerging. If you don't already know the result, take a guess before reading on.

The Final Destination

I was curious enough to implement it, and the result stunned me. Looking it up on Wikipedia afterwards, I learned that the pattern is called the Sierpinski Triangle. The whole process is fascinating: no matter whether the starting point lies on the final converged figure or not, the iteration converges quickly and forms a beautiful fractal structure. A process with this property is called a Chaos Game. Also, the property holds for any triangle, not just equilateral ones.

I put an implementation on jsfiddle; dragging the slider controls how many iterations the program runs.
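
Here is a minimal sketch of what such an implementation might look like (not the exact jsfiddle code, and without the slider); it assumes a page with a <canvas id="canvas" width="400" height="400"> element:

var canvas = document.getElementById('canvas');
var ctx = canvas.getContext('2d');

// Vertices of the triangle (any triangle works, not only an equilateral one).
var vertices = [
  { x: 200, y: 10 },
  { x: 10,  y: 390 },
  { x: 390, y: 390 }
];

// Arbitrary starting point P; it doesn't even have to be inside the triangle.
var p = { x: 100, y: 200 };

var N = 20000;
for (var i = 0; i < N; i++) {
  // Randomly pick one of the three vertices and move P to the midpoint.
  var v = vertices[Math.floor(Math.random() * 3)];
  p = { x: (p.x + v.x) / 2, y: (p.y + v.y) / 2 };
  ctx.fillRect(p.x, p.y, 1, 1); // mark the point
}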

Recently, I've been doing some experiments to try to rewrite our Nodejs service in a more maintainable style. I'd like to write several posts to share my findings.

Pain Points

We use express.js as the framework to build our Nodejs service. The service was started around 2011, when Nodejs was still in its infancy. At that time there was no common paradigm for structuring a fairly complex web service in javascript. When I joined the team, I found the performance of our service was not as good as I expected, so I started to investigate how to improve it. During that work I found several big pain points when trying to make breaking changes to the code base. The top 4 are:

  • The magic req, res and context objects. They are everywhere and you have to pass them everywhere, but you don't know exactly how they are used without reading every line of the project.
  • Error handling is like crossing a big maze. You have to pass the error object from its origin through probably 5-10 stack frames to reach the handling logic.
  • Heavy reliance on callbacks, callbacks of callbacks, or even deeper callbacks to control the program's execution flow, which not only creates callback hell but also extremely tight coupling between function calls.
  • Difficult to test. Because we rely on closures, i.e. anonymous functions, for async callbacks, we have a lot of anonymous functions that are not testable.

In Part 1, I'd like to talk about how to solve the first pain point by using Nodejs' Domain module.

Hello Domain, goodbye req

Because of Nodejs' async nature, you cannot store your current request's context in some centralized place and then use accessor methods to access the current working context, like the thread-local contexts commonly used in Java web frameworks. Instead, you have to explicitly pass these context objects into all the methods you call, so that the contexts can be captured in closures and remain accessible when the callback is invoked.
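
To make the pain concrete, here is a hedged illustration (the handler and helper names are hypothetical, not from our code base) of what explicitly threading the context through every layer tends to look like:

// Hypothetical names, only to show how req has to be passed down
// so that closures can capture it.
function handler(req, res) {
  loadUser(req, function(err, user) {
    loadWatchlist(req, user, function(err, list) {
      res.json({ user: user, watchlist: list });
    });
  });
}

// Both helpers need req only for a header or a query parameter,
// yet every caller has to pass the whole object down.
function loadUser(req, callback) {
  backendGet('/users/' + req.headers['x-user-id'], callback);
}

function loadWatchlist(req, user, callback) {
  backendGet('/watchlists/' + user.id + '?lang=' + req.query.lang, callback);
}

// backendGet is a hypothetical stand-in for whatever HTTP client the service uses.
function backendGet(path, callback) { /* ... */ }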

If you keep writing code carefully, or you only have 1-2 contributors, it's probably not a big problem. However, when the code is maintained by several people, it starts rotting. req will soon become a magic global garbage bin. You can “easily” put something into it in one function and then access it from another, creating a magic tight coupling. You can “easily” access and modify req's parameters anywhere you wish. You can “easily” use res.write to send a partial response from the middle of anywhere.

In the end you will have a code base where no one really knows exactly how the context is used or what is in it.

Kent Beck’s four simple design rules for clean code:

  1. Runs all the tests
  2. Contains no duplications
  3. Expresses the intent of the programmers
  4. Minimizes the number of classes and methods

Passing the req around violates rules #2 and #3 and makes your code hard to understand.

So how do we solve the problem? In Nodejs we don't have thread-local variables, which on the other hand is a good thing, because we get concurrency on a single thread.

There are 2 solutions:

  1. Use Domain
  2. Use continuation-local-storage module

If you are writing a library that might be included in other modules, then continuation-local-storage is recommended. If, like me, you are implementing a web service, I would recommend you just use Domain. Domain is natively supported in the Nodejs core, while continuation-local-storage is a userland implementation. Another major difference is error handling, which is out of the scope of this article.
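
For reference, the continuation-local-storage version looks roughly like this (a sketch based on the module's namespace API; app is an express app like the one in the demo later in this post, and nextRequestId is a hypothetical helper):

var cls = require('continuation-local-storage');
var ns = cls.createNamespace('myapp'); // 'myapp' is an arbitrary namespace name

app.use(function(req, res, next) {
  // similar to d.add() in the domain version below, the req/res emitters
  // usually need to be bound to the namespace explicitly
  ns.bindEmitter(req);
  ns.bindEmitter(res);
  ns.run(function() {
    ns.set('reqId', nextRequestId());
    next();
  });
});

// later, deep inside an async callback:
var reqId = cls.getNamespace('myapp').get('reqId');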

Domain in action

Domain was introduced in Nodejs v0.8 and mainly provides two things:

  1. A sandbox for exception handling and isolation for an async process chain
  2. An object that is bound to an async process chain

We will cover #1 in a later post; for now we focus on #2.

When you run your code inside a domain, your code has access to the same domain object even in an async callback! It magically works for all of Nodejs' native async facilities, such as net, fs, timers, etc.

Let’s see an example.

var domain = require('domain');

var d = domain.create();

d.run(function() {
  d.context = 'hello';

  setTimeout(function() {
    // process.domain returns the domain object
    // bound to the current async chain
    console.log(process.domain.context);
  }, 1000);
});

The above code will print “hello” to the console. As you can see, the callback function doesn't need anything extra to access the current context, even though it runs in an async callback.

Integrate with express

To integrate this with the express framework, you can create a middleware that creates a domain for each request and runs the request handler inside that domain, which gives you an “async-local variable”.

Here is a demo. In the demo, we use the context to keep track of a requestId, an id that uniquely identifies a request, which is a common pattern in web service implementations. The handler then makes a simple async HTTP GET request.

var express = require('express');
var domain = require('domain');
var http = require('http');

var app = express();

var rid = 1;

// Centralized context object
function Context() {
  this.reqId = rid;
  rid += 1;
}
Context.getCurrent = function() {
  return process.domain.context;
};

app.use(function(req, res, next) {
  // create a domain for every request
  var d = domain.create();
  d.context = new Context();
  // req and res are created before the d domain is created,
  // so we have to add them to the domain manually.
  // newly created EventEmitter or other async constructs are
  // automatically added to the current domain.
  d.add(req);
  d.add(res);
  // run next in the created domain
  d.run(function() {
    next();
  });
});

app.get('/test', function(req, res) {
  http.get("http://www.google.com", function(resp) {
    var body = '';
    resp.on('data', function(data) {
      body += data.toString();
    });
    resp.on('end', function() {
      res.json({
        message: body.substring(0, 100),
        reqId: Context.getCurrent().reqId
      });
    });
  });
});

app.listen(3000);

Once the server starts, you can run a quick test with curl to see that it works.

jiaji.zhou in ~/Workspace/playground/expressdemo [23:25:09]
$ curl http://localhost:3000/test
{
"message": "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org/WebPage\" lang=\"en\"><head><meta content",
"reqId": 1
}
jiaji.zhou in ~/Workspace/playground/expressdemo [23:25:26]
$ curl http://localhost:3000/test
{
"message": "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org/WebPage\" lang=\"en\"><head><meta content",
"reqId": 2
}

Conclusion

With Domain, you can centralize your request state in an easily accessible context, which not only makes your code cleaner but is also extremely useful for implementing functionality such as transaction management.
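
For example, a request-scoped logger becomes trivial: any module can tag its output with the current request id without ever seeing req. A sketch (it assumes the Context constructor from the demo above is exported from a shared module):

// logger.js (hypothetical helper built on top of Context.getCurrent())
var Context = require('./context');

exports.log = function(message) {
  var ctx = Context.getCurrent();
  console.log('[reqId=' + ctx.reqId + '] ' + message);
};

// any other module, deep inside an async callback, no req in sight:
// require('./logger').log('fetched 3 items from backend');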

The Problem

Last week we tried to reproduce an incident in our production system which was caused by an elevated HTTPS connection creation rate. The service was flooded by more than 2000 concurrent connections per server at peak. We then tried to use apache benchmark (ab) to reproduce the issue in a local environment. However, we found that although ab allows you to specify a concurrency level using the -c flag, it actually cannot reach that amount of concurrency, because ab hits the server's QPS bottleneck first and stops generating more concurrent connections.

What we want is a load testing client that can sustain a limited QPS at a high concurrency level.

I know there are several existing tools that can tweak QPS/delay and concurrency, but I had some bad experience with these tools before, either for performance or stability reasons, and I didn't want to waste time learning and trying them one by one, which would probably have taken me 2-3 hours to find a suitable one. Since the requirement is really simple, I decided to work on my own solution.

The Solution

I've never written such a tool before, so I basically had no preference on which platform to implement it on. I figured Go should be good at modeling concurrency, so I decided to give it a try.

I found goroutines a very natural way to model concurrent connections. So here is how to generate concurrent connections. (Please note that the code is not runnable as-is since I extracted it from the whole program. I'll provide a link to the full source code at the end of this post.)

for i := 0; i < *concurrency; i++ {
	go func() {
		client := &http.Client{}
		for {
			sendRequest(client, *req, *concurrency)
		}
	}()
}

Here concurrency and req are input parameters to the program, given on the command line.

However, these goroutines will generate requests as quickly as possible, so we need a way to limit the QPS. I used a buffered channel as a blocking queue of tokens for sending requests: all client goroutines are required to take a token before sending a request, and a separate goroutine constantly refills the token queue to allow more requests to be generated. Here is the code:

bucket := make(chan bool, *maxQPS)

go func() {
	for {
		for i := 0; i < *maxQPS; i++ {
			select {
			case bucket <- true:
			default:
			}
		}
		time.Sleep(time.Second)
	}
}()

for i := 0; i < *concurrency; i++ {
	go func() {
		client := &http.Client{}
		for {
			<-bucket
			sendRequest(client, *req, *concurrency)
		}
	}()
}

Here I used the select..default construct to skip adding tokens when the queue is full, so that the token generator never produces more tokens than intended under a race condition.

That’s it! The basic functionality is done!

Working around the TIME_WAIT connection problem

Because we want to test the scenario where clients keep creating new connections to the server, we specifically disabled keep-alive for the underlying connection. After running the program for a while, we found that the client used up the port range reserved for client connections, which is roughly 28k ports. The result is that on the client host you observe a lot of TCP connections in the TIME_WAIT state. This is expected behavior for TCP connections. I did the following tricks to work around the problem:

  • Tweak the OS configuration on the running host. We run the client on a Linux server; you can follow this article to set net.ipv4.tcp_tw_reuse to 1.
  • Change the Dial of http.Transport to use TCPConn.SetLinger(0). Tweaking the network configuration is not enough; we also have to change the client's TCP connections to set SO_LINGER to 0 when creating connections. See this article for a detailed explanation.

After all these tweaks, the performance of the final tool is really good: I can generate 200 concurrent connections with 400 max QPS using a single 2.4 GHz Xeon core.

Another beautiful feature of Go is that you can develop on your Mac, directly cross-compile a working Linux binary on your dev machine, and deploy by copying a single file. Sweet!

You can check out the full working source code here: https://github.com/jiaz/simpleloadclient.

Conclusion

Go is really good at modeling concurrency. I can't imagine how much code I would need to write to implement this tool in another language. In fact, I don't like Go for its lack of generics and its duck-typing-only type system. There is a lot of criticism of Go, some of which is quite reasonable. However, despite the fact that it is not perfect, it is really sharp for concurrency modeling. As developers, we should open our minds by learning these different views of the world so that we can model our problems better with more suitable tools.

TCP connection tweaking is interesting. I've seen the TIME_WAIT problem a lot in production service development. You need to understand TCP states pretty well to implement a decently performing service.

The inconvenience of encapsulation

When writing unit tests, the most ugly & painful & stupid thing you might have to do is to expose some internal implementation of your module to the testing code so that:

  • It is easier to enumerate all possible conditional branches to cover some edge cases (although with a complex design you can always achieve this by mocking things, sometimes that's overkill);
  • It is possible to inspect side effects of your code (accept the dark side of reality: you cannot totally avoid side effects);
  • It is easier to replace some dependencies with mocks (although with a good design you should be able to inject all dependencies, that's not a common thing in the javascript realm).

Sometimes you might even find yourself wasting a lot of precious time thinking about ways to avoid writing such smelly code. But hey, it's testing code! Shouldn't we have fun and enjoy writing it with all the power and flexibility we ought to have, rather than struggling in these design holes? Why do we have to break core code for the sake of testing code?

Recently I found an excellent library, rewire, that neatly solves this problem. Moreover, the idea behind it is really creative, so I'd like to share it in this post.

Nothing is more valuable than an example, so let's make a really simple one. The code logic might not be very reasonable; it's just made to show the problems and how rewire can solve them. You can find the whole example here.

Consider the following scenario: you are implementing a log appender which writes aggregated log messages to some database in batches. Somehow the database has throttling control, so you cannot write to it more than once per second. Here is something you might write.

var db = require('./db');

var batchSize = 10;
var lastWriteTime = null;
var count = 0;
var data = "";

function flushToDB() {
  db.write(data);
  data = "";
  count = 0;
  lastWriteTime = Date.now();
}

exports.append = function(message) {
  data += message + "\n";
  count += 1;

  var duration = Date.now() - lastWriteTime;
  if (count >= batchSize && duration > 1000) {
    flushToDB();
  }
};

Now you might want to write some test cases for your code. The three basic cases might be:

  • The appender shouldn’t flush until batch size is reached.
  • The appender shouldn’t flush twice within one second.
  • The appender should flush when batch size is reached and more than 1 second is elapsed from last flush.

Here come the problems:

  • How do I know the batch size? It's not exposed anywhere, and it's probably private information that needs to stay hidden in the module.
  • How do I know if the flushToDB method is called?
  • How do I swap in a mock db for unit tests?
  • How do I easily control the duration so that the if branch can be tested, without some fancy way to mock the datetime or wasting test time on setTimeout?

The fundamental challenge is that the module has hidden state that is not supposed to be exposed.

Here comes the rewire

rewire is a drop-in replacement for require for testing purposes. When it requires a module, it also magically modifies the source code so that variables hidden in the module can be accessed and modified. Here is how you can use rewire to solve the above problems without touching the core module.

var rewire = require('rewire');
var assert = require('assert');

describe("LogAppender", function() {
var logappender = rewire('../lib/logappender');
it('should not flush when append less than batchSize times', function() {
var batchSize = logappender.__get__('batchSize');
var flushCalled = false;
logappender.__set__('flushToDB', function() {
flushCalled = true;
})
for (var i = 0; i < batchSize - 1; i++) {
logappender.append("hello");
}
assert.strictEqual(flushCalled, false);
});

it('should flush when append more than batchSize times', function() {
var batchSize = logappender.__get__('batchSize');
var flushCalled = false;
logappender.__set__('flushToDB', function() {
flushCalled = true;
})
for (var i = 0; i < batchSize + 1; i++) {
logappender.append("hello");
}
assert.strictEqual(flushCalled, true);
});

it('should not flush when duration less than 1 second', function() {
var batchSize = logappender.__get__('batchSize');
var flushCalled = false;
logappender.__set__('flushToDB', function() {
flushCalled = true;
})
logappender.__set__('lastWriteTime', Date.now() - 500);
for (var i = 0; i < batchSize + 1; i++) {
logappender.append("hello");
}
assert.strictEqual(flushCalled, false);
});
});

And yeah, it runs successfully!

$ node_modules/mocha/bin/mocha


LogAppender
✓ should not flush when append less than batchSize times
✓ should flush when append more than batchSize times
✓ should not flush when duration less than 1 second


3 passing (5ms)

The idea behind the scenes

I was extremely interested in how it implements the __set__ and __get__ functions. How can they bind to the hidden variables in your module? So I stepped into the source code. It turns out that eval and closures do the magic. Let's take a close look at how __set__ is implemented; __get__ is very similar.

Key point 1: it modifies your source code by adding a __set__ function before requiring your module.

This is important: this way, the added function is in the same lexical scope as your module, so it can access the module's internal variables.

Key point 2: __set__ uses eval to execute the variable-replacing code.

Variables in the scope of the __set__ function are captured by the closure the eval'd code runs in, so inside eval you can access not only the hidden variables of your module (inherited from the __set__ closure) but also the variables of the __set__ function itself. In this way, you can build a general function that can replace any declared variable in the module's scope.

Literally, it appends the following function to your module's source code:

module.exports.__set__ = function(varName, varValue) {
  // eval runs in this function's scope, so it can reach the module's own
  // variables as well as the arguments of __set__ (varName, varValue).
  eval(varName + " = varValue");
};
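
To see the scoping trick in isolation, here is a tiny standalone demo (not rewire's actual code) showing that a direct eval inside a function can read and assign variables the function closes over:

var hidden = 1; // module-level variable, normally unreachable from outside

function set(varName, varValue) {
  // direct eval runs in this function's scope, and the scope chain
  // includes the module scope, so the assignment reaches `hidden`
  eval(varName + ' = varValue');
}

set('hidden', 42);
console.log(hidden); // prints 42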

This technique really opened my mind. It's so powerful, and I believe it has many more practical applications. I hope you will also enjoy this library.

Nodejs Profiling Tools

At Hulu, we use Nodejs to build a proxy or orchestration service that exposes APIs to our client applications in order to provide a unified interface (in API schema) which hides the complexity of dealing with our versatile backend “micro-services”.

The reason we chose Nodejs over other technologies is that its asynchronous nature perfectly fits the requirements of the service:

  • Handling large number of concurrent clients;
  • Requesting backend micro-services to get the data, where most of the time is spent on IO;
  • Enhancing the backend response with data from other backend micro-services.

This is a typical front-tier API service that has low CPU consumption and high concurrent IO requirement.

However, when you deploy the application to production, it is very likely that you will hit a CPU bottleneck if you are not careful when writing code. Javascript is so flexible that it can easily get you trapped in performance problems if you don't understand the execution engine (v8) well enough. Recently, I spent a lot of time profiling and investigating the performance issues of our service.

In this post, I'll list some tools and techniques I have used to profile and debug Nodejs programs, for daily development and production troubleshooting, as a summary of the work I've done.

(We use MacOS X as development environment and Ubuntu server as production OS. The version of node we currently use is a custom build of v0.10.)

Debugging Tools

node-inspector

node-inspector is my favorite tool for Nodejs application trouble-shooting.

You can use the node-debug command to launch your application, which starts both your application and the debugger in your browser. However, my favorite way is to have a separate debugger process and attach node applications to it on demand, so that I can use a single debugger process to debug multiple node applications.

First launch node-inspector. I prefer to open a separate console and leave it there.

$ node-inspector
Node Inspector v0.9.2
Visit http://127.0.0.1:8080/debug?port=5858 to start debugging.

Then add the --debug flag to the node command to run the application with the debugger enabled, which by default listens on port 5858. You can change the port by providing a port number to the debug flag, as --debug=<port>.

$ node --debug lib/mozart_server.js
debugger listening on port 5858

If you need to debug an issue that happens while the node app starts, change --debug to --debug-brk so that your app starts in a paused state.

After your app is started, you can visit the URL http://127.0.0.1:8080/debug?port=<port> to debug your application, just like in any IDE.

node-inspector debugger

node-inspector can also be used remotely so that you can debug your applications on production servers.

WebStorm (spy-js)

Recently I've been using WebStorm a lot for javascript development. People always debate whether a good programmer should use an IDE or a plain text editor (especially VIM or emacs). I don't see any value in such debates; as a pragmatic developer, I just use the best tools I can find to get things done.

WebStorm comes with a really good debugger for Nodejs applications based on their open-source javascript tracing library spy-js. With the IDE's magic, you can start your server and set breakpoints in the code in one place. The overall feature set is similar to node-inspector's, but life gets much easier.

WebStorm debugger

Besides the debugger, it also has good IDE features like auto-completion, smart code inspection, refactoring and code navigation. It is definitely worth the $99 if you do professional programming in javascript.

TraceGL

TraceGL is a tracing tool that logs every statement your Nodejs app executes. I use it when I want to know what exactly happens during execution, especially when debugging timers, which are really difficult to inspect with regular debuggers, because attaching a debugger stops the execution and thus changes the timers' behavior.

TraceGL is very easy to set up: just clone the repository and follow the instructions to build the tracegl.js lib, put it somewhere convenient to access, like ~/Workspace/tools/tracegl.js, and then start your node application with the following command.

$ node ~/Workspace/tools/tracegl lib/mozart_server.js
[trace.GL] See your code.
[trace.GL] WebGL trace UI: http://0.0.0.0:2000

Then open http://localhost:2000 in your browser and you will see something like the image below. TraceGL outputs each executed statement in time order, along with the values of function arguments and the corresponding stack, which is really useful.

TraceGL

CPU Profiling

Now let's talk about tools to profile your applications. For any application that does serious business, it is critical to understand its bottlenecks during development. Javascript programs can easily get slow and eat a lot of CPU if you don't pay attention; even good professional programmers run into these problems. So when such problems happen, it's important to know how to find the culprit.

Flame Graph

A flame graph is the first thing I want to get when I start looking for CPU bottlenecks. It gives you a good overview of where your application spends most of its time. Below is a screenshot of the flame graph of our application. I recently identified a lot of performance issues using this tool.

Flame chart

The flame graph is obtained by sampling your running program, collecting the call stacks and accumulating them into the graph. In the graph:

  • The x-axis stands for the accumulated “ticks” of the sampled functions. Each block represents a javascript function. The longer the block, the more CPU it (and its children) consumes. Although there is a timeline in the chart, don't be fooled: the length of a block has no meaning in real-time terms. Also, the relative position of the blocks does not indicate execution order. Blocks on the left are not necessarily executed before blocks on the right.
  • The y-axis stands for the depth of the call stack. The higher a block, the deeper it is in the call stack. It also shows the parent-child relationship between functions.

Most of the tutorials, blogs and articles I could find require an OS that supports DTrace to generate this graph. Unfortunately, most of the time I work on Linux or MacOS, which have no full DTrace support. The good news is, it turns out that we don't need DTrace to get this (as long as you only care about the javascript space, not the v8 native space): node-inspector comes to the rescue again!

Launch node-inspector, boot up your server with the debug flag and open the web debugger. See this “Profiles” tab? That's it!

node-inspector profiler

I won't spend more time explaining how to use it though :) Just try it yourself.

Moreover, node-inspector's sampling is really lightweight, which makes it very good for CPU-bound performance tuning.

v8 Profile log

Sometimes you need to do a more subtle inspection, such as finding out whether a method is optimized by v8, or knowing what Nodejs is doing in native space. Then you need the v8 profile log.

Generating the v8 profile log is easy: just add --prof to the node command line.

$ node --prof lib/mozart_server.js

Then Nodejs will generate a file v8.log in the current folder.

The difficult part is analyzing the log file. According to the tutorial, you need to find a proper mac-tick-processor to parse the log file. Unfortunately, I have never successfully built one, because the v8 version embedded in node v0.10 is too old and it just can't be built in the latest OSX environment without changing the source code. That's painful.

After some research, I found a particularly useful tool called node-tick that does the trick. node-tick is a pure javascript application that doesn't require a build step. Below is an excerpt from the output of node-tick, which shows a suspicious unknown lib that consumes 15% of the CPU time.

[Unknown]:
   ticks  total  nonlib  name
    5790  14.9%

[Shared libraries]:
   ticks  total  nonlib  name
   25474  65.7%    0.0%  /Users/jiaji.zhou/.nvm/v0.10.24/bin/node
    2634   6.8%    0.0%  /usr/lib/system/libsystem_kernel.dylib
    1400   3.6%    0.0%  /usr/lib/system/libsystem_platform.dylib
     716   1.8%    0.0%  /usr/lib/system/libsystem_c.dylib
     537   1.4%    0.0%  /usr/lib/system/libsystem_malloc.dylib
     232   0.6%    0.0%  /usr/lib/system/libsystem_pthread.dylib
      33   0.1%    0.0%  /usr/lib/libc++abi.dylib
      13   0.0%    0.0%  /usr/lib/system/libsystem_m.dylib
      10   0.0%    0.0%  /usr/lib/libstdc++.6.dylib

I’m not going to go into the details of how to understand the v8 log. Here are some references if you are interested:

Profiling on Production

Production is serious business! Obviously, you should not add the --debug flag to your production process. Joyent has built a lot of tools, but they require DTrace functionality and their SmartOS to run well; we are running Linux and I don't see any plan to change to SmartOS, which sucks.

There are some profiling and monitoring SaaS offerings for Nodejs out there, such as nodetime, strongloop, PM2, etc. I like none of them, because they not only introduce an external network/service dependency but also leak your production data to a 3rd party, which looks awkward and stupid if you are doing serious business.

What we currently do is set up an environment as close to production as possible, constantly replay production traffic to that environment, and use node-inspector to do profiling on that environment when necessary. We also rely heavily on production metrics; we emit a massive amount of them.

Recently, I found a tool called node-stap, developed by Uber, which looks promising. However, SystemTap is intrusive and has an impact on the system, so I haven't tried it on production. I'm doing some experiments with it and might write a post about it later.

Memory Profiling

Trace GC and Heap size

Nodejs has a pretty decent Garbage Collector, so usually you don’t need to worry much about the GC parameters.

A particular optimization we do on production is to enable the --expose-gc and --nouse-idle-notification flags so that we can do manual GC when necessary. Here is a good reference about the effect of --expose-gc and --nouse-idle-notification on system performance under high load: http://sysmagazine.com/posts/123154/.
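
With --expose-gc, v8 exposes a global gc() function, so a collection can be triggered from code at a moment we choose; a minimal sketch (the 30-second interval is an arbitrary value for illustration):

// Only works when node is started with --expose-gc; otherwise global.gc is undefined.
if (typeof global.gc === 'function') {
  setInterval(function() {
    global.gc(); // force a collection at a time we control
  }, 30000);
}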

On the other hand, the GC activity log is a good resource for performance troubleshooting, so it is better to record it. To enable the GC log, add the --trace_gc flag to the node command; see the example below. One thing to note: the number in parentheses is the total memory allocated, while the number outside the parentheses is the total used memory.

When system performance is low, you can check the log to see if there are excessive GC events at the same time. If so, you may need to investigate how to reduce memory usage.

$ node --debug --trace_gc --expose-gc lib/mozart_server.js
debugger listening on port 5858
[94062] 30 ms: Scavenge 2.0 (34.0) -> 1.7 (36.0) MB, 0 ms [Runtime::PerformGC].
[94062] 42 ms: Scavenge 2.6 (38.0) -> 2.1 (38.0) MB, 1 ms [allocation failure].
[94062] 51 ms: Scavenge 3.1 (38.0) -> 2.5 (38.0) MB, 0 ms [allocation failure].
[94062] 67 ms: Scavenge 4.6 (38.0) -> 3.3 (38.0) MB, 0 ms [Runtime::PerformGC].
[94062] 90 ms: Scavenge 5.5 (39.0) -> 4.5 (39.0) MB, 1 ms [allocation failure].
[94062] 111 ms: Scavenge 6.4 (39.0) -> 5.4 (40.0) MB, 1 ms [allocation failure].
[94062] 155 ms: Mark-sweep 9.4 (41.0) -> 5.6 (43.0) MB, 8 ms [allocation failure] [promotion limit reached].
[94062] 192 ms: Scavenge 10.2 (43.0) -> 7.4 (43.0) MB, 1 ms [Runtime::PerformGC].
[94062] 222 ms: Scavenge 10.8 (43.0) -> 8.7 (43.0) MB, 2 ms [Runtime::PerformGC].
[94062] 239 ms: Scavenge 12.6 (44.0) -> 10.9 (46.0) MB, 1 ms [allocation failure].
[94062] 287 ms: Scavenge 18.1 (46.0) -> 13.2 (47.0) MB, 2 ms [allocation failure].
[94062] 341 ms: Mark-sweep 19.7 (47.0) -> 12.3 (48.0) MB, 17 ms [allocation failure] [promotion limit reached].

To understand the GC space layout, you can add --trace_gc_verbose to show details of the size of each generation. This is useful when the problem you need to solve is subtle and you need to understand the distribution between the new generation and the old generation. For example:

[93955]      323 ms: Scavenge 10.8 (43.0) -> 8.7 (43.0) MB, 2 ms [Runtime::PerformGC].
[93955] Memory allocator, used: 44048 KB, available: 1455088 KB
[93955] New space, used: 917 KB, available: 3178 KB, committed: 8192 KB
[93955] Old pointers, used: 3249 KB, available: 278 KB, committed: 3535 KB
[93955] Old data space, used: 2795 KB, available: 106 KB, committed: 3215 KB
[93955] Code space, used: 1362 KB, available: 115 KB, committed: 1992 KB
[93955] Map space, used: 562 KB, available: 573 KB, committed: 1135 KB
[93955] Cell space, used: 39 KB, available: 0 KB, committed: 128 KB
[93955] Large object space, used: 0 KB, available: 1454047 KB, committed: 0 KB
[93955] All spaces, used: 8926 KB, available: 4251 KB, committed: 18198 KB
[93955] Total time spent in GC : 12 ms

Here is a good article about how GC works in Nodejs: https://strongloop.com/strongblog/node-js-performance-garbage-collection/

Another common metric we monitor is the heap size, which can be obtained by calling process.memoryUsage().heapUsed.
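
Reporting it periodically is a one-liner; a sketch (console.log stands in for whatever metrics client you use):

setInterval(function() {
  var heapUsed = process.memoryUsage().heapUsed;
  // replace console.log with your metrics client
  console.log('heapUsed.MB', (heapUsed / 1024 / 1024).toFixed(1));
}, 10000);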

Profiling on Production

Sometimes you may find abnormal memory usage on production. There is a good library for getting heap dumps on production: heapdump. You can require heapdump in your application and send a signal to your process to get a heap dump when necessary. This is extremely useful for investigating problems on production.
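
Usage is minimal; a sketch based on the heapdump module (on UNIX, sending SIGUSR2 to the process also writes a snapshot to the current working directory by default):

var heapdump = require('heapdump');

// programmatic trigger, e.g. wired to an admin endpoint or a signal handler
heapdump.writeSnapshot(function(err, filename) {
  console.log('heap dump written to', filename);
});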

After you get the heap dump, you can use the Chrome browser to load it and compare different dumps to find potential memory leaks. See the image below as an example.

heapdump

General Nodejs Performance Investigation Strategy

With the above tools and utilities, you should have enough power to investigate Nodejs performance problems. Here are some general strategies you can follow.

  • Find the bottleneck. First of all, you need to understand what the bottleneck of your system is. Is it bandwidth, memory or CPU?
  • Focus on a single code path at a time. Once you know your bottleneck, use a consistent way to stress your system and use the profiling tools to understand why the system underperforms under that fixed condition. As an example, we have a lot of API endpoints in our Nodejs service. When I do a performance investigation, I first find the slowest endpoint, then focus on that endpoint to start the investigation.
  • Start from the most significant issue. If some function takes 50% of your server's resources (CPU) and you optimize it by 30%, you get a 15% total improvement. On the other hand, if some function only takes 10% of your total resources, even eliminating it entirely gains you only 10%. So focus on the most significant issue first.
  • Divide and conquer. Most of the time the problem you are solving is too complex, so you need to break it down into smaller problems. Write small benchmark scripts to understand how you can improve each small problem.
  • Metrics, metrics, metrics! On production, there are limited ways to get information out of your nodejs application, so report as many metrics as possible when you write code. However, also profile how much you spend on reporting metrics; sometimes you may find it's ridiculously high, and then you need to find a way to optimize the reporting.