An expensive line of code: scaling Node.JS on Heroku

One of the perks of working in multidisciplinary teams for the past few years has been that there I have continuously been surrounded with people with more tech chops than me who would tackle the really hard problems™. While working on a little thing for the Human Tide project, I was reminded how tricky some of those problems can be sometimes.

In a nutshell, I was building a web service that uses the HTML5 Geolocation API to track the location and movement of people as they walk along the shoreline of Herne Bay. The plan was that a group of people – around a hundred – will each carry a mobile phone that sends its GPS coordinates to a server, which stores them in a database. It was a simple enough job, were it not that each phone continuously sends its coordinates back in real time – once per second – which quickly begins to look like a small-scale distributed denial of service attack. Furthermore, as each request needs to be stored in a database, it was not a matter that can easily be solved using caching. I needed a backend that could deal with 100+ requests per second without breaking a sweat. Just the use case node.js, with its asynchronous event-driven model, was supposed to be good at!

Building the little web service went well enough. Within a few hours, I had a small node.js app running on a free Heroku plan. Using the expressjs framework and the node-postgres library, the whole thing was done in under 100 lines of code. It was easy. Too easy. Unfortunately, it performed absolutely terrible.

To simulate the effect of a hundred phones hammering away at the service and measure its performance, I unleashed ApacheBench on my little app. Simulating 5,000 requests from 100 concurrent users is as easy as running:

ab -n 5000 -c 100

The results were devastating:

Concurrency Level:      100
Time taken for tests:   565.061 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      3385800 bytes
HTML transferred:       2655000 bytes
Requests per second:    8.85 [#/sec] (mean)
Time per request:       11301.229 [ms] (mean)
Time per request:       113.012 [ms] (mean, across all concurrent requests)
Transfer rate:          5.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       17   18   0.5     18      25
Processing:    31 11193 2975.6  11176   21911
Waiting:       31 11193 2975.6  11176   21911
Total:         49 11211 2975.5  11195   21928

Percentage of the requests served within a certain time (ms)
  50%  11195
  66%  11714
  75%  12074
  80%  12232
  90%  13032
  95%  14745
  98%  16573
  99%  17852
 100%  21928 (longest request)

My state-of-the-art node.js app managed to handle all of the requests (mostly thanks to Heroku’s routing stack queuing them up), but at a rate of less than 10 requests per second and taking up to 20 (!) seconds per request. Granted, this was on a single Heroku “dyno”, but with these results, it would take at least 10 additional dynos (at a cost of almost $400 a month) to cope with the load. Not exactly what I had in mind. I figured it might have something to do with my app using Heroku’s dev postgresql database plans, but running the same test against a route that didn’t hit the database produced similar results. It seemed that the slowness was the request handling itself. Looking in Heroku’s logs, I saw a depressing number of lines like this, showing severe delays in the “service” step:

heroku[router]: … connect=0ms service=4506ms status=200 bytes=543

I found an article explaining how to use node cluster to better benefit from the multiple cores in each Heroku dyno, which didn’t appear to make much of a difference. It was still just too slow.

And then it hit me. From Heroku’s own tutorial on getting started with Node.js on their platform, I copied the following line to enable logging in Express.js:

app[web.1]: - - - [Mon, 05 Aug 2013 14:05:51 GMT] "GET / HTTP/1.1" 200 - "-" "ApacheBench/2.3"

Deleting that line improved performance… with a factor 50. I ran my little concurrency benchmark again and this time it completed in a fraction of the time, even though I gave it a lot more requests to process:

Concurrency Level:      100
Time taken for tests:   105.179 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      33843447 bytes
HTML transferred:       26550000 bytes
Requests per second:    475.38 [#/sec] (mean)
Time per request:       210.358 [ms] (mean)
Time per request:       2.104 [ms] (mean, across all concurrent requests)
Transfer rate:          314.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       17   18   1.3     18      59
Processing:    23  192 160.6    135    1400
Waiting:       23  192 160.6    135    1400
Total:         41  210 160.6    153    1419

Percentage of the requests served within a certain time (ms)
  50%    153
  66%    280
  75%    341
  80%    366
  90%    419
  95%    458
  98%    546
  99%    656
 100%   1419 (longest request)

From less than 10 requests per second, it can now handle almost 500 requests per second and rather than 10 seconds, requests now take around 200ms to complete. All because of a single line of logging. In fact, I found that any logging, even simple “console.log”-statements completely destroy Node.JS performance on Heroku.

What I’ve learned from this is that scaling is hard and finding out why things are not performing as expected when using new technology is too, even for very simple apps like this. I’m a bit surprised that a bit of logging has such a severe impact on performance on Heroku and even more surprised that they recommend you to enable logging in their Express tutorial. Turning logging off allowed my app to handle 53x as many requests per second on a single “dyno”. I would have had to pay for quite a few of those dynos, at $34.50 a piece, in order to deal with the same volume of traffic. In the end, my little app performed great and I highly recommend Heroku or similar cloud services for short-term high-availability purposes, but keep an eye out for little things that hurt performance before reaching out for that credit card.

9 Responses to “An expensive line of code: scaling Node.JS on Heroku”

  1. Zeke says:

    Thanks for the info Michel. I will update the Heroku ‘Getting Started’ documentation to reflect this.

    According to the Node.js documentation, “The console functions are synchronous when the destination is a terminal or a file (to avoid lost messages in case of premature exit) and asynchronous when it’s a pipe (to avoid blocking for long periods of time).”

    You can enable asynchronous logging on Heroku by setting up a syslog drain:

  2. James Qualls says:

    Does anyone know if a syslog drain is used if you enable the Papertrail addon within heroku?

  3. @James correct: papertrail is set up a drain. Run `heroku drains -a myapp` and `heroku help drains` for more info.

  4. Wow, what an eye-opener. Thanks for sharing, Michel.

    @Zeke: I’m really surprised — you’re saying that the presence or not of a Heroku drain affects whether Node thinks it’s outputting to a terminal/file or pipe? I’d expect that Heroku always pipes Node’s stdout/stderr somewhere (e.g. to Logplex), and it’s afterward that drains get used.

  5. I found that the Node docs say how to programmatically determine if console output is blocking or not:

    process.stderr and process.stdout… are blocking in the case that they refer to regular files or TTY file descriptors. In the case they refer to pipes, they are non-blocking like other streams… To check if Node is being run in a TTY context, read the isTTY property on process.stderr [or] process.stdout.

    I whipped up a quick app to test this on Heroku (code), and found that no, logs are *not* blocking without a drain. The isTTY property is undefined when running as a Heroku dyno (while it’s true when running as a one-off process, which synchronously sends all output to your console), and furthermore, a private _type property has a value of 'pipe' in that case.
    (You can also see this in the full output: the inspection of process.stderr doesn’t log fully before the process exits.)
    So if this is indeed a performance issue on Heroku, I don’t think it’s due to logs being blocking to Node.

  6. Edward Muller says:

    Was/Is this app located in the EU region? If so, we recently (yesterday 2013/9/30) fixed a problem where each log line would result in a separate, almost synchronous post to logplex. This problem should now be solved.

  7. Michel says:

    @Edward and Fred: the app is indeed running in the EU region. I haven’t gotten around seeing if the performance has improved (the app in question was very ephemeral to begin with), but I’m glad this helped you close in on this issue!

  8. Thanks Michel. This info was a super saver.

    Added the following lines in app.js and it increased the speed very much.

    if (process.env.NODE_ENV === ‘production’) {
    var nullfun = function () {};
    console.log = nullfun; = nullfun;
    console.error = nullfun;
    console.warn = nullfun;

  9. Fred Hebert says:

    Regarding the synchronous nature of logs vs. drains, the question is more complex than that.

    Looking at will give you a more precise idea of how the stack is built.

    The output from the dyno makes it to either log-shuttle or logger through a pipe. Whether things block there depend on whether a pipe can block on write. Usually that’s fine and entirely depends on the next component in the stack, logger, syslog-ng or log shuttle.

    Logger itself writes to syslog-ng through a UNIX socket, and whether it will slow down will depend on the unix socket behaviour, which can be either blocking or non-blocking.

    As far as I am aware, it is blocking, but should be giving you enough space to not slow things down so much with a single benchmark.

    Log shuttle is a Go program doing the sending of logs over HTTPs (it’s TCP Syslog messages dumped into an HTTP request) and is used by default for apps in the EU region. Log shuttle can both be blocking and shed load.

    Until a few days ago, there was an issue with the configuration of log shuttle where it would send only one message per HTTP request, resulting in subpar performance. This has been fixed, as mentioned by Edward Muller above. If your application was in EU, that would explain things, and should now be fixed to use much larger buffers that should avoid the same issue happening again.

    If you can give more details about your app, this is something that could be looked into.