One of the perks of working in multidisciplinary teams for the past few years has been that I have continuously been surrounded by people with more tech chops than me, who would tackle the really hard problems™. While working on a little thing for the Human Tide project, I was reminded of just how tricky some of those problems can be.
In a nutshell, I was building a web service that uses the HTML5 Geolocation API to track the location and movement of people as they walk along the shoreline of Herne Bay. The plan was that a group of around a hundred people would each carry a mobile phone that sends its GPS coordinates to a server, which stores them in a database. It would have been a simple enough job, were it not that each phone continuously sends its coordinates back in real time – once per second – which quickly starts to look like a small-scale distributed denial of service attack. And since every request needs to be stored in the database, this was not a problem that caching could solve. I needed a backend that could deal with 100+ requests per second without breaking a sweat – exactly the use case node.js, with its asynchronous, event-driven model, was supposed to be good at!
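The client side of this can be sketched roughly as follows. This is a hypothetical sketch of mine, not the project's actual code: startTracking, sendFix, and the shape of the fix object are all names I made up. Passing the geolocation object in as a parameter keeps the function testable outside a browser.

```javascript
// Hypothetical sketch of the phone side: watch the device's position via the
// HTML5 Geolocation API and hand each fix to a callback that POSTs it.
function startTracking(geolocation, sendFix) {
  // watchPosition invokes the success callback every time the position
  // changes; enableHighAccuracy asks for GPS-quality fixes.
  return geolocation.watchPosition(
    pos => sendFix({
      lat: pos.coords.latitude,
      lon: pos.coords.longitude,
      ts: pos.timestamp
    }),
    err => console.warn('geolocation error:', err.message),
    { enableHighAccuracy: true, maximumAge: 1000 }
  );
}

// In a browser this would be wired up along the lines of:
//   startTracking(navigator.geolocation, fix => {
//     // POST JSON.stringify(fix) to the server, e.g. via XMLHttpRequest
//   });
```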
Building the little web service went well enough. Within a few hours, I had a small node.js app running on a free Heroku plan. Using the expressjs framework and the node-postgres library, the whole thing was done in under 100 lines of code. It was easy. Too easy. Unfortunately, it performed absolutely terribly.
To simulate the effect of a hundred phones hammering away at the service and measure its performance, I unleashed ApacheBench on my little app. Simulating 5,000 requests from 100 concurrent users is as easy as running:
ab -n 5000 -c 100 http://my.herokuapp.com/
The results were devastating:
Concurrency Level:      100
Time taken for tests:   565.061 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      3385800 bytes
HTML transferred:       2655000 bytes
Requests per second:    8.85 [#/sec] (mean)
Time per request:       11301.229 [ms] (mean)
Time per request:       113.012 [ms] (mean, across all concurrent requests)
Transfer rate:          5.85 [Kbytes/sec] received

Connection Times (ms)
              min   mean[+/-sd] median    max
Connect:       17     18   0.5      18     25
Processing:    31  11193 2975.6  11176  21911
Waiting:       31  11193 2975.6  11176  21911
Total:         49  11211 2975.5  11195  21928

Percentage of the requests served within a certain time (ms)
  50%  11195
  66%  11714
  75%  12074
  80%  12232
  90%  13032
  95%  14745
  98%  16573
  99%  17852
 100%  21928 (longest request)
My state-of-the-art node.js app managed to handle all of the requests (mostly thanks to Heroku’s routing stack queuing them up), but at a rate of less than 10 requests per second and taking up to 20 (!) seconds per request. Granted, this was on a single Heroku “dyno”, but at these rates it would take at least 10 additional dynos (at a cost of almost $400 a month) to cope with the load. Not exactly what I had in mind. I figured it might have something to do with my app using Heroku’s dev PostgreSQL database plan, but running the same test against a route that didn’t hit the database produced similar results. The slowness seemed to be in the request handling itself. Looking at Heroku’s logs, I saw a depressing number of lines like this, showing severe delays in the “service” step:
heroku[router]: … connect=0ms service=4506ms status=200 bytes=543
I found an article explaining how to use node cluster to make better use of the multiple cores in each Heroku dyno, but it didn’t appear to make much of a difference. It was still just too slow.
And then it hit me. From Heroku’s own tutorial on getting started with Node.js on their platform, I copied the following line to enable logging in Express.js:
app.use(express.logger());
Each request then produced a log line like this one:

app[web.1]: - - - [Mon, 05 Aug 2013 14:05:51 GMT] "GET / HTTP/1.1" 200 - "-" "ApacheBench/2.3"
Deleting that line improved performance… by a factor of 50. I ran my little concurrency benchmark again, and this time it completed in a fraction of the time, even though I gave it ten times as many requests to process:
Concurrency Level:      100
Time taken for tests:   105.179 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      33843447 bytes
HTML transferred:       26550000 bytes
Requests per second:    475.38 [#/sec] (mean)
Time per request:       210.358 [ms] (mean)
Time per request:       2.104 [ms] (mean, across all concurrent requests)
Transfer rate:          314.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       17    18   1.3     18     59
Processing:    23   192 160.6    135   1400
Waiting:       23   192 160.6    135   1400
Total:         41   210 160.6    153   1419

Percentage of the requests served within a certain time (ms)
  50%    153
  66%    280
  75%    341
  80%    366
  90%    419
  95%    458
  98%    546
  99%    656
 100%   1419 (longest request)
From less than 10 requests per second, the app could now handle almost 500, and rather than 10 seconds, requests now took around 200ms to complete. All because of a single line of logging. In fact, I found that any logging – even simple console.log statements – completely destroys Node.js performance on Heroku.
What I’ve learned from this is that scaling is hard, and that finding out why things are not performing as expected when using new technology is hard too, even for very simple apps like this. I’m a bit surprised that a bit of logging has such a severe impact on performance on Heroku, and even more surprised that their own Express tutorial recommends enabling it. Turning logging off allowed my app to handle 53x as many requests per second on a single “dyno”. I would have had to pay for quite a few of those dynos, at $34.50 apiece, to deal with the same volume of traffic. In the end, my little app performed great, and I highly recommend Heroku or similar cloud services for short-term high-availability purposes – but keep an eye out for the little things that hurt performance before reaching for that credit card.
Thanks for the info Michel. I will update the Heroku ‘Getting Started’ documentation to reflect this.
According to the Node.js documentation, “The console functions are synchronous when the destination is a terminal or a file (to avoid lost messages in case of premature exit) and asynchronous when it’s a pipe (to avoid blocking for long periods of time).”
You can enable asynchronous logging on Heroku by setting up a syslog drain: https://devcenter.heroku.com/articles/logging#syslog-drains
Does anyone know if a syslog drain is used if you enable the Papertrail addon within heroku?
@James correct: Papertrail is set up as a drain. Run `heroku drains -a myapp` and `heroku help drains` for more info.
Wow, what an eye-opener. Thanks for sharing, Michel.
@Zeke: I’m really surprised — you’re saying that the presence or absence of a Heroku drain affects whether Node thinks it’s outputting to a terminal/file or a pipe? I’d expect that Heroku always pipes Node’s stdout/stderr somewhere (e.g. to Logplex), and that drains only come into play afterward.
I found that the Node docs say how to programmatically determine whether console output is blocking or not: (source)

I whipped up a quick app to test this on Heroku (code), and found that no, logs are *not* blocking without a drain. The isTTY property is undefined when running as a Heroku dyno (while it’s true when running as a one-off process, which synchronously sends all output to your console), and furthermore, a private _type property has a value of 'pipe' in that case. (You can also see this in the full output: the inspection of process.stderr doesn’t log fully before the process exits.)

So if this is indeed a performance issue on Heroku, I don’t think it’s due to logs being blocking to Node.
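For anyone wanting to repeat that check, a minimal version looks like this. Note that _type is a private, undocumented property, so its presence and values may differ between Node versions:

```javascript
// Inspect how Node classifies stdout in the current environment. isTTY is
// true when attached to a terminal and undefined when output is piped; per
// the Node docs, that distinction decides whether console writes block.
const isTTY = process.stdout.isTTY;
console.log('stdout isTTY:', isTTY);

// _type is private and undocumented (typically 'tty', 'fs' or 'pipe');
// shown only because the experiment above relies on it.
console.log('stdout _type:', process.stdout._type);
```

Running this as a one-off process versus a regular dyno should reproduce the difference described above.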
Was/Is this app located in the EU region? If so, we recently (yesterday 2013/9/30) fixed a problem where each log line would result in a separate, almost synchronous post to logplex. This problem should now be solved.
@Edward and Fred: the app is indeed running in the EU region. I haven’t gotten around to seeing whether performance has improved (the app in question was very ephemeral to begin with), but I’m glad this helped you close in on the issue!
Thanks Michel. This info was a super saver.
Added the following lines to app.js and it sped things up considerably:

if (process.env.NODE_ENV === 'production') {
  var nullfun = function () {};
  console.log = nullfun;
  console.info = nullfun;
  console.error = nullfun;
  console.warn = nullfun;
}
Regarding the synchronous nature of logs vs. drains, the question is more complex than that.
Looking at https://github.com/heroku/logplex/blob/master/doc/README.stream-management.md#where-requests-come-from will give you a more precise idea of how the stack is built.
The output from the dyno makes it to either log-shuttle or logger through a pipe. Whether things block there depends on whether a pipe can block on write. Usually that’s fine, and it entirely depends on the next component in the stack: logger, syslog-ng, or log-shuttle.

Logger itself writes to syslog-ng through a UNIX socket, and whether it slows down depends on the UNIX socket’s behaviour, which can be either blocking or non-blocking. As far as I am aware, it is blocking, but it should give you enough headroom that a single benchmark would not slow things down this much.

Log-shuttle is a Go program that sends logs over HTTPS (TCP syslog messages dumped into an HTTP request) and is used by default for apps in the EU region. Log-shuttle can both block and shed load.
Until a few days ago, there was an issue with the configuration of log-shuttle where it would send only one message per HTTP request, resulting in subpar performance. This has been fixed, as mentioned by Edward Muller above. If your application was in the EU region, that would explain things; it now uses much larger buffers, which should avoid the same issue happening again.
If you can give more details about your app, this is something that could be looked into.