Q: What do our 3 favorite open source projects (node, redis and nginx) have in common? Apart from being uber-cool?
A: They are all single threaded.
But aren’t they all really fast and highly scalable? Yep. So how does that work?
Nginx, redis and node are all event-based. They have an event loop that listens for an event saying that an asynchronous operation (IO) has completed and then executes the callback that was registered when the async operation started. Rinse, then repeat. The thread never blocks waiting on IO, which means it can go hell-for-leather just running code. Which makes it really fast.
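To make that loop concrete, here is a toy simulation of it. This is a sketch for illustration only and not how node, nginx or redis implement their loops: ToyEventLoop, startAsync and completePending are made-up names, and the "IO" is faked by pretending every pending operation finishes at once.

```javascript
// Toy event loop: a simulation for illustration, not node's real implementation.
class ToyEventLoop {
  constructor() {
    this.pending = []; // async operations that have started but not finished
    this.ready = [];   // completion events waiting for their callback to run
  }

  // Start a pretend async operation. The callback is registered now, run later.
  startAsync(name, result, callback) {
    this.pending.push({ name, result, callback });
  }

  // Pretend the operating system reports that every pending operation is done.
  completePending() {
    this.ready.push(...this.pending);
    this.pending = [];
  }

  // The loop itself: take one event, run its callback to completion, repeat.
  run() {
    while (this.pending.length > 0 || this.ready.length > 0) {
      if (this.ready.length === 0) {
        this.completePending();
        continue;
      }
      const event = this.ready.shift();
      event.callback(event.result);
    }
  }
}

const log = [];
const loop = new ToyEventLoop();

loop.startAsync('read file', 'file contents', (data) => {
  log.push('got ' + data);
  // A callback may start more async work; it is handled on a later turn.
  loop.startAsync('query db', 'db rows', (rows) => log.push('got ' + rows));
});
loop.startAsync('http call', 'response body', (body) => log.push('got ' + body));
log.push('started everything without waiting');

loop.run();
console.log(log.join('\n'));
```

The thing to notice is that both operations are started before either callback runs, and the database query kicked off inside the first callback is picked up on a later turn of the loop.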
In days gone by, when we were Microsoft slaves, we used to wrestle with multithreading as a way of dividing up work. In web apps every request was handled on its own thread. We’d also use the Task Parallel Library (TPL), which was not an easy abstraction. Combine that with an event processing library like Reactive Extensions (Rx) and you’re asking for a lot of trouble. The new await keyword in C# helps out a lot, but either way you have to think about thread safety all the time, along with all kinds of locking strategies to deal with concurrent access to the same data. And even with all that, it isn’t as fast.
The difference between the two worlds lies in the way that pieces of work are orchestrated.
Event-based applications divide work up using callbacks, an event loop and a queue. The unit of work, or task, is a callback. Simple. Only one callback is ever executing at a time. There are no locking issues. You can write code like you’re the only kid on the block. You decide when you’re done and then effectively yield control to someone else. Everyone is really polite so it just works.
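Here is a small sketch of that politeness in action. It assumes a plain array standing in for the event loop's callback queue, and sumInChunks is a hypothetical task invented for the example. Two tasks update the same shared total with no locks, and each one hands over control only at the point where it chooses to re-queue itself.

```javascript
// A plain array stands in for the event loop's callback queue (an assumption).
const queue = [];
const order = [];
let total = 0; // shared state, touched by every callback, with no locks

// Add the numbers 1..n to the shared total in chunks, re-queuing itself
// between chunks so that other callbacks get a turn. Yielding happens only
// where this code says so.
function sumInChunks(label, n, chunkSize, next = 1) {
  const end = Math.min(next + chunkSize - 1, n);
  for (let i = next; i <= end; i++) {
    total += i; // safe: nothing else can run in the middle of this loop
  }
  order.push(label + ':' + next + '-' + end);
  if (end < n) {
    queue.push(() => sumInChunks(label, n, chunkSize, end + 1));
  }
}

queue.push(() => sumInChunks('A', 4, 2));
queue.push(() => sumInChunks('B', 4, 2));

// Drain the queue: exactly one callback runs at a time, each to completion.
while (queue.length > 0) {
  const callback = queue.shift();
  callback();
}

console.log(order.join(', '), 'total =', total);
```

The two tasks interleave chunk by chunk, yet the shared total comes out right, because no callback is ever interrupted halfway through its read-modify-write.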
Thread-based applications essentially leave it to the operating system to divide work up. Because each piece of work has its own thread, and will block if it needs to (like when it’s waiting for IO), the scheduler will suspend that thread and start running another that is waiting. Every time that happens there is quite a hefty context switch, and every thread also carries its own stack, which typically reserves a megabyte or more of memory. In effect the scheduler decides when to yield control and you don’t get much of a say.
Who’d have thought that a single thread, dealing with everything, could be faster than multiple threads each dealing with just one thing? Well, on a single core, that may be true. On multiple cores it may actually also be true. That’s because you’ve probably got nginx and node and redis all running on the same machine – simplistically, on a quad core, that’s one core each and still one left over.
But isn’t writing synchronous code for a multithreaded environment a lot easier than writing asynchronous code for a single threaded environment? Well, maybe, a little. But some great patterns have emerged within the node community that really help.
The simplest form of continuation-passing style (CPS) is the callback, which is actually not at all hard once you get used to it. It also happens to be a great way to encapsulate, and it is really easy to modularise. The pattern for async functions is that the last argument is always the callback, and the pattern for callbacks is that the error is always the first argument (with results after that). This standardisation makes composition really easy.
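A minimal sketch of those two conventions follows. parseNumber, divide and parseAndDivide are hypothetical functions made up for the example, and to keep it short they call back synchronously, whereas a real async function would call back later, once its IO had completed.

```javascript
// Convention 1: the callback is always the last argument.
// Convention 2: the callback receives (err, result), error first.
function parseNumber(text, callback) {
  const value = Number(text);
  if (text.trim() === '' || Number.isNaN(value)) {
    return callback(new Error('not a number: ' + text));
  }
  callback(null, value);
}

function divide(a, b, callback) {
  if (b === 0) {
    return callback(new Error('division by zero'));
  }
  callback(null, a / b);
}

// Composition: because every step has the same shape, the combined function
// has that shape too, and errors are passed straight up to the caller.
function parseAndDivide(textA, textB, callback) {
  parseNumber(textA, (errA, a) => {
    if (errA) return callback(errA);
    parseNumber(textB, (errB, b) => {
      if (errB) return callback(errB);
      divide(a, b, callback);
    });
  });
}

const results = [];
const record = (err, result) => results.push(err ? err.message : result);

parseAndDivide('10', '4', record);
parseAndDivide('10', '0', record);
parseAndDivide('ten', '4', record);

console.log(results);
```

Because divide already has the standard shape, the caller's callback can be handed straight to it as the final step, with no wrapping needed.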
There are a ton of npm modules that can often help reduce complexity. The best, in my opinion, is still Caolan’s async. It’s still the most popular and follows the node conventions. And there are also a few CPS compilers that allow you to code in a more synchronous style. I wouldn’t have recommended these in the past, but there are a few, such as tamejs and Iced CoffeeScript, that use an “await, defer” pattern that is quite nice. We’re using CoffeeScript more and more these days, and this “icing” is very tempting (seeing as we’re compiling anyway), but we haven’t strayed that way yet.
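To give a feel for the kind of complexity these modules take off your hands, here is a hand-rolled helper that runs callback-style tasks one after another. It is only a sketch of the idea and not the async module's own code: the name series and its behaviour here are assumptions for the example, and the real library covers far more cases.

```javascript
// Hand-rolled helper for illustration only; not the async module's code.
// Runs each task in order, collecting results, and stops at the first error.
function series(tasks, done) {
  const results = [];

  function runTask(index) {
    if (index === tasks.length) {
      return done(null, results); // every task succeeded
    }
    tasks[index]((err, result) => {
      if (err) return done(err); // stop early, error first
      results.push(result);
      runTask(index + 1);
    });
  }

  runTask(0);
}

const calls = [];
let outcome;
let failure;

// Happy path: both tasks run in order and their results are collected.
series([
  (callback) => { calls.push('load user'); callback(null, 'alice'); },
  (callback) => { calls.push('load orders'); callback(null, 3); },
], (err, results) => { outcome = { err, results }; });

// Failure path: the second task never runs.
series([
  (callback) => callback(new Error('disk full')),
  (callback) => { calls.push('never runs'); callback(null, 'unused'); },
], (err) => { failure = err.message; });

console.log(outcome, failure, calls);
```

The tasks and the final callback all follow the same last-argument, error-first conventions, which is what lets a generic helper like this work on any of them.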
We’ve been writing big apps in node since October 2011 and have learnt a lot about how to separate concerns and modularise our code. It’s a lot different to the object-oriented class-based separation we were used to, but after your head is reprogrammed to use a functional style it becomes second nature and actually much easier to structure. Caolan’s post on programming style for node sums it up nicely. If you hear anyone say that node is no good for big projects, tell them that all you have to do is follow a few simple rules and then it becomes perfect. And fast.