Using recursion combinators in JavaScript
In the previous post we explored “array extras” and how they can help us to write concise yet performant and clean code. In this post we take a look at generalizing recursive algorithms with recursion combinators — high-level functions that encapsulate all boilerplate code needed to set up the recursion. These functions were added to dojox.lang.functional and will be officially released with Dojo 1.2.
In general the recursion is a form of iterative problem solving in the same category as loops. There are two major natural sources of recursive algorithms: recursive data structures (lists, trees, and so on), and recursive definitions (factorial, Fibonacci numbers, the GCD algorithm, etc.). The recursion plays a prominent role in the functional programming (FP), and one of the best articles on this topic is “Recursion Theory and Joy” by Manfred von Thun, the creator of Joy (a purely functional programming language with Forth-like syntax). Manfred’s article explains intricacies of recursion including the venerable Y combinator, recursion combinators in general, and introduces a practical set of recursion combinators, which will guide us in this post.
FP programmers spent a lot of time advancing the theory of programming, and it shows. Studying functional languages gives you better understanding of computing, and provides elegant solutions to many difficult questions. Yet there is a problem of applicability: some solutions are not applicable nor practical for multi-paradigm languages like JavaScript. For example, while it is simple to replicate monads, it doesn’t make much sense: monads are mostly used to express sequential computations, introduce side-effects like input/output, and implement state. We can do all these things directly without artificial constructs. The same goes for the Y combinator, which solves the problem of self-reference to yet undefined or incomplete function — there is the direct way to do it in JavaScript (hint: arguments.callee).
The simplest kind of recursion is a linear recursion (linrec). This is a pattern when a function calls itself once, or terminates when some condition is met. Basically it means that in order to do any linear recursion function we need four non-recursive functions:
- a stop condition,
- a “then” function, which is called when condition is met before stopping the recursion, produces a final value,
- a “before” function, which is called before the recursive call to produce new arguments,
- an “after” function, which is called after the recursive call to process the returned value.
Let’s code it up using lambdas (read more on them in Functional fun in JavaScript with Dojo):
var df = dojox.lang.functional;
var linrec = function(cond, then, before, after){
var cond = df.lambda(cond),
then = df.lambda(then),
before = df.lambda(before),
after = df.lambda(after);
return function(){
if(cond.apply(this, arguments)){
return then.apply(this, arguments);
}
var args = before.apply(this, arguments);
var ret = arguments.callee.apply(this, args);
return after.call(this, ret, arguments);
};
};
The code is simple, yet flexible. Let’s go over all 4 parameter functions:
- cond() takes all parameters passed to the linrec() and returns a Boolean value. If it is true, we stop recursions and call then(), otherwise we proceed to before().
- then() takes all parameters passed to the linrec() and returns a value, which in turn will be the returned value of the linrec().
- before() sets up the recursion arguments. It takes all parameters passed to linrec() and returns an array of new parameters to call linrec() recursively.
- When the recursive call is finished we process its return with after(). It takes two parameters: the returned value, and the array of all arguments passed to our linrec(). It returns a new value, which will be the returned value of linrec().
Let’s see how we can code well-known algorithms using linrec():
// factorial
var fact0 = function(n){
return n <= 1 ? 1 : n * arguments.callee.call(this, n - 1);
};
var fact1 = linrec("<= 1", "1", "[n - 1]", "m * n[0]");
// find the greatest common divisor using the Euclidean algorithm
var gcd0 = function(a, b){
return b == 0 ? a : arguments.callee.call(this, b, a % b);
};
var gcd1 = linrec("a, b -> b == 0", "a", "a, b -> [b, a % b]", "x");
As you can see using linrec() is pretty straight-forward. But gsd0() demonstrates a very important case: a tail recursion. In terms of linrec() it means that we don’t process the result of the recursive call, but return it directly ⇒ we don’t need after() anymore. Let’s code it up:
var tailrec = function(cond, then, before){
var cond = df.lambda(cond),
then = df.lambda(then),
before = df.lambda(before);
return function(){
if(cond.apply(this, arguments)){
return then.apply(this, arguments);
}
var args = before.apply(this, arguments);
return arguments.callee.apply(this, args);
};
};
Let’s see our previous examples coded with tailrec():
// factorial: the tail recursive version with an accumulator
var fact2Aux = function(n, acc){
return n <= 1 ? acc : arguments.callee.call(this, n - 1, n * acc);
};
var fact2 = function(n){ return fact2Aux(n, 1); }
var fact3Aux = tailrec("<= 1", "a, b -> b", "[n - 1, n * acc]");
var fact3 = function(n){ return fact3Aux(n, 1); }
var gcd2 = tailrec("a, b -> b", "a", "a, b -> [b, a % b]");
What does it buy us? The recursive part is encapsulated in linrec() or tailrec(), and we saved some bytes on the boilerplate. That’s it? Wait, there is more.
The recursive solutions sound like fun, but in the real life we have to take into account harsh realities:
- Recursions can be expensive: stack frames are allocated, internal variables are allocated, we don’t reuse unneeded variables of the previous stack frame, and so on.
- Usually there is a restriction on how many times we can recurse.
How severe are restrictions on number of recursions? We can write a super-simple program to find out:
rec = function(n){
if(n <= 1){ return 0; }
return rec(n - 1) + 1; // to eliminate possible tail recursion optimization
};
Trying different n
we can find a limit. I tried this code on different browsers. Results are
below:
Practically all browsers limit the depth of recursion at ~3,000. Safari is the weak spot with ~500. Opera looks good with ~10,000 but it aborted the JavaScript interpreter thread that ran the test, while other browsers threw an exception — the “nuclear” response.
Let’s convert recursive algorithms into loops. Now we can easily do that, because by abstracting the recursive algorithms we set us up for improving their implementation without changing their interface. These are “loop” versions:
var linrec = function(cond, then, before, after){
var cond = df.lambda(cond),
then = df.lambda(then),
before = df.lambda(before),
after = df.lambda(after);
return function(){
var args = arguments, top, ret;
// 1st part
for(; !cond.apply(this, args); args = before.apply(this, args)){
top = {prev: top, args: args};
}
ret = then.apply(this, args);
//2nd part
for(; top; top = top.prev){
ret = after.call(this, ret, top.args);
}
return ret;
};
};
var tailrec = function(cond, then, before){
var cond = df.lambda(cond),
then = df.lambda(then),
before = df.lambda(before);
return function(){
var args = arguments;
for(; !cond.apply(this, args); args = before.apply(this, args));
return then.apply(this, args);
};
};
By eliminating the recursion we used an explicit stack (a list, really) in linrec to hold some necessary variables. But tailrec was converted without any linear structures. Now you can see why the tail recursion is the preferred form of recursion for many — it doesn’t require any intermediate storage for iterations, and can be optimized easily.
Is it all we can do to make it faster? No. As you’ve noticed we worked with lambdas, which allow to represent functions in a compact textual notation. We can easily inline them saving on calling external functions for small operations. Of course the inlining works only for text snippets, regular functions will be called as usual.
But before taking a look at numbers let me introduce two more recursion combinators — binrec, and multirec, which are here to deal with binary and generic tree-like recursion respectively:
var binrec = function(cond, then, before, after){
var cond = df.lambda(cond),
then = df.lambda(then),
before = df.lambda(before),
after = df.lambda(after);
return function(){
if(cond.apply(this, arguments)){
return then.apply(this, arguments);
}
var args = before.apply(this, arguments);
var ret1 = arguments.callee.apply(this, args[0]);
var ret2 = arguments.callee.apply(this, args[1]);
return after.call(this, ret1, ret2, arguments);
};
};
var multirec = function(cond, then, before, after){
var cond = df.lambda(cond),
then = df.lambda(then),
before = df.lambda(before),
after = df.lambda(after);
return function(){
if(cond.apply(this, arguments)){
return then.apply(this, arguments);
}
var args = before.apply(this, arguments),
ret = new Array(args.length);
for(var i = 0; i < args.length; ++i){
ret[i] = arguments.callee.apply(this, args[i]);
}
return after.call(this, ret, arguments);
};
};
Pay attention to different signatures of before() and after() functions.
Let’s implement the Fibonacci algorithm with binrec:
var fib0 = function(n){
return n <= 1 ? 1 :
arguments.callee.call(this, n - 1) +
arguments.callee.call(this, n - 2);
};
var fib1 = binrec("<= 1", "1", "[[n - 1], [n - 2]]", "+");
Yes, it is that simple.
All 4 recursion combinators implemented with loops and possible lambda inlining are the part of dojox.lang.functional package:
dojo.require("dojox.lang.functional.linrec");
dojo.require("dojox.lang.functional.tailrec");
dojo.require("dojox.lang.functional.binrec");
dojo.require("dojox.lang.functional.multirec");
And now it is time for obligatory numbers. I wrote a simple program, which measures performance of different versions of factorial functions, and Fibonacci functions. Here are the results for the factorial:
And here are the results for the Fibonacci:
The legend:
- raw rec is the manually-written recursive version of an algorithm,
- raw tail is the tail-recursive version of an algorithm,
- raw loop is the loop-based version of an algorithm,
- XXXrec rec is the naïve recursive version of linrec, tailrec, binrec, or multirec given above,
- XXXrec loop is the loop-based version of respective recursion combinators,
- XXXrec is the version from dojox.lang.factorial, which implements an optional inlining,
- Midori is the Webkit-based web browser for Linux,
- multirec() doesn’t match the factorial nor Fibonacci algorithms, I used it just to see how (badly) it stacks up against normal methods,
- all numbers were taken on the same machine under Linux, and Windows (the latter was running under VMware),
- all numbers were taken on the 3rd run,
- all numbers are in milliseconds.
You are invited to check out the code of the test program. I didn’t massage the numbers and you can spot certain artifacts, when the results look a little strange. Numbers change from run to run, and if we were unlucky and caught some massive garbage collection, the numbers would be a little bit inflated. Unfortunately most browsers don’t allow JavaScript to run more than several seconds in a row, and show an alert about “unresponsive script” making the result invalid. But the numbers are real, just disregard small differences due to inaccuracy if timers, and look for major trends.
The results are in.
As you can see while special versions of these simple functions written with the knowledge of their respective domains are extremely fast, our optimized generic versions can match and in some cases exceed the naïve versions without investing a lot of time in writing them. Obviously, if we are to test more calculation-heavy recursive functions (both factorial and Fibonacci use very light-weight snippets), the difference between the generic version and the original version will be much smaller, if any.
The conclusion: it pays off to abstract algorithms, because you can improve them independently of the applications they are used in.