Monday, July 28, 2008

reducto: the most complex single bytecode in SNAP

stefano recently brought up a potential efficiency issue in implementations of Arc. Specifically, the problem is that some variadic functions, such as '+, are really the reduction of a more basic function on its arguments, i.e.


(def + rest
  (reduce <base>+ rest))

The problem is that variadic functions normally put their arguments in a list, and in the most common case, '+ and related functions (such as '-, '*, and '/) will only be called on two arguments. Consing here is just a waste of memory and makes the dreaded GC loom nearer.


The solution which stefano proposed is to inline the code, so that potentially (+ a b c d) will become (<base>+ (<base>+ (<base>+ a b) c) d). However, inlining is a bit difficult in a dynamic language with 'eval; you need some way to "uninline" the code if the global functions are redefined.


(it also assumes that '+ associates left-to-right, but it's quite valid for the programmer to modify '+ to work right-to-left)


My alternative, since I am defining my own virtual machine and bytecode interpreter, is simply to make a really, really complex bytecode which handles this problem and avoids allocating in the most common cases.


reducto


reducto effectively implements the reduction function and avoid allocation for the common cases where only two parameters are needed. However, it is not completely standalone; it also needs to know the function(s) to apply. It thus expects that the function it is part of contains three entries in its closure, which are three functions or applicable objects.


The reduction is only useful if two or more arguments are given; special handling must be specified for the case where 0 or 1 arguments are given. For example, '- negates its argument if only 1 argument is given. This is why the closure has 3 function entries - (1) a function that handles the 0-argument case, (2) a function that handles the 1-argument case, and (3) a function that is used for reduction, and handles 2-argument and higher cases.


Running reducto: the simple cases


reducto chooses which function to apply based on the number of arguments it finds on the stack. Two of the arguments are the current closure and the continuation closure; the rest of the stack entries are the arguments given by the Arc code (the "Arc arguments"). If 2 or fewer Arc arguments are on the stack, then it simply directly indexes the closure: the 0-argument function is in closure index 0, 1-argument is in closure index 1, etc. For such simple cases, it performs the call directly, modifying only the "current closure" entry of the stack - no allocation is done. Simple, direct, and cheap in both memory and time.


Of course, the interesting bit is for the non-trivial cases where there are 3 or more Arc arguments.


3 arguments or more


For the non-trivial cases, reducto must allocate a closure and store the extra arguments there; in order to reduce allocation, it uses a continuation closure, which can be reused. The continuation closure holds the current continuation, the function to invoke, and all the arguments except the first two, as well as an index integer.


The actual size of the created closure depends on the number of arguments. For each extra argument beyond the first two, a slot must be reserved for that argument. In addition, the continuation closure must store the final continuation, the function to invoke, and the index; in effect, the size of the closure corresponds to 1 + number of Arc arguments.


In the C++ source, the reducto_continuation executor handles the actual reduction operation; it refers to the continuation closure and calls the reduction function for each argument.


The index to be specifies which argument to call with next. Now, SNAP integers are currently unboxed; fortunately, we know that the index number will not be used by anything else, so we can simply directly modify the contents of the box without additional allocation.


Handling nonreusability


Recall that 'ccc could make a continuation closure non-reusable. In such a case, we cannot modify the index integer.


Since the index integer is now nonmodifiable and the continuation closure is now immutable, we will have to create a fresh continuation closure and copy the contents. This allows 'ccc to operate properly.


(note that now that i've thought about it, we might actually still be able to just create a closure that contains an index number and a reference to the non-reusable continuation closure, and suffer through an additional indirection, which would be a small price to pay compared to copying; we would need to define an additional executor for such a continuation, though)