Sunday, August 3, 2008

The base functions

In a few of my recent posts on arclanguage.com, as well as the reducto discussion on this blog, I have been showing some functions that begin with the prefix <base>. However, I have been thinking of the <base> functions for some time already; their primary motivation is to make it easier to define new object types.


What started me was when I was experimenting with defining new object types in Arc, an example of which is my create your own collection series. Now, one bit I thought would be nice for collections is to allow them to be composed, and yet still be assignable:


(= foo (file-table "/tmp/"))
(= foos foo:string)
(= foos.2 "2")

Basically, the assignment to foos above would be equivalent to (= (foo (string 2)) "2")


In order to allow this, I needed to redefine compose. Unfortunately, I needed to redefine the complete compose.


This is when I started thinking about the <base> functions.


Dividing the Concept


Conceptually, compose is simply a reduction of a more basic "function-composition" operation on its arguments. We can thus divide compose into a reducer operation, implemented using the reducto bytecode, and a more basic operation, which we would prepend with the prefix <base>.


Then anyone who wishes to override compose doesn't have to reimplement the entire function; just the part he or she is interested in: the basic operation. Instead of handling the the case for having one argument, or zero arguments, or N arguments, only the simple case - the two argument case - needs to be handled.


compose is not the only function that would benefit from this separation; the mathematical and comparison functions would also benefit. This simplifies the effort needed to implement various numeric systems, such as quaternions.


Adding Lists


Not all is well in a <base>-ic world, though. The problem is performing + on lists, which performs a copying concatenation of the lists.


If we define (+ a b c) as being, effectively, equivalent to (<base>+ (<base>+ a b) c), then the inner <base>+ will create copies of a and b, and then return it. And then the outer <base>+ will recopy the returned list, even though reusing it would have been better; we thus end up with more allocation than we wanted.


The problem is also repeated, in a less memory-pressing manner, when working with SNAP integers. SNAP integers are boxed integers, i.e. they are part of the object hierarchy and will occupy real memory. + and other mathematical operations will have to create copies of the integer.


Although there is a solution (which I hope to fully present in a future post), it involves playing some significant tricks on the type system, and will probably also need to use significant portions of a proposed multimethod dispatching scheme I've thought of.


In brief, it means defining a "temporary" type which encloses the actual type of the object. We can then define overloads of the <base> functions which will reuse the objects in the encapsulating temporary type.


For now, however, we may have to first accept that this axiom will add overhead, but for that overhead it gains greater flexibility and ease-of-use.