Saturday, August 30, 2008

Interlude: the LIFO heap

(i feel obligated to also give an update on the i/o subsystem first; i currently have a small basic core for the central i/o process, and am now trying to grok the c++ interface of libev. i implemented this bit of optimization to the machine some time ago)


As I mentioned in the follow-up to the post regarding continuation closures, the linked Parrot article, "What C's Memory Management Gets Rightish", points out that a stack-like allocator for call frames tremendously eases the pressure on the garbage collector, since (barring call/cc) the lifetimes of call frames usually follow a stack discipline.


So, since at least one stated goal of SNAP is to have a reasonably efficient system, I decided to implement this optimization.


(it is of course a valid concern that i'm doing an optimization when i have no real performance data for this particular virtual machine, especially since the machine doesn't even have i/o yet; but the advice comes from someone who appears to be a parrot developer, and parrot is quite well developed, so their opinion counts for quite a bit.)


A LIFO Heap


...is of course a stack. Like any stack it keeps track of the current "position" using a stack pointer, and like any stack, items can be pushed onto it and popped off. Of course, the items here are variable-length (since some closures are inevitably larger than others), but that is a trivial detail.


Note however that our stack structure is special: it has to be garbage-collectible together with the rest of the heap. This is largely because it was developed after the main garbage collection algorithms, and it's easier to just let it be garbage-collectible so that we don't have to add special cases.


The Arc process type inherits from the Heap class. All Arc objects normally belong to a single Heap, i.e. belong to a single process.


Each Heap object handles one "main" memory area, represented by a Semispace class. A Heap object might actually have several Semispaces; this is because, when a process sends a message, it copies the data structure into a new Semispace and then sends the entire Semispace to the receiving process.


When data is allocated on a Heap, it actually just allocates from its main Semispace, if there is still space available on it. Semispace allocation is very simple: increment a pointer and return its previous value.
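The bump-pointer allocation above can be sketched like this; the real Semispace's members are not shown in this post, so the names here are guesses:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A minimal sketch of the bump-pointer scheme described above; member
// names are assumptions, not SNAP's actual code.
class Semispace {
    std::vector<char> mem;  // the memory area this Semispace manages
    std::size_t pos = 0;    // the allocation pointer
public:
    explicit Semispace(std::size_t sz) : mem(sz) {}
    // Bump the pointer and return its previous value, or null when full.
    void* alloc(std::size_t sz) {
        if (pos + sz > mem.size()) return nullptr;  // caller must trigger a GC
        void* p = mem.data() + pos;
        pos += sz;
        return p;
    }
    std::size_t used() const { return pos; }
};
```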


When the Heap object decides to perform a garbage collection, it sums up the size of the main Semispace and all received Semispaces, then creates a new Semispace to hold the data. It then copies the live data into the new Semispace, which becomes the main Semispace afterwards. The Semispace may be resized if it turns out to be much, much larger than the actual live data.


Normally, memory allocation is done via the Heap objects:


Heap& hp = proc;
// ...
Generic* gp = new(hp) Cons();

The new operator simply ends up invoking an alloc() member on the Heap object.
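That plumbing can be sketched as a placement-style operator new overload that forwards to alloc(). The Heap body here is a stand-in for illustration, not SNAP's actual class:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in Heap with a trivial alloc(); the operator new plumbing is the
// point of this sketch, not the allocator itself.
class Heap {
    char buf[1024];
    std::size_t pos = 0;
public:
    void* alloc(std::size_t sz) {
        void* p = buf + pos;
        pos += sz;
        return p;
    }
};

// new(hp) T() ends up calling this, which forwards to Heap::alloc().
void* operator new(std::size_t sz, Heap& hp) { return hp.alloc(sz); }
// Matching placement delete, invoked only if the constructor throws.
void operator delete(void* p, Heap& hp) {}

struct Cons {
    void* car = nullptr;
    void* cdr = nullptr;
};
```

With this in place, new(hp) Cons() allocates the Cons from hp and then runs the constructor.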


Although a valid syntax using Semispace references also exists, it is generally used only by the Heap objects.


A LIFO Semispace


For this optimization, however, we needed to have an additional kind of Semispace, called the LifoSemispace. (it could actually have been the same type as Semispace but that class was designed before i put this optimization up, so there were some bits of it that were inappropriate) The LifoSemispace does not have any inheritance relationship to the Semispace type at all, although it does have a similar set of methods.


The Semispace object supports deallocating only the most recently allocated memory area (this is necessary in case of a thrown exception in the constructor). The LifoSemispace, however, allows deallocating all objects allocated on it, in reverse order.
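The difference can be sketched like this: dealloc() succeeds for any object, as long as deallocations come in reverse allocation order. Member names are again assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the LIFO discipline described above; not SNAP's actual code.
class LifoSemispace {
    std::vector<char> mem;
    std::size_t pos = 0;
public:
    explicit LifoSemispace(std::size_t sz) : mem(sz) {}
    void* alloc(std::size_t sz) {
        if (pos + sz > mem.size()) return nullptr;
        void* p = mem.data() + pos;
        pos += sz;
        return p;
    }
    // Pop the topmost allocation, but only if p really is the topmost one.
    bool dealloc(void* p, std::size_t sz) {
        if (pos >= sz && p == mem.data() + pos - sz) {
            pos -= sz;
            return true;
        }
        return false;  // not the most recent allocation: leave it to the GC
    }
    std::size_t used() const { return pos; }
};
```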


And a LIFO aspect of the Heap


A LifoSemispace, of course, does not just float around: it's handled by a Heap object. We don't directly allocate on a LifoSemispace, in much the same way that we don't normally directly allocate on a Semispace.


This does have the minor problem that the nice new(hp) syntax is already taken.


To handle this, we introduce a new class, the LifoHeap. This class is composed of a single pointer to a Heap, and is constructed from a pointer to Heap. It is thus as lightweight as a reference to a Heap.
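A sketch of that arrangement, with the Heap body reduced to a stand-in; only the LifoHeap shape and the lifo() member match what's described above, and everything else is illustrative:

```cpp
#include <cassert>
#include <cstddef>

class Heap;

// As described above, a LifoHeap is composed of a single pointer to a Heap,
// so it is as cheap to pass around as a Heap reference.
class LifoHeap {
    Heap* hp;
public:
    explicit LifoHeap(Heap* h) : hp(h) {}
    void* alloc(std::size_t sz);
};

class Heap {
    char normal_area[256]; std::size_t npos = 0;  // stand-in main Semispace
    char lifo_area[256];   std::size_t lpos = 0;  // stand-in LifoSemispace
    friend class LifoHeap;
public:
    void* alloc(std::size_t sz) { void* p = normal_area + npos; npos += sz; return p; }
    LifoHeap lifo() { return LifoHeap(this); }    // the new member function
};

void* LifoHeap::alloc(std::size_t sz) {
    void* p = hp->lifo_area + hp->lpos;
    hp->lpos += sz;
    return p;
}

// Separate overloads keep new(hp) and new(hp.lifo()) distinct.
void* operator new(std::size_t sz, Heap& hp)    { return hp.alloc(sz); }
void* operator new(std::size_t sz, LifoHeap lh) { return lh.alloc(sz); }
```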


Heap now also provides a lifo() member function, which returns a LifoHeap from the this pointer. In order to allocate from a Heap's LIFO allocation area, we simply use new(hp.lifo()):


Heap& hp = proc;
// ...
Generic* gp = new(hp.lifo()) ClosureArray();

First.... Out!


The LifoHeap object also provides a normal_dealloc(...) member function. This member attempts to deallocate the specified Generic object, provided it 1) is allocated on the LifoHeap and 2) is the most recently allocated object on the LifoHeap.


This function will silently fail if the two conditions above are not met. Since objects on LIFO allocation are still subject to garbage collection, a failed deallocation is still OK: the memory space will simply be reclaimed automatically.
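The two checks and the silent failure can be sketched as follows, assuming the LIFO area is a contiguous buffer (a simplification; the real signature is whatever SNAP actually uses):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of normal_dealloc's behavior: either check failing just returns
// false, and the GC reclaims the space later anyway.
bool normal_dealloc(char* base, std::size_t& pos, void* obj, std::size_t sz) {
    char* p = static_cast<char*>(obj);
    if (p < base || p >= base + pos) return false;  // 1) not on the LIFO area
    if (p != base + pos - sz)        return false;  // 2) not the topmost object
    pos -= sz;                                      // both checks passed: pop it
    return true;
}
```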

Wednesday, August 27, 2008

The backlog

This is just a list of stuff I'd like to work on, as soon as I finish the I/O subsystem:



  1. Proper polymorphic functions or methods - i.e. functions that fix the curse of chains


  2. Symbol-based packages, as suggested by cchooper. Probably I'll go with what I proposed.


  3. Implement some more higher-order function bytecodes, i.e. flesh out the reducto family of bytecodes.


  4. Improve arc2c, particularly quasiquotes, destructuring function variables, and macros


  5. Add tables to the runtime, since tables are used to represent data structures in the arc2c compiler.


Tuesday, August 19, 2008

Update: I/O

It's been some time since my last post on this blog, so I feel obligated to report a bit on what I've been doing.


We call it Input-Output


Currently I'm working on the I/O subsystem. I'm trying to concentrate on this now instead of adding various features to SNAP (and thinking up various axioms to help make Arc a better language for creating building blocks), and I'm forming a little backlog list while I'm building I/O.


One problem we have here is concurrent access to an I/O port. Of course, concurrent access to a port doesn't quite make sense: to keep track of which of several processes should be accessing the port at any moment, you'd need some sort of serializing system (i.e. message passing). In general, having just one process keep access to the port and handle the synchronization will be simpler and probably easier to maintain.


However, the point is that in SNAP we will allow you to do this while making sure that the virtual machine doesn't crash as a whole, and that your process doesn't crash others just because they are effectively sharing the resource.


The other problem is that we'll be using green threads in the execution subsystem. This means that context switching is done in an explicit manner, and in theory, it should be possible to "run" multiple processes even without using OS threads. This means that we have to use nonblocking and/or asynchronous I/O.


No time to wait


Typically, I/O operations will wait for the I/O to complete. In the case of input from a user terminal or from a network socket, this means that if data is not available, we must wait.


However, waiting is not acceptable: we might have some other process that could be running and isn't going to use that port. This means that we should be able to determine if an I/O port has data available, or can accept data, and only talk to the I/O port if so; we need to use asynchronous I/O.
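Using the classic select() call, such a readiness check can be sketched like this (POSIX only; an illustration, not SNAP's actual backend code):

```cpp
#include <cassert>
#include <sys/select.h>
#include <unistd.h>

// Sketch of "only talk to the port if it's ready": a zero timeout turns
// select() into a pure readiness poll that never waits.
bool ready_for_read(int fd) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    timeval tv = {0, 0};  // do not wait at all
    return select(fd + 1, &rfds, nullptr, nullptr, &tv) == 1;
}
```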


Surprisingly, Microsoft Windows seems to be better at asynchronous I/O than Unix-based OS'es. POSIX defines an async I/O interface, but it doesn't appear to be well supported even among otherwise POSIX-compliant operating systems, and we have a hodgepodge of interfaces, such as the Linux-only epoll and the BSD-only kqueue. Some of these interfaces are not even well supported and/or particularly stable; the only thing that appears reliable is the most basic select(), which has efficiency problems. (and of course, efficiency is never a concern, unless it is)


So, the I/O system backend has to be easily swappable with other back-ends. I'm currently implementing around libev, which was inspired by libevent. libev is newer (and consequently, probably buggier) and faster, but is limited to the hodgepodge of interfaces supported by Unix-likes, while libevent is older and better developed, and includes a Windows backend.


The I/O system backend, however, is presented to the rest of the SNAP VM world by the Central I/O Process.


The Central I/O Process


The Central I/O Process handles all the I/O done in the system, and feeds it into the backend. This allows the backend to be lock-free: it can only be run from one OS-level thread, specifically whichever drew the short stick and got the central I/O process. (libev and libevent supposedly properly support multiple threads, as long as you use the "reentrant" interface functions, but I'd rather use the default interface)


The Central I/O Process, like any good process, can also accept and send messages. It accepts a set of "request" messages, each of which includes a tag, the source process, and the port data object; when the backend has completed the task, it sends a "response" message, either an "ok" message or an error, back to the requesting process.
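A hypothetical shape for that request/response traffic; the post does not give SNAP's actual message layout, so every field here is an illustrative guess:

```cpp
#include <cassert>

// Illustrative guesses at the messages described above, not SNAP's types.
struct IORequest {
    int tag;          // lets the requester match the response to the request
    int source_pid;   // the process to send the response back to
    void* port_data;  // opaque port data object, meaningful only to the backend
    enum Op { READ, WRITE } op;
};

struct IOResponse {
    int tag;          // echoes the request's tag
    bool ok;          // "ok" message, or an error
    int error_code;   // meaningful only when !ok
};
```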


Crucially, the Central I/O Process keeps its hands off the port data object. The exact format of the port data object is not known by the Central I/O Process; the port data objects are created and used only by the backend. Thus, the port data objects are effectively opaque to the rest of the SNAP virtual machine.


The Arc I/O Ports


But having a Central I/O Process is not sufficient. The problem is that the backend assumes that, at any one time, only one asynchronous I/O event is ongoing for a particular port. This means that access to the I/O ports must be synchronized. In SNAP and similar message-passing concurrency environments, synchronization is handled by isolating the synchronized resource into a separate process.


Thus, the I/O ports on the Arc side are not even the opaque port data objects; they are wrappers around a process ID for a process which handles the synchronization of the actual port data objects.


Yes, asynchronous I/O is hard. ^^

Sunday, August 3, 2008

The base functions

In a few of my recent posts on arclanguage.com, as well as the reducto discussion on this blog, I have been showing some functions that begin with the prefix <base>. However, I have been thinking of the <base> functions for some time already; their primary motivation is to make it easier to define new object types.


What started me off was experimenting with defining new object types in Arc, an example of which is my create your own collection series. Now, one bit I thought would be nice for collections is to allow them to be composed, and yet still be assignable:


(= foo (file-table "/tmp/"))
(= foos foo:string)
(= foos.2 "2")

Basically, the assignment to foos above would be equivalent to (= (foo (string 2)) "2").


In order to allow this, I needed to redefine compose. Unfortunately, I needed to redefine the complete compose.


This is when I started thinking about the <base> functions.


Dividing the Concept


Conceptually, compose is simply a reduction of a more basic "function-composition" operation on its arguments. We can thus divide compose into a reducer operation, implemented using the reducto bytecode, and a more basic operation, which we would prepend with the prefix <base>.


Then anyone who wishes to override compose doesn't have to reimplement the entire function, just the part he or she is interested in: the basic operation. Instead of handling the cases for one argument, zero arguments, or N arguments, only the simple two-argument case needs to be handled.
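The same division can be sketched in C++: a two-argument "base" compose, plus a reducer that folds it over N arguments. (Illustrative only; SNAP's actual reducto machinery works on bytecodes, not templates.)

```cpp
#include <cassert>
#include <functional>

using Fn = std::function<int(int)>;

// The <base> part: only the simple two-argument case.
Fn base_compose(Fn f, Fn g) {
    return [f, g](int x) { return f(g(x)); };
}

// The reducer part: folds base_compose over N arguments.
template <typename... Rest>
Fn compose(Fn f, Rest... rest) {
    if constexpr (sizeof...(rest) == 0)
        return f;  // the one-argument case falls out for free
    else
        return base_compose(f, compose(rest...));
}
```

Overriding the behavior then only requires replacing base_compose; the reducer stays untouched.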


compose is not the only function that would benefit from this separation; the mathematical and comparison functions would also benefit. This simplifies the effort needed to implement various numeric systems, such as quaternions.


Adding Lists


Not all is well in a <base>-ic world, though. The problem is performing + on lists, which performs a copying concatenation of the lists.


If we define (+ a b c) as being, effectively, equivalent to (<base>+ (<base>+ a b) c), then the inner <base>+ will create a copy of a and b and return it. The outer <base>+ will then recopy the returned list, even though reusing it would have been better; we thus end up with more allocation than we wanted.
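An illustration (not SNAP code) of that allocation pattern, using vectors as stand-ins for lists and counting every copied cell:

```cpp
#include <cassert>
#include <vector>

// A binary concatenation that always copies both arguments, so folding it
// over three lists copies the intermediate result a second time.
static int cells_copied = 0;

std::vector<int> base_plus(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    out.insert(out.end(), a.begin(), a.end());
    out.insert(out.end(), b.begin(), b.end());
    cells_copied += static_cast<int>(out.size());  // every result cell is fresh
    return out;
}
```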


The problem is also repeated, in a less memory-pressing manner, when working with SNAP integers. SNAP integers are boxed integers, i.e. they are part of the object hierarchy and will occupy real memory. + and other mathematical operations will have to create copies of the integer.


Although there is a solution (which I hope to fully present in a future post), it involves playing some significant tricks on the type system, and will probably also need to use significant portions of a proposed multimethod dispatching scheme I've thought of.


In brief, it means defining a "temporary" type which encloses the actual type of the object. We can then define overloads of the <base> functions which will reuse the objects in the encapsulating temporary type.


For now, however, we may have to accept that this axiom will add overhead; in exchange for that overhead, it gains greater flexibility and ease of use.