Saturday, August 30, 2008

Interlude: the LIFO heap

(i feel obligated to also give an update on the i/o subsystem first; i currently have a small basic core for the central i/o process, and am now trying to grok the c++ interface of libev. i implemented this bit of optimization to the machine some time ago)


As I mentioned in the follow-up to the post regarding continuation closures, one thing pointed out in the linked Parrot article, "What C's Memory Management Gets Rightish", is that a stack-like allocator for call frames helps tremendously in easing the pressure in the garbage collector, since usually (barring call/cc) the lifetime of call frames is a stack.


So, since at least one stated goal of SNAP is to have a reasonably efficient system, I decided to implement this optimization.


(it is of course a valid concern that i'm doing an optimization when i have no real performance data for this particular virtual machine, and in particular since the machine doesn't even have i/o yet, but this is from a post from someone who appears to be a developer for parrot, which is quite well developed; their opinion thus counts quite a bit.)


A LIFO Heap


...is of course a stack. Like any stack it keeps track of the current "position" using a stack pointer, and like any stack, items can be pushed onto it and popped off. Of course, the items here are variable-length items (since some closures are inevitably larger than others), but that is a trivial detail.


Note however that our stack structure is special: it has to be garbage-collectible together with the rest of the heap. This is largely because it was developed after the main garbage collection algorithms, and it's easier to just let it be garbage-collectible so that we don't have to add special cases.


The Arc process type inherits from the Heap class. All Arc objects normally belong to a single Heap, i.e. belong to a single process.


Each Heap object handles one "main" memory area, represented by a Semispace class. A Heap object might actually have several Semispaces; this is because, when a process sends a message, it copies the data structure into a new Semispace and then sends the entire Semispace to the receiving process.


When data is allocated on a Heap, it actually just allocates from its main Semispace, if there is still space available on it. The Semispace allocation is very simple: it simply increments a pointer and returns the previous value.


When the Heap object decides to perform a garbage collection, it sums up the size of the main Semispace and all received Semispaces, then creates a new Semispace to hold the data. It then copies the live data into the new Semispace, which becomes the main Semispace afterwards. The Semispace may be resized if it turns out to be much, much larger than the actual live data.


Normally, memory allocation is done via the Heap objects:


Heap& hp = proc;
.
.
.
Generic* gp = new(hp) Cons();

The new operator simply ends up invoking an alloc() member on the Heap objects.


Although a valid syntax using Semispace references also exists, it is generally used only by the Heap objects.


A LIFO Semispace


For this optimization, however, we needed to have an additional kind of Semispace, called the LifoSemispace. (it could actually have been the same type as Semispace but that class was designed before i put this optimization up, so there were some bits of it that were inappropriate) The LifoSemispace does not have any inheritance relationship to the Semispace type at all, although it does have a similar set of methods.


The Semispace object supports only deallocating the most recently allocated memory area (this is necessary in case of a thrown exception in the constructor). However, the LifoSemispace allows deallocating, in reverse order, of all objects allocated on it.


And a LIFO aspect of the Heap


A LifoSemispace, of course, does not just float around: it's handled by a Heap object. We don't directly allocate on a LifoSemispace, in much the same way that we don't normally directly allocate on a Semispace.


This does have the minor problem that the nice new(hp) syntax is already taken.


To handle this, we introduce a new class, the LifoHeap. This class is composed of a single pointer to a Heap, and is constructed from a pointer to Heap. It is thus as lightweight as a reference to a Heap.


Heap now also provides a lifo() member function, which returns a LifoHeap from the this pointer. In order to allocate from a Heap's LIFO allocation area, we simply use new(hp.lifo()):


Heap& hp = proc;
.
.
.
Generic* gp = new(hp.lifo()) ClosureArray();

First.... Out!


The LifoHeap object also provides a normal_dealloc(...) member function. This member attempts to deallocate the specified Generic object, if it happens to be 1) allocated on the LifoHeap and 2) is the most recently allocated object on the LifoHeap.


This function will silently fail if the two conditions above are not met. Since objects on LIFO allocation are still subject to garbage collection, a failed deallocation is still OK: the memory space will simply be reclaimed automatically.

No comments: