Tuesday, August 19, 2008

Update: I/O

It's been some time since my last post on this blog, so I feel obligated to report a bit on what I've been doing.


We call it Input-Output


Currently I'm working on the I/O subsystem. I'm trying to concentrate on this now instead of adding various features to SNAP (and thinking up various axioms to help make Arc a better language for creating building blocks), and I'm forming a little backlog list while I'm building I/O.


One problem we have here is concurrent access to an I/O port. Of course, concurrent access to a port doesn't quite make sense: if several processes want the same port, something has to decide whose turn it is, so you'd have to use some sort of serializing system (e.g. message passing). In general, having just one process own the port and handle the synchronization will be simpler and probably easier to maintain.


However, the point is that SNAP will still allow you to share a port this way, while making sure that the virtual machine as a whole doesn't crash, and that your process doesn't crash others just because they are effectively sharing the resource.


The other problem is that we'll be using green threads in the execution subsystem. Context switching is done explicitly, so in theory it should be possible to "run" multiple processes even without using OS threads. But it also means that a blocking I/O call would stall every process sharing that OS thread, so we have to use nonblocking and/or asynchronous I/O.
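
As a rough illustration of explicit context switching (a sketch in Python, not SNAP's actual code), generators can stand in for green threads. The names `worker` and `run` are made up for this example.

```python
from collections import deque

def worker(name, steps, log):
    """A green thread: does one step of work, then explicitly gives up control."""
    for i in range(steps):
        log.append((name, i))
        yield  # explicit context switch back to the scheduler

def run(processes):
    """Round-robin scheduler that runs every process on a single OS thread."""
    ready = deque(processes)
    while ready:
        proc = ready.popleft()
        try:
            next(proc)          # resume the process until its next yield
        except StopIteration:
            continue            # the process finished, so drop it
        ready.append(proc)      # otherwise put it back in the queue

log = []
run([worker("a", 2, log), worker("b", 2, log)])
```

If one of these workers made a blocking read between two yields, the scheduler would never get control back, and the other worker would be stuck until the read returned.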


No time to wait


Typically, I/O operations will wait for the I/O to complete. In the case of input from a user terminal or from a network socket, this means that if data is not available, we must wait.


However, waiting is not acceptable: we might have some other process that could be running and isn't going to use that port. This means that we should be able to determine if an I/O port has data available, or can accept data, and only talk to the I/O port if so; we need to use asynchronous I/O.
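
The "check first, then talk" idea can be sketched with plain select() and a zero timeout. This is only an illustration in Python, using a local socket pair as a stand-in for a real port; `ready_to_read` is a hypothetical helper name.

```python
import select
import socket

def ready_to_read(port):
    """Return True if reading from `port` right now would not block."""
    # A timeout of 0 makes select() poll and return immediately.
    readable, _, _ = select.select([port], [], [], 0)
    return bool(readable)

a, b = socket.socketpair()
before = ready_to_read(b)            # nothing has been sent yet
a.sendall(b"hello")
after = ready_to_read(b)             # data is now waiting on b
data = b.recv(5) if after else b""   # only touch the port when it is ready
a.close()
b.close()
```

A scheduler could call something like `ready_to_read` between context switches and only resume a process once its port has data.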


Surprisingly, Microsoft Windows seems to be better at asynchronous I/O than Unix-based OSes. POSIX defines an asynchronous I/O interface, but it doesn't appear to be well supported among otherwise POSIX-compliant operating systems, and we have a hodgepodge of interfaces, such as the Linux-only epoll and the BSD-derived kqueue. Some of these interfaces are not even well supported and/or particularly stable; the only thing that appears reliable is the most basic select(), which has efficiency problems. (And of course, efficiency is never a concern, unless it is.)


So the I/O system backend has to be easy to swap for other backends. I'm currently implementing around libev, which was inspired by libevent. libev is newer (and consequently probably has more bugs) and faster, but is limited to the hodgepodge of interfaces supported by Unix-likes, while libevent is older and better developed, and includes a Windows backend.
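
To make "easily swapped" concrete, here is a minimal sketch of what a backend interface could look like. It is written in Python rather than against libev, and the `Backend` interface and both class names are assumptions for illustration only. One implementation uses select(), the other uses Python's `selectors` module, which picks epoll, kqueue or whatever the platform offers.

```python
import select
import selectors
import socket
from abc import ABC, abstractmethod

class Backend(ABC):
    """Hypothetical minimal backend interface: report which ports are readable."""
    @abstractmethod
    def poll_readable(self, ports):
        ...

class SelectBackend(Backend):
    """Backend built on the most basic select()."""
    def poll_readable(self, ports):
        readable, _, _ = select.select(ports, [], [], 0)
        return readable

class SelectorsBackend(Backend):
    """Backend built on the best mechanism the platform provides."""
    def poll_readable(self, ports):
        with selectors.DefaultSelector() as sel:
            for port in ports:
                sel.register(port, selectors.EVENT_READ)
            return [key.fileobj for key, _ in sel.select(timeout=0)]

a, b = socket.socketpair()
a.sendall(b"x")  # now only b has something to read
results = [backend.poll_readable([a, b])
           for backend in (SelectBackend(), SelectorsBackend())]
a.close()
b.close()
```

The caller only sees `poll_readable`, so switching backends is a one-line change.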


The I/O system backend, however, is presented to the rest of the SNAP VM world by the Central I/O Process.


The Central I/O Process


The Central I/O Process handles all the I/O done in the system, and feeds it into the backend. This allows the backend to be lock-free: it is only ever run from one OS-level thread, specifically whichever one drew the short straw and got the Central I/O Process. (libev and libevent supposedly support multiple threads properly, as long as you use the "reentrant" interface functions, but I'd rather use the default interface.)


The Central I/O Process, like any good process, can also accept messages and send them. It accepts a set of "request" messages, each of which includes a tag, the source process, and the port data object; when the backend has completed the task, it sends a "response" message (either an "ok" message or an error) back to the requesting process.


Crucially, the Central I/O Process keeps its hands off the port data object. The exact format of the port data object is not known by the Central I/O Process; the port data objects are created and used only by the backend. Thus, the port data objects are effectively opaque to the rest of the SNAP virtual machine.
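
The request/response exchange might look roughly like the sketch below. It is in Python, with deques standing in for process mailboxes. The `kind` field, the class name `CentralIO` and the `pipe_backend` function are assumptions made for the example, and the backend call here completes immediately instead of asynchronously.

```python
import os
from collections import deque, namedtuple

# `kind` is a hypothetical field naming the operation being requested.
Request = namedtuple("Request", "tag source port kind")
Response = namedtuple("Response", "status tag payload")

class CentralIO:
    def __init__(self, backend):
        self.mailbox = deque()
        self.backend = backend   # the only code that understands port objects

    def step(self):
        """Handle one pending request and reply to the requesting process."""
        req = self.mailbox.popleft()
        try:
            # The port is passed through untouched; it is opaque here.
            payload = self.backend(req.kind, req.port)
            reply = Response("ok", req.tag, payload)
        except OSError as exc:
            reply = Response("error", req.tag, str(exc))
        req.source.append(reply)

def pipe_backend(kind, port):
    """Toy backend whose port objects are plain file descriptors."""
    if kind == "read":
        return os.read(port, 1024)
    raise OSError("unsupported request: " + kind)

r, w = os.pipe()
os.write(w, b"hi")
io = CentralIO(pipe_backend)
me = deque()  # mailbox of the requesting process
io.mailbox.append(Request("t1", me, r, "read"))
io.mailbox.append(Request("t2", me, r, "seek"))
io.step()
io.step()
replies = list(me)
os.close(r)
os.close(w)
```

The tag lets the requesting process match each response to the request it sent.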


The Arc I/O Ports


But having a Central I/O Process is not sufficient. The problem is that the backend assumes that, at any one time, only one asynchronous I/O event is ongoing for a particular port. This means that access to the I/O ports must be synchronized. In SNAP and similar message-passing concurrency environments, synchronization is handled by isolating the synchronized resource in a separate process.


Thus, the I/O ports on the Arc side are not even the opaque port data objects; they are wrappers around a process ID for a process which handles the synchronization of the actual port data objects.
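
A per-port synchronizing process could be sketched as below (Python, single-threaded, purely illustrative). `PortProcess`, `request`, `completed` and the `submit` hook are hypothetical names; in SNAP the Arc-side port would hold only the ID of such a process.

```python
from collections import deque

class PortProcess:
    """Owns one opaque port object and keeps at most one request in flight."""
    def __init__(self, port, submit):
        self._port = port        # never inspected here
        self._submit = submit    # hypothetical hook: hands a request to the Central I/O Process
        self._waiting = deque()
        self._current = None
        self._busy = False

    def request(self, client, kind):
        """Queue a request from `client`; it is forwarded once the port is free."""
        self._waiting.append((client, kind))
        self._pump()

    def completed(self, result):
        """Called when the in-flight request has finished."""
        client, _ = self._current
        client.append(result)    # deliver the result to the waiting client
        self._busy = False
        self._pump()

    def _pump(self):
        if not self._busy and self._waiting:
            self._current = self._waiting.popleft()
            self._busy = True
            self._submit(self._port, self._current[1])

submitted = []
port = PortProcess(object(), lambda p, kind: submitted.append(kind))
alice, bob = [], []
port.request(alice, "read")
port.request(bob, "write")
in_flight_first = list(submitted)    # bob's request is still held back
port.completed("ok-1")
in_flight_second = list(submitted)   # now bob's request has been forwarded
port.completed("ok-2")
```

Two clients can hammer on the same port, but the backend only ever sees one outstanding event for it.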


Yes, asynchronous I/O is hard. ^^
