Sunday, July 27, 2008

Shared Nothing Arc Processes: the introduction

Shared Nothing Arc Processes (SNAP) is a virtual machine designed for a massively multiprocess implementation of Arc, in which processes communicate only by shared-nothing message passing.


Why shared-nothing message passing?


As an electronics engineer doing mostly digital designs, I think I can safely say that multicore, highly parallel programming is the future. ^.^ I find Erlang interesting, although I don't really like its syntax.


Of course, there's another alternative for coordinating multiple processes: software transactional memory (STM). And maybe it can be done for Arc. There's just one problem: I can grok shared-nothing message passing, but I can't grok STM. So, at least for now, shared-nothing message passing it is.


Why Arc?


Because it's very similar to Cadence SKILL, the extension language of Cadence, which develops Electronic Design Automation (EDA) tools. I'm an electronics engineer specializing (somewhat) in IC design and testing, which means that I make use of Cadence products quite a bit in the office; SKILL was the first Lisp-like I seriously programmed in.


Arc and SKILL have the following similarities:



  • Lisp-1, at least for SKILL++ mode.

  • t and nil

  • List-based macros, like in Common Lisp and unlike hygienic macros in Scheme


Why bother?


Why do this, when there's already a good, mature implementation of shared-nothing message passing, Erlang? Why do this when PG has finally released Arc, and some dabblers are building Arc implementations from scratch all over, including at least one compiler that compiles to C, another one that compiles to native x86 code, and an implementation in Java? And Lisp-likes are already being created with shared-nothing message passing, such as an ongoing (as of Jul 2008) LispNYC Summer of Code project.


Well, the world needs yet another dead open source project.


Okay, it's mostly because I'm curious about how to implement an efficient language system from scratch, and I'm curious about the special problems that, say, a JIT compiler faces when another thread might be compiling the same code. Also, I happen to like Arc, but I have some issues with its axioms, and I'm also using SNAP as a sort of testbench to try out what I feel are better axioms for Arc.


Goals



  1. Massively multiprocess. If Erlang can launch hundreds of thousands of processes, SNAP should too!

  2. Support OS threads, so we can take advantage of multiple cores if the OS can do so. By the same token SNAP should also support not using OS threads, meaning that operations that would block a single process should still allow other processes to continue.

  3. Efficiency, because there's little point in multiprocessing if you're just being inefficient. This includes interpreter speed, as well as efficiency in garbage collection and message passing.

  4. Make a good standard for Arc. This includes ways of introspecting into closures (which no Lisp-like has ever done), as well as introspecting function code, to allow serialization of functions from the Arc side.
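
To make goals 1 and 2 a bit more concrete, here is a rough sketch in Python (not Arc, and not SNAP's actual design) of many lightweight processes sharing a single OS thread and talking only through copied messages. Every name in it (Scheduler, spawn, send, worker, collector, run_demo) is made up for illustration. Processes are generators, a process that is waiting for a message simply yields so the others can keep going, and send copies the message so sender and receiver never share data.

```python
import copy
from collections import deque


class Process:
    """A lightweight process: a generator plus a private mailbox."""

    def __init__(self, pid):
        self.pid = pid
        self.mailbox = deque()
        self.gen = None


class Scheduler:
    """Hypothetical round-robin scheduler; uses no OS threads at all."""

    def __init__(self):
        self.procs = {}
        self.ready = deque()
        self.next_pid = 0

    def spawn(self, body, *args):
        proc = Process(self.next_pid)
        self.next_pid += 1
        proc.gen = body(self, proc, *args)
        self.procs[proc.pid] = proc
        self.ready.append(proc)
        return proc.pid

    def send(self, pid, msg):
        # Shared-nothing: the receiver gets its own deep copy.
        proc = self.procs.get(pid)
        if proc is not None:  # messages to dead processes are dropped
            proc.mailbox.append(copy.deepcopy(msg))

    def run(self):
        # Run each process up to its next yield, until all have finished.
        while self.ready:
            proc = self.ready.popleft()
            try:
                next(proc.gen)
            except StopIteration:
                del self.procs[proc.pid]
            else:
                self.ready.append(proc)


def worker(sched, me, collector_pid, n):
    payload = [n, n * n]
    sched.send(collector_pid, payload)
    payload.append("mutated after send")  # the receiver must not see this
    yield  # give up the CPU once, then finish


def collector(sched, me, expected, out):
    received = 0
    while received < expected:
        if me.mailbox:
            out.append(me.mailbox.popleft())
            received += 1
        else:
            yield  # "blocking" receive: only this process waits


def run_demo(n_workers):
    sched = Scheduler()
    out = []  # only how the demo reports back to the host program
    collector_pid = sched.spawn(collector, n_workers, out)
    for n in range(n_workers):
        sched.spawn(worker, collector_pid, n)
    sched.run()
    return out
```

Calling run_demo(3) returns [[0, 0], [1, 1], [2, 4]]: the collector never sees the sender's later mutation, because it only ever holds a copy. A real VM would want something cheaper than a blind deep copy, and would park waiting processes instead of polling them, but the shape of the idea is the same.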