My programming language idea
Posted: Wed Apr 16, 2008 7:17 pm
About a year ago, I had an idea for a distributed, massively parallel programming language based on the concept of actors. Basically, I wanted many computers connected over a fairly high-speed network to act as one computer with many cores. I started off with some really bad examples, but about five attempts later I sat down and actually designed a system that I believe is very robust. I started coding about three months ago, and I've since created a distributed object system and a parser and interpreter for the language.
Anyway, at its core the whole system is a distributed actor system. Actors are located on nodes throughout the network and can be migrated between them. Everything is an actor, including primitives (well, most of the time). Actors can send messages to other actors, and messages can be handled asynchronously because synchronization and atomicity are handled by transactions. A transaction either fails or updates an actor's state atomically.
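To make the "fail or update atomically" idea concrete, here is a minimal Python sketch. It is not the real implementation: the `Actor`, `transact`, and `withdraw` names are made up for illustration, and it assumes a transaction works on a private copy of the state and only swaps it in if nothing went wrong.

```python
import copy


class TransactionFailed(Exception):
    """Raised when a transaction aborts; the actor's state is left as it was."""


class Actor:
    def __init__(self, **state):
        self.state = dict(state)

    def transact(self, update):
        # Work on a private copy so a failure cannot leave partial changes.
        draft = copy.deepcopy(self.state)
        try:
            update(draft)
        except Exception as exc:
            raise TransactionFailed(str(exc)) from exc
        # Commit: replace the old state with the finished copy in one step.
        self.state = draft


def withdraw(amount):
    """Hypothetical update that modifies the log before it can fail."""
    def update(state):
        state["log"].append(("withdraw", amount))
        if state["balance"] < amount:
            raise ValueError("insufficient funds")
        state["balance"] -= amount
    return update


account = Actor(balance=100, log=[])
account.transact(withdraw(30))       # succeeds and commits

try:
    account.transact(withdraw(500))  # fails; nothing is committed
    failed = False
except TransactionFailed:
    failed = True

print(account.state["balance"], len(account.state["log"]), failed)
# prints: 70 1 True
```

The failed withdrawal had already appended to the log in its draft copy, but since the draft is thrown away, the actor still has one log entry and a balance of 70.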
There are many layers to it that allow it to operate efficiently. Let me explain them:
The Routing Layer
First of all, any distributed language must be able to send messages to the computers taking part in the network. There is one central computer (known as the "Brain") that oversees all the others. The Brain delegates tasks to other computers (called "Vertebrae"), and the Vertebrae in turn are connected to nodes. Together, the Vertebrae and the Brain form the backbone over which messages are sent. When a node sends a message, each vertebra tries to find the receiving node among its children or by using a resource cache. If it cannot forward the message to the right place, it forwards the message to its parent vertebra (which might be the Brain). I haven't optimized this much, as actor migration still isn't working. However, I am trying to make sure the Brain never has to do more work than any other vertebra.
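The forwarding rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Router` class and its fields are hypothetical, and it assumes that when a node attaches, every ancestor caches which child leads to it. The real resource cache may well be filled differently.

```python
class Router:
    """A backbone machine: the Brain (no parent) or a vertebra."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.nodes = set()  # nodes attached directly to this router
        self.cache = {}     # node name -> child router to forward through

    def attach(self, node):
        self.nodes.add(node)
        # Assumption: each ancestor remembers which child leads to the node.
        child, ancestor = self, self.parent
        while ancestor is not None:
            ancestor.cache[node] = child
            child, ancestor = ancestor, ancestor.parent

    def route(self, dest, path=None):
        """Return the hops a message takes from this router to dest."""
        path = (path or []) + [self.name]
        if dest in self.nodes:         # receiver is one of our children
            return path + [dest]
        if dest in self.cache:         # the cache knows which way is down
            return self.cache[dest].route(dest, path)
        if self.parent is not None:    # otherwise hand it to the parent
            return self.parent.route(dest, path)
        raise LookupError(f"no route to {dest}")


brain = Router("brain")
v1 = Router("v1", parent=brain)
v2 = Router("v2", parent=brain)
v1.attach("a")
v1.attach("b")
v2.attach("c")

near = v1.route("b")  # stays under v1
far = v1.route("c")   # has to go up through the brain
print(near)  # ['v1', 'b']
print(far)   # ['v1', 'brain', 'v2', 'c']
```

Traffic between nodes under the same vertebra never reaches the Brain, which is the property the design is aiming for.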
The Actor Layer
This layer uses the routing layer as its transport. Each actor is identified by a unique 64-bit resource ID. Resource IDs are allocated by the Brain using a rather simple algorithm that is harder to explain in English than in C. To send a message to an actor, the node first checks whether the actor exists on the current computer. If it does, sending a message is simple: just invoke a function. If it is elsewhere on the network, the node must find the node that hosts that actor. Once it is found, the node sends it a message over the routing layer. The sending actor is put to sleep while the message is being handled on the remote node. Actors run in a cooperatively scheduled fiber system that takes full advantage of multi-core computers. I chose a fiber system because creating a new thread for each network request seemed to slow things down too much (think of an addition message being sent: I would have to spawn a new thread just to add a number? That's a bit wasteful).
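Here is a rough Python sketch of the "put the actor to sleep until the reply arrives" idea, using generators as stand-ins for fibers. Everything in it (`fiber`, `run`, `add_actor`) is hypothetical, and it runs on a single thread, so it shows only the cooperative hand-off, not the multi-core part.

```python
from collections import deque


def add_actor(a, b):
    """Stands in for a remote actor that handles an addition message."""
    return a + b


def fiber(name, a, b, log):
    log.append(f"{name}: sending add({a}, {b})")
    # Yielding hands control back to the scheduler; the fiber sleeps
    # here until the reply is sent back in.
    reply = yield ("add", a, b)
    log.append(f"{name}: got {reply}")


def run(fibers):
    """Round-robin scheduler: one thread, many fibers, no thread spawning."""
    ready = deque((f, None) for f in fibers)
    while ready:
        f, value = ready.popleft()
        try:
            request = f.send(value)  # resume the fiber with its reply
        except StopIteration:
            continue                 # the fiber has finished
        _, a, b = request
        # Queue the reply; other fibers get to run before this one resumes.
        ready.append((f, add_actor(a, b)))


log = []
run([fiber("f1", 1, 2, log), fiber("f2", 10, 20, log)])
print("\n".join(log))
# f1: sending add(1, 2)
# f2: sending add(10, 20)
# f1: got 3
# f2: got 30
```

Note that `f2` gets to send its message while `f1` is still waiting, and no thread is created for either addition.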
The Language
The language's syntax is very loosely based on Lisp, but the semantics are more C-like. The only reason I chose a Lisp-based syntax is that it is so much easier to parse. I have written a simple Lex and Yacc parser that only turns the source into a large tree. The bulk of the actual parsing is hand-coded. This hand-coded parser then converts the tree into a bytecode listing, which is optimized a little, serialized, and made into its own actor (a function actor). The language is still in its infancy, but it will support many parallel constructs. Right now it has built-in support for threading and parallelizing loops. Later on, I plan on adding a "psyco"-like JIT specializer.
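The tree-then-bytecode pipeline might look roughly like this in Python, using the "Hello world" source from this post as input. This is a sketch only: the real front end is Lex and Yacc, and the `PUSH`/`SEND` instructions and the reading of each list as "receiver, selector, arguments" are guesses for illustration, not the actual bytecode.

```python
import re

# A token is a quoted string, a parenthesis, or a run of other characters.
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[()]|[^\s()"]+)')


def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Stage 1: build a nested-list tree from the flat token list."""
    def read(i):
        if i >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[i]
        if tok == "(":
            node, i = [], i + 1
            while i < len(tokens) and tokens[i] != ")":
                child, i = read(i)
                node.append(child)
            if i >= len(tokens):
                raise SyntaxError("missing ')'")
            return node, i + 1
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        return tok, i + 1

    tree, end = read(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


def compile_tree(node, out=None):
    """Stage 2 (hypothetical): flatten a message send into a listing."""
    out = [] if out is None else out
    receiver, selector, *args = node
    for arg in args:
        if isinstance(arg, list):
            compile_tree(arg, out)      # nested sends are evaluated first
        else:
            out.append(("PUSH", arg))
    out.append(("SEND", receiver, selector, len(args)))
    return out


source = '(Actor simple: (msg init: (System.console out: "Hello, world!")))'
tree = parse(tokenize(source))
code = compile_tree(tree[2][2])  # compile only the innermost message send
print(tree)
print(code)
```

Only the innermost expression is compiled here, since how `Actor` and `msg` definitions turn into bytecode is not something this sketch tries to guess.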
Here is a simple "Hello world" actor:
(Actor simple:
    (msg init:
        (System.console out: "Hello, world!")
    )
)
It's fairly self-explanatory, so I won't explain it.
I know I haven't been very detailed, and I've probably misstated some things since I wrote this really quickly. Let me know what you think of the idea and of my design. Ask me anything you'd like. I'm going to be getting a website soon, and when I do I'll post it here so you can look at some of the source.