OSDev.org

Posted: **Wed Apr 16, 2008 7:17 pm**

About a year ago, I had an idea for a distributed, massively parallel programming language based on the concept of actors. Basically, I wanted many computers connected over a fairly high-speed network to act as one computer with many cores. My first attempts were really bad, but about five attempts later I sat down and actually designed a system, which I believe is very robust. I started coding about three months ago, and I've since created a distributed object system and a parser and interpreter for the language.

Anyway, the whole system is at its core a distributed actor system. Actors are located on nodes throughout the network and can be migrated between them. Everything is an actor, including primitives (well, most of the time). Actors can send messages to other actors, and messages are handled asynchronously because synchronization and atomicity are handled by transactions. Transactions either fail or update an actor's state atomically.

There are many layers to it that allow it to operate efficiently. Let me explain them:

The Routing Layer
First of all, any distributed language must be able to send messages to the computers that are taking part in the network. There is one central computer (known as the "Brain") that handles all the computers. The Brain also delegates tasks out to other computers (called "Vertebrae"). Lastly, the Vertebrae are connected to nodes. The Vertebrae and the Brain form the backbone over which messages are sent. When a node sends a message, the Vertebrae try to find the receiver node among their children or by using a resource cache. If a Vertebra cannot forward the message to the right place, it forwards the message to its parent Vertebra (which might be the Brain). I haven't optimized this much, as actor migration still isn't working. However, I am trying to make it so that the Brain never has to do any more work than any other Vertebra.
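
The forwarding rule above can be sketched in a few lines of Python. This is only an illustration, not the actual implementation: the `Router` class, its method names, and the way the resource cache gets filled (ancestors learn about a node when it attaches) are all assumptions made for the example.

```python
class Router:
    """A backbone machine: the Brain (no parent) or a Vertebra."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # directly attached nodes: node name -> inbox list
        self.cache = {}     # resource cache: node name -> child Router leading to it

    def attach_node(self, node_name):
        """Attach a leaf node here and return its inbox."""
        inbox = []
        self.children[node_name] = inbox
        # Assumption: every ancestor caches which child leads to the new node.
        child, ancestor = self, self.parent
        while ancestor is not None:
            ancestor.cache[node_name] = child
            child, ancestor = ancestor, ancestor.parent
        return inbox

    def route(self, dest, message, path=None):
        """Deliver message to node `dest`; return the list of routers visited."""
        path = [] if path is None else path
        path.append(self.name)
        if dest in self.children:      # receiver is one of our own nodes
            self.children[dest].append(message)
            return path
        if dest in self.cache:         # we know which child leads there
            return self.cache[dest].route(dest, message, path)
        if self.parent is None:        # the Brain has nowhere left to ask
            raise LookupError("unknown node: " + dest)
        return self.parent.route(dest, message, path)  # otherwise go up
```

With two Vertebrae under one Brain, a message between nodes on different Vertebrae travels up to the Brain and back down, while a message between nodes on the same Vertebra never leaves it.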

The Actor Layer
This layer uses the routing layer as transport. Basically, each actor is identified by a unique 64-bit resource id. Resource ids are allocated by the Brain using a rather simple algorithm which is harder to explain in English than in C. Anyway, to send a message to an actor, the node first checks whether the actor exists on the current computer. If it does, sending a message is rather simple: just invoke a function. If it is elsewhere on the network, the node must find the node that has that specific actor. Once it is found, the node sends a message to it over the routing layer. The current actor is put to sleep while the message is being handled on the remote node. Actors run in a cooperatively scheduled fiber system which takes full advantage of multi-core computers. I chose a fiber system because creating a new thread for each network request seemed too slow (think of an addition message being sent: I would have to spawn a new thread just to add a number? That's a bit wasteful).
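
To make the local/remote split concrete, here is a rough Python sketch that uses generators as stand-ins for fibers. Everything in it is hypothetical: the `Node` class, the shared `directory` dict (standing in for the lookup over the routing layer), and the single-threaded `run` scheduler are assumptions for illustration, and small integers stand in for 64-bit resource ids.

```python
from collections import deque


class Node:
    def __init__(self, name, directory):
        self.name = name
        self.actors = {}            # resource id -> message handler
        self.directory = directory  # assumed shared map: resource id -> owning Node

    def spawn(self, rid, handler):
        self.actors[rid] = handler
        self.directory[rid] = self

    def send(self, rid, message):
        """Use as `reply = yield from node.send(rid, message)` inside a fiber."""
        if rid in self.actors:
            # Local actor: sending is just a function call.
            return self.actors[rid](message)
        # Remote actor: hand the request to the scheduler and sleep until
        # it resumes this fiber with the reply.
        owner = self.directory[rid]
        reply = yield (owner, rid, message)
        return reply


def run(fibers):
    """Round-robin cooperative scheduler; returns each fiber's result in order."""
    results = [None] * len(fibers)
    ready = deque((i, fiber, None) for i, fiber in enumerate(fibers))
    while ready:
        i, fiber, value = ready.popleft()
        try:
            owner, rid, message = fiber.send(value)
        except StopIteration as done:
            results[i] = done.value
            continue
        # The owning node handles the message; the sender stays asleep
        # until its next turn in the queue.
        reply = owner.actors[rid](message)
        ready.append((i, fiber, reply))
    return results
```

A fiber only gives up control when it waits on a remote actor, so a local addition message costs one function call and no new thread.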

The Language
The language syntax is very loosely based on Lisp. However, the semantics are more C-like. The only reason I chose the Lisp-based syntax is that it is so much easier to parse. I have written a simple Lex and Yacc parser which only puts the syntax into a large tree. The bulk of the actual parsing is hand-coded. This hand-coded parser then converts the tree into a bytecode listing, which is then optimized a little, serialized, and made into its own actor (a function actor). The language is still in its infancy, but it will support many parallel constructs. Right now it has built-in support for threading and parallelizing loops. Later on, I plan on having a "psyco"-like JIT specializer.

Here is a simple "Hello world" actor:

Code:

(Actor simple:
  (msg init:
     (System.console out: "Hello, world!")
  )
)

It's fairly self-explanatory, so I won't explain it.
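
The first parsing stage described above (turning source text into a large tree) can be sketched roughly as follows. The real parser is written with Lex and Yacc; this is a hand-written Python stand-in, and the token rules (parentheses, double-quoted strings, everything else a symbol) are assumptions read off the Hello world example, not the language's actual grammar.

```python
import re

# One token per match: a parenthesis, a double-quoted string, or a symbol.
TOKEN = re.compile(r'\s*(?:([()])|"([^"]*)"|([^\s()"]+))')


def tokenize(src):
    """Split source text into tokens; strings become ("str", text) tuples."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError("bad token at offset %d" % pos)
        if m.lastindex == 2:
            tokens.append(("str", m.group(2)))
        else:
            tokens.append(m.group(m.lastindex))
        pos = m.end()
    return tokens


def parse(tokens):
    """Build nested lists from the tokens; returns the top-level forms."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected )")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("missing )")
    return stack[0]
```

Feeding the Hello world actor through `parse(tokenize(...))` gives one top-level list whose first element is `"Actor"`, with the `msg` form nested inside it. Converting such a tree into bytecode is the hand-coded stage and is not shown here.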

I know I haven't been very detailed and that I probably misstated some things, as I wrote this really quickly. Let me know what you think of the idea and of my design. Ask me anything you'd like. I'm going to be getting a website soon, and when I do I'll post it up here so you can look at some of the source.

Posted: **Wed Apr 16, 2008 7:36 pm**

iammisc wrote: Actors can send messages to other actors and messages are handled asynchronously because synchronization and atomicity are handled by transactions.

iammisc wrote: If it does, sending a message is rather simple: just invoke a function. If it is elsewhere on the network, the node must find the node that has that specific actor. Once it is found, the node sends a message to it over the routing layer. The current actor is put to sleep while the message is being handled on the remote node.

This seems contradictory to me -- how can blocking sends be asynchronous? Also, if the receiving actor is on the same machine, how is it protected from other actors on that machine? By saying that you just call a function, that implies no memory protection, but you also said your language has C-like semantics (C being unsafe and weakly typed). Does your language include the juggling-daggers parts of C too?

Also, if you have more details on transactions that would be quite interesting to see.

Posted: **Wed Apr 16, 2008 8:19 pm**

What I meant was that actors can handle more than one message at the same time.

However, when one actor sends a message to another, the first actor blocks while the second one is handling that message (and the second one might be handling other messages as well).

Also, there is memory protection. Each actor has its own mutex which is used for synchronization.

Now that I look back, saying that the language has C-like semantics is not really true. I wrote this in a hurry. What I meant was that it isn't functional like Lisp but imperative like C. It also doesn't have the weird Lisp operator-as-function syntax, and it has most C constructs. In terms of typing, the language works more like Python and Java. Objects are dynamically typed.

These are the things I meant when I said I wrote this in a hurry.

"Also, if you have more details on transactions that would be quite interesting to see."

The idea for transactions is that all side effects are logged. If everything works out, this log is sent to the actor being transacted with, and that actor commits the changes atomically (by using a mutex). The actor can also run a validation function to make sure that the data it contains is valid for that type of actor.

Take this example:

Code:

(transact actor_to_transact_with:
   ($ value1: (function_that_may_fail:))
)

The $ sign refers to the actor being transacted with.

If function_that_may_fail doesn't fail and returns a value (let's say 42), the log will be as follows:
set value1 to 42

But if function_that_may_fail does fail, the log is never committed and an exception handler is called.

Also, the log doesn't necessarily have to be committed. The actor has the ability to validate itself and if the log would make the actor's data invalid, the actor can reject it in which case an exception handler is called.

This is just my idea of transactions. They still aren't implemented so I may have overlooked some things.
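
Since transactions are not implemented yet, the following is only a guess at how the log, the mutex-protected commit, and the validation step could fit together, written in Python rather than in the language itself. All names (`Actor`, `Transaction`, `transact`, `TransactionRejected`) are hypothetical, and the log is assumed to be a plain list of "set field to value" entries.

```python
import threading


class TransactionRejected(Exception):
    """Raised when the actor's validation function refuses a log."""


class Actor:
    def __init__(self, validate=None, **state):
        self.state = dict(state)
        self._lock = threading.Lock()  # the per-actor mutex
        self._validate = validate or (lambda state: True)

    def commit(self, log):
        """Apply every (field, value) entry atomically, or none of them."""
        with self._lock:
            candidate = dict(self.state)
            for field, value in log:
                candidate[field] = value
            if not self._validate(candidate):
                raise TransactionRejected(log)
            self.state = candidate


class Transaction:
    """Plays the role of `$`: writes are logged, not applied."""

    def __init__(self):
        self.log = []

    def set(self, field, value):
        self.log.append((field, value))


def transact(actor, body, on_error):
    """Run body against a fresh log and commit it only if body succeeds."""
    txn = Transaction()
    try:
        body(txn)               # may fail before anything is committed
        actor.commit(txn.log)   # may be rejected by validation
    except Exception as exc:
        on_error(exc)           # the exception handler
        return False
    return True
```

In this sketch a failure inside `body` and a rejection by the validation function take the same path: the actor's state is left untouched and `on_error` is called.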

My programming language idea
