What features should a systems programming language have?

Owen · Post by **Owen** » Wed Jan 29, 2014 3:16 am

Hobbes wrote:I like the idea of a native string datatype, but wonder how it may be implemented without run-time support. The compiler could inline (the equivalents of) strcpy and strcmp but that would be very space-inefficient.

You provide malloc and friends (for mutable strings). The standard library provides the rest of the support for the string type.

Combuster · Post by **Combuster** » Wed Jan 29, 2014 4:39 am

I like the idea of a native string datatype, but wonder how it may be implemented without run-time support.

The problem with many of the structural features is that they need heap support - i.e. a malloc/free pair. You can't sanely pass strings over the stack, and the same goes for continuations and passing datatypes by value.

From what I've seen with my FreeBasic porting efforts where runtime involves a significant portion but occasionally has unwanted dependencies, you will probably want to cut the language in three parts:
1: The largest subset of the language that needs no runtime except for things like a functional stack and known processor state. This should be enough to implement...
2: The largest subset of the language that can be compiled provided a functional implementation of malloc/free and nothing else (read: a functional heap). You need this to support all non-linear data flows in the language semantics, as well as dynamic array types and native string types.
3: The entire language, including all features that depend on a hosted environment.

C is so simplistic that it has no part 2. C++ does have one, yet it offers no properly defined mechanism for restricting the language to the subset you can actually use; instead you have to disable individual language features manually. For a proper systems development language these separations should be defined and enforceable by the compiler. FreeBasic is troublesome here because its runtime functions reference hosted features (diagnostics and such) in places where you would expect nothing but a heap allocation. Plus, you don't want to be stuck implementing dummies for pretty much everything (without being tipped off by the compiler in advance) just to get the libc working.

There was an interesting discussion on forum.dlang.org and no definitive conclusion if inline assembler in D were a good idea.

Interfacing with assembly is not inline assembly. Besides, since it is platform-dependent, it's very hard to specify it as part of the language standard.

The operating systems developed at ETH Zürich have been written in languages with GC since the 80's and were actually in daily use until this millennium. So I would not outright exclude a GC.

Just because it's possible doesn't make it a good idea. Having a GC takes away your control over memory and puts it in a black box. If you wrote the thing, you know how the black box works and you can use it to force allocation semantics when you need them, but it'll make it hard to maintain for everyone that doesn't know how the system works. Also, if the garbage collector gets changed, memory allocation semantics change as well with all the effects thereof.

Could you please elaborate what you mean with continuations,

Full-thread continuations are harder and in some contexts the wrong solution, but the core idea is that instead of just being able to pass a function pointer, you pass a code reference with arbitrary amounts of auxiliary data. The typical C construct for doing anything close to this is a tuple of ( type(*)(void*), void* ), which allows you to pass a function and data separately and hand the data to the function as its argument. It works, but it's not typesafe, the called function has no idea how to manage the memory on the data side, and you have to write a struct as well as the marshalling and demarshalling explicitly, when the compiler could easily automate it for you. Doing this for entire threads and creating a continuation is impossible in C without platform-specific tricks. C does allow you to pull off its little brother, the closure, in a verbose and unsafe form, although there are fixes for that (example). Both closures and continuations are extremely powerful mechanisms if made accessible.
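To make the ( type(*)(void*), void* ) construct concrete, here is a minimal sketch of a hand-rolled closure in C. All names (`closure_t`, `adder_env_t`, `make_adder`, `closure_invoke`) are hypothetical and chosen only for illustration; the point is the unchecked cast and the caller-managed environment described above.

```c
#include <assert.h>
#include <stddef.h>

/* A hand-rolled "closure": a code pointer plus an untyped environment. */
typedef struct {
    int (*fn)(void *env, int arg);
    void *env;
} closure_t;

/* Environment captured by the hypothetical "add n" closure. */
typedef struct {
    int n;
} adder_env_t;

static int adder_call(void *env, int arg)
{
    /* Unchecked cast: nothing stops a caller from pairing this
       function with an environment of the wrong type. */
    adder_env_t *e = env;
    return e->n + arg;
}

/* Build the closure. The caller owns env and must keep it alive for
   as long as the closure may be invoked; the closure cannot know. */
static closure_t make_adder(adder_env_t *env, int n)
{
    assert(env != NULL);
    env->n = n;
    closure_t c = { adder_call, env };
    return c;
}

static int closure_invoke(closure_t c, int arg)
{
    return c.fn(c.env, arg);
}
```

A caller would declare an `adder_env_t`, pass it to `make_adder`, and later call `closure_invoke`. The environment struct and the cast inside `adder_call` are exactly the boilerplate a language with native closures would generate and type-check itself.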

skeen · Post by **skeen** » Wed Jan 29, 2014 9:05 am

I'd actually prefer the language to be somewhat separated into two parts, namely a low level part and a high level part.

For the low level part I'd like;

1. Link-ability with assembly language
  * no inline assembly
2. Raw pointers, and direct memory access
  * type system units used for distinction between physical and virtual addresses.
3. Layout datastructures (Full focus on physical layout)
  * no reordering
  * no padding
  * no funny business
4. No runtime dependency at all
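As a rough illustration of items 2 and 3, C can approximate both with single-member wrapper structs and compile-time layout checks. This is only a sketch: `phys_addr_t`, `virt_addr_t`, `KERNEL_DIRECT_MAP_BASE`, and `device_regs_t` are made-up names, and the fixed-offset direct map is an assumption, not something from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distinct wrapper types: passing a virt_addr_t where a phys_addr_t
   is expected is a compile-time error, unlike with bare uintptr_t. */
typedef struct { uintptr_t value; } phys_addr_t;
typedef struct { uintptr_t value; } virt_addr_t;

/* Hypothetical offset of a direct physical-memory mapping. */
#define KERNEL_DIRECT_MAP_BASE ((uintptr_t)0x80000000u)

static virt_addr_t phys_to_virt(phys_addr_t pa)
{
    virt_addr_t va = { pa.value + KERNEL_DIRECT_MAP_BASE };
    return va;
}

static phys_addr_t virt_to_phys(virt_addr_t va)
{
    assert(va.value >= KERNEL_DIRECT_MAP_BASE);
    phys_addr_t pa = { va.value - KERNEL_DIRECT_MAP_BASE };
    return pa;
}

/* A layout structure for a hypothetical device. C never reorders
   members, and the static assertions reject any unexpected padding. */
typedef struct {
    uint32_t control;
    uint32_t status;
    uint64_t dma_base;
} device_regs_t;

_Static_assert(offsetof(device_regs_t, status) == 4, "unexpected padding");
_Static_assert(offsetof(device_regs_t, dma_base) == 8, "unexpected padding");
_Static_assert(sizeof(device_regs_t) == 16, "unexpected padding");
```

The conversions are the only place where the two address kinds meet, so a mix-up has to be written out explicitly instead of happening through an implicit integer conversion.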

For the high level part;

1. Logical datastructures (No focus on physical layout)
  * reordering
  * padding
  * splitting data structures
  * other optimization
2. Automatic reference counting for memory management
  * No garbage collection!
3. Nice abstractions to provide interfaces
  * Generics
4. A well defined memory model
  * Destructors
5. Focus on stack based allocations and value semantics
6. Support for other paradigms than OOP!
  * Generic and functional
7. Standard library, with minimal dependencies
  * Lambdas, Tuples, etc.
8. May require a minimal runtime
  * Memory allocation
  * Lite stack unrolling
9. Error handling
  * Exceptions, etc.

For both;

1. A strong type system, with a reasonably low amount of type annotations
  * No mixing of numerical and boolean types
  * Minimal implicit casting
  * Heavy static analysis
2. Strong separation of logical and physical entities
  * No weird inbetween structures, like bitfields
3. Have the compiler do a lot of hard work for you

bwat · Post by **bwat** » Wed Jan 29, 2014 9:16 am

Owen wrote:Also: In the general case I'd exclude languages with a large GC emphasis from the general systems programming language domain, because they have a high impedance with implementing, say, a kernel for a non-GC'd userland efficiently

There's no reason why a kernel implemented in a language with automatic memory management needs to impose such a scheme on the userland applications. So why specifically is a kernel implemented in a language with GC inefficient for a userland implemented without GC?

bwat · Post by **bwat** » Wed Jan 29, 2014 9:21 am

Combuster wrote:2: ... because GC is not done in system languages)

Programmers who have worked at TI, Xerox, Symbolics, and Tektronix would say otherwise.

Combuster wrote: creating an continuation is impossible in C without doing platform-specific tricks. C does allow you to pull off its little brother the closure

Continuations and closures are two very different things. A continuation is a computational context, e.g., a snapshot containing the evaluation, environment and control stacks, whereas a closure is a piece of code/expression/formula that has no unbound variables.

Some languages represent continuations with closures but that is just how they chose to reify them. You could easily represent a continuation with a value of a new data-type which must be passed to a specific continue function along with any values that are to be returned to the computation that is continued.
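A small C sketch of that last idea, using standard `setjmp`/`longjmp`: a dedicated data type holds the saved context, and a dedicated continue function resumes it with a value. Note the limits: this gives only a one-shot escape continuation that is valid while the function that captured it is still active, not a full re-entrant continuation. The names `escape_k`, `continue_with`, and `index_of_first_negative` are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

/* The "new data-type": a saved context plus a slot for the value
   delivered to the resumed computation. The slot is volatile because
   it is written between setjmp and longjmp. */
typedef struct {
    jmp_buf context;
    volatile int value;
} escape_k;

/* The "continue function": resume the saved computation with value. */
static void continue_with(escape_k *k, int value)
{
    k->value = value;
    longjmp(k->context, 1);   /* does not return */
}

/* Walks the array and escapes through the continuation on a hit. */
static void search(const int *xs, int n, escape_k *found)
{
    for (int i = 0; i < n; i++)
        if (xs[i] < 0)
            continue_with(found, i);
}

/* Returns the index of the first negative element, or -1 if none. */
static int index_of_first_negative(const int *xs, int n)
{
    escape_k k;
    if (setjmp(k.context) != 0)
        return k.value;       /* resumed via continue_with */
    search(xs, n, &k);
    return -1;                /* search returned normally */
}
```

Here the continuation is a plain value that gets passed around, and it is not represented as a closure. Capturing the whole control stack so that the computation can be resumed more than once would need the platform-specific tricks mentioned earlier in the thread.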

bwat · Post by **bwat** » Wed Jan 29, 2014 9:51 am

skeen wrote: 2. Automatic reference counting for memory management

* No garbage collection!

But reference counting is garbage collection (automatic memory management).
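For readers unfamiliar with the mechanism being argued about, this is roughly what manual reference counting looks like in C; "automatic" reference counting means the compiler inserts the retain and release calls. A minimal sketch only: `buffer_t` and its functions are hypothetical, the counter is not atomic (so it is not SMP-safe), and reference cycles are never reclaimed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct buffer {
    unsigned refcount;        /* number of live references */
    size_t size;
    unsigned char *data;
} buffer_t;

static int live_buffers = 0;  /* instrumentation for the example only */

/* Create a buffer with one reference; returns NULL on failure. */
static buffer_t *buffer_new(size_t size)
{
    buffer_t *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(size ? size : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->refcount = 1;
    b->size = size;
    live_buffers++;
    return b;
}

/* Take an additional reference. */
static buffer_t *buffer_retain(buffer_t *b)
{
    assert(b != NULL && b->refcount > 0);
    b->refcount++;
    return b;
}

/* Drop a reference; the last release reclaims the memory at once. */
static void buffer_release(buffer_t *b)
{
    assert(b != NULL && b->refcount > 0);
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
        live_buffers--;
    }
}
```

The object is freed at a deterministic point (the final release), which is the property people usually have in mind when they contrast reference counting with a tracing collector, even though both are forms of automatic memory management.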

HoTT · Post by **HoTT** » Wed Jan 29, 2014 10:42 am

Run-time error handling needs to be explicit too. For example, if an application asks to spawn a thread and your code tries to allocate memory for a "thread control block" but can't because you've run out of memory, then you want to return an error back to the application. You don't want the entire OS to crash because some idiot thought "new" was a good idea.

What mechanism to handle errors do you prefer?
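For reference, one explicit mechanism in plain C is a status code plus an out-parameter, which is one reading of the quoted thread-spawn scenario. The sketch below is an assumption about what that could look like, not anyone's actual kernel API: `spawn_status_t`, `tcb_t`, `thread_spawn`, and the injectable `alloc_fn` are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum {
    SPAWN_OK = 0,
    SPAWN_ERR_NO_MEMORY,
    SPAWN_ERR_BAD_ARG
} spawn_status_t;

/* Hypothetical thread control block. */
typedef struct {
    void (*entry)(void *);
    void *arg;
} tcb_t;

/* The allocator is passed in so that failure can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Allocation failure is reported to the caller, never hidden. */
static spawn_status_t thread_spawn(alloc_fn alloc, void (*entry)(void *),
                                   void *arg, tcb_t **out)
{
    if (alloc == NULL || entry == NULL || out == NULL)
        return SPAWN_ERR_BAD_ARG;

    tcb_t *t = alloc(sizeof *t);
    if (t == NULL)
        return SPAWN_ERR_NO_MEMORY;

    t->entry = entry;
    t->arg = arg;
    *out = t;
    return SPAWN_OK;
}

/* Helpers for demonstrating both outcomes. */
static void *failing_alloc(size_t n) { (void)n; return NULL; }
static void dummy_entry(void *arg) { (void)arg; }
```

Calling `thread_spawn(failing_alloc, dummy_entry, NULL, &t)` yields `SPAWN_ERR_NO_MEMORY` and leaves `t` untouched, so the out-of-memory condition reaches the application as an ordinary return value.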

HoTT · Post by **HoTT** » Wed Jan 29, 2014 10:42 am

skeen wrote:
2. Automatic reference counting for memory management
* No garbage collection!

But reference counting is garbage collection (automatic memory management).

Thanks, I was just going to write this

Actually I'm thinking that a language needs a good way to keep track of the lifetime of resources in general, of which memory is only one. So, even if you have a GC and make heavy use of it, you still want to have the machinery for keeping track of everything else.

So the language should not depend on the GC, but some features might, for example: returning closures from a function, if those are allocated on some kind of global storage (e.g. heap).

OSwhatever · Post by **OSwhatever** » Wed Jan 29, 2014 10:48 am

When programming an OS you often end up with some kind of garbage collection based on reference counting if your kernel is supposed to work in an SMP system. However, a generic garbage collector of the kind you'd find in a user mode library is often not fit for kernels. I have reference-counted garbage collection in my kernel, but it is really a specialized piece of code. Different objects are treated differently, with different thresholds for when to run the garbage collector, which is something you usually don't find in a generic user mode GC. Because of this I don't really find any use for garbage collected languages in kernel programming. GC is great in user mode, but in a kernel the GC often needs to be specialized.

Owen · Post by **Owen** » Wed Jan 29, 2014 11:30 am

bwat wrote:
Owen wrote:Also: In the general case I'd exclude languages with a large GC emphasis from the general systems programming language domain, because they have a high impedance with implementing, say, a kernel for a non-GC'd userland efficiently
There's no reason why a kernel implemented in a language with automatic memory management needs to impose such a scheme on the userland applications. So why specifically is a kernel implemented in a language with GC inefficient for a userland implemented without GC?

Garbage collection performance is traded off with memory usage. Current state of the art garbage collectors match the performance of manual memory management at around 3x the memory usage (in the "active" working set), so a GC'd kernel is going to use 3x as much memory or be slower (and even at 3x memory usage, there are consistency issues - the amortized performance might be the same, but that's little consolation if a GC cycle causes your whole system to pause for a while). Concurrent collectors make bigger memory/performance trade-offs.

A GC'd kernel is either going to be less performant or use more memory (and likely that memory is going to be non-swappable). If you're developing a managed OS, you can devise solutions to amortize this cost (because it's all on one heap, you need less indirection for a start, so you can reduce general memory consumption to compensate).

HoTT · Post by **HoTT** » Wed Jan 29, 2014 11:46 am

A GC'd kernel is either going to be less performant or use more memory (and likely that memory is going to be non-swappable). If you're developing a managed OS, you can devise solutions to amortize this cost (because it's all on one heap, you need less indirection for a start, so you can reduce general memory consumption to compensate).

I'd like to say that having a garbage collector does not imply it is used for every memory allocation. On the other hand, if you have a GC that uses an RC scheme, language support can eliminate many accesses to the reference counts.

bwat · Post by **bwat** » Wed Jan 29, 2014 11:58 am

Owen wrote:Garbage collection performance is traded off with memory usage.

I agree.
http://www.cs.princeton.edu/~appel/papers/45.ps

Owen wrote: Current state of the art garbage collectors match performance with manual memory management around 3x the memory usage (in the "active" working set), so a GC'd kernel is going to use 3x as much memory or be slower

I implemented Cheney's algorithm in my Scheme compiler, and it is faster than straight malloc without free (the extreme opposite of managed memory). That would be 2x in your terms.

Owen wrote: (And even at 3x memory usage, there are consistency issues - the amortized performance might be the same,

Can you define consistency here?

Owen wrote:
but that's little consolation if a GC cycle causes your whole system to pause for a while). A GC'd kernel is either going to be less performant or use more memory (and likely that memory is going to be non-swappable).

Is "performant" a word? And if so, does it not rely on some specification you've not revealed (w.r.t. acceptable pause times, collection times, acceptable heap sizes, available memory etc.).

HoTT · Post by **HoTT** » Wed Jan 29, 2014 1:20 pm

Combuster wrote: bwat wrote:
skeen wrote:
2. Automatic reference counting for memory management
* No garbage collection!

But reference counting is garbage collection (automatic memory management).
wikipedia wrote:
In computer science, garbage collection (GC) is a form of automatic memory management.
I call a troll.

According to this book reference counting is a form of garbage collection. However we should just agree on a common terminology and continue the discussion.

Owen · Post by **Owen** » Wed Jan 29, 2014 2:55 pm

bwat wrote:
Owen wrote:Garbage collection performance is traded off with memory usage.
I agree.
http://www.cs.princeton.edu/~appel/papers/45.ps

Owen wrote: Current state of the art garbage collectors match performance with manual memory management around 3x the memory usage (in the "active" working set), so a GC'd kernel is going to use 3x as much memory or be slower
I implemented Cheney's algorithm in my Scheme compiler that is faster than straight malloc without free (extreme opposite of managed memory). That would be 2x in your terms.

The allocation is certainly faster (add ptr, size; cmp ptr, size_of_heap; if_greater call compact)

However, how fast is garbage collection? It requires scanning a sizable portion of the heap. The amortized cost of (allocate + collect) is probably more than the equivalent (malloc + free), especially as in non-GC languages stack object references are often passed around which in GC languages must be on the heap

bwat wrote:
Owen wrote: (And even at 3x memory usage, there are consistency issues - the amortized performance might be the same,
Can you define consistency here?

All memory allocators need to do some bookkeeping. State of the art mallocs largely have ~constant bookkeeping overhead per call, with occasional small spikes. GCs are spikier; allocation is normally really cheap, but occasionally the collector decides it needs to collect and the cost spikes somewhat.

bwat wrote:
Owen wrote: but that's little consolation if a GC cycle causes your whole system to pause for a while). A GC'd kernel is either going to be less performant or use more memory (and likely that memory is going to be non-swappable).
Is "performant" a word? And if so, does it not rely on some specification you've not revealed (w.r.t. acceptable pause times, collection times, acceptable heap sizes, available memory etc.).

That depends upon your system, but: if collection ever takes >1 ms, that's probably too much for apps that depend on precise timing (quite possibly significantly too much; think more towards ~200 µs). What this means is your allocator must be preemptible, which means that, for example, your scheduler can't use it*. This places constraints on the use of a number of features of garbage collected languages.

* It does at least have the advantage over non-GC'd kernels that even if it can't malloc, it can free, since freeing just means dropping the object on the floor

bwat · Post by **bwat** » Wed Jan 29, 2014 3:26 pm

Owen wrote: However, how fast is garbage collection? It requires scanning a sizable portion of the heap. The amortized cost of (allocate + collect) is probably more than the equivalent (malloc + free), especially as in non-GC languages stack object references are often passed around which in GC languages must be on the heap

The bigger the heap, the less often the scan. My GC alloc routine, including collection, was faster than Linux malloc. The alloc routine in my Scheme runtime is:

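#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* malloc, exit, EXIT_FAILURE */

/* Bump-pointer allocation from to-space. GC_free, GC_tospace,
   GC_space_size, GC_flip and GC_dump are defined elsewhere in the runtime. */
void * GC_alloc (unsigned int size)
{
  void * addr;

#ifdef GARBAGE_COLLECTION_TEST
  /* Baseline for the timing comparison: plain malloc, never freed. */
  addr = malloc(size);
#else
  /* Not enough room left in to-space: collect, then check again. */
  if(GC_free + size >= GC_tospace + GC_space_size)
    {
      GC_flip();
      if(GC_free + size >= GC_tospace + GC_space_size)
        {
          fprintf(stderr, "Out of memory! - tried to allocate %u bytes\n", size);
          GC_dump();
          exit(EXIT_FAILURE);
        }
    }

  /* Hand out the next free bytes and bump the free pointer. */
  addr = (void *)GC_free;
  GC_free += size;
#endif

  return addr;
}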

Times for auto-compilation of the cold compiler with GC (#undef GARBAGE_COLLECTION_TEST)
real 0m2.810s
user 0m2.736s
sys 0m0.044s

Times for auto-compilation of the cold compiler with malloc (#define GARBAGE_COLLECTION_TEST)
real 0m5.313s
user 0m4.828s
sys 0m0.444s

With GC, the heap was flipped (call of GC_flip which is the mark, scan and copy routine) 15 times. Heap size is 10000000 bytes which is roughly 9.5 megs (1 meg is 1024*1024 bytes for me).
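Since `GC_flip` itself is not shown, here is a heavily simplified sketch of a Cheney-style semispace flip, for readers who have not seen one. It is not the routine from the Scheme runtime above: it assumes fixed-size objects with two child pointers, a tiny static heap, and an explicit root list, and every name in it (`obj_t`, `gc_flip`, `gc_alloc`, `flip_count`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SPACE_CELLS 8                   /* objects per semispace */

typedef struct obj {
    struct obj *forward;                /* new address once copied */
    int payload;
    struct obj *left, *right;           /* children, may be NULL */
} obj_t;

static obj_t space_a[SPACE_CELLS], space_b[SPACE_CELLS];
static obj_t *to_space = space_a;       /* space we allocate from */
static obj_t *from_space = space_b;     /* idle until the next flip */
static size_t free_idx = 0;             /* next free cell in to_space */
static unsigned flip_count = 0;

/* Copy o into to-space unless already copied; return its new address. */
static obj_t *forward_obj(obj_t *o)
{
    if (o == NULL)
        return NULL;
    if (o->forward == NULL) {
        obj_t *copy = &to_space[free_idx++];
        *copy = *o;                     /* copy->forward is NULL here */
        o->forward = copy;              /* leave a forwarding pointer */
    }
    return o->forward;
}

/* Swap the spaces, copy the roots, then scan the copied objects
   breadth-first, copying whatever they reference. */
static void gc_flip(obj_t **roots[], size_t nroots)
{
    obj_t *tmp = from_space;
    from_space = to_space;
    to_space = tmp;
    free_idx = 0;
    flip_count++;

    for (size_t i = 0; i < nroots; i++)
        *roots[i] = forward_obj(*roots[i]);

    for (size_t scan = 0; scan < free_idx; scan++) {
        obj_t *o = &to_space[scan];
        o->left = forward_obj(o->left);
        o->right = forward_obj(o->right);
    }
}

/* Bump allocation; collects when to-space is full. Returns NULL if
   the live data alone fills the space. */
static obj_t *gc_alloc(int payload, obj_t **roots[], size_t nroots)
{
    if (free_idx == SPACE_CELLS) {
        gc_flip(roots, nroots);
        if (free_idx == SPACE_CELLS)
            return NULL;
    }
    obj_t *o = &to_space[free_idx++];
    o->forward = NULL;
    o->payload = payload;
    o->left = o->right = NULL;
    return o;
}
```

The work done by a flip is proportional to the live data only, since garbage is never visited. That is why a larger heap means fewer flips for the same allocation volume, and why the common-case allocation path stays a pointer bump.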

Owen wrote: That depends upon your system, but: if collection ever takes >1 ms, that's probably too much for apps that depend on precise timing (quite possibly significantly too much; think more towards ~200 µs). What this means is your allocator must be preemptible, which means that, for example, your scheduler can't use it*. This places constraints on the use of a number of features of garbage collected languages.

Yep, predictable or precise timing would need some work and could easily not be worth it, I agree. I've seen real-time Lisp processes that avoid generating garbage so that collection never runs, and systems where each interrupt service routine had its own GC'd heap.

OSDev.org
