
Posted: Tue Oct 16, 2007 11:25 pm
by earlz
I didn't bother to read everything, but...
I think something that is "object-based" (not object-oriented) would be pretty cool for OS deving.
Also, a way to shove assembly at it easily...

Umm... I dunno if this would be practical, but some way to define your own calling conventions would be nice...

I dunno how to make it look good or whatever, but...

Code: Select all

// pseudo-code: declare func() as using the user-defined MY_CALL convention
MY_CALL int func(){
    //....
}

// pseudo-code: MY_CALL itself would be defined somewhere, receiving the raw argument list
void MY_CALL(int num_args, void * ...)

I'm too tired....that isn't possible at all...lol
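
For what it's worth, something vaguely in that direction already exists as compiler extensions: you can't define a brand-new convention, but you can pick or tune one per function. A small sketch with GCC on x86 -- obviously not the hypothetical MY_CALL above, just the closest thing that exists today:

Code: Select all

/* GCC/x86 only: pass the first two arguments in registers instead of on the
   stack. Not a user-defined convention, just a per-function knob -- IIRC the
   Linux i386 kernel builds with -mregparm=3 for much the same effect. */
__attribute__((regparm(2))) int add(int a, int b)
{
    return a + b;
}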


edit:
A must: a frikkin' preprocessor that does something more than copy and paste! Something like the powerful ASM preprocessors out there -- the ones capable of preprocessing loops, name generation, and all that...

Posted: Tue Oct 16, 2007 11:54 pm
by Colonel Kernel
Solar wrote:
Colonel Kernel wrote:Who should be doing the peeking and the poking? IMO, it should not be application developers. This leaves kernel developers and driver developers.
Unfair assumption, IMHO. For example, many of the more involved applications (databases and graphics applications, just to name two) benefit greatly from e.g. custom memory handling implemented on top of the generic memory handling of the system. No matter how good your garbage collector or how customizable your memory management system, there will always be someone requiring that bit more performance or efficiency.
As long as that "someone" is trusted and not some random Joe off the street. My point is that there's no reason to give all applications the same access to the system just because some of them need specializations for performance reasons. Why not differentiate between the two as a matter of policy?

I agree that a certain amount of specialization is needed to get good performance, but I'm not convinced that such specialization has to break type-safety all the time. Your two examples are worth looking at since I think they show two different ways to deal with performance in such a system.

I concede that databases are a tricky example... A serious DBMS would probably do better to have its own dedicated OS, since it tends to take over all the resources of the machine anyway. DBMSes have their own memory management and often their own concurrency management as well. So, why not have a DBOS (this has been done before BTW)? Now, why couldn't that DBOS implement its memory and thread management as a trusted base, and everything else as type-safe code?

For graphics applications, what kind of specialization did you have in mind? I'm thinking it would be mainly related to memory handling -- allocating large blocks to store raster images, avoiding array bounds checking, etc. One of the things MS Research has found with their work on Bartok and Singularity is that by keeping processes "closed", so that all the code that will run in a process is known ahead of time, it becomes possible to apply much more aggressive whole-program optimizations than before. A lot of array bounds checks and other run-time checks can be optimized out as a consequence.
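
To make that concrete -- a sketch in C++ terms only, since Bartok actually works on MSIL, and sum_first_three is just a made-up name -- this is the kind of check that becomes removable once the optimizer can see the whole program:

Code: Select all

#include <cstddef>
#include <vector>

// If every caller is visible and the optimizer can prove that v always holds
// at least three elements, the range check hidden inside at() is dead code
// and can be dropped. In an "open" process you can never prove that.
int sum_first_three(const std::vector<int>& v)
{
    int total = 0;
    for (std::size_t i = 0; i < 3; ++i)
        total += v.at(i);   // at() checks the index on every iteration
    return total;
}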

Also, research into dependent types could yield promising results. In a dependently-typed language, you can tie types to run-time values to a certain extent. For example, a dynamically-allocated array's bounds could become part of its type identity, giving the compiler enough information to omit bounds checks nearly all the time.
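
You can get a rough flavour of this with plain C++ templates, although it's not the real thing: dependent types let the bound be a run-time value, whereas here the length has to be a compile-time constant (FixedArray is a made-up sketch, not a real library type):

Code: Select all

// The length N is part of the array's type, so an index that is provably in
// range needs no run-time check, and an out-of-range one fails to compile.
template <unsigned N>
struct FixedArray {
    int data[N];

    template <unsigned I>
    int get() const {
        typedef char index_must_be_in_range[I < N ? 1 : -1];  // pre-C++11 "static assert"
        return data[I];
    }
};

FixedArray<4> a;   // "array of exactly 4 ints" is part of a's type
// a.get<7>();     // would not compile: provably out of bounds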

I highly recommend Tim Sweeney's presentation The Next Mainstream Programming Language. There's a lot of neat stuff in there on how advances in static typing can give us safety and good performance.
I still don't understand how you can object to a theoretical systems architecture because it prevents you from hacking, when no one will force you to use it. Seriously, it's bizarre.
This is called constructive criticism. Of course you can go ahead and build a "perfect" system, but you should actually be happy if people step up and tell you why it wouldn't be as "perfect" for others, before you learn it the hard and frustrating way at the end of a long development.

To put it in other words, I don't understand how you can object to a criticism when no one is forcing you to take it into account. (Note big smiley --> ) :wink:
I think you misunderstand me... the use of type-safe languages in OS architecture is not my idea (it's not even a new idea), nor is it something I'm going to attempt in my own OS project. I'm not even telling other people that they should embrace this idea (which would be the height of hubris).

What I've been getting totally exasperated trying to say is that this idea is powerful and seductive because of the problems it promises to solve. I think we should at least seek to understand it fully so we can evaluate its advantages and disadvantages.

To put it simply, I am actively seeking constructive criticism of someone else's idea, but instead what I find is a lot of misunderstanding at best, and downright FUD at worst. It's just frustrating trying to have an intelligent conversation about something when the first thing people bring to the table is their prejudices about Java, .NET, Microsoft, VMs, interpreted languages, GC, etc., which, as I've been trying to say, are either incidental or completely unrelated to the core (and very broad) idea of an OS based on language safety.

To use an analogy, the reaction I see is like someone who prefers a car with a manual transmission saying that automatic transmissions should never have been invented.

So far in my attempts at sparking real debate on this idea, I've raised a pretty coherent (IMO) objection -- that it may limit language choice too much by imposing certain requirements on the proof-carrying code that compilers targeting the system must produce.

Today I thought of another objection. I recently watched a lecture by Dave Patterson (of "Hennessy & Patterson" fame) about the future challenges our industry faces because of the shift towards explicit parallelism. One thing he mentioned was that as the feature size of CPUs shrinks (65nm and falling), the number of "soft" errors increases. This means random weirdness that no amount of static code verification can solve. Perhaps MMUs are more likely to mitigate the negative consequences of such "soft" errors. In engineering terms, perhaps a better way to make things more robust is to expect them to fail, but design a way for easy recovery (Erlang is based on this principle).

So, above are two pieces of constructive criticism. Can I maybe get more than just "I don't like safe languages!" and "gimme back my pointers!" from other people...?
If progress was measured in terms...
{Warning sounds} Leaving the ground of discussion, entering argument...
Yes, I apologize for getting a little testy. I've just found it very frustrating to get this idea across, as I said above.
The OSes of the future should be designed in such a way that the amount of code that you'd be inclined to write in such a special-purpose "systems programming language" is as small as possible.
Who has the authority to define what "the OSes of the future", all of them, should look like?
You quoted me out of context. Here is the full quote:
I think you've largely missed the point of the idea though, which is this: The OSes of the future should be designed in such a way that the amount of code that you'd be inclined to write in such a special-purpose "systems programming language" is as small as possible.
This is a hypothesis, not a diktat. It is the same one put forth by the microkernel folks: Too much uber-privileged software leads to reliability and security problems. The kernels of Windows, Linux, and OS X are millions of lines of code. How many vulnerabilities do you think lurk in there, just waiting to be discovered? The type-safety folks are basically saying that you can have your cake (a microkernel) and eat it too (good performance; zero-copy IPC; etc.). So it's up to us to figure out if there is any arsenic in the cake. ;)
Taking away pointers and adding garbage collection might be a good thing for some, but you are taking away a freedom to be replaced with a feature - which not everyone might be OK with.
I suppose the ability to write unsafe code and expect it to run fast could be considered a freedom... There is no reason such unsafe code couldn't be run in a separate address space for safety/backwards compatibility/etc. It will just have slower IPC to the rest of the system. Given the problems such code causes though, I think the trade-off is worth it.

As far as GC goes, what about the ability to choose your own GC? GCs are trusted code, at least for now, but if someone has root and really feels like it, they can create and install a new GC... It would be only slightly more dangerous than installing a kernel-mode driver is today. I think it's nuts, but I'm especially paranoid. ;)

In the future, when type-safe GCs become possible, even this issue will become moot.
The principle of least privilege says we should not be granting things kernel-like powers unless they really, really need it.
Uh-huh... and by designing a system so that it requires everything to be written in a specific way (manifest, "safe" language, etc.), you take away the ability to do it differently if one really, really needs to.

I am not saying that it's a bad tradeoff per se (IMHO it very much depends on the implementation), I just want to make you aware that it is actually a tradeoff.
Of course it is a tradeoff, but I think appealing to "someone, somewhere might need this and we just can't predict it" is a cop out. IMO the tradeoff exists at a different level, like with your database example. Maybe "type-safe OSes" are good for desktop systems and web servers, but not for DB servers... or maybe they are. Maybe they'll be terrible for embedded systems and mobile devices. Maybe not. Without some prognostication, we will never figure it out.

Posted: Wed Oct 17, 2007 12:01 am
by Colonel Kernel
Crazed123 wrote:Now look, Colonel.
:oops:

You have my undivided attention. ;)
No matter how far low we put the barrier between "kernel hacking"/"device drivers" and things you can implement in your idealized safe language, someone somewhere still has to do kernel hacking and write device drivers.
At least kernel hacking, yes. Drivers, no. I/O can be encapsulated behind efficient type-safe abstractions. It's already been done.

But your point is taken anyway.
My idea is that rather than condemning him to C hell while giving everyone above him a shiny new Managed Language to use, we should improve the safety, correctness and expressiveness of kernel-hacking languages to make his life easier.
That's already been done too, although I know you're not keen on the language it's been done with (C#). :) So perhaps this thread could be called "a better type-safe kernel hacking language than C#". Then I wouldn't object that you're misrepresenting the topic. :)
Does anyone have any ideas for how to create a strongly-typed safe(r) reference type?
What do you think is unsafe about C#/Java/Scala references, besides the ability for them to be null and be down-cast (which at least boil down to run-time checks and exceptions... a kind of pseudo-safety with a performance cost)?

I also agree about C++ references being limiting... They are really more like aliases than references.
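
A tiny example of what I mean by "alias" (plain C++, nothing hypothetical):

Code: Select all

#include <cassert>

int main()
{
    int a = 1, b = 2;
    int& r = a;    // from here on, r is just another name for a
    r = b;         // this does NOT re-seat r; it copies b's value into a
    assert(a == 2 && &r == &a);
    return 0;
}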

Posted: Wed Oct 17, 2007 6:22 am
by Solar
Crazed123 wrote:Why shouldn't you be able to reassign or change references? That always seemed a frustrating limitation of C++ to me.
Colonel Kernel wrote:I also agree about C++ references being limiting... They are really more like aliases than references.
I don't really get the point here. If you had C++ references that could be reassigned, how would they differ from C++ pointers?
hckr83 wrote:A must: a frikkin' preprocessor that does something more than copy and paste! Something like the powerful ASM preprocessors out there -- the ones capable of preprocessing loops, name generation, and all that...
Objection. In a "new" language, I am opposing anything that results in debugger output being different from your source code, especially preprocessors.
Colonel Kernel wrote:A serious DBMS would probably do better to have its own dedicated OS...
There will always be the need for a "serious" DBMS on machines that still have to run other code, because a two-machine solution with dedicated DBMS server is not viable. (Many web applications, for example, where you can hardly expect the admin to rent a second machine for the DBMS.)
For graphics applications, what kind of specialization did you have in mind?
Not my area of expertise, but I know that many gfx apps employ their own memory management, including custom swap files etc. Emulators are another application type that comes to mind.

I think I understood you correctly. But - and this was a big "but" already back when I started my own OS project six years ago - I have a strong "reality" approach. I believe that some "revolutions" of computing science might look great on the drawing board, and might indeed solve many problems, but are stillborn if they effectively require too much rethinking, especially on the app-developer part (as applications are the lifeblood of any OS, without which you'll never get out of the "experiment" phase).

Thus, I distinguish between "experiment" OS's that might prove a point but will never grow beyond "novelty" status, and "progress" OS's that might be only mediocre advancements on the theory of CS, but stand a chance of actually attracting a user base.

Bottom line, I don't believe that the combination of "new OS" and "new language" leads anywhere, if only because "no-one" could be bothered to learn some exotic language first just to be able to meddle with an exotic OS. Understand me correctly: thinking about and discussing such topics is an important part of CS and OS dev culture, just don't expect everyone to get excited about it even if you could come up with a "perfect" OS development language. I, for one, would still write my OS in a mainstream language, simply because my resume would benefit more from C/C++ experience than it would from P$, F-- or whatever.

Not sure if I ranted about the right thing, so I'll just shut up now. 8)

Posted: Wed Oct 17, 2007 11:49 am
by Colonel Kernel
Solar wrote:I don't really get the point here. If you had C++ references that could be reassigned, how would they differ from C++ pointers?
You can't do arithmetic on references. Also, if such a thing existed in C++, you wouldn't be able to static_cast (except for up-casting), const_cast, or reinterpret_cast them either (dynamic_cast would still be ok).

I'm guessing that so-called "handles" in C++/CLI behave this way (i.e. -- int^ foo = gcnew int; instead of int* foo = new int; ).
Objection. In a "new" language, I am opposing anything that results in debugger output being different from your source code, especially preprocessors.
Hear hear. Besides, C++ is just a giant object-oriented macro assembler already. ;)
There will always be the need for a "serious" DBMS on machines that still have to run other code, because a two-machine solution with dedicated DBMS server is not viable. (Many web applications, for example, where you can hardly expect the admin to rent a second machine for the DBMS.)
Then use virtualization.
I believe that some "revolutions" of computing science might look great on the drawing board, and might indeed solve many problems, but are stillborn if they effectively require too much rethinking, especially on the app-developer part (as applications are the lifeblood of any OS, without which you'll never get out of the "experiment" phase).

Thus, I distinguish between "experiment" OS's that might prove a point but will never grow beyond "novelty" status, and "progress" OS's that might be only mediocre advancements on the theory of CS, but stand a chance of actually attracting a user base.

Bottom line, I don't believe that the combination of "new OS" and "new language" leads anywhere, if only because "no-one" could be bothered to learn some exotic language first just to be able to meddle with an exotic OS.
IMO you're mostly right, although the shift towards multi-core and many-core is a disruptive change that may very well force software development to take a radical turn, simply because of the lack of better options.

About the "new language" issue... What caught my attention about Singularity was that although it is a radical step in OS architecture, it is actually a much more modest step from the current state-of-the-art of Windows application development. (Not that I'm enamored of Windows development, but hear me out...) With a strategy based around languages that target the CLI, MS can simultaneously develop Singularity into a commercial OS and evolve .NET towards Singularity (e.g. -- wean developers off of reflection, introduce new design-by-contract language extensions, introduce new message-passing constructs, etc.). In a decade or so, C# may have evolved into Sing# (they are not that far apart already) and suddenly changing architecture no longer requires a change of language. It could happen, IMO.

Posted: Wed Oct 17, 2007 12:07 pm
by Crazed123
hckr83 wrote: edit:
A must: a frikkin' preprocessor that does something more than copy and paste! Something like the powerful ASM preprocessors out there -- the ones capable of preprocessing loops, name generation, and all that...
Dylan-like syntax and macros should solve this nicely. Dylan macros are built for copy-and-paste, but I'm pretty sure you can "hack" them to produce something more like the full power of a Lisp macro system (i.e. a Turing-complete language at compile time).
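
Not Dylan, but you can already fake a surprising amount of that in C++ with templates -- a compile-time "loop" that no copy-and-paste preprocessor can express. Just a sketch of the flavour, not a macro system:

Code: Select all

// The compiler "runs" the recursion at compile time; nothing textual about it.
template <unsigned N>
struct Factorial { enum { value = N * Factorial<N - 1>::value }; };

template <>
struct Factorial<0> { enum { value = 1 }; };

int table[Factorial<5>::value];   // an array of 120 ints, sized at compile time
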
Solar wrote:I don't really get the point here. If you had C++ references that could be reassigned, how would they differ from C++ pointers?
Not only is arithmetic on references illegal, but references are also guaranteed type-safe: you can't assign an arbitrary address to a reference, you have to bind it to a correctly-typed object.
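
For example (plain C++; 0xB8000 just stands in for some arbitrary address, like the VGA text buffer):

Code: Select all

int main()
{
    int  x = 42;
    int& r = x;                                  // OK: bound to a real, correctly-typed int
    // int& bad = 0xB8000;                       // won't compile: 0xB8000 is not an int lvalue
    int* p = reinterpret_cast<int*>(0xB8000);    // a pointer will happily take a forged address
    (void)r;
    (void)p;
    return 0;
}
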
Colonel Kernel wrote: Also, research into dependent types could yield promising results. In a dependently-typed language, you can tie types to run-time values to a certain extent. For example, a dynamically-allocated array's bounds could become part of its type identity, giving the compiler enough information to omit bounds checks nearly all the time.

I highly recommend Tim Sweeney's presentation The Next Mainstream Programming Language. There's a lot of neat stuff in there on how advances in static typing can give us safety and good performance.
Sounds neat.
Colonel Kernel wrote:That's already been done too, although I know you're not keen on the language it's been done with (C#). :) So perhaps this thread could be called "a better type-safe kernel hacking language than C#". Then I wouldn't object that you're misrepresenting the topic. :)
They actually made C# run with no substrate or runtime library beneath it? None at all? How?

Posted: Wed Oct 17, 2007 2:01 pm
by Alboin
Crazed123 wrote:
Solar wrote:I don't really get the point here. If you had C++ references that could be reassigned, how would they differ from C++ pointers?
Not only is arithmetic on references illegal, but references are also guaranteed type-safe: you can't assign an arbitrary address to a reference, you have to bind it to a correctly-typed object.
Wouldn't it make sense in a 'safe' language to avoid confusion wherever possible, i.e. disallow the reassignment of references?

I think making them like in C++ would improve the simplicity and clarity of the language, giving it a more functional feel in some respects.

Posted: Wed Oct 17, 2007 2:04 pm
by Crazed123
It's all about compromises. In a low-level systems-programming language you need the ability to reassign a reference variable when you're done with the old reference, because you may not have the memory space to declare 1000 different "well-I-needed-to-store-a-new-value" references throughout your program.

Go look in a C/C++ program and count how many times your average pointer gets reassigned. Then find the average number of pointers in the program. Now multiply those two together and tell me: do you really want to store that many extra references for the sake of pure-functional wankery?

Posted: Wed Oct 17, 2007 2:12 pm
by Alboin
Crazed123 wrote:It's all about compromises. In a low-level systems-programming language you need the ability to reassign a reference variable when you're done with the old reference, because you may not have the memory space to declare 1000 different "well-I-needed-to-store-a-new-value" references throughout your program.

Go look in a C/C++ program and count how many times your average pointer gets reassigned. Then find the average number of pointers in the program. Now multiply those two together and tell me: do you really want to store that many extra references for the sake of pure-functional wankery?
If the functionality of a pointer is needed, then a pointer should be used! :)

References should be used in function arguments and the like. Otherwise, they are pointers.

Maybe a safe pointer type should be introduced. That is, a pointer with no arithmetic....
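
Something along those lines, perhaps -- a minimal sketch, with made-up names, of a pointer type that can be re-bound but has no arithmetic:

Code: Select all

// Wraps a raw pointer and exposes only dereference and re-binding.
template <typename T>
class SafePtr {
    T* p;
public:
    explicit SafePtr(T& target) : p(&target) {}   // must be bound to a real, typed object
    T& operator*()  const { return *p; }
    T* operator->() const { return p; }
    void rebind(T& target) { p = &target; }       // reseatable, unlike a plain C++ reference
    // deliberately no operator+, operator-, operator[] or conversion to T*
};

int main()
{
    int a = 1, b = 2;
    SafePtr<int> sp(a);
    sp.rebind(b);       // re-binding is fine; pointer arithmetic simply doesn't exist
    return *sp;         // 2
}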

Posted: Wed Oct 17, 2007 2:58 pm
by Colonel Kernel
Crazed123 wrote:They actually made C# run with no substrate or runtime library beneath it? None at all? How?
They didn't make it run with no runtime library, but they did implement most of the runtime library itself in C#. C# is actually quite primitive except for the "new" operator which requires a garbage collector (and exceptions, but they're not as complicated as GC). They implemented a concurrent mark-sweep collector for the Singularity kernel in C# -- probably with lots of "unsafe" blocks.

The Singularity kernel is about 95% C#, 3% C++, and 2% asm. The asm is for low-level interrupt handling, context switching, and processor initialization (setting up the GDT and IDT and all that good stuff). The C++ is for their kernel debugger stub, which is compatible with WinDBG. I'm guessing it's borrowed code from elsewhere. The rest is all C#. From what one of the head researchers said, it sounded like about 17-18% of the C# source files have some unsafe code in them (a block here and there), but by and large the kernel is type-safe. Most of the unsafe code is in the kernel's GC, as I mentioned, and also in the Page Manager (their name for the memory manager).
Alboin wrote:Wouldn't it make sense in a 'safe' language to avoid confusion wherever possible, i.e. disallow the reassignment of references?
In the context of OSes, language "safety" means "memory safety". That is, we don't want one process to accidentally stomp on another one's memory by subverting the type system (i.e. -- casting or pointer arithmetic).

The general notion of "safety" is broad and poorly defined, and we don't need to pin it down in this context. But from a general programming language design perspective, you may have a point.
Alboin wrote:Maybe a safe pointer type should be introduced. That is, a pointer with no arithmetic....
That is exactly what a "reference" is in languages like C#, Java, Scala, etc. IMO in C++ they should have been called "aliases" to avoid confusion.

Posted: Thu Oct 18, 2007 12:02 am
by Solar
Colonel Kernel wrote:
Solar wrote:If you had C++ references that could be reassigned, how would they differ from C++ pointers?
You can't do arithmetic on references.
So use const pointers...?!? References are not meant to mimic pointers, so why should they behave like them?
Objection. In a "new" language, I am opposing anything that results in debugger output being different from your source code, especially preprocessors.
Hear hear. Besides, C++ is just a giant object-oriented macro assembler already. ;)
Mostly because we still use 1950's linker technology, which doesn't know about function overloading et al, but you have a point there. You know what I meant, though: A preprocessor results in symbols being turned into values, and unless you have a very smart IDE, all those nice macro symbols in your source (putc(), for example) will suddenly have turned into some strange-looking code...
There will always be the need for a "serious" DBMS on machines...
Then use virtualization.
Serious overkill IMHO - a virtual machine having an OS of its own to run my database, just so that virtual OS can be "more efficient" in handling the database? I seriously doubt that would cut it, not to speak of the additional maintenance / administration requirements.

(I had this "use virtualization" talk with a co-worker of mine recently. While virtual machines are something cool, I think they - as any "new" technology - are used in places where they shouldn't simply because they're "hip". This strikes me to be such a case.)
In a decade or so, C# may have evolved into Sing#...
...or have become C# 3.5.48.0002, which is slightly incompatible to 3.5.47.0067 and requires you to call FooBar( gnagna ) before you BazBamm(), never mind they told you the other way round until they discontinued the 3.4.52.0182 branch because the new FuzzReal technology required a radical change of the CLI...

Posted: Fri Oct 19, 2007 10:54 am
by Colonel Kernel
Solar wrote:
Colonel Kernel wrote:You can't do arithmetic on references.
So use const pointers...?!? References are not meant to mimic pointers, so why should they behave like them?
References in other languages are meant to mimic pointers, minus the arithmetic. We're not talking just about C++ here.

BTW, const pointers are still unsafe:

Code: Select all

int* const p = reinterpret_cast<int*>( 0x12345698 );
int* const q = p + 0x0000BEEF;
int x = *q;
Then use virtualization.
Serious overkill IMHO - a virtual machine having an OS of its own to run my database, just so that virtual OS can be "more efficient" in handling the database? I seriously doubt that would cut it, not to speak of the additional maintenance / administration requirements.
It's your strawman, not mine. ;) A DBMS is only as serious as its users. No one would dream of running Teradata, for example, in a production environment on only a single box that also happens to be running the web server. That way lies madness. Even with a more modest DBMS like SQL Server, you'd at least want to be able to add more webservers to handle additional traffic. The ratio of web servers to DB servers is rarely 1:1.
(I had this "use virtualization" talk with a co-worker of mine recently. While virtual machines are something cool, I think they - as any "new" technology - are used in places where they shouldn't simply because they're "hip". This strikes me to be such a case.)
In my mind, it's a way out for people who are too cheap to spend the money on a second box. That sounded like the example you gave, so that's why I suggested it. :)

Posted: Fri Oct 19, 2007 12:22 pm
by os64dev
BTW, const pointers are still unsafe:

Code: Select all

int* const p = reinterpret_cast<int*>( 0x12345698 );
int* const q = p + 0x0000BEEF;
int x = *q;
What is unsafe about this? p and q are different instances, with q at an offset from p. After the assignment you cannot change p and/or q. Valid in my book.

I guess that the use of pointers requires people to think harder and in general people want to be 'lazy'.

Posted: Fri Oct 19, 2007 12:46 pm
by Colonel Kernel
os64dev wrote:
BTW, const pointers are still unsafe:

Code: Select all

int* const p = reinterpret_cast<int*>( 0x12345698 );
int* const q = p + 0x0000BEEF;
int x = *q;
What is unsafe about this?
Remember, I'm talking about type safety, not "safety" in general. It is type-unsafe because it is very likely that neither p nor q actually points to an int; more likely they point to some random crap in memory.
I guess that the use of pointers requires people to think harder and in general people want to be 'lazy'.
I've been working in the software industry for over a decade. I've seen code written by lots of people, most of whom are very smart and not lazy. They make mistakes. They are human beings, after all. Those mistakes cost time and money, and I would rather have the compiler catch those mistakes than my test team, thank you very much.

Go bang your chest somewhere else.

Posted: Fri Oct 19, 2007 1:36 pm
by Candy
Colonel Kernel wrote:I've been working in the software industry for over a decade. I've seen code written by lots of people, most of whom are very smart and not lazy. They make mistakes. They are human beings, after all. Those mistakes cost time and money, and I would rather have the compiler catch those mistakes than my test team, thank you very much.
Most typos and mistakes I've seen didn't involve pointer arithmetic of the add-a-random-number-to-a-pointer variety, nor have I seen reinterpret_casts used at all -- or C casts with the same effect.