My Kernel Design (Pt1) - Compiler And Language

cxzuk · Post by **cxzuk** » Fri Apr 08, 2011 6:19 am

Hello All!

I've read almost every post on this forum and can tell there is some great talent here, so I would love to share my design ideas with everyone and get some good feedback.

I am making a kernel for two reasons. First, I love computers and programming, and I see this as a great experiment and learning experience. Secondly, I have some issues with Linux/Unix[i] which I think really shouldn't exist.

By far my biggest issue is information management. The Linux kernel is very messy, and its documentation is very poor (sometimes even out of date), which makes the entry level for a kernel programmer very high. Retention of information is also very poor; the ck patches incident is one example. Due to programmer 'differences' and a dictatorial decision of 'we don't want two CPU schedulers', all the information that ck had built up was wiped clean. That information should have been retained, regardless of its immediate use, to aid future decisions and research.

=== Design ===

Anyway! I have decided to make everything from scratch. A compiler toolkit, kernel, and operating system applications.

My compiler toolkit is powered by XML. It currently exists only as test stubs and old prototypes, using XSL[ii]. The XML is closer to an AST than to a programming language, but it can be abstracted to any level to provide more detailed descriptions of code.

The main reason for this is I do not have the time/experience to create a compiler from scratch. Having a wealth of XML tools available on all platforms was the only feasible way to achieve this.

XML is also machine readable, so documentation, checked exceptions[iv] and optimisations are completely automated. Architectures are described in separate files, allowing for portability[iii].

Applications are based on the MVC paradigm. This means that only data is shared between applications. RPC is not possible.

Data is either a Collection or a Member. You never specify "Array", "Directory", "Linked List", etc. All collections begin as a doubly-linked list; the type of the collection is then selected at compile time depending on the functions that are used on the collection (next, prev, etc.)[v].
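A minimal sketch of how that compile-time selection could work, assuming the compiler records a bitmask of the operations it sees applied to a collection. The operation names, the three representations and the mapping rules are all hypothetical illustrations, not the author's actual rules.

```c
#include <assert.h>

/* Hypothetical operations the compiler can observe on a collection. */
enum coll_op {
    OP_NEXT   = 1u << 0, /* forward traversal */
    OP_PREV   = 1u << 1, /* backward traversal */
    OP_INDEX  = 1u << 2, /* random access by position */
    OP_INSERT = 1u << 3  /* insertion in the middle */
};

/* Hypothetical concrete representations the compiler can choose from. */
enum coll_repr {
    REPR_DOUBLY_LINKED, /* the default every collection starts as */
    REPR_SINGLY_LINKED,
    REPR_ARRAY
};

/* Pick the cheapest representation that still supports every used op. */
static enum coll_repr select_repr(unsigned ops)
{
    if ((ops & OP_INDEX) && !(ops & OP_INSERT))
        return REPR_ARRAY;         /* random access, no mid-insertion */
    if (!(ops & (OP_PREV | OP_INDEX)))
        return REPR_SINGLY_LINKED; /* only ever walked forwards */
    return REPR_DOUBLY_LINKED;     /* fall back to the default */
}
```

For example, a collection only ever walked with next() would become a singly-linked list, while one that is indexed but never inserted into would become an array.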

Additional collection information is stored separately from the collection data, so if the data is shared and another application requires a different type of collection, the collection is bootstrapped to allow for the additional information.

Data is synchronised with atomic operations and IDs/fingerprints/revision numbers. Watch() and polling (push is an optimisation) are used to signal changes to data.
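The post doesn't say exactly how the atomic operations and revision numbers fit together. One common approach, sketched here under that assumption, is optimistic concurrency: keep a revision number next to the value in a single atomic word, let a write succeed only if the revision hasn't moved since the writer read it, and let a poller detect changes by comparing revisions. All names and the 32/32-bit packing are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one atomic 64-bit word holding a 32-bit revision
 * (high half) and a 32-bit value (low half), so both change together. */
static uint64_t pack(uint32_t rev, uint32_t val)
{
    return ((uint64_t)rev << 32) | val;
}

static uint32_t revision_of(uint64_t word) { return (uint32_t)(word >> 32); }
static uint32_t value_of(uint64_t word)    { return (uint32_t)word; }

/* Write new_val only if nobody has written since the caller saw revision
 * seen_rev. Returns false if the caller's view is stale and must be re-read. */
static bool try_update(_Atomic uint64_t *word, uint32_t seen_rev, uint32_t new_val)
{
    uint64_t old = atomic_load(word);
    if (revision_of(old) != seen_rev)
        return false; /* someone else wrote first */
    /* Fails if another writer slipped in between the load and this CAS. */
    return atomic_compare_exchange_strong(word, &old, pack(seen_rev + 1, new_val));
}

/* Polling side of Watch(): has the datum changed since revision seen_rev? */
static bool has_changed(_Atomic uint64_t *word, uint32_t seen_rev)
{
    return revision_of(atomic_load(word)) != seen_rev;
}
```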

The kernel is a hybrid, but it is designed purely as a microkernel. When a driver is compiled to binary, I have additional information which allows the driver to be moved at link/run time from user space into kernel space, and vice versa. This extra information includes optimisation hints which can be applied at link time[vi], such as static numbers.

Sorry for the long post! Comments welcome
Mike Brown

======
[i] I name Linux purely because I have read and worked on Linux software and the kernel, and it has been my home operating system for over a decade.

[ii] XSL is turing complete. (http://www.unidex.com/turing/utm.htm)

[iii] (Opinion) Information portability is more important than binary portability. The kernel is compiled to each specific architecture/hardware.

[iv] All errors must be handled, or they get passed up to the calling function, until an unhandled error is passed outside of main(), at which point an error message is displayed.

[v] Indexed collections are similarly possible.

[vi] Link time optimisations not going so well yet!
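The runtime behaviour described in footnote [iv] can be sketched in C with return codes. This is only an illustration under that assumption: C cannot enforce at compile time that every error is handled, which is the "checked" part the post automates from the XML, and the error codes and function names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes; 0 means success. */
enum err { ERR_OK = 0, ERR_NO_DEVICE };

static enum err probe_device(int present)
{
    return present ? ERR_OK : ERR_NO_DEVICE;
}

/* Does not handle ERR_NO_DEVICE itself, so it passes it up unchanged. */
static enum err init_driver(int present)
{
    enum err e = probe_device(present);
    if (e != ERR_OK)
        return e; /* propagate to our caller */
    return ERR_OK;
}

/* Stand-in for main(): anything that reaches here unhandled is reported. */
static int run(int present)
{
    enum err e = init_driver(present);
    if (e != ERR_OK) {
        fprintf(stderr, "unhandled error %d\n", (int)e);
        return 1;
    }
    return 0;
}
```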

NickJohnson · Post by **NickJohnson** » Fri Apr 08, 2011 9:27 am

Could you give some sort of example on how you're going to use XSL to write system code?

Why is it so much more "from scratch" to use XML/XSL (which already exists and has tools, as you said), instead of a language intended for system programming, like C/C++/ASM, which also have existing tools?

Are you only writing the kernel and drivers using this language, or is the rest of the system also required to use it?

OsDeveloper · Post by **OsDeveloper** » Tue Apr 26, 2011 9:08 am

What is new in XSL that attracts OS developers to use it?

This is a question I hope you will answer.

cxzuk · Post by **cxzuk** » Fri Apr 29, 2011 4:36 pm

What exactly is everyone here doing? They are taking information, normally in human readable text, and turning it into a machine readable language. We are taking documentation of the hardware and creating software which follows set rules (your OS decisions).

XML and co. should allow us to completely automate this. We don't want to write C code, or ASM: the information we can store in these languages is pretty limited, hence why comments are so important.

What we need is to describe the hardware and features, and then to use XSL to transform this information into our OS code.

<memory>
  <ram size="1gb"/>
  <register size="8"/>
</memory>

<cpu>
  <alu>
    <flags>
      <flag name="overflow"/>
    </flags>
    <instructions>
      <instruction name="inc"/>
      ...
    </instructions>
  </alu>
  <mmx>
    ...
  </mmx>
</cpu>

<apic>
  ...
</apic>

That's a small example. Is this a good structure as it is? Unlikely, but it should give you an idea of what it will look like.

An XSL sheet would then check what hardware you have available and initialise it.

Now, actually taking hardware documentation and turning it into an OS isn't that hard (in my opinion). What makes it hard for OS devs is the amount of differing hardware, and the fact that our OS decisions are hard to change at a later date.

Bad OS decisions aside, managing the large (huge) amount of information really is the advantage here. Keeping this information on the hardware as descriptive and flexible as possible, and automating everything, will make OS development much more productive.

P.S. Many programmers in general spend a large amount of time on optimisations, but as many say here: measure, optimise, and measure again. The problem is that there is very little way of tying an optimisation to the hardware it applies to, and I personally feel optimisations should be left to true software engineers as computers become more complex and interconnected. A good example of this is XORing a register with itself to set it to 0. This is the quickest choice on register-based CPUs, but is much slower on stack-based CPUs. While that optimisation is arch-specific, an even more worrying example is this one: I was told by many sources that you can speed up loops by making them count down to zero. While this may have been true when the optimisation started, ALUs these days are far quicker than the sync clocks. This optimisation gives little to no improvement, but requires you to change your code considerably, and the changes can make your code even harder to read.
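To make the readability cost concrete, here are two equivalent C loops, one counting up and one rewritten to count down to zero. Whether the second is any faster depends on the compiler and CPU and should be measured rather than assumed; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array counting up: the natural, readable form. */
static long sum_up(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop rewritten to count down to zero. Note the less obvious
 * loop header it forces on the code (i-- > 0 avoids unsigned wraparound). */
static long sum_down(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = n; i-- > 0; )
        s += a[i];
    return s;
}
```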

Mike Brown

Brendan · Post by **Brendan** » Fri Apr 29, 2011 10:45 pm

Hi,

cxzuk wrote:What exactly is everyone here doing? - They are taking information, normally in human readable text, and turning it into a machine readable language.. We are taking documentation of the hardware, and creating software which follows set rules (your OS decisions).

XML and co. should allow us to completely automate this. We don't want to write C code, or ASM. The information we can store in these languages is pretty limited, hence why comments are so important.

Actually it's the reverse. The more limited a language is, the easier it is for programmers to learn and understand, and the easier it is for tools to process and optimise source code written in that language. Comments exist to show the original programmer's intent, which may or may not correspond to the code they actually wrote. For an over-simplified example, consider "foo = foo + 2; // Multiply foo by 2" - because you know the programmer's intent (from the comment) you can easily spot the bug in the code. In XML without the comments you're screwed, so you'd have to resort to XML with comments.

cxzuk wrote:What we need is to describe the hardware and features, and then to use XSL to transform this information into our OS code.

No. Most hardware and features should be auto-detected. During boot, you don't want to have code that auto-detects how much memory is present and generates the corresponding XML; then more code that parses the resulting XML into machine readable form and discards it - it's a complete waste of time.

cxzuk wrote:
<memory>
  <ram size="1gb"/>
  <register size="8"/>
</memory>

<memory>
  <ram size="big as ur mom!"/>
  <register size="why u not no?"/>
</memory>

This is far too flexible. You need a way to limit variables to specific types (e.g. integers where appropriate), plus documentation that says exactly which values are accepted (e.g. "-1" is an integer, but it's still an unacceptable "ram size"), plus more documentation showing the default value/s that will be used if something is omitted.
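A minimal sketch of the kind of checking Brendan describes, assuming (hypothetically) that RAM size is given as a plain integer number of megabytes with a default of 1024 when omitted; under that assumption the original "1gb" would itself be rejected. It separates "not an integer at all" from "an integer, but not an acceptable value".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical default used when the ram size attribute is omitted. */
#define RAM_SIZE_DEFAULT_MB 1024L

/* Parse a ram size in megabytes. Rejects non-integers ("big as ur mom!")
 * and integers that are syntactically fine but semantically wrong ("-1"). */
static bool parse_ram_size_mb(const char *attr, long *out)
{
    if (attr == NULL) { /* attribute omitted: use the documented default */
        *out = RAM_SIZE_DEFAULT_MB;
        return true;
    }
    char *end;
    errno = 0;
    long v = strtol(attr, &end, 10);
    if (end == attr || *end != '\0' || errno == ERANGE)
        return false; /* not a (representable) integer */
    if (v <= 0)
        return false; /* an integer, but not a valid size */
    *out = v;
    return true;
}
```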

As for automagically converting lead into gold (or, converting human readable text into machine readable code, or converting software designed for a specific system into software designed for a completely different system), good luck...

Cheers,

Brendan

rdos · Post by **rdos** » Sat Apr 30, 2011 12:46 am

Brendan wrote:No. Most hardware and features should be auto-detected. During boot, you don't want to have code that auto-detects how much memory is present and generates the corresponding XML; then more code that parses the resulting XML into machine readable form and discards it - it's a complete waste of time.

True. As a user, you don't want to do any configuration; you want the drivers to autoconfigure as much as possible. For most modern hardware this is easy. For instance, PCI devices have vendor/device IDs, and so do USB devices.
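A minimal sketch of ID-based auto-configuration, assuming the vendor/device pair has already been read from PCI configuration space (that step is omitted). The table layout and driver names are hypothetical; the two ID pairs are those of the Intel 82540EM and Realtek RTL8139 network cards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver table keyed by PCI vendor/device ID. */
struct pci_driver {
    uint16_t vendor;
    uint16_t device;
    const char *name;
};

static const struct pci_driver drivers[] = {
    { 0x8086, 0x100E, "e1000" },   /* Intel 82540EM */
    { 0x10EC, 0x8139, "rtl8139" }, /* Realtek RTL8139 */
};

/* Return the driver name for an enumerated device, or NULL if none matches. */
static const char *match_driver(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++)
        if (drivers[i].vendor == vendor && drivers[i].device == device)
            return drivers[i].name;
    return NULL;
}
```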

The only thing that needs configuration in my OS is which drivers to load. This is kept in a configuration file which then is converted to a binary image with a simple tool (not a linker). It would be possible to write an installation application that finds out which hardware is present and then creates the configuration & binary file, but since I don't target desktop, I've not done this.

cxzuk · Post by **cxzuk** » Sat Apr 30, 2011 4:15 pm

Heya

Brendan wrote:Hi,
Actually it's the reverse. The more limited a language is, the easier it is for programmers to learn and understand, and the easier it is for tools to process and optimise source code written in that language. Comments exist to show the original programmer's intent, which may or may not correspond to the code they actually wrote. For an over-simplified example, consider "foo = foo + 2; // Multiply foo by 2" - because you know the programmer's intent (from the comment) you can easily spot the bug in the code. In XML without the comments you're screwed, so you'd have to resort to XML with comments.

I think your example is a little too simplified. What I'm saying is pretty much what is done already: take information and make it more descriptive. Here's a rough example:

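#include <stdint.h>

#define PIT_CHANNEL0    0x40 /* Read/write channel 0 for PIT */
#define PIT_CHANNEL1    0x41 /* Read/write channel 1 for PIT */
#define PIT_CHANNEL2    0x42 /* Read/write channel 2 for PIT */
#define PIT_MODECOMMAND 0x43 /* Write only; reads are ignored */
/* ... */
/* Build the byte written to PIT_MODECOMMAND. With access mode 3
   (lobyte/hibyte) the 16-bit divisor is then written to the channel's
   data port, low byte first. */
static uint8_t pit_command(uint8_t channel, uint8_t access,
                           uint8_t mode, uint8_t bcd)
{
  return (uint8_t)(((channel & 0x3) << 6)  /* bits 6,7 = channel          */
                 | ((access & 0x3) << 4)   /* bits 4,5 = access mode      */
                 | ((mode & 0x7) << 1)     /* bits 1,2,3 = operating mode */
                 | (bcd & 0x1));           /* bit 0 = BCD                 */
}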

Now we have the same information in two places: the wiki page from which I translated text into C, and the C code. I'm also going to have a third, the generated ASM. XML removes the need for C: storing the information in XML allows it to be transformed either into ASM or into human readable text.
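The single-source idea itself does not depend on XML. As a hypothetical illustration, here is one C table of PIT port facts from which both a NASM-style constant and a documentation line are generated, so the two outputs cannot drift apart. The table, function names and output formats are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One table holds the facts; every output is generated from it. */
struct port_desc {
    const char *name;
    unsigned port;
    const char *doc;
};

static const struct port_desc pit_ports[] = {
    { "PIT_CHANNEL0",    0x40, "Read/write channel 0 for PIT" },
    { "PIT_CHANNEL1",    0x41, "Read/write channel 1 for PIT" },
    { "PIT_CHANNEL2",    0x42, "Read/write channel 2 for PIT" },
    { "PIT_MODECOMMAND", 0x43, "Write only; reads are ignored" },
};

static const struct port_desc *find_port(const char *name)
{
    for (size_t i = 0; i < sizeof pit_ports / sizeof pit_ports[0]; i++)
        if (strcmp(pit_ports[i].name, name) == 0)
            return &pit_ports[i];
    return NULL;
}

/* Assembler output, e.g. "PIT_CHANNEL0 equ 0x40". */
static int emit_asm(const struct port_desc *p, char *buf, size_t len)
{
    return snprintf(buf, len, "%s equ 0x%02X", p->name, p->port);
}

/* Human-readable output for the documentation. */
static int emit_doc(const struct port_desc *p, char *buf, size_t len)
{
    return snprintf(buf, len, "Port 0x%02X (%s): %s", p->port, p->name, p->doc);
}
```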

Brendan wrote:No. Most hardware and features should be auto-detected. During boot, you don't want to have code that auto-detects how much memory is present and generates the corresponding XML; then more code that parses the resulting XML into machine readable form and discards it - it's a complete waste of time.

Sorry, by hardware I mean architecture: things like the GDT, APIC and FPU, with features being MMX, 3DNow!, SSE, etc. It is a microkernel, so there are no devices.

As for memory, you're right; detecting it at boot is much more suitable. I put it in there because it is advantageous to know the RAM size, PAE, etc. Looking back, the CPU's cache sizes would have been a much better example, as would having XML describe the BIOS (e.g. INT 15).

XML is not generated at boot. XML is only needed at a source level, the XSL takes the XML information and creates ASM code. XML then basically becomes your documentation.

Brendan wrote:
<memory>
  <ram size="big as ur mom!"/>
  <register size="why u not no?"/>
</memory>
This is far too flexible. You need a way to limit variables to specific types (e.g. integers where appropriate), plus documentation that says exactly which values are accepted (e.g. "-1" is an integer, but it's still an unacceptable "ram size"), plus more documentation showing the default value/s that will be used if something is omitted.

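Yep! You're right. This is what a DTD is for (or an XML Schema, which, unlike a DTD, can restrict a value to a type such as an integer), and it is needed to validate the XML. That covers most of the above. What is left is the logical checking of the data (the -1), which is done in the XSL.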

Brendan wrote:As for automagically converting lead into gold (or, converting human readable text into machine readable code, or converting software designed for a specific system into software designed for a completely different system), good luck...

I'm sorry, I didn't fully understand this, but here's a stab at answering it.

No software is converted; what's being converted is the information about an architecture. Take scheduling: if you have asked for multitasking, the XSL checks for the PIT chip. If it is described in the architecture, it creates the ASM for preemptive multitasking; otherwise it creates the ASM for cooperative multitasking.
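The decision described above is small enough to sketch directly. In the post it happens at build time inside the XSL, which then emits the matching ASM; this C version, with a hypothetical architecture-description struct, only shows the decision logic itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical architecture description, as the XSL stage might see it. */
struct arch_desc {
    bool has_pit;      /* a programmable interval timer is described */
    bool multitasking; /* the OS configuration asked for multitasking */
};

enum sched_kind { SCHED_NONE, SCHED_COOPERATIVE, SCHED_PREEMPTIVE };

/* Preemptive scheduling needs a timer interrupt; without one,
 * fall back to cooperative yields. */
static enum sched_kind choose_scheduler(const struct arch_desc *a)
{
    if (!a->multitasking)
        return SCHED_NONE;
    return a->has_pit ? SCHED_PREEMPTIVE : SCHED_COOPERATIVE;
}
```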


Thank you for the reply.

Brendan · Post by **Brendan** » Sat Apr 30, 2011 6:19 pm

Hi,

cxzuk wrote:Heres a rough example;

cxzuk wrote:Now.. we have the same information in two places, the wiki page from which i translated text into C, and the C code. Im also going to have a third, the generated ASM. XML removes the need for C. Storing the information in XML allows it to be transformed either into ASM or into human readable text.

If you replace the information in the wiki with XML that is formatted according to the rules, etc your XML tools expect; then you remove the need to translate the information from the wiki into your XML source code. If you replace the information in the wiki with C that is formatted according to the rules, etc that all C compilers expect; then you remove the need to translate the information from the wiki into C source code. It's exactly the same thing, just with different languages.

What you're suggesting is that everyone in the world should switch to XML that is formatted according to the rules, etc your XML tools expect; so that everyone can cut & paste wiki pages into their OS? If you can find anyone willing to actually use (verbose and ugly) XML then it might work for those people, but then you'd have a group of people collaborating on a "wiki OS" rather than people writing their own OSs. Convincing people to work on the same OS, or even convincing people to adopt a common device driver framework, is something that has been tried many times and failed each time (even when normal programming languages that don't cause programmers to empty their stomach contents all over their keyboard are used...).

cxzuk wrote:
No. Most hardware and features should be auto-detected. During boot, you don't want to have code that auto-detects how much memory is present and generates the corresponding XML; then more code that parses the resulting XML into machine readable form and discards it - it's a complete waste of time.
Sorry, by hardware i mean architecture. Things like GDT, APIC, FPU, and features being MMX, 3DNow!, SSE etc. It is a micro-kernel so no devices.

Things like whether or not APIC, FPU, MMX, 3DNow!, SSE, etc are present/supported should be auto-detected during boot. If an OS doesn't support CPUs that lack an FPU (for e.g.) then the FPU should still be auto-detected during boot (and the OS should display a nice/polite error message and refuse to boot if there was no FPU).
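A sketch of that boot-time detection on x86, assuming GCC or Clang. CPUID leaf 1 reports the FPU, APIC, MMX and SSE in EDX bits 0, 9, 23 and 25. `__get_cpuid` from `<cpuid.h>` is a compiler-provided helper used here for convenience; a kernel would more likely issue the `cpuid` instruction itself. 3DNow! is reported in an extended leaf and is left out. The decoding is kept separate from the query so it can be tested on any machine, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits. */
#define CPUID_EDX_FPU  (1u << 0)
#define CPUID_EDX_APIC (1u << 9)
#define CPUID_EDX_MMX  (1u << 23)
#define CPUID_EDX_SSE  (1u << 25)

struct cpu_features { bool fpu, apic, mmx, sse; };

static struct cpu_features decode_edx(uint32_t edx)
{
    struct cpu_features f = {
        .fpu  = (edx & CPUID_EDX_FPU)  != 0,
        .apic = (edx & CPUID_EDX_APIC) != 0,
        .mmx  = (edx & CPUID_EDX_MMX)  != 0,
        .sse  = (edx & CPUID_EDX_SSE)  != 0,
    };
    return f;
}

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
/* Query the running CPU; returns false if leaf 1 is unsupported. */
static bool detect_features(struct cpu_features *out)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    *out = decode_edx(edx);
    return true;
}
#endif

/* Policy from the post: refuse to boot without an FPU. */
static bool can_boot(struct cpu_features f) { return f.fpu; }
```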

The only things that can't be auto-detected during boot are:

Architecture (e.g. 80x86 vs. PowerPC vs. ARM vs. MIPS). In almost all cases you have a toolchain that creates a different binary/executable for each (supported) architecture.
Real time clock details. This is typically just whether the RTC keeps track of UTC or local time (and which time zone); and honestly it's probably better if an OS says "RTC must be in UTC or else" and doesn't support "RTC set to local time" at all.
Legacy ISA devices. These are mostly obsolete now (excluding things like the PICs, PS/2 controller, DMA chips, etc which can either be auto-detected or assumed to exist). If the OS supports ancient hardware, information about legacy devices is stored somewhere.
Keyboard layout. This is one of the things that annoy me. Even for USB keyboards (where the specification includes fields that are intended for this purpose) the keyboard manufacturers ignore it. To be honest, the keyboard layout probably belongs in "end user configuration" below anyway.

There is also a pile of end-user configuration - network configuration, file system information ("/etc/fstab"), user accounts and passwords, locale/internationalisation information, etc.

XML could be used for all of this, but it's all stuff the end-user (and not the OS developer) needs to set up and configure. Because the end-user needs to do it, a decent/modern OS should provide a nice, clean, simple interface for end users (e.g. a GUI with dialog boxes, etc.), regardless of how the OS stores this information.

So, now we've got 4 completely different/separate subjects:

XML as documentation
XML as part of a toolchain's configuration; where the XML is used to specify details for the target architecture (similar to "Machine Descriptions" in GCC)
XML as a source code language
XML as a way to store end-user configuration

It'd probably be best if you pick one of these so the discussion doesn't get more confusing, or at least create a separate topic on the forums for each different/separate subject.

Cheers,

Brendan

h0bby1 · Post by **h0bby1** » Fri Aug 30, 2013 11:10 pm

I know it's an old thread, but I find the concept interesting, as I like what XML and XSL can achieve.

But if you write your assembly code with enough modularity, you can create the layout just by making calls or not. Either way, you still need to write assembly code as 'templates'; XML can only say which of them is supposed to be executed for a particular configuration.

You could do the same with macros, or even with regular calls; XML/XSL just adds its own syntax to the definitions you need to write. It may allow greater flexibility than assembler macros, but it still offers much less functionality than a regular language, and I don't see how it could manage the interaction between the different parts of the generated code.

XML can define pretty much any kind of structure, and I guess in principle you could define the whole PC architecture in full depth in an XML file, including devices, register states, and every state any device can have. But it doesn't really let you define how that information is supposed to be dealt with. In the end you still need to write asm code for each 'node', and then design the XML description so that it produces the asm code you want, which just adds a layer of complexity and a second definition of the code to be generated.

Sometimes I don't get where people want to go with 'not typing the code' when they end up creating systems that are rather complex, and often harder to use than just typing the code. Even in pure assembly you can design functions, modules, and conditional execution of them, and you can reach a good design with code that is human readable and conforms to a global design, without external definitions to generate the code or metaprogramming in general.

If you are so unsure of the assembler code that you already expect to have to change it, and need a flexible way to generate it, maybe there is a problem somewhere. At some point you still need to know the exact code that will work correctly on the particular hardware. Needing 1000 XML files to handle all of the PC hardware is probably not wise compared to writing good code that conditionally executes what it needs depending on what is present, and that works reliably enough that you don't have to change it much.
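A minimal sketch of "conditionally execute what there is to execute", assuming the hardware check has already been done once at boot: an implementation is chosen a single time and everything else calls through a function pointer. Both copy routines are plain C stand-ins (a real kernel might use an MMX/SSE version for the fast one), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Portable byte-by-byte version, always available. */
static void copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Stand-in for a SIMD-accelerated version. */
static void copy_fast(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Selected once at boot, then called everywhere through the pointer. */
static void (*kcopy)(void *, const void *, size_t) = copy_generic;

static void select_copy(bool has_simd)
{
    kcopy = has_simd ? copy_fast : copy_generic;
}
```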

Even though it's very easy in assembler to write poorly designed code that lacks global organisation and modularity, it's still possible to write well-organised code, either with macros or with well-designed functions whose API lets the highest-level parts be easily modified and understood. People generally don't, either because they use assembler for optimisation and explicitly sacrifice clarity and maintainability for speed, or because what they do in assembler is simple and clearly defined enough that they shouldn't need to change the code much. Clearly, if you need to write complex code purely in assembler, a system like this could be useful: having a definition of what the code is supposed to do, and generating the assembly from that description, is worth looking into, and it can make sure the implementation really conforms to the design.

In a way, you could just use this XML in the build process to generate a header with preprocessor definitions, or convert it to a binary tree (a system of chunks with child nodes, rather than text) that the assembler code parses to make the calls that are required. I know it's good to separate design and implementation, but sometimes implementation is not a simple linear process from the design, or else the design has to be very complex and use languages and concepts that become more bothersome than just writing and reading the code directly.
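A sketch of the binary "chunks with child nodes" idea, written in C rather than assembler for readability. The layout is a hypothetical one loosely modelled on IFF/RIFF: a 4-byte tag, a 4-byte little-endian payload length, then the payload, which may itself be a sequence of child chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Find the first chunk with the given tag among the chunks in buf.
 * Returns a pointer to its payload and stores its length, or NULL. */
static const uint8_t *find_chunk(const uint8_t *buf, size_t len,
                                 const char tag[4], uint32_t *out_len)
{
    size_t off = 0;
    while (len - off >= 8) {
        uint32_t plen = read_le32(buf + off + 4);
        if (plen > len - off - 8)
            return NULL; /* truncated or corrupt chunk */
        if (memcmp(buf + off, tag, 4) == 0) {
            *out_len = plen;
            return buf + off + 8;
        }
        off += 8 + (size_t)plen; /* skip to the next sibling */
    }
    return NULL;
}
```

Children are found by calling `find_chunk` again on a parent's payload, so a "CPU " chunk could contain an "FPU " chunk.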

If you really need to work on higher-level concepts because you are not comfortable enough writing and maintaining assembler code, maybe you should improve your assembler skills rather than avoid the issue with a higher-level definition of what the asm is supposed to do. Or use another language for the more complex, not directly hardware-related stuff, and make the assembler calls from that higher-level language.

I doubt a non-developer could really use the system to build their own kernel anyway, and it could be just as simple, or simpler, to type the code, or to write everything as macros in the main file and include a specific file that defines the macros depending on the configuration.

If XSLT could handle binary data, so that it could generate the opcodes and machine code and create a binary image file directly from the XML with an XSL transformation, that could be fun =)

No, but it's a good idea; I would never have thought of using XSLT to generate executable code, or as a self-contained global program description. But I don't see the real benefit: you still need to know what the assembler will do when you write the XML, and much of it still depends strongly on the hardware and on what the assembler will do, so there is no major benefit to having an XML definition of it except clarity. You could get the same result by writing good documentation and implementing it correctly in assembler, and you'll probably still have to deal with the assembly code directly at some point, if only for debugging and to implement higher-level functions.

It's mostly usable for static system configuration, and it would require editing the XML each time a new configuration is to be handled, which could lead to serious maintenance problems as well. Unless you can do a higher-level analysis of the whole XML, taking into account the assembly generated from it by a particular version of the XML/XSLT, you could still need to change assembler code to handle a particular node on a particular configuration, and then test whether it works in every XML file that uses it.

Unless you have a strong policy that the code for each node is totally independent of the others; but then I'm not sure how it could handle dependencies cleanly. Otherwise the XSLT would have to be told manually that the MMX version of a module should be used if MMX is enabled, and it would have to manage lots of low-level things, such as the interrupt controller and many other features, that could only be enabled through an XML definition for each module.

The whole chain of dependencies could also become very complex to handle in a meaningful way, so that the XSLT knows automatically which version of a module to load based on the lower-level configuration. It would need a lot of implicit definitions of what each module depends on. There are already many good tools, such as autoconf/make or the Visual Studio build environment, that handle this much better, with real-time parsing of the whole code being generated, dependencies and everything. At least a build tool has a real understanding of the language being used and what it's supposed to do, whereas the XSLT processor has no way to check whether the generated asm is actually valid, and can't do any meaningful analysis of the generated code.

So it would still rely heavily on implicit definitions that won't be easy to document clearly, or else you would also need to document all the different parts of the assembler code and how they are supposed to interact, to base the design of the XSLT on them.

Haha, this stuff still got me thinking, because I still think it could be useful for something; maybe not for generating a kernel directly in assembly language like that, but something very smart could probably be done with a system like this.

Antti · Post by **Antti** » Sat Aug 31, 2013 1:19 am

@h0bby1: I want to give you some feedback:

Your posts are written in a style that is hard to read. I have to admit that I stopped reading your posts because it is so exhausting to do. I think it would be very beneficial for you to consider changing your writing style. Of course, everyone has their own style but this one clearly excludes some readers. If you do it deliberately, it works well. If not, please be aware of this side effect.

h0bby1 · Post by **h0bby1** » Sat Aug 31, 2013 1:33 am

Antti wrote:@h0bby1: I want to give you some feedback:

Your posts are written in a style that is hard to read. I have to admit that I stopped reading your posts because it is so exhausting to do. I think it would be very beneficial for you to consider changing your writing style. Of course, everyone has their own style but this one clearly excludes some readers. If you do it deliberately, it works well. If not, please be aware of this side effect.

I already try to make them as readable as possible, but I cover many different points. I know some parts can be hard to read or really understand, but I don't know how to formulate them in a better way. Anyway, I don't think this interests many people; if people are really interested in the topic, it shouldn't be so hard to read either, but it's also easy to understand that many people have better things to do than read about generating an assembler kernel from XML =) It's not a very simple topic in itself either. I'm aware of this side effect: the post is rather long, and I guess most people can live without reading it. But I do try to make my posts easy to read; I reread and re-edit them several times to make them as clear as I can, but making them totally clear would need even longer posts.

OSDev.org
