OSDev.org

Posted: **Tue May 13, 2014 9:57 am**

The main advantage to using XML is that it is one of the few ways that you can actually "store" hierarchical data. Machine language is flat, but most modern programming languages are hierarchical, and programs are, arguably, data.

The fact that most source code is stored as text files is due to the fact that they are one of the easiest formats to view and edit. But, the first thing that your text editor is going to do (whether it's Notepad.exe, or Visual Studio) when you open that text file is convert that text file into structured data. The structured data is what the program uses to provide the functionality to edit the file, and it is only converted back into a text file when it is saved back to disk.

Storing your program in XML skips that whole step, and allows you to view and edit the data structure of the program directly. (In reality, XML files are still text files, but you get the picture.) So using XML as a programming language bypasses the need for a custom language parser and expression builder, and basically borrows the XML parser, and allows the user to build their own expressions, or allows them to use XSLT to build the expressions for them.

XSLT excels at transforming one XML structure into another -- basically a glorified "Find/Replace" function. If you consider that a compiler is also a glorified "Find/Replace" function, finding references to functions and variables and replacing them with machine instructions, you can see how XSLT can act as a fairly effective XML-based compiler. It has the advantage of giving the user a lot more control over the output than C++ or C# would. Most languages give the user a few keywords to control the output of the compiled application (#ifdef, etc.). But XSLT lets you write complex logic in your "compiler" that makes drastic changes to the output. If you've ever used T4 templates in Visual Studio, you understand how useful it can be to generate code "from" data.

As you have pointed out, writing the XML to define your expressions and your workflow is tedious and highly repetitive. But then again, so is writing a program (or OS) in Assembly. No one in their right mind would use Assembly for their day-to-day development, but it is still used by many people, and by many compilers as a low-level "intermediate" language.

So, in the case of XML, you wouldn't necessarily want to use it to write an application, but you may want to use it to store the source code for an application. You might want to write your application in another language (say, C#), and convert it to XML as it is saved to disk, or before it is compiled.
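To make the "convert it to XML as it is saved to disk" idea concrete, here is a minimal sketch. It uses Python rather than C#, only because Python's standard `ast` module exposes its syntax tree directly. The element and attribute layout (one element per node type, scalar fields as attributes) and the function names `node_to_xml` and `source_to_xml` are assumptions made for this illustration, not a format anyone in this thread has defined.

```python
import ast
import xml.etree.ElementTree as ET


def node_to_xml(node):
    """Recursively convert a Python ast node into an XML element."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node goes into a wrapper named after the field.
            ET.SubElement(elem, field).append(node_to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(node_to_xml(item))
                else:
                    ET.SubElement(wrapper, "item").text = repr(item)
        elif value is not None:
            # Scalars (names, constants) become attributes.
            elem.set(field, repr(value))
    return elem


def source_to_xml(source):
    """Parse source text and return its syntax tree serialized as XML."""
    return ET.tostring(node_to_xml(ast.parse(source)), encoding="unicode")


if __name__ == "__main__":
    print(source_to_xml("x = 10"))
```

The source text is still what the programmer edits; the XML is only the on-disk form of the tree.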

Posted: **Tue May 13, 2014 12:07 pm**

Sounds like you're merely trying to store a tree on disk, that is also human-readable?

XML is still very verbose and expensive to parse.

Code: Select all

<try>
  <call method="Open" class="System.File"/>
  <catch exception="System.FileNotFoundException">
    <call method="Show" class="System.Dialog">
      <parameter value="The file could not be found..."/>
    </call>
  </catch>
</try>

You could compress your XML considerably. Replace closing tags with '.' to designate the end of a code block.

If parameters are required, no point typing them out, for example:

Code: Select all

<call method="<method>" class="<class>">
      <parameter value="<param1>"/>
      <parameter value="<param2>"/>
    </call>

can be replaced with

Code: Select all

call <class> <method> <param1> <param2> .

So in total your code could look like:

Code: Select all

try
  call "System.File" "Open" .
.
catch "System.FileNotFoundException"
  call "System.Dialog" "Show"
  "The file could not be found...".
.

But now it's starting to look like a programming language.

Most parsers build an abstract syntax tree as the first thing they do, and abstract syntax trees are very easy to pretty print back to source.
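As a quick illustration of that round trip, Python's standard library can already parse source into a syntax tree and print the tree back as source (`ast.unparse` needs Python 3.9 or later). The sample source string is made up for this sketch.

```python
import ast

# Deliberately untidy source text.
source = "if x==10:\n        y = f( x )\n"

tree = ast.parse(source)          # text -> abstract syntax tree
regenerated = ast.unparse(tree)   # abstract syntax tree -> text

if __name__ == "__main__":
    print(regenerated)
```

The regenerated text is normalized, but it parses back to the same tree, which is all that needs to survive a save.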

Look at most languages - Lisp, Forth, C++ - what you see is essentially a flattened out tree.

So let's say you implement your language in an XML format because you believe it would be easier to parse. So you'd write a recursive descent parser like:

Code: Select all

parse_statement_block(xmlnode) {
  stmtblock = new StatementBlock();
  foreach(child in xmlnode.children)
    stmtblock.push(parse_statement(child));
  return stmtblock;
}

parse_statement(xmlnode) {
  switch(xmlnode.getTagName()) {
    case "try": return parse_try_block(xmlnode);
    case "return": return parse_return(xmlnode);
    case "if": return parse_if(xmlnode);
    // other statements here
    default: error ("Unknown tag");
  }
}

parse_if(xmlnode) {
   ifstmt = new If();
   ifstmt.condition = null;
   ifstmt.then = null;
   ifstmt.else = null;

   foreach(xmlchild in xmlnode.children) {
      switch(xmlchild.getTagName()) {
        case "condition":
          if(ifstmt.condition != null) error("<if> has multiple <condition> clauses.");
          ifstmt.condition = parse_expression(xmlchild);
          break; // without this, control falls through into "then"
        case "then":
          if(ifstmt.then != null) error("<if> has multiple <then> clauses.");
          ifstmt.then = parse_statement_block(xmlchild);
          break;
        case "else":
          if(ifstmt.else != null) error("<if> has multiple <else> clauses.");
          ifstmt.else = parse_statement_block(xmlchild);
          break;
        default:
          error("Unknown tag inside <if>");
      }
   }

   if(ifstmt.condition == null) error("<if> requires a <condition> clause.");
   if(ifstmt.then == null) error ("<if> requires a <then> clause.");
   return ifstmt;
};

A recursive descent parser for curly-brace language:

Code: Select all

parse_statement_block(lexer) {
  if(lexer.peek() == "{") {
    // must be a block
    stmtblock = new StatementBlock();
    lexer.mustbe("{");
    while(lexer.peek() != "}")
      stmtblock.push(parse_statement(lexer));
    lexer.mustbe("}");
    return stmtblock;
  }
  else // just a single statement
    return parse_statement(lexer);
}

parse_statement(lexer) {
  switch(lexer.token()) {
    case "try": return parse_try_block(lexer);
    case "return": return parse_return(lexer);
    case "if": return parse_if(lexer);
    // other statements here
    default: return parse_expression(lexer);
  }
}

parse_if(lexer) {
  ifstmt = new If();

  lexer.mustbe("(");
  ifstmt.condition = parse_expression(lexer);
  lexer.mustbe(")");
   
  ifstmt.true = parse_statement_block(lexer);
  if(lexer.peek() == "else") {
    lexer.mustbe("else");
    ifstmt.false = parse_statement_block(lexer);
  } else
    ifstmt.false = null;

  return ifstmt;
};

As you can see, once you try to rebuild it in memory to either execute it or compile it, it's just as much effort parsing XML as it is parsing a standard scripting language.
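For anyone who wants to run the XML side of that comparison, here is a sketch of `parse_if` in Python on top of the standard `xml.etree.ElementTree` parser. The `If` dataclass, the `ParseError` exception and the field name `otherwise` are stand-ins invented for this example, and the children are stored unparsed where a real parser would recurse into them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


class ParseError(Exception):
    """Raised when the XML does not describe a valid construct."""


@dataclass
class If:
    condition: Optional[ET.Element] = None
    then: Optional[ET.Element] = None
    otherwise: Optional[ET.Element] = None  # "else" is a Python keyword


# Maps each allowed child tag to the If field it fills.
SLOTS = {"condition": "condition", "then": "then", "else": "otherwise"}


def parse_if(xmlnode):
    ifstmt = If()
    for child in xmlnode:
        slot = SLOTS.get(child.tag)
        if slot is None:
            raise ParseError(f"<if> has unexpected <{child.tag}> child.")
        if getattr(ifstmt, slot) is not None:
            raise ParseError(f"<if> has multiple <{child.tag}> clauses.")
        setattr(ifstmt, slot, child)  # a full parser would recurse here
    if ifstmt.condition is None:
        raise ParseError("<if> requires a <condition> clause.")
    if ifstmt.then is None:
        raise ParseError("<if> requires a <then> clause.")
    return ifstmt


if __name__ == "__main__":
    node = ET.fromstring("<if><condition/><then/><else/></if>")
    print(parse_if(node))
```

Even with the XML parser borrowed, all the validation of what may appear inside `<if>` still has to be written by hand.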

If you're writing the XML parser from scratch (to incorporate into your OS) then it'll be a lot more work to write an XML parser to extract properties and children (as XML is its own language in itself) than to write a simple lexer, and parse what your lexer gives you.

Writing a parser that builds a syntax tree from a simple language would only take a few days' work. (I just recently did it myself.)

If you want a language that easily translates into a tree (so that tools can visualize it without having to implement parsing) then maybe a Lisp-style syntax would be better. A simple list parser just has to implement the following grammar to build a tree from Lisp:

Code: Select all

child: statement | literal
statement: "(" literal { child } ")"

You could whip up a parser to embed into a tool within an hour.
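As a rough sketch of how small that parser is, here is the two-rule grammar above in Python. The tokenizer (parentheses, double-quoted strings, bare words) and the choice to represent a statement as a nested list are assumptions of this example. Error handling is minimal: an unterminated string is not detected.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a run of other characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(text):
    return TOKEN.findall(text)


def parse_child(tokens, pos):
    """child: statement | literal"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        return parse_statement(tokens, pos)
    if tokens[pos] == ")":
        raise SyntaxError("unexpected ')'")
    return tokens[pos], pos + 1


def parse_statement(tokens, pos):
    """statement: "(" literal { child } ")" """
    pos += 1  # skip "("
    if pos >= len(tokens) or tokens[pos] in ("(", ")"):
        raise SyntaxError("a statement must start with a literal")
    node = [tokens[pos]]
    pos += 1
    while pos < len(tokens) and tokens[pos] != ")":
        child, pos = parse_child(tokens, pos)
        node.append(child)
    if pos >= len(tokens):
        raise SyntaxError("missing ')'")
    return node, pos + 1  # skip ")"


def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_child(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


if __name__ == "__main__":
    print(parse('(try (call "System.File" "Open"))'))
```

The result is already the tree: each statement is a list whose first item is the keyword and whose remaining items are its children.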

A JSON-like syntax is another alternative: it parses easily into a tree, and many tools are available for it.

But if you really love XML for some reason - then go for it.

Posted: **Tue May 13, 2014 2:30 pm**

MessiahAndrw wrote:So in total your code could look like:
Code: Select all
try
  call "System.File" "Open" .
.
catch "System.FileNotFoundException"
  call "System.Dialog" "Show"
  "The file could not be found...".
.
But now it's starting to look like a programming language.

I'm fine with that.

Posted: **Tue May 13, 2014 3:12 pm**

MessiahAndrw wrote:As you can see, once you try to rebuild it in memory to either execute it or compile it, it's just as much effort parsing XML as it is parsing a standard scripting language.

If you're writing the XML parser from scratch (to incorporate into your OS) then it'll be a lot more work to write an XML parser to extract properties and children (as XML is its own language in itself) than to write a simple lexer, and parse what your lexer gives you.

At the moment, I'm only using XML at compile time. At run time, everything is stored in structs that the OS understands. (I have been thinking about possibly storing everything on disk in XML, instead of binary structs, as an exercise, or I may even allow the user to optionally decide at run time whether they want their data stored as XML or not, but that is way far in the future...)

MessiahAndrw wrote:Writing a parser that builds a syntax tree from a simple language would only take a few days' work.

True, but the question is whether the simple language is what gets stored on disk, or if the syntax tree gets stored on disk, possibly in XML format. Or, you could give the programmer the option of using the simple language, or editing the XML syntax tree, directly.

Also, would it be possible to use XML *as* the simple language?

Possibly, but only if you could automatically generate the XSD and XSLT files needed to convert from the "simple" format to the syntax tree format -- which should be possible if the data you need is available in your XML file. (i.e. defined in your XSD)

MessiahAndrw wrote:But if you really love XML for some reason - then go for it.

The advantage of using XML is that you get XSD and XSLT for free (at least at compile time). Also, there are a dozen editors that excel at editing and transforming XML files.

You could make up your own language, or use JSON, or Lisp, but you would have to write your own editor, assuming you wanted Intellisense, and tool tip documentation, and autocomplete, and, in the case of Visual Studio, step-by-step debugging of your "compiler" as it transforms your source files. With XML, you get all of this for free.

Posted: **Tue May 13, 2014 7:14 pm**

SpyderTL wrote:At the moment, I'm only using XML at compile time. At run time, everything is stored in structs that the OS understands. (I have been thinking about possibly storing everything on disk in XML, instead of binary structs, as an exercise, or I may even allow the user to optionally decide at run time whether they want their data stored as XML or not, but that is way far in the future...)

Why do you want to parse a language at compile time into a tree then save it back? Why don't you parse it into a tree when you need it?

If you need to save it back out of the compiler - for whatever reason - why can't you pretty print the syntax tree back into source code?

SpyderTL wrote:Also, would it be possible to use XML *as* the simple language?

The problem with doing so is that XML lacks things like functions and control blocks - features your language probably has. So you need to describe your language constructs using XML tags and properties.

Parsing your language would require first parsing XML. Then the XML tree would act as input (essentially being the lexer - as I showed in my code example above) that you would then parse to extract your code constructs. It's about the same amount of work as having a textual lexer feed you symbols rather than XML nodes.

SpyderTL wrote:The advantage of using XML is that you get XSD and XSLT for free (at least at compile time). Also, there are a dozen editors that excel at editing and transforming XML files.

You could make up your own language, or use JSON, or Lisp, but you would have to write your own editor, assuming you wanted Intellisense, and tool tip documentation, and autocomplete, and, in the case of Visual Studio, step-by-step debugging of your "compiler" as it transforms your source files. With XML, you get all of this for free.

If you have an XSD file that is incredibly well documented, you might type

Code: Select all

<call>

and your XML editor could pop up:

Code: Select all

<call function="your function name" class="the class the function belongs to">
   <parameter>first parameter</parameter>
   <parameter>second parameter</parameter>
</call>

It could even show a GUI that allows you to edit it graphically.

However, I would rarely need to keep referring to the language's documentation, whereas I often need to refer to the library's documentation.

For example, as I type:

Code: Select all

<call function="" class="System.IO">

Would an XML editor pop up the acceptable classes I can select, or the functions inside that class?

If I declare a variable:

Code: Select all

<declare_variable>
<name>X</name>
<type>float</type>
<initial_value>
  <float_literal>10.0</float_literal>
</initial_value>
</declare_variable>

Then I want to use it in an if statement:

Code: Select all

<if>
  <condition>
    <equal>
       <float_literal>10.0</float_literal>
       <variable>..

Would an XML editor show me a list of variables? Or even know that we have a variable called 'X'?

So while your XML tools may have auto completion of the language's constructs, it would not auto complete variable, class, and other identifiers without parsing the XML, which would be just as much effort as parsing the original source.
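To show what that extra parsing step looks like, here is a minimal Python sketch that walks the XML with the standard `xml.etree.ElementTree` module and collects declared variables so an editor could offer them. The element names follow the examples above. The `<program>` root element and the helper names are invented for this illustration, and scoping is ignored entirely.

```python
import xml.etree.ElementTree as ET

PROGRAM = """
<program>
  <declare_variable>
    <name>X</name>
    <type>float</type>
    <initial_value>
      <float_literal>10.0</float_literal>
    </initial_value>
  </declare_variable>
</program>
"""


def declared_variables(root):
    """Return {name: type} for every <declare_variable> in the tree."""
    variables = {}
    for decl in root.iter("declare_variable"):
        name = (decl.findtext("name") or "").strip()
        if name:
            variables[name] = (decl.findtext("type") or "").strip()
    return variables


def complete_variable(root, prefix):
    """Return the declared variable names that start with prefix."""
    return sorted(n for n in declared_variables(root) if n.startswith(prefix))


if __name__ == "__main__":
    root = ET.fromstring(PROGRAM)
    print(declared_variables(root))
    print(complete_variable(root, "X"))
```

A generic XML editor knows nothing about `declare_variable`, so this language-specific walk has to come from somewhere, just as it would for a textual language.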

Posted: **Tue May 13, 2014 10:20 pm**

If you were using XML, you would need to generate XSD and XSLT files for all of your functions. Then you would have autocomplete. And you would probably want this to happen automatically...

As for variable names, you can get autocomplete for those as well, but it is fairly complicated, and I gave up on it before I could get it working correctly.

But it's still better than notepad...

Posted: **Wed May 14, 2014 2:50 am**

MessiahAndrw wrote:Lisp

The nail has been hit well and truly on the head. Anybody using XML over S-expressions needs to have a good argument. The algorithm for reading Common Lisp is very simple; see CLTL2(*) section 22.1.1 for the ten-step algorithm. Step number four is what makes a language like Lisp magic - being able to extend the syntax at run-time with computational macros is one of the things that is so obvious yet so mind-blowing when you first encounter it.

*) Common Lisp the Language, Second Edition, Guy L. Steele Jr. You can view it online here: http://www.cs.cmu.edu/Groups/AI/html/cl ... 0000000000

Posted: **Wed May 14, 2014 4:13 am**

SpyderTL wrote:The main advantage to using XML is that it is one of the few ways that you can actually "store" hierarchical data.

The XML way is supported by many tools and well known to many people - this is the advantage. But other formats store data more efficiently.

SpyderTL wrote:Machine language is flat, but most modern programming languages are hierarchical, and programs are, arguably, data.

Programs are, arguably, ideas. A program is a way of saving your ideas in a form that a machine understands. If we look at programs this way, we should talk about how simply we can express our ideas, and from that point of view XML is not the best language. But it still has strong tooling support and deep "market penetration". It also has supporting technologies like XSD and XSLT.

SpyderTL wrote:But, the first thing that your text editor is going to do (whether it's Notepad.exe, or Visual Studio) when you open that text file is convert that text file into structured data...

Storing your program in XML skips that whole step, and allows you to view and edit the data structure of the program directly.

It lets you use existing XML tooling, but it doesn't let you skip that whole step.

SpyderTL wrote:So using XML as a programming language bypasses the need for a custom language parser and expression builder, and basically borrows the XML parser, and allows the user to build their own expressions, or allows them to use XSLT to build the expressions for them.

Maybe such tools exist. But if not, the idea that an XML-based solution takes less (or even comparable) time to develop will fall apart quickly.

SpyderTL wrote:XSLT excels at transforming one XML structure into another -- basically a glorified "Find/Replace" function. If you consider the compiler is also a glorified "Find/Replace" function, finding references to functions and variables, and replacing them with machine instructions, you can see where XSLT can act as a fairly effective XML based compiler.

Once a compiler matures even a little, it requires a lot of optimization-related code, which is very far from "Find/Replace". But the pattern-matching capability of XSLT can be exploited if the intermediate representation of the compilation data is in XML form. It is not very efficient, but it may offer better ways to express ideas. At the least, it is interesting to compare this approach with what other languages do.

SpyderTL wrote:It has the advantage of giving the user a lot more control over the output than C++ or C# would. Most languages give the user a few keywords to control the output of the compiled application (#ifdef, etc.).

If you use your personal language (even expressed in XML) - of course you can invent new keywords. But it is not an XML advantage, it can be done with any new language.

SpyderTL wrote:So, in the case of XML, you wouldn't necessarily want to use it to write an application, but you may want to use it to store the source code for an application.

Not only store the application, but use the XML abstraction in all intermediary steps. It can give you the power of XSLT pattern matching.

In general the XML way can be described as having rich tooling support and a very wide user base. One more advantage: the set of technologies around XML is large enough to deliver some enhancements in how a person expresses and implements ideas. We can rephrase it as "unleash the power of XML".

But how to unleash it is a very important question.

Posted: **Fri Jun 06, 2014 5:40 pm**

I was playing around with VirtualBox's new screen capture feature, and I posted the result to YouTube. This should give you an idea of what I was shooting for. This is the 32-bit console version.

FYI, hitting TAB on a blank line will list all of the static classes in memory. Hitting TAB will also autocomplete class names and method names. Hitting the up arrow on a blank line will recall the last command.

Since there is currently no support for passing parameters to methods, you have to navigate through collections with an enumerator class. (i.e. .First.Next.Next)

Posted: **Sun Jun 08, 2014 2:56 am**

SpyderTL wrote:I was playing around with VirtualBox's new screen capture feature, and I posted the result to YouTube. This should give you an idea of what I was shooting for. This is the 32-bit console version.

Actually, it's not an OS issue; it's just a bit of UI (and a very limited bit).

However, if you managed to make it with XML tools only - it's a bold and visible step and we can see your system is evolving.

Posted: **Mon Jun 09, 2014 4:04 am**

Yeah, I sort of switched back to the original topic. This video shows the console shell and the ability to call static methods and chain-call instance methods on return values. It also shows the kind of low level hardware "driver" stuff that I've been working on.

This really should have been two separate topics: object oriented OS and XML as a programming language. At some point, I'll share the XML stuff with you guys.

Posted: **Mon Jun 09, 2014 4:34 am**

SpyderTL wrote:This really should have been two separate topics: object oriented OS and XML as a programming language.

Does "object oriented OS" refer to your OS or to a general OS design approach? It is a very important difference.

Posted: **Mon Jun 09, 2014 3:07 pm**

embryo wrote:Does "object oriented OS" refer to your OS or to a general OS design approach? It is a very important difference.

For now, just the OS itself. Right now, the OS is made up of a boot loader (written in XML), a few interrupt handlers (written in XML), the console shell application (written in XML), and a bunch of classes (written in XML).

The boot loader, interrupt handlers and shell application are plain "flat" XML files that just contain commands and data, like Assembly (or a .com file).

The classes contain structured XML that includes methods that contain flat XML commands. Here is an example of the GetDate method on the System class:

Code: Select all

<cls:class name="System" static="true">
  <cls:method name="GetDate" type="Date" static="true">
    <!--Get Month-->
    <clk:GetMonth/>  <!--Returns the month value from the Real Time Clock.  Output: AL=Month-->

    <cpu:CopyRegisterToOperand8/>
    <op:AL-DHRegister/>

    <!--Get Day-->
    <clk:GetDayOfMonth/>  <!--Returns the day of the month value from the Real Time Clock.  Output: AL=DayOfMonth-->

    <cpu:CopyRegisterToOperand8/>
    <op:AL-DLRegister/>

    <!--Get Year-->
    <clk:GetYear/>  <!--Returns the year value from the Real Time Clock.  Output: AL=Year-->

    <cpu:CopyRegisterToOperand8/>
    <op:AL-CLRegister/>

    <!--Get Century-->
    <clk:GetCentury/>  <!--Returns the century value from the Real Time Clock.  Output: AL=Century-->

    <cpu:CopyRegisterToOperand8/>
    <op:AL-CHRegister/>

    <date:CreateObject/>  <!--Creates a new Date object.  Input: CX=Year DH=Month DL=Day  Output: DI=Object-->

    <cpu:ReturnToNearCaller/>
  </cls:method>

The clk: methods get replaced with code that reads the clock values from CMOS (and puts the results in AL). All of these command XML elements have documentation when you hover over them, but I copied the documentation and pasted it to the right, just to give you an idea of what it looks like.
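To give a feel for that replacement step, here is a minimal Python sketch of a table-driven expander standing in for the XSLT. Everything in it is an assumption made for illustration: the namespace URIs, the `<method>` wrapper, the `EXPANSIONS` table and the exact byte sequence are not taken from the actual project. The bytes shown are the standard x86 encodings for selecting CMOS register 8 (month) through port 0x70 and reading it from port 0x71.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the real declarations are not shown in the post.
NS = {"clk": "urn:example:clock", "cpu": "urn:example:cpu"}

# Hypothetical expansion table: qualified element name -> machine code bytes.
EXPANSIONS = {
    f"{{{NS['clk']}}}GetMonth": bytes([
        0xB0, 0x08,  # mov al, 0x08  ; select RTC register 8 (month)
        0xE6, 0x70,  # out 0x70, al  ; write the CMOS index port
        0xE4, 0x71,  # in  al, 0x71  ; read the CMOS data port into AL
    ]),
    f"{{{NS['cpu']}}}ReturnToNearCaller": bytes([0xC3]),  # ret (near)
}

SOURCE = """
<method xmlns:clk="urn:example:clock" xmlns:cpu="urn:example:cpu">
  <clk:GetMonth/>
  <cpu:ReturnToNearCaller/>
</method>
"""


def assemble(xml_text):
    """Replace each child element with its bytes, in document order."""
    out = bytearray()
    for elem in ET.fromstring(xml_text):
        if elem.tag not in EXPANSIONS:
            raise ValueError(f"no expansion for {elem.tag}")
        out += EXPANSIONS[elem.tag]
    return bytes(out)


if __name__ == "__main__":
    print(assemble(SOURCE).hex(" "))
```

An XSLT version would express the same table as one template per element.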

Right now, the convention is that all class methods take the "this" object in the DI register (or at least, the address to it), and return the result object (if any) on the same DI register, and all of the other registers can be used freely. But that will probably change moving forward.

The copy register/operand stuff is a little complicated, but it matches the encoding of the instructions, byte-for-byte. You can see that, since I have autocomplete for free (thanks to the XSD files), I can make my instruction elements much more descriptive than just MOV, INC, JMP, etc.

Once the OS can manage its own objects, and those objects make up more of the OS than the "kernel" stuff, like the interrupt handlers, then I would say that the OS design would be Object Oriented.

Posted: **Mon Jun 09, 2014 3:15 pm**

Just FYI, in order to call the GetDate method above, the shell has to find the method by name using reflection (late binding...), and call an "ExecuteMethod" function that gets the method's entry point and jumps to it.
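The same lookup-by-name idea can be sketched in a few lines of Python, which has reflection built in. This is only an analogy for the mechanism described above, not the OS code: the `CLASSES` registry, `execute_method` and `run_command` are names invented for this example, and the stand-in `System.GetDate` returns today's date from the host.

```python
import datetime


class System:
    """Stand-in for the static System class."""

    @staticmethod
    def GetDate():
        return datetime.date.today()


# Hypothetical registry of the static classes currently "in memory".
CLASSES = {"System": System}


def execute_method(class_name, method_name):
    """Find a static method by name (late binding) and call it."""
    cls = CLASSES.get(class_name)
    if cls is None:
        raise LookupError(f"unknown class {class_name!r}")
    method = getattr(cls, method_name, None)
    if not callable(method):
        raise LookupError(f"{class_name} has no method {method_name!r}")
    return method()


def run_command(line):
    """Evaluate 'Class.Method.Method...' by chaining calls on return values."""
    class_name, first, *rest = line.strip().split(".")
    result = execute_method(class_name, first)
    for name in rest:
        result = getattr(result, name)()  # parameterless instance method
    return result


if __name__ == "__main__":
    print(run_command("System.GetDate"))
    print(run_command("System.GetDate.isoformat"))
```

As in the shell, no parameters are passed; each name in the chain is resolved at the moment it is called.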

This is where things start looking more like Java (and .NET). Since Java doesn't mandate that your VM contain any specific structures, I imagine that you have something very similar to this behind embryo.

Posted: **Mon Jun 09, 2014 11:49 pm**

I uploaded my project to CodePlex, and posted a bootable ISO image, if you guys want to take a look.

http://ozone.codeplex.com

Object Oriented OS
