Custom Programming Language
- mathematician
- Member
- Posts: 437
- Joined: Fri Dec 15, 2006 5:26 pm
- Location: Church Stretton Uk
Custom Programming Language
a.) 64 bit
b.) Run under Windows
c.) Produced ELF files.
If there had been a 64 bit version Open Watcom might have filled the bill. They have now got around to writing a 64 bit version of their compiler, but it is anybody's guess when it will appear.
Anyway, to cut a long story short, I have finally bitten the bullet, and decided to write my own compiler. I have also decided to come up with my own language whilst I was about it, and one very specifically intended for osdev. I have a specification for it, but before I get too far advanced with a compiler, I thought I might get some other people's comments, if they are interested to make any. The specification can be found here:
http://www.leslie-dellow.co.uk/aspl/ASP ... cation.pdf
Re: Custom Programming Language
What? You're seriously saying that nobody needs a logical not?There is no equivalent to C’s ! operator, because De Morgan’s Law seems to me to make it more or less redundant, and there is no point putting in things which seldom, or in my case never, get used. One other difference is that a typecast looks like:
The syntax looks familiar enough that I guess I would be able to write programs in it without too much difficulty. I do think that important features are missing (to name just one: no elseif in a language that requires end if?), but you'll probably see what you need when you actually write some code in it.
One small inconsistency is that you have end if/while/repeat/for, but <struct-type> end. Perhaps you should make that end <struct-type> as well. It's also funny that while you use mostly the same constructs as Pascal except that you removed the "begin", you actually add a "begin" for switch (and a redundant "end repeat", too), where Pascal doesn't have one.
- mathematician
- Member
- Posts: 437
- Joined: Fri Dec 15, 2006 5:26 pm
- Location: Church Stretton Uk
Re: Custom Programming Language
I Know I never use it.Kevin wrote:What? You're seriously saying that nobody needs a logical not?There is no equivalent to C’s ! operator, because De Morgan’s Law seems to me to make it more or less redundant, and there is no point putting in things which seldom, or in my case never, get used. One other difference is that a typecast looks like:
The majority of procedural languages seem to be either C like or Pascal like. I have pinched some ideas from Ada. I didn't see the point in running the two words else if into the single word elseif.The syntax looks familiar enough that I guess I would be able to write programs in it without too much difficulty. I do think that important features are missing (to name just one: no elseif in a language that requires end if?), but you'll probably see what you need when you actually write some code in it.
I will probably fix the inconsistency you mention. The begin after switch isn't strictly necessary; I just preferred 'begin' as a delimiter, rather than 'case', and the end repeat is there to delimit the conditional expression. C uses parentheses.One small inconsistency is that you have end if/while/repeat/for, but <struct-type> end. Perhaps you should make that end <struct-type> as well. It's also funny that while you use mostly the same constructs as Pascal except that you removed the "begin", you actually add a "begin" for switch (and a redundant "end repeat", too), where Pascal doesn't have one.
Re: Custom Programming Language
Without it, your code will look like this:mathematician wrote:The majority of procedural languages seem to be either C like or Pascal like. I have pinched some ideas from Ada. I didn't see the point in running the two words else if into the single word elseif.
Code: Select all
if a = 42 then
...
else
if b > 0 then
...
else
if c != a then
...
end if;
end if;
end if;
Isn't the semicolon good enough, like in Pascal?the end repeat is there to delimit the conditional expression. C uses parentheses.
- mathematician
- Member
- Posts: 437
- Joined: Fri Dec 15, 2006 5:26 pm
- Location: Church Stretton Uk
Re: Custom Programming Language
Kevin wrote:Without it, your code will look like this:mathematician wrote:The majority of procedural languages seem to be either C like or Pascal like. I have pinched some ideas from Ada. I didn't see the point in running the two words else if into the single word elseif.You could certainly write "else if" to hide that you're not formatting your source code like the syntax is defined, but you pile up lots of "end if"s at the end. The "end if" is the reason why Ada has an elsif in its syntax, whereas Pascal and C don't need it.Code: Select all
if a = 42 then ... else if b > 0 then ... else if c != a then ... end if; end if; end if;
if a = 42 thenthe end repeat is there to delimit the conditional expression. C uses parentheses.
...
else
if b > 0 then
...
else
if c != a then
...
end if;
end if;
end if;[/quote]
If I wrote it, it would look like:
Code: Select all
if a = 42 then
......
else if b > 0
......
else if c != a
......
endif
I am not sure how long the semi colon at the end of a line is going to survive. For the purposes of parsing the source, a newline would seem to be good enough. In fact I am not sure why I need to know when a new line occurs, except for the purposes of making sure there is a semi colon there. The tokens emerging from my lexer carry line numbers with them.Isn't the semicolon good enough, like in Pascal?
Re: Custom Programming Language
Yes. Except that you changed more than just making two words out of one, and therefore your code wouldn't compile. The last line needs to be "endif endif endif" rather than just "endif" with your syntax.mathematician wrote:If I wrote it, it would look like:
Running the two words into one would just seem to be stylistic.Code: Select all
if a = 42 then ...... else if b > 0 ...... else if c != a ...... endif
- mathematician
- Member
- Posts: 437
- Joined: Fri Dec 15, 2006 5:26 pm
- Location: Church Stretton Uk
Re: Custom Programming Language
Since I am writing the compiler, it will compile if I say so. The only question is whether or not it is unambiguous, and, so far as I can see, it is.Kevin wrote:Yes. Except that you changed more than just making two words out of one, and therefore your code wouldn't compile. The last line needs to be "endif endif endif" rather than just "endif" with your syntax.mathematician wrote:If I wrote it, it would look like:
Running the two words into one would just seem to be stylistic.Code: Select all
if a = 42 then ...... else if b > 0 ...... else if c != a ...... endif
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Custom Programming Language
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
- mathematician
- Member
- Posts: 437
- Joined: Fri Dec 15, 2006 5:26 pm
- Location: Church Stretton Uk
Re: Custom Programming Language
I'm game if you are. One possibility I am considering is to use the output of the front end as input for an interpreter, which can double as a debugger. The tokens which emerge from the scanner already have the beginnings of a byte code, because it is easier and quicker to compare numbers than it is to compare strings. Then the compiler could be written in its own language, with the result that, by the time it is finished, it would already have been used to compile one non trivial program.Schol-R-LEA wrote:If someone doesn't either get this discussion back on topic, or lock the thread, I gonna start discussing the advantages and disadvantages of hygienic macro systems as applied to assembly-level kernel development. At length. And no one wants that.
Re: Custom Programming Language
This is just random notes about the ASPL language specification.
If a character is always 4 bytes, then a character is UTF-32. If a string is always UTF-8, then doing something like "myString[2] = '9';" becomes a massive disaster. In my opinion it's better to only say "string literals are UTF-8" and "character constants are UTF-32" (and possibly also support UTF-32 string literals); and then don't provide any string or character data type at all and let the programmer build their own (using "typedef", arrays, whatever).
If the only other data types are byte, word, dword and qword; then how are people supposed to do floating point? Also don't forget that pointers and arrays are normally data types too; and that most languages support "user defined data types" (structures, unions, etc).
Also note that word, dword and qword are really bad names (they only make sense on 80x86 and any programmer familiar with other CPUs may mistake "word" for 32-bit or anything else. I'd recommend using u8, u16, u32, u64 instead. I'd also recommend getting rid of the "signed" type qualifier and having s8, s16, s32, s64 too. Finally, for bitfields, why not have "u3" for a 3-bit bitfield?
If assigning 9 to a 3-bit field is guaranteed to cause a compile time error, then how about "const byte foo := 9;" followed by "page.avail := foo;" or maybe "const byte foo := 3;" followed by "page.avail := foo*3;"?
If you do this for bitfields, do you also do it for other data types (e.g. "const word hexnum := 0x12345678;")? What about "word dayOfWeek range 1 to 7 := 9;"?
For integer ranges, why does the compiler need to be told the size? For example, instead of "word dayOfWeek range 1 to 7" why can't you just do "range 1 to 7 dayOfWeek" and let the compiler choose an underlying data type capable of storing values from 1 to 7?
For structures, can I do this:
Code: Select all
struct myStructure align 9
myStructure * next;
word something;
myStructure end;
For bitsets (continued from the earlier "u3 for a 3-bit bitfield" suggestion), why can't I just do:
Code: Select all
struct pgLevel1_t
u1 present;
u1 readWrite;
...
u40 pageAddress;
u11 avail;
u1 noExecute;
pgLevel1_t end;
How would De Morgan re-arrange something like 'if( !myBool1 or myBool2 )" to avoid the not? Is something like 'if( (myBool1 ^ true) or myBool2 )" easier to read? Given that your boolean operators are already words (and not symbols like "||"), is there any sane reason to refuse to support 'if( not myBool1 or myBool2 )"?
Where did all the relational operators go (==, <, <=, >, >=, !=)? Did you forget to put them in the "operator precedence" table, or is it impossible to do things like "if( myValue > 4)"?
If you only forgot the relational operators; what would "myBool := left < middle > right;" mean? Would it be the like "myBool := (left < middle) > right;" or would it be like "myBool := left (<middle>right);" where "<middle>" is a typecast, or would it depend on whether "middle" is a type or a variable?
For the "machine code" (assembly language and not machine code?); what does "movzx" look like? For example, would you do "movzxd eax,[esi]" and not know if it's a byte being extended or a word being extended? Does this apply to all instructions and not just "mov"?
What happens if I do "movd al,bl"? Does the assembler accept this (and if it does, which instruction would I actually get?), or does it give me an "incompatible operands" error. If it gives me an "incompatible operands" error then the assembler must've known the sizes of the operands; and if the assembler knows the sizes of the operands why does it need to be told the size to begin with?
Cheers,
Brendan
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Custom Programming Language
A macro definition (or simply a macro) is a piece of code within
a program which describes a set of transformations to be performed on another
set of code prior to that code being translated (either compiled or
interpreted).
A macro application is an instance of a macro being used to alter a
section of code.
The macro expansion of a given macro application is the resulting
code that is actual translated and executed.
To give a simple example using the familiar C pre-processor, the following
macro definition,
Code: Select all
#define square(x) x * x
assembly languages. It defines the name of the macro, a parameter, and a simple
set of string changes to make to the argument. One could apply it to a piece of
code, in this case a numerical literal,
Code: Select all
a = square(42);
Code: Select all
a = 42 * 42;
macro, is simply performing a string interpolation on the argument; that is to
say, it is replacing the the parameter with the text of the argument given to
the macro application, and inserting it verbatim into the body of the macro,
which then replaces the macro application. one could have written
Code: Select all
a = square(48 - 6);
Code: Select all
a = 48 - 6 * 48 - 6;
though they would appear on the surface to be identical semantically. This
highlights one of the flaws in naive macros of this sort, and introduced the
most basic issue of macro hygiene. avoiding duplicate evaluation.
The macros possible with the C preprocessor are extremely limited; they do not
compose effectively without heroic effort, they are generally restricted to a
single line of code (though they can span multiple lines with a line extension)
and they are rather simplistic in how they are expanded. They also represent
something of an exceptional case for the C language, as they do not match the
syntactic structures of the language overall. Macros, and the C pre-processor
directives in general, form a separate mini-language independent from C itself,
and indeed the pre-processor can be run separately from the compiler in most
implementations.
Similarly, the m4 macro processor, frequently used in conjunction with the
Unix as assembler, is actually a stand-alone program in it's own right, and
invoked automatically by the assembler. It is a significantly more sophisticated
macro processor than CPP, but it is still basically a textual macro expansion.
A key reason why these textual macros are limited is because they are separate
from the languages they pre-process. However, it is possible, in languages which
possess the property of homoiconicity - that is to say, the code
in which the language is written is itself a data structure of the language and
can be manipulated programmatically within the code without needing to call
out to a separate translator - can support a more powerful form of macro, known
as lexical macros. The classic examples of lexical macros are to be
found in the various dialects of the Lisp family of languages.
In Common Lisp. a macro is a piece of Lisp code the same as any other, except
that the compiler recognizes that it is a macro when it scans it and calls the
macro expander to perform the transformation at compile time, prior to compiling
the code. To use the simple example from before,
Code: Select all
(defmacro square (x)
`(* ,x ,x))
expansion of a given macro application would be, the macroexpand-1
special form. So, if we write
Code: Select all
(macroexpand-1 '(square 3))
Code: Select all
(* 3 3)
problem, as shown here:
Code: Select all
> (macroexpand-1 '(square (- 3 1)))
(* (- 3 1) (- 3 1)) ;
not have the consequences it did in C, thanks to a quirk of Lisp
syntax: because Lisp functions are always fully parenthesized, the specific
fault of mismatching the order of operations doesn't happen, though this
doesn't mean we're free of the duplicate evaluation problem.
If you look at the macro again, you should note the backquote and the commas;
these are relevant, as they indicate what parts of the code should be evaluated
at compile time and which at run time. To put it succinctly, a Lisp macro is a
Lisp function that is run before compile time, with it's expansion being the
function's output. The backquote says that the following part of the code
should be passed through to the expansion unaltered, except that it may
contain elements which should be evaluated, which are indicated by the commas.
Had we omitted these indicators,
Code: Select all
(defmacro square2 (x)
(* x x))
Code: Select all
> (macroexpand-1 '(square2 3))
9 ;
> (macroexpand-1 '(square2 (- 3 1)))
*** - *: (- 3 1) is not a number
expression evaluated at compile time, as the expression essentially becomes a
simple constant; but you can see in the second example the limitations of this,
as the arguments passed to the macro are not evaluated at all.
Because they have the full Common Lisp language available to them for
generating their expansion, Common Lisp macros do not have many of the
limitations that textual macros had. For example, one could write a macro that
does evaluate an arbitrary numeric argument to its square at compile
time:
Code: Select all
(defmacro square3 (x)
(let ((y (eval x)))
(if (numberp y)
(* y y)
nil)))
Code: Select all
> (macroexpand-1 '(square3 (- 3 1)))
4 ;
to y before it is squared, it avoids re-evaluating the expression;
no matter how complex the expression is, so long as the value is a number,
it will be evaluated exactly once, at compile time. Of course, this still has its
limitations: the expression must be evaluable at compile time, so it can't
depend on any variables entered by the user at run time, for example. Still,
this can be a useful property for certain applications.
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Custom Programming Language
The other question is just whether you're still implementing the language described by your spec... The grammar of the spec doesn't allow it, so if your compiler compiles it, it implements a different language.mathematician wrote:Since I am writing the compiler, it will compile if I say so. The only question is whether or not it is unambiguous, and, so far as I can see, it is.
- mathematician
- Member
- Posts: 437
- Joined: Fri Dec 15, 2006 5:26 pm
- Location: Church Stretton Uk
Re: Custom Programming Language
As I understand it, UTF-32 is no encoding at all, but just the code point - myString[2] = '9' would need the compiler to put in a function call.Brendan wrote:Hi,
This is just random notes about the ASPL language specification.
If a character is always 4 bytes, then a character is UTF-32. If a string is always UTF-8, then doing something like "myString[2] = '9';" becomes a massive disaster. In my opinion it's better to only say "string literals are UTF-8" and "character constants are UTF-32" (and possibly also support UTF-32 string literals); and then don't provide any string or character data type at all and let the programmer build their own (using "typedef", arrays, whatever).
pntr pntr word data[10]; will appear in the symbol table as a variable of type word, but two other fields will be set to:Also don't forget that pointers and arrays are normally data types too
levels (of indirection) = 2
elems = 10
As I say, ASPL is intended for one very specific task, and I cannot imagine that there will be very much call for floating point arithmetic in the kernel of an operating system. If there is, it can go into version 2.If the only other data types are byte, word, dword and qword; then how are people supposed to do floating point? Also don't forget that pointers and arrays are normally data types too; and that most languages support "user defined data types" (structures, unions, etc).
To be honest, they don't appeal to me for aesthetic reasons alone.Also note that word, dword and qword are really bad names (they only make sense on 80x86 and any programmer familiar with other CPUs may mistake "word" for 32-bit or anything else. I'd recommend using u8, u16, u32, u64 instead. I'd also recommend getting rid of the "signed" type qualifier and having s8, s16, s32, s64 too. Finally, for bitfields, why not have "u3" for a 3-bit bitfield?
There are all sorts of things which can and should be checked for at compile time. The fact that I haven't mentioned them does not mean that they will be left out. There are so many syntactic and semantic mistakes which could be made, the difficulty will be in making sure I have covered all the possibilities. I have already written part of the parser, and about every sixth line is error checking.If assigning 9 to a 3-bit field is guaranteed to cause a compile time error, then how about "const byte foo := 9;" followed by "page.avail := foo;" or maybe "const byte foo := 3;" followed by "page.avail := foo*3;"?
If you do this for bitfields, do you also do it for other data types (e.g. "const word hexnum := 0x12345678;")? What about "word dayOfWeek range 1 to 7 := 9;"?
Maybe it could, but typing "word" (or byte) doesn't seem any great hardship.For integer ranges, why does the compiler need to be told the size? For example, instead of "word dayOfWeek range 1 to 7" why can't you just do "range 1 to 7 dayOfWeek" and let the compiler choose an underlying data type capable of storing values from 1 to 7?
No, the only values the compiler will buy are 1, 2, 4 and 8.For structures, can I do this:Code: Select all
struct myStructure align 9 myStructure * next; word something; myStructure end;
Why would that be an improvement?For bitsets (continued from the earlier "u3 for a 3-bit bitfield" suggestion), why can't I just do:Code: Select all
struct pgLevel1_t u1 present; u1 readWrite; ... u40 pageAddress; u11 avail; u1 noExecute; pgLevel1_t end;
I tried writing some code in ASPL, although I obviously can't compile it yet, and x := x + 1, cropped up so often that it had to have special provision made for it. Admittedly that was before I had thought of x := + 1.The increment and decrement operators are a mistake. Because they have side effects they become a pain in the neck. A simple example would be "index += myArray[ incr index]".
The only time I envisage the "or" and "and" keywords being used is in conditional expressions, and not((x = 2) or (y > 10)) could just as easily (and in my opinion more naturally) be written as x != 2 and y <=10.How would De Morgan re-arrange something like 'if( !myBool1 or myBool2 )" to avoid the not? Is something like 'if( (myBool1 ^ true) or myBool2 )" easier to read? Given that your boolean operators are already words (and not symbols like "||"), is there any sane reason to refuse to support 'if( not myBool1 or myBool2 )"?
I didn't forget them but it seemed weird to lump them in with the arithmetical and bitwise operators; especially in a language without a boolean data type. In C I write if(x < (y + 2)) just to be on the safe side, but it seems to me intuitively obvious that the calculation y + 2 should come before the comparison. For (x < y) + 2 to even make sense, you have to postulate a boolean data type, cast it into an integer, and then add two to it.Where did all the relational operators go (==, <, <=, >, >=, !=)? Did you forget to put them in the "operator precedence" table, or is it impossible to do things like "if( myValue > 4)"?
Good question. Have you ever written an expression like that? Because I haven't. Any compiler I write will only look for conditional operators in connection with if,while.for or repeat instructions.If you only forgot the relational operators; what would "myBool := left < middle > right;" mean?
Writing the assembler isn't going to happen any time soon. I will be giving the mnemonics my fuller consideration nearer the time.For the "machine code" (assembly language and not machine code?); what does "movzx" look like? For example, would you do "movzxd eax,[esi]" and not know if it's a byte being extended or a word being extended? Does this apply to all instructions and not just "mov"?
What happens if I do "movd al,bl"? Does the assembler accept this (and if it does, which instruction would I actually get?), or does it give me an "incompatible operands" error. If it gives me an "incompatible operands" error then the assembler must've known the sizes of the operands; and if the assembler knows the sizes of the operands why does it need to be told the size to begin with?
Re: Custom Programming Language
In my opinion (for low level languages), everything should do what it looks like it does and additional overhead should not be hidden (including things like "myStruct1 = myStruct2", which should be banned as it hides a potentially costly copying operation).mathematician wrote:As I understand it, UTF-32 is no encoding at all, but just the code point - myString[2] = '9' would need the compiler to put in a function call.Brendan wrote:If a character is always 4 bytes, then a character is UTF-32. If a string is always UTF-8, then doing something like "myString[2] = '9';" becomes a massive disaster. In my opinion it's better to only say "string literals are UTF-8" and "character constants are UTF-32" (and possibly also support UTF-32 string literals); and then don't provide any string or character data type at all and let the programmer build their own (using "typedef", arrays, whatever).
Your function call would need to convert the codepoint into a variable length string of bytes. For something like this:
Code: Select all
dword foo = 0x87654321;
myString[2] = foo;
For something like this:
Code: Select all
myString[0] = firstChar;
myString[??] = secondChar;
For both of these reasons, the first thing any sane programmer is going to do is to write their own function that tests for errors and does return a "number of bytes". If that's the case, then why bother supporting it in the first place (e.g. rather than providing a library function that does it properly)?
In that case (building an entire tool-chain to use for one "80x86 kernel" binary and nothing else) the entire idea is a stupid waste of time - you'll spend more time writing and fixing the compiler than you actually spend using the compiler (especially if you want the resulting code to be optimised rather than extremely bad).mathematician wrote:As I say, ASPL is intended for one very specific task, and I cannot imagine that there will be very much call for floating point arithmetic in the kernel of an operating system. If there is, it can go into version 2.Brendan wrote:If the only other data types are byte, word, dword and qword; then how are people supposed to do floating point? Also don't forget that pointers and arrays are normally data types too; and that most languages support "user defined data types" (structures, unions, etc).
Given the choice between "aesthetics" and actually being good for the purpose it's designed, aesthetics are for people that don't belong anywhere near engineering who should be playing with Barbie dolls instead. Note: What I mean here is that "aesthetics" are not an acceptable reason. Perhaps when you said "for aesthetic reasons" you really meant to say something like "I think it's worse for code readability/maintainability" or "I think it's harder for the compiler to parse". These would have both been valid (but incorrect) reasons.mathematician wrote:They don't appeal to me for aesthetic reasons alone.Brendan wrote:Also note that word, dword and qword are really bad names (they only make sense on 80x86 and any programmer familiar with other CPUs may mistake "word" for 32-bit or anything else. I'd recommend using u8, u16, u32, u64 instead. I'd also recommend getting rid of the "signed" type qualifier and having s8, s16, s32, s64 too. Finally, for bitfields, why not have "u3" for a 3-bit bitfield?
I doubt you understand the complexity involved. For "page.avail := foo*3;" you need to do whole program analysis and prove that "foo" is always within the range 0 to 2.333.mathematician wrote:There are sorts of things which can and should be checked for at compile time.Brendan wrote:If assigning 9 to a 3-bit field is guaranteed to cause a compile time error, then how about "const byte foo := 9;" followed by "page.avail := foo;" or maybe "const byte foo := 3;" followed by "page.avail := foo*3;"?
If you do this for bitfields, do you also do it for other data types (e.g. "const word hexnum := 0x12345678;")? What about "word dayOfWeek range 1 to 7 := 9;"?
Except now you have type information in 2 places (on the left and right of the name) which will complicate parsing. For a simple example, consider:mathematician wrote:Maybe it could, but typing "word" (or byte) doesn't seem any great hardship.Brendan wrote:For integer ranges, why does the compiler need to be told the size? For example, instead of "word dayOfWeek range 1 to 7" why can't you just do "range 1 to 7 dayOfWeek" and let the compiler choose an underlying data type capable of storing values from 1 to 7?
Code: Select all
typedef range 1 to 12 myMonth_t;
typedef range 1 to 31 myDayOfMonth_t;
// typedef byte myMonth_t range 1 to 12;
// typedef byte myDayOfMonth_t range 1 to 31;
word birthYear;
myMonth_t birthMonth;
myDayOfMonth_t birthDayOfMonth;
Because it's simpler (for both humans and the compiler) than having different syntax for 2 almost identical things.mathematician wrote:Why would that be an improvement?Brendan wrote:For bitsets (continued from the earlier "u3 for a 3-bit bitfield" suggestion), why can't I just do:Code: Select all
struct pgLevel1_t u1 present; u1 readWrite; ... u40 pageAddress; u11 avail; u1 noExecute; pgLevel1_t end;
They could be statements instead of operators. E.g. "incr index; index += myArray[index];" andmathematician wrote:I tried writing some code in ASPL, although I obviously can't compile it yet, and x := x + 1, cropped up so often that it had to have special provision made for it. Admittedly that was before I had thought of x := + 1.Brendan wrote:The increment and decrement operators are a mistake. Because they have side effects they become a pain in the neck. A simple example would be "index += myArray[ incr index]".
"index += myArray[index]; incr index;" would both be valid, but "index += myArray[ incr index]" wouldn't be allowed.
That's short sighted at best:mathematician wrote:The only time I envisage the "or" and "and" keywords being used is in conditional expressions, and not((x = 2) or (y > 10)) could just as easily (and in my opinion more naturally) as x != 2 and y <=10.Brendan wrote:How would De Morgan re-arrange something like 'if( !myBool1 or myBool2 )" to avoid the not? Is something like 'if( (myBool1 ^ true) or myBool2 )" easier to read? Given that your boolean operators are already words (and not symbols like "||"), is there any sane reason to refuse to support 'if( not myBool1 or myBool2 )"?
Code: Select all
isPresent := myPage.present;
isUserSpace := myPage.pageAddress >= KERNEL_SPACE_START_ADDRESS;
if( not isPresent and isUserSpace) ) {
// Handle the most likely/expected case
allocPage();
} else {
// Handle the error cases
if( isPresent ) {
return ERR_PAGE_ALREADY_PRESENT;
} else if( not isUserSpace ) {
return ERR_PERMISSION_DENIED;
}
}
Um, what?mathematician wrote:I didn't forget them but it seemed weird to lump them in with the arithmetical and bitwise operators; especially in a language without a boolean data type. In C I write if(x < (y + 2)) just to be on the safe side, but it seems to me intuitively obvious that the calculation y + 2 should come before the comparison. For (x < y) + 2 to even make sense, you have to postulate a boolean data type, cast it into an integer, and then add two to it.Brendan wrote:Where did all the relational operators go (==, <, <=, >, >=, !=)? Did you forget to put them in the "operator precedence" table, or is it impossible to do things like "if( myValue > 4)"?
There are only 2 sane choices:
- You do have a boolean data type; and therefore your type checker can check if (e.g.) the operands of a boolean OR are both booleans, and can check if (e.g.) the operands to an addition are both numbers.
- You do not have a boolean data type (so people just use integers instead); and therefore your type checker can not check if (e.g.) the operands of a boolean OR are both booleans, and can not check if (e.g.) the operands to an addition are both numbers.
Also don't forget that the compiler converts expressions into AST (which is where your precedence rules are taken into account), and the type checker works recursively on the AST. For example, the expression "x + (y < z)" might get converted into the AST:
Code: Select all
expression
|_addition
|_x
|_less than
|_y
|_z
Code: Select all
expression
|_addition
|_type=number
|_less than
|_type=number
|_type=number
Code: Select all
expression
|_addition
|_type=number
|_type=number
Code: Select all
expression
|_type=number
Code: Select all
expression
|_addition
|_type=number
|_type=boolean
Note: The type checker typically does a lot more than this - I've simplified a lot to make the recursive nature of it clear.
Now, try to explain how you're planning on doing parsing and type checking when there are no relational operators. For example, for the expression "(x > y) and (x < z)" describe what the AST would look like and how the type checker would check it.
Is this some sort of joke? When given correct source code a compiler must generate correct output (and when given invalid source code the compiler must generate a hopefully meaningful error message). Whether or not it's likely to occur in source code is entirely irrelevant.mathematician wrote:Good question. Have you ever written an expression like that? Because I haven't.Brendan wrote:If you only forgot the relational operators; what would "myBool := left < middle > right;" mean?
Is it valid? If it is valid what is the correct output?
And then you'll get to common sub-expression elimination and realise that you end up with conditional operators in expressions anyway. For a simple example, this:mathematician wrote:Any compiler I write will only look for conditional operators in connection with if,while.for or repeat instructions.
Code: Select all
if(x < y*3) {
foo();
}
if( (x < y*3) and (z = 3) ) {
bar();
}
Code: Select all
temp = x < y*3;
if(temp) {
foo();
}
if( (temp) and (z = 3) ) {
bar();
}
Cheers,
Brendan
- mathematician
- Member
- Posts: 437
- Joined: Fri Dec 15, 2006 5:26 pm
- Location: Church Stretton Uk
Re: Custom Programming Language
Well maybe. I already have a range of functions for UTF-8 strings, written in C. When the time comes it might make sense to rewrite them in ASPL; especially if I decide to bootstrap the compiler. However, "string" will have to be a type, because they are not simply arrays, as they are in C. The variable length characters means that some additional information needs to be embedded with them.Brendan wrote:In my opinion (for low level languages), everything should do what it looks like it does and additional overhead should not be hidden (including things like "myStruct1 = myStruct2", which should be banned as it hides a potentially costly copying operation).
Functions for doing the conversion, in both directions, already exist, but, having written them, I think I have yet to use them.Your function call would need to convert the codepoint into a variable length string of bytes. For something like this:The codepoint is not valid and can't be converted to UTF-8; but you have no way to return an error from your function.Code: Select all
dword foo = 0x87654321; myString[2] = foo;
The number of bytes a UTF-8 character occupies is encoded into the first byte of that character.For something like this:There's no way to determine how many bytes the first character consumed, and no way to figure out where to put the second character.Code: Select all
myString[0] = firstChar; myString[??] = secondChar;
As I have said there already exist functions which the compiler either has, or probably will, use internally. Later on they could possibly be rewritten in ASPL.For both of these reasons, the first thing any sane programmer is going to do is to write their own function that tests for errors and does return a "number of bytes". If that's the case, then why bother supporting it in the first place (e.g. rather than providing a library function that does it properly)?
In a sense, writing an operating system which isn't going to challenge Windows, or even Linux, is pretty stupid. For the purposes of writing the code you have to pretend to yourself that other people will be using your system, but, away from the keyboard, you know full well that that is highly unlikely to be the case. The only chance you might stand is if you could find a niche market whose needs couldn't be met by an application running on top of an existing OS. Failing that, you do it just for the hell of it.In that case (building an entire tool-chain to use for one "80x86 kernel" binary and nothing else) the entire idea is a stupid waste of time - you'll spend more time writing and fixing the compiler than you actually spend using the compiler (especially if you want the resulting code to be optimised rather than extremely bad).
i meant what I said. What the data types are called is a fairly marginal issue, and the ones you suggested just strike me as ugly. Why code it in a high or medium level language at all, rather than just assembly code? If you thought your compiler and/or operating system was going to take over the world, the answer might be portability, but, for me personally, it is aesthetics. I prefer the look of the code on the page.Given the choice between "aesthetics" and actually being good for the purpose it's designed, aesthetics are for people that don't belong anywhere near engineering who should be playing with Barbie dolls instead. Note: What I mean here is that "aesthetics" are not an acceptable reason. Perhaps when you said "for aesthetic reasons" you really meant to say something like "I think it's worse for code readability/maintainability" or "I think it's harder for the compiler to parse". These would have both been valid (but incorrect) reasons.
If you are interested, the type names are now, int8_t, int16_t, int32_t, int64_t, int8s_t, int16s)t, int32s_t, int64s_t.
Only if checking that in z = x/y, checking that y could never be zero is also a job you would give to the compiler. As for constants, checking that 9 isn't being or'ed into a three bit field would be straightforward, because the structure (which already exists) would record details about the field, one of which is the maximum value it can hold.I doubt you understand the complexity involved. For "page.avail := foo*3;" you need to do whole program analysis and prove that "foo" is always within the range 0 to 2.333.
mathematician wrote:Maybe it could, but typing "word" (or byte) doesn't seem any great hardship.Brendan wrote:For integer ranges, why does the compiler need to be told the size? For example, instead of "word dayOfWeek range 1 to 7" why can't you just do "range 1 to 7 dayOfWeek" and let the compiler choose an underlying data type capable of storing values from 1 to 7?
That a letter followed by a number on the left hand side is preferable to a colon followed by a number n the right hand side, is something about which we will have to disagree.Because it's simpler (for both humans and the compiler) than having different syntax for 2 almost identical things.
That's short sighted at best:
Code: Select all
isPresent := myPage.present;
isUserSpace := myPage.pageAddress >= KERNEL_SPACE_START_ADDRESS;
if( not isPresent and isUserSpace) ) {
// Handle the most likely/expected case
allocPage();
} else {
// Handle the error cases
if( isPresent ) {
return ERR_PAGE_ALREADY_PRESENT;
} else if( not isUserSpace ) {
return ERR_PERMISSION_DENIED;
}
}
For good or ill, I would probably write
Code: Select all
if myPage.present = 0 and myPage.pageAddress < KERNEL_SPACE_START_ADDRESS
allocPage()
[/quote]Um, what?
There are only 2 sane choices:For both cases, the relational operators are still operators, and the operands to the relational operators are numbers. The only difference is what the resulting type is - e.g. if there is a boolean type then something like "x < y" has a result of type boolean; and if there isn't any boolean type then something like "x < y" has a result of type number.
- You do have a boolean data type; and therefore your type checker can check if (e.g.) the operands of a boolean OR are both booleans, and can check if (e.g.) the operands to an addition are both numbers.
- You do not have a boolean data type (so people just use integers instead); and therefore your type checker can not check if (e.g.) the operands of a boolean OR are both booleans, and can not check if (e.g.) the operands to an addition are both numbers.
Also don't forget that the compiler converts expressions into AST (which is where your precedence rules are taken into account), and the type checker works recursively on the AST. For example, the expression "x + (y < z)" might get converted into the AST:The type checker starts at the top and tries to find the resulting type of "addition". To do that it has to find the types of both "x" and "less than" (which is where it gets recursive). To do that it has to find the types of "y" and "z". This means that (after all the recursion) it becomes like this:Code: Select all
expression |_addition |_x |_less than |_y |_z
Then it checks the operands for "less than", then finds the resulting type of "less than":Code: Select all
expression |_addition |_type=number |_less than |_type=number |_type=number
Then checks the operands for "addition", then finds the resulting type of "addition":Code: Select all
expression |_addition |_type=number |_type=number
And now it's completed the type checking. Notice that (because there was no boolean type) the resulting type of the "less than" had to be "type=number"? This is what makes it impossible for you to generate an error when someone tries to add a boolean to a number. If there was a boolean type, then it would've become:Code: Select all
expression |_type=number
And in this case the type checker can/would complain that addition expects a number type.Code: Select all
expression |_addition |_type=number |_type=boolean
Note: The type checker typically does a lot more than this - I've simplified a lot to make the recursive nature of it clear.
Now, try to explain how you're planning on doing parsing and type checking when there are no relational operators. For example, for the expression "(x > y) and (x < z)" describe what the AST would look like and how the type checker would check it.
Is this some sort of joke? When given correct source code a compiler must generate correct output (and when given invalid source code the compiler must generate a hopefully meaningful error message). Whether or not it's likely to occur in source code is entirely irrelevant.mathematician wrote:Good question. Have you ever written an expression like that? Because I haven't.Brendan wrote:If you only forgot the relational operators; what would "myBool := left < middle > right;" mean?
Is it valid? If it is valid what is the correct output?
And then you'll get to common sub-expression elimination and realise that you end up with conditional operators in expressions anyway. For a simple example, this:mathematician wrote:Any compiler I write will only look for conditional operators in connection with if,while.for or repeat instructions.Should be converted/optimised into:Code: Select all
if(x < y*3) { foo(); } if( (x < y*3) and (z = 3) ) { bar(); }
Code: Select all
temp = x < y*3; if(temp) { foo(); } if( (temp) and (z = 3) ) { bar(); }
Cheers,
Brendan
As I said, I only envisage conditional expressions occurring in the context of if or looping instructions. So to take an example
Code: Select all
if a + b > z then
x = 2;
else
x = 3;
Code: Select all
temp = a+b
cmp temp, z
jump_if_less_than_or_equal lab_1
x = 2
jump lab_2
lab_1
x = 3
lab_2