Re: Why are ASM hobby OS more successful than other language

Posted: Tue Dec 13, 2011 3:59 pm
by Combuster
turdus wrote:You totally does not understand
It's "do", and the feeling is mutual.

Re: Why are ASM hobby OS more successful than other language

Posted: Tue Dec 13, 2011 4:35 pm
by Rusky
Unnamed problems? I've named the null-pointer problem several times, but off the top of my head there's also array bounds checking, null-terminated strings, dangling pointers, uninitialized values, undefined behavior, array/function pointer decay, implicit type conversions, poor handling of co/contravariant pointers, and a string-based macro-system.

I'm not saying C makes things Unix-like, or that it makes these problems unsolvable, or even that you shouldn't use C. I'm saying C embodies a lot of these problems, and that Unix is a kind of local maximum in operating system design, so "to an extent," using C means you inherit the problems of that C/Unix design.

There are languages that solve these problems. Individually they don't solve all of them, or aren't well suited to systems programming, or don't have support in the areas you mention. I agree that mainstream acceptance is important, but that doesn't mean people should "turn a cold shoulder" on things that aren't there (yet).

Re: Why are ASM hobby OS more successful than other language

Posted: Tue Dec 13, 2011 11:55 pm
by Solar
...aaaaaand didn't add anything to what was said before.

There's also the problem of world hunger. I blame Martians. We should all pray.

(Zero-terminated strings? Really? I had no idea! Of course, you are right, how come we didn't see the superiority of the alternative you suggested right away?)

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 8:43 am
by Rusky
First all the problems were unnamed, but then when I named them I didn't add anything?

Solutions are pretty widely available, as I said, but here are some possibilities anyway:

null pointers - don't include null as a valid value in the default pointer type; replace their use for optional types with an Option type, replace their use as sentinels with array lengths (see below), etc.

array bounds checking - specify array size as part of its type, don't let arrays decay into pointers so the type is kept across calls/etc., let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).

null-terminated strings - strings are arrays of chars, so their length gets stored with their type. It can thus be reified in the program without looping over the string, and there are no more security holes by default with strcpy et al.

uninitialized values/dangling pointers - use typestate to make accessing an uninitialized value, an already-freed pointer, or otherwise non-useful value illegal.

undefined behavior - a lot of undefined behavior should instead be illegal- violating sequence points is trivial for the compiler to detect, using undefined values or freed pointers is solvable (see above).

implicit type conversions - make far, far fewer of them so that accidentally losing information or causing undefined behavior is impossible.

co/contravariance - this is easier to avoid in C, but this code should be illegal and is not even given a warning (again, by default):

Code:

struct A { int a; };
struct B : public A { int b; };
// Compiles silently, but as[1] steps by sizeof(A) rather than sizeof(B).
A *as = new B[10];

string-based macro system - Lisp has had real macros since the 70s, and nasm has a good macro system even for assembly. Why are we still pasting strings together in C? Why not bits of AST that are immune to e.g. precedence issues that force parenthesizing every argument?

A good language to look at that attempts some of these in a systems programming context is Rust.
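
A small C++ sketch of the "array size as part of its type" item, for comparison: passing an array by reference keeps its length in the type, so no decay happens and an oversized copy fails to compile. The names `length_of` and `copy_chars` are made up for this illustration and are not from any library.

```cpp
#include <cassert>
#include <cstddef>

// The array is taken by reference, so N is deduced from its type
// instead of being lost to pointer decay.
template <typename T, std::size_t N>
constexpr std::size_t length_of(const T (&)[N]) { return N; }

// Copies src into dst; a source longer than the destination is a
// compile-time error rather than a runtime overrun.
template <std::size_t DstN, std::size_t SrcN>
void copy_chars(char (&dst)[DstN], const char (&src)[SrcN]) {
    static_assert(SrcN <= DstN, "source does not fit in destination");
    for (std::size_t i = 0; i < SrcN; ++i) {
        dst[i] = src[i];
    }
}
```

With this, `copy_chars(dst, "abc")` compiles for a `char dst[8]`, while a `char dst[2]` destination is rejected at compile time. It only works while the array type is still visible, which is the limitation a first-class array type would remove.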

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 9:24 am
by Solar
Rusky wrote:First all the problems were unnamed, but then when I named them I didn't add anything?
Now this post actually adds something tangible.

What follows is not an attempt to say "there is no problem", but an attempt to show that "it's not that much of a problem". I will use C++, simply because it is a) a language I am familiar with, b) a system programming language, and c) one that addresses most of the issues you raise.
Rusky wrote:null pointers - don't include null as a valid value in the default pointer type; replace their use for optional types with an Option type, replace their use as sentinels with array lengths (see below), etc.
As I said before, a NullPointerException goes a long way towards handling what was fatal to any C app. boost::optional exists, as do std::vector and std::string.
Rusky wrote:array bounds checking - specify array size as part of its type, don't let arrays decay into pointers so the type is kept across calls/etc.,
std::vector.
Rusky wrote:...let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).
Point to you. I'm not sure how good C++ compiler warnings are in this regard.
Rusky wrote:null-terminated strings - strings are arrays of chars, so their length gets stored with their type. It can thus be reified in the program without looping over the string, and there are no more security holes by default with strcpy et al.
std::string.
Rusky wrote:uninitialized values/dangling pointers - use typestate to make accessing an uninitialized value, an already-freed pointer, or otherwise non-useful value illegal.
-Wall -Wextra -Werror goes a looong way towards this end.
Rusky wrote:undefined behavior - a lot of undefined behavior should instead be illegal- violating sequence points is trivial for the compiler to detect, using undefined values or freed pointers is solvable (see above).
-Wall -Wextra -Werror plus any of std::*_ptr again don't solve the problem, but they take much of the edge off it.
Rusky wrote:implicit type conversions - make far, far fewer of them so that accidentally losing information or causing undefined behavior is impossible.
Here, C++ wins over C without moving a finger. ;-)
Rusky wrote:co/contravariance - this is easier to avoid in C, but this code should be illegal and is not even given a warning (again, by default)...
ACK. While it is easy to avoid the issue, the fact that the compiler turns a blind eye is bad.
Rusky wrote:string-based macro system - Lisp has had real macros since the 70s, and nasm has a good macro system even for assembly. Why are we still pasting strings together in C? Why not bits of AST that are immune to e.g. precedence issues that force parenthesizing every argument?
Basically, you are free to use whatever macro system you want to use, even in C. You don't have to rely on the C preprocessor.

As for macros in C++ code, they fall into the same category as arrays in C++ code: You do, you die. At least as long as I can get away with tossing people out of the window. ;-)
Rusky wrote:A good language to look at that attempts some of these in a systems programming context is Rust.
And here we get right down to the point: I admit that C/C++ have a couple of weak spots. Some of them got better over time, some remain firmly rooted in their 197x / 198x heritage.

But they are here, they are known, they are supported. Every programmer interested in system programming knows them, or will happily learn them as this knowledge significantly improves his CV. The footholes are well known and easily avoided by the careful programmer.

Rust? Looks nice at first glance. I'll come back in five years to find out if it has had its first release, and what other coders have to say about it.

All this is not meant to flame or to be derogatory. Yes, there are some issues in C and its descendants. But those descendants have evolved, which shouldn't be ignored for the sake of making a point, and their ubiquity is a major factor. There's a reason why virtually all tutorials on OS development are in C or C++...

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 10:10 am
by Brendan
Hi,

Rusky wrote:First all the problems were unnamed, but then when I named them I didn't add anything?

Solutions are pretty widely available, as I said, but here are some possibilities anyway:
Let's have a practical example.

Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT. During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".

This function should:
  • handle the "no ACPI tables present" case (e.g. "if(address_of_ACPI_tables == NULL)")
  • check the checksum for the RSDT
  • display the "OEMID" string from the RSDT
  • for each table pointed to by the array of pointers in the RSDT, it should:
    • check the checksum for that table, and:
      • if the checksum is OK, display a message that includes the table's signature
      • if the checksum failed, display a message that says the checksum failed
  • if all tables had valid checksums, then for each table pointed to by the array of pointers in the RSDT, it should:
    • if the signature is known, call a function to parse that table based on the table's signature (e.g. call one function if the signature is 'APIC', a different function if the signature is 'FADT', etc). You only need to provide one of these functions for parsing the MADT (table signature 'APIC') - assume the rest are "to be implemented later"
    • if the signature is not known, skip the table
The function to parse the ACPI MADT table (signature 'APIC') only needs to:
  • Display the APIC ID for any "IO APIC Structures" it finds
  • Display the APIC ID for any "Processor Local APIC Structures" it finds
  • Correctly ignore/skip all other structures in the MADT
The code doesn't need to compile on any compiler (e.g. it can be an example for a proposed new language that doesn't exist); but the code must be adequately commented so that at least 50% of the forum members are able to understand exactly what it is doing (even if none of the forum members have seen source code for whatever language you choose to use).

You can assume that (if the RSDT is present) the RSDT can be accessed directly from your kernel code via the address supplied by the boot loader (e.g. no need to map physical pages into the kernel's virtual address space, no need to mess with segment bases, etc).
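
As a baseline for the checksum steps in the list above, here is a minimal C++ sketch. It assumes the usual ACPI rule that all bytes of a table, including the checksum byte itself, must add up to zero modulo 256; `acpi_checksum_ok` is a hypothetical helper name, and the caller is assumed to have already decided that `length` bytes are safe to read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when the byte-wise sum of the table wraps around to zero.
// Assumption: `bytes` points at `length` readable bytes.
inline bool acpi_checksum_ok(const std::uint8_t* bytes, std::size_t length) {
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < length; ++i) {
        // The cast makes the intended modulo-256 wraparound explicit.
        sum = static_cast<std::uint8_t>(sum + bytes[i]);
    }
    return sum == 0;
}
```

The same function serves for the RSDT and for every table it points to, since each table carries its own length in its header.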

The latest version of the ACPI specification can be found here, but it doesn't matter if you use an earlier version of the specification if you want.

Good luck!


Cheers,

Brendan

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 11:19 am
by rdos
Brendan wrote:Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT.
Finding the base in the kernel is simple enough. A few lines of assembly is all that is needed. It gets more complicated if paging is enabled, since then there is a need to mess with paging.
Brendan wrote:During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".

This function should:
  • handle the "no ACPI tables present" case (e.g. "if(address_of_ACPI_tables == NULL)")
  • check the checksum for the RSDT
  • display the "OEMID" string from the RSDT
  • for each table pointed to by the array of pointers in the RSDT, it should:
    • check the checksum for that table, and:
      • if the checksum is OK, display a message that includes the table's signature
      • if the checksum failed, display a message that says the checksum failed
  • if all tables had valid checksums, then for each table pointed to by the array of pointers in the RSDT, it should:
    • if the signature is known, call a function to parse that table based on the table's signature (e.g. call one function if the signature is 'APIC', a different function if the signature is 'FADT', etc). You only need to provide one of these functions for parsing the MADT (table signature 'APIC') - assume the rest are "to be implemented later"
    • if the signature is not known, skip the table
We have ACPICA for that. :mrgreen:

However, ACPICA requires a set of functions to operate, and those are often not supported in your typical C compiler. For instance, C doesn't have IO-access, PCI-access, synchronization, and physical memory access.
Brendan wrote:The latest version of the ACPI specification can be found here, but it doesn't matter if you use an earlier version of the specification if you want.
Understanding ACPI and building an assembler-based handler seemed too complex for too little benefit, so I put a lot of work into porting the clib to kernel-space instead. :mrgreen:

BTW, even when using ACPICA, I don't see the functions I really need, like "give me the IRQ for this PCI device". Providing that requires writing a lot of code (which I'm currently doing, in C). At least now I have a list of devices and their resource uses and the PCI IRQ redirection table, which might be enough to solve that mapping issue. I plan to write a tool to list relevant device-info, which would be good for debugging.

When it comes to raw-table access (MADT and HPET), I let the device-drivers themselves parse them (in assembly). I don't see what use I would have for C or ACPICA there.

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 12:02 pm
by Brendan
Hi,
rdos wrote:
Brendan wrote:Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT.
Finding the base in the kernel is simple enough. A few lines of assembly is all that is needed. It gets more complicated if paging is enabled, since then there is a need to mess with paging.
Assume the kernel can be booted from boot loaders designed for PC BIOS and UEFI. For UEFI you ask the firmware for the address of the ACPI tables and don't do a stupid (cache thrashing) search for them.
rdos wrote:
Brendan wrote:During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".
We have ACPICA for that. :mrgreen:
You missed the entire point of the exercise.

The idea is to get Rusky to show that his idea of a system programming language is actually capable of being used for system programming. Rusky isn't silly though - if I asked him to re-implement ACPICA in a language with strong type checking, array bounds checking, no "void *", no NULL, etc; then he'd probably be smart enough to refuse.


Cheers,

Brendan

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 12:18 pm
by rdos
Brendan wrote:Assume the kernel can be booted from boot loaders designed for PC BIOS and UEFI. For UEFI you ask the firmware for the address of the ACPI tables and don't do a stupid (cache thrashing) search for them.
That would make sense then.
Brendan wrote:You missed the entire point of the exercise.

The idea is to get Rusky to show that his idea of a system programming language is actually capable of being used for system programming. Rusky isn't silly though - if I asked him to re-implement ACPICA in a language with strong type checking, array bounds checking, no "void *", no NULL, etc; then he'd probably be smart enough to refuse.
:lol:

Yes, but my point was that you don't want to rewrite something for ACPI when all you need from ACPI is IRQ-mappings (basically). The other tables (MADT, HPET) can easily be parsed in any language (possibly in combination with assembly).

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 4:10 pm
by Combuster
Rusky wrote:...let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).
Actually, the rule is to check whenever the compiler cannot prove the access is safe. In other words, skipping the check is only possible for a subset of array accesses. We demonstrated that fact earlier in another thread.
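
One way to picture the two cases in C++, as a rough sketch rather than a statement about what any particular compiler elides: with a constant index the check can be done entirely at compile time (`std::get` on a `std::array` refuses an out-of-range index), while an index that only arrives at run time needs an explicit test. `third_element` and `element_at` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Constant index: std::get<2> is verified against the array size at
// compile time, so no runtime check is needed.
inline int third_element(const std::array<int, 4>& values) {
    return std::get<2>(values);
}

// Runtime index: nothing can be proven about `index` here, so the check
// has to be executed before the access.
inline bool element_at(const std::array<int, 4>& values, std::size_t index, int& out) {
    if (index >= values.size()) {
        return false;
    }
    out = values[index];
    return true;
}
```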

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 8:25 pm
by Rusky
C and C++ have a lot of support, a lot of users, serve well for systems programming, and have libraries, features, or compiler warnings to address the issues with their "vanilla" forms: definitely agreed. However, these options often aren't completely satisfactory, add runtime overhead, needlessly complicate things, or don't actually prevent incorrect code from running.

Language features and compiler-enforced rules would be an improvement because they would be the default, they would be more efficient, and they would be able to do much more than a separate static analyzer. Let's continue to evolve what's available for systems programming. How do you think C++ got started?

NullPointerExceptions only happen at runtime; they cannot be caught at compile time once null is allowed. It's the same problem as dynamic typing, which is essentially what pointers are, since a null or dangling pointer and a valid address do not support the same set of operations. C++ references solve the typing problem, but they don't allow pointer arithmetic without re-entering the dynamically-typed, compiler-unchecked world of C-style pointers.
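
A rough C++ sketch of a pointer type that excludes null, assuming a hypothetical wrapper called `NonNull` (not a standard or Boost class). It can only be built from a reference, so a function taking it needs no null check. It does nothing about dangling pointers, and it deliberately offers no pointer arithmetic.

```cpp
#include <cassert>

// Hypothetical wrapper: there is no way to construct it from a null value.
template <typename T>
class NonNull {
public:
    explicit NonNull(T& target) : ptr_(&target) {}
    NonNull(T&&) = delete;  // refuse temporaries, which would dangle at once

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;
};

// The parameter type already guarantees a valid target, so no check here.
inline int read_value(NonNull<const int> p) {
    return *p;
}
```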

Containers like boost::optional, std::vector, and std::string are good, but they aren't perfect. std::vector and std::string add overhead that can often be avoided, and don't always guarantee in-bounds access. They can't be used for everything (including things like parsing ACPI tables), whereas simple first-class arrays are more transparent and don't have these problems. Think of them as this class, with compiler-checked access:

Code:

template <typename T, int len>
class array {
private:
    T data[len];
public:
    // In the proposed language the compiler would check that 0 <= i < len;
    // plain C++ performs no such check here.
    T &operator[](int i) { return data[i]; }
};

C++ doesn't actually solve the problems of C's implicit type conversions. For example, this compiles with no errors or warnings as both C and C++, even with -Wall -Wextra -Werror -pedantic:

Code:

int main() { char c; int a = 0xffff0000; c = a; return c; }
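
A sketch of what an explicit, checked narrowing could look like in C++, so that the lossy `c = a` above has to be spelled out and can fail visibly. `narrow_to_char` is a made-up helper, not a standard function.

```cpp
#include <cassert>
#include <limits>

// Converts only when the value is representable as a char; otherwise it
// reports failure instead of silently dropping bits.
inline bool narrow_to_char(int value, char& out) {
    if (value < std::numeric_limits<char>::min() ||
        value > std::numeric_limits<char>::max()) {
        return false;
    }
    out = static_cast<char>(value);
    return true;
}
```
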
Macro systems are not entirely separate from their languages- especially Lisp macros, which were my first example because they are so much more powerful and useful than either the C preprocessor or C++'s various replacement features. Slapping another preprocessor on top doesn't let macros manipulate abstract syntax trees and in-language values to do things like generate tables, create custom control structures, or design DSLs.
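
The precedence issue with textual macros mentioned earlier in the thread is easy to reproduce. A minimal sketch, with macro and function names chosen only for this example:

```cpp
#include <cassert>

// Pure token pasting: SQUARE_TEXTUAL(1 + 2) expands to 1 + 2 * 1 + 2.
#define SQUARE_TEXTUAL(x) x * x

// The conventional workaround: parenthesize every argument and the result.
#define SQUARE_PARENS(x) ((x) * (x))

// A function receives an evaluated expression, so precedence cannot leak.
constexpr int square(int x) { return x * x; }
```

A macro that operated on syntax trees instead of tokens would behave like `square` here while still running at compile time.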

Re: Why are ASM hobby OS more successful than other language

Posted: Wed Dec 14, 2011 8:25 pm
by Rusky
Brendan's challenge is somewhat biased because ACPI is designed for C, and its ABI would likely be different if another language had been dominant when it was written (yes, this is an advantage of C, but we all agree on that point). I also don't have an existing language to use so the syntax will probably be pretty rough and not everything will be implemented the best way possible, but I'll give it a shot:

Code:

// this implements a rudimentary form of inheritance polymorphism and/or algebraic data types
// depending on your preference, these could be in the language so long as they have a well-defined abi
// this would probably decrease the amount of casting, which would be good

acpi_header: struct = {
	signature: char[4]
	length: uint32
	revision: byte
	checksum: byte
	oem_id: char[6]
	oem_table_id: char[8]
	oem_revision: uint32
	creator_id: char[4]
	creator_revision: char[4]
}

acpi_rsdt: struct = {
	header: acpi_header = { signature = "RSDT" }

	// this is not the most elegant solution and is probably one thing I would change
	tables: acpi_header*[header.length / 4 - sizeof header / 4]
}

acpi_madt: struct = {
	header: acpi_header = {
		signature = "APIC"
		revision = 3
	}
	lapic: apic_lapic*
	flags: uint32
	controllers: acpi_madt_header*[header.length / 4 - sizeof header / 4]
}

acpi_madt_header: struct = {
	type: byte
	length: byte
}

acpi_madt_lapic: struct = {
	header: acpi_madt_header = {
		type = 0
		length = sizeof acpi_madt_lapic
	}
	processor_id: byte
	apic_id: byte
	flags: uint32
}

acpi_madt_ioapic: struct = {
	header: acpi_madt_header = {
		type = 1
		length = sizeof acpi_madt_ioapic
	}
	ioapic_id: byte
	reserved: byte = 0
	address: apic_ioapic*
	interrupt_start: uint32
}

// the bootloader is responsible for providing an optional pointer to the rsdt
// it must conform to the language's abi, potentially exactly the same as a C acpi_rsdt*
// the memory pointed to, if not null, must be verified or trusted by the bootloader to conform to the type above
kernel_entry: (..., optional_rsdt: acpi_rsdt?, ...) -> () = {
	...
	
	// before dereferencing optional_rsdt, it must be null-checked
	// if the pointer were not optional, it could be typed "acpi_rsdt*", again with the bootloader responsible for self-verification and abi compatibility
	match (optional_rsdt) {
		real_rsdt: acpi_rsdt* -> parse_acpi_tables(real_rsdt)
		_ -> /* no rsdt */
	}

	...
}

checksum: (contents: byte[len]*, checksum: byte): (bool) = {
	// not sure this is the correct way to calculate the checksum, but it's illustrative enough...
	// contents is a pointer to an array of bytes, inferred to be the size of an acpi_rsdt
	// sum is a standard, generic function of type array of addables to that same type
	checksum: byte = sum(contents)
	return checksum == 0
}

parse_acpi_tables: (rsdt: acpi_rsdt*) -> (results, errors, etc) = {
	assert(checksum(byte[]*(rsdt)))

	// straightforward enough, just note that print has varargs of type printable*
	print("acpi oemid: "&, rsdt->header.oem_id&, "\n"&)

	pass: bool = true
	for (table in rsdt->tables) { // this could be a macro, or whatever else you want
		match (checksum(byte[table->length]*(table))) {
			true -> print(table->signature&, " passed\n"&)
			false -> {
				print(table->signature&, " failed\n"&)
				pass = false
			}
		}
	}

	// based on your wording I think this is what you meant, but this could be folded into the above loop instead
	if (pass) {
		for (table in rsdt->tables) {
			// matching on strings is possible because they're just first-class array values
			match (table->signature) {
				"APIC" -> parse_madt(acpi_madt*(table))
				...
			}
		}
	}
	
	...
}

parse_madt: (madt: acpi_madt*) -> (ditto) = {
	...

	for (controller in madt->controllers) {
		// this is a place where algebraic data types/dynamic_cast-equivalent would help
		// instead of matching on the type using symbols, it would match based on type, like:
		// lapic: acpi_madt_lapic* -> ...
		// ioapic: acpi_madt_ioapic* -> ...
		match (controller->type) {
			// these should be symbolic constants.
			0 -> print("lapic "&, acpi_madt_lapic*(controller)->apic_id&, "\n"&)
			1 -> print("ioapic "&, acpi_madt_ioapic*(controller)->ioapic_id&, "\n"&)
			...
		}
	}

	...
}

Re: Why are ASM hobby OS more successful than other language

Posted: Thu Dec 15, 2011 9:52 am
by Brendan
Hi,
Rusky wrote:Brendan's challenge is somewhat biased because ACPI is designed for C, and its ABI would likely be different if another language had been dominant when it was written (yes, this is an advantage of C, but we all agree on that point).
It's just parsing data. You'd have similar problems parsing most (non-text) file formats, except that "pointer to something" would be "offset of something within file" (and you'd still need type casts, just they'd be like "my_type *foo = (my_type *)&byte_array[offset]" instead).
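
The cast above can be wrapped so that the bounds test and the access happen in one place. This is only a sketch: `read_at` and `EntryHeader` are hypothetical names, and it copies with `std::memcpy` instead of casting the pointer, to sidestep alignment and aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 2-byte record header, used only for this example.
struct EntryHeader {
    std::uint8_t type;
    std::uint8_t length;
};

// Copies a T out of `buffer` at `offset`, but only if all sizeof(T) bytes
// lie inside the buffer. Returns false instead of reading out of bounds.
template <typename T>
bool read_at(const std::uint8_t* buffer, std::size_t size, std::size_t offset, T& out) {
    if (offset > size || size - offset < sizeof(T)) {
        return false;
    }
    std::memcpy(&out, buffer + offset, sizeof(T));
    return true;
}
```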
Rusky wrote:I also don't have an existing language to use so the syntax will probably be pretty rough and not everything will be implemented the best way possible, but I'll give it a shot:
This code is like C, except that:
  • some keywords were renamed (e.g. "match" instead of "switch")
  • semicolons were replaced with end of line, and a few other trivial changes in syntax
  • the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
  • you stole "cout" from C++ (and renamed it to "print")
  • there's support for returning multiple values from functions
  • the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
  • the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
For the code itself, the amount of validation done is lacking. For example, what happens if the MADT is 200 bytes but has a 16-byte structure that starts at offset 198? What happens if the length/size of the RSDT is odd (and there's only space for part of a pointer at the end of the table of pointers)? Does the compiler automatically generate code to check these things; and if it does, how does it report errors? Should the entire thing be wrapped in "try/catch" exception handling?
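
The odd-length RSDT case can be checked in a couple of lines. A minimal sketch, assuming the 36-byte ACPI table header and 4-byte RSDT entries from the specification; `rsdt_entry_count` is a hypothetical helper, and rejecting the table outright (rather than ignoring the partial entry) is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAcpiHeaderSize = 36;  // common ACPI table header
constexpr std::size_t kRsdtEntrySize = 4;    // 32-bit physical addresses

// Sets `count` and returns true only when the table length leaves room for
// a whole number of entries after the header.
inline bool rsdt_entry_count(std::uint32_t table_length, std::size_t& count) {
    if (table_length < kAcpiHeaderSize) {
        return false;  // too short to even hold the header
    }
    const std::size_t payload = table_length - kAcpiHeaderSize;
    if (payload % kRsdtEntrySize != 0) {
        return false;  // trailing bytes form only part of a pointer
    }
    count = payload / kRsdtEntrySize;
    return true;
}
```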

I also don't think that the ability to handle arrays/lists of mixed types is realistic; especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct;}"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language? ;)


Cheers,

Brendan

Re: Why are ASM hobby OS more successful than other language

Posted: Thu Dec 15, 2011 12:15 pm
by Rusky
Brendan wrote:It's just parsing data.
Yes, but which data? You can specify sizes and such in a variety of different ways, and using e.g. a length in table entries rather than a byte length or null terminator makes it nicer for this imaginary language.
Brendan wrote:This code is like C, except that:
  • some keywords were renamed (e.g. "match" instead of "switch")
  • semicolons were replaced with end of line, and a few other trivial changes in syntax
  • the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
  • you stole "cout" from C++ (and renamed it to "print")
  • there's support for returning multiple values from functions
This was mostly to emphasize the difference from C. By changing the little things it makes it obvious it's not the same, at a glance. What did you expect?

However, note that match is a little bit more flexible in that it doesn't switch strictly on value but also on things like the presence of a value in an optional type. It's pattern matching, borrowed from functional programming btw. :)
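
For readers who think in C or C++, the optional match in `kernel_entry` corresponds roughly to declaring a pointer inside an `if` condition: the name is only in scope on the branch where the pointer is known to be non-null. A sketch with a placeholder `acpi_rsdt` struct and a stub parser (both assumptions for illustration); unlike the proposed language, C++ does not force the check.

```cpp
#include <cassert>

// Placeholder type and stub parser, invented for this sketch.
struct acpi_rsdt {
    unsigned length;
};

inline unsigned parse_acpi_tables(const acpi_rsdt& rsdt) {
    return rsdt.length;
}

// `real_rsdt` exists only inside the branch where it is non-null, which is
// roughly what the "real_rsdt: acpi_rsdt* -> ..." arm expresses.
inline bool kernel_entry(const acpi_rsdt* optional_rsdt, unsigned& parsed_length) {
    if (const acpi_rsdt* real_rsdt = optional_rsdt) {
        parsed_length = parse_acpi_tables(*real_rsdt);
        return true;
    }
    return false;  // the "_ -> no rsdt" arm
}
```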
Brendan wrote:
  • the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
...
I also don't think that the ability to handle arrays/lists of mixed types is realistic; especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct;}"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language? ;)
My example does not use heterogeneous arrays, only pointers to "base classes." Did I misunderstand the ACPI spec? If so, and the controllers are in-line, try this:

Code:

acpi_madt: struct = {
   ...
   controllers: byte[header.length / 4 - (sizeof header + sizeof lapic + sizeof flags) / 4]
}

Code:

for (
   controller: acpi_madt_header* = acpi_madt_header*(madt->controllers&);
   controller <= madt->controllers[-1]&; // since array bounds are checked, use negative indices to index from the end- could be other options
   controller = acpi_madt_header*(byte*(controller) + controller->length) // yet another place where a modified abi would be nice
) {
   ...
}
Brendan wrote:
  • the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
How is that potentially wrong? Given a correct RSDT, which is the firmware and/or bootloader's responsibility, the program knows the size of the table by reading rsdt->header.length (since the size of the rsdt is based on it). It's just more implicit than you may be used to.
Brendan wrote:For the code itself, the amount of validation done is lacking. For example, what happens if the MADT is 200 bytes but has a 16-byte structure that starts at offset 198? What happens if the length/size of the RSDT is odd (and there's only space for part of a pointer at the end of the table of pointers)? Does the compiler automatically generate code to check these things; and if it does, how does it report errors? Should the entire thing be wrapped in "try/catch" exception handling?
Considering I probably misunderstood the MADT layout, your question should be addressed at this point. As for how the compiler responds to problems, my preference would be a compile-time error. For example, if inside that new for loop above, controller is cast to something (acpi_madt_header* even) that could permit access outside the bounds of controllers, the programmer has to add a check themselves before dereferencing it. Because that check is necessary anyway if you're willing to expect incorrectly-sized tables, this is just the compiler and language semantics reminding you of potential problems before they come up in hard-to-find bugs at runtime.

However, note that this is a systems programming language and should make allowances for dealing with memory directly- if I were designing a managed language things would be more strict and things like parsing ACPI tables would probably be reduced to reading out of a byte array.

Re: Why are ASM hobby OS more successful than other language

Posted: Thu Dec 15, 2011 4:28 pm
by Brendan
Hi,
Rusky wrote:
Brendan wrote:This code is like C, except that:
  • some keywords were renamed (e.g. "match" instead of "switch")
  • semicolons were replaced with end of line, and a few other trivial changes in syntax
  • the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
  • you stole "cout" from C++ (and renamed it to "print")
  • there's support for returning multiple values from functions
This was mostly to emphasize the difference from C. By changing the little things it makes it obvious it's not the same, at a glance. What did you expect?
I would've avoided "change for the sake of change", and tried to use a syntax that 50% of forum members would be more likely to understand, to avoid the need for lots of explanations/comments.

Note: support for returning multiple values from functions doesn't belong in the "change for the sake of change" list.
Rusky wrote:However, note that match is a little bit more flexible in that it doesn't switch strictly on value but also on things like the presence of a value in an optional type. It's pattern matching, borrowed from functional programming btw. :)
Ah - now I understand. The underscore character on line 70 ("_ -> /* no rsdt */") is your renamed NULL.

Actually, no. I don't understand the line above it ("real_rsdt: acpi_rsdt* -> parse_acpi_tables(real_rsdt)") which looks like it creates a variable called "real_rsdt" of type "acpi_rsdt*" and assigns the value returned by "parse_acpi_tables(real_rsdt)" to the variable. In C syntax it'd be "acpi_rsdt *real_rsdt = parse_acpi_tables(real_rsdt);", and you'd get a warning about using an uninitialised variable as the argument to "parse_acpi_tables()".
Rusky wrote:
Brendan wrote:
  • the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
...
I also don't think that the ability to handle arrays/lists of mixed types is realistic; especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct;}"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language? ;)
My example does not use heterogeneous arrays, only pointers to "base classes." Did I misunderstand the ACPI spec? If so, and the controllers are in-line, try this:

Code:

acpi_madt: struct = {
   ...
   controllers: byte[header.length / 4 - (sizeof header + sizeof lapic + sizeof flags) / 4]
}

Code:

for (
   controller: acpi_madt_header* = acpi_madt_header*(madt->controllers&);
   controller <= madt->controllers[-1]&; // since array bounds are checked, use negative indices to index from the end- could be other options
   controller = acpi_madt_header*(byte*(controller) + controller->length) // yet another place where a modified abi would be nice
) {
   ...
}
How does the "controller <= madt->controllers[-1]&;" make sense? Should I translate it into "stop looping if the starting address of the current controller is not lower than or equal to the starting address of the last entry in a heterogeneous array that isn't a heterogeneous array"?

Wouldn't it make more sense to do "stop looping if the address of the "length" field in the controller structure would be past the end of the parent "acpi_madt" structure; and (if the first check passes and you can safely use the length field of the controller structure) also stop looping if the "controller start address plus controller length field" is beyond the size of the parent acpi_madt structure"?
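
The two-step check described above might look like this in C++. It is a sketch under stated assumptions: MADT entries start at offset 44 (36-byte header, 4-byte local APIC address, 4-byte flags), type 0 carries its APIC ID in byte 3 and type 1 carries its I/O APIC ID in byte 2. `walk_madt` and `MadtEntry` are hypothetical names, and `std::vector` is used only to make the result easy to inspect; a freestanding kernel would print instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MadtEntry {
    std::uint8_t type;
    std::uint8_t apic_id;
};

constexpr std::size_t kMadtEntriesOffset = 44;

// Collects local APIC (type 0) and I/O APIC (type 1) IDs, skipping other
// entry types. Returns false as soon as an entry would leave the table.
inline bool walk_madt(const std::uint8_t* madt, std::size_t madt_length,
                      std::vector<MadtEntry>& found) {
    if (madt_length < kMadtEntriesOffset) {
        return false;
    }
    std::size_t offset = kMadtEntriesOffset;
    while (offset < madt_length) {
        // Check 1: the type and length bytes themselves must be in bounds.
        if (madt_length - offset < 2) {
            return false;
        }
        const std::uint8_t type = madt[offset];
        const std::uint8_t length = madt[offset + 1];
        // Check 2: the whole entry must fit; length < 2 would never advance.
        if (length < 2 || length > madt_length - offset) {
            return false;
        }
        if (type == 0 && length >= 4) {
            found.push_back({type, madt[offset + 3]});
        } else if (type == 1 && length >= 3) {
            found.push_back({type, madt[offset + 2]});
        }
        offset += length;
    }
    return true;
}
```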
Rusky wrote:
Brendan wrote:
  • the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
How is that potentially wrong? Given a correct RSDT, which is the firmware and/or bootloader's responsibility, the program knows the size of the table by reading rsdt->header.length (since the size of the rsdt is based on it). It's just more implicit than you may be used to.
Given a correct RSDT? Nobody said it's a correct RSDT.
Rusky wrote:As for how the compiler responds to problems, my preference would be a compile-time error. For example, if inside that new for loop above, controller is cast to something (acpi_madt_header* even) that could permit access outside the bounds of controllers, the programmer has to add a check themselves before dereferencing it. Because that check is necessary anyway if you're willing to expect incorrectly-sized tables, this is just the compiler and language semantics reminding you of potential problems before they come up in hard-to-find bugs at runtime.
I don't think that the ability to do compile time enforcement of explicit run-time checking is realistic either (another "sounds easy to do until you attempt to implement a compiler that does it" feature).


Cheers,

Brendan