Re: Why are ASM hobby OS more successful than other language
Posted: Tue Dec 13, 2011 3:59 pm
turdus wrote: You totally does not understand
It's "do", and the feeling is mutual.
Code:
struct A { int a; };
struct B : public A { int b; };
A *as = new B[10];
Rusky wrote: First all the problems were unnamed, but then when I named them I didn't add anything?
Now this post actually adds something tangible.
Quote: null pointers - don't include null as a valid value in the default pointer type; replace their use for optional types with an Option type, replace their use as sentinels with array lengths (see below), etc.
As I said before, a NullPointerException goes a long way towards handling what was fatal to any C app. boost::optional exists, std::vector and std::string exist.
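A minimal sketch of the Option-type idea in C++17, for readers who want to see it concretely. It uses std::optional, which was standardized after this thread (boost::optional was the equivalent at the time). The names rsdt_address_from_boot_info and address_or_default are hypothetical, as is the convention that a raw value of 0 means "no tables".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: a boot loader reports the RSDT address, or 0 if none.
// Returning std::optional makes the "absent" case part of the type.
std::optional<std::uint32_t> rsdt_address_from_boot_info(std::uint32_t raw) {
    if (raw == 0) {
        return std::nullopt;  // assumption: 0 means no ACPI tables were reported
    }
    return raw;
}

// Callers have to unwrap the value; value_or makes the fallback explicit.
std::uint32_t address_or_default(std::uint32_t raw, std::uint32_t fallback) {
    return rsdt_address_from_boot_info(raw).value_or(fallback);
}
```

This only takes the edge off the problem: dereferencing an empty std::optional with operator* is still undefined behavior, so the type encourages the check rather than enforcing it.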
Quote: array bounds checking - specify array size as part of its type, don't let arrays decay into pointers so the type is kept across calls/etc.,
std::vector.
Quote: ...let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).
Point to you. I'm not sure how good C++ compiler warnings are in this regard.
Quote: null-terminated strings - strings are arrays of chars, so their length gets stored with their type. It can thus be reified in the program without looping over the string, and there are no more security holes by default with strcpy et al.
std::string.
Quote: uninitialized values/dangling pointers - use typestate to make accessing an uninitialized value, an already-freed pointer, or otherwise non-useful value illegal.
-Wall -Wextra -Werror goes a looong way towards this end.
Quote: undefined behavior - a lot of undefined behavior should instead be illegal - violating sequence points is trivial for the compiler to detect, using undefined values or freed pointers is solvable (see above).
-Wall -Wextra -Werror plus any of std::*_ptr again don't solve the problem, but take much of the edge off it.
Quote: implicit type conversions - make far, far fewer of them so that accidentally losing information or causing undefined behavior is impossible.
Here, C++ wins over C without moving a finger.
Quote: co/contravariance - this is easier to avoid in C, but this code should be illegal and is not even given a warning (again, by default)...
ACK. While it is easy to avoid the issue, the fact that the compiler turns a blind eye is bad.
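To spell out why the `A *as = new B[10];` snippet is a problem: pointer arithmetic on an A* steps by sizeof(A), while the objects in the array are sizeof(B) apart, so `as[1]` would land inside the first element (and is undefined behavior in C++). The sketch below only inspects sizes and convertibility, so it never performs the bad access. The names strides_differ, raw_pointer_converts and vector_converts are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

struct A { int a; };
struct B : public A { int b; };

// The element strides differ, so indexing a B array through an A* would
// point into the middle of an element.
constexpr bool strides_differ() { return sizeof(A) != sizeof(B); }

// B* converts to A* implicitly, which is why the original snippet compiles
// without complaint.
constexpr bool raw_pointer_converts = std::is_convertible<B*, A*>::value;

// Containers do not have that hole: std::vector<B> and std::vector<A> are
// unrelated types, so no implicit conversion exists between them.
constexpr bool vector_converts =
    std::is_convertible<std::vector<B>*, std::vector<A>*>::value;
```

Using a container instead of a raw array is one way to "avoid the issue", as the reply puts it, though the raw-pointer form still compiles silently.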
Quote: string-based macro system - Lisp has had real macros since the 70s, and nasm has a good macro system even for assembly. Why are we still pasting strings together in C? Why not bits of AST that are immune to e.g. precedence issues that force parenthesizing every argument?
Basically, you are free to use whatever macro system you want to use, even in C. You don't have to rely on the C preprocessor.
Quote: A good language to look at that attempts some of these in a systems programming context is Rust.
And here we get right down to the point: I admit that C/C++ have a couple of weak spots. Some of them got better over time, some remain firmly rooted in their 197x / 198x heritage.
Rusky wrote: First all the problems were unnamed, but then when I named them I didn't add anything? Solutions are pretty widely available, as I said, but here are some possibilities anyway:
Let's have a practical example.
Brendan wrote: Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT.
Finding the base in the kernel is simple enough. A few lines of assembly is all that is needed. It gets more complicated if paging is enabled, since then there is a need to mess with paging.
Brendan wrote: During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".
This function should:
- handle the "no ACPI tables present" case (e.g. "if(address_of_ACPI_tables == NULL)")
- check the checksum for the RSDT
- display the "OEMID" string from the RSDT
- for each table pointed to by the array of pointers in the RSDT, it should:
  - check the checksum for that table, and:
    - if the checksum is OK, display a message that includes the table's signature
    - if the checksum failed, display a message that says the checksum failed
- if all tables had valid checksums, then for each table pointed to by the array of pointers in the RSDT, it should:
  - if the signature is known, call a function to parse that table based on the table's signature (e.g. call one function if the signature is 'APIC', a different function if the signature is 'FACP' (the FADT), etc). You only need to provide one of these functions for parsing the MADT (table signature 'APIC') - assume the rest are "to be implemented later"
  - if the signature is not known, skip the table
We have ACPICA for that.
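For reference, the checksum and signature steps in the list above can be sketched in plain C++ over a byte buffer. This is an illustration under stated assumptions, not anyone's kernel code: it assumes the standard 36-byte ACPI table header (signature at offset 0, little-endian length at offset 4), that a table is valid when all of its bytes sum to zero modulo 256, and that RSDT entries are 32-bit pointers. All function names are hypothetical, and the sketch works on an in-memory copy rather than on physical addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kAcpiHeaderSize = 36;  // standard ACPI table header

// A table is valid when all of its bytes sum to zero modulo 256.
bool acpi_checksum_ok(const std::uint8_t *table, std::size_t length) {
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < length; ++i) {
        sum = static_cast<std::uint8_t>(sum + table[i]);
    }
    return sum == 0;
}

// Reads the little-endian 32-bit "length" field at offset 4 of the header.
std::uint32_t acpi_table_length(const std::uint8_t *table) {
    return static_cast<std::uint32_t>(table[4]) |
           (static_cast<std::uint32_t>(table[5]) << 8) |
           (static_cast<std::uint32_t>(table[6]) << 16) |
           (static_cast<std::uint32_t>(table[7]) << 24);
}

// Compares the 4-byte signature at offset 0, e.g. "APIC" for the MADT.
bool acpi_signature_is(const std::uint8_t *table, const char (&sig)[5]) {
    return std::memcmp(table, sig, 4) == 0;
}

// Number of 32-bit table pointers in an RSDT, or 0 if the length is malformed
// (shorter than a header, or leaving room for only part of a pointer).
std::size_t rsdt_entry_count(std::uint32_t length) {
    if (length < kAcpiHeaderSize || (length - kAcpiHeaderSize) % 4 != 0) {
        return 0;
    }
    return (length - kAcpiHeaderSize) / 4;
}
```

A real parse_ACPI_tables would still need to map each physical address before reading it, which this sketch deliberately leaves out.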
Brendan wrote: The latest version of the ACPI specification can be found here, but it doesn't matter if you use an earlier version of the specification if you want.
Understanding ACPI and building an assembler-based handler seemed too complex an issue with too little use, so I put a lot of work into porting the clib to kernel-space instead.
Brendan wrote: Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT.
rdos wrote: Finding the base in the kernel is simple enough. A few lines of assembly is all that is needed. It gets more complicated if paging is enabled, since then there is a need to mess with paging.
Assume the kernel can be booted from boot loaders designed for PC BIOS and UEFI. For UEFI you ask the firmware for the address of the ACPI tables and don't do a stupid (cache thrashing) search for them.
Brendan wrote: During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".
rdos wrote: We have ACPICA for that.
You missed the entire point of the exercise.
Brendan wrote: Assume the kernel can be booted from boot loaders designed for PC BIOS and UEFI. For UEFI you ask the firmware for the address of the ACPI tables and don't do a stupid (cache thrashing) search for them.
That would make sense then.
Brendan wrote: You missed the entire point of the exercise.
The idea is to get Rusky to show that his idea of a system programming language is actually capable of being used for system programming. Rusky isn't silly though - if I asked him to re-implement ACPICA in a language with strong type checking, array bounds checking, no "void *", no NULL, etc; then he'd probably be smart enough to refuse.
Quote: ...let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).
Actually, it's "check when the compiler cannot prove it's safe". In other words, compile-time checking is possible only for a subset of array accesses. We demonstrated that fact earlier in another thread.
Code:
template <typename T, int len>
class array {
private:
    T data[len];
public:
    T &operator[](int i) { return data[i]; }
};
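The class above carries the length in its type but checks nothing. As a rough sketch of the "compile-time where provable, runtime otherwise" split being discussed, a variant could offer both a constant-index accessor and a checked dynamic one. The names checked_array, get and at are hypothetical, and a kernel would probably swap the exception for its own panic hook.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

template <typename T, std::size_t Len>
class checked_array {
    T data_[Len] = {};
public:
    // Constant index: verified by the compiler, no runtime cost.
    template <std::size_t I>
    T &get() {
        static_assert(I < Len, "index out of bounds");
        return data_[I];
    }
    // Dynamic index: the compiler cannot prove safety here, so check at runtime.
    T &at(std::size_t i) {
        if (i >= Len) throw std::out_of_range("checked_array::at");
        return data_[i];
    }
    constexpr std::size_t size() const { return Len; }
};
```

With this shape, `arr.get<4>()` on a four-element array fails to compile, while `arr.at(n)` with an unknown `n` pays for one comparison.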
Code:
int main() { char c; int a = 0xffff0000; c = a; return c; }
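The one-liner above compiles because `c = a` silently narrows an int to a char. In C++11 and later, list-initialization (`char c{a};`) is specified to reject such narrowing and compilers diagnose it. Where a runtime conversion is really wanted, a checked helper makes the loss explicit. The sketch below is one possible shape, with the hypothetical name checked_narrow_to_char, using std::optional from C++17.

```cpp
#include <cassert>
#include <limits>
#include <optional>

// Hypothetical helper: convert int to char only when the value is representable.
std::optional<char> checked_narrow_to_char(int value) {
    if (value < std::numeric_limits<char>::min() ||
        value > std::numeric_limits<char>::max()) {
        return std::nullopt;  // the conversion would lose information
    }
    return static_cast<char>(value);
}
```

The range test uses std::numeric_limits<char>, so it behaves correctly whether plain char is signed or unsigned on the target.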
Code:
// this implements a rudimentary form of inheritance polymorphism and/or algebraic data types
// depending on your preference, these could be in the language so long as they have a well-defined abi
// this would probably decrease the amount of casting, which would be good
acpi_header: struct = {
    signature: char[4]
    length: uint32
    revision: byte
    checksum: byte
    oem_id: char[6]
    oem_table_id: char[8]
    oem_revision: uint32
    creator_id: char[4]
    creator_revision: char[4]
}
acpi_rsdt: struct = {
    header: acpi_header = { signature = "RSDT" }
    // this is not the most elegant solution and is probably one thing I would change
    tables: acpi_header*[header.length / 4 - sizeof header / 4]
}
acpi_madt: struct = {
    header: acpi_header = {
        signature = "APIC"
        revision = 3
    }
    lapic: apic_lapic*
    flags: uint32
    controllers: acpi_madt_header*[header.length / 4 - sizeof header / 4]
}
acpi_madt_header: struct = {
    type: byte
    length: byte
}
acpi_madt_lapic: struct = {
    header: acpi_madt_header = {
        type = 0
        length = sizeof acpi_madt_lapic
    }
    processor_id: byte
    apic_id: byte
    flags: uint32
}
acpi_madt_ioapic: struct = {
    header: acpi_madt_header = {
        type = 1
        length = sizeof acpi_madt_ioapic
    }
    ioapic_id: byte
    reserved: byte = 0
    address: apic_ioapic*
    interrupt_start: uint32
}
// the bootloader is responsible for providing an optional pointer to the rsdt
// it must conform to the language's abi, potentially exactly the same as a C acpi_rsdt*
// the memory pointed to, if not null, must be verified or trusted by the bootloader to conform to the type above
kernel_entry: (..., optional_rsdt: acpi_rsdt?, ...) -> () = {
    ...
    // before dereferencing optional_rsdt, it must be null-checked
    // if the pointer were not optional, it could be typed "acpi_rsdt*", again with the bootloader responsible for self-verification and abi compatibility
    match (optional_rsdt) {
        real_rsdt: acpi_rsdt* -> parse_acpi_tables(real_rsdt)
        _ -> /* no rsdt */
    }
    ...
}
checksum: (contents: byte[len]*, checksum: byte): (bool) = {
    // not sure this is the correct way to calculate the checksum, but it's illustrative enough...
    // contents is a pointer to an array of bytes, inferred to be the size of an acpi_rsdt
    // sum is a standard, generic function of type array of addables to that same type
    checksum: byte = sum(contents)
    return checksum == 0
}
parse_acpi_tables: (rsdt: acpi_rsdt*) -> (results, errors, etc) = {
    assert(checksum(byte[]*(rsdt)))
    // straightforward enough, just note that print has varargs of type printable*
    print("acpi oemid: "&, rsdt->header.oem_id&, "\n"&)
    pass: bool = true
    for (table in rsdt->tables) { // this could be a macro, or whatever else you want
        match (checksum(byte[table->length]*(table))) {
            true -> print(table->signature&, " passed\n"&)
            false -> {
                print(table->signature&, " failed\n"&)
                pass = false
            }
        }
    }
    // based on your wording I think this is what you meant, but this could be folded into the above loop instead
    if (pass) {
        for (table in rsdt->tables) {
            // matching on strings is possible because they're just first-class array values
            match (table->signature) {
                "APIC" -> parse_madt(acpi_madt*(table))
                ...
            }
        }
    }
    ...
}
parse_madt: (madt: acpi_madt*) -> (ditto) = {
    ...
    for (controller in madt->controllers) {
        // this is a place where algebraic data types/dynamic_cast-equivalent would help
        // instead of matching on the type using symbols, it would match based on type, like:
        // lapic: acpi_madt_lapic* -> ...
        // ioapic: acpi_madt_ioapic* -> ...
        match (controller->header.type) {
            // these should be symbolic constants.
            0 -> print("lapic "&, acpi_madt_lapic*(controller)->apic_id&, "\n"&)
            1 -> print("ioapic "&, acpi_madt_ioapic*(controller)->ioapic_id&, "\n"&)
            ...
        }
    }
    ...
}
Rusky wrote: Brendan's challenge is somewhat biased because ACPI is designed for C, and its ABI would likely be different if another language had been dominant when it was written (yes, this is an advantage of C, but we all agree on that point).
It's just parsing data. You'd have similar problems parsing most (non-text) file formats, except that "pointer to something" would be "offset of something within file" (and you'd still need type casts, just they'd be like "my_type *foo = (my_type *)&byte_array[offset]" instead).
Rusky wrote: I also don't have an existing language to use so the syntax will probably be pretty rough and not everything will be implemented the best way possible, but I'll give it a shot:
This code is like C, except that:
Brendan wrote: It's just parsing data.
Yes, but which data? You can specify sizes and such in a variety of different ways, and using e.g. a length in table entries rather than a byte length or null terminator makes it nicer for this imaginary language.
Brendan wrote: This code is like C, except that:
- some keywords were renamed (e.g. "match" instead of "switch")
- semicolons were replaced with end of line, and a few other trivial changes in syntax
- the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
- you stole "cout" from C++ (and renamed it to "print")
- there's support for returning multiple values from functions
This was mostly to emphasize the difference from C. By changing the little things it makes it obvious it's not the same, at a glance. What did you expect?
Brendan wrote: ...
- the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
I also don't think that the ability to handle arrays/lists of mixed types is realistic; especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct;}"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language?
My example does not use heterogeneous arrays, only pointers to "base classes." Did I misunderstand the ACPI spec? If so, and the controllers are in-line, try this:
Code:
acpi_madt: struct = {
    ...
    controllers: byte[header.length / 4 - (sizeof header + sizeof lapic + sizeof flags) / 4]
}
Code:
for (
    controller: acpi_madt_header* = acpi_madt_header*(madt->controllers&);
    controller <= madt->controllers[-1]&; // since array bounds are checked, use negative indices to index from the end- could be other options
    controller = acpi_madt_header*(byte*(controller) + controller->header.length) // yet another place where a modified abi would be nice
) {
    ...
}
Brendan wrote: - the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
How is that potentially wrong? Given a correct RSDT, which is the firmware and/or bootloader's responsibility, the program knows the size of the table by reading rsdt->header.length (since the size of the rsdt is based on it). It's just more implicit than you may be used to.
Brendan wrote: For the code itself, the amount of validation done is lacking. For example, what happens if the MADT is 200 bytes but has a 16-byte structure that starts at offset 198? What happens if the length/size of the RSDT is odd (and there's only space for part of a pointer at the end of the table of pointers)? Does the compiler automatically generate code to check these things; and if it does, how does it report errors? Should the entire thing be wrapped in "try/catch" exception handling?
Considering I probably misunderstood the MADT layout, your question should be addressed at this point. As for how the compiler responds to problems, my preference would be a compile-time error. For example, if inside that new for loop above, controller is cast to something (acpi_madt_header* even) that could permit access outside the bounds of controllers, the programmer has to add a check themselves before dereferencing it. Because that check is necessary anyway if you're willing to expect incorrectly-sized tables, this is just the compiler and language semantics reminding you of potential problems before they come up in hard-to-find bugs at runtime.
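For comparison, the explicit checks being argued about look roughly like this when written by hand in C++. It is a sketch under assumptions: the MADT entries start at byte 44 (the 36-byte header plus two 32-bit fields), each entry begins with a one-byte type and a one-byte length, and the function name count_madt_entries and its -1 error convention are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMadtEntriesOffset = 44;  // 36-byte header + two 32-bit fields

// Walks the variable-length MADT entries in `madt` (total size `length`).
// Returns the number of entries, or -1 if any entry would overrun the table.
int count_madt_entries(const std::uint8_t *madt, std::size_t length) {
    if (length < kMadtEntriesOffset) return -1;
    int count = 0;
    std::size_t offset = kMadtEntriesOffset;
    while (offset < length) {
        if (length - offset < 2) return -1;          // no room for type + length
        const std::size_t entry_len = madt[offset + 1];
        if (entry_len < 2) return -1;                // the walk would never advance
        if (entry_len > length - offset) return -1;  // entry runs past the table end
        ++count;
        offset += entry_len;
    }
    return count;
}
```

Brendan's case of a 200-byte MADT with a 16-byte structure starting at offset 198 is rejected by the third check, since only 2 bytes remain at that point.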
Brendan wrote: This code is like C, except that:
- some keywords were renamed (e.g. "match" instead of "switch")
- semicolons were replaced with end of line, and a few other trivial changes in syntax
- the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
- you stole "cout" from C++ (and renamed it to "print")
- there's support for returning multiple values from functions
Rusky wrote: This was mostly to emphasize the difference from C. By changing the little things it makes it obvious it's not the same, at a glance. What did you expect?
I would've avoided "change for the sake of change", and tried to use a syntax that 50% of forum members would be more likely to understand, to avoid the need for lots of explanations/comments.
Rusky wrote: However, note that match is a little bit more flexible in that it doesn't switch strictly on value but also on things like the presence of a value in an optional type. It's pattern matching, borrowed from functional programming btw.
Ah - now I understand. The underscore character on line 70 ("_ -> /* no rsdt */") is your renamed NULL.
Brendan wrote: ...
- the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
I also don't think that the ability to handle arrays/lists of mixed types is realistic; especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct;}"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language?
Rusky wrote: My example does not use heterogeneous arrays, only pointers to "base classes." Did I misunderstand the ACPI spec? If so, and the controllers are in-line, try this:
Code:
acpi_madt: struct = {
    ...
    controllers: byte[header.length / 4 - (sizeof header + sizeof lapic + sizeof flags) / 4]
}
Code:
for (
    controller: acpi_madt_header* = acpi_madt_header*(madt->controllers&);
    controller <= madt->controllers[-1]&; // since array bounds are checked, use negative indices to index from the end- could be other options
    controller = acpi_madt_header*(byte*(controller) + controller->header.length) // yet another place where a modified abi would be nice
) {
    ...
}
How does the "controller <= madt->controllers[-1]&;" make sense? Should I translate it into "stop looping if the starting address of the current controller is not lower than or equal to the starting address of the last entry in a heterogeneous array that isn't a heterogeneous array"?
Brendan wrote: - the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
Rusky wrote: How is that potentially wrong? Given a correct RSDT, which is the firmware and/or bootloader's responsibility, the program knows the size of the table by reading rsdt->header.length (since the size of the rsdt is based on it). It's just more implicit than you may be used to.
Given a correct RSDT? Nobody said it's a correct RSDT.
Rusky wrote: As for how the compiler responds to problems, my preference would be a compile-time error. For example, if inside that new for loop above, controller is cast to something (acpi_madt_header* even) that could permit access outside the bounds of controllers, the programmer has to add a check themselves before dereferencing it. Because that check is necessary anyway if you're willing to expect incorrectly-sized tables, this is just the compiler and language semantics reminding you of potential problems before they come up in hard-to-find bugs at runtime.
I don't think that the ability to do compile time enforcement of explicit run-time checking is realistic either (another "sounds easy to do until you attempt to implement a compiler that does it" feature).