What assembler syntax is often used to create an OS
Posted: Thu Feb 11, 2021 12:41 am
by cytorak87
Hello everyone. Sorry to create a stupid post, but I would like to ask the professionals. What is the most commonly used assembly syntax to create an OS? And please tell me, why do they say that AT&T syntax is not for people? Thanks to everyone who responds!
Re: What assembler syntax is often used to create an OS
Posted: Wed Feb 17, 2021 7:40 pm
by Octocontrabass
cytorak87 wrote:What is the most commonly used assembly syntax to create an OS?
Most OSes aren't written in assembly.
If you're asking about the most popular syntax for the small bits of assembly used while writing an OS in a higher-level language, I don't know if anyone has done a survey to find out.
I prefer NASM syntax, although it isn't an option for inline assembly. For inline assembly, I usually use AT&T syntax just because it's the default. (I don't write enough inline assembly to bother with switching to Intel syntax.)
cytorak87 wrote:And please tell me, why do they say that AT&T syntax is not for people?
People sometimes need visual cues to remember how the base, index, and scale are used to calculate effective addresses. People sometimes need to look up instructions in Intel's (or AMD's) manual. AT&T syntax has unhelpful punctuation in effective addresses and changes many of the instruction mnemonics.
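To make the base/index/scale point concrete, here is a small C sketch of the arithmetic behind both notations: AT&T `disp(base,index,scale)` and Intel `[base+index*scale+disp]` describe the same effective address. The function name `effective_address` is hypothetical and only does the arithmetic; it touches no memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling x86 effective-address arithmetic.
   AT&T:  disp(base, index, scale)     e.g.  movl 8(%rbx,%rcx,4), %eax
   Intel: [base + index*scale + disp]  e.g.  mov eax, [rbx+rcx*4+8]
   Parameters follow the order AT&T writes them. Wraps modulo 2^64,
   like the hardware does in 64-bit mode. */
static uint64_t effective_address(int64_t disp, uint64_t base,
                                  uint64_t index, unsigned scale) {
    /* The SIB byte can only encode these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint64_t)disp;
}
```

With rbx = 0x1000 and rcx = 2, both `8(%rbx,%rcx,4)` and `[rbx+rcx*4+8]` give `effective_address(8, 0x1000, 2, 4)`, i.e. 0x1010. In the Intel form the `*` and `+` spell out which register is scaled; in the AT&T form you have to remember the position inside the parentheses.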
Re: What assembler syntax is often used to create an OS
Posted: Wed Feb 17, 2021 10:54 pm
by nullplan
There are essentially two schools of thought on the matter: Those who use NASM/FASM or similar for their assembler files are OK with using a tool outside of binutils to build their OS, and they like Intel syntax more (there are technical reasons for the latter, but it still comes down to preference). Then there are those who use GAS for their assembler files because they don't want to impose a dependency beyond the already necessary binutils on their users, and they don't mind AT&T syntax. That would be my school.
AT&T syntax can be more specific than Intel syntax. For example, you can dereference a register by specifying it in the ModR/M byte, or by specifying it in a SIB byte (as an index with no base). There is almost never a reason to do the latter, but the TLS spec does say that some instructions should be encoded this way, so that they have exactly the length the linker expects when it rewrites them upon spotting an optimization opportunity. In AT&T syntax, there is a difference between "(%rax)" and "(,%rax)", while in Intel syntax both translate to "[rax]".
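A rough way to see why the two spellings matter is to encode them by hand. The C sketch below (the helper `encode_mov_load` is hypothetical) builds the bytes for a 64-bit register load either through the ModR/M byte alone or through ModR/M plus a SIB byte with no base, the index scaled by 1 and a zero 32-bit displacement. Loading rcx from rax, this should give `48 8B 08` (3 bytes) versus `48 8B 0C 05 00 00 00 00` (8 bytes), the two forms that `mov (%rax), %rcx` and `mov (,%rax), %rcx` are expected to assemble to; `objdump -d` on GAS output is the authoritative check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: encode "mov dst, [src]" (64-bit) in one of two forms.
     use_sib == 0: ModR/M only                     AT&T: mov (%src), %dst
     use_sib == 1: ModR/M + SIB, no base register,
                   src as index*1, disp32 = 0      AT&T: mov (,%src), %dst
   Registers are numbered 0..7 (rax=0, rcx=1, ..., rdi=7); r8-r15 (REX.R/X/B)
   are deliberately not handled. Returns the number of bytes written. */
static size_t encode_mov_load(uint8_t *out, unsigned dst, unsigned src,
                              int use_sib) {
    size_t n = 0;
    assert(dst < 8 && src < 8);
    out[n++] = 0x48;  /* REX.W: 64-bit operand size */
    out[n++] = 0x8B;  /* MOV r64, r/m64 */
    if (!use_sib) {
        /* rm=100 means "SIB follows", rm=101 means RIP-relative in 64-bit
           mode, so rsp/rbp cannot be used in this short form. */
        assert(src != 4 && src != 5);
        out[n++] = (uint8_t)((0u << 6) | (dst << 3) | src);  /* mod=00 */
    } else {
        assert(src != 4);  /* index=100 means "no index", so rsp is excluded */
        out[n++] = (uint8_t)((0u << 6) | (dst << 3) | 4u);   /* rm=100: SIB */
        /* scale=00 (x1), index=src, base=101 with mod=00: no base, disp32 */
        out[n++] = (uint8_t)((0u << 6) | (src << 3) | 5u);
        for (int i = 0; i < 4; i++)
            out[n++] = 0x00;  /* disp32 = 0 */
    }
    return n;
}
```

Both encodings load the same value; only the length differs, which is exactly what matters when a linker later patches the instruction in place.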
But outside of these esoteric examples, there is little reason to use AT&T syntax beyond "it's what GAS uses". That is why my assembler files for x86 are written in AT&T syntax. No other architecture (to my knowledge, anyway) has two competing syntaxes like this.
Re: What assembler syntax is often used to create an OS
Posted: Wed Feb 17, 2021 11:32 pm
by wxwisiasdf
I have seen NASM syntax used for pure-assembly OSes now and then. I haven't yet had the luck of seeing an OS written entirely for GAS :P.
Anyway, it's just preference: whether you think in source-first order, as in GAS's AT&T syntax ("mov %ax, %bx" moves ax into bx), or destination-first order, as in NASM's Intel syntax ("mov ax, bx" moves bx into ax). It's just personal choice.
Re: What assembler syntax is often used to create an OS
Posted: Mon Mar 01, 2021 11:20 pm
by Octocontrabass
nullplan wrote:In AT&T syntax, there is a difference between "(%rax)" and "(,%rax)", while in Intel syntax both translate to "[rax]".
There is a difference between "[rax]" and "[rax*1]" in Intel/NASM syntax too. GAS always treats these as distinct encodings, while NASM translates both to "[rax]" unless you specify "[nosplit rax*1]".
Re: What assembler syntax is often used to create an OS
Posted: Mon Mar 01, 2021 11:31 pm
by clementttttttttt
Linux uses AT&T syntax, while (I think) Windows uses Intel syntax. I personally use Intel syntax, though, because there is a lot less to type.
Re: What assembler syntax is often used to create an OS
Posted: Tue Mar 02, 2021 4:24 am
by sham1
It really is just personal preference. But for x86 and AMD64 I personally prefer the Intel syntax (the NASM variation specifically), since it better corresponds to the actual opcode bytes being produced.
It's also the syntax used by the Intel and AMD manuals, which also helps.
But as others in this thread have said, most OSes only contain some fragments of assembly, and the rest is written in some high-level language like C or whatever.
Re: What assembler syntax is often used to create an OS
Posted: Tue Mar 02, 2021 4:50 am
by Korona
Intel syntax is nicer for humans to write. However, since the amount of assembly in most OSes is so tiny, that alone does not justify introducing another x86-only dependency. More importantly, though, you effectively have to use AT&T syntax in inline asm anyway, so it's nice to keep all asm code consistent.
Note that GNU as also supports an Intel-like syntax, but you cannot reliably use it in inline asm (try naming a global variable "rax" in C code and see what happens with -masm=intel).
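For readers who have not used GCC-style inline asm, a minimal sketch of what "AT&T in inline asm" looks like in practice, assuming GCC or Clang on x86-64 (the function name `sub_att` is hypothetical). `subq %1, %0` computes `%0 = %0 - %1`, so operand order is observable in the result; on other targets the sketch falls back to plain C so it still builds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: subtraction via inline asm in the default AT&T
   dialect. AT&T order is "source, destination", so "subq %1, %0" means
   %0 = %0 - %1. Under -masm=intel the template would instead have to be
   "sub %0, %1" (destination first). */
static int64_t sub_att(int64_t a, int64_t b) {
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("subq %1, %0"
            : "+r"(a)      /* %0: read and written */
            : "r"(b)       /* %1: read only */
            : "cc");       /* SUB modifies the flags */
    return a;
#else
    return a - b;          /* portable fallback on non-x86-64 targets */
#endif
}
```

`sub_att(10, 3)` returns 7; swapping the operands in the template would silently return -7, which is the kind of mistake mixing the two dialects in one code base invites.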
Re: What assembler syntax is often used to create an OS
Posted: Tue Mar 02, 2021 9:32 am
by sj95126
These days I prefer AT&T syntax, even if I can't quite quantify why. I originally learned Intel syntax a long time ago, but decided to switch to AT&T for my OS, and it just felt more natural. The fact that GCC inline assembly uses it, and that Intel syntax doesn't exist for other architectures, made it seem like a good idea. I also preferred the uniformity of staying in a single toolchain.
There definitely are some areas where AT&T can be problematic. You can easily mix up the base and index registers, while in Intel syntax it's very obvious what you intend. But from a syntactical point of view, it seems more logical to me that registers are prefixed with %. They aren't variables or memory references, so I think they should be different.
Everyone has their preferences, and there's really no right or wrong. You're the one who has to write it and support it. You want to choose a tool that works with you, not fights you.
Re: What assembler syntax is often used to create an OS
Posted: Tue Mar 02, 2021 12:37 pm
by nullplan
Octocontrabass wrote:There is a difference between "[rax]" and "[rax*1]" in Intel/NASM syntax too. GAS always treats these as distinct encodings, while NASM translates both to "[rax]" unless you specify "[nosplit rax*1]".
Interesting. This is the first time I've seen that particular keyword (I only knew "strict" before). So I guess you can do everything with NASM you can do with GAS. Well, almost. I did try one thing the other day and could not figure it out. But some background first.
In musl (and some other places), you will often find invocations of a macro called "weak_alias". I wanted to know how exactly it works. It creates a new symbol, aliases it to an existing symbol, and makes it weak. For the C language, it sets the linkage to "external", and the sum of all of this causes the compiler to emit references to the symbol as external calls. The linker will then bind those references to the weak definition, unless it encounters a strong definition of the same symbol. So this could, for example, be used in the definition of exit() to call all the atexit() functions, but only if atexit() is even linked in. If I resolve the macro and the typeof(), it basically looks like this in C:
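#include <stdlib.h> /* declares _Exit() */

/* Fallback used when atexit.o is not linked in. */
static void dummy(void) {}

/* Weak alias: resolves to dummy() unless a strong definition is linked in. */
extern void __funcs_on_exit(void) __attribute__((weak, alias("dummy")));

_Noreturn void exit(int code) {
    __funcs_on_exit();
    _Exit(code);
}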
Simplified, of course. There is other stuff in there. Then atexit.c contains a definition of atexit() and a strong definition of __funcs_on_exit(). Therefore, in a statically linked program where atexit() is never called, the linker will never pull in atexit.o, and will bind the reference in exit() to the local weak definition. Since the compiler does not create the object file directly, I wanted to see what this looks like in assembler. Essentially this:
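    .text
dummy:
    retq

    # Weak global alias: resolves to dummy unless a strong
    # definition (e.g. from atexit.o) is linked in.
    .global __funcs_on_exit
    .weak __funcs_on_exit
    .set __funcs_on_exit, dummy

    .global exit
exit:
    pushq %rdi             # save exit code; also realigns the stack for the call
    callq __funcs_on_exit  # emitted with a relocation, resolved at link time
    popq %rdi
    jmpq _Exit             # tail call: _Exit(code)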
Again, I'm simplifying. I wanted to try to write something like this with NASM, but I didn't get very far. Instead of the ".set", I can just define the weak symbol on the same line, but I could never get the external reference to work.
    global __funcs_on_exit:weak
dummy:
__funcs_on_exit:
    ret

    global exit:function
    extern _Exit
exit:
    push rdi
    call __funcs_on_exit
    pop rdi
    jmp _Exit
Unfortunately, NASM ends up binding the call to __funcs_on_exit internally instead of emitting a relocation, so a strong definition from another object file can never take over. I never managed to get exactly the same result. If I make a weak reference instead, it ends up being 0 if not fulfilled. I don't want it to be 0 if not fulfilled; I want it to be "dummy", whatever that ends up being.
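The difference between the two behaviours described above can be sketched in C, assuming GCC or Clang targeting ELF (e.g. Linux); Mach-O does not support the alias attribute. All names here (`optional_hook`, `missing_hook`, `fallback_hook`) are hypothetical. A weak alias always has a target (the fallback), while a plain weak reference with no definition anywhere has address 0 and must be checked before calling.

```c
#include <assert.h>

/* Hypothetical demo; requires an ELF toolchain (GCC/Clang on Linux). */
static int fallback_ran = 0;

static void fallback_hook(void) { fallback_ran = 1; }

/* Weak alias, like musl's weak_alias(): calls resolve to fallback_hook
   unless another object file provides a strong optional_hook. */
extern void optional_hook(void) __attribute__((weak, alias("fallback_hook")));

/* Plain weak reference with no definition anywhere: the linker resolves
   its address to 0, so it must be checked before use. */
extern void missing_hook(void) __attribute__((weak));

int call_optional_hook(void) {
    fallback_ran = 0;
    optional_hook();           /* no NULL check needed: always has a target */
    return fallback_ran;       /* 1 if the fallback ran */
}

int missing_hook_is_null(void) {
    return missing_hook == 0;  /* unresolved weak reference */
}
```

Linked on its own, `call_optional_hook()` returns 1 because nothing overrides the alias, and `missing_hook_is_null()` returns 1 because the weak reference stays unresolved. Adding an object file with a strong `optional_hook` would change the first result without touching this file.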
Re: What assembler syntax is often used to create an OS
Posted: Tue Mar 02, 2021 4:16 pm
by Octocontrabass
nullplan wrote:Unfortunately, NASM ends up binding the __funcs_on_exit internally, not emitting a relocation.
Support for weak symbols was added pretty recently, so I'd guess that's a bug in NASM.
Re: What assembler syntax is often used to create an OS
Posted: Wed Mar 03, 2021 10:15 pm
by cytorak87
Thanks everyone for the answers! You all helped me a lot.
Re: What assembler syntax is often used to create an OS
Posted: Thu Mar 04, 2021 4:37 am
by rdos
I think there is a third alternative: I use MASM/TASM/WASM syntax. It's a bit like NASM, but it has a more intuitive way of distinguishing between loading the address of a variable ("offset var") and loading its value ("var" or "[var]"). NASM is problematic because you don't have to write the brackets, and a bare variable name then defaults to the address, which silently breaks a lot of code written in MASM syntax.
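The operand mismatch described above can be summarized as a translation rule. The C sketch below uses a hypothetical helper, `masm_var_operand_to_nasm`, and assumes its input is an operand naming a data variable in one of three forms: "offset NAME", "[NAME]" or a bare "NAME". Registers, immediates, uppercase "OFFSET" and segment overrides are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite a MASM operand that names a data variable
   into NASM syntax. Only "offset NAME", "[NAME]" and bare "NAME" are
   recognized; anything else is treated as a bare variable name.
   Returns snprintf's result (length of the full output string). */
static int masm_var_operand_to_nasm(const char *in, char *out, size_t outlen) {
    static const char offset_kw[] = "offset ";
    const size_t kwlen = sizeof offset_kw - 1;
    assert(in != NULL && out != NULL && outlen > 0);
    if (strncmp(in, offset_kw, kwlen) == 0)
        /* MASM "offset var" = address of var; NASM writes just "var". */
        return snprintf(out, outlen, "%s", in + kwlen);
    if (in[0] == '[')
        /* Bracketed operand: the value at var in both assemblers. */
        return snprintf(out, outlen, "%s", in);
    /* Bare MASM "var" = value stored at var; NASM needs "[var]". */
    return snprintf(out, outlen, "[%s]", in);
}
```

For example, MASM `mov eax, counter` becomes NASM `mov eax, [counter]`, while MASM `mov eax, offset counter` becomes NASM `mov eax, counter`. Copying the MASM line unchanged into NASM would therefore load the address instead of the value.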
Re: What assembler syntax is often used to create an OS
Posted: Thu Mar 04, 2021 11:03 am
by nullplan
rdos wrote:I think there is a third alternative: I use MASM/TASM/WASM syntax. It's a bit like NASM but uses more intuitive ways of making a difference between loading the address of a variable (offset) and the value []. Nasm is problematic since you don't need to define this and then it defaults to reading the address, which breaks a lot of code written with MASM-syntax.
This does not compute. Code written for MASM will not work directly in NASM? Yes, and code written in C# will not work directly with Java, what is your point? The two are different. Admittedly small differences, but small differences can add up to a lot. The difference between 40°C and 41°C is also very small, but if those are your body temperatures, you'd know the difference.
Re: What assembler syntax is often used to create an OS
Posted: Thu Mar 04, 2021 3:09 pm
by rdos
nullplan wrote:This does not compute. Code written for MASM will not work directly in NASM? Yes, and code written in C# will not work directly with Java, what is your point? The two are different. Admittedly small differences, but small differences can add up to a lot. The difference between 40°C and 41°C is also very small, but if those are your body temperatures, you'd know the difference.
I think the point is that MASM/TASM/WASM syntax has existed far longer than NASM syntax, and so it was the NASM guys that got it wrong and broke stuff.
And the target is the same (the x86 processor), so your comparison between C# and Java does not compute.