What assembler syntax is often used to create an OS

cytorak87 · Post by **cytorak87** » Thu Feb 11, 2021 12:41 am

Hello everyone. Sorry to create a stupid post, but I would like to ask the professionals. What is the most commonly used assembly syntax for creating an OS? And please tell me why they say that AT&T syntax is not for people? Thanks to all who respond!

Octocontrabass · Post by **Octocontrabass** » Wed Feb 17, 2021 7:40 pm

cytorak87 wrote:What is the most commonly used assembly syntax for creating an OS?

Most OSes aren't written in assembly.

If you're asking about the most popular syntax for the small bits of assembly used while writing an OS in a higher-level language, I don't know if anyone has done a survey to find out.

I prefer NASM syntax, although it isn't an option for inline assembly. For inline assembly, I usually use AT&T syntax just because it's the default. (I don't write enough inline assembly to bother with switching to Intel syntax.)

cytorak87 wrote:And please tell me why they say that AT&T syntax is not for people?

People sometimes need visual cues to remember how the base, index, and scale are used to calculate effective addresses. People sometimes need to look up instructions in Intel's (or AMD's) manual. AT&T syntax has unhelpful punctuation in effective addresses and changes many of the instruction mnemonics.
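To make the base/index/scale point concrete, here is a minimal sketch in C of the arithmetic an x86 memory operand stands for. The function name `effective_address` is made up for illustration and does not come from any assembler or from the thread. The comment shows how the same operand is spelled in each syntax.

```c
#include <stdint.h>

/* Hypothetical helper: the address an x86 memory operand refers to.
 *
 *   AT&T:        disp(base, index, scale)     e.g. 16(%rdi,%rcx,8)
 *   Intel/NASM:  [base + index*scale + disp]  e.g. [rdi + rcx*8 + 16]
 *
 * On x86 the scale may only be 1, 2, 4 or 8; this sketch trusts the caller.
 */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp)
{
    /* The displacement is sign-extended before it is added. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

The Intel spelling reads like the formula itself, whereas the AT&T spelling relies on position inside the parentheses, which is the "unhelpful punctuation" referred to above.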

nullplan · Post by **nullplan** » Wed Feb 17, 2021 10:54 pm

There are essentially two schools of thought on the matter: Those that use NASM/FASM or similar for their assembler files are OK with using a tool outside of the binutils to build their OS and like Intel syntax more (there are technical reasons for the latter, but it still comes down to preference). Then there are those that use GAS for their assembler files because they don't want to impose a dependency outside of the already necessary binutils on their users, and they don't mind AT&T syntax. That would be my school.

AT&T syntax can be more specific than Intel syntax. For example, you can dereference a register by specifying it in a ModR/M byte, or by specifying it in a SIB byte. There is almost never a reason for that, but the TLS spec does say that some instructions should be encoded this way, so that they have the correct length and the linker can replace them when it sees an optimization opportunity. In AT&T syntax, there is a difference between "(%rax)" and "(,%rax)", while in Intel syntax both translate to "[rax]".
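The two encodings mentioned above can be sketched by building the bytes by hand in C. This is only an illustration under stated assumptions: the helper names (`modrm`, `sib`, `encode_modrm_only`, `encode_sib_index`) are invented here, the instruction is a 32-bit load into `%eax` using opcode 0x8B, and the index-only operand is written with an explicit scale as `(,%rax,1)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RAX = 0, RM_SIB = 4, BASE_NONE = 5 };

/* ModR/M byte: mod (2 bits) | reg (3 bits) | rm (3 bits). */
uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* SIB byte: log2(scale) (2 bits) | index (3 bits) | base (3 bits). */
uint8_t sib(unsigned scale_log2, unsigned index, unsigned base)
{
    assert(scale_log2 < 4 && index < 8 && base < 8);
    return (uint8_t)((scale_log2 << 6) | (index << 3) | base);
}

/* movl (%rax), %eax : the register is named directly in ModR/M. */
size_t encode_modrm_only(uint8_t *out)
{
    out[0] = 0x8B;                     /* MOV r32, r/m32 */
    out[1] = modrm(0, RAX, RAX);       /* 0x00 */
    return 2;
}

/* movl (,%rax,1), %eax : rax is the SIB index, there is no base,
 * and that form requires a 32-bit displacement (zero here). */
size_t encode_sib_index(uint8_t *out)
{
    out[0] = 0x8B;                     /* MOV r32, r/m32 */
    out[1] = modrm(0, RAX, RM_SIB);    /* 0x04: a SIB byte follows */
    out[2] = sib(0, RAX, BASE_NONE);   /* 0x05: scale 1, index rax, no base */
    out[3] = out[4] = out[5] = out[6] = 0;
    return 7;
}
```

Both instructions load from the same address, but one is 2 bytes long and the other 7, which is why a fixed-length requirement such as the one in the TLS spec cares which form the assembler picks.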

But outside of these esoteric examples, there is little reason to use AT&T syntax beyond "it's what GAS uses". Therefore my assembler files for x86 are written in AT&T syntax. No other architecture (to my knowledge, anyway) does something like that.

wxwisiasdf · Post by **wxwisiasdf** » Wed Feb 17, 2021 11:32 pm

I have seen that NASM syntax is sometimes used for pure-assembly OSes. I haven't had the luck of seeing an OS coded entirely in GAS :P.

Anyway, it's just preference: whether you think "mov ax, bx" should mean "move ax to bx" (the AT&T operand order used by GAS, where it would be written "mov %ax, %bx") or "move bx to ax" (the Intel operand order used by NASM). It's a personal choice.
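As a small illustration of the operand-order difference, here is a sketch using GCC-style inline assembly in its default AT&T dialect. The function name `copy_with_mov` is hypothetical, and the plain-C fallback is an assumption added so the snippet still builds on non-x86-64 targets or other compilers.

```c
#include <stdint.h>

/* Hypothetical example: copy a value with a single MOV where possible. */
uint64_t copy_with_mov(uint64_t src)
{
    uint64_t dst;
#if defined(__GNUC__) && defined(__x86_64__)
    /* AT&T order: source first, destination last.
     * The Intel/NASM spelling of the same instruction puts the
     * destination first: mov dst, src */
    __asm__("movq %1, %0" : "=r"(dst) : "r"(src));
#else
    dst = src; /* fallback so the sketch stays portable */
#endif
    return dst;
}
```

Calling `copy_with_mov(42)` returns 42 either way; only the spelling of the instruction differs between the two syntaxes.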

Octocontrabass · Post by **Octocontrabass** » Mon Mar 01, 2021 11:20 pm

nullplan wrote:In AT&T syntax, there is a difference between "(%rax)" and "(,%rax)", while in Intel syntax both translate to "[rax]".

There is a difference between "[rax]" and "[rax*1]" in Intel/NASM syntax too. GAS always treats these as distinct encodings, while NASM translates both to "[rax]" unless you specify "[nosplit rax*1]".

clementttttttttt · Post by **clementttttttttt** » Mon Mar 01, 2021 11:31 pm

Linux uses AT&T syntax, while (I think) Windows uses Intel syntax. I personally use Intel syntax, though, because there is a lot less to type.

sham1 · Post by **sham1** » Tue Mar 02, 2021 4:24 am

It really is just personal preference. But for x86 and AMD64 I personally prefer the Intel syntax (the NASM variation specifically), since it better corresponds to the actual opcode bytes being produced.

It's also the syntax used by the Intel and AMD manuals, which also helps.

But as said by others in this thread, usually most OSes only have some fragments of Assembly and the rest is written in some high-level language like C or whatever.

Korona · Post by **Korona** » Tue Mar 02, 2021 4:50 am

Intel syntax is nicer for humans to write. However, since the amount of assembly in most OSes is so tiny, that does not justify introducing another x86-only dependency. More importantly though, you have to use AT&T in inline asm anyway, so it's nice to have consistent asm code.

Note that GNU AS also supports an Intel-like syntax but you cannot reliably use it in inline asm (try to name a global variable "rax" in C code and see what happens with -masm=intel).

sj95126 · Post by **sj95126** » Tue Mar 02, 2021 9:32 am

These days I prefer AT&T syntax, even if I can't quite quantify why. I originally learned Intel syntax a long time ago, but decided to switch to AT&T for my OS, and it just felt more natural. The fact that GAS inline uses it, and that other architectures don't support Intel syntax, made it seem like a good idea. I also preferred the uniformity of staying in a single toolchain.

There definitely are some areas where AT&T can be problematic. You can easily mix up the base and index registers, while in Intel syntax it's very obvious what you intend. But from a syntactical point of view, it seems more logical to me that registers are prefixed with %. They aren't variables or memory references, so I think they should be different.

Everyone has their preferences, and there's really no right or wrong. You're the one who has to write it and support it. You want to choose a tool that works with you, not fights you.

nullplan · Post by **nullplan** » Tue Mar 02, 2021 12:37 pm

Octocontrabass wrote:There is a difference between "[rax]" and "[rax*1]" in Intel/NASM syntax too. GAS always treats these as distinct encodings, while NASM translates both to "[rax]" unless you specify "[nosplit rax*1]".

Interesting. This is the first time I've seen that particular keyword (I only knew "strict" before). So I guess you can do everything with NASM you can do with GAS. Well, almost. I did try one thing the other day and could not figure it out. But some background first.

In musl (and some other places), you will often find invocations of a macro called "weak_alias". I wanted to know how exactly it works. It creates a new symbol, aliases it with an existing symbol, and makes it weak. For the C language, it sets the linkage to "external", and the sum of all of this causes the compiler to emit references to the symbol as external calls. The linker will then bind those references to the weak definitions, unless it encountered a strong definition of the same symbol. So, this could, for example, be used in the definition of exit(), to call all the atexit() functions, but only if atexit() is even linked in. If I resolve the macro and the typeof(), it basically looks like this in C:

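#include <stdlib.h> /* declares _Exit() */

/* Fallback that does nothing; used when atexit.o is not linked in. */
static void dummy(void) {}
/* Weak alias for dummy: a strong __funcs_on_exit elsewhere overrides it. */
extern void __funcs_on_exit(void) __attribute__((weak, alias("dummy")));
_Noreturn void exit(int code) {
  __funcs_on_exit();
  _Exit(code);
}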

Simplified, of course. There is other stuff in there. Then atexit.c contains a definition of atexit() and __funcs_on_exit(). Therefore in a statically linked program, if atexit() is never called, then the linker will never consider atexit.o, and bind the reference in exit to the local definition. Since the compiler does not create the object file directly, I wanted to see what this looks like in assembler. Essentially this:

.text
dummy:
  retq
.global __funcs_on_exit
.weak __funcs_on_exit
.set __funcs_on_exit, dummy
.global exit
exit:
  pushq %rdi
  callq __funcs_on_exit
  popq %rdi
  jmpq _Exit

Again, I'm simplifying. I wanted to try to write something like this with NASM, but I didn't get very far. Instead of the ".set", I can just define the weak symbol on the same line, but I could never get the external reference to work.

global __funcs_on_exit:weak
dummy:
__funcs_on_exit:
    ret


global exit:function
extern _Exit
exit:
    push rdi
    call __funcs_on_exit
    pop rdi
    jmp _Exit

Unfortunately, NASM ends up binding the __funcs_on_exit internally, not emitting a relocation. So this will never work. I never managed to get exactly the same result. If I make a weak reference, it ends up being 0 if not fulfilled. I don't want it to be 0 if not fulfilled, I want it to be "dummy", whatever that ends up being.
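For readers who have not met the weak_alias pattern described in this post, here is a minimal self-contained sketch in C. The macro is modeled on the one in musl, as far as I recall its definition; the names `default_cleanup`, `run_cleanup_hooks` and `shutdown_status` are hypothetical and chosen only for illustration. It assumes GCC or Clang on an ELF target, since the `alias` attribute is not available everywhere.

```c
/* Modeled on musl's weak_alias macro (assumption: GCC/Clang, ELF target). */
#define weak_alias(old, new) \
    extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Hypothetical fallback: used when nothing stronger is linked in. */
static int default_cleanup(void)
{
    return 0;
}

/* run_cleanup_hooks is a weak, externally visible alias for the fallback.
 * A strong definition in another object file would replace it at link time. */
weak_alias(default_cleanup, run_cleanup_hooks);

/* The call goes through the external symbol, so the linker decides
 * which definition is actually reached. */
int shutdown_status(void)
{
    return run_cleanup_hooks();
}
```

With no other object file defining `run_cleanup_hooks`, `shutdown_status()` reaches the fallback and returns 0. Linking in a file with a strong `run_cleanup_hooks` changes the result without touching this code, which is the behaviour the NASM attempt above fails to reproduce.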

Octocontrabass · Post by **Octocontrabass** » Tue Mar 02, 2021 4:16 pm

nullplan wrote:Unfortunately, NASM ends up binding the __funcs_on_exit internally, not emitting a relocation.

Support for weak symbols was added pretty recently, so I'd guess that's a bug in NASM.

cytorak87 · Post by **cytorak87** » Wed Mar 03, 2021 10:15 pm

Thanks everyone for the answers! You all helped me a lot

rdos · Post by **rdos** » Thu Mar 04, 2021 4:37 am

I think there is a third alternative: I use MASM/TASM/WASM syntax. It's a bit like NASM but uses more intuitive ways of making a difference between loading the address of a variable (offset) and the value []. Nasm is problematic since you don't need to define this and then it defaults to reading the address, which breaks a lot of code written with MASM-syntax.

nullplan · Post by **nullplan** » Thu Mar 04, 2021 11:03 am

rdos wrote:I think there is a third alternative: I use MASM/TASM/WASM syntax. It's a bit like NASM but uses more intuitive ways of making a difference between loading the address of a variable (offset) and the value []. Nasm is problematic since you don't need to define this and then it defaults to reading the address, which breaks a lot of code written with MASM-syntax.

This does not compute. Code written for MASM will not work directly in NASM? Yes, and code written in C# will not work directly with Java, what is your point? The two are different. Admittedly small differences, but small differences can add up to a lot. The difference between 40°C and 41°C is also very small, but if those are your body temperatures, you'd know the difference.

rdos · Post by **rdos** » Thu Mar 04, 2021 3:09 pm

nullplan wrote:
rdos wrote:I think there is a third alternative: I use MASM/TASM/WASM syntax. It's a bit like NASM but uses more intuitive ways of making a difference between loading the address of a variable (offset) and the value []. Nasm is problematic since you don't need to define this and then it defaults to reading the address, which breaks a lot of code written with MASM-syntax.
This does not compute. Code written for MASM will not work directly in NASM? Yes, and code written in C# will not work directly with Java, what is your point? The two are different. Admittedly small differences, but small differences can add up to a lot. The difference between 40°C and 41°C is also very small, but if those are your body temperatures, you'd know the difference.

I think the point is that MASM/TASM/WASM syntax has existed far longer than NASM syntax, and so it was the NASM guys that got it wrong and broke stuff.

And the target is the same (the x86 processor), so your comparison between C# and Java does not compute.
