Some ASM questions
Posted: Sat Nov 08, 2008 9:03 pm
by Qoppa
Hello all
Some background first... I'm a C++ programmer primarily; I've been using it for about 10 years. Naturally I've picked up C as well and I can code pretty comfortably with it too. I also know some other languages (Scheme, Pascal, Java, a bunch of webdev languages...) but they're pretty irrelevant here. I'm a complete ASM n00b though, and I know little beyond the premise of how it works and the basic syntax. Now, I'm quite interested in low level programming, and I figured what better way to learn about low level programming than to build my own OS! That said, I know remarkably little about what goes on under the hood, so even though I'm interested in this, it's all new to me. A lot of what I'm learning right now is just the vocabulary. So anyway, here I am, ready to venture into the land of low-level programming!
So, onto my questions.
I worked through the barebones tutorial on this site and got it to compile. Seeing the little 'A' in the corner was amazing. But, now that I've got it working, my next goal is to actually learn what the code does. I've got it mostly figured out except for a few things.
1)
Code: Select all
MultiBootHeader:
dd MAGIC
dd FLAGS
dd CHECKSUM
What does dd do when there's no variable specified to store the data in? Where does it put it?
2)
Code: Select all
loader:
mov esp, stack+STACKSIZE ; set up the stack
push eax ; pass Multiboot magic number
push ebx ; pass Multiboot info structure
I'm pretty sure the first command puts the stack pointer to the end of the stack, but I'm not so sure about the next two. What's in eax and ebx? MAGIC and FLAGS it seems, but why? Does calling dd without a variable store the data in the first available general register or something?
Thanks!
Re: Some ASM questions
Posted: Sat Nov 08, 2008 10:34 pm
by stephenj
You should stay in the application space until you become comfortable with assembly.
To learn assembly using your current method, you're going to need to figure out what the syntax does, and then how the bare-metal works. Thereby artificially multiplying your workload (and potentially discouraging you from learning either).
As for your questions, let us pretend that the macros MAGIC, FLAGS and CHECKSUM are 1, 2 and 3 respectively. So pretend the preprocessor has made a pass, turning the three dd lines into dd 1, dd 2 and dd 3.
These reserve doublewords (32 bits or 4 bytes each) holding the values 1-3 at the memory locations where they occur. Pretend that they start at memory location 0 (this can be achieved by "ORG 0" in Intel syntax). Thus, memory location [0] would be 1, [4] would be 2, and [8] would be 3.
Code: Select all
ORG 0
dd 1
dd 2
dd 3
mov eax, [0*4] ; eax == 1
mov ebx, [1*4] ; ebx == 2
mov ecx, [2*4] ; ecx == 3
Placing labels makes it a little more readable (and allows the assembler to pick the starting point):
Code: Select all
mem_start:
dd 1
dd 2
dd 3
mov eax, [mem_start+0*4]
mov ebx, [mem_start+1*4]
mov ecx, [mem_start+2*4]
And I suggest cleaning it up a bit:
Code: Select all
mem_start: dd 1, 2, 3
mov eax, [mem_start+0]
mov ebx, [mem_start+4]
mov ecx, [mem_start+8]
But the latter example is just my preference. I suspect that this should also shed some light on question 2.
Again though, learning assembly through OS code is like learning English through books on abstract algebra. Learning one is difficult enough, learning two is rare, and learning them both from the same book would be a nightmare!
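For anyone coming from C, the dd layout described above can be sketched as plain byte-buffer code. This is only an illustration under stated assumptions: emit_dd, load_dd and build_image are made-up names, and the sketch writes the bytes least significant first because x86 is little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append one 32-bit value to buf, least significant byte first, roughly
   what a `dd` line contributes to the output on x86.
   Returns the offset just past the bytes written. */
size_t emit_dd(uint8_t *buf, size_t cap, size_t off, uint32_t value)
{
    assert(off + 4 <= cap);               /* room for four more bytes */
    buf[off + 0] = (uint8_t)(value & 0xFF);
    buf[off + 1] = (uint8_t)((value >> 8) & 0xFF);
    buf[off + 2] = (uint8_t)((value >> 16) & 0xFF);
    buf[off + 3] = (uint8_t)((value >> 24) & 0xFF);
    return off + 4;
}

/* Read four bytes starting at a byte offset, like `mov eax, [off]`. */
uint32_t load_dd(const uint8_t *buf, size_t off)
{
    return (uint32_t)buf[off]
         | ((uint32_t)buf[off + 1] << 8)
         | ((uint32_t)buf[off + 2] << 16)
         | ((uint32_t)buf[off + 3] << 24);
}

/* Equivalent of `dd 1`, `dd 2`, `dd 3` starting at offset 0.
   Returns the number of bytes produced. */
size_t build_image(uint8_t *buf, size_t cap)
{
    size_t off = 0;
    off = emit_dd(buf, cap, off, 1);
    off = emit_dd(buf, cap, off, 2);
    off = emit_dd(buf, cap, off, 3);
    return off;
}
```

With a 12-byte buffer, build_image fills offsets 0 to 11, and load_dd(buf, 4) gives back 2, matching [4] in the example above.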
Re: Some ASM questions
Posted: Sun Nov 09, 2008 2:36 am
by bewing
I'm not sure that stephenj made it quite clear with his explanation (although it was a perfectly good attempt):
When you are writing an ASM program, it might be easiest to think that you are creating a flat binary file, one byte at a time. You start at the beginning of the binary file. The dd "keyword" puts 4 bytes into the file -- a "long" value. Note that the value is stored little-endian (least significant byte first). Also note that the assembler does not verify for you that the 4 bytes really are aligned on a 4-byte boundary (which is necessary for them to actually work, usually). This is why you sometimes need to use an "align" keyword.
Each mnemonic for an opcode also stores some "unknowable" number of bytes in the binary file (15 or fewer).
As stephenj says, it makes it nicer for everybody if you put labels in -- they are just names for "offsets" into the binary file. But they let you easily reference points to jump to, the beginnings and endings of arrays, pointers to pre-allocated structures, and pre-allocated data (from dd, db, dw, and dq "data keywords").
So, as he says, the dd statements put some numbers into the beginning of your binary file. The lines that he added load those numbers into eax and ebx. The "ESP" line does indeed set the stack pointer to the top of the allocated stack (where it belongs -- stacks grow DOWN).
PUSH statements take a "default-sized" register or chunk of data from memory, move the stack pointer down by the default size, and store the data at the new stack pointer.
Note: the default size changes with the cpu mode -- real mode, pmode, or long mode. The opposite of a PUSH is a POP.
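The PUSH and POP behaviour described above can be modelled in a few lines of C. This is a toy sketch, not how any real CPU or kernel is written: struct toy_stack, toy_init, toy_push and toy_pop are hypothetical names, the stack is 64 bytes, and the "default size" is fixed at 4 bytes as in 32-bit protected mode.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64u   /* hypothetical stack size for the sketch */
#define SLOT_BYTES  4u    /* default push size assumed here (pmode) */

struct toy_stack {
    uint32_t mem[STACK_BYTES / SLOT_BYTES];
    uint32_t esp;         /* byte offset into mem, like the real ESP */
};

/* Like `mov esp, stack+STACKSIZE`: point ESP one past the end,
   because the stack grows down. */
void toy_init(struct toy_stack *s)
{
    s->esp = STACK_BYTES;
}

/* PUSH: move ESP down first, then store at the new ESP. */
void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= SLOT_BYTES);         /* would overflow otherwise */
    s->esp -= SLOT_BYTES;
    s->mem[s->esp / SLOT_BYTES] = value;
}

/* POP: load from ESP first, then move ESP back up. */
uint32_t toy_pop(struct toy_stack *s)
{
    assert(s->esp < STACK_BYTES);         /* stack must not be empty */
    uint32_t value = s->mem[s->esp / SLOT_BYTES];
    s->esp += SLOT_BYTES;
    return value;
}
```

After toy_init, two pushes leave esp at 56, and the values come back off in reverse order.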
Re: Some ASM questions
Posted: Sun Nov 09, 2008 3:22 am
by DeletedAccount
Hi,
type the following into a file
Code: Select all
;-----------------------------------------------
; Wonderful Program
;-----------------------------------------------
db 65
db 66
db 67
db 68
Assemble this with nasm or fasm and open it with a text editor; this should clear up all your doubts about what 'db' or 'dd' is.
Regards
Sandeep
Re: Some ASM questions
Posted: Sun Nov 09, 2008 6:09 am
by CodeCat
ASM has no concept of variables at all. How C-style variables are stored depends on their storage type.
- Local variables and parameters are stored on the stack, and are accessed by adding an offset to the EBP register. [ebp+8] is the first parameter, [ebp+12] the second, [ebp-4] would be the first local variable, etc. [ebp] stores the previous function's EBP after you push it, which you should normally do when entering a function. [ebp+4] stores the return address to the previous function, so if you mess with that your program will go crazy. In any case, within assembly local variables have no name, they can only be accessed relative to EBP. It's your job to remember which offsets refer to which variables (normally the compiler does this).
Code: Select all
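push ebp ; save the caller's frame pointer
mov ebp, esp ; start a new frame
sub esp, bytes_of_local_variables_you_need ; optional
; ...rest of the code...
mov esp, ebp ; discard the local variables
pop ebp ; restore the caller's frame pointer
ret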
- Global and static variables are stored directly in the binary, either in the .data section using db, dw, dd (if initialised) or the .bss section using resb, resw, resd (if they default to zero). They don't have a name either, they are just raw bytes within the program's data. A label is used to give them names, but a label does not define a variable. All a label does is assign a name to a given location (address) within the program. Other parts of the program can then refer to that location. What the label refers to is entirely up to you, and it's very easy to use a label in the wrong way.
Code: Select all
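one: dw 0
dw 1
format: db "%d", 0
; ...code...
mov eax, dword [one] ; reads 4 bytes starting at 'one'
push eax ; second argument: the value
push format ; first argument: the format string
call printf
add esp, 8 ; remove the two arguments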
The above code would seem to print '0' on the screen, but it actually prints '65536'. The reason is that the 'mov eax, dword [one]' instruction has no notion of how big the space is that the label 'one' refers to. It only sees an address that is represented by the label 'one'. So it just reads four bytes from that location, even though you defined it to be only two bytes. The end result is that it reads the next two bytes as well. The bytes it reads are therefore 00 00 01 00, which is 65536.
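The same over-read can be reproduced in C by treating the four data bytes as a plain array. This is a small sketch with made-up names (data_bytes, read_word, read_dword, check_overread); it builds the values byte by byte, so the result does not depend on the machine it runs on.

```c
#include <assert.h>
#include <stdint.h>

/* The bytes produced by `one: dw 0` followed by `dw 1`, little-endian. */
const uint8_t data_bytes[4] = { 0x00, 0x00, 0x01, 0x00 };

/* Read two bytes, which is what the label was meant to cover. */
uint16_t read_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read four bytes, which is what `mov eax, dword [one]` actually does. */
uint32_t read_dword(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Self-check: the word read sees 0, the dword read sees 0x00010000. */
void check_overread(void)
{
    assert(read_word(data_bytes) == 0);
    assert(read_dword(data_bytes) == 65536u);
}
```

Reading a word at data_bytes + 2 gives 1, the second dw, which is exactly the part the dword read swallowed.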
Re: Some ASM questions
Posted: Sun Nov 09, 2008 8:25 am
by Love4Boobies
As for why MAGIC and FLAGS were on the stack in the loader, here's your answer. Apps are not what you are probably thinking they are. They're somewhat like normal procedure calls, where each app has main() called by the OS. The first "OS-stage" when booting the computer is the boot loader, which has to start the kernel. The loader you're using is Multiboot compliant (probably GRUB), and THAT's what pushes MAGIC and FLAGS to the stack so your kernel can pop them.
Re: Some ASM questions
Posted: Sun Nov 09, 2008 8:50 am
by Love4Boobies
bewing wrote:(from dd, db, dw, and dq "data keywords")
I think it's "define", not "data"
Re: Some ASM questions
Posted: Sun Nov 09, 2008 10:55 am
by Qoppa
Thanks for all the answers guys.
stephenj wrote:You should stay in the application space until you become comfortable with assembly.
To learn assembly using your current method, you're going to need to figure out what the syntax does, and then how the bare-metal works. Thereby artificially multiplying your workload (and potentially discouraging you from learning either).
It's not so bad. I'm reading a bunch of assembly manuals online (and getting slightly annoyed none of them are for Linux), and I'm not finding it terribly difficult yet. At least not the concept... I haven't really coded anything myself, but it's usually not too hard to follow what the code presented does.
Love4Boobies wrote:As for why MAGIC and FLAGS were on the stack in the loader, here's your answer. Apps are not what you are probably thinking they are. They're somewhat like normal procedure calls, where each app has main() called by the OS. The first "OS-stage" when booting the computer is the boot loader, which has to start the kernel. The loader you're using is Multiboot compliant (probably GRUB), and THAT's what pushes MAGIC and FLAGS to the stack so your kernel can pop them.
Thank you! This was what was confusing me most. It makes sense now.
Re: Some ASM questions
Posted: Sun Nov 09, 2008 12:19 pm
by CodeCat
Love4Boobies wrote:As for why MAGIC and FLAGS were on the stack in the loader, here's your answer. Apps are not what you are probably thinking they are. They're somewhat like normal procedure calls, where each app has main() called by the OS. The first "OS-stage" when booting the computer is the boot loader, which has to start the kernel. The loader you're using is Multiboot compliant (probably GRUB), and THAT's what pushes MAGIC and FLAGS to the stack so your kernel can pop them.
Whoa! Time out.
First of all, main() is not what's called by the OS. There's a small startup file that gets linked into every program, and this then calls main. The startup code is what takes care of things like setting up the runtime library, and initialising global objects for C++ code. When main returns, it's also responsible for calling the destructors of C++ objects, and finally calling the exit system call with main's exit code, which asks the OS to terminate the process. It's a small difference, but a crucial one, because a program which has main as its entry point will NOT work! main is a function like any other, so it expects to have a return address as well. Without startup code, returning from main would crash, or even return to a random memory location.
Secondly, the multiboot header, which includes the MAGIC and FLAGS, is not on the stack at all. The bootloader (GRUB) loads the kernel file from the disc (where exactly is specified in GRUB's configuration) and then looks in the first 8 kB of the file for the magic number. If it finds it, it loads the file according to the specifications following the magic number (or the ELF file header if it is an ELF binary) and then jumps to the specified entry point. From there, your kernel is in charge, but there is no stack at all until your kernel sets one up. To be more specific, until you set esp to a known value, you may not call any C-style functions and you may not activate interrupts (sti).
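To make the header search concrete, here is a rough C sketch of the kind of scan a loader could perform. It is not GRUB's code. It assumes the Multiboot 1 rules that the header magic is 0x1BADB002, that the header is 4-byte aligned, and that magic + flags + checksum must add up to zero in 32 bits; mb_load_le32 and mb_find_header are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_HEADER_MAGIC 0x1BADB002u  /* Multiboot 1 header magic */
#define MB_SEARCH_LIMIT 8192u        /* only the first 8 kB are searched */

/* Read a 32-bit little-endian value from raw file bytes. */
uint32_t mb_load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Return the byte offset of the first plausible header, or -1 if none.
   Only the three mandatory fields (magic, flags, checksum) are checked. */
long mb_find_header(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    size_t limit = len < MB_SEARCH_LIMIT ? len : MB_SEARCH_LIMIT;

    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t magic    = mb_load_le32(image + off);
        uint32_t flags    = mb_load_le32(image + off + 4);
        uint32_t checksum = mb_load_le32(image + off + 8);

        /* The three fields must sum to zero modulo 2^32. */
        if (magic == MB_HEADER_MAGIC &&
            (uint32_t)(magic + flags + checksum) == 0u)
            return (long)off;
    }
    return -1;
}
```

This is also why the barebones file defines CHECKSUM as the negation of MAGIC + FLAGS: it is the value that makes the sum come out to zero.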
Since C functions depend on having a stack available, the entry point must be written in assembly. This entry 'stub' is much like the startup code of a regular program, and its job is to set things up so that you can jump into the main kernel function (kmain or something like it, you decide). The stub has to set up a stack at the very least. If you're using a higher-half kernel, then you must also set up the higher-half paging in the startup code before setting up the stack and calling the (k)main function, or all hell will break loose.
When GRUB starts your kernel, it sets EAX to the magic number 0x2BADB002, and EBX to the address of the multiboot information structure. When your kernel starts, you must first check that the EAX register equals 0x2BADB002. This is to make sure that your kernel was actually loaded by multiboot and not some other bootloader. The barebones tutorial just takes these values and pushes them onto the stack, so that they can be accessed as parameters from the kmain function, but does not check their values or do anything with them. Once you start to expand your kernel, you should check both of these values. Checking EAX within the startup code is what I do, and then I pass only EBX to kmain, but passing EAX to kmain as well works, as long as your startup code does not need any of the multiboot information. Your kernel must not use the multiboot information structure or even assume that it was loaded by multiboot at all until you've verified that EAX has the correct value.
Re: Some ASM questions
Posted: Sun Nov 09, 2008 6:15 pm
by Love4Boobies
I didn't want to over-explain anything. It's actually a function called _main() (or something) that does what you said. I was just giving the general picture.
Re: Some ASM questions
Posted: Sun Nov 09, 2008 6:49 pm
by CodeCat
All right, sorry if you wanted to keep things simple. I always tend to go by 'keep it simple, but not so simple that it's no longer true', because otherwise you end up confusing people.
Re: Some ASM questions
Posted: Mon Nov 10, 2008 10:23 am
by Qoppa
Thanks for the explanation. It's complicated, but it is nice to know exactly what's happening. That's why I'm getting into this, no? Also, I need to get used to the idea that I'm coding everything. When you mentioned that I could actually pick the parameters that get passed to main (hell, even the name of the main function), it was a weird realization. I'm so free... haha. I added the check you suggested and then decided to not pass the magic number to main, because if I get as far as calling main, then I'm guaranteed to know the value of the first parameter since I've already checked it. So it seems kind of redundant.
I've now coded my basic memory functions along with a putch and puts function. Nothing really hard yet... Random question though: if I print a string to the screen with puts, the screen flickers like when you try to do graphics without double buffering. Odd?
Re: Some ASM questions
Posted: Mon Nov 10, 2008 12:02 pm
by Love4Boobies
Qoppa wrote:I added the check you suggested and then decided to not pass the magic number to main, because if I get as far as calling main, then I'm guaranteed to know the value of the first parameter since I've already checked it. So it seems kind of redundant.
I'm not sure if I understood you right. But if I did, what you wanted to say is that you find it redundant to check for the MAGIC value when you know that it was passed by the loader anyway. The reason for doing this is that you need to know if your kernel was loaded by a Multiboot-compliant boot loader. You may want to use the same boot loader for several OSes or have the same OS booted from several boot loaders. Think of it as a check for compatibility. The boot loader and kernel are two separate entities.
Re: Some ASM questions
Posted: Mon Nov 10, 2008 4:23 pm
by Qoppa
OK, that kind of makes sense. I'm not so sure I actually understand, but I guess I shouldn't expect everything to make sense right off the bat.
Question time though. I was reading through some of the code here (written by someone on this board... I forget who) and I have a question about it.
Code: Select all
inportb: MOV DX, [ESP+1*4] ; DX = stack(1)
IN byte AL, DX ; AL = result
RET
Why are we specifying AL as a byte? AL's always a byte, no? What's that keyword doing there?
Also, if anyone knows anything about my flickering text problem I mentioned in my previous post, I'm still wondering. Aaaaaand, if anyone has any good books giving an overview of what goes into an OS (like all the parts and how everything interacts), that would be cool. With code examples is even better.
Thanks for all the help!
Re: Some ASM questions
Posted: Mon Nov 10, 2008 5:37 pm
by CodeCat
The specification with 'byte' is redundant in that case, yes. But it can always be added for clarity.
And as for your puts function, it would help if you'd post the code for it.