Writing bootloader with includes... [Solved. Mostly...]

TheGameMaker90
Member
Member
Posts: 83
Joined: Thu Jan 07, 2021 2:01 pm

Re: Writing bootloader with includes... [Solved. Mostly...]

Post by TheGameMaker90 »

Octocontrabass wrote:
TheGameMaker90 wrote:That's how I have it in my boot.S. If I'm understanding you correctly, this will not work?
Correct. You need to get rid of "mov %ax, %cs" and use some other instruction (such as LJMP) to set CS.
TheGameMaker90 wrote:Decent stack:
Would it be like the one in 32-bit mode like this:
That would work, but it's probably more than you need, and it would make your binary bigger. Personally, I'd set ESP to 0x7C00. It's very important that you set all of ESP! Code generated by GCC will use 32-bit registers, so the upper bits of ESP must be zero.
TheGameMaker90 wrote:Direction flag:
No idea what that is, lol.
So... you're going to find out, right?
neon wrote:boot.c appears to be compiled as 32 bit as opposed to 16 bit code:
You're disassembling it wrong. It's 16-bit code, but it uses mostly 32-bit registers.
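The point in the quoted reply about setting all of ESP can be sketched in C. This is only a model, assuming (as the reply says) that GCC-generated code addresses the stack through the full 32-bit ESP; the function names `write_sp` and `write_esp` are made up for illustration.

```c
#include <stdint.h>

/* Models "mov $value, %sp": only the low 16 bits of ESP are replaced,
   the upper half keeps whatever was there before. */
uint32_t write_sp(uint32_t esp, uint16_t value)
{
    return (esp & 0xFFFF0000u) | value;
}

/* Models "mov $value, %esp": all 32 bits are replaced. */
uint32_t write_esp(uint32_t esp, uint32_t value)
{
    (void)esp; /* the old contents are discarded */
    return value;
}
```

If ESP happened to hold 0xDEAD1234 on entry, `write_sp(0xDEAD1234, 0x7C00)` leaves 0xDEAD7C00, which is far outside a 64 KiB real-mode segment, while `write_esp(0xDEAD1234, 0x7C00)` gives the intended 0x00007C00.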
Okay so not too long ago, I discovered a repo that contains every version of Linux ever made (including the new 5.13-rc3) which isn't on github. Looking at early versions of Linux, I saw that it uses this ljmp (in a way) that you speak of.
To put it literally, here's the code of the first few lines:

Code: Select all

BOOTSEG = 0x07C0
INITSEG = 0x9000

entry start
start:
    mov ax, #BOOTSEG
    mov ds, ax
    mov ax, #INITSEG
    mov es, ax
    ...
It doesn't use an ljmp here, but it jumps to a label called go in INITSEG.
Anyway, to make a long story short, I did some research on this and came upon a resource (can't remember where it is) that said something like this. The value stored in cs is ????.

Code: Select all

# So I guess what I'm asking is this:
.globl _start
ljmp $0x07C0, $_start
# What is the value stored in cs now? And how do I use it? (bear with me it's 1:44AM where I am...)
I also read somewhere that ds = ax * 16 or something to that effect which changes 0x07C0 to 0x7C00, the value we need. Is any of this useful information? Am I on the right track at least?

Post by TheGameMaker90 »

I just caught onto something. Are you trying to say that I should handle it like this:

Code: Select all

_start:
    mov %cs, %ax
    mov %ax, %ds
    mov %ax, %es
    mov %ax, %ss
    mov $0x7C00, %esp
Or something to that effect?

And yes I'm looking into the direction flag now.

By the way, whenever I use jmp or ljmp to start, it says it can't find bootable device. I assume this is because of the 0x7C00 in the linker command. It already starts from there. Does the linker store 0x7C00 (or 0x07C0) in cs when the file is linked? That hardly sounds like it makes any sense, but I figured I'd ask. The linker should have nothing to do with compiled source register and segment states right?
Octocontrabass
Member
Member
Posts: 5568
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

TheGameMaker90 wrote:Anyway, to make a long story short, I did some research on this and came upon a resource (can't remember where it is) that said something like this. The value stored in cs is ????.
0x7C0, but you don't want that, you want 0.
TheGameMaker90 wrote:I also read somewhere that ds = ax * 16 or something to that effect which changes 0x07C0 to 0x7C00, the value we need. Is any of this useful information? Am I on the right track at least?
Understanding how segmentation works in real mode is useful information, but GCC doesn't really work with segmentation, so you want to set it up so that segmentation can be ignored. The easiest way to do that in real mode is to set CS, DS, ES, and SS to 0.
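A minimal sketch of the real-mode address calculation, for readers following along. The function name `real_mode_physical` is hypothetical; the formula itself (segment times 16 plus offset) is the one discussed in this thread.

```c
#include <stdint.h>

/* Real-mode address translation: the CPU shifts the 16-bit segment value
   left by 4 bits (multiplies it by 16) and adds the 16-bit offset.
   The segment register itself keeps exactly the value written to it. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Both `real_mode_physical(0x07C0, 0x0000)` and `real_mode_physical(0x0000, 0x7C00)` come out as 0x7C00, which is why either pair can name the boot sector. Loading 0x7C00 into a segment register instead gives `real_mode_physical(0x7C00, 0x0000)`, which is 0x7C000.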
TheGameMaker90 wrote:Are you trying to say that I should handle it like this:
No. You don't know what value is in CS when the BIOS jumps to your code, so if you put that value into your other segment registers, then you don't know what value is in any of them either. (But your code to set ESP is correct.)
TheGameMaker90 wrote:By the way, whenever I use jmp or ljmp to start, it says it can't find bootable device. I assume this is because of the 0x7C00 in the linker command. It already starts from there. Does the linker store 0x7C00 (or 0x07C0) in cs when the file is linked?
The linker replaces labels with addresses. The 0x7C00 in the linker command is you telling the linker that you'll load the code at that address, and it's up to you to make sure your code will actually be loaded at that address. Just like GCC, LD knows almost nothing about segmentation, so this works only if you set the segment registers to 0.

You added another label to be the destination for the LJMP instruction, right?
TheGameMaker90 wrote:The linker should have nothing to do with compiled source register and segment states right?
The linker replaces labels with addresses. If you use a label to set the value of a register, the linker chooses what value it will be. You typically don't want LD to choose the values to put into your segment registers, since LD doesn't do segmentation.

Post by TheGameMaker90 »

I'll make another push to the repo if need be, but this is literally what I have:

Code: Select all

#include "boot.h"

.code16
.text

jmp $0x7C00, $_start
.globl _start
_start:
    mov %cs, %ax
    mov %ax, %ds
    mov %ax, %es
    mov %ax, %ss
    mov $0x7C00, %esp

    call initialize_terminal
    ...
I've even tried to change jmp to ljmp, $0x7C00 to $0x07C0 and moved the jmp all over the place. I even tried just having jmp _start, and ljmp _start and having nothing under _start. It all gives the same results.

Edit 2:
Okay, I think I see what you're saying now. During my prior research (I have this one bookmarked), I found this page. After what you said and happening onto this page again, I think I see what's going on here. Link for reference: https://appusajeev.files.wordpress.com/ ... loader.png
This is nasm assembly, but I just need to figure out how to "convert it."

org 0, sets CS to zero, correct? Then it long jumps to start @0x07C0, moves cs into ax so it can be used with ds. Like I said before. Then ds=0x7C00. What of es in this case though?
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Post by kzinti »

(deleted, I was way off tracks)
Last edited by kzinti on Mon May 24, 2021 3:09 pm, edited 1 time in total.

Post by TheGameMaker90 »

kzinti wrote:
TheGameMaker90 wrote: org 0, sets CS to zero, correct?
No, org 0 is an assembler directive and it doesn't generate any code. What it does is tell the assembler that CS will be 0 when executing the instructions you write after it.

"org" is short for "origin" and is used to tell the assembler at which address you intend to load/execute your code.

0x7C00 can be represented in many ways using 16 bits seg:offset addresses: 0x00:0x7C00, 0x7C0:0x0000, etc.


Here is what I would recommend you do (this is the easiest one to understand and likely what you want):

Code: Select all

    org 0              # Tell your assembler that CS = 0, but no instructions issued
    ljmp $0, _start    # Set CS to 0 and IP to the address of _start (really the offset relative to the origin of 0, but that's the same thing as the address)
_start:
    # Here you want to set DS = SS = 0
Or alternatively (but I don't recommend it):

Code: Select all

    org 0x7C00             # Tell your assembler that CS = 0x7C0, but no instructions issued
    ljmp $0x7C0, _start    # Set CS to 0x7C0 and IP to the address of _start minus 0x7C00 (i.e. the offset of _start from the origin you set to 0x7C00)
_start:
    # Here you want to set DS = SS = 0x7C0
TheGameMaker90 wrote: Like I said before. Then ds=0x7C00. What of es in this case though?
Clearly you don't understand seg:offset addressing in 16-bit mode. I suggest you read up on it before continuing to try to hack something together. The short version is that the effective address of a seg:offset address is "segment * 16 + offset". If you set ds to 0x7C00, you will end up accessing memory in the 0x7C000+ range, which is clearly not going to work. If you don't set es, then you don't know what the effective addresses will be when you access something using the es segment.

Pay attention to what people write here: 0x7C0 and 0x7C00 are not typos. They are different numbers. When you multiply 0x7C0 by 16, you get 0x7C00.
Here we go again. I just freaking said that. I switched over to NASM assembly for a moment and I know that about CS. NASM assembly is slightly different from GAS assembly, and I know what org stands for. I spent a little time learning NASM assembly before I even got started with OS dev, granted not much, but I know the basics.

Although I wasn't sure you could use org like that in GAS assembly. I read somewhere that they are different, despite being named similarly. I will however try your suggestion because my way doesn't seem to yield any results. But doesn't _start require a '$' as well? Maybe not. I'll try it momentarily.

And I never said they were. If you read one of my previous posts you'd see that I just finished saying that DS=AX * 16 so if ax = 0x07C0, then DS = 0x7C00. I know that. I also realized that regardless of how it's written if CS=0, then DS and ES would both be 0 because anything X 0 = 0. Not stupid here.

Edit:
Okay, I tried your idea and got this:
boot.S:8: Error: no such instruction: `org 0'
boot.S:9: Error: operand type mismatch for `ljmp'

Like I said. org is different in GAS and I'm pretty sure I need a '$' in front of _start. Also it still fails to boot when I do that and remove org. I changed it to .org 0 and ljmp $0, $_start and it says:
Booting from hard disk...
Boot failed: Could not read the boot disk

Booting from DVD/CD...
Boot Failed: Could not read from CDROM (code 0003)
Booting from ROM...
iPXE (PCI 00:03.0) starting execution...ok
...

Also I've tried to find resources on how es relates to ax and all of those (<- lazy explanation here), but to no avail.

This is using QEMU.
You should know the rest. It doesn't boot when I have any kind of jump in GAS.

Again, I'll post my github repo:
https://github.com/gamemaker90/Custom-Bootloader.git

Okay, here's a pointer. Is it the linker in my Makefile?

I just noticed something! When I do hexdump boot.bin, I noticed that the magic number is in the wrong spot. It's supposed to be @0x1FE which is hex for 510. How do I fix this?
I think it's because of the offset from the ljmp personally. I could be wrong.

Post by kzinti »

TheGameMaker90 wrote:If you read one of my previous posts you'd see that I just finished saying that DS=AX * 16 so if ax = 0x07C0, then DS = 0x7C00. I know that
And that's completely wrong. If you set AX to 0x7C0 and "mov" AX to DS, then DS is 0x7C0. DS doesn't get magically multiplied by 16 (and it shouldn't be, nor do you want it to).
TheGameMaker90 wrote: Also I've tried to find resources on how es relates to ax and all of those (<- lazy explanation here), but to no avail.
I am not even sure what you mean here by "how ES relates to AX". They are different registers and you can use them with x86 instructions. There is no inherent relation between them, so you won't find much if anything.
TheGameMaker90 wrote: This is using QEMU.
You should know the rest.
I do know the rest. But we aren't here to write your code for you. There is plenty of information here on the forum and the wiki and google. If you are stuck on a bit or need clarification on something, people here will be able to help. But you have to help yourself first, and so far it sounds like you aren't putting in the effort. Or maybe you are, in which case you might want to reconsider your goals here.

Using google, I was able to determine in less than 30 seconds that the syntax for ORG in GAS is ".org". You also need to indicate to GAS that you want 16-bit instructions to be generated. Here is something that does compile:

Code: Select all

.section .text
.code16
.org 0x7C00

    ljmp $0, $_start
_start:
    # Here CS is 0
You are literally stuck on the very first instruction of your bootloader and have been for days. You might want to take a step back and learn about how a C toolchain works, how programs are linked, how the assembler works, how the processor works and so on. You seem to have too many gaps in your knowledge to get started with a bootloader. I would suggest you start with an existing one and write some C code that displays some text (and ignore the bootloader for now, you can come back to it later if you want to).

Or you can just tell me how I don't understand you and that you are not stupid. Either way, I don't have unlimited patience.
Last edited by kzinti on Mon May 24, 2021 3:10 pm, edited 3 times in total.

Post by Octocontrabass »

TheGameMaker90 wrote:I'll make another push to the repo if need be, but this is literally what I have:
Okay, there are two problems here. The first is that the LJMP instruction will set CS to 0x7C00, but you need CS to be 0. The second is that you put the LJMP instruction before the _start label, but it should be after _start. Add another label to function as the destination. Since you place the boot signature 510 bytes after the _start label, putting anything before _start will move the boot signature.
kzinti wrote:No, org 0 is an assembler directive and it doesn't generate any code. What it does is tell the assembler that CS will be 0 when executing the instructions you write after it.
No, org 0 tells the assembler that your code origin is (starts at) offset 0, which only works when you set the segment registers to 0x7C0. You want your segment registers to be 0, which requires the origin to be 0x7c00.

You don't need to specify the origin anywhere in your assembly. Use the linker for that.
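To make the relationship between the origin and the segment registers concrete, here is a small model in C. It assumes the BIOS loads the boot sector at physical address 0x7C00 and that the linker computes a label as origin plus the label's offset in the image; `label_value` and `label_resolves_correctly` are hypothetical helper names, not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIOS_LOAD_ADDRESS 0x7C00u /* where the BIOS places the boot sector */

/* What the linker substitutes for a label: origin + offset inside the image. */
uint16_t label_value(uint16_t origin, uint16_t offset_in_image)
{
    return (uint16_t)(origin + offset_in_image);
}

/* True when segment:label_value reaches the byte the label really names. */
bool label_resolves_correctly(uint16_t segment, uint16_t origin,
                              uint16_t offset_in_image)
{
    uint32_t seen_by_cpu = ((uint32_t)segment << 4)
                         + label_value(origin, offset_in_image);
    uint32_t actual = BIOS_LOAD_ADDRESS + offset_in_image;
    return seen_by_cpu == actual;
}
```

Only two combinations pass for a label at, say, offset 0x20: segment 0 with origin 0x7C00, and segment 0x7C0 with origin 0. Mixing them (segment 0x7C0 with origin 0x7C00, or segment 0 with origin 0) points the CPU somewhere else.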
neon
Member
Member
Posts: 1567
Joined: Sun Feb 18, 2007 7:28 pm

Post by neon »

Hi,

Just a couple of corrections: (1) Octocontrabass is correct: when the code I posted is disassembled as 16-bit code, the instructions decode to their 32-bit-register forms because of the 0x66 operand-size prefix, which is fine and expected, so the code is fine. This was an error on my part. (2) The org directive is more complicated than you think: I'd rather not get into details, but effectively it is a value that is added to all relocations. But that doesn't matter: here you either want org 0 and segments set to 0x7c0, or org 0x7c00 and segments set to 0. I believe the recommendation posted earlier is for "org 0x7c00, set segments to 0". Keep in mind that you cannot use mov to set cs.

ds should never be 0x7c00. It should only ever be 0 (if org 0x7c00) or 0x7c0 (if org 0) - nothing else. Note assemblers assume org 0 if not specified.

Although a little simplified, I typically describe org like this:

Code: Select all

org 0x7c00
label:
   mov ax, label  <- this generates mov ax, label (+ 0x7c00)
If this code was executing at 0x7c00, the value of "label" would be what we expect after relocation - 0x7c00.
Last edited by neon on Mon May 24, 2021 3:42 pm, edited 2 times in total.
OS Development Series | Wiki | os | ncc
char c[2]={"\x90\xC3"};int main(){void(*f)()=(void(__cdecl*)(void))(void*)&c;f();}

Post by kzinti »

Octocontrabass wrote:No, org 0 tells the assembler that your code origin is (starts at) offset 0, which only works when you set the segment registers to 0x7C0. You want your segment registers to be 0, which requires the origin to be 0x7c00.
Indeed, I wrote this too fast without involving enough brain cells and got it completely backward. Apologies to OP. I'll take myself out of this thread before I further embarrass myself. :oops:

Post by TheGameMaker90 »

Look, I feel like I'm losing brain cells from this post too.

Most of the sources I've come across (indeed on here too) are stubs of broken code. Like this page:
https://wiki.osdev.org/Real_Mode

It uses a "value" that isn't defined anywhere.
mov eax, DATASEL16

Great, but how do I know what that's supposed to be? If I'm using GRUB, I don't have access to the 16 bit code. This page doesn't explain it or define it, yet I'm supposed to know what it is when thousands of other sources do it completely different.

Forums are supposed to be places where you can get help. What defines helpful information? Google says this:
"The definition of helpful is someone or something that is useful, that provides assistance or aid, or that is prone to providing aid."

Here's my definition:
"Information that helps somebody understand something building off of prior knowledge or new information."

I'm sorry to say this, but none of this is helpful. I need to understand why when I use an ljmp instruction my magic number goes past the boundaries. No matter what address I use it results in this:
0000200 0000 5500 00aa
when I use hexdump boot.bin.

Without the ljmp it works fine, but yes, you are all right that I don't know enough about the registers to know what's in CS without the ljmp instruction. I am not above admitting the gaps in my knowledge, but if there is a page that goes into detail about them, please provide a link.
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Writing bootloader with includes... [Solved. Mostly...]

Post by kzinti »

You set the BOOT_MAGIC at offset "_start + 510". So your magic word offset will change depending on the size of the "ljmp" instruction. This is because "_start" will move depending on the size of the "ljmp" instruction, therefore "_start + 510" also will.

"_start + 510" tells me you expect the ljmp instruction to be 2 bytes, but that isn't true in this case. This is why BOOT_MAGIC isn't showing up where you expect it to.
Last edited by kzinti on Mon May 24, 2021 3:57 pm, edited 3 times in total.
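The arithmetic behind the shifted signature can be checked with a short C sketch. It assumes the far jump is the 16-bit immediate form (opcode 0xEA, a 16-bit offset, a 16-bit segment, 5 bytes in total) and that hexdump is showing little-endian 16-bit words, in which case "0000200 0000 5500 00aa" puts 0x55 at offset 0x203. The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define FAR_JUMP_SIZE_16BIT 5u /* 0xEA + 16-bit offset + 16-bit segment */

/* Offset of the signature when it is placed 510 bytes after a label that
   itself sits bytes_before_label bytes into the image. */
size_t signature_offset(size_t bytes_before_label)
{
    return bytes_before_label + 510u;
}

/* The BIOS accepts the sector only if bytes 510 and 511 are 0x55, 0xAA. */
bool has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

With nothing before the label, `signature_offset(0)` is 510 and the sector boots. With the far jump in front, `signature_offset(FAR_JUMP_SIZE_16BIT)` is 515 (0x203), which matches the hexdump and leaves bytes 510 and 511 as zeros.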

Post by neon »

Hi,
It uses a "value" that isn't defined anywhere.
mov eax, DATASEL16

Great, but how do I know what that's supposed to be?
The Global Descriptor Table (GDT) defines the descriptors. Typically you have the NULL descriptor 0, kernel code descriptor 1, kernel data descriptor 2. Other descriptors you can create are up to you: you can have one for the TSS, one for 16 bit code, one for 16 bit data, and others. The code that references DATASEL16 is using the GDT index for the 16 bit data descriptor.

I.e. let's assume this is our GDT:

GDT[0] = NULL
GDT[1] = 32 bit Kernel Code
GDT[2] = 32 bit Kernel Data
GDT[3] = 16 bit Kernel Code
GDT[4] = 16 bit Kernel Data

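Then DATASEL16 refers to GDT entry 4 in this example. The selector value actually loaded into a segment register is the index times 8 (each descriptor is 8 bytes), so 4 * 8 = 0x20.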

I do agree though that a lot of the code throughout the Wiki makes assumptions about what you have already read. Basically the code is describing the process to drop down to real mode from protected mode: 32 bit protected mode > 16 bit protected mode > 16 bit real mode.
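As a rough illustration of how a GDT index turns into the value loaded into a segment register, here is a sketch in C. It assumes the five-entry layout listed above; the wiki's real DATASEL16 value is not given in this thread, and `make_selector` is a hypothetical helper.

```c
#include <stdint.h>

/* Build a segment selector:
   bits 15..3 = descriptor index,
   bit 2      = table indicator (0 = GDT, 1 = LDT),
   bits 1..0  = requested privilege level. */
uint16_t make_selector(uint16_t index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}
```

For the layout above, `make_selector(4, 0, 0)` is 0x20 for the 16 bit data descriptor and `make_selector(1, 0, 0)` is 0x08 for the 32 bit code descriptor.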

Post by TheGameMaker90 »

kzinti wrote:You set the BOOT_MAGIC at offset "_start + 510". So your magic word offset will change depending on the size of the "ljmp" instruction. This is because "_start" will move depending on the size of the "ljmp" instruction, therefore "_start + 510" also will.

You could just use ". = 510" and it would work in your case:

Code: Select all

. = 510;
.word BOOT_MAGIC
See? Was that so hard? I got an X. Thank you. Now if I try to call that boot_main function, what is required? Will it work in its current state?
I recall Octocontrabass mentioning the direction flag [link to the page I came across: https://en.wikipedia.org/wiki/Direction_flag]

How does it work? This page is very limited as well. Where should it go? How does it help me to do what I want to do?
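For readers with the same question, a rough model in C of what the direction flag controls. String instructions such as "rep movsb" step their index registers forward when DF is clear (after cld) and backward when DF is set (after std). Code generated by GCC assumes DF is clear on function entry, so boot code usually executes cld before calling into C. The function name `rep_movsb_model` is made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Models "rep movsb": copy count bytes from src[si] to dst[di], stepping
   both indices forward when direction_flag is 0 and backward when it is 1. */
void rep_movsb_model(uint8_t *dst, size_t di, const uint8_t *src, size_t si,
                     size_t count, int direction_flag)
{
    while (count-- > 0) {
        dst[di] = src[si];
        if (direction_flag) {
            di--;
            si--;
        } else {
            di++;
            si++;
        }
    }
}
```

With the flag clear and both indices at 0, four bytes are copied in ascending order. With the flag set and both indices at 3, a count of 2 copies elements 3 and 2 and leaves the rest untouched, which is not what compiled C code expects from a memory copy.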

Post by TheGameMaker90 »

Sorry neon, just realized I skipped over your post. Wouldn't it be 5 based on your theory? 0-4 is 5 values. Again, I could be wrong. But that was meant as more of an example of what I'm talking about. The GDT tutorial shows us a 64 bit GDT, but not a 32 bit one too. I think that most people will be in 32 bit mode more often. It's just confusing when you come from a standpoint where most of your life OSDev seemed like black magic lol.