far jmp after enabling A20 triple faults

iansjack · Post by **iansjack** » Sat Jul 20, 2013 11:35 am

I guess my British reply was a little too subtle for your American sensibilities.

DavidCooper · Post by **DavidCooper** » Sat Jul 20, 2013 1:19 pm

Your latest version is a great improvement. If you sort out the part Minoto pointed to it may even work now. Your A20 code won't work on every machine, so be ready for it to fail later.
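To make the A20 warning above concrete, here is a minimal sketch of a check that could run once the code is in 32-bit protected mode. It is not from the thread. `CheckA20` and `a20_scratch` are hypothetical names, and the sketch assumes FASM syntax, flat segments with base 0, `a20_scratch` located below 1 MiB, and RAM present 1 MiB above it. It writes different values to two addresses exactly 1 MiB apart; if A20 is still masked the two addresses alias, so the second write overwrites the first.

```asm
use32
; Out: EAX = 1 if A20 appears enabled, 0 if addresses wrap at 1 MiB.
CheckA20:
    push ebx
    push edx
    mov ebx,a20_scratch
    mov edx,[ebx+0x100000]              ; save whatever lives 1 MiB above
    mov dword [ebx],0x12345678
    mov dword [ebx+0x100000],0x87654321 ; lands on a20_scratch if A20 is off
    xor eax,eax
    cmp dword [ebx],0x12345678
    sete al                             ; 1 = low copy untouched = A20 on
    mov [ebx+0x100000],edx              ; restore the high dword (mov leaves flags alone)
    pop edx
    pop ebx
    ret

a20_scratch dd 0                        ; hypothetical scratch variable
```

A caller could run this after each enable method it tries and fall back to the next method while EAX comes back as 0.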

[Note: sorry about the misinformation I provided about the jump - I knew your jump wasn't working as it wasn't loading CS, but it wasn't jumping 8 bytes up memory either. Not being an assembler user, I was confusing the way jumps work in machine code (where they jump a relative distance) with the way they work in assembler (where they jump to absolute locations) - the assembler has to translate the latter into the former.]

Casm · Post by **Casm** » Sat Jul 20, 2013 1:40 pm

DavidCooper wrote:Not being an assembler user, I was confusing the way jumps work in machine code (where they jump a relative distance) with the way they work in assembler (where they jump to absolute locations) - the assembler has to translate the latter into the former.]

In the olden (16 bit) days, far jumps were to absolute addresses. The ip was relative to the new segment, but the segment address in cs was absolute.

Nowadays, of course, there is only one segment; whether by design (32 bit), or because AMD says so (64 bit).

DavidCooper · Post by **DavidCooper** » Sat Jul 20, 2013 3:11 pm

Casm wrote:In the olden (16 bit) days, far jumps were to absolute addresses. The ip was relative to the new segment, but the segment address in cs was absolute.

Nowadays, of course, there is only one segment; whether by design (32 bit), or because AMD says so (64 bit).

Okay - I shouldn't have used the word "absolute", so I'd better clarify things in case it matters to someone. I haven't read up fully on 64 bit mode yet, as I don't have a machine capable of running in that mode (and I avoid working in emulators). In 32 bit mode, far jumps and calls go to immediate addresses (within the selected segment), while near jumps and calls (and short jumps) use immediate relative distances instead, which the processor has to convert into addresses. The exception is a near jump or call to an address held in a register, where no immediate is involved.

I would have thought that this creates more work for the processor, as it has to take the immediate jump distance and add it to the current (E)IP rather than just putting an immediate address straight into (E)IP. Z80 processors use addresses for near jumps and calls (far ones don't exist there), though they do use relative distances for short jumps. When I switched from Z80 machines to PCs I was surprised to find that the PC didn't do the same for near calls and jumps. It seemed like a step backwards, requiring a tiny bit of extra work every time a near jump or call to an immediate is run. The difference is doubtless trivial once you've added in the cost of converting the segment value and adding that in, plus the extra work of calculating the absolute address when paging is enabled, but a lot of trivial things can add up to something significant.

Near rets are more sensible, using an address off the stack instead of a jump distance. One possible explanation for things working this way is that you can prefix a near jump instruction to make it use a jump distance of two bytes instead of four (in 32 bit mode) or four instead of two (in 16 bit mode), though I don't know if that was planned when the PC processor was originally designed.

Anyway, in assembler you don't normally see any of the relative distances in your code as you use either labels or addresses, so the assembler converts them into relative distances for you and then the processor converts them back into addresses again when it meets them. It's an unnecessary complication (except with short or prefixed jumps), but it's worth understanding what goes on in the actual machine code if you're ever trying to debug a bit of code by looking at the hex. Because I only work with machine code, I often forget that assembler doesn't use relative distances even with short jumps, which is why I misread "jmp 0x8" as a jump of 8 bytes up memory rather than to the address 0x8 within CS.
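As an illustration of the translation described above (not part of the original post), the sketch below writes the same near jump twice: once with a label, and once as hand-assembled bytes. It assumes FASM syntax and 32-bit code; the `org` value and the labels `start` and `target` are hypothetical, chosen only so that the arithmetic can be followed.

```asm
use32
org 0x1000                  ; hypothetical load address

start:
    jmp near target         ; assembler emits E9 06 00 00 00

    ; The same kind of jump written out by hand:
    db 0xE9                 ; opcode for JMP rel32
    dd target - ($ + 4)     ; rel32 = target minus the address of the next
                            ; instruction; here that is 1 (E9 01 00 00 00)
    nop
target:
    hlt
```

In the source both jumps name the address `target`, but the bytes that reach the processor hold only the distance from the end of the jump instruction. That is why the two encodings differ even though the destination is the same.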

Casm · Post by **Casm** » Sat Jul 20, 2013 5:28 pm

DavidCooper wrote:Anyway, in assembler you don't normally see any of the relative distances in your code as you use either labels or addresses, so the assembler converts them into relative distances for you and then the processor converts them back into addresses again when it meets them. It's an unnecessary complication

The assembler doesn't talk about absolute or relative addresses; it only talks about where you want to end up (at the instruction after a label) - unless you very explicitly specify an absolute address. The reason almost all processors use relative addresses for jumps is to make it easier to relocate code.

TwixEmma · Post by **TwixEmma** » Sat Jul 20, 2013 7:45 pm

Thanks for all your replies guys, I got my bootloader working. I realise why you told me that code wouldn't enable A20: I put the comment in, then forgot to add the code :S *facepalm*. Anyhow, I figured out two reasons why it was triple faulting, and since I chose the easy solution, the resulting code is below. But first, the two reasons:

A: Once the code was relocated, it was struggling to jump to the right piece of memory for the 32-bit code (possibly a fault of my own, but I got lazy and refactored my code instead lol).

B: I was trying to enable A20 before PMode (once I added the code in), which was causing problems for some possibly well-known reason that I am so far unaware of >.< I don't know if anyone else has encountered this; I didn't google it, I just went on instinct and moved the call lol.

So here's the new, 100% working code for your curiosities:

Code:

;stage 1 boot
use16
org 0x7C00

jmp Start16

Start16:
cli
xor ax,ax
mov ds,ax
mov es,ax
mov ax,0x9000
mov ss,ax
mov sp,0xFFFE        ;even value keeps the stack word-aligned
sti

mov [drive],dl ;BIOS passes drive number in DL

call LoadStage2
cmp [errorflag],0

jg hang16
jmp loadpoint16 ;jump to loaded program

hang16:
xor ah,ah
int 0x16             ; wait for a key
int 0x19             ; reboot the machine

LoadStage2:
  pusha
  mov [counter1],1
  .tryread:
    mov ax,ds          ;segment
    mov es,ax
    mov bx,loadpoint16 ;offset
    mov ah,02h         ;read sectors into memory
    mov al,4d          ;load 4 sectors
    mov ch,00h         ;the track to read from
    mov cl,02h         ;sector id
    mov dh,00h         ;head
    mov dl,[drive]     ;drive
    int 13h
    jc .tryagain
    jmp .fin
  .tryagain:
    inc [counter1]
    cmp [counter1],3
    jg .failed
    jmp .tryread
  .failed:
    mov [errorflag],1
  .fin:
  popa
ret

drive db 0

errorflag db 0
counter1 db 0

times 510-($-$$) db 0
dw 0xAA55
;end of stage 1 boot
;=========================================
;stage 2 boot
loadpoint16:

call LoadGDT
call EnablePMode

jmp 08h:loadpoint32             ;far jump loads CS with the 32Bit code selector (0x08)

jmp hang16

gdtr:
 dw gdt_end - gdt - 1 ;limit (size of GDT)
 dd gdt               ;base of GDT
gdt:
 dq 0                   ;dummy entry (8 zero filled bytes)
;gdt code:              ;code descriptor
 dw 0xFFFF              ;limit low
 dw 0                   ;base low
 db 0                   ;base middle
 db 10011010b           ;access
 db 11001111b           ;granularity
 db 0                   ;base high
;gdt data:              ;data descriptor
 dw 0xFFFF              ;limit low
 dw 0                   ;base low
 db 0                   ;base middle
 db 10010010b           ;access
 db 11001111b           ;granularity
 db 0                   ;base high
gdt_end:

LoadGDT:
 pusha
 lgdt [gdtr]
 sti
 popa
ret

EnablePMode:
 cli
 mov eax,cr0            ;set bit 0 in CR0-go to pmode
 or eax,1
 mov cr0,eax
ret

;end of stage 2 boot
;=========================================
;stage 3 boot
use32

EnableA20:
 mov al,0xDD    ;command 0xDD enable A20 (not supported by every controller)
 out 0x64,al    ;send command to keyboard controller (command port is 0x64)
ret

loadpoint32:

mov ax,0x10             ;32Bit data segment selector
mov ds,ax
mov ss,ax
mov es,ax
mov esp,90000h

call EnableA20

;Move 32Bit code to 0x0000:0x0000
mov esi,Start32
xor eax,eax                     ;eax = 0x00000000
mov edi,eax
mov cx,((End32-Start32)/4)+1    ;number of dwords to copy
move_dword:
mov eax,[esi]
mov [edi],eax
add esi,4
add edi,4
dec cx
cmp cx,0
jg move_dword

jmp 0x00000000

Start32:

call ClearScreen32
mov ebx,HelloWorld32
call PrintText32

mov [xpos],0
mov [ypos],4
mov bh, byte [ypos]
mov bl, byte [xpos]
call SetCursorPos32

hang32:
cli
hlt

Putch32:
 pusha
 mov edi,0xB8000
 xor eax,eax
 mov ecx,160
 mov al, byte [ypos]
 mul ecx
 push eax
 mov al, byte [xpos]
 mov cl,2
 mul cl
 pop ecx
 add eax,ecx
 xor ecx,ecx
 add edi,eax
 cmp bl,0x0A
 je .row
 mov dl,bl
 mov dh,[char_attrib]
 mov word [edi],dx
 inc byte [xpos]
 cmp [xpos], 80
 je .row
 jmp .done
 .row:
  mov byte [xpos], 0
  inc byte [ypos]
 .done:
  popa
ret

PrintText32:
 pusha
 push ebx
 pop edi
 .loop:
  mov bl, byte[edi]
  cmp bl, 0
  je .done
  call Putch32
 .next:
  inc edi
  jmp .loop
 .done:
  mov bh, byte [ypos]
  mov bl, byte [xpos]
  call SetCursorPos32
 popa
ret

SetCursorPos32:
 pusha
 xor eax,eax
 mov ecx,80
 mov al,bh
 mul ecx
 movzx ecx,bl
 add eax,ecx            ;full-width add so the column cannot overflow AL
 mov ebx,eax
 mov al,0x0F
 mov dx,0x03D4
 out dx,al
 mov al,bl
 mov dx,0x03D5
 out dx,al
 xor eax,eax
 mov al,0x0E
 mov dx,0x03D4
 out dx,al
 mov al,bh
 mov dx,0x03D5
 out dx,al
 popa
ret

ClearScreen32:
 pusha
 cld
 mov edi,0xB8000
 mov ecx,2000           ;rep uses the full ECX in 32-bit code
 mov ah,[char_attrib]
 mov al,' '
 rep stosw
 popa
ret

char_attrib db 00000101b
xpos db 4
ypos db 2
HelloWorld32 db "Hello 32Bit World",0

End32:

;end of stage 3 boot

In the new code, instead of moving the 0x07C0 code to a different location, I load the second part first. The second part then sets up 32Bit PMode, relocates the 32Bit code to 0x00000000 (finally), and jumps to it. I've checked this by zero-filling the memory the 32Bit code came from after relocating it; it still ran fine after the jump to 0x00000000. If I am missing anything here please tell me, but so far it works fine, my fingers are crossed lol. Oh, and the 32Bit code enables A20 instead.
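For comparison, the relocation step described above could also be written with a string copy. This is only a sketch under stated assumptions, not the poster's code: it assumes FASM syntax, flat segments with base 0, and that the code runs at the address it was assembled for. `Relocate` and `RELOC_DEST` are hypothetical names, the `org` and destination values are placeholders, and the body between `Start32` and `End32` is a stand-in for the real payload.

```asm
use32
org 0x7E00                          ; hypothetical: where the later stages were loaded
RELOC_DEST = 0x00000500             ; hypothetical destination address

Relocate:
    cld                             ; copy upwards
    mov esi,Start32                 ; source: where the BIOS read placed the code
    mov edi,RELOC_DEST              ; destination
    mov ecx,(End32-Start32+3)/4     ; dword count, rounded up
    rep movsd
    jmp RELOC_DEST                  ; assembled as a relative jump to that address

Start32:                            ; placeholder payload
    cli
    hlt
End32:
```

One thing to keep in mind with any copy like this: labels inside the copied block still assemble to their original addresses, so absolute references such as `mov ebx,HelloWorld32` keep pointing at the old copy, while relative calls and jumps within the block keep working.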

Do I hear a round of applause or a loud humming boo O.O?

Mikemk · Post by **Mikemk** » Sat Jul 20, 2013 8:19 pm

I'd be careful running below 0x500, because there is information there that you'll need.

TwixEmma · Post by **TwixEmma** » Sun Jul 21, 2013 10:03 am

m12 wrote:I'd be careful running below 0x500, because there is information there that you'll need.

What location would you suggest moving the code to? I'll research what you tell me to! Thanks for your help.

EDIT: I researched why below 0x500 is unsafe, so I've moved the code and changed the jump to 0x00000500 in RAM instead. I'd still appreciate any research or help you can point me to. Thanks in advance.

Mikemk · Post by **Mikemk** » Sun Jul 21, 2013 12:20 pm

TwixEmma wrote:
m12 wrote:I'd be careful running below 0x500, because there is information there that you'll need.
what location would you suggest moving the code to? I'll research what you tell me to! Thanks for your help.

EDIT: I researched why below 0x500 is unsafe, I've moved the code and changed the jump to 0x00000500 in ram instead. I'd still appreciate any research or help you can point me to. Thanks in advance.

That holds the real mode IVT, the BDA, and a pointer to the EBDA.
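As a rough sketch of that layout (not from the original post), the conventional addresses on a BIOS PC can be written as constants. All names below are hypothetical, the syntax is FASM, and `GetEbdaBase` assumes 32-bit code with flat segments whose base is 0.

```asm
use32
; Conventional low-memory layout on BIOS PCs (physical addresses):
IVT_BASE      = 0x0000      ; real-mode interrupt vector table, 256 * 4 bytes
IVT_END       = 0x03FF
BDA_BASE      = 0x0400      ; BIOS data area
BDA_END       = 0x04FF
BDA_EBDA_SEG  = 0x040E      ; word: real-mode segment of the EBDA
FREE_LOW_BASE = 0x0500      ; first byte conventionally free for the OS

; Out: EAX = physical address of the EBDA (segment * 16).
GetEbdaBase:
    movzx eax,word [BDA_EBDA_SEG]
    shl eax,4
    ret
```

Anything copied to an address below `FREE_LOW_BASE` overwrites these structures, which matters as soon as the code wants to return to real mode or read BIOS-provided data.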

TwixEmma · Post by **TwixEmma** » Sun Jul 21, 2013 1:52 pm

m12 wrote:
TwixEmma wrote:
m12 wrote:I'd be careful running below 0x500, because there is information there that you'll need.
what location would you suggest moving the code to? I'll research what you tell me to! Thanks for your help.

EDIT: I researched why below 0x500 is unsafe, I've moved the code and changed the jump to 0x00000500 in ram instead. I'd still appreciate any research or help you can point me to. Thanks in advance.
That holds the real mode ivt, bda, and a pointer to the ebda.

Thanks, but I got that much.

Another thing that would help is if anyone has links or references to assembly-specific OS information, since most material talks about using C. I am making this OS purely in ASM for two reasons: first, I don't fully grasp C but I do fully grasp ASM; second, pure ASM gives you more control and can result in faster code instruction-wise.

DavidCooper · Post by **DavidCooper** » Sun Jul 21, 2013 8:42 pm

TwixEmma wrote:Do I hear a round of applause or a loud humming boo O.O?

I think applause is most apt - you learn fast.

TwixEmma wrote:Another thing that would help is if anyone has any links or references to assembly os specific information, since most stuff talks about using C, and I am making this OS purely ASM, because of two reasons, first I don't fully grasp C but I fully grasp ASM, second pure ASM you have more control and it can result in faster code instruction wise.

You should probably work with whatever programming language you feel most comfortable with unless there's a good reason to use something else, such as making your OS easier to port to machines using other types of processor. Speed usually only matters in innermost loops which repeat the same action many times, and even there you'll probably get faster running code by using a compiler unless you hack around with your machine code for a long time and test it carefully. ASM does give you much more compact code, but this is less important than in the past because most memory holds data rather than code and there's plenty of it available now, while load times are getting ever faster too. Even if you don't program in C, it's still worth learning enough to be able to follow any example code that uses it. You shouldn't really need ASM versions of anything, as you ought to be able to translate them yourself, and in any case all you really want is an understanding of how the thing works so that you can then implement it your own way, so you don't even want to translate the examples.

OSDev.org
