kernel load address > 1MB access in realmode

Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

rdos wrote:For just implementing the copy function, it's not necessary to "obey" all the rules of switching to protected mode.
True, but only because some of the "rules" are just suggestions. You can get a pretty good idea of what's actually required by looking at the source code to HIMEM.SYS that Microsoft published.
mtbro wrote:Isn't that undefined behavior when it comes to what is happening to selectors?
Yes, behavior is undefined if you don't use a far JMP or far CALL to reload CS immediately after changing CR0.PE. It does work sometimes, which is why your PoC worked even though it didn't reload CS.
mtbro wrote:Checking the wiki's unreal mode article, the proper PM cs selector is loaded with a far jump. Why would there be a 64k code limitation then? It's not in its GDT definition (0xffff limit).
The 0xFFFF limit with byte granularity indicates a 64kB segment.
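To make the limit/granularity point concrete, here is a minimal C sketch that decodes the highest valid offset from an 8-byte GDT descriptor. The function name `effective_limit` is made up for this illustration; only the descriptor layout (limit bits 0-15 in bytes 0-1, limit bits 16-19 in the low nibble of byte 6, G in bit 7 of byte 6) comes from the architecture.

```c
#include <stdint.h>

/* Hypothetical helper: return the highest valid offset described by an
 * 8-byte segment descriptor. */
static uint32_t effective_limit(const uint8_t d[8])
{
    /* 20-bit raw limit: bytes 0-1 plus the low nibble of byte 6 */
    uint32_t limit = (uint32_t)d[0]
                   | ((uint32_t)d[1] << 8)
                   | ((uint32_t)(d[6] & 0x0F) << 16);

    /* G bit set: the raw limit counts 4 KiB pages instead of bytes */
    if (d[6] & 0x80)
        limit = (limit << 12) | 0xFFF;

    return limit;
}
```

With a raw limit of 0xFFFF and G clear, the segment ends at offset 0xFFFF (64 kB). The same descriptor only becomes a 4 GB segment if the high limit nibble is 0xF and G is set.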
mtbro
Member
Posts: 54
Joined: Fri Apr 08, 2022 3:12 pm

Post by mtbro »

Octocontrabass wrote:The 0xFFFF limit with byte granularity indicates a 64kB segment.
#-o Right, I should have seen that.

I'll do the full jump with prep then, I'm ok with that. I was able to clean up the code a bit more: I don't need to set data selectors to 16b size before setting them to 0. While I was not able to find out why anywhere, I do need to enter RM by first setting code selectors to 16b and then I can set it to what I want (0 in my case): libsa16.S

Post by Octocontrabass »

mtbro wrote:While I was not able to find out why anywhere, I do need to enter RM by first setting code selectors to 16b and then I can set it to what I want (0 in my case):
That has always been a requirement. Does the current Intel SDM not list it anywhere?

Is there any particular reason why you're using a 32-bit code segment? You should be able to do everything in 16-bit protected mode.

Post by mtbro »

Vol 3a/5-1 chapter about segment and page protection says this:
The part of the segment-protection mechanism that is based on privilege levels can essentially be disabled while still in protected mode by assigning a privilege level of 0 (most privileged) to all segment selectors and segment descriptors.
But I was not able to find anywhere that I need to go from 32-bit to 16-bit first. I'm not saying it's not there, it's one heavy book.
Octocontrabass wrote:Is there any particular reason why you're using a 32-bit code segment?
There's absolutely none. I was focused on replacing my farmemcpy() with pm_copy() and didn't think of this simple thing. Thank you.

edit: I thought if I use 16b code segments I can simplify the copy a bit more as I would not need to farjump that many times. But it doesn't work this way: libsa16.S:pm_copy. I need to get some sleep and think about this. (The hlt instructions in the function are there because I was debugging it.)

Post by Octocontrabass »

mtbro wrote:edit: I thought if I use 16b code segments I can simplify the copy a bit more as I would not need to farjump that many times. But it doesn't work this way:
You're using LODS/STOS instructions (why not REP MOVS?) which default to the current code segment's address size. Since you've switched to a 16-bit code segment, the default address size is now 16-bit, and an address size override prefix is required to use 32-bit addressing.

You can convince GAS to emit the correct prefix by explicitly writing the operands, like "rep movsb (%esi),%es:(%edi)".

Post by mtbro »

Octocontrabass wrote:You're using LODS/STOS instructions (why not REP MOVS?) which default to the current code segment's address size
I didn't know lods/stos default to the code segment's address size. I assumed it would be the data segment register as it's reading from that src/dst. This saved me some headaches debugging this.

About not using rep movs? :oops: A few years ago I had to solve a similar "memcpy" style problem and failed to find an elegant way. I have no idea how I missed that. But since then I've done it this way.
I'm happy how my pm_copy() looks now.
rdos
Member
Posts: 3296
Joined: Wed Oct 01, 2008 1:55 pm

Post by rdos »

I checked my copy code again, and I actually reload cs like you are supposed to, but I'm still not sure if this is required. I also use another trick in the code and have 32-bit physical address of source and destination as parameters, and then set the base addresses in the GDT to those. After loading the GDT, I can load source & destination selector and do a move with offsets set to zero. I think the idea behind that design was that copy size should never exceed 64k, and so by giving the source and destination selectors a 64k limit, I don't need to reload real-mode selectors with the correct limits. It's also important to do cli before setting bit 0 in CR0 and do sti after clearing it again.

The code selector is 16-bits, which is much simpler given that the code surrounding the copy must be real mode and use 16-bit default operand size.
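The base-patching trick described above can be sketched in C as follows. This is only an illustration of the descriptor layout under the stated design (16-bit, byte-granular data segments with a 64k limit); the function names `set_descriptor_base` and `make_copy_descriptor` are hypothetical and are not taken from rdos's code.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit physical base address in an 8-byte
 * segment descriptor. Bytes 2-4 hold base bits 0-23, byte 7 holds base
 * bits 24-31; byte 5 (access rights) sits in between and is left alone. */
static void set_descriptor_base(uint8_t d[8], uint32_t base)
{
    d[2] = (uint8_t)base;
    d[3] = (uint8_t)(base >> 8);
    d[4] = (uint8_t)(base >> 16);
    d[7] = (uint8_t)(base >> 24);
}

/* Hypothetical helper: build a data descriptor whose limit and attributes
 * already match what real mode expects, so no restore selector is needed. */
static void make_copy_descriptor(uint8_t d[8], uint32_t base)
{
    d[0] = 0xFF;    /* limit bits 0-7 */
    d[1] = 0xFF;    /* limit bits 8-15, so the limit is 0xFFFF */
    d[5] = 0x92;    /* present, ring 0, data, expand-up, writable */
    d[6] = 0x00;    /* G=0 (byte granular), D/B=0 (16-bit), limit 16-19 = 0 */
    set_descriptor_base(d, base);
}
```

Because the base can be any 32-bit physical address, the copy itself runs with both offsets starting at zero, and a single copy is capped at 64k by the segment limit.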

Post by rdos »

mtbro wrote:
Octocontrabass wrote:You're using LODS/STOS instructions (why not REP MOVS?) which default to the current code segment's address size
I didn't know lods/stos default to the code segment's address size. I assumed it would be the data segment register as it's reading from that src/dst. This saved me some headaches debugging this.

About not using rep movs? :oops: A few years ago I had to solve a similar "memcpy" style problem and failed to find an elegant way. I have no idea how I missed that. But since then I've done it this way.
I'm happy how my pm_copy() looks now.
I can see one potential problem with the code. You never reload ds and es with selectors with a 64k limit, which means you leave the copy procedure with "unreal mode" selectors. Just reloading them with zero after entering real mode won't fix the limits. The limits can only be changed in protected mode.

In relation to the previous discussion of unreal mode, I think it is when the BIOS temporarily uses protected mode that unreal mode is destroyed, not when segment registers are reloaded. That is something it must do if the disc is in AHCI mode or when it must access other MMIO areas above 1M.

Post by rdos »

Here is my copy code:

Code:

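source_sel  = 8       ; GDT entry 1: copy source
dest_sel  = 10h       ; GDT entry 2: copy destination
flat_sel  = 18h       ; GDT entry 3: flat 4 GB data

LoadGdt:
load_gdt0:            ; null descriptor, reused as the LGDT operand
  DW 27h              ; GDT limit: 5 entries * 8 bytes - 1
  DD 0                ; linear base of the GDT, filled in by InitGdt
  DW 0
load_gdt_source:      ; 16-bit data, 64k limit, base patched by MoveData
  DW 0FFFFh
  DD 92000000h
  DW 0
load_gdt_dest:        ; 16-bit data, 64k limit, base patched by MoveData
  DW 0FFFFh
  DD 92300000h
  DW 0
load_gdt_flat:        ; 16-bit data, 4k granularity, limit 0FFFFFh (4 GB)
  DW 0FFFFh
  DD 92000000h
  DW 008Fh
load_gdt_cs:          ; selector 20h: 16-bit code, 64k limit, base patched by InitGdt
  DW 0FFFFh
  DD 9A000000h
  DW 0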

InitGdt Proc near
  mov ax,cs
  movzx eax,ax
  shl eax,4
  add eax,OFFSET LoadGdt
  mov dword ptr cs:load_gdt0+2,eax
  lgdt fword ptr cs:load_gdt0
;
  mov ax,cs
  movzx eax,ax
  shl eax,4
  or dword ptr cs:load_gdt_cs+2,eax
  ret
InitGdt Endp

MoveData        Proc near
  push ds
  push es
  pushad
;
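  mov eax,esi                             ; esi = 32-bit physical source address
  mov dword ptr cs:load_gdt_source+2,eax  ; base bits 0-23; bits 24-31 land in the access byte
  mov al,92h
  xchg al,byte ptr cs:load_gdt_source+5   ; restore access byte, fetch base bits 24-31
  mov byte ptr cs:load_gdt_source+7,al    ; store base bits 24-31 in the top byte
;
  mov eax,edi                             ; edi = 32-bit physical destination address
  mov dword ptr cs:load_gdt_dest+2,eax    ; same patching for the destination
  mov al,92h
  xchg al,byte ptr cs:load_gdt_dest+5
  mov byte ptr cs:load_gdt_dest+7,al
  mov word ptr cs:MoveDataRmCs,cs         ; real-mode segment for the far jump back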
;
  cli
  mov eax,cr0
  or al,1
  mov cr0,eax
;
  db 0EAh
  dw OFFSET MoveDataPm
  dw 20h

MoveDataPm:
  mov ax,source_sel
  mov ds,ax
  mov ax,dest_sel
  mov es,ax
  xor esi,esi
  xor edi,edi
  rep movs byte ptr es:[edi],[esi]
;
  mov eax,cr0
  and al,NOT 1
  mov cr0,eax
;
  db 0EAh
  dw OFFSET MoveDataRm
MoveDataRmCs:
  dw 0

MoveDataRm:
  sti
  popad
  pop es
  pop ds
  ret
MoveData        Endp
I can see that I gate A20 and load the GDT before starting to read the disc, and MoveData seems to rely on the BIOS not changing the GDT or A20 gating. That could be a problem, although the code seems to work on many real PCs.

Post by mtbro »

It was very late at night when I modified that code; I see I have some leftovers from when I removed the lods/stos instructions. I need to do a code cleanup.
rdos wrote: You never reload ds and es with selectors with a 64k limit, which means you leave the copy procedure with "unreal mode" selectors
Hm, that's a valid point. I hope I'll get some time today to test this on actual HW.
Out of curiosity, is segment specification needed in

Code:

rep movs byte ptr es:[edi],[esi]
? Isn't es segment in es:edi implied?

Post by rdos »

mtbro wrote:It was very late at night when I modified that code; I see I have some leftovers from when I removed the lods/stos instructions. I need to do a code cleanup.
rdos wrote: You never reload ds and es with selectors with a 64k limit, which means you leave the copy procedure with "unreal mode" selectors
Hm, that's a valid point. I hope I'll get some time today to test this on actual HW.
Out of curiosity, is segment specification needed in

Code:

rep movs byte ptr es:[edi],[esi]
? Isn't es segment in es:edi implied?
That looks a bit strange. I actually should have checked that ecx was less than 64k, and used rep movs byte ptr es:[di],ds:[si] (the 16-bit variant) as that would not emit address size overrides. I could have optimized with dword moves too, but then source and destination might not be dword aligned.

I think I could also have written MoveDataRmCs in the InitGdt procedure.

I'm sure there is other stuff that could be optimized as well.

Post by mtbro »

You are correct about the unreal mode leftovers. As I was looking at the code I think I don't understand a fundamental question: when does data selector reloading take effect? For code/cs I need to do a far jump. My last test of pm_copy() doesn't work - the unreal mode leftovers are in effect.
I tested this on actual HW with this code:

Code:

        /* install custom RM gp handler */
        xorw %bx, %bx
        movw $gp_handler, 0x34(%bx)
        movw %bx, 0x36(%bx)

        /* address test: 0x35000 vs 0x5000 */
        movw $0x5000, %bx
        movb $'X',(%bx)

        /* >> hash out to test: GP occurs on actual HW */
        movl $0x35000, %edi
        pushl $5
        pushl $s1
        pushl %edi
        call pm_copy
       // << end of hashout

        movl $0x35000, %ebx
        movb $'E',(%ebx)

        /* load and store dword from 0x35000 on stack */
        pushw %ds
        movw $0x3000, %ax
        movw %ax, %ds
        movl %ds:(%bx), %edx
        xorw %ax,%ax
        movw %ax, %ds
        pushl %edx
        movw %sp, %bx

        pushl $0x5000
        pushw $tmsg1
        call printf16

        pushw %bx
        pushw $tmsg2
        call printf16

        call dump_regs

        jmp .Lb1_haltme

        tmsg1:  .asciz  "string @ 0x5000: %s\r\n"
        tmsg2:  .asciz  "4B @ 0x35000: %s\r\n"
        gph:    .asciz  "GP occurred @ %x:%x\r\n"
        s1:     .asciz  "AAAAAA"

        gp_handler:
                cli
                xorw %ax,%ax
                movw %ax,%ds
                movw %sp, %bp

                pushw (%bp)
                pushw 2(%bp)
                pushw $gph
                call printf16
        2:
                hlt
                jmp 2b
                iret
nullplan
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Post by nullplan »

rdos wrote:In relation to the previous discussion of unreal mode, I think it is when the BIOS temporarily uses protected mode that unreal mode is destroyed, not when segment registers are reloaded. That is something it must do if the disc is in AHCI mode or when it must access other MMIO areas above 1M.
Exactly. Or if, say, the BIOS has support for USB keyboards (which SeaBIOS does, so it is not impossible), and a timer interrupt happens, and the BIOS has to check on the interrupt pipe (it does not use an interrupt from the HC to trigger that, it just checks on the pipe in the timer interrupt).

Or more broadly, BIOS might enter protected mode in response to interrupts, and will likely leave it with correct real mode segments loaded in. Thus if you have interrupts enabled, an unreal mode setup can be destroyed at literally any time (BIOS has essentially no way of finding out if the system currently is in unreal mode, so it can only restore the default state). I would therefore suggest not using it if it can be at all avoided. And it can be, because all BIOSes you are likely to encounter in anything made this century have the functions to load sectors by LBA and to copy to high memory with a BIOS function.
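The post above does not name the BIOS copy function. Assuming it means the classic block move service (INT 15h, AH=87h), the caller passes a 48-byte descriptor table at ES:SI. The following C sketch fills that table; the helper names are hypothetical, and the layout is the commonly documented one (source descriptor at offset 10h, destination at 18h, the remaining slots zeroed for the BIOS to use).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill one 8-byte slot of the block-move table. */
static void fill_block_descriptor(uint8_t *d, uint32_t phys)
{
    d[0] = 0xFF;                    /* segment limit, low byte */
    d[1] = 0xFF;                    /* segment limit, high byte (64k - 1) */
    d[2] = (uint8_t)phys;           /* base bits 0-7 */
    d[3] = (uint8_t)(phys >> 8);    /* base bits 8-15 */
    d[4] = (uint8_t)(phys >> 16);   /* base bits 16-23 */
    d[5] = 0x93;                    /* access rights: present, writable data */
    d[6] = 0x00;                    /* no extended attributes */
    d[7] = (uint8_t)(phys >> 24);   /* base bits 24-31 (386 and later) */
}

/* Hypothetical helper: build the whole table for a copy from src to dst. */
static void build_block_move_table(uint8_t table[48], uint32_t src, uint32_t dst)
{
    memset(table, 0, 48);                       /* slots reserved for the BIOS stay zero */
    fill_block_descriptor(table + 0x10, src);   /* source descriptor */
    fill_block_descriptor(table + 0x18, dst);   /* destination descriptor */
}
```

A real-mode caller would then point ES:SI at the table, put the number of 16-bit words to copy in CX, and issue INT 15h with AH=87h. With this service the BIOS takes care of the mode switch and of restoring the segment state itself.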

Post by rdos »

mtbro wrote:You are correct about the unreal mode leftovers. As I was looking at the code I think I don't understand a fundamental question: when does data selector reloading take effect? For code/cs I need to do a far jump.
Limits & attributes of segment registers are only loaded in protected mode. That means that if you load any segment register with attributes or limits that are not valid for real mode, you need to add "restore" selectors to the GDT that you can load before leaving protected mode. That's why my code loads a code selector with 16-bits bitness and a 64k limit, and has source & destination selectors that have 64k limits. That way, the segment registers will have the correct limit and attributes when real mode is entered without a need to load restore selectors.

Note that this is the case for the code segment register too. The far jump will only load limits & attributes of the code segment register when done in protected mode. That means that if your protected mode code selector doesn't have 16-bit bitness and a 64k limit, you need to load another selector in protected mode that does before leaving protected mode. Otherwise, your code segment will have invalid attributes in real mode.
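As a rough way to catch a wrong "restore" descriptor before trying it on hardware, the attributes can be checked on the host. The sketch below tests a data descriptor against the values the Intel SDM lists as appropriate for real-address mode (64k limit, byte granular, expand-up, writable, present); the 16-bit D/B check is added here to match the "bitness" point above. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does this 8-byte data descriptor leave the segment
 * register with real-mode compatible limit and attributes? */
static bool data_descriptor_is_real_mode_like(const uint8_t d[8])
{
    /* limit must be exactly 0xFFFF: bytes 0-1 set, limit bits 16-19 clear */
    bool limit_64k = d[0] == 0xFF && d[1] == 0xFF && (d[6] & 0x0F) == 0;

    /* G (bit 7 of byte 6) clear: limit is counted in bytes, not pages */
    bool byte_granular = (d[6] & 0x80) == 0;

    /* D/B (bit 6 of byte 6) clear: 16-bit segment */
    bool sixteen_bit = (d[6] & 0x40) == 0;

    /* access byte: P=1, S=1, not executable, expand-up, writable */
    bool rw_data = (d[5] & 0x9E) == 0x92;

    return limit_64k && byte_granular && sixteen_bit && rw_data;
}
```

The source and destination descriptors from the copy code earlier in the thread pass this check, while the flat descriptor (G set, limit nibble 0Fh) does not, which is why loading it leaves "unreal mode" limits behind.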

[solved] Re: kernel load address > 1MB access in realmode

Post by mtbro »

rdos wrote:Limits & attributes of segment registers are only loaded in protected mode.
That makes sense, as when I load the segment in RM I'm actually loading the segment part of the address. The last pm_copy() I shared does this in RM, but that was my desperate attempt to see "if that works". It doesn't. The only time my pm_copy() worked was when I used full cs/ds in 32b mode and rolled back to 16b.

As I need to test this on real HW there's this inconvenience of copying disk from VM to my machine where I dd it to the sdcard, and then full power on of the HW. I googled around if qemu is capable of enforcing segment limits. Actually, using kvm it does. So with that enabled I'm able to debug this on my VM first which is really nice.

edit: found the issue - wrong granularity in 16b data segment in gdt definition. My pm_copy() now works and I was able to trigger #GP both under qemu+kvm and actual HW.

Again, thank you all guys, I did learn a few things here.

unrelated note: I used to play CTF/wargames, and there was an exploit in 2017 where I needed to do a memcpy. I somehow missed the movs instruction and have been using lods/stos to copy data ever since. It does make me smile a bit.