Re: kernel load address > 1MB access in realmode
Posted: Fri Dec 02, 2022 3:00 pm
by Octocontrabass
rdos wrote:For just implementing the copy function, it's not necessary to "obey" all the rules of switching to protected mode.
True, but only because some of the "rules" are just suggestions. You can get a pretty good idea of what's actually required by looking at the source code to HIMEM.SYS that Microsoft published.
mtbro wrote:Isn't that undefined behavior when it comes to what is happening to selectors?
Yes, behavior is undefined if you don't use a far JMP or far CALL to reload CS immediately after changing CR0.PE. It does work sometimes, which is why your PoC worked even though it didn't reload CS.
mtbro wrote:Checking the wiki's unreal mode page, a proper PM cs selector is loaded with a far jump. Why would there be a 64k code limitation then? It's not in its GDT definition (0xffff limit).
The 0xFFFF limit with byte granularity indicates a 64kB segment.
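To make the limit arithmetic concrete, here is a minimal C sketch (not from the thread; the helper name `segment_size_bytes` is made up for illustration). It assumes an ordinary expand-up segment and computes how many bytes a descriptor covers from its 20-bit limit field and its granularity (G) bit.

```c
#include <stdint.h>

/* Hypothetical helper: number of addressable bytes in an expand-up segment.
 * limit20 is the 20-bit limit field of the descriptor.
 * g_bit = 0: the limit counts bytes; g_bit = 1: the limit counts 4 KiB units. */
uint64_t segment_size_bytes(uint32_t limit20, int g_bit)
{
    uint64_t last = limit20 & 0xFFFFFu;   /* highest valid offset so far */
    if (g_bit)
        last = (last << 12) | 0xFFFu;     /* scale to 4 KiB granularity */
    return last + 1;                      /* offsets 0..last are valid */
}
```

With these assumptions, a limit of 0xFFFF and G = 0 gives 0x10000 bytes (64 kB), while the same limit field with G = 1 gives 256 MB, so a wrong granularity bit changes the segment size drastically.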
Re: kernel load address > 1MB access in realmode
Posted: Fri Dec 02, 2022 3:32 pm
by mtbro
Octocontrabass wrote:The 0xFFFF limit with byte granularity indicates a 64kB segment.
Right, I should have seen that.
I'll do the full jump with prep then, I'm ok with that. I was able to clean up the code a bit more: I don't need to set the data selectors to 16b size before setting them to 0. Although I couldn't find the reason documented anywhere, I do need to enter RM by first setting the code selector to 16b, and only then can I set it to what I want (0 in my case):
libsa16.S
Re: kernel load address > 1MB access in realmode
Posted: Fri Dec 02, 2022 4:29 pm
by Octocontrabass
mtbro wrote:Although I couldn't find the reason documented anywhere, I do need to enter RM by first setting the code selector to 16b, and only then can I set it to what I want (0 in my case):
That has always been a requirement. Does the current Intel SDM not list it anywhere?
Is there any particular reason why you're using a 32-bit code segment? You should be able to do everything in 16-bit protected mode.
Re: kernel load address > 1MB access in realmode
Posted: Fri Dec 02, 2022 4:47 pm
by mtbro
Vol 3a/5-1 chapter about segment and page protection says this:
The part of the segment-protection mechanism that is based on privilege levels can essentially be disabled while still in protected mode by assigning a privilege level of 0 (most privileged) to all segment selectors and segment descriptors.
But I was not able to find anywhere I need to go from 32 to 16 first. I'm not saying it's not there, it's one heavy book.
Octocontrabass wrote:Is there any particular reason why you're using a 32-bit code segment?
There's absolutely none. I was focused on replacing my farmemcpy() with pm_copy() and didn't think of this simple thing. Thank you.
edit: I thought if I use 16b code segments I can simplify the copy a bit more as I would not need to farjump that many times. But it doesn't work this way: libsa16.S:pm_copy. I need to get some sleep and think about this. (The hlt instructions in the function are there because I was debugging it.)
Re: kernel load address > 1MB access in realmode
Posted: Fri Dec 02, 2022 6:05 pm
by Octocontrabass
mtbro wrote:edit: I thought if I use 16b code segments I can simplify the copy a bit more as I would not need to farjump that many times. But it doesn't work this way:
You're using LODS/STOS instructions (why not REP MOVS?) which default to the current code segment's address size. Since you've switched to a 16-bit code segment, the default address size is now 16-bit, and an address size override prefix is required to use 32-bit addressing.
You can convince GAS to emit the correct prefix by explicitly writing the operands, like "rep movsb (%esi),%es:(%edi)".
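As a rough illustration of the prefix rule described above, the C sketch below (not from the thread; `encode_rep_movsb` is a hypothetical name) emits the machine-code bytes for `rep movsb`. It adds the address-size override prefix (0x67) only when the wanted address size differs from the code segment's default.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for "rep movsb".
 * cs_default_32: nonzero if the current code segment defaults to 32-bit
 *                addressing (D bit set), zero for a 16-bit code segment.
 * want_addr32:   nonzero to use ESI/EDI/ECX, zero to use SI/DI/CX.
 * out must have room for 3 bytes. Returns the number of bytes written. */
size_t encode_rep_movsb(uint8_t *out, int cs_default_32, int want_addr32)
{
    size_t n = 0;
    if ((cs_default_32 != 0) != (want_addr32 != 0))
        out[n++] = 0x67;   /* address-size override prefix */
    out[n++] = 0xF3;       /* REP prefix */
    out[n++] = 0xA4;       /* MOVSB opcode */
    return n;
}
```

In a 16-bit code segment, asking for 32-bit addressing yields `67 F3 A4`; without the prefix the CPU uses SI, DI and CX, so only the low 16 bits of the offsets take part in the copy.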
Re: kernel load address > 1MB access in realmode
Posted: Fri Dec 02, 2022 6:48 pm
by mtbro
Octocontrabass wrote:You're using LODS/STOS instructions (why not REP MOVS?) which default to the current code segment's address size
I didn't know lods/stos defaults to code segment. I assumed it would be data segment register as it's reading from that src/dst. This saved me some headaches debugging this.
About not using rep movs?
A few years ago I had to solve a similar "memcpy"-style problem and failed to find an elegant way. I have no idea how I missed that, but I have done it this way ever since.
I'm happy with how my pm_copy() looks now.
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 4:04 am
by rdos
I checked my copy code again, and I actually reload cs like you are supposed to, but I'm still not sure if this is required. I also use another trick in the code and have 32-bit physical address of source and destination as parameters, and then set the base addresses in the GDT to those. After loading the GDT, I can load source & destination selector and do a move with offsets set to zero. I think the idea behind that design was that copy size should never exceed 64k, and so by giving the source and destination selectors a 64k limit, I don't need to reload real-mode selectors with the correct limits. It's also important to do cli before setting bit 0 in CR0 and do sti after clearing it again.
The code selector is 16-bit, which is much simpler given that the code surrounding the copy must be real mode and use 16-bit default operand size.
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 4:23 am
by rdos
mtbro wrote:Octocontrabass wrote:You're using LODS/STOS instructions (why not REP MOVS?) which default to the current code segment's address size
I didn't know lods/stos defaults to code segment. I assumed it would be data segment register as it's reading from that src/dst. This saved me some headaches debugging this.
About not using rep movs?
A few years ago I had to solve a similar "memcpy"-style problem and failed to find an elegant way. I have no idea how I missed that, but I have done it this way ever since.
I'm happy with how my pm_copy() looks now.
I can see one potential problem with the code. You never reload ds and es with selectors with a 64k limit, which means you leave the copy procedure with "unreal mode" selectors. Just reloading them with zero after entering real mode won't fix the limits. The limits can only be changed in protected mode.
In relation to the previous discussion of unreal mode, I think unreal mode is destroyed when the BIOS temporarily uses protected mode, not when segment registers are reloaded. The BIOS must do that if the disc is in AHCI mode or when it has to access other MMIO areas above 1M.
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 4:54 am
by rdos
Here is my copy code:
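Code:
source_sel = 8
dest_sel = 10h
flat_sel = 18h
; GDT image. Entry 0 is unused by the CPU, so it doubles as the LGDT operand.
LoadGdt:
load_gdt0:
    DW 27h                  ; GDT limit: 5 entries * 8 bytes - 1
    DD 0                    ; linear base of the GDT, filled in by InitGdt
    DW 0
load_gdt_source:            ; selector 8: data, 64k limit, base set by MoveData
    DW 0FFFFh
    DD 92000000h
    DW 0
load_gdt_dest:              ; selector 10h: data, 64k limit, base set by MoveData
    DW 0FFFFh
    DD 92300000h
    DW 0
load_gdt_flat:              ; selector 18h: data, page-granular flat limit
    DW 0FFFFh
    DD 92000000h
    DW 008Fh
load_gdt_cs:                ; selector 20h: 16-bit code, 64k limit, base = cs * 16
    DW 0FFFFh
    DD 9A000000h
    DW 0
InitGdt Proc near
    mov ax,cs
    movzx eax,ax
    shl eax,4
    add eax,OFFSET LoadGdt  ; linear address of the GDT
    mov dword ptr cs:load_gdt0+2,eax
    lgdt fword ptr cs:load_gdt0
;
    mov ax,cs
    movzx eax,ax
    shl eax,4
    or dword ptr cs:load_gdt_cs+2,eax   ; code descriptor base = cs * 16
    ret
InitGdt Endp
; In: esi = source physical address, edi = destination physical address,
;     ecx = byte count
MoveData Proc near
    push ds
    push es
    pushad
;
    ; The dword store writes base bits 0..31 over bytes 2..5, clobbering the
    ; access byte. The xchg restores the access byte and fetches base bits
    ; 24..31, which belong in byte 7.
    mov eax,esi
    mov dword ptr cs:load_gdt_source+2,eax
    mov al,92h
    xchg al,byte ptr cs:load_gdt_source+5
    mov byte ptr cs:load_gdt_source+7,al
;
    mov eax,edi
    mov dword ptr cs:load_gdt_dest+2,eax
    mov al,92h
    xchg al,byte ptr cs:load_gdt_dest+5
    mov byte ptr cs:load_gdt_dest+7,al
    mov word ptr cs:MoveDataRmCs,cs     ; patch segment of the far jump back
;
    cli
    mov eax,cr0
    or al,1                 ; set CR0.PE
    mov cr0,eax
;
    db 0EAh                 ; far jmp 20h:MoveDataPm, reloads cs
    dw OFFSET MoveDataPm
    dw 20h
MoveDataPm:
    mov ax,source_sel
    mov ds,ax
    mov ax,dest_sel
    mov es,ax
    xor esi,esi
    xor edi,edi
    rep movs byte ptr es:[edi],[esi]
;
    mov eax,cr0
    and al,NOT 1            ; clear CR0.PE
    mov cr0,eax
;
    db 0EAh                 ; far jmp back to the real-mode cs:MoveDataRm
    dw OFFSET MoveDataRm
MoveDataRmCs:
    dw 0
MoveDataRm:
    sti
    popad
    pop es
    pop ds
    ret
MoveData Endp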
I can see that I gate A20 and load GDT before starting to read the disc, and the MoveData seems to rely on BIOS not changing GDT or A20 gating. That could be a problem, although the code seems to work on many real PCs.
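The way MoveData patches a descriptor base with one dword store plus an xchg is easy to misread, so here is a small C model of the same byte shuffling. This is only a sketch: the function names are invented for illustration, and the descriptor layout assumed is the standard 8-byte one (limit 15:0, base 23:0, access byte, flags/limit 19:16, base 31:24).

```c
#include <stdint.h>

/* Hypothetical model of the base-patching sequence in MoveData.
 * desc[0..1] limit 15:0, desc[2..4] base 23:0, desc[5] access byte,
 * desc[6] flags and limit 19:16, desc[7] base 31:24. */
void patch_descriptor_base(uint8_t desc[8], uint32_t base, uint8_t access)
{
    /* Little-endian 32-bit store at offset 2: sets base 23:0 and
     * temporarily overwrites the access byte with base 31:24. */
    desc[2] = (uint8_t)base;
    desc[3] = (uint8_t)(base >> 8);
    desc[4] = (uint8_t)(base >> 16);
    desc[5] = (uint8_t)(base >> 24);

    /* The xchg step: put the access byte back and pick up base 31:24. */
    uint8_t high = desc[5];
    desc[5] = access;

    desc[7] = high;        /* base 31:24 goes into the last byte */
}

/* Read the 32-bit base back out of a descriptor. */
uint32_t descriptor_base(const uint8_t desc[8])
{
    return (uint32_t)desc[2]
         | ((uint32_t)desc[3] << 8)
         | ((uint32_t)desc[4] << 16)
         | ((uint32_t)desc[7] << 24);
}
```

Patching a descriptor that starts as `FF FF 00 00 00 92 00 00` with base 0x12345678 and access 0x92 leaves the limit and flags bytes untouched and yields a base of 0x12345678 when read back.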
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 5:12 am
by mtbro
It was very late night when I modified that code, I see I have some leftovers when I removed lods/stos instructions. I need to do a code cleanup.
rdos wrote: You never reload ds and es with selectors with a 64k limit, which means you leave the copy procedure with "unreal mode" selectors
Hm, that's valid point. I hope I'll get some time today to test this on actual HW.
Out of curiosity, is segment specification needed in
? Isn't es segment in es:edi implied?
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 5:25 am
by rdos
mtbro wrote:It was very late night when I modified that code, I see I have some leftovers when I removed lods/stos instructions. I need to do a code cleanup.
rdos wrote: You never reload ds and es with selectors with a 64k limit, which means you leave the copy procedure with "unreal mode" selectors
Hm, that's valid point. I hope I'll get some time today to test this on actual HW.
Out of curiosity, is segment specification needed in
? Isn't es segment in es:edi implied?
That looks a bit strange. I actually should have checked so ecx was less than 64k, and use rep movs byte ptr es:[di],ds:[si] (the 16-bit variant) as that would not emit address size overrides. I could have optimized with dword moves too, but then source and destination might not be dword aligned.
I think I could also have written MoveDataRmCs in the InitGdt procedure.
I'm sure there is other stuff that could be optimized as well.
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 6:37 am
by mtbro
You are correct about the unreal mode leftovers. Looking at the code, I think I don't understand a fundamental question: when does data selector reloading take effect? For code/cs I need to do a far jump. My last test of pm_copy() doesn't work: the unreal mode leftovers are in effect.
I tested this on actual HW with this code:
Code:
    /* install custom RM gp handler: vector 13, IVT entry at 0x34 */
    xorw %bx, %bx
    movw $gp_handler, 0x34(%bx)
    movw %bx, 0x36(%bx)
    /* address test: 0x35000 vs 0x5000 */
    movw $0x5000, %bx
    movb $'X',(%bx)
    /* >> hash out to test: GP occurs on actual HW */
    movl $0x35000, %edi
    pushl $5
    pushl $s1
    pushl %edi
    call pm_copy
    // << end of hashout
    /* offset 0x35000 is beyond a 64k limit: works only with unreal mode leftovers */
    movl $0x35000, %ebx
    movb $'E',(%ebx)
    /* load and store dword from 0x35000 on stack (0x3000:0x5000 = 0x35000) */
    pushw %ds
    movw $0x3000, %ax
    movw %ax, %ds
    movl %ds:(%bx), %edx
    xorw %ax,%ax
    movw %ax, %ds
    pushl %edx
    movw %sp, %bx
    pushl $0x5000
    pushw $tmsg1
    call printf16
    pushw %bx
    pushw $tmsg2
    call printf16
    call dump_regs
    jmp .Lb1_haltme
tmsg1: .asciz "string @ 0x5000: %s\r\n"
tmsg2: .asciz "4B @ 0x35000: %s\r\n"
gph: .asciz "GP occurred @ %x:%x\r\n"
s1: .asciz "AAAAAA"
gp_handler:
    cli
    xorw %ax,%ax
    movw %ax,%ds
    movw %sp, %bp
    pushw (%bp)
    pushw 2(%bp)
    pushw $gph
    call printf16
2:
    hlt
    jmp 2b
    iret
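Two bits of address arithmetic in the test above are worth spelling out: the handler is installed at 0x34 because each real-mode interrupt vector occupies 4 bytes (offset word, then segment word) and #GP is vector 13, and the dword at 0x35000 is read through ds = 0x3000 with bx = 0x5000. The C helpers below are only an illustrative sketch with made-up names.

```c
#include <stdint.h>

/* Offset of a vector's entry in the real-mode interrupt vector table.
 * Each entry is 4 bytes: 16-bit offset followed by 16-bit segment. */
uint16_t ivt_entry_offset(uint8_t vector)
{
    return (uint16_t)(vector * 4u);
}

/* Linear address formed by a real-mode segment:offset pair. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For vector 13 the entry starts at 0x34 (offset word at 0x34, segment word at 0x36), and 0x3000:0x5000 resolves to linear address 0x35000.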
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 6:39 am
by nullplan
rdos wrote:In relation to the previous discussion of unreal mode, I think unreal mode is destroyed when the BIOS temporarily uses protected mode, not when segment registers are reloaded. The BIOS must do that if the disc is in AHCI mode or when it has to access other MMIO areas above 1M.
Exactly. Or if, say, the BIOS has support for USB keyboards (which SeaBIOS does, so it is not impossible), and a timer interrupt happens, and BIOS has to check on the interrupt pipe (it does not use an interrupt from the HC to trigger that, it just checks on the pipe in the timer interrupt).
Or more broadly, BIOS might enter protected mode in response to interrupts, and will likely leave it with correct real mode segments loaded in. Thus if you have interrupts enabled, an unreal mode setup can be destroyed at literally any time (BIOS has essentially no way of finding out if the system currently is in unreal mode, so it can only restore the default state). I would therefore suggest not using it if it can be at all avoided. And it can be, because all BIOSes you are likely to encounter in anything made this century have the functions to load sectors by LBA and to copy to high memory with a BIOS function.
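The post does not name the BIOS functions. Assuming they are the usual pair, INT 13h AH=42h (extended read by LBA) and INT 15h AH=87h (copy extended memory), the read side needs a 16-byte Disk Address Packet. Below is a hedged C sketch of filling one; `fill_dap` is a hypothetical helper, and the packet is built byte by byte so the layout does not depend on compiler struct packing.

```c
#include <stdint.h>

/* Hypothetical helper: fill a Disk Address Packet for INT 13h, AH=42h.
 * Layout: size, reserved, sector count (word), buffer offset (word),
 * buffer segment (word), starting LBA (qword), all little-endian. */
void fill_dap(uint8_t dap[16], uint16_t sectors,
              uint16_t buf_seg, uint16_t buf_off, uint64_t lba)
{
    dap[0] = 0x10;                        /* packet size in bytes */
    dap[1] = 0;                           /* reserved, must be zero */
    dap[2] = (uint8_t)sectors;
    dap[3] = (uint8_t)(sectors >> 8);
    dap[4] = (uint8_t)buf_off;            /* real-mode buffer offset */
    dap[5] = (uint8_t)(buf_off >> 8);
    dap[6] = (uint8_t)buf_seg;            /* real-mode buffer segment */
    dap[7] = (uint8_t)(buf_seg >> 8);
    for (int i = 0; i < 8; i++)           /* 64-bit starting LBA */
        dap[8 + i] = (uint8_t)(lba >> (8 * i));
}
```

A loader following this approach would read sectors into a buffer below 1 MB and then let the BIOS block-move function copy them above 1 MB, never relying on unreal mode surviving an interrupt.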
Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 11:01 am
by rdos
mtbro wrote:You are correct about the unreal mode leftovers. Looking at the code, I think I don't understand a fundamental question: when does data selector reloading take effect? For code/cs I need to do a far jump.
Limits & attributes of segment registers are only loaded in protected mode. That means that if you load any segment register with attributes or limits that are not valid for real mode, you need to add "restore" selectors to the GDT that you can load before leaving protected mode. That's why my code loads a code selector with 16-bits bitness and a 64k limit, and has source & destination selectors that have 64k limits. That way, the segment registers will have the correct limit and attributes when real mode is entered without a need to load restore selectors.
Note that this is the case for the code segment register too. The far jump will only load limits & attributes of the code segment register when done in protected mode. That means that if your protected mode code selector doesn't have 16-bit bitness and a 64k limit, you need to load another selector in protected mode that does before leaving protected mode. Otherwise, your code segment will have invalid attributes in real mode.
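The behaviour described above can be modelled with a toy "hidden descriptor cache" in C. This is a deliberately simplified sketch (all names are invented, access rights are left out, and only single-byte accesses are checked), but it shows why reloading a segment register with zero in real mode does not undo a large limit.

```c
#include <stdint.h>

/* Toy model of a segment register: the visible selector plus the
 * hidden cached base and limit. */
struct seg_reg {
    uint16_t selector;
    uint32_t base;
    uint32_t limit;
};

/* Real-mode load: only the selector and base (value * 16) change.
 * The cached limit is left exactly as it was. */
void load_real_mode(struct seg_reg *r, uint16_t value)
{
    r->selector = value;
    r->base = (uint32_t)value << 4;
}

/* Protected-mode load: base and limit both come from the descriptor. */
void load_protected_mode(struct seg_reg *r, uint16_t selector,
                         uint32_t desc_base, uint32_t desc_limit)
{
    r->selector = selector;
    r->base = desc_base;
    r->limit = desc_limit;
}

/* Limit check for a one-byte access at the given offset. */
int offset_allowed(const struct seg_reg *r, uint32_t offset)
{
    return offset <= r->limit;
}
```

Loading a flat descriptor in protected mode and then writing zero to the register in real mode leaves offset 0x35000 accessible (the unreal mode leftover). Loading a descriptor with a 0xFFFF limit before leaving protected mode makes the same access fail the limit check, which matches the #GP seen on real hardware.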
[solved] Re: kernel load address > 1MB access in realmode
Posted: Sat Dec 03, 2022 1:13 pm
by mtbro
rdos wrote:Limits & attributes of segment registers are only loaded in protected mode.
That makes sense, as when I load the segment in RM I'm actually loading the segment part of the address. The last pm_copy() I shared does this in RM, but that was my desperate attempt to see "if that works". It doesn't. The only time my pm_copy() worked was when I used full cs/ds in 32b mode and rolled back to 16b.
As I need to test this on real HW there's this inconvenience of copying disk from VM to my machine where I dd it to the sdcard, and then full power on of the HW. I googled around if qemu is capable of enforcing segment limits. Actually, using kvm it does. So with that enabled I'm able to debug this on my VM first which is really nice.
edit: found the issue - wrong granularity in the 16b data segment in the gdt definition. My pm_copy() now works and I was able to trigger #GP both under qemu+kvm and on actual HW.
Again, thank you all guys, I did learn a few things here.
unrelated note: I used to play CTF/wargames and it was an exploit in 2017 where I needed to do the memcpy in exploit. I somehow missed the movs instruction and since then I was using lods/stos to copy data. It does make me smile a bit.