[Solved] Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 11:03 am
by psykochewbacca
Hi!

I'm trying to simplify a GDT which contains 6 segments, of which only 2 are really necessary (from what I gather). I cannot make the changes work.

The code is from Cromwell, an Xbox (Original) bootloader. The CPU is a Pentium III. There is no concept of userspace, so everything should run on segments with privilege level 0. I want to begin with a flat model with a single code32 and a single data32 segment.

Here's the relevant original working code:

Code:

    .code32

.section .text, "ax"
     .org 0x00
     jmp    start_linux

.global Cromwellconfig
Cromwellconfig:
    .org 0x0c
    // Space for the SHA1 checksum
    .org 0x20   

    // The Value positions are fixed, do not change them, used everywhere
    .long 0x0   // 0x20 if XBE, then this bit is 0, if Cromwell mode, the bit is set to 1 by the Startuploader
    .long 0x0   // 0x24 ImageRetryLoads
    .long 0x0   // 0x28 Bank, from where Loaded
    .long 0x0   // 0x2C 0 .. Bios = 256 k, 1 .. Bios = 1MB
    .long 0x0   // 0x30 free
    .long _end_complete_rom       // 0x34 free
    .long 0x0       // 0x38 free
    .long 0x0   // free

.align 16
tableGdt:
    .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 // 0x00 dummy
    .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00 // 0x08 code32
    .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00 // 0x10 code32
    .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00 // 0x18 data32
    .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0x8f, 0x00 // 0x20 code16 (8f indicates 4K granularity, ie, huge limit)
    .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0x8f, 0x00 // 0x28 data16

tableGdtDescriptor:
    // This is the GDT header having 8 bytes
    .word tableGdtDescriptor-tableGdt  // 0x30 byte GDT
    .long GDT_LOC                      // GDT located at 0xA0000
    .word 0                            // Padding
tableGdtEnd:

.align 16
tableIdtDescriptor:

    .word 2048
    .long IDT_LOC                      // IDT located at 0xB0000
    .word 0     // fill Word, so we get aligned again

        // We are dword aligned now

.align 16        
    .globl start_linux
start_linux:

    // Make SURE the IRQs are turned off
    cli

    // kill the cache  = Disable bit 30 + 29 = CD + NW
    // CD = Cache Disable (disable = 1)
    // NW Not write through (disable = 1)
    // Protected mode enabled
    mov     $0x60010033, %eax
    mov %eax, %cr0
    wbinvd

    // Flush the TLB
    xor %eax, %eax
    mov %eax, %cr3

    // We kill the Local Descriptor Table
    xor %eax, %eax
    lldt    %ax

    // DR6/DR7: Clear the debug registers
    xor %eax, %eax
    mov %eax, %dr6
    mov %eax, %dr7
    mov %eax, %dr0
    mov %eax, %dr1
    mov %eax, %dr2
    mov %eax, %dr3


    // IMPORTANT!  Linux expects the GDT located at a specific position,
    // 0xA0000, so we have to move it there.

    // Copy the GDT to its final location
    movl $GDT_LOC, %edi
    movl $tableGdt, %esi
    movl $(tableGdtEnd-tableGdt)/4, %ecx
    // Moving (tableGdtEnd-tableGdt)/4 DWORDS from &tableGdt to &GDT_LOC
    rep movsl

    // Load the new GDT (bits0-15: Table limit, bits16-47: Base address)
    lgdt GDT_LOC+(tableGdtDescriptor-tableGdt)

    // Kill the LDT, if any
    xor %eax, %eax
    lldt %ax

    // Reload CS as 0010 from the new GDT using a far jump
    jmp $0x010, $reload_cs

reload_cs:

    // CS is now a valid entry in the GDT.  Set SS, DS, and ES to valid
    // descriptors, but clear FS and GS as they are not necessary.

    // Set SS, DS, and ES to a data32 segment with maximum limit.
    movw $0x0018, %ax
    mov %eax, %ss
    mov %eax, %ds
    mov %eax, %es

    // Clear FS and GS
    xor %eax, %eax
    mov %eax, %fs
    mov %eax, %gs
Changing the far jump in the code above to

Code:

jmp $0x008, $reload_cs
also works fine by the way.

As you can see, protected mode is enabled at the start.

I want to trim the GDT to have a code32 segment at 0x08 and a data32 segment at 0x10. Here's my take on this, which isn't working:

Code:

    .code32

.section .text, "ax"
     .org 0x00
     jmp    start_linux

.global Cromwellconfig
Cromwellconfig:
    .org 0x0c
    // Space for the SHA1 checksum
    .org 0x20   

    // The Value positions are fixed, do not change them, used everywhere
    .long 0x0   // 0x20 if XBE, then this bit is 0, if Cromwell mode, the bit is set to 1 by the Startuploader
    .long 0x0   // 0x24 ImageRetryLoads
    .long 0x0   // 0x28 Bank, from where Loaded
    .long 0x0   // 0x2C 0 .. Bios = 256 k, 1 .. Bios = 1MB
    .long 0x0   // 0x30 free
    .long _end_complete_rom       // 0x34 free
    .long 0x0       // 0x38 free
    .long 0x0   // free

.align 16
tableGdt:
    .byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 // 0x00 dummy
    .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00 // 0x08 code32
    .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00 // 0x10 data32

tableGdtDescriptor:
    // This is the GDT header having 8 bytes
    .word tableGdtDescriptor-tableGdt  // 0x18 byte GDT
    .long GDT_LOC                      // GDT located at 0xA0000
    .word 0                            // Padding
tableGdtEnd:

.align 16
tableIdtDescriptor:

    .word 2048
    .long IDT_LOC                      // IDT located at 0xB0000
    .word 0     // fill Word, so we get aligned again

        // We are dword aligned now

.align 16        
    .globl start_linux
start_linux:

    // Make SURE the IRQs are turned off
    cli

    // kill the cache  = Disable bit 30 + 29 = CD + NW
    // CD = Cache Disable (disable = 1)
    // NW Not write through (disable = 1)
    // Protected mode enabled
    mov     $0x60010033, %eax
    mov %eax, %cr0
    wbinvd

    // Flush the TLB
    xor %eax, %eax
    mov %eax, %cr3

    // We kill the Local Descriptor Table
    xor %eax, %eax
    lldt    %ax

    // DR6/DR7: Clear the debug registers
    xor %eax, %eax
    mov %eax, %dr6
    mov %eax, %dr7
    mov %eax, %dr0
    mov %eax, %dr1
    mov %eax, %dr2
    mov %eax, %dr3


    // IMPORTANT!  Linux expects the GDT located at a specific position,
    // 0xA0000, so we have to move it there.

    // Copy the GDT to its final location
    movl $GDT_LOC, %edi
    movl $tableGdt, %esi
    movl $(tableGdtEnd-tableGdt)/4, %ecx
    // Moving (tableGdtEnd-tableGdt)/4 DWORDS from &tableGdt to &GDT_LOC
    rep movsl

    // Load the new GDT (bits0-15: Table limit, bits16-47: Base address)
    lgdt GDT_LOC+(tableGdtDescriptor-tableGdt)

    // Kill the LDT, if any
    xor %eax, %eax
    lldt %ax

    // Reload CS as 0008 from the new GDT using a far jump
    jmp $0x008, $reload_cs

reload_cs:

    // CS is now a valid entry in the GDT.  Set SS, DS, and ES to valid
    // descriptors, but clear FS and GS as they are not necessary.

    // Set SS, DS, and ES to a data32 segment with maximum limit.
    movw $0x0010, %ax
    mov %eax, %ss
    mov %eax, %ds
    mov %eax, %es

    // Clear FS and GS
    xor %eax, %eax
    mov %eax, %fs
    mov %eax, %gs
Can anybody spot why it wouldn't work?

Thanks.

Re: Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 11:43 am
by Octocontrabass
Does anything else reference the GDT? You've only changed the selector values used by the bootloader, not by other software (e.g. the Linux kernel).

Re: Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 11:53 am
by nullplan
As I don't get tired of telling my customers: "Doesn't work" is no error message. What happens? Can you tell? Can you try it in an emulator and try to get debug output? Load an IDT with actual handlers set up so you can see if an exception happens? The comments mention a SHA-1 checksum. So maybe there's something cryptographic going on? Also, what do you hope to achieve with this change? Save a few bytes?

Anyway, there are a couple of things. I hope paging is disabled when you enter this code, or else that the code is running in an identity-mapped region, because the move to CR0 at the start will disable paging. Also: why do you disable the cache and clear the TLB? The cache should be set up correctly by the code running before it (or else they'd have trouble with MMIO), and flushing the TLB is pointless, since paging is disabled.
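
For reference, the value moved into CR0 in the posted code can be decoded bit by bit. This is only a sketch, not something taken from Cromwell: the macro names and the `cr0_has` helper are made up for illustration, and the bit positions are the standard ones from the Intel manuals. It shows that 0x60010033 sets PE, CD and NW (plus MP, ET, NE and WP) and leaves PG clear, so paging is off after the write.

```c
#include <stdint.h>

/* CR0 bit positions as documented in the Intel SDM (names are illustrative). */
#define CR0_PE (1u << 0)   /* Protection Enable */
#define CR0_MP (1u << 1)   /* Monitor Coprocessor */
#define CR0_ET (1u << 4)   /* Extension Type */
#define CR0_NE (1u << 5)   /* Numeric Error */
#define CR0_WP (1u << 16)  /* Write Protect */
#define CR0_NW (1u << 29)  /* Not Write-through */
#define CR0_CD (1u << 30)  /* Cache Disable */
#define CR0_PG (1u << 31)  /* Paging */

/* Hypothetical helper: returns 1 if `flag` is set in `cr0`, else 0. */
int cr0_has(uint32_t cr0, uint32_t flag)
{
    return (cr0 & flag) != 0;
}
```

Calling `cr0_has(0x60010033u, CR0_PG)` gives 0, while the same call with `CR0_PE`, `CR0_CD` or `CR0_NW` gives 1.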

For clearing the debug registers, it is entirely sufficient to clear DR7. DR6 is a status register, so its value doesn't matter, and as long as DR7 is zeroed, DR0-3 also don't matter.

The comments mention that Linux expects the GDT to be in a certain place. I'm fairly certain that's a lie, since Linux will load its own GDT once it gets in the driver's seat. Or at least it does on the PC; maybe the Xbox version was lobotomized.

The old GDT does, however, contain a bunch of segments the code you posted does not reference. Is it possible they are referenced later?

Re: Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 1:27 pm
by psykochewbacca
Octocontrabass wrote:Does anything else reference the GDT? You've only changed the selector values used by the bootloader, not by other software (e.g. the Linux kernel).
This bootloader doesn't load a kernel on its own. It displays an interface on screen to let the user select a kernel. So the GDT is used right away by the same program. It jumps to the C code part of the program after finishing init such as IDT, MTRR and other essential stuff.
nullplan wrote:As I don't get tired of telling my customers: "Doesn't work" is no error message. What happens? Can you tell? Can you try it in an emulator and try to get debug output? Load an IDT with actual handlers set up so you can see if an exception happens? The comments mention a SHA-1 checksum. So maybe there's something cryptographic going on? Also, what do you hope to achieve with this change? Save a few bytes?
Well, the code runs directly on the target. The JTAG interface on the CPU is permanently disabled (TRST tied to GND). I tried setting up XQEMU, an Xbox-specific variant of QEMU, but couldn't make it work.

"Doesn't work" is pretty much all I can say. This code should run and then display an interface on screen. It doesn't load a kernel on its own (see my reply to Octocontrabass just above). An IDT is loaded down the road, that's for sure. With the original code snippet in place (with the 6 segments), it runs just fine. It's when I try to simplify it to 2 that the program halts before jumping to the C code portion of the program.

The SHA-1 value stored is used by the bootstrap program (which loads this one) to validate the integrity of the binary. It's done this way because the whole program (bootstrap + this bootloader) is stored on flash and the bootstrap portion is never erased. SHA-1 is used because the bootstrap already contains code to calculate a SHA-1 hash. Adding MD5 or something else is kinda pointless. Anyway, long story short, it has nothing to do with SHA-1. The program works fine with the first code snippet but doesn't with the second snippet in place.
nullplan wrote:Anyway, there are a couple of things. I hope paging is disabled when you enter this code, or else that the code is running in an identity-mapped region, because the move to CR0 at the start will disable paging. Also: why do you disable the cache and clear the TLB? The cache should be set up correctly by the code running before it (or else they'd have trouble with MMIO), and flushing the TLB is pointless, since paging is disabled.

For clearing the debug registers, it is entirely sufficient to clear DR7. DR6 is a status register, so its value doesn't matter, and as long as DR7 is zeroed, DR0-3 also don't matter.

The comments mention that Linux expects the GDT to be in a certain place. I'm fairly certain that's a lie, since Linux will load its own GDT once it gets in the driver's seat. Or at least it does on the PC; maybe the Xbox version was lobotomized.

The old GDT does, however, contain a bunch of segments the code you posted does not reference. Is it possible they are referenced later?
I don't pretend to understand how all of this works or why it was implemented that way. This thing was put together circa 2002 to win a competition over who would boot Linux on the Xbox first. So it probably wasn't thought through thoroughly.

Paging is disabled. In fact, it is not enabled in this bootloader. I don't know why cache is disabled and TLB is cleared.

I don't think the extra segments are referenced later. Not while in this bootloader program at least, which we will not exit until a user input is made. There's no other place in the code where segment registers (cs, ds, ...) are set. Sorry if the terminology I use isn't on point. I'm learning big time right now!

Re: Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 1:47 pm
by Octocontrabass
psykochewbacca wrote:IDT
You know, the IDT has a lot of segment selectors in it. Did you update those when you changed the GDT?
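
To illustrate why that matters, here is a minimal C sketch that checks whether a selector names a present code segment in a given GDT. The descriptor bytes are copied from the two tables posted above; the array names and the `selector_is_code` function are hypothetical. It assumes, without confirmation from the posted code, that the IDT gates were built with selector 0x10, the value the original far jump used.

```c
#include <stddef.h>
#include <stdint.h>

/* The six-entry table from the original code. */
static const uint8_t old_gdt[6][8] = {
    {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, /* 0x00 null   */
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00}, /* 0x08 code32 */
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00}, /* 0x10 code32 */
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00}, /* 0x18 data32 */
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0x8f, 0x00}, /* 0x20 code16 */
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0x8f, 0x00}, /* 0x28 data16 */
};

/* The trimmed three-entry table. */
static const uint8_t new_gdt[3][8] = {
    {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, /* 0x00 null   */
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00}, /* 0x08 code32 */
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00}, /* 0x10 data32 */
};

/* Hypothetical check: returns 1 if `selector` refers to a present code
 * segment inside a GDT holding `entries` descriptors, else 0. */
int selector_is_code(const uint8_t (*gdt)[8], size_t entries, uint16_t selector)
{
    size_t index = selector >> 3;      /* bits 3..15: descriptor index */

    if (selector & 0x4)                /* TI bit set: refers to the LDT */
        return 0;
    if (index == 0 || index >= entries)
        return 0;                      /* null selector or past the table */

    /* Access byte: P (bit 7), S (bit 4) and executable (bit 3) must be set. */
    return (gdt[index][5] & 0x98) == 0x98;
}
```

With the old table, `selector_is_code(old_gdt, 6, 0x10)` returns 1. With the trimmed table, `selector_is_code(new_gdt, 3, 0x10)` returns 0, because 0x10 is now the data segment. Under the stated assumption, any gate that still carries 0x10 would fault as soon as an interrupt or exception fires.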

Re: Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 3:00 pm
by psykochewbacca
Octocontrabass wrote:
psykochewbacca wrote:IDT
You know, the IDT has a lot of segment selectors in it. Did you update those when you changed the GDT?
Well I didn't change that part of the code which was already working with the original code. Are you implying that values in the IDT are influenced by any changes in the GDT?

Re: Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 5:58 pm
by MichaelPetch
psykochewbacca wrote: Well I didn't change that part of the code which was already working with the original code. Are you implying that values in the IDT are influenced by any changes in the GDT?
Each IDT entry has a code segment selector in it, and it must use an appropriate one from the GDT.
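
As a rough sketch of where that selector lives, a 32-bit interrupt gate can be laid out in C like this. The struct and function names are hypothetical and not taken from Cromwell, whose IDT setup code was not posted. The field order follows the standard 8-byte gate format.

```c
#include <stdint.h>

/* One 8-byte IDT entry in 32-bit protected mode (illustrative names). */
struct idt_gate {
    uint16_t offset_low;   /* handler address, bits 0..15            */
    uint16_t selector;     /* code segment selector, looked up in GDT */
    uint8_t  zero;         /* reserved, must be 0                     */
    uint8_t  type_attr;    /* P, DPL and gate type                    */
    uint16_t offset_high;  /* handler address, bits 16..31            */
};

/* Hypothetical helper: build a ring-0 32-bit interrupt gate. */
struct idt_gate make_interrupt_gate(uint32_t handler, uint16_t code_selector)
{
    struct idt_gate g;

    g.offset_low  = (uint16_t)(handler & 0xffff);
    g.selector    = code_selector;  /* must name a code segment in the GDT */
    g.zero        = 0;
    g.type_attr   = 0x8e;           /* P=1, DPL=0, type 0xE (32-bit interrupt gate) */
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

After trimming the GDT to code32 at 0x08 and data32 at 0x10, every gate would need `code_selector` set to 0x08.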

Re: Trouble simplifying GDT on x86

Posted: Thu Jun 28, 2018 6:07 pm
by psykochewbacca
MichaelPetch wrote:Each IDT entry has a code segment selector in it, and it must use an appropriate one from the GDT.
Well there it is!
Makes sense you'd want to map different IDT entries to different segments with the appropriate privilege level, I guess!

Thanks a lot. Now on to making my ATA DMA driver work! I think the issue is related to virtual addresses not matching physical ones, hence why I wanted to simplify my GDT. But that's another issue!

Big thanks to everyone here.

Re: Trouble simplifying GDT on x86

Posted: Fri Jun 29, 2018 8:56 am
by nullplan
psykochewbacca wrote:Thanks a lot. Now on to making my ATA DMA driver work! I think the issue is related to virtual addresses not matching physical ones, hence why I wanted to simplify my GDT. But that's another issue!
All the old segments had a base of 0 and a limit of 4 GB. So there is no difference between linear and virtual addresses with either table, no matter which segments you use. I think your issue is elsewhere.
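
The base and limit can be read straight out of the descriptor bytes. The following C sketch (function names are made up for illustration) decodes them using the standard descriptor layout; feeding it any of the five non-null descriptors from the original table yields base 0 and an effective limit of 0xFFFFFFFF.

```c
#include <stdint.h>

/* Base address: bytes 2..4 hold bits 0..23, byte 7 holds bits 24..31. */
uint32_t descriptor_base(const uint8_t d[8])
{
    return (uint32_t)d[2]
         | ((uint32_t)d[3] << 8)
         | ((uint32_t)d[4] << 16)
         | ((uint32_t)d[7] << 24);
}

/* Effective limit in bytes: bytes 0..1 hold bits 0..15, the low nibble of
 * byte 6 holds bits 16..19. If G (bit 7 of byte 6) is set, the 20-bit value
 * counts 4 KiB pages, so it is scaled up and the low 12 bits are filled. */
uint32_t descriptor_limit(const uint8_t d[8])
{
    uint32_t limit = (uint32_t)d[0]
                   | ((uint32_t)d[1] << 8)
                   | ((uint32_t)(d[6] & 0x0f) << 16);

    if (d[6] & 0x80)
        limit = (limit << 12) | 0xfff;
    return limit;
}
```

The 0xcf and 0x8f descriptors differ only in the D bit (32-bit versus 16-bit default operand size), so both decode to the same base and limit.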