triple-fault on xeon cpu


Post by 01000101 »

Hey, I just got a lot of my development equipment, and it includes some Intel Xeon 5405s (quad-core, 1333 MHz FSB). I tried booting DiNS on those machines and every one of them triple-faults.

I narrowed it down to the far jump into PMode. The code has worked fine on x86/x86_64 CPUs (P4, P4D, C2D/Q) and has never given me any issues between them, but for some reason the Xeons don't like the code at all. I looked through the Intel software developer's manuals and could find no alterations that need to be made for PMode initialization on the Xeon CPUs. Here is the code for the basic PMode setup. BTW, this works fine in Bochs, QEMU, VPC 2007, and VirtualBox OSE on my Ubuntu 8.04.1 machine (running the dual Xeons).

Code:

[ORG 0x7C00]
[bits 16]

start: 
   xor ax, ax
   xor bx, bx
   xor cx, cx
   xor dx, dx

floppy_reset: 
   sti
   int 0x13
   cmp ah, 0
   jne floppy_reset

floppy_load_sector:
   mov ax, 0x1000
   mov es, ax
   mov bx, 0x0
   mov ah, 02h
   mov al, 18 
   mov ch, 0  
   mov cl, 01h
   mov dh, 0x00

floppy_load_sector_loop: 
   int 0x13
   mov ax, es
   add ax, 0x0240
   mov es, ax
   mov ah, 0x02
   mov al, 18
   inc dh
   cmp dh, 2
   jne floppy_load_sector_loop
   mov dh, 0
   inc ch
   cmp ch, 0x3
   jne floppy_load_sector_loop 

floppy_stop_motor: 
   mov dx, 0x3F2
   mov al, 0x0C
   out dx, al

setup_gdt: 
   cli
   lgdt [gdt_desc]

setup_cr0: 
   mov eax, cr0
   or al, 1
   mov cr0, eax
   ; -- gets to here just fine, but fails on the jump -- ;
far_jump: 
   jmp 0x08:setup_stack

[bits 32]
setup_stack: 
   xor eax, eax
   xor ebx, ebx
   xor ecx, ecx
   xor edx, edx
   mov ax, 0x10
   mov ds, ax
   mov es, ax
   mov fs, ax
   mov gs, ax
   mov ss, ax
   mov esp, 0x90000
   ; -- never reaches here -- ;
   mov word [0xb8000], 0xA053
   hlt

gdt_start: 
gdt_null_desc: 
   dd 0
   dd 0
gdt_sys_code_desc: 
   dw 0xffff
   dw 0x0
   db 0x0
   db 0x9a
   db 0xcf
   db 0x0
gdt_sys_data_desc: 
   dw 0xffff
   dw 0x0
   db 0x0
   db 0x92
   db 0xcf
   db 0x0
gdt_end: 
gdt_desc: 
   dw gdt_end - gdt_start - 1
   dd gdt_start

times 510 - ($ - $$) db 0
dw 0xAA55

Re: triple-fault on xeon cpu

Post by bewing »

Clearly the jump is fine. So it's gotta be some special deal in the GDT .... :-k

I just looked at mine, and I use 0x99 and 0x93 where you are using 0x9a and 0x92 -- but I haven't gone back to look up what the 1 bit means. -- However, after thinking about it for a minute, I am fairly certain that you do NOT want the 2 bit set in your code segment!

Re: triple-fault on xeon cpu

Post by 01000101 »

The second bit you are referring to is the R (code) or R/W (data) bit that allows the segment to be readable or writable. This seems to be necessary :) . Also, why would this work on other similar processors but not the Xeon?

I have a bit of a side (but related) question as well. Could an NMI such as the server 'watchdog' feature cause a triple-fault in a setup that does not have interrupt handlers set up in PMode?

Re: triple-fault on xeon cpu

Post by Brendan »

Hi,
01000101 wrote: I have a bit of a side (but related) question as well. Could an NMI such as the server 'watchdog' feature cause a triple-fault in a setup that does not have interrupt handlers set up in PMode?
I'd hope not - that sort of thing should be disabled by default for backward compatibility ("opt in" rather than "opt out"), and/or hidden from the OS until needed (e.g. implemented to generate SMI that's handled by the BIOS SMM handler until you enable "ACPI mode", rather than as an NMI).

There are some problems with your code though. First, you shouldn't assume anything about the stack the BIOS was using before your code started - the stack could be overwriting data you load, or data you load could be overwriting the stack.
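A minimal sketch of what taking control of the stack could look like at the top of the boot sector. The stack location (just below 0x7C00) is an assumption chosen because it does not collide with the 0x1000:0x0000 load area in the posted code; any other known-free region would do. Zeroing DS and ES as well goes beyond the point about the stack, but the same "assume nothing about the BIOS state" reasoning applies, since `lgdt [gdt_desc]` is addressed through DS.

```nasm
[ORG 0x7C00]
[bits 16]

start:
   cli                   ; no interrupts while SS:SP is being changed
   xor ax, ax
   mov ds, ax            ; DS = 0 so labels assembled with ORG 0x7C00 resolve correctly
   mov es, ax
   mov ss, ax
   mov sp, 0x7C00        ; assumed location: stack grows down from below the boot sector
   sti

   ; ... floppy loading and PMode setup would follow here ...

hang:
   hlt
   jmp hang

times 510 - ($ - $$) db 0
dw 0xAA55
```

With SS:SP fixed at a known address, the sectors loaded at 0x10000 and above cannot overwrite the stack, and the stack cannot grow into them.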

Also, you don't check to see if there were any errors when you load sectors from floppy - the BIOS could load nothing and return a "bad sector" error, and you'd just assume it worked and JMP to nothing. This is a common/likely problem because floppies are extremely unreliable for a variety of reasons (dust, head alignment issues, floppy motor startup speeds, etc).

For example, in my latest floppy boot code I retry floppy operations at least 3 times before giving up (with a "disk reset" between some retries), drop back to single-sector reads if multi-sector reads fail too many times, print detailed error messages, and support redundant floppy images (so if I can't read a sector I can try to get the same data from the other side of the disk). While I admit to being a little paranoid (I even assume BIOS functions will trash registers I need when I know they shouldn't), experience has taught me that there are good reasons for being paranoid... ;)
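A rough sketch of the simplest part of that approach: checking the carry flag after INT 13h and retrying with a disk reset in between. This is not Brendan's actual boot code; `read_sectors` and `disk_error` are hypothetical names, the retry count of 3 is taken from the description above, and the fragment assumes a valid stack has already been set up. The single-sector fallback, error messages and redundant images are left out.

```nasm
[bits 16]

   ; Example call: read one full track to 0x1000:0x0000
   mov ax, 0x1000
   mov es, ax
   xor bx, bx            ; ES:BX = destination buffer
   mov al, 18            ; sector count
   mov ch, 0             ; cylinder
   mov cl, 1             ; first sector (1-based)
   mov dh, 0             ; head
   mov dl, 0             ; first floppy drive
   call read_sectors
   jc disk_error         ; all attempts failed
hang:
   hlt
   jmp hang

disk_error:              ; hypothetical handler: a real one would print a message
   hlt
   jmp disk_error

; read_sectors (hypothetical helper)
; In:  AL = count, CH = cylinder, CL = sector, DH = head, DL = drive, ES:BX = buffer
; Out: CF clear on success, CF set after three failed attempts. Clobbers DI.
read_sectors:
   mov di, 3             ; attempts remaining
.attempt:
   pusha                 ; assume the BIOS may trash any register
   mov ah, 0x02          ; INT 13h function 02h: read sectors
   int 0x13
   popa                  ; restores registers, does not touch the flags
   jnc .done             ; CF clear: the read succeeded
   pusha
   xor ah, ah            ; INT 13h function 00h: reset disk system
   int 0x13
   popa
   dec di                ; DEC leaves CF alone, so set it explicitly below
   jnz .attempt
   stc                   ; out of attempts: report failure
.done:
   ret
```

The important difference from the posted loader is the `jc` after the call: a failed read stops the boot instead of falling through to the PMode switch with nothing loaded.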


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: triple-fault on xeon cpu

Post by bewing »

Yes, that bit is the R/W bit -- but I think it goes the other way around. If the bit is set then the segment is writable, AFAIK. And the xeon may have an additional protection feature that insists that code segments can never be writable. So, when you say it seems to be necessary, I suspect you are mistaken. I think you should try changing the 0x9a to 0x98 and run it.

And yes, of course an NMI can crash an OS that does not have a valid IDT set up, if it fires. However, it seems completely impossible that an NMI would happen in between those 10 opcodes -- and as Brendan says, NMI should be disabled during boot.

Re: triple-fault on xeon cpu

Post by Brendan »

Hi,
bewing wrote:Yes, that bit is the R/W bit -- but I think it goes the other way around. If the bit is set then the segment is writable, AFAIK. And the xeon may have an additional protection feature that insists that code segments can never be writable. So, when you say it seems to be necessary, I suspect you are mistaken. I think you should try changing the 0x9a to 0x98 and run it.
For data segments that bit is the R/W bit (where "clear" means writes cause general protection faults, and "set" means writes are allowed). For code segments that bit is *not* the R/W bit - it determines if the code segment can be read (e.g. "clear" means reads from the code segment cause general protection faults, and "set" means reads are allowed).

This means that an instruction like "mov eax,[cs:foo]" may or may not cause a general protection fault depending on the setting of this bit. In protected mode, code segments have *never* been writable, and something like "mov [cs:foo],eax" will always cause a general protection fault (unless you're in real mode or V86 mode).

The other different bit here is the "accessed" bit (bit 0), where "set" means the segment has been used by the CPU and "clear" means it hasn't. This is like the "accessed" bit in page table entries - when the CPU reads from or writes to a segment it makes sure the descriptor's "accessed" flag is set, so that people using segmentation (and not using paging) could use the flag to implement swapping. I normally set this flag because it prevents the need for the CPU to write to the descriptor to update the flag the first time the segment is accessed (and I never test if the "accessed" flag is set or not in descriptors).
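To make the bit layout concrete, here is a sketch that builds the access bytes from named bits. The constant names are illustrative only (they are not NASM or Intel names). With these bits, the values mentioned in this thread decode as: 0x9A = readable code, not yet accessed; 0x98 = execute-only code; 0x99 = execute-only code, accessed; 0x92 and 0x93 = writable data without and with the accessed bit.

```nasm
; Access-byte bits for code/data descriptors (illustrative names)
SEG_ACCESSED   equ 0x01  ; bit 0: set by the CPU on first use
SEG_RW         equ 0x02  ; bit 1: code = readable, data = writable
SEG_DC         equ 0x04  ; bit 2: code = conforming, data = expand-down
SEG_EXEC       equ 0x08  ; bit 3: 1 = code segment, 0 = data segment
SEG_CODE_DATA  equ 0x10  ; bit 4: 1 = code/data (not a system) descriptor
SEG_PRESENT    equ 0x80  ; bit 7: present (DPL in bits 5-6 left at ring 0)

CODE_ACCESS equ SEG_PRESENT | SEG_CODE_DATA | SEG_EXEC | SEG_RW | SEG_ACCESSED  ; 0x9B
DATA_ACCESS equ SEG_PRESENT | SEG_CODE_DATA | SEG_RW | SEG_ACCESSED             ; 0x93

gdt_start:
   dq 0                  ; null descriptor
gdt_sys_code_desc:
   dw 0xFFFF             ; limit 15:0
   dw 0x0000             ; base 15:0
   db 0x00               ; base 23:16
   db CODE_ACCESS        ; readable code, accessed bit pre-set
   db 0xCF               ; 4 KiB granularity, 32-bit, limit 19:16
   db 0x00               ; base 31:24
gdt_sys_data_desc:
   dw 0xFFFF
   dw 0x0000
   db 0x00
   db DATA_ACCESS        ; writable data, accessed bit pre-set
   db 0xCF
   db 0x00
gdt_end:
```

Pre-setting the accessed bit (0x9B/0x93 rather than 0x9A/0x92) means the CPU never has to write back to the GDT when the selectors are first loaded.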


Cheers,

Brendan