Page 1 of 1

Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 12:02 am
by rdos
I'm not the one to propose that failures in software are related processor-bugs, but this one seems so "clean-cut" I wonder what else could behind it?

OK, so I setup different GDTs per processor-core in order to be able to retrieve the core selector that uniquely identifies the core to the OS. This seems to work most of the time (well, actually every time except at one specific occassion). I now have a crash-debugger report that is related to one core retrieving another core's core selector. It is evident that this had happened while the other core also was trying to get the core selector, because otherwise the scheduler-lock would not have failed (which was the reason the crash-debugger was invoked).

The code executed just before to get the core selector is very simple:

Code: Select all

   push ax
   mov ax,40h          ; selector 40h is the core selector which has different GDT mappings per core.
   mov fs,ax
   mov fs,fs:ps_sel    ; ps_sel is the core selector saved "within itself" which points to an unique selector.
   pop ax
   add fs:ps_nesting,1
   jc lcDone
   Crash                   ; this is the position the crash-debugger shows
lcDone:
   ret
In the crash-debugger, I could verify that the actual value at 40:ps_sel was not the value loaded into FS. Instead, the value loaded into FS was another core's core selector. I could also verify that the stack was correct (there was a reference to LockCore on the correct position on the stack, called from LeaveSection), so the code didn't get to the position through an incorrect jmp/call.

It would be possible to get to the present result in FS if using another core's GDT. Maybe AMD does not support having different GDTs per core? Or maybe nobody uses it, and AMD have made some optimizations that are broken when GDTs differ?

EDIT: In addition to this, the core which owed the selector loaded, also faults shortly afterwards when it's ps_nesting value is not the one expected (because the other core locked the wrong core).

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 1:29 am
by turdus
This is not a bug. If you use segment selectors, the memory is NOT checked for gdt entries, instead read from cpu's internal cache. If you change mappings, you also have to use lgdt (even if offset does not change) to refresh shadow registers in processor.

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 1:44 am
by rdos
turdus wrote:This is not a bug. If you use segment selectors, the memory is NOT checked for gdt entries, instead read from cpu's internal cache. If you change mappings, you also have to use lgdt (even if offset does not change) to refresh shadow registers in processor.
In the startup-code for each processor-core, LGDT is used to load different bases for GDT, so this is not the issue. I also can see in the crash-debugger that GDTR has different bases on the cores. And this code works normally. The present thing happened after more than half-an-hour of stressing the system with a heavy load, which made millions of these references on 4 different cores.

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 2:33 am
by Owen
turdus wrote:This is not a bug. If you use segment selectors, the memory is NOT checked for gdt entries, instead read from cpu's internal cache. If you change mappings, you also have to use lgdt (even if offset does not change) to refresh shadow registers in processor.
What you're saying directly contradicts both AMD and Intel's architecture manuals. You do not need to do an LGDT after modifying the GDT; the processor always does cache-coherent loads* when reading it

(* Assuming it is mapped to a cache-coherent memory type, i.e. you are not being stupid)

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 3:28 am
by turdus
Owen wrote:What you're saying directly contradicts both AMD and Intel's architecture manuals. You do not need to do an LGDT after modifying the GDT; the processor always does cache-coherent loads* when reading it

(* Assuming it is mapped to a cache-coherent memory type, i.e. you are not being stupid)
Why don't you take it a try? Modify an entry in GDT, nothing will happen. I can imagine it's enough the change the segment register to refresh the cache like lgdt do, but you have to refresh.
rdos wrote: The present thing happened after more than half-an-hour of stressing the system with a heavy load
It this case it seems like you've found a real bug or you have a race condition somewhere.

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 4:02 am
by rdos
turdus wrote:
Owen wrote:What you're saying directly contradicts both AMD and Intel's architecture manuals. You do not need to do an LGDT after modifying the GDT; the processor always does cache-coherent loads* when reading it

(* Assuming it is mapped to a cache-coherent memory type, i.e. you are not being stupid)
Why don't you take it a try? Modify an entry in GDT, nothing will happen. I can imagine it's enough the change the segment register to refresh the cache like lgdt do, but you have to refresh.
This is slightly beside the point, but of course you need to reload an entry from the GDT in order to change loaded selector's attributes. Also, reloading GDTR should not change loaded selector attributes. It only changes the base for GDTR.

In this case, different cores have different GDTR bases. Each core has the first 48h entries mapped to a private physical page, while the rest of the GDT is mapped (with paging) to the same physical addresses for all cores. The FS selector is first loaded with GDT entry 40h (which has a different base in each core due to different GDTR mappings), and then the unique selector is loaded from a position within the segment which has previously (initialization time) been filled with an unique selector. This unique selector has the same attributes (base & size) as the original selector, but is unique when used between cores.

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 4:23 am
by Candy
Did you tag it as global in the page tables? If so, I can imagine some part of the MMU thinking "this should be shared between cores too then". Should reproduce more easily though.

Why don't you use gs though? I recall there being a gs-base-swap function specifically for this kind of thing.

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 4:52 am
by rdos
Candy wrote:Did you tag it as global in the page tables? If so, I can imagine some part of the MMU thinking "this should be shared between cores too then". Should reproduce more easily though.
I have an idea of how to reproduce it more easily. The thing is, I won't notice something is wrong until another core tries to lock at the same time (nesting-level is wrong). I think I will add experimental code that reads the APIC ID, and uses this to get the current core instead, and then compares the results. This would notice something is wrong even if it has no direct, adverse effects.
Candy wrote:Why don't you use gs though? I recall there being a gs-base-swap function specifically for this kind of thing.
Can't. RDOS is a segmented OS, and thus makes frequent use of GS. I might want to look into the TR-trick though. I currently don't use the TR for anything significant, so it might be just as fast to modify SS0 contents in a single TR-per-core instead of switching TR, which in itself is complicated since there is a need to change the descriptor-type to "available TSS" before loading it. Another advantage of this is that threads would need one GDT-selector less (no TSS), and could keep registers in the thread-control block instead. But the latter is a major undertaking.

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 5:12 am
by Owen
Note that I specifically mentioned LGDT. There is never a need to invoke LGDT unless you wish to change the linear address or limit of the table.



SWAPGS is a K8+ instruction, and for 32-bit code "push GS; mov AX, gs_sel; mov GS, AX" is cleaner anyway (simpler nesting)

Re: Is this a bug in AMDs Athlon?

Posted: Thu May 05, 2011 11:55 am
by rdos
Just reproduced the error (it took 15 minutes this time). The first time it was core 2 that got core 4s selector. This time it was core 3 that got core 4s selector. The fault also happened in the unlock procedure this time, while the other time it was in the lock procedure.

I just tested on my Intel Atom processor. It doesn't have the same error, but another type of error that does not occur on AMD. :cry:

I've started work on using TR to identify processor core instead. That has to work on all processors.

Re: Is this a bug in AMDs Athlon?

Posted: Sat May 07, 2011 4:40 am
by rdos
To answer the question in the subject line: No. This is not a bug. I just realized what is wrong with the code, and why the wrong processor core could become locked.

Code: Select all

; this code is one on core 1:
   push ax
   mov ax,40h          ; selector 40h is the core selector which has different GDT mappings per core.
   mov fs,ax
   mov fs,fs:ps_sel    ; ps_sel is the core selector saved "within itself" which points to an unique selector.

; here an ISR happens, which switch to a new thread

; when control returns, the code is now run on core 2. However, FS is now loaded with core 1 selector

   pop ax
   add fs:ps_nesting,1  ; here core 1 is locked instead of core 2
   jc lcDone
   Crash                   ; this is the position the crash-debugger shows
lcDone:
   ret
It is real simple to fix this problem:

Code: Select all

   push ax
   mov ax,40h          ; selector 40h is the core selector which has different GDT mappings per core.
   mov fs,ax
   cli
   mov fs,fs:ps_sel    ; ps_sel is the core selector saved "within itself" which points to an unique selector.
   pop ax
   add fs:ps_nesting,1
   jc lcDone
   Crash                   ; this is the position the crash-debugger shows
lcDone:
   sti
   ret
EDIT: An even better fix is this: This should work because if a thread switch (and core switch) happens after loading FS, FS would be reloaded with the correct base for the new core in question. The problem stems from using the unique selector before the scheduler is locked (and thread-switching can occur).

Code: Select all

   push ax
   mov ax,40h          ; selector 40h is the core selector which has different GDT mappings per core.
   mov fs,ax
   pop ax
   add fs:ps_nesting,1
   jc lcDone
   Crash                   ; this is the position the crash-debugger shows
lcDone:
   mov fs,fs:ps_sel    ; ps_sel is the core selector saved "within itself" which points to an unique selector.
   ret

Re: Is this a bug in AMDs Athlon?

Posted: Sat May 07, 2011 9:51 am
by rdos
I've tested the new code on both Atom and Athlon, and it works. The test run for 3,5 hour on Atom, and it has run about an hour on Athlon now, and nothing has happened.