Octocontrabass wrote: xeyes wrote: It seems fun but the SMI handler would almost certainly have to touch the graphics card and/or backlight drivers that I'm not familiar with. It could be challenging to understand it even if I can somehow get the source code with comments.
That's fine; you don't need to understand those parts if you can figure out the parts that touch the APIC.
Looking at 16-bit code is interesting; previously I had quite a bit of fun getting some classic OSes (like P-system and CP/M) working inside a virtual 8086 VM. They would sometimes use retf 2 to sneak past my iret trap, which broke my non-standard iret frame. I wonder whether this is the same case in reverse: I did something the UEFI BIOS writer didn't expect and broke something.
But not being able to step through live code is going to make it a lot trickier. Maybe some day.
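On the retf 2 point, here is a rough sketch of why it slips past an iret trap. It models a plain 16-bit interrupt frame (IP, CS, FLAGS on the stack), not my non-standard frame, and all the names (cpu16, do_iret, do_retf_imm) are made up for illustration. In V86 mode with IOPL below 3, iret faults to the monitor but retf does not, so a handler that returns with retf 2 pops IP and CS, skips the saved FLAGS word, and never hits the trap.

```c
#include <stdint.h>

/* Hypothetical model of the 16-bit register state we care about. */
typedef struct {
    uint16_t ip, cs, flags;
    uint16_t sp;            /* byte offset into the stack array */
} cpu16;

/* Pop one 16-bit word; the stack is modelled as an array of words. */
static uint16_t pop16(cpu16 *c, const uint16_t *stack)
{
    uint16_t v = stack[c->sp / 2];
    c->sp += 2;
    return v;
}

/* iret: pops IP, CS and FLAGS, so the caller's saved flags are restored. */
static void do_iret(cpu16 *c, const uint16_t *stack)
{
    c->ip    = pop16(c, stack);
    c->cs    = pop16(c, stack);
    c->flags = pop16(c, stack);
}

/* retf imm16: pops IP and CS, then discards imm bytes of stack.
   With imm = 2 the saved FLAGS word is skipped, so the flags set
   by the handler (e.g. carry as a status) stay live. */
static void do_retf_imm(cpu16 *c, const uint16_t *stack, uint16_t imm)
{
    c->ip = pop16(c, stack);
    c->cs = pop16(c, stack);
    c->sp = (uint16_t)(c->sp + imm);
}
```

Both paths leave SP in the same place, which is why the guest code works on real hardware either way; only the flags differ, and only iret gives the monitor a chance to intervene.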
linuxyne wrote:
Since HLT is treated specially (31.10 AUTO HALT RESTART in SDM Vol3), I guessed that removing one more factor can help eradicate it as a potential source of the problem.
31.3.2 is "Exiting From SMM" on the latest SDM Vol3. It speaks about a single processor entering the shutdown state when encountering invalid contents in SMRAM. The manual does say that "Intel processors stop executing instructions until a RESET#, INIT# or NMI# is asserted".
I don't think I could corrupt the SMM state this easily, so the core is probably not in the shutdown state. I changed the hlt to a print every N loops, and the core kept printing even though the timer interrupt handler had stopped being triggered.
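Roughly what the hlt replacement looks like, as a minimal sketch rather than my actual kernel code. The names (busy_idle, heartbeat_fn, print_beat) are hypothetical, printf stands in for whatever console output the kernel has, and the loop is bounded here only so it can be run in user space; the real one spins forever.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type: called with the current loop count. */
typedef void (*heartbeat_fn)(uint64_t iteration);

/* Example heartbeat; a kernel would write to serial or the framebuffer. */
static void print_beat(uint64_t iteration)
{
    printf("core alive, loop %" PRIu64 "\n", iteration);
}

/* Spin instead of executing hlt, calling `beat` once every `every_n`
   loops. Returns the number of heartbeats emitted. An every_n of 0
   disables the heartbeat. */
static uint64_t busy_idle(uint64_t iterations, uint64_t every_n,
                          heartbeat_fn beat)
{
    uint64_t beats = 0;
    for (uint64_t i = 1; i <= iterations; i++) {
        if (every_n != 0 && i % every_n == 0) {
            beat(i);
            beats++;
        }
    }
    return beats;
}
```

If the heartbeat keeps showing up while the timer handler's own counter stops advancing, the core is still executing instructions and only interrupt delivery has stopped.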
linuxyne wrote:
One can try running a Linux test with noacpi, nomodeset (and other applicable parameters, such as disabling vt-d to avoid interrupt-remapping, to coerce it into the software state required to replicate the issue) in the boot command line, to see if it too experiences problems.
This is hard: Linux won't enable x2 if it can't read the DMAR table, and x2 doesn't hang when Linux enables it.
linuxyne wrote:
In one of the patches [3], Intel tried blacklisting Lenovo Thinkpad W520 T420 for broken SMI causing hard lock-ups when enabling x2apic. So it is certainly possible that this is a BIOS/HW issue.
Wow, you are so good at searching. The only x2 bug I found while looking online was a SUSE document saying that a lack of serialization around register writes could cause a hang.
Could you share how you found so many related issues?
I saw that in [3] people found a workaround (UEFI). I adapted the simple version to boot with UEFI, but the result is still inconclusive: the issue no longer happens, but the laptop no longer has the SMI backlight adjustment feature either. Pressing the key combos doesn't change the backlight under UEFI.
[2] is also a very interesting read. I'm not surprised that Intel's manual can confuse people working for Intel, as the manual makes only a passing reference to the remapper, and the reasons outlined in the emails make perfect sense (I also thought that remapping was needed because the IOAPIC has only 8 bits for the destination). While "not architectural", all combinations (x2,x1)(logical,physical)(remap,no remap) seem to work on this laptop, except that any combination with x2 has this 'hang' issue.
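To show what I meant by the 8-bit destination, here is a small sketch. It assumes the usual redirection table entry layout with the vector in bits 7:0 and the destination in bits 63:56; the helper names are hypothetical and all the other RTE fields (delivery mode, mask, trigger mode and so on) are left at zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: destination field occupies bits 63:56 of the RTE. */
#define IOAPIC_DEST_SHIFT 56
#define IOAPIC_DEST_BITS  8

/* True if an APIC ID can be written directly into the RTE destination
   field, i.e. without going through the interrupt remapper. */
static bool fits_ioapic_dest(uint32_t apic_id)
{
    return apic_id < (1u << IOAPIC_DEST_BITS);
}

/* Build a bare-bones RTE from a vector and an 8-bit destination. */
static uint64_t ioapic_rte(uint8_t vector, uint8_t dest)
{
    return (uint64_t)vector | ((uint64_t)dest << IOAPIC_DEST_SHIFT);
}
```

x2APIC IDs are 32 bits wide, so anything above 255 cannot be expressed this way. On this laptop every ID is small, which would fit with the no-remap combinations working here.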
My best guess now: maybe the BIOS code made certain assumptions, and I did something in a different way.