x2apic losing interrupt after setting ISR around SMI

xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: x2apic losing interrupt after setting ISR around SMI

Post by xeyes »

Octocontrabass wrote:If the problem is indeed the APIC configuration, you can use something like msr-tools in Linux to compare the x2APIC registers against your OS.

I can't imagine what else it could be if it's not the APIC.
Isn't x2 mostly the same, just with different logical IDs and an extra self-IPI register?

Linux sets it up with more vectors enabled: thermal interrupt, performance counter overflow, and CMCI.
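For anyone who wants to repeat the comparison on the Linux side, here is a minimal sketch of what a tool like rdmsr from msr-tools does underneath: it reads 8 bytes from /dev/cpu/N/msr at a file offset equal to the MSR number. It assumes the msr kernel module is loaded and that it runs as root. The names read_msr and dump_x2apic, and the choice of registers to dump, are mine and not taken from any tool.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* x2APIC MSR numbers: 0x800 + (xAPIC MMIO offset >> 4). */
enum {
    X2APIC_SVR         = 0x80F, /* spurious interrupt vector register */
    X2APIC_ESR         = 0x828, /* error status register */
    X2APIC_LVT_CMCI    = 0x82F,
    X2APIC_LVT_TIMER   = 0x832,
    X2APIC_LVT_THERMAL = 0x833,
    X2APIC_LVT_PMC     = 0x834, /* performance counter overflow */
    X2APIC_LVT_ERROR   = 0x837
};

/* Read one MSR on the given CPU through the Linux msr driver.
 * Returns 0 on success, -1 if the device cannot be opened or read. */
int read_msr(int cpu, uint32_t msr, uint64_t *value)
{
    char path[64];
    assert(value != NULL);
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* The msr driver uses the file offset as the MSR number. */
    ssize_t n = pread(fd, value, sizeof *value, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Print the registers that differed between Linux and my setup. */
void dump_x2apic(int cpu)
{
    static const struct { const char *name; uint32_t msr; } regs[] = {
        { "SVR", X2APIC_SVR }, { "ESR", X2APIC_ESR },
        { "LVT_CMCI", X2APIC_LVT_CMCI }, { "LVT_TIMER", X2APIC_LVT_TIMER },
        { "LVT_THERMAL", X2APIC_LVT_THERMAL }, { "LVT_PMC", X2APIC_LVT_PMC },
        { "LVT_ERROR", X2APIC_LVT_ERROR },
    };
    for (size_t i = 0; i < sizeof regs / sizeof regs[0]; i++) {
        uint64_t v;
        if (read_msr(cpu, regs[i].msr, &v) == 0)
            printf("%-12s (0x%03X): 0x%016llx\n", regs[i].name,
                   (unsigned)regs[i].msr, (unsigned long long)v);
        else
            printf("%-12s (0x%03X): unreadable\n", regs[i].name,
                   (unsigned)regs[i].msr);
    }
}
```

Calling dump_x2apic(0) under Linux and printing the same MSRs from your own kernel gives two lists that can be diffed line by line. Reads fail if the CPU is not in x2APIC mode, so "unreadable" for every register is itself a hint.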

As you might have guessed from the differences, while I can read core temperatures and count retired instructions now, the interrupt is still lost once I adjust the backlight.

I guess the issue is probably elsewhere, but it seems that without access to a platform debugger (not sure if a production model would even have the needed traces to connect though), it's like shooting arrows in the dark :(
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Re: x2apic losing interrupt after setting ISR around SMI

Post by Octocontrabass »

If you really want to dig into it, you could try disassembling the BIOS ROM. The code that runs in SMM has to be in there somewhere.

It's not worth the effort unless you think it's fun to do that sort of thing.
linuxyne
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: x2apic losing interrupt after setting ISR around SMI

Post by linuxyne »

Some more points:
  • Does send_eoi still use the MMIO interface even with x2apic mode (the source code comment seems to suggest that it does, though it may just be a stale comment left over from when xapic was set up)? Section 10.12.2 of SDM Vol3.
  • Does x2apic ESR suggest anything?
  • What happens if the idle loop is an infinite loop without the HLT instruction?
  • Any possibility that the CPU entered a shutdown state (Section 31.3.2 of SDM Vol3)?
  • Any possibility of testing on another laptop?
A bare-minimum 32-bit protected mode program (which, say, counts 0 to 9 and back again on each, or every other, timer interrupt and prints the count on the VGA display) and a grub-mkrescue ISO can be used to test.
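The counting part of such a test kernel could look roughly like the sketch below. It is only an illustration: the names bounce_next and show_count are made up, the timer handler and IDT setup are left out, and the caller is assumed to pass a pointer to a VGA text cell (0xB8000 for the top-left cell in the usual text mode).

```c
#include <assert.h>
#include <stdint.h>

/* Counter that runs 0,1,...,9,8,...,0,1,... */
struct bounce {
    int value; /* current digit, 0..9 */
    int step;  /* +1 while counting up, -1 while counting down */
};

/* Advance the counter by one tick and return the new digit. */
static int bounce_next(struct bounce *b)
{
    assert(b->step == 1 || b->step == -1);
    /* Reverse direction when the next step would leave 0..9. */
    if (b->value + b->step > 9 || b->value + b->step < 0)
        b->step = -b->step;
    b->value += b->step;
    return b->value;
}

/* Write one digit into a VGA text cell: low byte is the character,
 * high byte is the attribute (0x07 = light grey on black). */
static void show_count(volatile uint16_t *cell, int value)
{
    assert(value >= 0 && value <= 9);
    *cell = (uint16_t)(0x0700 | ('0' + value));
}
```

The timer interrupt handler would call show_count(cell, bounce_next(&state)) and then send the EOI. A digit that stops changing while the rest of the machine stays alive is the symptom being tested for.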
xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: x2apic losing interrupt after setting ISR around SMI

Post by xeyes »

Octocontrabass wrote:If you really want to dig into it, you could try disassembling the BIOS ROM. The code that runs in SMM has to be in there somewhere.

It's not worth the effort unless you think it's fun to do that sort of thing.
It seems fun, but the SMI handler would almost certainly have to touch the graphics card and/or backlight drivers that I'm not familiar with. It could be challenging to understand it even if I could somehow get the source code with comments.

The issue is also timing-dependent: if I reduce the timer frequency to a few Hz, it almost never happens. So it is not just that SMIs and normal interrupts don't mix well; it's more like they have to race in order for the APIC to go into this state. Thus even a good understanding of what the SMI handler does may not help :(

I did write a much simpler version to repro this since no "guinea pig" showed up in my call for volunteers.

It doesn't have error checking but does almost the same setup as my kernel. It also repros the issue on the same machine: x1 works, x2 hangs when the backlight is adjusted. So the bug is either in this code, or due to something else that I didn't do.

Please help take a look.
Attachments

[The extension s has been deactivated and can no longer be displayed.]

xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: x2apic losing interrupt after setting ISR around SMI

Post by xeyes »

linuxyne wrote:Some more points:
  • Does send_eoi still use the MMIO interface even with x2apic mode (the source code comment seems to suggest that it does, though it may just be a stale comment left over from when xapic was set up)? Section 10.12.2 of SDM Vol3.
  • Does x2apic ESR suggest anything?
  • What happens if the idle loop is an infinite loop without the HLT instruction?
  • Any possibility that the CPU entered a shutdown state (Section 31.3.2 of SDM Vol3)?
  • Any possibility of testing on another laptop?
A bare-minimum 32-bit protected mode program (which, say, counts 0 to 9 and back again on each, or every other, timer interrupt and prints the count on the VGA display) and a grub-mkrescue ISO can be used to test.
Thanks for the ideas

1. It uses wrmsr, what comment are we talking about?
2. No error bits set, and no APIC error interrupt
3. I have to try, but if the ISR bit is set, the APIC would probably block the same vector regardless?
4. My copy doesn't have .2 under 31.3, what's the title of the section? Would "shutdown state" cause the other APs (and the laptop itself) to shut down? Does NMI/fixed IPI pierce this state?
5. Didn't find another one that has SMI based backlight adjustment or support for ACPI _BCM and x2apic, yet.
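To make point 3 concrete, here is a small sketch of how a vector maps onto the ISR registers in x2APIC mode and why a stuck ISR bit keeps the same vector from being delivered. It works on a snapshot of the eight ISR MSRs (0x810 to 0x817) that the caller has already read with rdmsr. All function names are made up for illustration, and the TPR is ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSR that holds the ISR bit for a vector: 32 vectors per register,
 * ISR0..ISR7 live at MSRs 0x810..0x817 in x2APIC mode. */
static uint32_t x2apic_isr_msr(uint8_t vector)
{
    return 0x810u + (vector >> 5);
}

/* Bit within that register. */
static uint32_t isr_bit_mask(uint8_t vector)
{
    return UINT32_C(1) << (vector & 31);
}

/* isr[] is a snapshot of the low 32 bits of MSRs 0x810..0x817. */
static int vector_in_service(const uint32_t isr[8], uint8_t vector)
{
    assert(isr != NULL);
    return (isr[vector >> 5] & isr_bit_mask(vector)) != 0;
}

/* Highest vector currently marked in service, or -1 if none. */
static int highest_in_service(const uint32_t isr[8])
{
    for (int v = 255; v >= 0; v--)
        if (vector_in_service(isr, (uint8_t)v))
            return v;
    return -1;
}

/* A pending vector is held back while an in-service vector has the
 * same or a higher priority class (vector bits 7:4). TPR is ignored. */
static int blocked_by_isr(const uint32_t isr[8], uint8_t vector)
{
    int top = highest_in_service(isr);
    return top >= 0 && (vector >> 4) <= (top >> 4);
}
```

So if the timer vector's ISR bit never gets cleared, for example because an EOI was lost, the timer and everything in the same or a lower priority class stay pending, with or without HLT in the idle loop.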

I wrote a simpler sequence that is otherwise very similar to what the kernel does, and the same issue happens with this version as well. Attached in the post above.
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Re: x2apic losing interrupt after setting ISR around SMI

Post by Octocontrabass »

xeyes wrote:It seems fun, but the SMI handler would almost certainly have to touch the graphics card and/or backlight drivers that I'm not familiar with. It could be challenging to understand it even if I could somehow get the source code with comments.
That's fine; you don't need to understand those parts if you can figure out the parts that touch the APIC.
linuxyne
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: x2apic losing interrupt after setting ISR around SMI

Post by linuxyne »

xeyes wrote: 1. It uses wrmsr, what comment are we talking about?
Then the comment was indeed stale. In one of your previous replies, the code snippet read:

send_eoi(); // writes 0 to the EOI register at offset B0
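For reference, "offset B0" is the xAPIC MMIO view of the EOI register. In x2APIC mode the same register is reached with WRMSR, at MSR 0x800 plus the MMIO offset shifted right by 4, which gives 0x80B. A rough sketch of an x2APIC send_eoi follows, assuming GCC or Clang inline assembly and ring 0. The helper names are illustrative and are not xeyes's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Translate an xAPIC MMIO register offset into its x2APIC MSR number.
 * Caveats: the ICR becomes one 64-bit MSR at 0x830, so offset 0x310 has
 * no MSR of its own, and the self-IPI MSR 0x83F has no xAPIC equivalent. */
static inline uint32_t x2apic_msr_from_mmio(uint32_t offset)
{
    assert((offset & 0xFu) == 0 && offset < 0x400u);
    return 0x800u + (offset >> 4);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* WRMSR takes the MSR number in ECX and the value in EDX:EAX.
 * It is privileged and faults outside ring 0. */
static inline void wrmsr(uint32_t msr, uint64_t value)
{
    __asm__ volatile("wrmsr"
                     :
                     : "c"(msr), "a"((uint32_t)value),
                       "d"((uint32_t)(value >> 32))
                     : "memory");
}

/* Signal end of interrupt in x2APIC mode. The written value must be 0. */
static inline void send_eoi_x2apic(void)
{
    wrmsr(x2apic_msr_from_mmio(0xB0), 0);
}
#endif
```

A write to the MMIO EOI register while the APIC is in x2APIC mode would not reach the APIC at all, which is why the stale comment looked suspicious.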
xeyes wrote: 3. I have to try, but if the ISR bit is set, the APIC would probably block the same vector regardless?
Since HLT is treated specially (31.10 AUTO HALT RESTART in SDM Vol3), I guessed that removing one more factor can help eradicate it as a potential source of the problem.
xeyes wrote: 4. My copy doesn't have .2 under 31.3, what's the title of the section? Would "shutdown state" cause the other APs (and the laptop itself) to shut down? Does NMI/fixed IPI pierce this state?
31.3.2 is "Exiting From SMM" on the latest SDM Vol3. It speaks about a single processor entering the shutdown state when encountering invalid contents in SMRAM. The manual does say that "Intel processors stop executing instructions until a RESET#, INIT# or NMI# is asserted".
xeyes wrote: I wrote a simpler sequence that is otherwise very similar to what the kernel does, and the same issue happens with this version as well. Attached in the post above.
Thanks. Unable to find anything at fault in it, though.

---

This does seem more of a technical support issue and less of a development issue.

One can try running a Linux test with noacpi, nomodeset (and other applicable parameters, such as disabling vt-d to avoid interrupt-remapping, to coerce it into the software state required to replicate the issue) in the boot command line, to see if it too experiences problems.

In one of the patches [3], Intel tried blacklisting Lenovo Thinkpad W520 T420 for broken SMI causing hard lock-ups when enabling x2apic. So it is certainly possible that this is a BIOS/HW issue.

Some details, below:

[1] https://patchew.org/Xen/20191204162025. ... itrix.com/
[2] https://lore.kernel.org/lkml/1373592159 ... l.com/t/#u
[3] https://bugzilla.kernel.org/show_bug.cgi?id=43054
[4] https://bugzilla.kernel.org/show_bug.cgi?id=42604
[5] http://bxr.su/OpenBSD/sys/arch/amd64/amd64/lapic.c#192
[6] http://bxr.su/FreeBSD/sys/x86/acpica/madt.c#174
[7] https://elixir.bootlin.com/linux/latest ... ic.c#L1862
xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: x2apic losing interrupt after setting ISR around SMI

Post by xeyes »

Octocontrabass wrote:
xeyes wrote:It seems fun, but the SMI handler would almost certainly have to touch the graphics card and/or backlight drivers that I'm not familiar with. It could be challenging to understand it even if I could somehow get the source code with comments.
That's fine; you don't need to understand those parts if you can figure out the parts that touch the APIC.
Looking at 16-bit code is interesting, and previously I had quite a bit of fun getting some classic OSes (like P-system and CP/M) working inside a virtual 8086 VM. They would sometimes use retf 2 to sneak past my iret trap and break my non-standard iret frame. I wonder whether this is the same case in reverse: I did something the UEFI BIOS writer didn't expect and broke something.

But not being able to step through live code is going to make it a lot trickier. Maybe some day.
linuxyne wrote: Since HLT is treated specially (31.10 AUTO HALT RESTART in SDM Vol3), I guessed that removing one more factor can help eradicate it as a potential source of the problem.
31.3.2 is "Exiting From SMM" on the latest SDM Vol3. It speaks about a single processor entering the shutdown state when encountering invalid contents in SMRAM. The manual does say that "Intel processors stop executing instructions until a RESET#, INIT# or NMI# is asserted".
I don't think I'd be able to break state inside SMM this easily, and it is likely not in that state. I changed the hlt to a print every N loops, and the core kept printing even though the timer interrupt handler had stopped being triggered.
linuxyne wrote: One can try running a Linux test with noacpi, nomodeset (and other applicable parameters, such as disabling vt-d to avoid interrupt-remapping, to coerce it into the software state required to replicate the issue) in the boot command line, to see if it too experiences problems.
This is hard: Linux won't enable x2 if it can't read the DMAR table, and x2 doesn't hang if Linux enables it.
linuxyne wrote: In one of the patches [3], Intel tried blacklisting Lenovo Thinkpad W520 T420 for broken SMI causing hard lock-ups when enabling x2apic. So it is certainly possible that this is a BIOS/HW issue.
Wow, you are so good at searching. The only x2 bug I found while looking online was a SUSE document saying that a lack of serialization around register writes could cause a hang.

Could you share how you found so many related issues?

I saw that in [3] people found a workaround (UEFI). I adapted the simple version to boot with UEFI, but the result is still inconclusive. The issue no longer happens, but the laptop doesn't have the SMI backlight adjust feature anymore: pressing the key combos doesn't change the backlight under UEFI.

[2] is also a very interesting read. I'm not surprised that Intel's manual can confuse people working for Intel, as the manual only makes a passing reference to the remapper, and the reasons outlined in the emails make perfect sense (I also thought that remapping was needed because the IOAPIC has only 8 bits for the destination). While "not architectural", all combinations (x2,x1)(logical,physical)(remap,no remap) seem to work on this laptop, except that any combination with x2 has this 'hang' issue.


My best guess now: maybe the BIOS code made certain assumptions, and I did something in a different way.