xeyes wrote:Thanks for sharing the interesting detail. So the handlers (or part of them) have to be identity mapped due to running cross MMU on/off boundary? I wonder whether this is because of the fact that the architecture was from a period where most CPUs don't have MMUs? The new power arch (like POWER 10) probably doesn't have the same limitation anymore?
The handlers also have fixed addresses (e.g. Data Access Exception is at 0x300, External Interrupt is at 0x500, etc.), so in practice, most operating systems simply copy themselves to the start of the physical address space. The handlers are all in the first 12kB, and why make it more complicated than it has to be? When the kernel takes control of the system, all of RAM is free, so it might as well move itself to address 0. 32-bit Linux linearly maps physical memory starting at address 0 to virtual addresses starting at the 3GB line (0xC0000000), so translating between physical and virtual is pretty simple in that range.
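To make that layout concrete, here is a rough C sketch of the classic 32-bit vector offsets and the linear translation. The names (`VEC_*`, `KERNEL_BASE`, `LOWMEM_SIZE`, `phys_to_virt`, `virt_to_phys`) are made up for illustration and are not taken from any particular kernel, and the 768MB size of the linear window is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 32-bit PowerPC exception vector offsets (physical addresses,
 * assuming the vectors live at the low end of memory). */
enum {
    VEC_SYSTEM_RESET  = 0x100,
    VEC_MACHINE_CHECK = 0x200,
    VEC_DSI           = 0x300,  /* data access exception */
    VEC_ISI           = 0x400,  /* instruction access exception */
    VEC_EXTERNAL      = 0x500,  /* external interrupt */
    VEC_ALIGNMENT     = 0x600,
    VEC_PROGRAM       = 0x700,
    VEC_DECREMENTER   = 0x900,
    VEC_SYSCALL       = 0xC00,
    VEC_AREA_END      = 0x3000  /* vectors occupy the first 12kB */
};

#define KERNEL_BASE 0xC0000000u /* assumed: the 3GB line */
#define LOWMEM_SIZE 0x30000000u /* assumed: 768MB linear window */

/* Physical -> virtual inside the linear window is a plain addition. */
static inline uint32_t phys_to_virt(uint32_t pa)
{
    assert(pa < LOWMEM_SIZE);
    return pa + KERNEL_BASE;
}

/* Virtual -> physical is the matching subtraction. */
static inline uint32_t virt_to_phys(uint32_t va)
{
    assert(va >= KERNEL_BASE && va - KERNEL_BASE < LOWMEM_SIZE);
    return va - KERNEL_BASE;
}
```

With these assumptions, the data access handler at physical 0x300 shows up at virtual 0xC0000300 once translation is on.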
As to POWER10, I kind of doubt it. I couldn't find an architecture manual, but I looked at one for PPC64 not too long ago and found that it too will turn off the DR and IR bits in the MSR when an exception is taken. I'm guessing they want to keep compatibility, so that existing OSes don't have to be rewritten.
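For anyone following along, this is roughly what "turning off DR and IR" means, written as a toy C model rather than real kernel code. The `cpu_model` struct and both function names are hypothetical, and the model is simplified: real hardware only saves some MSR bits into SRR1 and changes more MSR bits than the four shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_EE 0x8000u /* external interrupts enabled */
#define MSR_PR 0x4000u /* problem (user) state */
#define MSR_IR 0x0020u /* instruction address translation on */
#define MSR_DR 0x0010u /* data address translation on */

/* Toy register file: just enough state to show the mechanism. */
struct cpu_model {
    uint32_t pc, msr, srr0, srr1;
};

/* Exception entry: save the old context in SRR0/SRR1, then jump to the
 * fixed vector with translation off, so the handler runs on physical
 * addresses. */
static void take_exception(struct cpu_model *cpu, uint32_t vector)
{
    assert(cpu != NULL && vector < 0x3000u);
    cpu->srr0 = cpu->pc;
    cpu->srr1 = cpu->msr;
    cpu->msr &= ~(MSR_EE | MSR_PR | MSR_IR | MSR_DR);
    cpu->pc = vector;
}

/* rfi: restore the saved MSR and PC in one step, which is what turns
 * translation back on for the interrupted code. */
static void return_from_interrupt(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->msr = cpu->srr1;
    cpu->pc = cpu->srr0;
}
```

The point is that the first instructions of each handler execute with the MMU off, which is why that code has to sit where physical addresses reach it.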
xeyes wrote:Why do you need to clean it up ever? Is it because of using some fast syscall instructions that don't switch stack during a syscall? Otherwise if there's always a stack switch, the user level code won't be able to look at it so there's no need to clean up right?
I guess that was easy to misunderstand. What I meant is that there is no need to always return exactly the way you came in. The initial task needs to construct its own IRET frame, and it can do so at the bottom of its stack rather than the top. This might mean that a few stack frames are left active when the IRET happens, but it doesn't matter, because the stack will start over at the very top the next time the kernel is called. So that's what I meant by "clean up".
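Here is a small C sketch of that idea for 32-bit x86, reading "bottom" as "wherever the stack pointer currently is". The selector values, `EFLAGS_IF` constant, and `build_iret_frame` are all hypothetical (the selectors depend entirely on your GDT), and the actual IRET would still need a bit of assembly to load ESP with the returned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USER_CS   0x1Bu   /* assumed GDT layout: user code selector, RPL 3 */
#define USER_DS   0x23u   /* assumed GDT layout: user data selector, RPL 3 */
#define EFLAGS_IF 0x200u  /* interrupt enable flag */

/* What a 32-bit IRET pops when returning to a lower privilege level,
 * in ascending address order. */
struct iret_frame {
    uint32_t eip, cs, eflags, esp, ss;
};

/* Build the frame directly below the current kernel stack pointer
 * instead of unwinding to the top of the stack first. Whatever frames
 * sit above it are simply abandoned. */
static struct iret_frame *build_iret_frame(uint32_t *cur_sp,
                                           uint32_t entry,
                                           uint32_t user_sp)
{
    assert(cur_sp != NULL);
    struct iret_frame *f = (struct iret_frame *)cur_sp - 1;
    f->eip    = entry;
    f->cs     = USER_CS;
    f->eflags = EFLAGS_IF | 0x2u; /* bit 1 of EFLAGS is always set */
    f->esp    = user_sp;
    f->ss     = USER_DS;
    return f;
}
```

Nothing has to unwind the abandoned frames: on the next entry from user mode, the CPU loads the kernel stack pointer fresh from the TSS, so the stack starts at the top again.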