Gigasoft wrote:
rdos wrote: Real tests on AMD reveal that AMD generates a protection fault (with 0 in error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit.
That's because it's the wrong instruction. Remove the 0x48.
0xFF /3 encodes a 32-bit far call. There is no 64-bit far call.
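To make the encoding point concrete, here is a minimal C sketch that splits such a byte sequence into its parts: the optional REX prefix (0x48 is REX with the W bit set), the 0xFF opcode, and the ModRM reg field that holds the "/3". The names `ff_info` and `decode_ff` are made up for illustration, and the decoder deliberately ignores other prefixes and the rest of the operand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper types and names, for illustration only. */
struct ff_info {
    bool     rex_w; /* a REX prefix with W=1 precedes the opcode */
    unsigned reg;   /* ModRM.reg, i.e. the "/digit" opcode extension */
};

/* Returns true if code[] starts with an optional REX prefix followed by
 * opcode 0xFF and a ModRM byte; fills *out in that case. */
bool decode_ff(const uint8_t *code, size_t len, struct ff_info *out)
{
    size_t i = 0;

    out->rex_w = false;
    out->reg = 0;

    /* REX prefixes have the bit pattern 0100WRXB; W is bit 3. */
    if (i < len && (code[i] & 0xF0) == 0x40) {
        out->rex_w = (code[i] & 0x08) != 0;
        i++;
    }

    /* Need the opcode byte and the ModRM byte that follows it. */
    if (i + 1 >= len || code[i] != 0xFF)
        return false;

    /* ModRM layout: mod (bits 7..6), reg (bits 5..3), rm (bits 2..0). */
    out->reg = (code[i + 1] >> 3) & 7;
    return true;
}
```

With this, the bytes 0x48 0xFF 0x1F decode as REX.W plus FF /3, while 0xFF 0x1F is the same FF /3 without the prefix, which is the form suggested above.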
rdos wrote:
Gigasoft wrote: One way would be to put all your 32-bit destinations in a table and use an indirect far call (12 bytes per call). Another would be to write a stub function for every segment that you are going to call into, like this (10 bytes per call):
I have checked this now. The Intel manual states that 32-bit indirect calls are supported from 64-bit mode, while the AMD manual doesn't mention this possibility. Real tests on AMD reveal that it generates a protection fault (with 0 in the error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit there. It might, however, be possible to detect this situation and use calls on Intel and retfs on AMD, since calls are much more efficient. In effect, it means that 64-bit code wanting to do 32-bit syscalls must go through the protection fault handler on AMD (coding the pushes and retf inline takes too much space). In fact, if I patch the code with the indirect call on all CPUs and then see a new protection fault from the patched code, I'll do the call in the protection fault handler instead.
Since no 64-bit code exists yet, surely it is possible to make room for the retf sequence for 64-to-32 far calls?
Also, how is the 32-bit code returning to 64-bit mode? Obviously you can't just do a simple retf, since the 32-bit code can't do a 64-bit far return.
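To illustrate the return problem, here is a toy C model (not real mode-switching code) of what a 32-bit retf would read if the 64-bit caller had pushed an 8-byte CS and an 8-byte return address. The stack model, the function names, and the selector and address values used with it are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a downward-growing stack; all names are illustrative. */
struct toy_stack {
    uint8_t mem[64];
    size_t  sp;      /* index of the current top of stack within mem */
};

void stack_init(struct toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = sizeof s->mem;
}

/* 64-bit push: stores 8 bytes, little-endian, as 64-bit code would. */
void push64(struct toy_stack *s, uint64_t v)
{
    s->sp -= 8;
    for (int i = 0; i < 8; i++)
        s->mem[s->sp + i] = (uint8_t)(v >> (8 * i));
}

/* 32-bit pop: reads 4 bytes, little-endian. */
uint32_t pop32(struct toy_stack *s)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)s->mem[s->sp + i] << (8 * i);
    s->sp += 4;
    return v;
}

struct far_ret {
    uint32_t eip;
    uint16_t cs;
};

/* What a 32-bit retf would load: a 4-byte EIP, then a 4-byte slot
 * whose low 16 bits become CS. */
struct far_ret retf32(struct toy_stack *s)
{
    struct far_ret r;
    r.eip = pop32(s);
    r.cs  = (uint16_t)pop32(s);
    return r;
}
```

If the caller pushes a selector such as 0x33 and then a return address such as 0x401000 with `push64`, `retf32` gets the right EIP but takes CS from the upper half of the return address (here 0), not from the pushed selector.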
A possible method I can see of coding inline far calls:
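Code:
    mov eax, cs          ; "push cs" is not encodable in 64-bit mode (clobbers RAX)
    push rax             ; return CS
    call stub            ; pushes the return RIP
    ...
stub:
    push destcs          ; selector of the 32-bit code segment
    push destip          ; offset of the 32-bit entry point
    retfq                ; 64-bit far return: pops RIP, then CS (8 bytes each)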
I wouldn't expect a significant speed difference compared with coding direct far calls, and I would expect it to be significantly faster than taking a trip through the GPF handler.
That all said, I'd be tempted to just use a different inline sequence, something along the lines of:
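Code:
farptr:
    dd off                   ; 32-bit offset of the 32-bit target
    dw seg                   ; 16-bit selector of the 32-bit code segment
    lea rdi, [rel farptr]
    call far64to32stub
; Note: if calling from more than 2GB away, an indirect call must be used,
; since a direct near call only has a signed 32-bit displacement
; The stub itself is located below 4GB
far64to32stub:
    call far [rdi]           ; FF /3 without REX.W: far call through an m16:32 pointer
    ret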
since it means the 32-bit code needs no adjustment (it just restores CS/IP like normal and implicitly lands back in 64-bit mode).
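For reference, the farptr operand in the sequence above is a 6-byte m16:32 pointer: the 32-bit offset comes first, followed by the 16-bit selector. A minimal C sketch of building one at run time follows; the function name `encode_far_ptr32` is an assumption for illustration, as are any offset and selector values passed to it.

```c
#include <stdint.h>

/* Writes the 6-byte m16:32 far pointer that "dd off / dw seg" lays out:
 * 32-bit offset first, then the 16-bit selector, both little-endian.
 * The function name is hypothetical. */
void encode_far_ptr32(uint8_t out[6], uint32_t off, uint16_t seg)
{
    out[0] = (uint8_t)(off);
    out[1] = (uint8_t)(off >> 8);
    out[2] = (uint8_t)(off >> 16);
    out[3] = (uint8_t)(off >> 24);
    out[4] = (uint8_t)(seg);
    out[5] = (uint8_t)(seg >> 8);
}
```

Writing the bytes out one at a time avoids relying on struct packing or on the host's byte order, so the same table could be generated by a build tool on any machine.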