rdos wrote:
That makes no sense as this is only part of 32-bit code where upper halves are NOT available without doing far jumps which essentially will stop out-of-order execution.

Brendan wrote:
You're making the mistake of assuming most of the CPU cares if you're running 16-bit, 32-bit or 64-bit code. It doesn't.

rdos wrote:
It does because loading a 32-bit register in 64-bit mode does NOT clobber the upper half of it. If it did, there would be no sense in having 32-bit operand overrides.

Every 32-bit operation done (in 64-bit or 32-bit code) will wipe out the upper 32 bits of the destination register.

Brendan wrote:
Um, what? There's no reason you can't use full 64-bit pointers, it's just slower. Fortunately it's extremely rare (I doubt I've ever seen an executable file that's larger than 4 GiB) so no sane people care that it's slower.

rdos wrote:
Which means that basically all 64-bit designs are wasting hardware and performance with 4 level paging when nobody cares for more than 2 levels. We could simply reduce the address space to 32-bit and just introduce 64-bit operations to existing 32-bit mode instead. To me it seems like software developers simply are not using the hardware features, which in this case is due to high-level compilers and linkers that cannot handle the setup.

You can have a 4 GiB executable that uses 100 TiB of dynamically allocated space. For the dynamically allocated stuff you can't use "hard-coded" addresses in the instruction itself (and need to store pointers to it because you don't know the address at compile/link time) so the "+/- 2 GiB" limit for immediate addresses doesn't make any difference at all for that case.
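
As a rough illustration of the dynamically allocated case, here's a minimal C sketch. The names (table, table_init, table_store_load, table_free) are made up for the example. The static variable is the only thing whose address is fixed at link time; the block it points to gets whatever address the allocator returns at run time, so it is always reached through a stored full-width pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Static object: its own address is fixed at link time. It only holds a
   pointer, and that pointer is a full-width value chosen at run time. */
static uint64_t *table;

/* Allocate the table dynamically; returns 0 on success, -1 on failure. */
int table_init(size_t entries)
{
    table = calloc(entries, sizeof *table);
    return table != NULL ? 0 : -1;
}

/* Store a value and read it back through the stored pointer. */
uint64_t table_store_load(size_t index, uint64_t value)
{
    assert(table != NULL);
    table[index] = value;
    return table[index];
}

/* Release the table again. */
void table_free(void)
{
    free(table);
    table = NULL;
}
```

Nothing in the accesses to the allocated block depends on how far it is from the code, which is why the limit on immediate addresses doesn't matter there.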

Brendan wrote:
Note: I suspect that you're trying to blame AMD because your OS is poorly designed, and I really do think it's unreasonable to blame AMD for your mistake.

rdos wrote:
Not at all. I blame the GCC team as they are the ones that haven't implemented this in a way that allows me to exploit it. The hardware is perfectly functional while the software (C compiler and linker) is not.

GCC is GNU's compiler designed for GNU's systems (just like MSVC is Microsoft's compiler designed for Microsoft's systems, and IBM XL C/C++ is IBM's compiler designed for IBM's AIX systems). Your compiler (the one you designed for your OS) doesn't exist; so if GCC's "-mcmodel=large" option doesn't work for your obscure special case then perhaps you should just write your own compiler.

rdos wrote:
Tested it, and it's not sign-extension as previously claimed:

I'm not too sure who claimed that (in which context). Typically data is zero extended, but addresses are sign extended. For example, if you did "mov eax,-1" you'd end up with "RAX = 0x00000000FFFFFFFF" (zero extended) and if you did "lea rax,[-1]" (where the -1 is a 32-bit immediate) you'd end up with "RAX = 0xFFFFFFFFFFFFFFFF" (sign extended). In both cases the instruction has no dependency on the previous value of RAX, so it's not as slow as (e.g.) "mov ax,-1" would be.
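
To make the three cases concrete, here's a small C sketch that models the register results with plain 64-bit integers. It doesn't touch real registers, and the function names are invented for the example. It only mirrors the values described for "mov eax,-1", "lea rax,[-1]" and "mov ax,-1".

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mov eax, imm32" in 64-bit mode: the result is zero extended
   and the old register value is ignored (no dependency on it). */
uint64_t write32_zero_extend(uint64_t old_rax, uint32_t value)
{
    (void)old_rax;
    return (uint64_t)value;
}

/* Model of "lea rax, [disp32]": the 32-bit displacement is sign extended. */
uint64_t addr_from_disp32(int32_t disp)
{
    return (uint64_t)(int64_t)disp;
}

/* Model of "mov ax, imm16": only the low 16 bits are replaced, so the
   result depends on the previous register value. */
uint64_t write16_merge(uint64_t old_rax, uint16_t value)
{
    return (old_rax & ~(uint64_t)0xFFFF) | value;
}

/* Checks the values quoted in the post. */
void check_extension_examples(void)
{
    uint64_t old = 0x1122334455667788ULL;
    assert(write32_zero_extend(old, (uint32_t)-1) == 0x00000000FFFFFFFFULL);
    assert(addr_from_disp32(-1) == 0xFFFFFFFFFFFFFFFFULL);
    assert(write16_merge(old, (uint16_t)-1) == 0x112233445566FFFFULL);
}
```

Only the 16-bit model needs the old value as a real input, which is the dependency that makes the 16-bit write slower.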

rdos wrote:
I think this does explain why 32-bit code manipulating 32-bit registers will zero upper half of the 64-bit register. This seems to be a "feature" (rather bug) of long mode.
What this essentially means is that general registers that need to be preserved and do not return values must be saved before calling an unknown handler in 32-bit mode (as it cannot save the upper half if it uses a 32-bit register), but that if a value is returned there is no need to bother about the upper half as it will automatically be cleared when the 32-bit code loads that particular register.

Who cares? It's not like you can do "push eax" in 64-bit interrupt handlers - you save/restore the 64-bit registers regardless of whether the interrupted process was 32-bit or not.
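
The save/restore point can be modelled the same way. This is only a sketch in C with invented names, and it assumes the simple rule discussed here (a 32-bit register write clears bits 63..32): a 32-bit handler that "preserves" a register with a 32-bit push/pop loses the upper half, while a 64-bit save around the call keeps the whole value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit handler that preserves a register with
   "push ebx" / "pop ebx". The 32-bit pop rewrites the register, so under
   the assumed rule the upper 32 bits come back as zero. */
uint64_t handler32_preserving_low_half(uint64_t rbx)
{
    uint32_t saved = (uint32_t)rbx;   /* push ebx */
    /* ... handler body would run here ... */
    return (uint64_t)saved;           /* pop ebx */
}

/* 64-bit caller that saves the full register around the call. */
uint64_t call_with_full_save(uint64_t rbx)
{
    uint64_t saved = rbx;                          /* push rbx */
    rbx = handler32_preserving_low_half(rbx);      /* upper half lost here */
    rbx = saved;                                   /* pop rbx */
    return rbx;
}

/* Checks both outcomes for one sample value. */
void check_save_restore(void)
{
    uint64_t value = 0x1122334455667788ULL;
    assert(handler32_preserving_low_half(value) == 0x0000000055667788ULL);
    assert(call_with_full_save(value) == value);
}
```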
Cheers,
Brendan