Adding 64-bit support to RDOS

bluemoon · Post by **bluemoon** » Fri Oct 26, 2012 4:02 am

You can always use IRET to simulate jmp SEG:OFFSET, and it works on both AMD and Intel.
However, expect it to be slow with all the segment checks; then again, I don't see much use for it outside start-up code.

Owen · Post by **Owen** » Fri Oct 26, 2012 5:33 am

bluemoon wrote:You can always use IRET to simulate jmp SEG:OFFSET, and it works on both AMD and Intel.
However, expect it to be slow with all the segment checks; then again, I don't see much use for it outside start-up code.

RETF is simpler and faster.

rdos · Post by **rdos** » Fri Oct 26, 2012 6:57 am

What I primarily want is a far call from 64-bit mode that generates a 32-bit far return frame, so that compatibility-mode code can use retf32 to return to long mode. I'm sure I could solve it in other ways as well, but that option would be the best, since calls from 64-bit mode to compatibility mode would be most efficient that way.

bluemoon · Post by **bluemoon** » Fri Oct 26, 2012 7:24 am

Owen wrote:
bluemoon wrote:You can always use IRET to simulate jmp SEG:OFFSET, and it works on both AMD and Intel.
However, expect it to be slow with all the segment checks; then again, I don't see much use for it outside start-up code.
RETF is simpler and faster.

IIRC, RETF does not take a REX.W prefix and is therefore limited to the current operand size for the offset; since you usually want to switch to a segment with a different size, RETF is not sufficient.

Brendan · Post by **Brendan** » Fri Oct 26, 2012 7:38 am

Hi,

rdos wrote:What I primarily want is a far call from 64-bit mode that generates a 32-bit far return frame, so that compatibility-mode code can use retf32 to return to long mode. I'm sure I could solve it in other ways as well, but that option would be the best, since calls from 64-bit mode to compatibility mode would be most efficient that way.

I think that should be possible, with the restriction that you'd have to use an indirect far call and that the "return RIP" must fit in 32 bits.

[EDIT:] There is another restriction (assuming CPL=0 calling CPL=0, where no stack switch is involved). The 64-bit code's SS:RSP must be compatible with the 32-bit code's SS:ESP. This means that you'd have to use the same SS in 64-bit code (where the base address is ignored) and in 32-bit code (where the base address must be zero to be compatible), and RSP must fit in 32 bits.
[/EDIT]

As far as I know, for 64-bit code "call far [memptr]" is assumed to use a 32-bit offset and no operand-size override is needed; possibly because "call far" is almost entirely useless when calling 64-bit code from 64-bit code (where there's almost never a reason to change CS).
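
To make the restrictions above concrete, here is a minimal C sketch (not code from the thread). It lays out the 6-byte memory operand that an indirect far call with a 32-bit offset reads, and checks the two numeric conditions mentioned: the return RIP and RSP must both fit in 32 bits. The names `encode_far_ptr32` and `fits_32bit_frame` are hypothetical, and the SS base requirement is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* An m16:32 far pointer: the 32-bit offset comes first in memory,
   followed by the 16-bit selector, 6 bytes in total. */
#define FAR_PTR32_SIZE 6

/* Hypothetical helper: write one far-pointer table entry byte by byte,
   little-endian, in the order the CPU reads it. */
static void encode_far_ptr32(uint8_t out[FAR_PTR32_SIZE],
                             uint32_t offset, uint16_t selector)
{
    out[0] = (uint8_t)(offset & 0xFF);
    out[1] = (uint8_t)((offset >> 8) & 0xFF);
    out[2] = (uint8_t)((offset >> 16) & 0xFF);
    out[3] = (uint8_t)((offset >> 24) & 0xFF);
    out[4] = (uint8_t)(selector & 0xFF);
    out[5] = (uint8_t)((selector >> 8) & 0xFF);
}

/* Hypothetical check: a 32-bit far return frame can only describe the
   caller if both its return address and its stack pointer are below 4G. */
static bool fits_32bit_frame(uint64_t return_rip, uint64_t rsp)
{
    return return_rip <= UINT32_MAX && rsp <= UINT32_MAX;
}
```

For example, a caller at 0x110000 with a stack at 0x9F000 passes the check, while any return address at or above 0x100000000 does not.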

Cheers,

Brendan

Gigasoft · Post by **Gigasoft** » Fri Oct 26, 2012 7:56 am

One way would be to put all your 32-bit destinations in a table and use an indirect far call (12 bytes per call). Another would be to write a stub function for every segment that you are going to call into, like this (10 bytes per call):

Code: Select all

; Register version
CallSegmentXXXX:
    push rax
    mov word ptr [rsp+4],XXXX
    mov dword ptr [rsp+12],cs
    db 40h ; necessary on Intel?
    retf
; Or:
    push [rsp]
    mov [rsp+8],cs
    push XXXX
    push rax
    db 48h ; REX.W prefix, selects a 64-bit far return
    retf

; Stack version
CallSegmentXXXX:
    push qword ptr [rsp]
    push qword ptr [rsp+16]
    pop qword ptr [rsp+16]
    pop qword ptr [rsp+16]
    mov word ptr [rsp+4],XXXX
    mov dword ptr [rsp+12],cs
    db 40h
    retf
; Or:
    push XXXX
    push [rsp+16]
    mov [rsp+24],cs
    db 48h ; REX.W prefix, selects a 64-bit far return
    retf

Owen · Post by **Owen** » Fri Oct 26, 2012 9:48 am

bluemoon wrote:RETF does not take a REX.W prefix and is therefore limited to the current operand size for the offset; since you usually want to switch to a segment with a different size, RETF is not sufficient.

RETF doesn't need a REX.W prefix. It's a stack op; unless an explicit prefix is used (primarily 66h on PUSH to form PUSH WORD, which will actually subtract 2 from rSP, unlike PUSH BYTE and other similar mnemonics...) it uses the current stack operand size (64-bit)

RETF and IRET are the only two instructions which can do full seg16:mem64 addressing.

Gigasoft wrote:One way would be to put all your 32-bit destinations in a table and use an indirect far call (12 bytes per call). Another would be to write a stub function for every segment that you are going to call into, like this (10 bytes per call):
Code: Select all
; Register version
CallSegmentXXXX:
    push rax
    mov word ptr [rsp+4],XXXX
    mov dword ptr [rsp+12],cs
    retf

; Stack version
CallSegmentXXXX:
    push qword ptr [rsp]
    push qword ptr [rsp+16]
    pop qword ptr [rsp+16]
    pop qword ptr [rsp+16]
    mov word ptr [rsp+4],XXXX
    mov dword ptr [rsp+12],cs
    retf

Far calls are not an option; they do not take 64-bit offsets (like far jumps)

Edit: I should add a comment on the following:

rdos wrote:It's bad design that jmp seg:offset is not supported.

Not from the point of view of the designers of long mode. Segmentation is very much a vestigial and deprecated feature; many of the single-byte instructions were removed in order to free up single-byte opcodes for other uses. For example, the LDS/LES opcodes have now been reused for the AVX VEX prefix.

Gigasoft · Post by **Gigasoft** » Fri Oct 26, 2012 1:29 pm

Owen wrote:Far calls are not an option; they do not take 64-bit offsets (like far jumps)

But the offset is 32-bit. He's calling 32-bit functions.

Owen wrote:RETF doesn't need a REX.W prefix. It's a stack op; unless an explicit prefix is used (primarily 66h on PUSH to form PUSH WORD, which will actually subtract 2 from rSP, unlike PUSH BYTE and other similar mnemonics...) it uses the current stack operand size (64-bit)

According to the AMD manual, the default operand size for RETF is 32 bits. The Intel manual, on the other hand, states that "In 64-bit mode, the default operation size of this instruction is the stack-address size, i.e. 64 bits.". If this is true, it's probably better to always use a REX prefix with RETF.
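
The practical consequence of the operand-size question can be shown with a little arithmetic. This is a minimal C sketch (not code from the thread), assuming a same-privilege far return that pops one slot for the offset and one for CS, each as wide as the operand size. The function names are hypothetical.

```c
#include <stddef.h>

/* Bytes a same-privilege far return pops: one slot for the offset and
   one for CS, each as wide as the operand size. */
static size_t retf_frame_bytes(unsigned operand_bits)
{
    return 2 * (size_t)(operand_bits / 8);
}

/* Offset of the CS slot from the stack pointer, directly above the
   offset slot. */
static size_t retf_cs_slot_offset(unsigned operand_bits)
{
    return (size_t)(operand_bits / 8);
}
```

With a 32-bit operand size the frame is 8 bytes with CS at [rsp+4], which matches the stubs above that store the selector at [rsp+4]. With a 64-bit operand size it is 16 bytes with CS at [rsp+8]. If the CPU's default differed from what the code assumed, the return would read CS from the wrong slot, which is why an explicit prefix is the safer choice.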

Owen · Post by **Owen** » Fri Oct 26, 2012 2:51 pm

Gigasoft wrote:According to the AMD manual, the default operand size for RETF is 32 bits. The Intel manual, on the other hand, states that "In 64-bit mode, the default operation size of this instruction is the stack-address size, i.e. 64 bits.". If this is true, it's probably better to always use a REX prefix with RETF.

I just checked AMD's Pseudocode, and you're right.

This is another example of the way in which Intel64 is a subtly broken copy of AMD64; though this issue is nowhere near as bad as the misimplementation of SYSCALL

rdos · Post by **rdos** » Fri Oct 26, 2012 4:52 pm

Gigasoft wrote:One way would be to put all your 32-bit destinations in a table and use an indirect far call (12 bytes per call). Another would be to write a stub function for every segment that you are going to call into, like this (10 bytes per call):

Exactly. They are already in a table, and if I cannot patch the direct call into 64-bit code, I can at least patch an indirect call to the gate table instead. That's almost as good. However, I might need to load all segment registers with a base-0 descriptor as well, unless I can guarantee that the code already has such a setup.

rdos · Post by **rdos** » Fri Oct 26, 2012 4:55 pm

Now I have a function for long mode that writes all the general registers, segment registers and flags to the screen. Next, I'll link this code to the exception handlers so I can avoid triple faults and see what goes wrong as I experiment with long mode code.

rdos · Post by **rdos** » Sat Oct 27, 2012 2:29 am

Owen wrote: Far calls are not an option; they do not take 64-bit offsets (like far jumps)

Why would you want it to take 64-bit offsets? All interactions between long mode and compatibility mode need to be done below 4G.

rdos · Post by **rdos** » Sat Oct 27, 2012 2:36 am

The previous idea of locating the unity-mapped region at 0x3000 doesn't look like a good idea after all. This memory region needs to be mapped in all contexts, including V86 processes, and then it will conflict with V86 memory. This is because the scheduler might need to do a mode switch at any point, and I don't want an intermediate CR3 reload. Thus, the next best option is to place it just above V86 addressable memory at 0x110000.
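
The choice of 0x110000 follows from real-mode address arithmetic, which V86 code also uses. A minimal C sketch of the calculation (not code from the thread; the helper names are hypothetical and 4 KiB pages are assumed):

```c
#include <stdint.h>

/* Linear address formed by a real-mode or V86 segment:offset pair. */
static uint32_t v86_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Round an address up to the next 4 KiB page boundary. */
static uint32_t page_align_up(uint32_t addr)
{
    return (addr + 0xFFFu) & ~(uint32_t)0xFFFu;
}
```

The highest address V86 code can form is FFFF:FFFF, which is 0x10FFEF, so the first byte it cannot reach is 0x10FFF0, and rounding that up to a page boundary gives 0x110000.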

I hope NASM can handle multiple orgs, as the startup-code will need to be 0-based while the other code needs to start at 0x110000.

Brendan · Post by **Brendan** » Sat Oct 27, 2012 4:01 am

Hi,

rdos wrote:I hope NASM can handle multiple orgs, as the startup-code will need to be 0-based while the other code needs to start at 0x110000.

It can. There are 3 basic options:

- Tell NASM to generate an object file, and then use a linker. In this case you end up with whatever the object file format and linker you use support.
- Tell NASM to generate a flat binary and use "ORG". This will not work for your case.
- Tell NASM to generate a flat binary and define multiple sections. This lets you define any number of sections, where each section can have any name, any attributes (with/without initialised data), be at any virtual address and appear in the file in any order (it's a bit like having a linker script in the assembly source code). The only real restriction here is that sections can't overlap in the file (for obvious reasons); however, sections can overlap in virtual memory if you want. When using multiple sections like this, you shouldn't use "ORG" at all (the section's definition is used to determine the "ORG" of that section instead).

For an example of "multiple sections" (untested):

Code: Select all

	SECTION .header progbits start=0x00000000 vstart=0x00000000
	SECTION .text progbits follows=.header vfollows=.header
	SECTION .data progbits follows=.text vfollows=.text
	SECTION .bss nobits vfollows=.data

	SECTION .text64 progbits follows=.data vstart=0x00110000
	SECTION .data64 progbits follows=.text64 vfollows=.text64
	SECTION .bss64 nobits vfollows=.data64

Note here that "start" and "follows" refer to where the section is placed in the file (and have nothing to do with virtual addresses), while "vstart" and "vfollows" refer to where the section will be loaded into the virtual address space (and have nothing to do with where the section will end up in the file).

Of course actually loading the sections into the correct virtual addresses is your problem. I'd be tempted to place a small header at the start of the file which contains any information your loader needs to work out which parts of the file should end up at which virtual addresses.
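
One possible shape for such a header and loader, as a minimal C sketch (not code from the thread). The record layout, the field names and `load_sections` are all hypothetical, and `mem` is an ordinary buffer standing in for the address space starting at address 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header record, one per section of the flat binary. */
struct section_rec {
    uint32_t file_offset; /* position in the file ("start"/"follows") */
    uint32_t vaddr;       /* load address ("vstart"/"vfollows") */
    uint32_t file_size;   /* bytes stored in the file, 0 for nobits */
    uint32_t mem_size;    /* bytes occupied once loaded */
};

/* Copy every section to its virtual address and zero-fill the part that
   has no file contents. Returns 0 on success, -1 if a record is out of
   bounds. */
static int load_sections(const uint8_t *image, size_t image_len,
                         const struct section_rec *recs, size_t count,
                         uint8_t *mem, size_t mem_len)
{
    for (size_t i = 0; i < count; i++) {
        const struct section_rec *r = &recs[i];
        /* 64-bit sums so the bounds checks cannot wrap around */
        if (r->file_size > r->mem_size ||
            (uint64_t)r->file_offset + r->file_size > image_len ||
            (uint64_t)r->vaddr + r->mem_size > mem_len)
            return -1;
        memcpy(mem + r->vaddr, image + r->file_offset, r->file_size);
        memset(mem + r->vaddr + r->file_size, 0,
               r->mem_size - r->file_size);
    }
    return 0;
}
```

A record with a file size of 0 covers a nobits section such as .bss64: nothing is copied and the whole range is cleared.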

Cheers,

Brendan

rdos · Post by **rdos** » Sat Oct 27, 2012 6:47 am

Brendan wrote: Tell NASM to generate a flat binary and use "ORG". This will not work for your case.

Why not? As long as NASM can use one org type in the boot-part, and another in the operational part, I have a way to make it work (by copying the code to the unity-mapped section).

I'd rather not use multiple sections and a linker if I can get away without it.

BTW, a possible way is to waste 1MB of linear address space and put padding in the file, but I'd rather not use that method (I used it when the code was copied to 0x3000, but then the waste was minimal).

I could also code all the offsets in the boot-part manually by putting the opcodes directly in the image, and org everything to 0x110000, but that looks a little ugly so I'd rather not.
