Adding 64-bit support to RDOS

rdos · Post by **rdos** » Sat Oct 27, 2012 11:27 am

It seems like it's impossible to use multiple orgs, so that doesn't work.

However, building on Brendan's suggestions, it is possible to do it with sections.

Section setup:

Code:


; this is the 32-bit RDOS device header which is required to be first

section .header progbits start=0x00000000 vstart=0x00000000 align=1

hdr         dw 0x3252
cip         dd init
code_size   dd text_end - text_start + boot_end
code_sel    dw long_dev_code_sel
data_size   dd 0
data_sel    dw 0

; then a new section can be defined with vstart at 0.

section .boot progbits follows=.header vstart=0x00000000 align=1

init:

...

boot_end:    
    
; and last the bulk of code is placed (can be at any position, but in this example at 0x3000)

section .text progbits follows=.boot vstart=0x00003000 align=1

text_start:

...

text_end:

Before jumping to the .text section, the code must be moved to the unity-mapped position given by vstart. The offset of boot_end is the code segment offset that corresponds to the first instruction in the .text section. The align=1 directive is needed because padding between the sections would break the code.
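To make the offsets concrete, here is a small Python sketch of the arithmetic behind the section setup above. It is only an illustration: the function name `layout` and the example sizes are made up, and it assumes the code segment starts at the first byte of .boot (which is what `cip dd init` with vstart=0 suggests).

```python
# Size of the .header section: dw + dd + dd + dw + dd + dw
HEADER_SIZE = 2 + 4 + 4 + 2 + 4 + 2

def layout(boot_size, text_size, text_vstart=0x3000):
    """Compute file and code-segment offsets for .header/.boot/.text.

    boot_size and text_size are hypothetical section sizes in bytes.
    """
    boot_file_off = HEADER_SIZE                # align=1, so no padding after the header
    text_file_off = boot_file_off + boot_size  # .text follows .boot directly in the file
    return {
        "boot_file_off": boot_file_off,
        "text_file_off": text_file_off,
        "boot_end": boot_size,                 # .boot has vstart=0, so the label equals its size
        "code_size": text_size + boot_size,    # text_end - text_start + boot_end
        "copy_src_off": boot_size,             # segment offset of the first .text byte
        "copy_dst": text_vstart,               # where .text expects to run
        "copy_len": text_size,
    }

info = layout(boot_size=0x200, text_size=0x1000)
for name, value in info.items():
    print(f"{name:14s} = {value:#x}")
```

With any padding between the sections, `copy_src_off` would no longer equal `boot_end`, which is why align=1 matters here.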

Owen · Post by **Owen** » Sat Oct 27, 2012 11:36 am

rdos wrote:It seems like it's impossible to use multiple orgs, so that doesn't work.

ORG is considered a "convenience feature"; it wraps the built-in section support.

Given that I presume you're using WASM for your other code, I'm surprised you didn't move to JWASM for 64-bit support, since it's a fork enhanced with 64-bit support (among other features).

Watch out for the fact that the AMD64 calling convention is different (whether you use the simpler Microsoft ABI or the better AMD/System V ABI).

rdos · Post by **rdos** » Sat Oct 27, 2012 11:53 am

Owen wrote: Given that I presume you're using WASM for your other code, I'm surprised you didn't move to JWASM for 64-bit support, since it's a fork enhanced with 64-bit support (among other features).

I thought about it, but since I was able to include the relevant parts (syscall numbers), and there won't be any large bulk of 64-bit code in the kernel (64-bit device drivers will need to use the microkernel approach), I thought NASM would be convenient. I don't think that WASM or JWASM can handle mixing 32-bit and 64-bit code, and the different offsets, very well, so I think NASM is a good choice. For 64-bit userland and device drivers I'd probably use GCC, not OpenWatcom or NASM.

Owen wrote: Watch out for the fact that the AMD64 calling convention is different (whether you use the simpler Microsoft ABI or the better AMD/System V ABI).

The 64-bit kernel will not contain any C code, and in userland I'll rely on standard combinations.

rdos · Post by **rdos** » Sat Oct 27, 2012 1:22 pm

Now I can place it at any position without problems. Moving the code to its new position and redefining the code selector to be 0-based didn't require a lot of code. I also set up the stack to be 0-based.

Edit: Updated the logic so the device creates the unity-mapping at boot time at 0x110000, and makes it global. Also, it is now possible to use a thread for testing the 64-bit code instead. I had to change the load point for GRUB to 0x120000, which was a lot of work since it appeared in a number of places, including the OW linker.

rdos · Post by **rdos** » Mon Oct 29, 2012 9:14 am

OK, so now there will be no more triple faults from 64-bit mode, since I now have working (default) exception vectors that display which exception occurred and the register state. There will be no need for emulators either, as I can now test relatively easily on real hardware, including both Intel and AMD processors.

The next step is to try to solve the call gate issues (code patching for 32-bit code and some way to do syscalls in 64-bit code).

JamesM · Post by **JamesM** » Mon Oct 29, 2012 12:29 pm

While this ad-hoc blog post is indeed interesting, is this forum the correct place to post it?

Jezze · Post by **Jezze** » Mon Oct 29, 2012 1:00 pm

I don't mind as long as it is one thread, on topic and at least features some discussion.

rdos · Post by **rdos** » Tue Oct 30, 2012 1:21 am

JamesM wrote:While this ad-hoc blog post is indeed interesting, is this forum the correct place to post it?

Why not? Among other things, it provides a radically different approach to starting a new 64-bit environment compared to your own tutorial, which I think some people should be interested in. Instead of writing a large piece of code and hoping it will work (and posting it here when it doesn't), you start by providing exception handlers that can dump register state. I'm sure exception handlers are not even part of your tutorial. OK, so we have emulators, and many people here tell people "use an emulator", but many hardware platforms don't have emulators, so that approach is not effective there. I also started with OS development when there were no 386 emulators, and not even a decent compiler, so I'm used to environments without that kind of fancy stuff.

And to be honest, most of the OSes developed here are just clones of JamesM's tutorial, and how useful is that? How creative is it?

The typical "printf-debugging" method that is required when writing an OS without decent exception handlers is simply inefficient beyond words. I had to use the reboot/no-reboot method when setting up the 64-bit environment, but once that is done I certainly don't want to continue with it.

rdos · Post by **rdos** » Tue Oct 30, 2012 2:54 am

Gigasoft wrote:One way would be to put all your 32-bit destinations in a table and use an indirect far call (12 bytes per call). Another would be to write a stub function for every segment that you are going to call into, like this (10 bytes per call):

I have checked this now. The Intel manual states that 32-bit indirect calls are supported from 64-bit mode, while the AMD manual doesn't mention this possibility. Real tests on AMD reveal that AMD generates a protection fault (with 0 in error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit. It might, however, be possible to detect this situation and use calls on Intel and retfs on AMD, since calls are much more efficient. In effect, it means that 64-bit code wanting to do 32-bit syscalls must go through the protection fault handler on AMD (coding the pushes and retf inline takes too much space). In fact, if I patch the code with the indirect call on all CPUs and see a new protection fault with patched code, I'll do the call in the protection fault handler instead.

Cognition · Post by **Cognition** » Tue Oct 30, 2012 3:16 am

rdos wrote: Why not? Among other things, it provides a radically different approach to starting a new 64-bit environment compared to your own tutorial, which I think some people should be interested in. Instead of writing a large piece of code and hoping it will work (and posting it here when it doesn't), you start by providing exception handlers that can dump register state. I'm sure exception handlers are not even part of your tutorial. OK, so we have emulators, and many people here tell people "use an emulator", but many hardware platforms don't have emulators, so that approach is not effective there. I also started with OS development when there were no 386 emulators, and not even a decent compiler, so I'm used to environments without that kind of fancy stuff.

And to be honest, most of the OSes developed here are just clones of JamesM's tutorial, and how useful is that? How creative is it?

The typical "printf-debugging" method that is required when writing an OS without decent exception handlers is simply inefficient beyond words. I had to use the reboot/no-reboot method when setting up the 64-bit environment, but once that is done I certainly don't want to continue with it.

No offense, but providing a register/state dump when a fault is encountered is pretty standard behavior, kind of the opposite of a "radically different approach". In fact it's actually suggested on the triple fault page in the wiki as well. I'm not exactly sure why you're ripping on JamesM's tutorial here either, as he simply asked if this was the proper forum for you to post what really amounts to a development diary. I'm not going to say there's no value in posting such a thing, but if your goal is really to offer an alternative to existing tutorials, you might want to consider actually structuring things and organizing them into something a little more coherent to an uninitiated reader.

rdos · Post by **rdos** » Tue Oct 30, 2012 3:26 am

Cognition wrote: No offense, but providing a register/state dump when a fault is encountered is pretty standard behavior, kind of the opposite of a "radically different approach".

Not WHEN a fault occurs, but in order to CATCH faults. Typically, you also use int 3 to induce faults yourself in the code instead of using printf.

Cognition wrote: In fact it's actually suggested on the triple fault page in the wiki as well. I'm not exactly sure why you're ripping on JamesM's tutorial here either, as he simply asked if this was the proper forum for you to post what really amounts to a development diary.

Not at all. I mostly write about the methods used, not the code itself, which I leave to interested parties to implement themselves.

Cognition wrote: I'm not going to say there's no value in posting such a thing, but if your goal is really to offer an alternative to existing tutorials, you might want to consider actually structuring things and organizing them into something a little more coherent to an uninitiated reader.

That's the whole point. I don't believe in tutorials or "OS for dummies". I believe in describing methods, and letting people comment on them instead. Anybody wanting to write an OS should be able to write the code themselves without tutorials, otherwise they don't have the required knowledge / competence.

Gigasoft · Post by **Gigasoft** » Tue Oct 30, 2012 4:29 am

rdos wrote:Real tests on AMD reveal that AMD generates a protection fault (with 0 in error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit.

That's because it's the wrong instruction. Remove the 0x48.

Owen · Post by **Owen** » Tue Oct 30, 2012 6:23 am

Gigasoft wrote:
rdos wrote:Real tests on AMD reveal that AMD generates a protection fault (with 0 in error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit.
That's because it's the wrong instruction. Remove the 0x48.

0xFF /3 encodes a 32-bit far call. There is no 64-bit far call.

rdos wrote:
Gigasoft wrote:One way would be to put all your 32-bit destinations in a table and use an indirect far call (12 bytes per call). Another would be to write a stub function for every segment that you are going to call into, like this (10 bytes per call):
I have checked this now. The Intel manual states that 32-bit indirect calls are supported from 64-bit mode, while the AMD manual doesn't mention this possibility. Real tests on AMD reveal that AMD generates a protection fault (with 0 in error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit. It might, however, be possible to detect this situation and use calls on Intel and retfs on AMD, since calls are much more efficient. In effect, it means that 64-bit code wanting to do 32-bit syscalls must go through the protection fault handler on AMD (coding the pushes and retf inline takes too much space). In fact, if I patch the code with the indirect call on all CPUs and see a new protection fault with patched code, I'll do the call in the protection fault handler instead.

Since no 64-bit code exists yet, surely it is possible to make room for the retf sequence for 64-to-32 far calls?

Also, how is the 32-bit code returning to 64-bit mode? Obviously you can't just do a simple retf, since the 32-bit code can't do a 64-bit far return.

A possible method I can see of coding inline far calls:

Code:

push cs
call stub
...

stub:
push destcs
push destip
retf

I wouldn't expect a significant speed difference from coding direct far calls, and I would expect it to be significantly faster than taking a trip through the GPF handler.

That all said, I'd be tempted to just use a different inline sequence, something along the lines of:

Code:

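farptr:
  dd off                ; 32-bit offset of the 32-bit target
  dw seg                ; selector of the 32-bit code segment

lea rdi, [rel farptr]
call far64to32stub
; Note: if calling from >4GB away, must use an indirect call due to offset limitations

; stub located <4GB
far64to32stub:
  call far [rdi]        ; far call through the 16:32 pointer at [rdi]
  ret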

since it means the 32-bit code needs no adjustment (it just restores CS/IP like normal and implicitly lands in 64-bit mode)
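For reference, the `farptr` data above is a 6-byte 16:32 far pointer: a little-endian 32-bit offset followed by a 16-bit selector. The Python sketch below just packs and unpacks such a pointer to show the byte layout; the helper names and the example offset/selector values are invented for illustration and do not come from RDOS.

```python
import struct

def pack_far_pointer(offset, selector):
    """Pack a 16:32 far pointer as it is laid out in memory: dd offset, dw selector."""
    return struct.pack("<IH", offset, selector)

def unpack_far_pointer(raw):
    """Return (offset, selector) from a 6-byte 16:32 far pointer."""
    return struct.unpack("<IH", raw)

# Hypothetical target: offset 0x1000 in the segment selected by 0x0028
raw = pack_far_pointer(0x1000, 0x0028)
print(raw.hex(" "))
```

The offset comes first in memory, which matches the `dd off` / `dw seg` order in the snippet above.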

rdos · Post by **rdos** » Tue Oct 30, 2012 9:00 am

rdos wrote:
Gigasoft wrote:
rdos wrote:Real tests on AMD reveal that AMD generates a protection fault (with 0 in error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit.
That's because it's the wrong instruction. Remove the 0x48.
Yes, that is correct. It's nasm that adds the 0x48 to the instruction. Since I don't know how to remove this behavior, I coded it in hex instead, and then it actually works (at least on AMD, I'll check Intel later)!

I can now confirm that dual-core Intel Atom and AMD Athlon x6 also support 32-bit far calls from 64-bit mode to 32-bit compatibility mode, and that a 32-bit far return will get back to the correct position with the stack in the correct state. I'll probably code the syscalls directly by referring to the actual gate entry, which is stored in the kernel. Then 64-bit code needs no patching of syscalls at all.
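As a rough illustration of the two encodings discussed here (not taken from anyone's actual code), the Python sketch below builds the bytes for an indirect far call, 0xFF /3, through a plain base register, with and without the 0x48 (REX.W) prefix. The helper name `far_call_indirect` and the register table are made up for this example. It only covers the simple mod=00 case with no SIB byte or displacement, so rsp, rbp and r8-r15 are deliberately left out.

```python
REX_W = 0x48  # the prefix that turned out to be the problem above

# Register numbers usable as a plain [reg] base with mod=00 and no REX.B
REG_NUM = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsi": 6, "rdi": 7}

def far_call_indirect(base, rex_w=False):
    """Encode `call far [base]` as 0xFF /3, optionally with a REX.W prefix."""
    modrm = (0b00 << 6) | (3 << 3) | REG_NUM[base]  # mod=00, reg=/3, rm=base
    body = bytes([0xFF, modrm])
    return bytes([REX_W]) + body if rex_w else body

print(far_call_indirect("rdi").hex(" "))              # form without the prefix
print(far_call_indirect("rdi", rex_w=True).hex(" "))  # form with 0x48 in front
```

The two-byte form could then be emitted from the assembler as raw data (for example `db 0xFF, 0x1F` for a pointer at [rdi]), which is presumably what coding it in hex amounts to.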

rdos · Post by **rdos** » Tue Oct 30, 2012 9:11 am

Owen wrote:
Gigasoft wrote:
rdos wrote:Real tests on AMD reveal that AMD generates a protection fault (with 0 in error code) when using the 64-bit version of the indirect call instruction (0x48 0xFF xx). That probably means that AMD doesn't support this, and thus it is not possible to use calls to switch from 64-bit to 32-bit.
That's because it's the wrong instruction. Remove the 0x48.
0xFF /3 encodes a 32-bit far call. There is no 64-bit far call.

That might have been the reason why AMD generates a GPF with REX.W. I just don't get why nasm adds REX.W when this is not supported!

Owen wrote:Also, how is the 32-bit code returning to 64-bit mode? Obviously you can't just do a simple retf, since the 32-bit code can't do a 64-bit far return.

It seems the 32-bit far call generates the correct stack frame for the 32-bit retf, as it must for the instruction to be meaningful.

OSDev.org
