OSDev.org

Posted: **Sat Apr 14, 2012 3:21 pm**

By checking all the optimizations available in Watcom, I can get my AMD Phenom II, at 2.7GHz, to do 230 million calls per second. However, even this version pushes ebx, ecx, edx, esi, edi and ebp in the prolog, and pops them in the epilog in NullProc. Short of inlining the function, I'm not sure if it is possible to achieve better performance with Watcom.

Posted: **Sat Apr 14, 2012 3:54 pm**

Got the test to run on the Vaio portable (which doesn't have a functional keyboard driver).

Intel Core Duo (2.13GHz):
near: 35.4 million calls per second
gate: 6.7 million calls per second
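
To put those two figures side by side, dividing the clock rate by the call rate gives a rough cost per call in cycles. This is only a sketch: it assumes the core runs at its nominal 2.13GHz and that the benchmark loop overhead is negligible, and the function names are made up for the illustration.

```c
#include <stdio.h>

/* Rough cost of one call in clock cycles: clock rate / calls per second.
 * Assumes the nominal clock rate and ignores benchmark loop overhead. */
double cycles_per_call(double clock_hz, double calls_per_sec)
{
    return clock_hz / calls_per_sec;
}

/* Hypothetical helper: print the cost of the two results quoted above. */
void print_costs(void)
{
    printf("near: %.0f cycles\n", cycles_per_call(2.13e9, 35.4e6));
    printf("gate: %.0f cycles\n", cycles_per_call(2.13e9, 6.7e6));
}
```

Under those assumptions a near call costs about 60 cycles and a call through a gate about 318, a bit more than five times as much.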

I'm quite thrilled that all of the boards I've tested so far (apart from the Vaio and the i5 board) actually have a functional keyboard + video driver (when configured with PIC or ACPI/APIC), so that I can run the command line interpreter in RDOS. It seems like I've made some real progress in compatibility over the last couple of years.

Posted: **Sat Apr 14, 2012 11:33 pm**

Hi,

rdos wrote:Having read-up on sysenter (and syscall), I've more or less concluded that my original statement was true. They would not be of any kind of benefit.

List of the problems:
1. SS:ESP is loaded with a fixed value regardless of thread (how about core?), and cli is executed. This means the kernel ss:esp must be manually loaded from the TSS, since I don't want syscalls to run to completion with interrupts disabled (the same kernel stack is used for all threads).

Not enough detail to understand how this works or why it's a problem. Can probably just use SS.base = 0 for the kernel's stack and set SYSENTER_ESP_MSR during task switches. Kernel's SYSENTER entry point can also do "STI".

rdos wrote:2. Upon return, CS and SS are loaded with a zero base. This would only work in the case when the caller is a flat application (the usual case), and when the application has a zero base (mostly also the usual case)

OK. This means only applications that have a zero base can use SYSENTER; which is fine (especially if most applications have a zero base).

rdos wrote:3. CS is loaded with a zero base. This basically means that ALL entrypoints must do a far call, since neither kernel nor any device-driver executes in a code segment with a zero base. Additionally, limit checking in the code segment of the syscalls in question would be disabled unless a far call is executed.

No. It just means you need to change your linker script so that the kernel has a zero base (e.g. kernel might start at offset 0xC0000000 in its segment). Your kernel shouldn't need limit checks (but if it's full of bugs then maybe you could do something like "#ifdef DEBUGGING; jmp far").

rdos wrote:4. ECX and EDX are used in the interface. This means that EAX (function number), ECX and EDX must be saved and restored on the application stack. This creates more overhead that is not otherwise necessary.

ECX and EDX only need to be saved and restored by the caller if the caller cares what they currently contain. This isn't any different to a normal function call (with C calling conventions) because ECX and EDX aren't preserved for normal function calls either.

Some of the kernel API's functions might need to be modified so they don't expect arguments in ECX or EDX or return values in ECX or EDX. This means you'd end up with the new kernel interface that doesn't use these registers (for a fast "normal case") and an old/legacy interface for backward compatibility that may be slightly slower due to the need to move values from ECX and EDX into other registers before calling the normal kernel functions (unless you duplicate affected kernel functions to avoid making the irrelevant legacy API slower - e.g. one version that doesn't use ECX or EDX and the old version that does).

It seems to me that it will take a little work to get most of the performance benefits of SYSENTER/SYSCALL working (and you'll probably never get the full benefit due to flaws in the initial design); but none of the problems above prevent SYSENTER/SYSCALL from being used to get most of the performance benefits.

Of course I still say RDOS should have been redesigned before SMP was added; and the more I hear about RDOS the more I think it needs a clean redesign (rather than simply polishing the existing code). Maybe one day you can start a nice clean 64-bit version of it, and eventually escape from all the bad design decisions that way.

Cheers,

Brendan

Posted: **Sun Apr 15, 2012 3:17 am**

Brendan wrote:Not enough detail to understand how this works or why it's a problem. Can probably just use SS.base = 0 for the kernel's stack and set SYSENTER_ESP_MSR during task switches. Kernel's SYSENTER entry point can also do "STI".

The kernel or device-drivers do not expect to see a stack with base 0, nor do I want to have a check if SEP is supported in my task-switch code, followed by loading SYSENTER_ESP_MSR, as that would affect performance on older processors. The kernel rather expects that the kernel stack is a selector, and that the stack is full when SP becomes 0. No way that I will change this, as it would disable stack-checks in kernel.

I haven't looked, but in order to work at all, I suppose that the MSRs must be per core, right?

Brendan wrote:No. It just means you need to change your linker script so that the kernel has a zero base (e.g. kernel might start at offset 0xC0000000 in its segment). Your kernel shouldn't need limit checks (but if it's full of bugs then maybe you could do something like "#ifdef DEBUGGING; jmp far").

Not so. There is no linker script that changes kernel offset. Kernel offset is always 0, and its exact physical location is decided by the boot-loader. There is a constant linear address that the kernel code selector is mapped to after paging is enabled, but that address cannot be 0, as that is the IVT of V86 mode.

Additionally, the boot-loader sets-up the kernel as a 16-bit segment, not a 32-bit segment, which also means there must be a far-jump to kernel in order to execute the code with the correct bitness. Replacing the boot-loader in already running installations is not something I'm likely to do.

Brendan wrote:ECX and EDX only need to be saved and restored by the caller if the caller cares what they currently contain. This isn't any different to a normal function call (with C calling conventions) because ECX and EDX aren't preserved for normal function calls either.

RDOS doesn't use C calling convention. It uses register calling conventions, and many syscalls use ECX and EDX.

Brendan wrote:Some of the kernel API's functions might need to be modified so they don't expect arguments in ECX or EDX or return values in ECX or EDX. This means you'd end up with the new kernel interface that doesn't use these registers (for a fast "normal case") and an old/legacy interface for backward compatibility that may be slightly slower due to the need to move values from ECX and EDX into other registers before calling the normal kernel functions (unless you duplicate affected kernel functions to avoid making the irrelevant legacy API slower - e.g. one version that doesn't use ECX or EDX and the old version that does).

Not a chance. Changing the syscall interface for close to 500 syscalls just to support SYSENTER is pretty much out of the question. The only possible solution here is to push EAX, ECX and EDX on the application stack, then load the gate number into EAX and do a syscall. Another constraint is that the code in application space should be at most 10 bytes, otherwise the design would still affect performance when SYSENTER is not available. Pushing and popping EAX, ECX and EDX is 6 bytes, the SYSENTER instruction is 2 bytes, and loading the function number is 4 (provided a 16-bit value is loaded) or 5 (if a 32-bit value is loaded). That's 12 or 13 bytes, which is 2-3 bytes too much.
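
The byte count can be checked mechanically. The sketch below uses the standard encodings for a 32-bit code segment (one byte for each PUSH or POP of a general register, two for SYSENTER, five for MOV EAX,imm32, four for MOV AX,imm16 with its operand-size prefix) and takes 10 bytes as the budget, which is the size of the patched call site discussed in this thread. The enum and function names are hypothetical.

```c
/* Encoded instruction sizes in bytes, 32-bit code segment. */
enum {
    SZ_PUSH_REG    = 1,   /* push eax / ecx / edx            */
    SZ_POP_REG     = 1,   /* pop  eax / ecx / edx            */
    SZ_SYSENTER    = 2,   /* 0F 34                           */
    SZ_MOV_AX_I16  = 4,   /* 66 B8 iw                        */
    SZ_MOV_EAX_I32 = 5,   /* B8 id                           */
    PATCH_BUDGET   = 10   /* assumed limit for the call site */
};

/* Size of an inline SYSENTER call site that saves EAX, ECX and EDX,
 * loads the function number, and restores the registers afterwards. */
int inline_site_size(int mov_size)
{
    return 3 * SZ_PUSH_REG + mov_size + SZ_SYSENTER + 3 * SZ_POP_REG;
}

/* Number of bytes by which the site exceeds the budget (0 if it fits). */
int bytes_over_budget(int mov_size)
{
    int over = inline_site_size(mov_size) - PATCH_BUDGET;
    return over > 0 ? over : 0;
}
```

With the 16-bit load the site comes to 12 bytes and with the 32-bit load to 13, so it is 2 or 3 bytes over a 10-byte budget.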

Brendan wrote:It seems to me that it will take a little work to get most of the performance benefits of SYSENTER/SYSCALL working (and you'll probably never get the full benefit due to flaws in the initial design); but none of the problems above prevent SYSENTER/SYSCALL from being used to get most of the performance benefits.

I have a different idea, that might also have some other benefits. The SYSENTER handler will always need to load SS:ESP for the current thread, and it will need to do a far jump to the entry-point. SYSEXIT will not need to, as it already has its correct destination. In order to get good performance with SYSEXIT, the code to leave the syscall must be mapped into every device-driver, and every device-driver needs to have a near-version of the syscalls (many already have, due to multi-bitness support). An additional benefit of this design is that a self-reference (a device-driver / kernel that uses its own syscall) can be replaced with a near-call / near-return.

Brendan wrote:Of course I still say RDOS should have been redesigned before SMP was added; and the more I hear about RDOS the more I think it needs a clean redesign (rather than simply polishing the existing code). Maybe one day you can start a nice clean 64-bit version of it, and eventually escape from all the bad design decisions that way.

That won't happen. I won't start any new OS-projects. I have other interests as well that I will pursue if RDOS is no longer interesting to maintain.

Posted: **Sun Apr 15, 2012 12:42 pm**

Now I also have some indication of what the difference is between using a zero base and a non-zero base for the flat selector the application uses. When I change the configuration for the 6-core AMD from non-zero base to zero base, the number of near calls per second increases from 44.7 to 46.6 (a 4% increase). This is not a lot, but it might be more for a more ordinary application that does more than dummy near-calls. On Intel Atom, the difference is much smaller (much less than 1%, possibly even zero).

Posted: **Mon Apr 16, 2012 6:18 am**

Hi,

rdos wrote:
Brendan wrote:Not enough detail to understand how this works or why it's a problem. Can probably just use SS.base = 0 for the kernel's stack and set SYSENTER_ESP_MSR during task switches. Kernel's SYSENTER entry point can also do "STI".
The kernel or device-drivers do not expect to see a stack with base 0, nor do I want to have a check if SEP is supported in my task-switch code, followed by loading SYSENTER_ESP_MSR, as that would affect performance on older processors. The kernel rather expects that the kernel stack is a selector, and that the stack is full when SP becomes 0. No way that I will change this, as it would disable stack-checks in kernel.

I'm going to keep a running count - I'll explain it below.

1) If device drivers have some strange dependency on the kernel stack's location, then device drivers are broken.

2) There's many differences between CPUs that cause "if(feature_supported) then". There's 2 ways to avoid the branching in critical places - conditional code that enables/disables things at compile time, and duplicating code (e.g. 8 different pieces of task switch code; where the kernel uses "call [address_of_task_switch_code]" or a function pointer, so all the branching only happens once when you decide which version of the code to use and not each time the code is run). If your kernel doesn't already do this (e.g. testing if FXSAVE should be used, if AVX is present, if there's multiple CPUs, etc), then your kernel is broken.

3) Testing if ESP equals 0 to determine if the kernel stack is full is stupid. Testing if ESP is less than some value to determine if the kernel stack is "almost full" can make sense. Magic numbers should be avoided though so a sane developer would do something like "if(esp < STACK_FULL_LEVEL)". If you don't do this and can't easily change the "STACK_FULL_LEVEL" to something like "#define STACK_FULL_LEVEL (current_task_thing->stack_full_level)" or anything else (for any reason) then your kernel is broken.

4) If your kernel is so broken that it actually needs stack limits checks in release versions (e.g. rather than just in debug builds) then your kernel is broken.
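
Point 2 above (deciding once which version of the code to run, instead of branching on every use) can be sketched in C roughly as follows. Everything here is hypothetical: the `cpu_features` structure, the two task-switch variants and the counters only stand in for real kernel code, and the MSR write is mentioned in a comment rather than performed.

```c
#include <stdbool.h>

/* Hypothetical feature flags, filled in once from CPUID at boot. */
struct cpu_features {
    bool has_sep;                 /* SYSENTER/SYSEXIT available */
};

/* Counters stand in for real work so the sketch can be exercised. */
int plain_switches;
int sep_switches;

/* Task switch for CPUs without SYSENTER: nothing extra to do. */
void task_switch_plain(unsigned new_kernel_esp)
{
    (void)new_kernel_esp;
    plain_switches++;
}

/* Task switch for CPUs with SYSENTER: a real kernel would also write
 * new_kernel_esp to the SYSENTER_ESP MSR here (WRMSR, not shown). */
void task_switch_sep(unsigned new_kernel_esp)
{
    (void)new_kernel_esp;
    sep_switches++;
}

/* The equivalent of "call [address_of_task_switch_code]". */
void (*task_switch)(unsigned new_kernel_esp) = task_switch_plain;

/* The feature test runs once at boot, not on every task switch. */
void select_task_switch(const struct cpu_features *f)
{
    task_switch = f->has_sep ? task_switch_sep : task_switch_plain;
}
```

After `select_task_switch()` has run, every `task_switch(esp)` costs one indirect call and no feature test, whichever variant was chosen.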

rdos wrote:I haven't looked, but in order to work at all, I suppose that the MSRs must be per core, right?

Yes.

rdos wrote:
Brendan wrote:No. It just means you need to change your linker script so that the kernel has a zero base (e.g. kernel might start at offset 0xC0000000 in its segment). Your kernel shouldn't need limit checks (but if it's full of bugs then maybe you could do something like "#ifdef DEBUGGING; jmp far").
Not so. There is no linker script that changes kernel offset. Kernel offset is always 0, and its exact physical location is decided by the boot-loader. There is a constant linear address that the kernel code selector is mapped to after paging is enabled, but that address cannot be 0, as that is the IVT of V86 mode.

5) I'd assume you aren't running applications that use SYSENTER during boot; and that you only run applications that use SYSENTER after paging is enabled. The part of your kernel that's used after paging is enabled probably uses something like "CS.base = 0x12345678, offset in segment >= 0". If you can't easily change this to "CS.base = 0, offset in segment >= 0x12345678" then your kernel is broken. Notice that even though the segment base is zero the kernel won't be using any area of the virtual address space reserved for virtual8086.

rdos wrote:Additionally, the boot-loader sets-up the kernel as a 16-bit segment, not a 32-bit segment, which also means there must be a far-jump to kernel in order to execute the code with the correct bitness. Replacing the boot-loader in already running installations is not something I'm likely to do.

6) If you can't change a far jump in the kernel (so it jumps to a different address when switching from 16-bit to 32-bit) then your kernel is broken.

rdos wrote:
Brendan wrote:ECX and EDX only need to be saved and restored by the caller if the caller cares what they currently contain. This isn't any different to a normal function call (with C calling conventions) because ECX and EDX aren't preserved for normal function calls either.
RDOS doesn't use C calling convention. It uses register calling conventions, and many syscalls use ECX and EDX.

I didn't say RDOS uses the C calling convention, and only said that if functions clobber ECX and EDX then it'd be no different to anything that does use the C calling convention. Basically, clobbering ECX and EDX is common practice that is widely used and widely accepted (which is probably why both Intel and AMD chose those registers, and why I think you're whining about the overhead of clobbered registers for no valid reason).

rdos wrote:
Brendan wrote:Some of the kernel API's functions might need to be modified so they don't expect arguments in ECX or EDX or return values in ECX or EDX. This means you'd end up with the new kernel interface that doesn't use these registers (for a fast "normal case") and an old/legacy interface for backward compatibility that may be slightly slower due to the need to move values from ECX and EDX into other registers before calling the normal kernel functions (unless you duplicate affected kernel functions to avoid making the irrelevant legacy API slower - e.g. one version that doesn't use ECX or EDX and the old version that does).
Not a chance. Changing the syscall interface for close to 500 syscalls just to support SYSENTER is pretty much out of the question. The only possible solution here is to push EAX, ECX and EDX on the application stack, then load the gate number into EAX and do a syscall. Another constraint is that the code in application space should be at most 10 bytes, otherwise the design would still affect performance when SYSENTER is not available. Pushing and popping EAX, ECX and EDX is 6 bytes, the SYSENTER instruction is 2 bytes, and loading the function number is 4 (provided a 16-bit value is loaded) or 5 (if a 32-bit value is loaded). That's 12 or 13 bytes, which is 2-3 bytes too much.

If new applications are compiled to use SYSENTER directly (without the hideous "applications attempt to call the kernel and generate an exception and the exception handler patches the caller" mess) then your hideous "applications attempt to call the kernel and generate an exception and the exception handler patches the caller" mess is irrelevant for SYSENTER.

If new applications are compiled to use SYSENTER directly, then even if the CPU doesn't support SYSENTER and you have to emulate it, it'd probably still be faster than your hideous "applications attempt to call the kernel and generate an exception and the exception handler patches the caller" mess (especially for the first few times a call is made - modern CPUs don't handle self-modifying code well).

You also don't need to add a new syscall interface for close to 500 syscalls all at the same time. You could start with one kernel API function, then add another one next week, then add all the rest eventually. Also, based on everything I've heard about RDOS so far, I'd also assume that 99% of the existing kernel API functions are badly designed mistakes; and creating a new/alternative syscall interface would give you a chance to fix all the problems with the existing syscall interface without breaking compatibility (old software can still use the old syscall interface, while new software moves to the new syscall interface).

Also, the new syscall interface wouldn't (and shouldn't) be limited to SYSENTER only. The normal way that sane people do it is to have "eax = function number" and a call table; where the kernel's SYSENTER handler does something like "call [functionTable+eax*4]" then SYSEXIT, the kernel's SYSCALL handler does something like "call [functionTable+eax*4]" then SYSRET, the kernel's software interrupt handler does something like "call [functionTable+eax*4]" then IRET, etc. This would also make it easier to support 64-bit applications one day.
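
As a rough illustration of the shared call table described above, here is a C sketch. The two kernel API functions, the error code and the handler names are invented for the example; in a real kernel the three handlers would be assembly entry points that end in SYSEXIT, SYSRET and IRET respectively.

```c
#include <stddef.h>

typedef int (*syscall_fn)(int arg);

/* Two hypothetical kernel API functions, for illustration only. */
int sys_double(int arg)    { return arg * 2; }
int sys_increment(int arg) { return arg + 1; }

/* One table shared by every entry mechanism. */
const syscall_fn function_table[] = { sys_double, sys_increment };
#define FUNCTION_COUNT (sizeof function_table / sizeof function_table[0])

#define ERR_BAD_FUNCTION (-1)     /* hypothetical error code */

/* C equivalent of "call [functionTable+eax*4]", plus the bounds check
 * that has to precede the indirect call. */
int dispatch(unsigned function_number, int arg)
{
    if (function_number >= FUNCTION_COUNT)
        return ERR_BAD_FUNCTION;
    return function_table[function_number](arg);
}

/* Each entry path differs only in how it gets back to user space. */
int sysenter_handler(unsigned eax, int arg) { return dispatch(eax, arg); }
int syscall_handler(unsigned eax, int arg)  { return dispatch(eax, arg); }
int int_handler(unsigned eax, int arg)      { return dispatch(eax, arg); }
```

The point of the layout is that adding a kernel function means adding one table entry, and all three entry mechanisms pick it up.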

rdos wrote:
Brendan wrote:Of course I still say RDOS should have been redesigned before SMP was added; and the more I hear about RDOS the more I think it needs a clean redesign (rather than simply polishing the existing code). Maybe one day you can start a nice clean 64-bit version of it, and eventually escape from all the bad design decisions that way.
That won't happen. I won't start any new OS-projects. I have other interests as well that I will pursue if RDOS is no longer interesting to maintain.

How about just a new/alternative syscall interface then?

All of the things I numbered above (from 1 to 6) are design flaws that prevent you from moving forward. It's probably only a small amount of the total number of things that cripple your ability to maintain and improve the code, but all of these things should be fixed. For example, device drivers shouldn't have some strange dependency on the kernel stack's location (regardless of whether or not you ever decide to change the kernel's stack location), the kernel should have an effective way of dealing with CPU feature differences (regardless of whether you support SYSENTER or not), you should be doing something like "if(esp < STACK_FULL_LEVEL)" regardless of whether the "STACK_FULL_LEVEL" is ever changed or not, etc.
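
The "if(esp < STACK_FULL_LEVEL)" idea, with the threshold hidden behind a macro so it can later become a per-task value, might look like this in C. The `struct task` layout and the function name are assumptions made for the sketch; only `STACK_FULL_LEVEL`, `current_task_thing` and `stack_full_level` come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task data. */
struct task {
    uintptr_t stack_full_level;   /* below this the stack is "almost full" */
};

struct task *current_task_thing;

/* Redefining this one macro changes the policy everywhere it is used. */
#define STACK_FULL_LEVEL (current_task_thing->stack_full_level)

/* Stacks grow downward, so a smaller ESP means a fuller stack. */
bool stack_almost_full(uintptr_t esp)
{
    return esp < STACK_FULL_LEVEL;
}
```

Callers never see a magic number, so moving the kernel stack or giving each task its own threshold touches only the macro.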

Cheers,

Brendan

Posted: **Mon Apr 16, 2012 8:34 am**

Brendan wrote:1) If device drivers have some strange dependency on the kernel stack's location, then device drivers are broken.

Not so. Hardware takes care of stack checking, as the stack-pointer will roll around when the stack is full, and then generate a stack fault exception. The device-drivers normally don't care about the stack. Some parts of kernel related to exception handling, and creating kernel threads do care about the stack, but otherwise it doesn't matter where it is located. OTOH, many things depend on ESP being less than 0x10000, so it is not possible to have a kernel stack with a 32-bit offset. In order to use a 32-bit stack pointer, quite a few things need to be changed, especially in the kernel. Additionally, I don't want to use software validation of the stack. I like hardware validation better.

Brendan wrote:2) There's many differences between CPUs that cause "if(feature_supported) then". There's 2 ways to avoid the branching in critical places - conditional code that enables/disables things at compile time, and duplicating code (e.g. 8 different pieces of task switch code; where the kernel uses "call [address_of_task_switch_code]" or a function pointer, so all the branching only happens once when you decide which version of the code to use and not each time the code is run). If your kernel doesn't already do this (e.g. testing if FXSAVE should be used, if AVX is present, if there's multiple CPUs, etc), then your kernel is broken.

I do this at several places. For instance, I use this method when selecting lock-types. When I run on a single core I don't need spinlocks, so those will become nops when there is only one core. Some other of these issues are solved by picking a specific driver (for instance PIC vs APIC). However, even adding these things at least requires a call / ret sequence, which slows things down. I certainly wouldn't want different compilations of the kernel.

Brendan wrote:4) If your kernel is so broken that it actually needs stack limits checks in release versions (e.g. rather than just in debug builds) then your kernel is broken.

I see no reason to disable stack checking (done by hardware) in a production release. Stack faults in kernel should not be silently ignored in a production release. These are causes of panics and reboots in the production release.

Brendan wrote:5) I'd assume you aren't running applications that use SYSENTER during boot; and that you only run applications that use SYSENTER after paging is enabled. The part of your kernel that's used after paging is enabled probably uses something like "CS.base = 0x12345678, offset in segment >= 0". If you can't easily change this to "CS.base = 0, offset in segment >= 0x12345678" then your kernel is broken. Notice that even though the segment base is zero the kernel won't be using any area of the virtual address space reserved for virtual8086.

For the moment, the kernel is a 16-bit module, and that's the primary reason why it cannot have CS.base = 0. An additional reason is that it has CS.operandsize = 16, which also means that CS must be reloaded. However, it would be possible to recompile it as a 32-bit module, and possibly also give it a non-zero offset. I'm sure there would be some issues in such a move, but it would be possible. However, the primary reason why I might change it to a 32-bit module is because of size constraints, not because of possible SYSENTER support.

Brendan wrote:6) If you can't change a far jump in the kernel (so it jumps to a different address when switching from 16-bit to 32-bit) then your kernel is broken.

I suppose I could define a new level where the old 16-bit kernel just chains to a new 32-bit kernel.

Brendan wrote:If new applications are compiled to use SYSENTER directly (without the hideous "applications attempt to call the kernel and generate an exception and the exception handler patches the caller" mess) then your hideous "applications attempt to call the kernel and generate an exception and the exception handler patches the caller" mess is irrelevant for SYSENTER.

It's not a mess. It is called "binary compatibility"

But I suspect you've never heard about such a concept in the *nix world?

Brendan wrote:If new applications are compiled to use SYSENTER directly, then even if the CPU doesn't support SYSENTER and you have to emulate it, it'd probably still be faster than your hideous "applications attempt to call the kernel and generate an exception and the exception handler patches the caller" mess (especially for the first few times a call is made - modern CPUs don't handle self-modifying code well).

The patching procedure works on all processors I know about, and it is SMP safe. It also has no negative performance aspects as it is only done once for each occurrence of a syscall.

Another way to stay within the current size constraints of the syscall code might be this:

Code:


; patched code

    push table_index                ; 5 bytes (table index << 4)
    call SYSENTER_PROC           ; 5 bytes (total of 10 bytes)

; at some global position (not in the executable)

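SYSENTER_PROC   Proc near
    push eax
    push ecx
    push edx
    mov ecx,esp                     ; sysenter saves nothing: pass application ESP in ecx
    mov edx,OFFSET sysenter_pos     ; and the return point in edx (checked by the kernel below)
    sysenter
sysenter_pos:
    pop edx
    pop ecx
    pop eax
    ret 4
SYSENTER_PROC  Endp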

; in kernel

app_index      = 16
app_eax         =  8
app_ecx         = 4
app_edx         = 0

sysenter_entry:
    load_task_ss_esp                         ; load task  ss:esp in some way
    push ecx                                    ; put application stack on kernel stack
    cmp edx,OFFSET sysenter_pos      ; check that sysenter was used in a proper manner
    jnz sysenter_fail                          ; go if not
    mov eax,ds:[ecx].app_index          ; get table index from application stack (flat ss no longer present here, but ds is the flat selector 
                                                    ; of the application, and thus has the correct mapping)
    cmp eax,cs:sys_tab_size               ; check that index is reasonable
    jae sysenter_fail                           ; go if not
    push dword ptr cs:[eax].sys_tab
    push dword ptr cs:[eax+4].sys_tab    ; put destination on stack
    mov eax,ds:[ecx].app_eax
    mov edx,ds:[ecx].app_edx
    mov ecx,ds:[ecx].app_ecx
    retf32                                          ; jump to syscall handler

sysenter_fail:
    int 3
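
The app_* offsets used by the kernel code follow directly from the order of the pushes in the patched code and in SYSENTER_PROC. A C sketch of that stack frame, with compile-time checks against the constants above, could look like this (the struct, its field names and the helper are made up for the illustration):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Application stack at the moment of sysenter, lowest address first. */
struct app_frame {
    uint32_t edx;          /* pushed last by SYSENTER_PROC   */
    uint32_t ecx;
    uint32_t eax;          /* pushed first by SYSENTER_PROC  */
    uint32_t return_addr;  /* pushed by "call SYSENTER_PROC" */
    uint32_t table_index;  /* pushed by the patched code     */
};

/* These must agree with app_edx, app_ecx, app_eax and app_index. */
_Static_assert(offsetof(struct app_frame, edx) == 0, "app_edx");
_Static_assert(offsetof(struct app_frame, ecx) == 4, "app_ecx");
_Static_assert(offsetof(struct app_frame, eax) == 8, "app_eax");
_Static_assert(offsetof(struct app_frame, table_index) == 16, "app_index");

/* What "mov eax,ds:[ecx].app_index" reads, given the application ESP. */
uint32_t frame_table_index(const void *app_esp)
{
    struct app_frame f;
    memcpy(&f, app_esp, sizeof f);
    return f.table_index;
}
```

The final "ret 4" in SYSENTER_PROC removes the last two fields: it pops the return address and then discards the 4-byte table index.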

Brendan wrote:You also don't need to add a new syscall interface for close to 500 syscalls all at the same time. You could start with one kernel API function, then add another one next week, then add all the rest eventually. Also, based on everything I've heard about RDOS so far, I'd also assume that 99% of the existing kernel API functions are badly designed mistakes; and creating a new/alternative syscall interface would give you a chance to fix all the problems with the existing syscall interface without breaking compatibility (old software can still use the old syscall interface, while new software moves to the new syscall interface).

I'm quite content with the current syscall interface.

Brendan wrote:Also, the new syscall interface wouldn't (and shouldn't) be limited to SYSENTER only. The normal way that sane people do it is to have "eax = function number" and a call table; where the kernel's SYSENTER handler does something like "call [functionTable+eax*4]" then SYSEXIT, the kernel's SYSCALL handler does something like "call [functionTable+eax*4]" then SYSRET, the kernel's software interrupt handler does something like "call [functionTable+eax*4]" then IRET, etc. This would also make it easier to support 64-bit applications one day.

This is the really ancient way of doing syscalls that goes back to DOS and other terrible OSes. I left this way of doing it (along with IOCTL) 20 years ago in favor of my current interface, and I'll never go back to the ancient mess again. My interface guarantees binary compatibility, as existing syscalls may only be changed in ways that don't break backward compatibility. At the server end, device-drivers (or kernel) register their entry-points with the kernel, and then the patcher creates the call-gates "on the fly" and patches them into user-space. It could also patch in the above sysenter code "on the fly" when sysenter is supported and the device-driver can handle it. Additionally, my syscall interface can gracefully handle unimplemented syscalls, by patching in a default handler that just returns with CY. All syscalls use CY to indicate success / failure.

In fact, your above table is easy to implement by using the already present gate number cache. However, I would more likely implement it in a similar way as the gate descriptors. When the patcher is invoked, it would check if the current syscall is already in the table, and if it is, it would generate the code to push the existing index on the stack. If not, it would add a new entry to the table, and push the index to that entry. That way, the table would be compact and only contain references to used syscalls.
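
The lookup-or-add step the patcher would perform can be sketched in C like this. The table capacity, the array of gate numbers and the function name are assumptions for the example; a real table would hold handler selectors and offsets rather than bare gate numbers.

```c
#include <stddef.h>

#define TABLE_CAPACITY 64          /* hypothetical limit */

/* Compact table holding only the syscalls that have been patched in. */
unsigned used_gates[TABLE_CAPACITY];
size_t used_count;

/* Return the table index for gate_nr, adding an entry on first use.
 * Returns -1 if the table is full. */
int index_for_gate(unsigned gate_nr)
{
    for (size_t i = 0; i < used_count; i++)
        if (used_gates[i] == gate_nr)
            return (int)i;         /* already present: reuse the index */

    if (used_count == TABLE_CAPACITY)
        return -1;

    used_gates[used_count] = gate_nr;
    return (int)used_count++;
}
```

Because entries are only appended when a syscall is first patched, the table stays dense and an index that was never handed out cannot refer to a valid entry.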

Posted: **Mon Apr 16, 2012 8:46 am**

If you decided to patch the syscall with a function call, why not do it at load time (at the load-time linking stage, with syscalls as kernel exported symbols)?
This can avoid the self-modifying scenario that modern processors may not handle well, and makes it possible to provide or authorize different versions of the API to applications.

Posted: **Mon Apr 16, 2012 9:04 am**

bluemoon wrote:If you decided to patch the syscall with a function call, why not do it at load time (at the load-time linking stage, with syscalls as kernel exported symbols)?
This can avoid the self-modifying scenario that modern processors may not handle well, and makes it possible to provide or authorize different versions of the API to applications.

The current reason why it is not done at load-time is that GDT descriptors are a limited resource, so I only want to allocate call gate descriptors for used syscalls. That's why this is done at first access rather than at load-time. Another reason is that the same syscall code is also used in device-drivers and kernel, and when it is used there it would be patched to a far call or a near call, not to a call gate or sysenter interface. Also, usage in kernel / device-driver would not allocate a GDT call gate descriptor.

For the proposed sysenter interface, this is convenient because then the dispatch table would only contain used syscalls, and no syscalls that haven't been patched can be accessed.

It is also convenient to patch the code, as the call gate interface is much faster than going through a protection fault on processors that don't support sysenter, or that might handle the call gate interface faster than the sysenter interface.

Posted: **Mon Apr 16, 2012 12:57 pm**

I need to reconsider the patch code. Apparently, the patched code can only be 7 bytes (+ an initial nop).

Alternative approach (which actually is faster):

Code:


; patched code

    nop                                   ; lead-byte (1 byte)
    call gate_entry                   ; a near call to a dynamically created user-level gate entry (5 bytes)
    nop
    nop

; The dynamic entry. These are placed in application space with read only access.

gate_nr        DD ?

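gate_entry Proc near
    push eax
    push ecx
    push edx
    mov ecx,esp                       ; sysenter saves nothing: pass application ESP in ecx
    mov edx,OFFSET gate_ret           ; and the return point in edx (filled in per stub)
    sysenter
gate_ret:
    pop edx
    pop ecx
    pop eax
    ret
gate_entry Endp

; in kernel

gate_nr         = -16               ; 4 (gate_nr dword) + 12 (stub bytes before gate_ret)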

app_eax         = 8
app_ecx         = 4
app_edx         = 0

; Each core will set up its own sysenter handler. This can be used to define the processor block linear address

proc_linear     DD ?
systab_linear  DD ?

sysenter_entry:
    mov eax,cs:proc_linear               ; get current core description block
    mov ss,cs:[eax].ps_ss0              ; get ss0 of current thread
    mov esp,stack0_size                    ; load top of stack
    sti
    push edx                                    ; push return-point (application EIP)
    push ecx                                    ; put application ESP on kernel stack

IFDEF DEBUG
    cmp edx,share_app_start                   ; check that sysenter was used in a proper manner
    jbe sysenter_fail                          ; go if not
ENDIF

    mov eax,ds:[edx].gate_nr              ; get gate # from just before the current procedure
    add eax,cs:systab_linear
    push dword ptr cs:[eax].ret    ; push sysleave offset 
    push dword ptr cs:[eax].sel    ; push handler selector
    push dword ptr cs:[eax].offset  ; push handler offset
    mov eax,ds:[ecx].app_eax
    mov edx,ds:[ecx].app_edx
    mov ecx,ds:[ecx].app_ecx
    retf32                                          ; jump to syscall handler

sysenter_fail:
    int 3

; in a device-driver module

dummy_gate  Proc near
    ret
dummy_gate  Endp

; it is assumed that DS contains a writable flat selector. This is ok for all syscalls except for two

; exit procedure for 32-bit code:

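sysleave_entry32:
    push ecx
    mov ecx,ss:[esp+4]                   ; get application ESP
    mov ds:[ecx].app_edx,edx             ; return registers to caller
    mov ds:[ecx].app_eax,eax
    pop dword ptr ds:[ecx].app_ecx
    add esp,4                            ; discard saved application ESP (ECX already holds it)
    pop edx                              ; pop application EIP
    sysexit                              ; back to ring 3: EIP from EDX, ESP from ECX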

; exit procedure for 16-bit code:

sysleave_entry16:
    push ecx
    mov ecx,ss:[esp+6]                   ; get application ESP
    mov ds:[ecx].app_edx,edx             ; return registers to caller
    mov ds:[ecx].app_eax,eax
    pop dword ptr ds:[ecx].app_ecx
    pop dx                               ; pop unused high part of entry-point EIP
    add esp,4                            ; discard saved application ESP (ECX already holds it)
    pop edx                              ; pop application EIP
    sysexit                              ; back to ring 3: EIP from EDX, ESP from ECX

The new approach takes some ideas from the dynamic interrupt stubs I described elsewhere. Instead of loading a function number in EAX, this solution creates the dynamic gate entry in user-accessible address space each time a new gate is needed, and precedes it with the gate number.
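To make the "gate number just before the stub" idea concrete, here is a minimal C model of the lookup. It is only a sketch: `struct gate_stub`, `stub_return_point` and `gate_from_return` are hypothetical names, the stub's instruction bytes are not modelled, and the number of code bytes in front of the return point is a parameter because it depends on exactly which instructions the stub contains (five code bytes, for example, put the gate number 9 bytes before the return point).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one dynamically created gate stub:
 * a 32-bit gate number immediately followed by the stub's code bytes. */
enum { STUB_CODE_MAX = 16 };

struct gate_stub {
    uint32_t gate_nr;              /* stored just before the entry point */
    uint8_t  code[STUB_CODE_MAX];  /* stub instructions (contents not modelled) */
};

_Static_assert(offsetof(struct gate_stub, code) == sizeof(uint32_t),
               "code must directly follow the gate number");

/* Signed distance from the return point back to the gate number. */
static ptrdiff_t gate_nr_offset(size_t bytes_before_return)
{
    return -(ptrdiff_t)(sizeof(uint32_t) + bytes_before_return);
}

/* Address the kernel would see as the return point of this stub. */
static const uint8_t *stub_return_point(const struct gate_stub *stub,
                                        size_t bytes_before_return)
{
    return (const uint8_t *)stub + offsetof(struct gate_stub, code)
           + bytes_before_return;
}

/* Recover the gate number from a return point, as the kernel entry does. */
static uint32_t gate_from_return(const uint8_t *return_point,
                                 size_t bytes_before_return)
{
    uint32_t nr;
    memcpy(&nr, return_point + gate_nr_offset(bytes_before_return), sizeof nr);
    return nr;
}
```

The point of the design is that the caller never has to load a function number into a register: the return address alone identifies the gate.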

The sysenter stubs are also created dynamically, with one stub per core. This allows preceding this function with a fast path to the current processor control block, which would contain the current thread's kernel SS selector.
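Going back to the patch site at the top of the listing (lead nop, 5-byte near call, two trailing nops), the bytes could be produced roughly as follows. This is a sketch under assumptions: `encode_gate_call` is a hypothetical helper, both addresses are taken to be 32-bit offsets in the same flat code segment, and the code that actually writes the bytes into the application image is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PATCH_SITE_SIZE = 8 };  /* lead nop + 7 patchable bytes */

/* Fill an 8-byte call site with: nop, call rel32, nop, nop.
 * site_addr is the address of the lead nop, target_addr the gate entry. */
static void encode_gate_call(uint8_t site[PATCH_SITE_SIZE],
                             uint32_t site_addr, uint32_t target_addr)
{
    assert(site != NULL);

    /* rel32 is relative to the instruction after the call,
     * i.e. site_addr + 1 (nop) + 5 (call). Unsigned wrap-around
     * gives the right two's-complement value for backward calls. */
    uint32_t rel = target_addr - (site_addr + 6u);

    site[0] = 0x90;                          /* nop (lead byte) */
    site[1] = 0xE8;                          /* call rel32 */
    site[2] = (uint8_t)(rel & 0xFFu);        /* displacement, little-endian */
    site[3] = (uint8_t)((rel >> 8) & 0xFFu);
    site[4] = (uint8_t)((rel >> 16) & 0xFFu);
    site[5] = (uint8_t)((rel >> 24) & 0xFFu);
    site[6] = 0x90;                          /* nop */
    site[7] = 0x90;                          /* nop */
}
```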

Posted: **Tue Apr 17, 2012 9:07 am**

Hi,

rdos wrote:OTOH, many things depend on ESP being less than 0x10000, so it is not possible to have a kernel stack with a 32-bit offset.

rdos wrote:For the moment, the kernel is a 16-bit module, and that's the primary reason why it cannot have CS.base = 0.

Holy mother of McGyver, Batman!

It's like a monster, that was created decades ago by stitching together the flesh of dead corpses (80286 protected mode) and then brought to life using arcane means, that roams the earth hoping for an end to its stumbling uncoordinated misery, while people expect it to work part time as an elegant ballet dancer.

rdos wrote:I suppose I could define a new level where the old 16-bit kernel just chains to a new 32-bit kernel.

That would help, and may help a lot; but is it enough? You need to sit down for about a month with a pen & paper and design (not implement) a new, clean, elegant, modern kernel; then create a "strategic plan" that describes how you're going to make a (gradual) transition from the old stuff to the new stuff without breaking backward compatibility too much too quickly (assuming backward compatibility matters for embedded things, which to be honest I doubt).

Think of it like Microsoft's transition from DOS to modern Windows, or Apple's transitions from Motorola/Mac to PowerPC to 80x86/OSX: it might take years to complete your strategic plan, but it's far better in the long run than not having any plan (and being tied to Frankenstein's monster forever).

rdos wrote:It's not a mess. It is called "binary compatibility"

But I suspect you've never heard about such a concept in the *nix world?

It's a mess; and there is no binary compatibility (it does not allow applications/processes to be run on other OSs).

In the *nix world they have system libraries that are dynamically linked to processes, so that processes don't have a "hard-coded" dependence on the kernel API at all. The dynamic linking is done during process loading (no "modify the code with exceptions while it's running"); and it works so well that (at least in theory) binaries can be run on radically different OSs with extremely different kernel APIs.
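A rough illustration of load-time binding, in C. It is not how any particular dynamic linker works; `bind_imports`, `import_entry`, `export_entry` and `lib_get_answer` are made-up names, and symbol lookup is a plain linear search. The process only knows the names it imports; the loader fills in the addresses once, before the process runs, from whatever system library the host OS provides.

```c
#include <stddef.h>
#include <string.h>

typedef int (*sys_fn)(int);

/* One symbol offered by a system library. */
struct export_entry { const char *name; sys_fn fn; };

/* One symbol a process needs, and the slot its code calls through. */
struct import_entry { const char *name; sys_fn *slot; };

/* Resolve every import once, at load time.
 * Returns 0 on success, -1 if any symbol is missing. */
static int bind_imports(const struct import_entry *imports, size_t n_imports,
                        const struct export_entry *exports, size_t n_exports)
{
    for (size_t i = 0; i < n_imports; i++) {
        sys_fn found = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                found = exports[j].fn;
                break;
            }
        }
        if (found == NULL)
            return -1;            /* refuse to start the process */
        *imports[i].slot = found; /* no code is patched at run time */
    }
    return 0;
}

/* Hypothetical library routine; a different OS would supply its own
 * implementation under the same name. */
static int lib_get_answer(int x) { return x + 1; }
```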

rdos wrote:
Brendan wrote:Also, the new syscall interface wouldn't (and shouldn't) be limited to SYSENTER only. The normal way that sane people do it is to have "eax = function number" and a call table; where the kernel's SYSENTER handler does something like "call [functionTable+eax*4]" then SYSEXIT, the kernel's SYSCALL handler does something like "call [functionTable+eax*4]" then SYSRET, the kernel's software interrupt handler does something like "call [functionTable+eax*4]" then IRET, etc. This would also make it easier to support 64-bit applications one day.
This is the really ancient way of doing syscalls that goes back to DOS and other terrible OSes. I left this way of doing it (along with IOCTL) 20 years ago in favor of my current interface, and I'll never go back to the ancient mess again. My interface guarantees binary compatibility, as existing syscalls may only be changed in ways that don't break backward compatibility.

Your interface only guarantees limited backward compatibility in the same way that being able to add new function numbers would, except that it's more limited. For example, someone recently said that you couldn't easily support things like SYSENTER because it'd break compatibility, because the "limited backward compatibility" is too limiting.
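For reference, the "eax = function number" scheme from the quote can be modelled in a few lines of C. This is a sketch only: the two syscalls, the `ERR_BAD_FUNCTION` value and the three `enter_via_*` wrappers are invented for illustration, and the real entry paths would of course be assembly ending in SYSEXIT, SYSRET or IRET.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t (*syscall_fn)(int32_t arg);

/* Two placeholder syscalls. */
static int32_t sys_null(int32_t arg)   { (void)arg; return 0; }
static int32_t sys_double(int32_t arg) { return arg * 2; }

/* New functions are appended; existing numbers never change. */
static const syscall_fn function_table[] = { sys_null, sys_double };

#define SYSCALL_COUNT (sizeof function_table / sizeof function_table[0])
#define ERR_BAD_FUNCTION (-1)

/* Common dispatch, the C equivalent of "call [functionTable+eax*4]". */
static int32_t dispatch(uint32_t function_number, int32_t arg)
{
    if (function_number >= SYSCALL_COUNT)
        return ERR_BAD_FUNCTION;
    return function_table[function_number](arg);
}

/* Each entry mechanism differs only in how it enters and leaves the kernel. */
static int32_t enter_via_sysenter(uint32_t eax, int32_t arg) { return dispatch(eax, arg); }
static int32_t enter_via_syscall(uint32_t eax, int32_t arg)  { return dispatch(eax, arg); }
static int32_t enter_via_int(uint32_t eax, int32_t arg)      { return dispatch(eax, arg); }
```

Because the table is the only thing the three paths share, adding a fourth entry mechanism would not touch any existing function number.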

rdos wrote:All syscalls use CY to indicate success / failure.

And that's stupid too.

Did the function fail because the file isn't present, or because of file permissions, or because the kernel ran out of memory, or because I've reached some sort of "maximum file handles per process" quota, or for some other reason? I don't know, it doesn't tell me, it just returns a "success/failure" boolean. I guess my application will only be able to show the user a "There was an error" dialog box instead of telling them what went wrong or giving them some clue what they could do to fix the problem.
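The difference can be shown with a small C sketch. The status names and texts below are invented for illustration and are not taken from any real kernel API; the point is only that a status code still yields the success/failure boolean, while the boolean can never be turned back into a reason.

```c
#include <stdbool.h>

/* Hypothetical status codes covering the failure reasons listed above. */
typedef enum {
    SYS_OK = 0,
    SYS_ERR_NOT_FOUND,
    SYS_ERR_PERMISSION,
    SYS_ERR_NO_MEMORY,
    SYS_ERR_HANDLE_QUOTA
} sys_status;

/* The carry-flag style answer is derivable from the status... */
static bool sys_failed(sys_status s)
{
    return s != SYS_OK;
}

/* ...but only the status lets the caller say what went wrong. */
static const char *sys_status_text(sys_status s)
{
    switch (s) {
    case SYS_OK:               return "success";
    case SYS_ERR_NOT_FOUND:    return "file not found";
    case SYS_ERR_PERMISSION:   return "permission denied";
    case SYS_ERR_NO_MEMORY:    return "out of kernel memory";
    case SYS_ERR_HANDLE_QUOTA: return "too many open file handles";
    }
    return "unknown error";
}
```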

Cheers,

Brendan

Posted: **Tue Apr 17, 2012 9:32 am**

Brendan wrote:Holy mother of McGyver, Batman!

It's like a monster, that was created decades ago by stitching together the flesh of dead corpses (80286 protected mode) and then brought to life using arcane means, that roams the earth hoping for an end to its stumbling uncoordinated misery, while people expect it to work part time as an elegant ballet dancer.

Bitness of segments has nothing to do with the 80286. RDOS was created from the start to support only 386+ processors, so the decision to use 16-bit default segment attributes had nothing to do with the 286. It was a design decision that made sense since all drivers were written in assembly, and none of them had problems with the 64k limit. Today the choice of bitness is made in the IDE, and really makes little difference. Some modules (like the ones containing C code) have 32-bit code segments, while few of those written in assembly do. The new sysenter device uses a 32-bit segment, even though it is in assembly, simply because that makes it easier to create the stubs.

Brendan wrote:Your interface only guarantees limited backward compatibility in the same way that being able to add new function numbers would, except that it's more limited. For example, someone recently said that you couldn't easily support things like SYSENTER because it'd break compatibility, because the "limited backward compatibility" is too limiting.

Not so. I expect to be able to do new performance tests with SYSENTER soon, probably tomorrow.

Brendan wrote:Did the function fail because the file isn't present, or because of file permissions, or because the kernel ran out of memory, or because I've reached some sort of "maximum file handles per process" quota, or for some other reason? I don't know, it doesn't tell me, it just returns a "success/failure" boolean. I guess my application will only be able to show the user a "There was an error" dialog box instead of telling them what went wrong or giving them some clue what they could do to fix the problem.

To put up error dialog boxes in an embedded application is really stupid. I've seen this in real-world applications running Windows. Who do they think will be able to study the error, and click this thing away? I'm sure it won't be the guy wanting to buy petrol at least!

Posted: **Tue Apr 17, 2012 9:52 am**

rdos wrote:
Brendan wrote:Did the function fail because the file isn't present, or because of file permissions, or because the kernel ran out of memory, or because I've reached some sort of "maximum file handles per process" quota, or for some other reason? I don't know, it doesn't tell me, it just returns a "success/failure" boolean. I guess my application will only be able to show the user a "There was an error" dialog box instead of telling them what went wrong or giving them some clue what they could do to fix the problem.
To put up error dialog boxes in an embedded application is really stupid. I've seen this in real-world applications running Windows. Who do they think will be able to study the error, and click this thing away? I'm sure it won't be the guy wanting to buy petrol at least!

OK, so your syscall failed, you bailed out, and the box said "!!! ERROR !!!". Same happens for the next customer, and the one after that. Petrol selling at that particular gas station comes to a complete stop. Your support hotline is called, a support technician is deployed, and I am sure he'd like the little black box to tell him what actually happened, without whipping out the command-line debugger. Actually, I am sure the guy at the cashier would like to tell his angry customers some inkling of what went wrong, beyond "the box says !!! ERROR !!!".

Yes, I too have smirked at "Out of memory" requesters popping up in front of video advertising screens, or "Error 6918, technician underway" on the bank's self-service terminal. But that's way better than "Failed, full stop."

Posted: **Tue Apr 17, 2012 3:02 pm**

Solar wrote:OK, so your syscall failed, you bailed out, and the box said "!!! ERROR !!!". Same happens for the next customer, and the one after that. Petrol selling at that particular gas station comes to a complete stop. Your support hotline is called, a support technician is deployed, and I am sure he'd like the little black box to tell him what actually happened, without whipping out the command-line debugger. Actually, I am sure the guy at the cashier would like to tell his angry customers some inkling of what went wrong, beyond "the box says !!! ERROR !!!".

That's what you have log files for. You simply don't present error messages to end-users of embedded systems; you log them to your log file. Then the technician downloads the log file and can see what went wrong. I can even see exactly what the customer does, and the status of the different parts of the terminal, from work. No need to go there in order to find out what is wrong.

Presenting errors to end-users of embedded systems is just plain stupid.

Besides, you would not log filesystem error codes in a log, as that would not make any sense to the typical support guy. You would not log error codes from GUI APIs to the log, nor would you want to log that an invalid handle was passed to a file-IO syscall. The trick is to provide useful information, and not to log things that only programmers understand. If a GUI call fails, or if a file-IO operation fails, you would simply gracefully recover, or reboot the system. If you run out of memory you might log that for your own sake, and then automatically reboot. You would not put up a message telling the end-user that you're out of memory. That's so extremely unprofessional.
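Read as a policy table, the approach described above might look like this in C. It is one possible reading of the post, not code from RDOS: the enum names are invented, and where the text allows either "recover" or "reboot" the sketch arbitrarily picks recover.

```c
#include <stdbool.h>

/* Failure classes mentioned in the post (names are hypothetical). */
enum failure { FAIL_GUI_CALL, FAIL_FILE_IO, FAIL_OUT_OF_MEMORY };

enum action { ACT_RECOVER, ACT_REBOOT };

struct response {
    enum action action;   /* what the terminal does next */
    bool write_log;       /* record it for the developers/support */
    bool show_to_user;    /* never set for an embedded terminal */
};

/* Map a failure class to the reaction described in the post. */
static struct response respond_to(enum failure f)
{
    struct response r = { ACT_RECOVER, false, false };

    switch (f) {
    case FAIL_GUI_CALL:
    case FAIL_FILE_IO:
        /* Recover quietly; low-level codes are not logged. */
        break;
    case FAIL_OUT_OF_MEMORY:
        /* Worth a log entry, then restart automatically. */
        r.action = ACT_REBOOT;
        r.write_log = true;
        break;
    }
    return r;
}
```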

Posted: **Tue Apr 17, 2012 3:07 pm**

I can even see exactly what the customer does

Including his credit/debit card number?

Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best OS!
