Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 7:14 am
by Brendan
Hi,
Candy wrote:This is why I consider 32-bit code deprecated for my OS. It complicates the code very much, whereas given the probable release date of my OS, it's quite unnecessary for the complexity to be added.
Yes, that's pretty arrogant and pessimistic about my OS, but if limiting the OS to new technology makes it a lot cleaner and more usable, and if I consider it very likely that my target user base will have that kind of computer, it's no problem.
I don't think it's arrogant or pessimistic, depending on your goals and how realistic you want to be. IMHO estimating that it'll take ten years for a "simple" 64 bit desktop OS would be very close to the truth (and there won't be too many 32 bit machines left by then).
For me, I don't want to exclude embedded systems or those "antiques" gathering dust. It'll be part of my eventual marketing strategy (e.g. a pool of old/recycled computers getting as much work done as a new computer for much less up-front cost).
Candy wrote:AFAIK the only possible way this triple fault can be avoided (if allocation on demand is used for CPL=3 stacks) is by changing ESP so that it points to a page that is present. This must be done before the stack is used by the kernel.
Try a task gate or such for your page fault handler.
No thanks - I have too many page faults and task gates are slow (it'd be better for performance to use software interrupts or reload ESP if SYSCALL is used).
Cheers,
Brendan
Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 7:15 am
by tigujo
Maybe I should rephrase my initial question of 'the fastest possible kernel API call' to restrict it to 64bit mode - that's what I try to be after, and leave all the idiosyncrasies of 32bit mode behind.
Though it seems to be impossible even to get into this topic without being irritated by the historic burden of all the modes before... E.g. reading the AMD64 manuals to get into 64bit mode technology regularly leaves me in a state of confusion, having to deal with all the exceptions that follow no visible rule.
What I'd like to see is a separate thread for 64bit mode only. Then I'd have more confidence to catch all those brilliant postings, knowing they apply to 64bit mode. What about that?
You see, as a newbie I have the right to behave like one...
Thanks a lot for all the postings, they encourage me to dig further...
Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 7:49 am
by Brendan
Hi,
tigujo wrote:
Maybe I should rephrase my initial question of 'the fastest possible kernel API call' to restrict it to 64bit mode - that's what I try to be after, and leave all the idiosyncrasies of 32bit mode behind.
I'd assume that for 64 bit only, everyone would recommend SYSCALL (I do anyway).
tigujo wrote:What I'd like to see is a separate thread for 64bit mode only. Then I'd have more confidence to catch all those brilliant postings, knowing they apply to 64bit mode. What about that?
You mean using "Fastest possible kernel API call for 64 bit?" in this thread's subject line? It can be changed if you're the original poster AFAIK...
Cheers,
Brendan
Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 8:14 am
by Candy
tigujo wrote:
Maybe I should rephrase my initial question of 'the fastest possible kernel API call' to restrict it to 64bit mode - that's what I try to be after, and leave all the idiosyncrasies of 32bit mode behind.
Finally, somebody sharing a part of my view
Oh by the way, use SYSCALL in combination with the IST stack swap method for errors (exceptions), so that you separate anything happening within user context from stuff happening in kernel context (errors + interrupts + active drivers; drivers with their own threads/processes are handled in kernel context).
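For readers who have not met the IST yet: in long mode the TSS carries seven Interrupt Stack Table slots, and an IDT gate that names one of them makes the CPU load that stack unconditionally when the exception is delivered. The sketch below only shows the 64-bit TSS layout and a helper that fills a slot. The names `tss64` and `tss64_set_ist` are made up for illustration, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS layout. Type and function names here are hypothetical. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: loaded only on a privilege change */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: loaded whenever a gate selects them */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset; /* offset of the I/O permission bitmap */
} __attribute__((packed)); /* assumes GCC/Clang */

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 sits at offset 0x24");

/* Store a stack top in IST slot `index` (1..7).
   Returns 0 on success, -1 if the index is out of range.
   Index 0 is not a slot: in an IDT gate it means "do not use the IST". */
static int tss64_set_ist(struct tss64 *tss, unsigned index, uint64_t stack_top)
{
    if (index < 1 || index > 7)
        return -1;
    tss->ist[index - 1] = stack_top;
    return 0;
}
```

A kernel would typically reserve one slot for the handlers that must never run on a suspect stack, then put the same index in the IST field of the matching IDT gates.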
Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 8:56 am
by Pype.Clicker
quoted--
AFAIK the only possible way this triple fault can be avoided (if allocation on demand is used for CPL=3 stacks) is by changing ESP so that it points to a page that is present. This must be done before the stack is used by the kernel.
Try a task gate or such for your page fault handler.
--endofquote
No thanks - I have too many page faults and task gates are slow (it'd be better for performance to use software interrupts or reload ESP if SYSCALL is used).
i may have missed something important while reading SYSCALL/SYSRET (or was it SYSENTER/SYSEXIT ?) docs back when i bought my AMD-K6-II, but as far as i can remember, they were only there for _trapping to system calls_ ... can we actually implement _exception handling_ (or hardware interrupts) using that technology ?
Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 9:22 am
by Brendan
Hi,
Pype.Clicker wrote:i may have missed something important while reading SYSCALL/SYSRET (or was it SYSENTER/SYSEXIT ?) docs back when i bought my AMD-K6-II, but as far as i can remember, they were only there for _trapping to system calls_ ... can we actually implement _exception handling_ (or hardware interrupts) using that technology ?
No!
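The problem arises when you use SYSCALL for the kernel API while the CPL=3 stack points to a "not present" page (swapped to disk, or allocation on demand). SYSCALL leaves ESP untouched, so the kernel starts out on that stack. As soon as the kernel touches it you get a page fault, and the normal page fault handler (interrupt gate or trap gate) doesn't do a stack switch (CPL=0 to CPL=0, no privilege transition), so the CPU can't push the exception frame either (not present page). The page fault becomes a double fault, which fails the same way, and you end up with a triple fault.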
If the kernel API uses a call gate, interrupt gate or trap gate the CPU would change ESP to a CPL=0 stack, but SYSCALL doesn't do this to "save time" (which means you might need to do it yourself).
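The stack-switch rule described above can be written down as a toy model. This is only a sketch of the decision the CPU makes when it delivers an exception to a CPL=0 handler, not real hardware behaviour in full. The names `cpu_state` and `handler_stack` are invented for the example, and the `ist_sp` parameter stands for the long mode IST mechanism (pass 0 for 32-bit protected mode, which has no IST).

```c
#include <stdint.h>

/* Toy model of exception delivery to a CPL=0 handler.
   All names are hypothetical. */
struct cpu_state {
    unsigned cpl;      /* privilege level when the exception hits */
    uint64_t sp;       /* current ESP/RSP */
    uint64_t tss_sp0;  /* CPL=0 stack pointer stored in the TSS */
};

/* Returns the stack on which the CPU will push the exception frame.
   ist_sp is the stack from the gate's IST slot, or 0 if the gate
   does not use the IST. */
static uint64_t handler_stack(const struct cpu_state *cpu, uint64_t ist_sp)
{
    if (ist_sp != 0)
        return ist_sp;        /* IST: switch unconditionally */
    if (cpu->cpl != 0)
        return cpu->tss_sp0;  /* privilege change: switch to the CPL=0 stack */
    return cpu->sp;           /* already CPL=0: stay on the current stack */
}
```

Right after SYSCALL the model is in the third case: `cpl` is already 0 and `sp` still holds the user value, so a page fault is delivered on the user stack. Reloading the stack pointer at the top of the handler, or giving the gate an IST slot in long mode, moves it out of that case.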
Cheers,
Brendan
Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 10:04 am
by Pype.Clicker
so if i'm using SYSCALL, i might wish to have my 'syshandler' wrapped around
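Code:
syshandler:
    mov [kernel_top_of_stack],esp   ; save the user ESP in the slot at the top of the kernel stack
    mov esp,kernel_top_of_stack-4   ; switch to the kernel stack, below that slot
    ; - - - - 8< - - - -
    ; now do whatever you want
    ; - - - - >8 - - - -
    mov esp,[kernel_top_of_stack]   ; restore the user ESP
    sysret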
Re:Fastest possible kernel API call?
Posted: Mon Nov 28, 2005 2:13 pm
by Ax64
Hi all
Just like to say that from entry to the handler until the SYSRET, the rcx register should be preserved, because SYSRET uses it to set the rip for the return.
Oh, and before I forget: I think the instruction cycle counts are in AMD's software optimization guide for amd x64 etc...
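To make the RCX remark concrete, here is a simplified C model of what SYSCALL and SYSRET do to the registers in 64-bit mode. It is a sketch only: segment loads are left out, SYSRET's masking of a few RFLAGS bits is ignored, and the struct and function names are invented. Besides RCX, the model shows that R11 carries the saved RFLAGS, so it needs the same care.

```c
#include <stdint.h>

/* Only the registers that SYSCALL/SYSRET touch in this simplified model.
   RSP is deliberately absent: neither instruction changes it. */
struct regs {
    uint64_t rip, rflags, rcx, r11;
};

/* SYSCALL: remember where to return and the caller's flags, then enter
   the kernel at the address held in the LSTAR MSR. */
static void model_syscall(struct regs *r, uint64_t next_rip,
                          uint64_t lstar, uint64_t sfmask)
{
    r->rcx = next_rip;      /* return address goes to RCX */
    r->r11 = r->rflags;     /* caller's RFLAGS go to R11 */
    r->rflags &= ~sfmask;   /* flags set in SFMASK are cleared */
    r->rip = lstar;
}

/* SYSRET: resume at whatever RCX holds, with the flags from R11. */
static void model_sysret(struct regs *r)
{
    r->rip = r->rcx;
    r->rflags = r->r11;
}
```

If the handler overwrites RCX without saving it, `model_sysret` resumes at the overwritten value, which is exactly the failure the post warns about.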