Saving context

gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Saving context

Post by gerryg400 »

Hi all,
When I enter the kernel in my OS I do lots of 'push' instructions to save all the GP regs. Last night I had a peek at the Linux x86_64 context save routines and saw that they use rsp-relative moves to save the register context. Something like this.


subq    $9*8, %rsp          # reserve 9 slots so that 8*8(%rsp) stays inside the frame
movq    %rdi, 8*8(%rsp)
movq    %rsi, 7*8(%rsp)
... etc.
Can anyone suggest why? Surely push is faster than an rsp-relative mov.
Thanks
- gerryg400
If a trainstation is where trains stop, what is a workstation ?
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: Saving context

Post by NickJohnson »

gerryg400 wrote: Can anyone suggest why? Surely push is faster than a reg-rel mov.
Why do you assume that? Every push is a write plus a stack-pointer update; each rsp-relative mov is just a write, and the stack pointer is only modified once, by the single subq. I don't know which is actually faster, but I'd trust the Linux people to optimize at least that section to hell. Beyond speed, it may just be neater to think of the area that the context is being saved to as less of a stack than an array.

Edit: Actually, maybe I do understand. It seems like pushes would be hard to pipeline, because they all modify the stack pointer. By saving each register without modifying the stack pointer, it may play better with the processor pipeline. It would all depend on whether writes can be pipelined like that - maybe only on newer processors?
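To make the "every push also modifies the stack pointer" point concrete, here is a minimal sketch in C. It is a toy model, not kernel code: `toy_cpu`, `save_with_push`, `save_with_mov` and the choice of nine saved registers are all assumptions made up for illustration. It only counts stack-pointer updates and checks the resulting layout; it says nothing about real timings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NREGS 9        /* assumed number of saved registers (top slot is 8*8) */
#define STACK_WORDS 32 /* size of the toy stack, in 8-byte words */

/* Toy model of a downward-growing stack (hypothetical, for illustration). */
struct toy_cpu {
    uint64_t mem[STACK_WORDS]; /* stack memory, one entry per 8-byte slot */
    unsigned rsp;              /* stack pointer as a slot index */
    unsigned rsp_writes;       /* how many times rsp was modified */
};

static void toy_init(struct toy_cpu *c) {
    memset(c, 0, sizeof *c);
    c->rsp = STACK_WORDS;      /* empty stack: rsp points just past the top */
}

/* push-style save: each store is preceded by its own rsp update. */
static void save_with_push(struct toy_cpu *c, const uint64_t regs[NREGS]) {
    assert(c->rsp >= NREGS);
    for (int i = 0; i < NREGS; i++) {
        c->rsp -= 1;           /* models rsp -= 8 */
        c->rsp_writes++;
        c->mem[c->rsp] = regs[i];
    }
}

/* mov-style save: one subq, then stores at fixed offsets from rsp. */
static void save_with_mov(struct toy_cpu *c, const uint64_t regs[NREGS]) {
    assert(c->rsp >= NREGS);
    c->rsp -= NREGS;           /* models subq $9*8, %rsp */
    c->rsp_writes++;
    for (int i = 0; i < NREGS; i++) {
        /* first register lands in the highest slot: (NREGS-1-i)*8(%rsp) */
        c->mem[c->rsp + (NREGS - 1 - i)] = regs[i];
    }
}
```

Both functions leave the same memory contents and the same final rsp. The only difference the model exposes is nine rsp updates against one, and in the mov version every store address depends only on that single updated rsp.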
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Saving context

Post by gerryg400 »

I'm not sure about instruction timings either, but push rdx, rcx, rax, etc. are single-byte instructions. Those stack-relative moves are 5-byte instructions. I'm pretty sure that in isolation, it's quicker to move something to the stack with a push.
However, I think you're on to something with the pipelining and out-of-order execution. I don't really know anything about this type of stuff, but it strikes me that the pushes would have to be done one at a time in order, whereas the movs could be done in parallel or out of order since they all access different regs and different memory. This is way beyond my knowledge at the moment. Just a curiosity for me. When my OS is complete (yeah right!) and I'm looking for a performance improvement, I'll look at this again. Thanks NickJohnson
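For the code-size comparison, here is a small sketch in C that writes out the byte encodings and adds them up. The helper names (`push_size`, `mov_to_rsp_size`, `total_push_size`, `total_mov_size`) and the nine-register list are assumptions for illustration, and the sizes assume the assembler picks the shortest encoding with offsets that fit in a signed byte. It measures bytes only, not speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* push %rdi: opcode 0x50 + register number (rdi = 7) */
const uint8_t push_rdi[] = { 0x57 };
/* push %r8: REX.B prefix, then 0x50 + (8 & 7) */
const uint8_t push_r8[] = { 0x41, 0x50 };
/* movq %rdi, 64(%rsp): REX.W, opcode 0x89, ModRM, SIB, disp8 */
const uint8_t mov_rdi_64_rsp[] = { 0x48, 0x89, 0x7C, 0x24, 0x40 };
/* movq %rdi, (%rsp): a zero offset needs no displacement byte */
const uint8_t mov_rdi_0_rsp[] = { 0x48, 0x89, 0x3C, 0x24 };
/* subq $72, %rsp: REX.W, opcode 0x83 /5, ModRM, imm8 */
const uint8_t sub_72_rsp[] = { 0x48, 0x83, 0xEC, 0x48 };

/* One push: 1 byte for rax..rdi (numbers 0-7), 2 bytes for r8..r15. */
static size_t push_size(unsigned reg) {
    return reg < 8 ? 1 : 2;
}

/* movq %reg, off(%rsp) for 0 <= off <= 127: 4 bytes, plus 1 for disp8. */
static size_t mov_to_rsp_size(unsigned off) {
    assert(off <= 127);
    return off == 0 ? 4 : 5;
}

/* Code bytes for saving the listed registers with pushes. */
static size_t total_push_size(const unsigned *regs, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += push_size(regs[i]);
    return total;
}

/* Code bytes for one subq plus n stores at offsets 0, 8, ..., (n-1)*8. */
static size_t total_mov_size(size_t n) {
    assert(n <= 15);                      /* keep subq imm and offsets in 8 bits */
    size_t total = sizeof sub_72_rsp;     /* subq $imm8, %rsp */
    for (size_t i = 0; i < n; i++)
        total += mov_to_rsp_size((unsigned)(8 * i));
    return total;
}
```

For an assumed set of nine registers (rdi, rsi, rdx, rcx, rax, r8, r9, r10, r11) this comes to 13 bytes of pushes against 48 bytes for the subq plus moves, so the mov version is clearly larger in code size, whatever it does for execution time.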

- gerryg400
If a trainstation is where trains stop, what is a workstation ?