Strange register (or stack?) clobber problem.

Post by mduft »

Hi!

I recently implemented kernel threads in my x86_64 kernel, and it's finally (sort of) working... :) I can start a few dozen threads, and they get scheduled all right. But after a while, threads start to die... All the threads do is write to the screen in a tight loop. Somewhere in the logging functions, parameters are suddenly 0 (or some other random value) where they shouldn't be.

I suspect that a timer interrupt occurs at a bad moment and somehow clobbers the registers used for parameter passing. I triple-checked the thread state saving and restoring code [1], but cannot find a problem there. I also checked whether a nested interrupt could have confused some code, but in [2] the interrupt handlers run with interrupts disabled, so they cannot nest (I have only the BSP running so far).

I also disabled the red zone, so that should not be the problem. When disassembling the function that crashes, I can see that the compiler generates code to save the register used to pass in the parameter to the stack. Right after that, if I check the value, it's zero... I have debugged this for a while now (which is pretty much impossible, since the timer tends to fire faster than I can debug ;)) and am out of ideas. Any suggestion about what to look out for would be appreciated...

[1] https://github.com/mduft/tachyon3/blob/ ... 64/state.S
[2] https://github.com/mduft/tachyon3/blob/ ... 6_64/idt.S - line 49

Did I forget to mention something...?

Thanks for the help :)
markus

Post by gerryg400 »

Are your kernel stacks 16-byte aligned? I got caught by that.

Post by mduft »

Thanks for the hint. I double-checked and added a few log statements. Nothing is going wrong there, however; all stacks are page aligned (4K), which also makes them 16-byte aligned.

Code:

trace: allocated new stack at 0xffffffff80000000 (16384 bytes comm, 16384 bytes res)
trace: allocated new stack at 0x0000800000000000 (8192 bytes comm, 1048576 bytes res)
trace: allocated new stack at 0x00007ffffff01000 (8192 bytes comm, 1048576 bytes res)
trace: allocated new stack at 0x00007fffffe02000 (8192 bytes comm, 1048576 bytes res)
trace: allocated new stack at 0x00007fffffd03000 (8192 bytes comm, 1048576 bytes res)
trace: allocated new stack at 0x00007fffffc04000 (8192 bytes comm, 1048576 bytes res)
trace: allocated new stack at 0x00007fffffb05000 (8192 bytes comm, 1048576 bytes res)
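
For reference, the alignment check on these addresses can be sketched in C. This is only a minimal sketch, not code from the kernel in question: `is_aligned`, `count_misaligned`, and `traced_stacks` are hypothetical names, and the addresses are simply copied from the trace above. Any 4K-aligned address is automatically 16-byte aligned, so the allocations themselves pass. The trace does not say whether these are stack bases or initial stack pointers, and what the System V x86-64 ABI actually constrains is the value of RSP at each call site, so a check like this only rules out the allocation itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of align; align must be a power of two. */
static int is_aligned(uint64_t addr, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Stack addresses copied from the trace output above. */
static const uint64_t traced_stacks[] = {
    0xffffffff80000000ULL,
    0x0000800000000000ULL,
    0x00007ffffff01000ULL,
    0x00007fffffe02000ULL,
    0x00007fffffd03000ULL,
    0x00007fffffc04000ULL,
    0x00007fffffb05000ULL,
};

/* Counts the entries of addrs[] that are NOT aligned to align bytes. */
static size_t count_misaligned(const uint64_t *addrs, size_t n, uint64_t align) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_aligned(addrs[i], align)) {
            bad++;
        }
    }
    return bad;
}
```

Calling `count_misaligned(traced_stacks, 7, 16)` on the values above gives 0, which matches the conclusion that the stack allocations are not the problem.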

Post by Owen »

Have you disabled the red zone? (If it is enabled, GCC is permitted to keep data at negative offsets from RSP, where an interrupt can overwrite it.)

Post by mduft »

Yes, I disabled the red zone... The disassembly at the relevant location looks like this:

Code:

000000000000021a <log_format_message>:
static void log_format_message(char* buf, size_t len, char const* fmt, va_list args) {
 21a:   55                      push   %rbp
 21b:   48 89 e5                mov    %rsp,%rbp
 21e:   48 83 ec 70             sub    $0x70,%rsp
 222:   48 89 7d a8             mov    %rdi,-0x58(%rbp)
 226:   48 89 75 a0             mov    %rsi,-0x60(%rbp)
 22a:   48 89 55 98             mov    %rdx,-0x68(%rbp)
 22e:   48 89 4d 90             mov    %rcx,-0x70(%rbp)
    char c;
    char* p = buf;
 232:   48 8b 45 a8             mov    -0x58(%rbp),%rax
 236:   48 89 45 f0             mov    %rax,-0x10(%rbp)

    if(!fmt)
 23a:   48 83 7d 98 00          cmpq   $0x0,-0x68(%rbp)
 23f:   0f 85 9e 05 00 00       jne    7e3 <log_format_message+0x5c9>
        fatal("null format in log_format_message!\n");

The noteworthy thing is: at 21a, the values passed in are "sane" (i.e. not NULL/zero), but at 23a the cmpq finds zero in the fmt slot, and the kernel stops (with the fatal error "null format ..."). I have no idea what the problem could be. The code for the interrupt handler(s) and the state saving/restoring code are linked in my first post ([2] and [1]). I need somebody with good asm-fu to have a look at this please, as I'm totally stuck.
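
One thing that can be read off the listing is where the four spill slots sit relative to RSP. The sketch below does that arithmetic in C; the offsets and the frame size come from the disassembly, while the names (`slot_inside_frame`, `slot_in_red_zone`, the `SLOT_*` constants) are made up for illustration. It assumes that an interrupt only ever writes below the interrupted RSP (or onto a separate stack). Under that assumption, slots at or above RSP cannot be hit by an interrupt frame, which would point at the register itself being wrong before the store at 22a rather than at the stack slot being overwritten afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Values read off the disassembly above. */
enum {
    FRAME_SIZE = 0x70,  /* sub $0x70,%rsp  =>  rsp = rbp - 0x70 */
    SLOT_BUF   = -0x58, /* mov %rdi,-0x58(%rbp) */
    SLOT_LEN   = -0x60, /* mov %rsi,-0x60(%rbp) */
    SLOT_FMT   = -0x68, /* mov %rdx,-0x68(%rbp) */
    SLOT_ARGS  = -0x70  /* mov %rcx,-0x70(%rbp) */
};

/* 1 if the 8-byte slot at rbp+off lies entirely inside [rsp, rbp). */
static int slot_inside_frame(int64_t off, int64_t frame_size) {
    return off >= -frame_size && off + 8 <= 0;
}

/* 1 if the 8-byte slot at rbp+off lies entirely inside the 128-byte
 * red zone, i.e. inside [rsp - 128, rsp). */
static int slot_in_red_zone(int64_t off, int64_t frame_size) {
    return off + 8 <= -frame_size && off >= -frame_size - 128;
}
```

All four slots satisfy `slot_inside_frame(..., FRAME_SIZE)` and none satisfies `slot_in_red_zone(..., FRAME_SIZE)`, which is consistent with the red zone really being disabled for this function.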

thanks

Post by mduft »

Thanks to those who tried to help, I FOUND IT :D

... finally *phew*. It seems I forgot to protect the registers from being clobbered by GCC-generated code that ran _before_ the registers were saved (the code that works out where to save the registers to... uh).
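
The thread does not show the actual fix, so the following is only a toy model in C of the ordering problem described above, not the real assembly. Everything in it is hypothetical: `regs_t` holds just the four argument registers from the disassembly, and `find_save_area` stands in for whatever compiler-generated C code locates the save area. The one real-world fact it relies on is that, under the System V x86-64 calling convention, a C function is free to overwrite caller-saved registers such as RDI, RSI, RDX and RCX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy register file: only the four argument registers from the listing. */
typedef struct {
    uint64_t rdi, rsi, rdx, rcx;
} regs_t;

/* Hypothetical stand-in for compiler-generated code that finds the save
 * area. Like any C function, it may overwrite caller-saved registers. */
static regs_t *find_save_area(regs_t *cpu, regs_t *area) {
    cpu->rdi = (uint64_t)(uintptr_t)area; /* argument register reused */
    cpu->rdx = 0;                         /* scratch use */
    cpu->rcx = 0;                         /* scratch use */
    return area;
}

/* Buggy order: the lookup runs first, so the snapshot that gets saved
 * already contains the clobbered values. */
static void save_state_buggy(regs_t *cpu, regs_t *area) {
    regs_t *dst = find_save_area(cpu, area);
    *dst = *cpu;
}

/* Fixed order: spill the registers first (to the stack, in real
 * assembly), then run the lookup, then store the spilled copy. */
static void save_state_fixed(regs_t *cpu, regs_t *area) {
    regs_t spill = *cpu;
    regs_t *dst = find_save_area(cpu, area);
    *dst = spill;
}

/* Models one interrupt: save the thread's registers, then restore them
 * from the save area when the thread resumes. */
static regs_t interrupt_roundtrip(regs_t before,
                                  void (*save)(regs_t *, regs_t *)) {
    regs_t cpu = before;
    regs_t area;
    memset(&area, 0, sizeof area);
    save(&cpu, &area);
    cpu = area;
    return cpu;
}
```

With `save_state_buggy`, a thread interrupted between 21a and 22a resumes with RDX equal to zero, which is the symptom seen in `log_format_message`; with `save_state_fixed`, all four values survive the round trip.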

Now it works, and kernel threading is pretty stable (128 threads all printing thread ID + system time, running endlessly without problems :))

Thanks again!