Hi folks,
I have a problem at the very beginning of my kernel when I call my (C++) constructors.
First of all, here's a general overview of what I'm doing:
After GRUB has loaded my kernel, I use Tim Robinson's GDT trick to make my kernel appear in the higher half at address 0xC0000000 (segment base address 0x40000000, kernel linked to address 0xC0000000, see http://www.osdever.net/tutorials/pdf/memory1.pdf). After that I enable paging, i.e. I map my kernel's page frames to 0xC0000000, and finally I set up a "normal" GDT that uses segments starting at base address 0x00000000.
Now, calling the constructors comes into play:
When I call my constructors after what I described so far, everything is fine.
But when I call the constructors right after using the GDT trick, it seems to end up in an endless loop (according to the Bochs debugger's CPU view, EIP appears to be stuck in a loop). More precisely, this seems to happen right in the compiler-generated code that is called to construct the global objects (adding hlt instructions before and after that call led me to that conclusion).
Does anyone have an idea?
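For readers following along, the address arithmetic behind the GDT trick described above can be sketched in a few lines of C++. This is only an illustration: linear_address is a hypothetical helper, and 0xC0100000 is an assumed example of a link-time address, not a value taken from the actual kernel image.

```cpp
#include <cassert>
#include <cstdint>

// With segmentation, the CPU forms the linear address as
// segment base + offset, truncated to 32 bits.
constexpr std::uint32_t linear_address(std::uint32_t segment_base,
                                       std::uint32_t offset) {
    // Unsigned arithmetic wraps modulo 2^32, just like the hardware.
    return segment_base + offset;
}

// Base used by the trick GDT in the post.
constexpr std::uint32_t kTrickBase = 0x40000000u;

// Assumed example: a symbol linked at 0xC0100000 ends up at 1 MiB.
static_assert(linear_address(kTrickBase, 0xC0100000u) == 0x00100000u,
              "higher-half address wraps around to the load address");
```

The same wrap-around explains why the stack pointer has 0xC0000000 added to it once the trick GDT is active: the offset grows, but the linear address stays where it was.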
calling constructor seems to end up in an endless loop
- Combuster
Re: calling constructor seems to end up in an endless loop
No specific ideas, but you should try Bochs' debugger: you can single-step through your code and see what is (not) called and why. Do you know exactly which code this "infinite loop" passes through yet?
Re: calling constructor seems to end up in an endless loop
Glass ball method of debugging:
- You have an Instance() function which returns a local static object, and that object refers to itself indirectly through the same Instance() function. It goes into a loop because you don't actually lock with __cxa_somethingorother_lock and the object is only assigned after it has been constructed, so on every pass it finds a null pointer, allocates the lock & tries again.
- You have an explicit dependency on an address somewhere, that causes it to loop after moving.
Generally, move a halt() forward a few instructions at a time & determine what causes it to fail. Hope that it still reproduces without multicore/interrupts/threads; that makes debugging far easier.
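To make the first guess above more concrete, here is a minimal sketch of the bookkeeping a guard for a function-local static has to do. It is not the real ABI interface: in the Itanium C++ ABI the hooks are called __cxa_guard_acquire, __cxa_guard_release and __cxa_guard_abort, and a freestanding kernel has to supply them itself. The names Guard, guard_acquire and guard_release below are hypothetical, and the sketch assumes a single thread with no interrupts.

```cpp
#include <cassert>

// Hypothetical guard state for one function-local static object.
struct Guard {
    bool done = false;         // object fully constructed
    bool in_progress = false;  // constructor currently running
};

enum class Acquire { AlreadyDone, MustInitialize, Reentered };

// Decide what the caller has to do before touching the static object.
inline Acquire guard_acquire(Guard& g) {
    if (g.done) return Acquire::AlreadyDone;
    if (g.in_progress) return Acquire::Reentered;  // constructor called itself
    g.in_progress = true;
    return Acquire::MustInitialize;
}

// Mark construction as finished.
inline void guard_release(Guard& g) {
    g.in_progress = false;
    g.done = true;
}
```

A guard that cannot tell "in progress" apart from "not started" has no way to notice the re-entrant case, which is one way an Instance() function can end up spinning.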
Re: calling constructor seems to end up in an endless loop
First of all, thanks for your replies.
In the meantime, I found out that QEMU doesn't care where I put the code that calls my constructors. QEMU does not get stuck in an endless loop and executes the rest of the kernel just as expected. So I looked for differences between Bochs and QEMU that might lead to that behaviour, but I found nothing relevant. Maybe I missed something?
To give you a deeper insight into my problem, I'm posting the relevant code here: first the assembly startup code, then the linker script. I marked the two places where I called the constructors. Calling them at PLACE A produces the endless loop, whereas calling them at PLACE B is fine.
Code: Select all
[BITS 32]
[global start_kernel]
[extern __CTOR_LIST__]
[extern second_stage]
[extern paging]
%include "multi_boot_header.i"
section .text
start_kernel:
; set up stack
mov esp, 0x7FFFF
; set up trick gdt according to tim robinson (segment base at 0x40000000 etc.)
lgdt [trick_gdt]
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
; adapt stack pointer
add esp, 0xC0000000
; far jump
jmp 0x08:flush_trick_gdt
flush_trick_gdt:
; PLACE A:
; call constructors right after setting up the trick gdt
; doing it that way, it ends up in an endless loop (see call ebx)
lea eax, [__CTOR_LIST__]
mov ecx, [eax] ; number of ctor
add eax, 4
call_all_ctor:
mov ebx, [eax]
push eax ; save eax on stack as eax changes during call
push ecx
; here I call the compiler generated code to create the global objects
; this call does not return and seems to get stuck...
call ebx
pop ecx
; decrement number of ctor
dec ecx
; get function pointer from stack
pop eax
; increment function pointer of ctor
add eax, 4
cmp ecx, 0
jnz call_all_ctor
call paging
; set up "normal" gdt (segment base at 0x00000000)
lgdt [normal_gdt]
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
; far jump
jmp 0x08:flush_normal_gdt
flush_normal_gdt:
; PLACE B:
; calling constructors here, does not produce any problem...
; the kernel will execute just as expected
; rest of kernel...
jmp second_stage
section .setup
trick_gdt:
dw trick_gdt_end - trick_gdt_start - 1
dd trick_gdt_start
trick_gdt_start:
dd 0, 0 ; null descriptor
db 0xFF, 0xFF, 0, 0, 0, 10011010b, 11001111b, 0x40 ; code selector 0x08: base 0x40000000, limit 0xFFFFFFFF, type 0x9A, granularity 0xCF
db 0xFF, 0xFF, 0, 0, 0, 10010010b, 11001111b, 0x40 ; data selector 0x10: base 0x40000000, limit 0xFFFFFFFF, type 0x92, granularity 0xCF
trick_gdt_end:
section .data
normal_gdt:
dw normal_gdt_end - normal_gdt_start - 1
dd normal_gdt_start
normal_gdt_start:
dd 0, 0
db 0xFF, 0xFF, 0, 0, 0, 10011010b, 11001111b, 0x0 ; code selector 0x08: base 0x0, limit 0xFFFFFFFF, type 0x9A, granularity 0xCF
db 0xFF, 0xFF, 0, 0, 0, 10010010b, 11001111b, 0x0 ; data selector 0x10: base 0x0, limit 0xFFFFFFFF, type 0x92, granularity 0xCF
normal_gdt_end:
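As a cross-check of the db lines in the two GDTs above, the eight descriptor bytes can be computed from base, limit, access byte and flags. The sketch below follows the standard x86 segment descriptor layout; encode_descriptor is a hypothetical helper written for this illustration and is not part of the posted kernel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs a 32-bit base, a 20-bit limit, the access byte and the 4-bit flags
// (granularity, size, ...) into the 8-byte segment descriptor layout.
inline std::array<std::uint8_t, 8> encode_descriptor(std::uint32_t base,
                                                     std::uint32_t limit,
                                                     std::uint8_t access,
                                                     std::uint8_t flags) {
    return {
        static_cast<std::uint8_t>(limit & 0xFF),          // limit 7:0
        static_cast<std::uint8_t>((limit >> 8) & 0xFF),   // limit 15:8
        static_cast<std::uint8_t>(base & 0xFF),           // base 7:0
        static_cast<std::uint8_t>((base >> 8) & 0xFF),    // base 15:8
        static_cast<std::uint8_t>((base >> 16) & 0xFF),   // base 23:16
        access,                                           // type and privilege
        static_cast<std::uint8_t>(((flags & 0x0F) << 4) |
                                  ((limit >> 16) & 0x0F)), // flags, limit 19:16
        static_cast<std::uint8_t>((base >> 24) & 0xFF),   // base 31:24
    };
}
```

Calling encode_descriptor(0x40000000, 0xFFFFF, 0x9A, 0xC) reproduces the code descriptor of the trick GDT: only the last byte (0x40) differs from the flat descriptor in the normal GDT.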
Code: Select all
OUTPUT_FORMAT("elf32-i386")
ENTRY(start_kernel)
PAGESIZE = 4096;
GRUB_OFFSET = 0x100000;
SECTIONS
{
. = GRUB_OFFSET;
/*some add on to recover from GRUB error 13 (complaining about header)*/
.__mbHeader :
{
*(.__mbHeader)
}
.setup ALIGN (PAGESIZE) :
{
*(.setup)
}
. += 0xC0000000;
.text ALIGN (PAGESIZE) : AT(ADDR(.text) - 0xC0000000)
{
code = .; _code = .; __code = .;
*(.text)
}
.data ALIGN (PAGESIZE) : AT(ADDR(.data) - 0xC0000000)
{
data_start = .;
__CTOR_LIST__ = .;
LONG((__CTOR_END__ - __CTOR_LIST__) / 4-2)
*(.ctors)
LONG(0)
__CTOR_END__ = .;
__DTOR_LIST__ = .;
LONG((__DTOR_END__ - __DTOR_LIST__) / 4-2)
*(.dtors)
LONG(0)
__DTOR_END__ = .;
data = .; _data = .; __data = .;
*(.data)
}
.bss ALIGN (PAGESIZE) : AT(ADDR(.bss) - 0xC0000000)
{
bss = .; _bss = .; __bss = .;
*(.bss)
}
end = .; _end = .; __end = .;
page_dir_sym = ALIGN(PAGESIZE);
. = . + 4096;
page_table_sym = ALIGN(PAGESIZE);
. += 4096;
phys_bitmap_start_sym = ALIGN(PAGESIZE);
}
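The table the linker script builds at __CTOR_LIST__ has the shape: count, then the constructor pointers, then a terminating zero. A C++ version of the loop that walks such a table could look roughly like the sketch below. run_ctor_table and Ctor are hypothetical names, and the table is passed in as a parameter so the sketch can be tried on a hosted system; in the kernel it would be the linker-provided symbol. Unlike the assembly loop, which always executes its body at least once, this version also copes with a count of zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Ctor = void (*)();

// Walks a table laid out as: [count][ctor 0] ... [ctor count-1][0]
// and returns how many constructors were actually called.
inline std::size_t run_ctor_table(const std::uintptr_t* table) {
    const std::size_t count = static_cast<std::size_t>(table[0]);
    std::size_t called = 0;
    for (std::size_t i = 1; i <= count; ++i) {
        if (table[i] == 0) break;  // stop early at the zero terminator
        reinterpret_cast<Ctor>(table[i])();
        ++called;
    }
    return called;
}
```

Note that the order of the entries is simply whatever order the linker emitted the .ctors input sections in; nothing in this loop knows about dependencies between the objects being constructed.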
Re: calling constructor seems to end up in an endless loop
I think I solved the problem, but...
The situation is as follows. I have two global objects, say object A and object B. In the constructor of object A, a function of object B is called. When I remove that function call, everything is OK. I'm pretty happy that I found a solution, but honestly speaking, I don't understand it completely...
Is it related to the fact that (I think it's according to the C++ specification) there's no guarantee about the order in which the objects will be created? If so, how does that relate to what I observe? Is it possible that at the time when object A is created (and a function of object B is to be called), object B has not been created yet at all?
And above all, why does this constructor call (with the function call to another global object) succeed when it's made later, after enabling paging and so on (as I described in my earlier posts)?
I hope I could give a more precise description of my problem this time. I don't like the glass ball method of debugging either.
Re: calling constructor seems to end up in an endless loop
Armin wrote: Is it related to the fact that (I think it's according to C++ specification) there's no guarantee about the order in which the objects will be created? If so, what's the relation to what I observe? Is it possible that to the time when object A is created (and a function of object B shall be called) object B is not yet created at all?
Definitely. The best way to handle that is to make *no* global objects at all. At least, none that have a smart constructor. Make all of those singletons with an Instance() function like:
Code: Select all
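// "static" belongs only on the declaration inside class X;
// repeating it on this out-of-class definition does not compile.
X &X::Instance()
{
    static X x; // constructed the first time Instance() is called
    return x;
}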
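Applied to the two objects from the question, the pattern might look like the sketch below. The classes A and B and their members are placeholders for the real kernel objects. On a hosted compiler this works as is; in a freestanding kernel GCC normally emits calls to __cxa_guard_acquire and __cxa_guard_release for such local statics (unless -fno-threadsafe-statics is used), so those have to exist.

```cpp
#include <cassert>

class B {
public:
    static B& Instance() {
        static B instance;  // constructed on the first call
        return instance;
    }
    void Touch() { ++touches_; }
    int touches() const { return touches_; }

private:
    B() : touches_(0) {}
    int touches_;
};

class A {
public:
    static A& Instance() {
        static A instance;  // constructed on the first call
        return instance;
    }
    bool ready() const { return ready_; }

private:
    A() : ready_(false) {
        // B is constructed right here if it does not exist yet, so the
        // order in which the constructors run no longer depends on the linker.
        B::Instance().Touch();
        ready_ = true;
    }
    bool ready_;
};
```

The point of the design is that each object is built when it is first needed, so A's constructor can never see an unconstructed B.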
Armin wrote: And above all, why does this constructor call (with the function call to another global object) succeed when it's called later after enabling paging and so on (I described that in my earlier posts).
Random uninitialized data? A different location that it mallocs?