Strange compilation problem

Posted: Sun Jan 16, 2005 9:04 am
by aladdin
I have a really strange problem.

I haven't made any changes to my OS for three months (not enough time), and yesterday, when I tried to retest it, it compiled but didn't run. I tried older versions, but they didn't work either. Older binaries work fine, but when I recompile them they don't work???
Bochs gave me this error:
00004929986p[CPU0 ] >>PANIC<< prefetch: running in bogus memory
00004929986i[SYS ] Last time is 4
00004929986i[XGUI ] Exit.
00004929986i[CPU0 ] protected mode
00004929986i[CPU0 ] CS.d_b = 32 bit
00004929986i[CPU0 ] SS.d_b = 32 bit
00004929986i[CPU0 ] | EAX=4e206568 EBX=0010e4a8 ECX=0000000d EDX=000003d5
00004929986i[CPU0 ] | ESP=0009efe0 EBP=0009effc ESI=0000f4a8 EDI=00000000
00004929986i[CPU0 ] | IOPL=0 NV UP DI NG NZ AC PO CY
00004929986i[CPU0 ] | SEG selector base limit G D
00004929986i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00004929986i[CPU0 ] | DS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | ES:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | FS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | GS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | SS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | CS:0008( 0001| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | EIP=4e206568 (4e206568)
00004929986i[CPU0 ] | CR0=0x60000011 CR1=0x00000000 CR2=0x00000000
00004929986i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00004929986i[MEM0 ] dbg_fetch_mem out of range. 0x4e206578 > 0x400000
00004929986i[CPU0 ] >> f9
00004929986i[CPU0 ] >> : stc
00004929986i[CTRL ] quit_sim called with exit code 1
QEMU says "segmentation fault".

Has anyone here had the same problem? What could be the cause?

Please help.

:'(

Re:Strange compilation problem

Posted: Sun Jan 16, 2005 9:43 am
by aladdin
BTW, here is my ld script; it may help:

OUTPUT_FORMAT("elf32-i386")
ENTRY(_start)
SECTIONS {
  . = 0x101000;

  .text : {
    *(.text)
    _etext = .;
    etext = _etext;
  }
  .rodata : {
    *(.rodata)
    _erodata = .;
    erodata = _erodata;
  }

  .data : {
    *(.data)
    _edata = .;
    edata = _edata;
  }
  .bss : {
    *(.bss)
    _ebss = .;
    ebss = _ebss;
  }
  
}

By hex-editing an old binary and a new one (compiled from the same sources), I noticed that there are some changes and a new section called .note.GNU-stack. I added /DISCARD/ : { *(.note.GNU-stack) } to my ld script, but that doesn't change anything???

Re:Strange compilation problem

Posted: Sun Jan 16, 2005 11:42 am
by Solar
A shot in the dark, but did your compiler change since your last compile? Or your makefile / compiler options?

Re:Strange compilation problem

Posted: Sun Jan 16, 2005 1:33 pm
by aladdin
The source code and makefiles are the same.
I do regular system upgrades on my Gentoo (with emerge system), but this is the first time this has happened to me.
Older versions of my OS also don't work. Can this be caused by a gcc update?

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 1:15 am
by Solar
aladdin wrote: Older versions of my OS also don't work. Can this be caused by a gcc update?
That's how I came up with the idea that a compiler update might be to blame in the first place. GCC is making very strong progress in the 3.x releases, but that also means things are changing constantly.

So either you have to constantly take care of your sources, or you have to build yourself a dedicated cross-compiler of a given version and stick to that.

Although I always recommend the cross-compiler approach for various reasons, it also means that other people might have to use a specific compiler for your OS sources, which can be a nuisance to them.

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 5:53 am
by aladdin
There is another thing.
I compared the memory map of a working kernel and a non-working one (compiled from the same source code) and noticed that there are a lot of differences, which makes me think that the problem comes from linking.
Can someone give me a solution to my problem? I know a cross-compiler may be the best solution, but I don't have enough time to read docs about cross-compiler building (I have exams next week ;) )

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 6:05 am
by aladdin
Here is a small part of a diff of the two memory maps:
75c75
< *fill* 0x000000000010137e 0x2 00
---
> *fill* 0x000000000010137e 0x400309c300000002 00

89c89
< *fill* 0x0000000000101625 0x3 00
---
> *fill* 0x0000000000101625 0x400309c300000003 00

93c93
< *fill* 0x0000000000101711 0x3 00
---
> *fill* 0x0000000000101711 0x400309c300000003 00

97c97
< *fill* 0x00000000001017fb 0x1 00
---
> *fill* 0x00000000001017fb 0x400309c300000001 00
The values starting with "0x400309c3000000" are from the working version.

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 6:55 am
by Solar
aladdin wrote: Can someone give me a solution to my problem? I know a cross-compiler may be the best solution, but I don't have enough time to read docs about cross-compiler building (I have exams next week ;) )
Follow the link I gave you. It is a step-by-step tutorial on how to build a cross-compiler for the explicit use of compiling your self-written OS kernel. I doubt it would take you more than 15 minutes for reading and typing the commands.

It's surely a quicker way than sifting through binary diffs trying to figure out what happened. Those might just be changes in the GCC code-generation backend.

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 7:27 am
by Pype.Clicker
you obviously have some "call-to-function-pointer" (eax == eip) towards an invalid address ...

hence the "bogus memory" message from Bochs (trying to run code from physical memory that doesn't exist or hasn't been initialized yet can't lead to anything good).

Maybe your compilation suite is responsible, maybe it's not. If I were you, I'd first try to recompile from a completely clean state (no intermediate results of any kind), using the -Wall option to make sure the compiler warns you about anything it might not like ...

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 7:36 am
by distantvoices
I'd trace the bug down with the Bochs debugger: execution passes through valid EIPs before this bogus address is loaded into EIP, so you can find out in which function this happens and narrow down the location of the error. Then it's a matter of well-placed printfs and register dumps to pin it down.

Finding the function in question is easy:

first (if in Linux) say objdump -d yourkrn.bin>yourkrn.txt
this will give you a nice disassembly with addresses for each instruction

second, get the EIP in question.

third, simply look up the address in yourkrn.txt, et voilà, you have the function in which it happens.

Uninitialized memory is also - as pype has stated - a source of bugs.

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 11:53 am
by aladdin
@clicker: I already tried to recompile from working sources (backups): the original binaries work well and the recompiled ones do not.

@beyond infinity: I know where the problem occurs. I'm using macros to register some OS services in a global services table; the kernel browses this table and runs all registered services, but it hangs on the first service call (a call through a pointer to a function).

But I think the problem comes from the linking process, because the first printed text (kernel version, etc.) should be at the top of the screen, but now it starts in the middle. This means there is some addressing problem, so the function pointer points somewhere else in memory, and this causes the fault.

Another reason why I'm saying that this comes from the linking process:
I tried to recompile an old version which does not use the ld script (the kernel was linked as a flat binary) and that one works well; all versions using the ld script (to create an ELF kernel) don't work.

So, can someone tell me (or give me a link) about ld changes between gcc-3.2.3 and gcc-3.3.x?


I'll build a cross-compiler with gcc-3.2.3 to be able to continue working on my OS, but I want to understand what the problem is.

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 1:32 pm
by Pype.Clicker
hmm, imho (but i may be wrong) a linker problem is unlikely to make your text printed at the middle of the screen, mainly because the screen address is a constant that will not be affected by the linker. Instead, it is more likely to output the wrong message (a wrongly relocated pointer in the data).

Trying to disassemble the corresponding code might (maybe) help, too ...

Re:Strange compilation problem

Posted: Mon Jan 17, 2005 10:59 pm
by Solar
aladdin wrote: So, can someone tell me (or give me a link) about ld changes between gcc-3.2.3 and gcc-3.3.x?
ld is part of the binutils package, not gcc.

I think it's unlikely your problems come from changes of ld's internals. I think your code makes some invalid assumption at some point that happened to be correct with your old environment but breaks with the new one...

Re:Strange compilation problem

Posted: Tue Jan 18, 2005 2:57 am
by Pype.Clicker
<googlebot>http://gcc.gnu.org/gcc-3.3/changes.html</googlebot>

couldn't resist, just to say i'm only 12 posts away from my 4000th post :P

Re:Strange compilation problem

Posted: Tue Jan 18, 2005 3:16 am
by distantvoices
ah, hail to the kung F00 master o' the universe ];-> May the schwartz be with you.

@aladdin: fine: what are those macros doing in detail, where do they take the addresses for the function pointers from, do you have a function table lying around somewhere ... lotsa questions, eh?

oh and btw: it's approx 180 days or so until "Harry Potter and the Half Blood Prince" is out. Canna await it.

oh and btw II: It's 197 posts til I reach the 1000 posts margin. Gonna open some bottles o' beer then.