I had not made any changes to my OS for three months (not enough time), and yesterday, when I tried to retest it, it compiled but did not run. I tried older versions, but they didn't work either. Older binaries work fine, but when I recompile them they don't work.
Bochs gave me this error:
00004929986p[CPU0 ] >>PANIC<< prefetch: running in bogus memory
00004929986i[SYS ] Last time is 4
00004929986i[XGUI ] Exit.
00004929986i[CPU0 ] protected mode
00004929986i[CPU0 ] CS.d_b = 32 bit
00004929986i[CPU0 ] SS.d_b = 32 bit
00004929986i[CPU0 ] | EAX=4e206568 EBX=0010e4a8 ECX=0000000d EDX=000003d5
00004929986i[CPU0 ] | ESP=0009efe0 EBP=0009effc ESI=0000f4a8 EDI=00000000
00004929986i[CPU0 ] | IOPL=0 NV UP DI NG NZ AC PO CY
00004929986i[CPU0 ] | SEG selector base limit G D
00004929986i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00004929986i[CPU0 ] | DS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | ES:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | FS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | GS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | SS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | CS:0008( 0001| 0| 0) 00000000 000fffff 1 1
00004929986i[CPU0 ] | EIP=4e206568 (4e206568)
00004929986i[CPU0 ] | CR0=0x60000011 CR1=0x00000000 CR2=0x00000000
00004929986i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00004929986i[MEM0 ] dbg_fetch_mem out of range. 0x4e206578 > 0x400000
00004929986i[CPU0 ] >> f9
00004929986i[CPU0 ] >> : stc
00004929986i[CTRL ] quit_sim called with exit code 1
QEMU says "segmentation fault".
Has anyone here had the same problem? What could be the cause?
By hex-editing an old binary and a new one (compiled from the same sources) I noticed that there are some changes and a new section called .note.GNU-stack. I added /DISCARD/ : { *(.note.GNU-stack) } to my ld script, but that doesn't change anything.
The source code and makefiles are the same.
I do regular system upgrades on my Gentoo (with emerge system), but it's the first time this has happened to me.
Older versions of my OS don't work either. Can this be caused by a GCC update?
aladdin wrote:
Older versions of my OS don't work either. Can this be caused by a GCC update?
That's how I came up with the idea that a compiler update might be to blame in the first place. GCC is making very strong progress in the 3.x releases, but that also means things are changing constantly.
So either you have to constantly take care of your sources, or you have to build yourself a dedicated cross-compiler of a given version and stick to that.
Although I always recommend the cross-compiler approach for various reasons, it also means that other people might have to use a specific compiler for your OS sources, which can be a nuisance to them.
Every good solution is obvious once you've found it.
There is another thing:
I compared the memory map of a working kernel and a non-working one (compiled from the same source code) and noticed that there are a lot of differences, which makes me think that the problem comes from linking.
Can someone give me a solution to my problem? I know a cross-compiler may be the best solution, but I don't have enough time to read docs about building a cross-compiler (I have exams next week).
aladdin wrote:
Can someone give me a solution to my problem? I know a cross-compiler may be the best solution, but I don't have enough time to read docs about building a cross-compiler (I have exams next week).
Follow the link I gave you. It is a step-by-step tutorial on how to build a cross-compiler for the explicit use of compiling your self-written OS kernel. I doubt it would take you more than 15 minutes for reading and typing the commands.
It's sure a quicker way than sifting through binary diffs trying to figure out what happened. Those might just be changes in the GCC code backend.
You obviously have some "call-to-function-pointer" (EAX == EIP) towards an invalid address ...
hence the "bogus memory" message from Bochs (trying to run code at some address where no physical memory is present can't lead to anything good).
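One quick check on such a stray pointer is to look at the bytes of the value itself. The sketch below splits a register value into its little-endian bytes, the order in which x86 stores it in memory. The helper names `decode_le_ascii` and `looks_like_text` are made up for illustration. For the EIP from the dump, 0x4e206568, the bytes are 'h', 'e', ' ', 'N', which hints that the pointer may have been loaded from string data rather than from a table of addresses. It is only a hint, not a diagnosis.

```c
#include <stdint.h>

/* Split a 32-bit value into its little-endian bytes (lowest byte first,
 * as x86 stores it in memory). Bytes outside printable ASCII become '.'.
 * out must have room for 5 chars (4 bytes plus the terminator). */
static void decode_le_ascii(uint32_t value, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xffu);
        out[i] = (b >= 0x20 && b <= 0x7e) ? (char)b : '.';
    }
    out[4] = '\0';
}

/* Returns 1 if all four bytes are printable ASCII, 0 otherwise. */
static int looks_like_text(uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xffu);
        if (b < 0x20 || b > 0x7e)
            return 0;
    }
    return 1;
}
```

Calling `decode_le_ascii(0x4e206568u, buf)` leaves "he N" in `buf`, while a plausible kernel address such as the EBX value 0x0010e4a8 does not pass `looks_like_text`.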
Maybe your compilation suite is responsible, maybe it's not. If I were you, I'd first try to recompile from a completely clean state (no intermediate results of any kind), using the -Wall option to make sure the compiler warns you about anything it might not like ...
I'd trace the bug down with the Bochs debugger: EIP holds valid addresses before this bogus address is loaded into it, so you can find out in which function this is happening and narrow down the location of the error. Then it's a matter of well-placed printf's and register dumps to get it.
To find the function in question is easy:
first (if on Linux), run objdump -d yourkrn.bin > yourkrn.txt
this will give you a nice disassembly with addresses for each instruction.
second, get the EIP in question.
third, simply look up the address in yourkrn.txt, et voila, you have the function in which it happens.
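In code, the lookup in the third step amounts to finding the last function that starts at or before the EIP. A rough sketch follows; the symbol names and addresses in `example_syms` are invented for illustration and are not taken from anyone's kernel, and `find_symbol` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

struct symbol {
    uint32_t    addr;   /* start address of the function */
    const char *name;
};

/* Return the symbol with the highest start address <= eip, or NULL if eip
 * lies below the first symbol. syms must be sorted by ascending addr. */
static const struct symbol *find_symbol(const struct symbol *syms,
                                        size_t count, uint32_t eip)
{
    const struct symbol *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (syms[i].addr > eip)
            break;
        best = &syms[i];
    }
    return best;
}

/* Hypothetical symbol table; real values would come from the disassembly. */
static const struct symbol example_syms[] = {
    { 0x00100000u, "kmain" },
    { 0x00100140u, "run_services" },
    { 0x00100200u, "console_write" },
};
```

With this table, an EIP of 0x00100158 resolves to `run_services`. Note that a wild address such as 0x4e206568 would simply map to the last symbol, so compare against the end of the kernel image first and look up the last sane EIP instead.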
Uninitialized memory is also - as pype has stated - a source of bugs.
@clicker: I already tried recompiling from working sources (backups): the original binaries work fine and the recompiled ones do not.
@beyond infinity: I know where the problem occurs. I'm using macros to register some OS services in a global services table; the kernel walks this table and runs all registered services, but it hangs on the first service call (a call through a function pointer).
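For illustration only (this is not the code from the kernel in question, which was not posted): a minimal sketch of a macro-built service table, with a range check before each call so that a bad pointer is caught instead of jumped to. All names here (`SERVICE`, `run_services`, `console_init`, `timer_init`) are hypothetical, and passing the bounds of the kernel's code as `lo` and `hi` is an assumption about how such a check could be wired up.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*service_fn)(void);

struct service {
    const char *name;
    service_fn  init;
};

/* Hypothetical registration macro: builds one table entry from a function. */
#define SERVICE(fn) { #fn, fn }

/* Call every service whose pointer lies inside [lo, hi); stop at the first
 * pointer outside that range. Returns the number of services called. */
static size_t run_services(const struct service *table, size_t count,
                           uintptr_t lo, uintptr_t hi)
{
    size_t called = 0;
    for (size_t i = 0; i < count; i++) {
        uintptr_t target = (uintptr_t)table[i].init;
        if (target < lo || target >= hi)
            break;              /* bogus pointer: stop instead of jumping */
        table[i].init();
        called++;
    }
    return called;
}

/* Two dummy services that only count how often they were started. */
static int services_started = 0;
static int console_init(void) { services_started++; return 0; }
static int timer_init(void)   { services_started++; return 0; }

static const struct service service_table[] = {
    SERVICE(console_init),
    SERVICE(timer_init),
};
```

In a kernel, `lo` and `hi` would be the start and end of the code as known to the linker script; if `run_services` returns fewer than the number of entries, the index of the first bad entry tells you which table slot holds the wrong address.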
But I think the problem comes from the linking process, because the first printed text (kernel version etc.) should be at the top of the screen, but now it starts in the middle. This means there is some addressing problem, so the function pointer points somewhere into memory and this causes the fault.
Another reason why I say this comes from the linking process:
I tried to recompile an old version which does not use the ld script (the kernel was linked as a flat binary) and that one works fine; all versions using the ld script (to create an ELF kernel) don't work.
So, can someone tell me (or give me a link) about ld changes between gcc-3.2.3 and gcc-3.3.x?
I'll build a cross-compiler with gcc-3.2.3 to be able to continue working on my OS, but I want to understand what the problem is.
hmm, imho (but i may be wrong) a linker problem is unlikely to make your text printed at the middle of the screen, mainly because the screen address is a constant that will not be affected by the linker. Instead, it is more likely to output the wrong message (a wrongly relocated pointer in the data).
Trying to disassemble the corresponding code might (maybe) help, too ...
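To make that concrete, assuming the kernel writes to the standard 80x25 VGA colour text buffer (an assumption; the thread doesn't say so): the base address is a compile-time constant, and only the row and column come from kernel variables. The macro and function names below are made up for the example.

```c
#include <stdint.h>

#define VGA_TEXT_BASE 0xB8000u  /* assumption: standard VGA colour text buffer */
#define VGA_COLS      80u       /* assumption: 80x25 text mode */

/* Physical address of the character cell at (row, col); each cell is
 * two bytes (character + attribute). */
static uint32_t vga_cell_addr(uint32_t row, uint32_t col)
{
    return VGA_TEXT_BASE + (row * VGA_COLS + col) * 2u;
}
```

Under these assumptions, text that starts in the middle of the screen means the row or column value is off at the time of the first print, while the base address itself stays where it always was.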
aladdin wrote:
So, can someone tell me (or give me a link) about ld changes between gcc-3.2.3 and gcc-3.3.x?
ld is part of the binutils package, not gcc.
I think it's unlikely your problems come from changes of ld's internals. I think your code makes some invalid assumption at some point that happened to be correct with your old environment but breaks with the new one...
ah, hail to the kung F00 master o' the universe ];-> May the schwartz be with you.
@aladdin: fine: what are those macros doing in detail, where do they take the addresses for the function pointers from, do you have a function table lying around somewhere ... lotsa questions, eh?
oh and btw: it's approx 180 days or so until "Harry Potter and the Half Blood Prince" is out. Canna await it.
oh and btw II: It's 197 posts til I reach the 1000 posts margin. Gonna open some bottles o' beer then.