G++ behavior of storing char* literals causing issues

Posted: Thu Aug 25, 2022 4:46 pm
by SomeGuyWithAKeyboard
I've been fumbling around with writing a protected mode operating system in C++ using G++ and nasm. I've written a pretty basic real mode command line interface in assembly before, but this is my first time doing anything like this in a language other than assembly.

Anyway I've got a dynamic memory allocation system set up that I'm happy with as well as a custom made "dynamic array" data container that's kind of like the vector but with a few differences.

One problem that has had me stuck for a while is the way G++ allocates and stores char* literals. I never can reliably access the strings without them getting corrupted by.. something. I'm talking about when you do something like:

Code: Select all

char *foo = "bar";
or

Code: Select all

char foo[] = "bar";
My kernel loads into ram at location 0x7e00. The size of the program (right now) is 7329 bytes. This means any memory beyond 0x9AA1 and before 0x7e00 should theoretically be a safe place to store stuff. Well, anytime there's something surrounded by quotes in the form of "blah blah blah", it gets stored somewhere but the characters don't always get copied to that location or at least are no longer in that location by the time any other code gets to them.

For example, if I run the code:

Code: Select all

char test[] = "test111";
Running it will result in the string "test111" getting copied to memory location 28364 (0x6ECC), with the pointer getting set to 0x6ECC. Reading that memory shows that it worked correctly: "t" is at 0x6ECC, "e" is at 0x6ECD and so on and so forth. However, most of the time this doesn't work. If I do the exact same thing but modify the string slightly, it fails. For example:

Code: Select all

char test[] = "test11";
will set the pointer to a value of 28349 or 0x6EBD. The memory at location 0x6EBD will be 0, the next one will be 0 and everything will just be zeros.
Also, doing

Code: Select all

char *test = "foo";
never works at all. Only the [] operator makes it somewhat work.
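
For context on why the two declarations can behave differently, here is a minimal sketch assuming standard C++ semantics. The array form creates its own writable storage that the compiler has to fill in, while the pointer form only stores the address of the literal, which GCC typically places in a .rodata section. The function name is made up for illustration.

```cpp
// Array form: `local` is its own writable storage (on the stack here),
// initialized with a copy of the characters.
// Pointer form: `literal` only holds the address of the string literal,
// which GCC typically places in a .rodata section.
inline bool array_is_separate_copy()
{
    char local[] = "test11";
    const char *literal = "test11";

    local[0] = 'T';  // allowed: modifies the copy only

    // The literal is untouched, and the two live at different addresses.
    return literal[0] == 't' && local[0] == 'T'
        && static_cast<const void *>(local) != static_cast<const void *>(literal);
}

// The literal's type is an array of 6 characters plus the terminating NUL.
static_assert(sizeof("test11") == 7, "literal includes the terminator");
```

If the section holding the literal never reaches memory, the pointer value itself can still look plausible while the bytes behind it are not the expected characters.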

Now you may be wondering: why is this a problem? Because manually allocating with calloc doesn't solve the problem. I can do:

Code: Select all

char *test = (char*)calloc(5, sizeof(char));
test = "test";
While calloc will indeed allocate an array of the specified size where it's intended to go, using the char array literal of test = "test" changes the value of the pointer and attempts to copy the text to that location instead of where calloc initially allocated the array. It doesn't matter if the = "test" assignment was way shorter, the same size or way longer, it always reallocates it. This means it usually doesn't work, except for sometimes. This problem is especially detrimental to me accomplishing anything, because the same broken behavior applies when char array literals are passed as function parameters, which means that doesn't work either. I've even written functions to search for strings in all of ram. Anytime the chars get lost like that, they don't appear elsewhere. It's not just a miscalculated pointer, the data is just completely gone out of existence.

Is there a way to modify the way gcc allocates this stuff? I mean besides spending days, perhaps weeks, trying to figure out where in the source code gcc deals with allocating char arrays and modifying it to work with my memory allocation system instead of whatever memory allocation convention it's currently using that's messing things up. But I guess if that's the only way then it is what it is.

Re: G++ behavior of storing char* literals causing issues

Posted: Fri Aug 26, 2022 8:21 am
by nullplan
What you describe is indicative of the .rodata section not being stored or linked correctly. Investigate where it is in your kernel file. Maybe you aren't loading enough sectors? I certainly heard of that before.
SomeGuyWithAKeyboard wrote:

Code: Select all

char *test = (char*)calloc(5, sizeof(char));
test = "test";
That on the other hand is indicative of a programmer who doesn't know either C or C++. You create a pointer, assign to it the result of calloc(), then overwrite the pointer with the value of the string literal "test". Meaning you have now leaked the 5-char object allocated with calloc(), and "test" points somewhere completely different. Maybe you meant to copy the string into the pointer, but that would have been a call to memcpy(). But since strings currently don't work for you, even that wouldn't help.
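
To illustrate the difference between reassigning the pointer and copying into the allocation, here is a minimal sketch. It assumes hosted versions of calloc(), strlen() and memcpy(); a kernel would substitute its own equivalents. The helper name duplicate_string is hypothetical.

```cpp
#include <cstdlib>
#include <cstring>

// Copies `text` into freshly allocated storage and returns the buffer.
// The pointer returned by calloc() is never reassigned, so the allocation
// is not leaked as long as the caller eventually calls free().
inline char *duplicate_string(const char *text)
{
    std::size_t length = std::strlen(text) + 1;   // include the NUL
    char *buffer = static_cast<char *>(std::calloc(length, sizeof(char)));
    if (buffer != nullptr) {
        std::memcpy(buffer, text, length);        // copy the characters
    }
    return buffer;
}
```

After `char *test = duplicate_string("test");` the pointer still refers to the calloc() block, and the characters in it can be modified freely.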

You know, if strings never work but most other code does, then that's indicative of your program not running where you think it does. A debugger would help you figure this out.

Re: G++ behavior of storing char* literals causing issues

Posted: Fri Aug 26, 2022 2:08 pm
by SomeGuyWithAKeyboard
nullplan wrote:What you describe is indicative of the .rodata section not being stored or linked correctly. Investigate where it is in your kernel file. Maybe you aren't loading enough sectors? I certainly heard of that before.
I know I'm loading enough sectors. I have this for my loader code, which runs while still in real mode:

Code: Select all

; start putting in values:
mov ah, 2h    ; int13h function 2
mov al, 50    ; we want to read a lot of sectors I guess. 50 * 512 = ~25kb = a lot 
mov ch, 0     ; from cylinder number 0
mov cl, 2     ; the sector number 2 - second sector (starts from 1, not 0)
mov dh, 0     ; head number 0
xor bx, bx    
mov es, bx    ; es should be 0
mov bx, 7e00h ; destination offset 7e00h, 512 bytes past the boot sector at 7c00h
int 13h
512*50 + 0x7e00= 0xE200 which *should* work since it's not copying more than 64kb.
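
The arithmetic above can be checked at compile time. This is only a sketch using the numbers quoted in this thread; the constant and function names are invented for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kLoadAddress = 0x7e00;  // ES:BX passed to int 13h
constexpr std::uint32_t kSectorSize  = 512;
constexpr std::uint32_t kSectorCount = 50;      // value placed in AL
constexpr std::uint32_t kLoadEnd = kLoadAddress + kSectorCount * kSectorSize;

static_assert(kLoadEnd == 0xE200, "50 sectors loaded at 0x7e00 end at 0xE200");
static_assert(kLoadEnd <= 0x10000, "the read stays inside the first 64 KiB");

// True when `address` falls inside the region filled by the disk read.
constexpr bool is_loaded(std::uint32_t address)
{
    return address >= kLoadAddress && address < kLoadEnd;
}

static_assert(is_loaded(0x7e00 + 7329 - 1), "last byte of the 7329-byte image");
static_assert(!is_loaded(0x6ECC), "0x6ECC lies below the load address");
```

Note that 0x6ECC and 0x6EBD, the addresses reported in the first post, are outside this range, so whatever ends up there was not put there by the disk read.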

One of the few things I haven't yet exhausted every possible option on is the way it links.

Now I had a lot of problems getting makefiles and linking to actually work. I wasn't able to get anything to compile using my own intuition. I wasn't able to get anything to work from looking at "working" examples either. I had to basically make a script that has g++ compile it into a standalone file, compile the assembly separately with nasm into a standalone file and then combine those 2 files together. This isn't the same as that elf and link.ld business everyone else does but it makes everything except string literals work which is more than any of my attempts at using elfs and the ld linker was able to achieve.

My compile script is:

Code: Select all

g++ -march=i486 -m32 -nostartfiles -ffreestanding -nostdlib -nolibc -nodefaultlibs -Ttext 0x7e00 system.cpp -o system
objcopy -O binary -j .text system system.raw
nasm bootloader.asm -o bootloader.bin
cat system.raw >> bootloader.bin
This does introduce the problem of rodata indeed not getting copied into the image. I can get rodata with "objcopy -O binary -j .rodata system system_rodata.raw" of course, but it's essentially useless because gcc has a special place in memory where rodata is supposed to go, and copying it to bootloader.bin with cat doesn't put it in the right place. There doesn't seem to be an easy way to make g++ put rodata where you want like you can with the text block via the "-Ttext 0x7e00" parameter.
From what I've gathered, possibly the only way to specify where rodata goes is with a linker script. Unfortunately, I cannot for the life of me get ld or link.ld scripts to work.

I've tried using a linker script with something like:

Code: Select all

g++ -march=i486 -m32 -nostartfiles -ffreestanding -nostdlib -nolibc -nodefaultlibs -Tlink.ld system.cpp -o system
nasm bootloader.asm -o bootloader.bin
cat system.raw >> bootloader.bin
with a link.ld of

Code: Select all

OUTPUT_FORMAT("elf32-i386")
ENTRY(begin)
SECTIONS
{
    . = 0x7e00;

    .text BLOCK(8K) : ALIGN(4K)
    {
        *(.text)
    }

    .rodata BLOCK(4K) : ALIGN(4K)
    {
        *(.rodata)
    }

    .data BLOCK(4K) : ALIGN(4K)
    {
        *(.data)
    }

    .bss BLOCK(4K) : ALIGN(4K)
    {
        *(.bss)
    }

    end = .;
}

but it never links correctly enough to even boot. I've tried fidgeting around, trying stuff from some other projects on the internet and extensively trying all kinds of different command line parameters from the man pages, but the best I can ever get is this warning:
warning: cannot find entry symbol lf_i386; defaulting to 0000000000008000
Copying this to memory location 0x8000 doesn't allow it to boot, naming a function "lf_i386" anywhere in my C++ code doesn't fix it, trying something like ".lf_i386 = 0x7e00" in the linker file doesn't fix it, and that's about all the things there are to try.

What can I do to make a link.ld script actually work and potentially solve my rodata string problems?

Re: G++ behavior of storing char* literals causing issues

Posted: Fri Aug 26, 2022 2:41 pm
by nullplan
SomeGuyWithAKeyboard wrote:
warning: cannot find entry symbol lf_i386; defaulting to 0000000000008000
Well, that error means you are somehow passing the option "-elf_i386" to the linker. Which obviously doesn't work. If anything it should be "-m elf_i386". But even simpler ought to be to just get a cross-compiler for i386-elf going. Then you don't need any emulation options.
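
To show how "-elf_i386" can turn into an entry symbol named "lf_i386", here is a simplified model of getopt-style short-option parsing. This is an assumption for illustration, not ld's actual parser, and the function name is hypothetical.

```cpp
#include <string>
#include <utility>

// Simplified model of short options: the first character after '-' is
// the option letter and anything glued to it is that option's argument.
inline std::pair<char, std::string> split_short_option(const std::string &arg)
{
    if (arg.size() < 2 || arg[0] != '-') {
        return {'\0', arg};            // not an option at all
    }
    return {arg[1], arg.substr(2)};    // letter, then attached argument
}
```

Under this model "-elf_i386" reads as option "e" (set the entry symbol) with the argument "lf_i386", which matches the warning, whereas "-m elf_i386" hands the emulation name to "-m".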

Your current approach strips out all sections not named ".text", and so you will likely have a problem as soon as templates enter the mix. There is a fix, but obviously you need to fix your linker instead.

Re: G++ behavior of storing char* literals causing issues

Posted: Fri Aug 26, 2022 2:54 pm
by SomeGuyWithAKeyboard
Trying the following

Code: Select all

g++ -march=i486 -m32 -std=c++17 -nostartfiles -ffreestanding -fPIE -Ttext 0x7e00 system.cpp
nasm -g -F dwarf bootloader.asm
ld -o bootloader.bin bootloader.o -Ttext 0x7c00 --oformat=binary
ld -o system.raw system -Tlink.ld
cat system.raw >> bootloader.bin
Doesn't work. It complains the dwarf parameter isn't valid and when I remove that, ld will refuse to touch anything nasm spits out. Ld will just report "file format not recognized, treating as link script"
I guess from further investigation, it seems ro data wants to be before the executable text block. Since rodata changes in size, this means I can't jump to my program from assembly without some kind of global label that somehow allows nasm to see stuff in c++ code which is another thing I haven't been able to get to work.

Here is my github repository for this:
https://github.com/Xeraster/SimpleProtectedModeOS

If anyone knows what exactly I need to do to accomplish this, I would love to hear it. It's really frustrating because I just can't get the compiler or linker to cooperate. If ONLY there was a parameter to manually define the rodata location on g++ without having to go through ld to do so. I suppose I could make a really ugly hack where I declare a character array in the begin function using string literals, search memory for that exact string (to find the rodata that I manually attached to the end of .text with a script) and then memcpy that of whatever size the rodata is to wherever the pointer address of the character array is. Would be way better if I could figure out how to get the linker to work so I don't have to do this. I'm pretty stumped and I feel as though I have exhausted every possible thing to make ld work but hopefully someone on here will know the secret after seeing my source code.

Re: G++ behavior of storing char* literals causing issues

Posted: Fri Aug 26, 2022 11:10 pm
by Octocontrabass
SomeGuyWithAKeyboard wrote:

Code: Select all

g++ -march=i486 -m32 -std=c++17 -nostartfiles -ffreestanding -fPIE -Ttext 0x7e00 system.cpp
nasm -g -F dwarf bootloader.asm
ld -o bootloader.bin bootloader.o -Ttext 0x7c00 --oformat=binary
ld -o system.raw system -Tlink.ld
cat system.raw >> bootloader.bin
That's not going to work. Maybe you were trying to do something like this instead?

Code: Select all

i686-elf-g++ -c -march=i486 -std=c++17 -ffreestanding -o system.o system.cpp
nasm -g -f elf32 -o bootloader.o bootloader.asm
i686-elf-g++ -nostdlib -o disk.bin -T link.ld bootloader.o system.o -lgcc
If you don't yet have a cross-compiler, something along the lines of "g++ -m32 -no-pie" might work as a temporary substitute for "i686-elf-g++", but you need a cross-compiler.

Your linker script is... odd. ALIGN() and BLOCK() mean the same thing, so it doesn't make sense to have any BLOCK() statements. Other than that, it's pretty close. Some small changes will get you what you want.

Code: Select all

OUTPUT_FORMAT("binary")
SECTIONS
{
    . = 0x7c00;

    .text : ALIGN(1K)
    {
        bootloader.o(.text)
        *(.text .text.*)
    }

    .rodata : ALIGN(4K)
    {
        *(.rodata .rodata.*)
    }
This is incomplete, but I think you can fill in the rest by copying the .rodata section. I changed the output format to a flat binary, but you can use ELF and objcopy it into a flat binary if you prefer. The wildcards I've used may not be enough to catch all of the sections GCC emits.

There are problems with bootloader.asm as well. The first few lines should look like this:

Code: Select all

CPU 586
bits 16

SECTION .text
Note the removal of the org statement, the addition of the bits statement, the removal of square brackets around the cpu statement, and the change from "text" to ".text".

To reference symbols in another object file, declare the symbol with "extern" in your assembly. This will allow your bootloader to do things like call global constructors and jump to the correct entry point.
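
On the C++ side, the symbol also has to be emitted under a name the assembler can refer to. A minimal sketch, assuming the entry function is called begin as in the earlier linker script; the flag variable is hypothetical and only there to make the call observable.

```cpp
// Hypothetical flag so the effect of the call is observable.
volatile int kernel_entered = 0;

// extern "C" turns off C++ name mangling, so the symbol is emitted as
// plain `begin` and an assembly file can reach it with `extern begin`.
extern "C" void begin()
{
    kernel_entered = 1;
}
```

Without extern "C", G++ would emit a mangled name for the function, and an `extern begin` in the assembly would not resolve at link time.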

There are too many problems with your C++ code for me to fix it. You can't include <cmath> or <sys/io.h>. You don't need code to initialize global variables - those are in the .data section. You do need code to initialize global constructors - I'm not familiar enough with the C++ ABI to tell you how, but the wiki has some information that might be helpful if it's not too outdated.
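
As a rough idea of what calling global constructors involves: on ELF targets GCC emits a table of function pointers (in .init_array or .ctors, depending on how the compiler was configured), and startup code walks that table before the kernel's main code runs. The sketch below only shows the walking part. The type and function names are hypothetical, and the symbols that mark the table's bounds would have to come from the linker script.

```cpp
// Type of the entries in the constructor table.
using constructor_fn = void (*)();

// Calls every constructor in [first, last). In a kernel, `first` and
// `last` would come from symbols the linker script defines around the
// constructor section (the names are up to the script's author).
inline void call_global_constructors(constructor_fn *first, constructor_fn *last)
{
    for (constructor_fn *entry = first; entry != last; ++entry) {
        (*entry)();
    }
}
```

The wiki's approach of linking crti.o, crtbegin.o, crtend.o and crtn.o is an alternative that lets GCC's own startup objects do this walk.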

Re: G++ behavior of storing char* literals causing issues

Posted: Sat Aug 27, 2022 1:55 pm
by SomeGuyWithAKeyboard
I was able to use your suggestions to actually get it to compile and link. Thanks a lot! I didn't know you were supposed to / could use g++ a second time to link. :oops:

I needed to use "-fno-pie" instead of "-no-pie" to make it work or else it would spit out a bunch of undefined reference to "`_GLOBAL_OFFSET_TABLE_'" errors.

Re: G++ behavior of storing char* literals causing issues

Posted: Sun Aug 28, 2022 12:36 am
by Ethin
You do know that if you built a cross-compiler instead of trying to hack it together with a hosted build of GCC, most of your problems would go away, right? I mean, you'd have a new set of problems, like trying to include files that don't exist and all that. But you've completely skipped the cross-compiler step, which is going to cause you all kinds of problems because your compiler is going to assume that you're running in a hosted environment, which means it's going to let you do things that you can't actually do (or that you shouldn't do).
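
One way to see which kind of compiler is in use is to look at the macros it predefines. A small sketch with hypothetical function names: a GCC that targets Linux defines __linux__, which an i686-elf cross-compiler does not, and __STDC_HOSTED__ is 0 when compiling with -ffreestanding.

```cpp
// True when the compiler targets Linux, i.e. it is not a bare-metal
// cross-compiler such as i686-elf-g++.
constexpr bool targets_linux()
{
#if defined(__linux__)
    return true;
#else
    return false;
#endif
}

// True for a hosted implementation, false under -ffreestanding.
constexpr bool built_for_hosted_environment()
{
#if __STDC_HOSTED__
    return true;
#else
    return false;
#endif
}
```

A kernel source file could turn the first check into an #error so that a build with the wrong compiler fails early.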