[SOLVED] Strange errors when kernel goes bigger

Post by wichtounet »

Hi,

I've run into a strange error. I added some code to a function that is not executed at boot, and now my OS refuses to boot.

Bochs outputs a lot of these errors:

Code: Select all

00050007770e[CPU0 ] write_virtual_qword_64(): canonical failure
00050007817e[CPU0 ] write_virtual_word_64(): canonical failure
00050007864e[CPU0 ] write_virtual_word_64(): canonical failure
00050007911e[CPU0 ] write_virtual_word_64(): canonical failure
...
And then it ends in a page fault.

I was wondering what can cause a canonical failure?

From what I've found, it happens when an address uses more than 48 bits. Is that the only case in which it can occur?

Thanks

Baptiste
Last edited by wichtounet on Sun Jan 19, 2014 2:34 pm, edited 2 times in total.
Thor Operating System: C++ 64 bits OS: https://github.com/wichtounet/thor-os
Good osdeving!

Re: What can cause a canonical failure in Bochs ?

Post by Brendan »

Hi,
wichtounet wrote:I was wondering what can cause a canonical failure?
For long mode paging, the virtual address space is split into two usable parts with a hole in the middle, like this:
  • 0x0000000000000000 to 0x00007FFFFFFFFFFF = usable (typically "user space")
  • 0x0000800000000000 to 0xFFFF7FFFFFFFFFFF = unusable (not "canonical")
  • 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF = usable (typically "kernel space")
Basically, the highest 17 bits of a virtual address (bits 47 to 63) must all be the same: either all 1 or all 0.
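As a minimal sketch (not part of the original answer), the rule can be written as a small C++ check. The function name is hypothetical, and 48-bit virtual addresses are assumed, as in the ranges listed above:

```cpp
#include <cstdint>

// Returns true if `address` is canonical under 48-bit virtual addressing,
// i.e. bits 47..63 (the top 17 bits) are either all 0 or all 1.
constexpr bool is_canonical(std::uint64_t address) {
    const std::uint64_t top = address >> 47;  // keep only bits 47..63
    return top == 0 || top == 0x1FFFF;        // 17 zero bits or 17 one bits
}
```

Any address in the middle range of the list above fails this check, which is the condition Bochs reports as a canonical failure.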
wichtounet wrote:From what I've found, it comes from when an address uses more than 48 bits, is that the only case when that can occur?
There are no other cases where virtual addresses can be non-canonical.


Cheers,

Brendan

Re: What can cause a canonical failure in Bochs ?

Post by wichtounet »

Thanks for the complete answer :) I didn't know about the hole.

As there is no other case that causes a canonical failure, it is probably my bootloader messing things up again :(
Last edited by wichtounet on Sat Feb 08, 2014 1:01 pm, edited 1 time in total.

Re: Strange errors when kernel goes bigger

Post by wichtounet »

This problem is driving me crazy. After hours of debugging, I still haven't found a solution :x

I was able to pinpoint the assembly instruction causing the issue, and it is indeed a canonical failure. What I don't understand is where it comes from.

The line is this one: mov qword ptr ds:[rsi], 0x3fe0

A. When the kernel is 32248 bytes, it works perfectly and rsi has a value that makes sense (0x100000).
B. When I uncomment some code (code that is never executed) and the kernel gets bigger (37368 bytes), it doesn't work because rsi has a value that makes no sense at all (0x737361207961772D).

I noticed something that could be interesting. If I compile the kernel that works (A.) with mcmodel=large, it fails in the same fashion as (B.), with a different rsi value (0x746169636F737361). Could it be something going on with my compile options?

Here are the compile and link flags:

Code: Select all

CC=x86_64-elf-g++
AS=x86_64-elf-as
OC=x86_64-elf-objcopy

WARNING_FLAGS=-Wall -Wextra -pedantic -Wold-style-cast -Wshadow
COMMON_CPP_FLAGS=-masm=intel -Iinclude/ -nostdlib -Os -std=c++11 -fno-stack-protector -fno-exceptions -funsigned-char -fno-rtti -ffreestanding -fomit-frame-pointer -mno-red-zone -mno-3dnow -mno-mmx -fno-asynchronous-unwind-tables -mcmodel=large

CPP_FLAGS_LOW=-march=i386 -m32 -fno-strict-aliasing -fno-pic -fno-toplevel-reorder -mno-sse -mno-sse2 -mno-sse3 -mno-sse4 -mno-sse4.1 -mno-sse4.2

CPP_FLAGS_16=$(COMMON_CPP_FLAGS) $(CPP_FLAGS_LOW) -mregparm=3 -mpreferred-stack-boundary=2
CPP_FLAGS_32=$(COMMON_CPP_FLAGS) $(CPP_FLAGS_LOW) -mpreferred-stack-boundary=4
CPP_FLAGS_64=$(COMMON_CPP_FLAGS) -mno-sse3 -mno-sse4 -mno-sse4.1 -mno-sse4.2

COMMON_LINK_FLAGS=-lgcc
I also attached the linker script.

It doesn't seem to come from the bootloader, since the correct code seems to be executed in both cases: when I debug with magic breakpoints, I land on the same lines of code in A and B.

I'm open to any idea, as mine didn't change anything :(

I know this is not much information, but I don't know what else to include, as the problem really seems weird.

Thanks
Attachments
linker.ld

Re: Strange errors when kernel goes bigger

Post by Combuster »

Sounds like a typical case of the bootloader not loading enough so that part of the binary gets replaced with garbage.

Re: Strange errors when kernel goes bigger

Post by wichtounet »

Combuster wrote:Sounds like a typical case of the bootloader not loading enough so that part of the binary gets replaced with garbage.
Yes, I know, that is what I thought at first, but I haven't found an error in the bootloader. And the fact that mcmodel=large breaks the code seems to indicate the same, doesn't it?

The bootloader is using FAT32 to read the file. The file is 37368 bytes and 10 clusters of 4096 bytes are read and loaded into memory.

Here are the values loaded:

Code: Select all

Cluster LBA     Segment    Offset   
3        4050     1536          0
4        4058     1792          0
5        4066     2048          0
6        4074     2304          0
7        4082     2560          0
8        4090     2816          0
9        4098     3072          0
10       4106     3328          0
11       4114     3584          0
12       4122     3840          0
I'll check it again tomorrow, but it seems to me that enough data is loaded.
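
The numbers in the table can be cross-checked with a little arithmetic. This is only a sketch: the helper names are hypothetical, and it assumes the Segment column holds decimal real-mode segment values:

```cpp
#include <cstdint>

// Number of clusters needed to hold `file_size` bytes, rounded up.
constexpr std::uint32_t clusters_needed(std::uint32_t file_size,
                                        std::uint32_t cluster_size) {
    return (file_size + cluster_size - 1) / cluster_size;
}

// Physical address designated by a real-mode segment:offset pair.
constexpr std::uint32_t physical_address(std::uint16_t segment,
                                         std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}
```

With these, clusters_needed(37368, 4096) is 10, consecutive segments are 256 paragraphs (4096 bytes) apart, and the ten clusters cover physical addresses 0x6000 up to, but not including, 0x10000. This only checks the size of the file on disk; it says nothing about memory the kernel expects to use beyond the end of the file.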

Re: Strange errors when kernel goes bigger

Post by Owen »

The obvious thing to do now is to set a breakpoint somewhere before rsi is loaded with the bogus value and step through it to see where it comes from.

-mcmodel=large causing issues would agree with the possibility of the bootloader not loading enough; it tends to increase the size of the binary somewhat.
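
A cheap first check, before stepping through the code, is to look at the bogus register values themselves. The sketch below is not from the thread and the function names are hypothetical: it renders the bytes of a 64-bit value in the order they sit in memory on little-endian x86.

```cpp
#include <cstdint>
#include <string>

// Byte `index` (0 = least significant) of `value`, which is also its position
// in memory on little-endian x86. Non-printable bytes are shown as '.'.
constexpr char printable_byte(std::uint64_t value, int index) {
    const unsigned byte = static_cast<unsigned>((value >> (8 * index)) & 0xFF);
    return (byte >= 0x20 && byte < 0x7F) ? static_cast<char>(byte) : '.';
}

// All eight bytes of `value`, in memory order.
std::string decode_little_endian(std::uint64_t value) {
    std::string text;
    for (int i = 0; i < 8; ++i) {
        text += printable_byte(value, i);
    }
    return text;
}
```

Applied to the two rsi values reported above, 0x737361207961772D decodes to "-way ass" and 0x746169636F737361 to "associat". Both are printable ASCII, which suggests rsi was loaded from memory holding string data rather than a real pointer.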

Re: Strange errors when kernel goes bigger

Post by iansjack »

Stack colliding with code perhaps? As Owen said, time to start debugging.

Re: Strange errors when kernel goes bigger

Post by wichtounet »

Thanks for your ideas.
Owen wrote:-mcmodel=large causing issues would agree with the possibility of the bootloader not loading enough; it tends to increase the size of the binary somewhat.
Yes, you're right, I didn't think about that. The kernel is indeed quite a bit larger with the large mcmodel.
iansjack wrote:Stack colliding with code perhaps?
Unfortunately not, the stack is set at 0x0:0x4000:

Code: Select all

    xor ax, ax
    mov ss, ax
    mov sp, 0x4000
Owen wrote:The obvious thing to do now is to set a breakpoint somewhere before rsi is loaded with the bogus value and step through it to see where it comes from.
iansjack wrote:As Owen said, time to start debugging.
I'll have to try to use remote gdb on Bochs, because the generated assembly code is too big and complicated to be debugged by hand. I hope that works...

Re: Strange errors when kernel goes bigger

Post by wichtounet »

I finally found the problem...

The .bss section was not included in my flat binary :(

I was creating the flat binary using objcopy:

$(OC) -R .note -R .comment -O binary kernel.bin.o kernel.bin

but that did not include the .bss section. I had to use this command:

$(OC) -R .note -R .comment -O binary --set-section-flags .bss=alloc,load,contents kernel.bin.o kernel.bin

I cannot believe I lost so much time on this single error :(

Re: Strange errors when kernel goes bigger

Post by xenos »

Well, normally .bss should not be included in the binary, as it should not contain any data before the kernel writes something there. Initially it should contain only zeros (and including those zeros in the kernel file is a waste of space). So the bootloader should load your kernel's .text and .data sections and write zeros to the .bss. However, as you are obviously using a flat binary, the bootloader does not know anything about the .bss, and therefore does not reserve that space, unless you forcibly include those zeros in the file.

The proper solution to this would be to use a binary format with a section (or rather segment) header such as ELF. Then the bootloader can read this header and determine which parts of the kernel file should be loaded where in memory, and which should initially be filled with zeros.
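
To illustrate the idea, here is a rough sketch of what a loader does with one loadable segment. It is an assumption-laden simplification, not a full ELF loader: the Segment struct is a hypothetical stand-in for the ELF64 program header fields p_offset, p_paddr, p_filesz and p_memsz, and load_segment is a made-up name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for the relevant fields of an ELF64
// program header (p_offset, p_paddr, p_filesz, p_memsz).
struct Segment {
    std::uint64_t file_offset;   // where the segment's bytes start in the file
    std::uint64_t load_address;  // where the segment must be placed in memory
    std::uint64_t file_size;     // bytes actually stored in the file
    std::uint64_t memory_size;   // bytes occupied in memory (>= file_size)
};

// Copies the stored bytes, then zero-fills the rest of the segment.
// The zero-filled tail is where .bss typically ends up.
// `memory` stands for the start of the address space being loaded into.
void load_segment(const std::uint8_t* file, const Segment& segment,
                  std::uint8_t* memory) {
    assert(segment.file_size <= segment.memory_size);
    std::uint8_t* destination = memory + segment.load_address;
    std::memcpy(destination, file + segment.file_offset, segment.file_size);
    std::memset(destination + segment.file_size, 0,
                segment.memory_size - segment.file_size);
}
```

The difference between the two sizes is exactly the information a flat binary loses: without it, the loader cannot know how much memory after the file contents has to be zeroed.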

Re: Strange errors when kernel goes bigger

Post by wichtounet »

XenOS wrote:Well, normally .bss should not be included in the binary, as it should not contain any data before the kernel writes something there. Initially it should contain only zeros (and including those zeros in the kernel file is a waste of space). So the bootloader should load your kernel's .text and .data sections and write zeros to the .bss. However, as you are obviously using a flat binary, the bootloader does not know anything about the .bss, and therefore does not reserve that space, unless you forcibly include those zeros in the file.

The proper solution to this would be to use a binary format with a section (or rather segment) header such as ELF. Then the bootloader can read this header and determine which parts of the kernel file should be loaded where in memory, and which should initially be filled with zeros.
Yes, I know that, but I don't want to implement all this in my bootloader in 16-bit assembly... That is what I'm going to do in my OS when I load programs. I want to spend the little time I have writing my OS, not spend too much of it on my bootloader (I know, I should use GRUB in that case, but I don't like that option either). Moreover, as a flat binary is not ELF, I would have thought that the .bss section would be included by default.

Re: Strange errors when kernel goes bigger

Post by palk »

The BSS is never included because there's nothing in it. It's the loader's job to zero the BSS, but it's not a bad idea to also zero the BSS in your kernel as a "just in case."

It's pretty simple code… no reason not to do it in real mode unless the addressing prevents it. All you have to do is set up ES:(E)DI to point to the start of the BSS segment, clear AL, set up (E)CX with the size of the BSS segment, and then just REP STOSB.

Example code for real mode:

Code: Select all

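mov ax, (BSS_START >> 4)    ; segment = physical address / 16 (BSS_START must be below 1 MiB)
mov es, ax                  ; a segment register cannot be loaded from an immediate
mov di, (BSS_START & 0xF)   ; remaining offset within that 16-byte paragraph
xor al, al                  ; value to store: zero
mov cx, BSS_SIZE            ; byte count (must fit in 16 bits)
cld                         ; make stosb increment DI
rep stosb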

Re: Strange errors when kernel goes bigger

Post by wichtounet »

Yes, the assembly code for writing zeros is simple... but the assembly code to find BSS_START and BSS_SIZE is not... Moreover, it implies an ELF executable, which I don't have, since I'm only using a flat binary.

Re: Strange errors when kernel goes bigger

Post by xenos »

wichtounet wrote:Yes, the assembly code for writing zeros is simple... but the assembly code to find BSS_START and BSS_SIZE is not... Moreover, it implies an ELF executable, which I don't have, since I'm only using a flat binary.
Why? It is very simple to define these two symbols in your linker script and to use them in your assembly. This also works for flat binary files.
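
As a rough sketch of that approach on the kernel side, assume the linker script places `__bss_start = .;` before and `__bss_end = .;` after the .bss input sections. Both symbol names are hypothetical, as is the zero_range function; the C++ part could then look like this:

```cpp
#include <cassert>
#include <cstddef>

// Zero every byte in [begin, end). In a kernel, `begin` and `end` would be the
// addresses of the linker-script symbols around .bss.
void zero_range(unsigned char* begin, unsigned char* end) {
    assert(begin <= end);
    // The volatile pointer keeps the compiler from replacing the loop with a
    // call to memset, which a freestanding kernel may not provide.
    for (volatile unsigned char* p = begin; p != end; ++p) {
        *p = 0;
    }
}

// Kernel-side usage (not compiled here, the symbols come from the linker):
//   extern "C" unsigned char __bss_start[], __bss_end[];
//   zero_range(__bss_start, __bss_end);
```

This has to run before any code reads a zero-initialized global, and the memory after the loaded file must actually be free for the kernel to use.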