Linker issues[Solved]

Posted: Sun Dec 26, 2010 10:00 am
by davidv1992
Whilst working on my kernel today I ran into a very weird memory overrun. When digging deeper I bumped into things that I no longer completely understand, and I hope someone on this forum can shed some light on them.

First of all, the following loop causes the actual overrun (determined via bochs):

Code: Select all

	for (i=0; i<NUM_DEV; i++)
	{
		if (file_used[i])
			continue;
		
		print_num(i);
		print("\n");
		
		file_used[i] = 1;
		files[i] = *file;
		return 0;
	}
Definitions used:

Code: Select all

#define NUM_DEV 128
struct devicefile files[NUM_DEV];
int file_used[NUM_DEV];
This kind of surprised me, especially because I later figured out that it did so on the 3rd iteration (i=2).
The struct devicefile has a size of 40 bytes.

It runs over another variable called freeHead (static to some module).
Definition:

Code: Select all

static struct timer freeHead, freeTail;
struct timer is a 20 byte struct padded for some reason to 24.

Next I went through the output of readelf -w for the final executable.
Relevant output for the file containing the loop:

Code: Select all

...
 <1><5b2>: Abbrev Number: 7 (DW_TAG_structure_type)
    <5b3>   DW_AT_name        : (indirect string, offset: 0x109): devicefile	
    <5b7>   DW_AT_byte_size   : 40	
    <5b8>   DW_AT_decl_file   : 4	
    <5b9>   DW_AT_decl_line   : 7	
    <5ba>   DW_AT_sibling     : <0x5f7>
 <2><5be>: Abbrev Number: 9 (DW_TAG_member)
    <5bf>   DW_AT_name        : (indirect string, offset: 0xf56): name	
    <5c3>   DW_AT_decl_file   : 4	
    <5c4>   DW_AT_decl_line   : 13	
    <5c5>   DW_AT_type        : <0x5f7>	
    <5c9>   DW_AT_data_member_location: 2 byte block: 23 0 	(DW_OP_plus_uconst: 0)
 <2><5cc>: Abbrev Number: 9 (DW_TAG_member)
    <5cd>   DW_AT_name        : (indirect string, offset: 0x2a0): open	
    <5d1>   DW_AT_decl_file   : 4	
    <5d2>   DW_AT_decl_line   : 14	
    <5d3>   DW_AT_type        : <0x581>	
    <5d7>   DW_AT_data_member_location: 2 byte block: 23 10 	(DW_OP_plus_uconst: 16)
 <2><5da>: Abbrev Number: 9 (DW_TAG_member)
    <5db>   DW_AT_name        : (indirect string, offset: 0x23f): perm	
    <5df>   DW_AT_decl_file   : 4	
    <5e0>   DW_AT_decl_line   : 15	
    <5e1>   DW_AT_type        : <0x22e>	
    <5e5>   DW_AT_data_member_location: 2 byte block: 23 14 	(DW_OP_plus_uconst: 20)
 <2><5e8>: Abbrev Number: 9 (DW_TAG_member)
    <5e9>   DW_AT_name        : (indirect string, offset: 0x1135): data	
    <5ed>   DW_AT_decl_file   : 4	
    <5ee>   DW_AT_decl_line   : 17	
    <5ef>   DW_AT_type        : <0x607>	
    <5f3>   DW_AT_data_member_location: 2 byte block: 23 18 	(DW_OP_plus_uconst: 24)
...
<1><94b>: Abbrev Number: 13 (DW_TAG_array_type)
    <94c>   DW_AT_type        : <0x5b2>	
    <950>   DW_AT_sibling     : <0x95b>	
...
 <1><95b>: Abbrev Number: 25 (DW_TAG_variable)
    <95c>   DW_AT_name        : (indirect string, offset: 0x2c0): files	
    <960>   DW_AT_decl_file   : 1	
    <961>   DW_AT_decl_line   : 9	
    <962>   DW_AT_type        : <0x94b>	
    <966>   DW_AT_external    : 1	
    <967>   DW_AT_location    : 5 byte block: 3 d4 c 14 0 	(DW_OP_addr: 140cd4)
...
The files variable takes 128*40 = 5120 = 0x1400 bytes of space.

For the file containing the variable that is overrun:

Code: Select all

...
 <1><1804>: Abbrev Number: 3 (DW_TAG_structure_type)
    <1805>   DW_AT_name        : (indirect string, offset: 0x609): timer	
    <1809>   DW_AT_byte_size   : 24	
    <180a>   DW_AT_decl_file   : 1	
    <180b>   DW_AT_decl_line   : 9	
    <180c>   DW_AT_sibling     : <0x1856>	
 <2><1810>: Abbrev Number: 4 (DW_TAG_member)
    <1811>   DW_AT_name        : ID	
    <1814>   DW_AT_decl_file   : 1	
    <1815>   DW_AT_decl_line   : 10	
    <1816>   DW_AT_type        : <0x17fd>	
    <181a>   DW_AT_data_member_location: 2 byte block: 23 0 	(DW_OP_plus_uconst: 0)
 <2><181d>: Abbrev Number: 4 (DW_TAG_member)
    <181e>   DW_AT_name        : TID	
    <1822>   DW_AT_decl_file   : 1	
    <1823>   DW_AT_decl_line   : 11	
    <1824>   DW_AT_type        : <0x17fd>	
    <1828>   DW_AT_data_member_location: 2 byte block: 23 4 	(DW_OP_plus_uconst: 4)
 <2><182b>: Abbrev Number: 5 (DW_TAG_member)
    <182c>   DW_AT_name        : (indirect string, offset: 0x66d): alarmtime	
    <1830>   DW_AT_decl_file   : 1	
    <1831>   DW_AT_decl_line   : 12	
    <1832>   DW_AT_type        : <0x1856>	
    <1836>   DW_AT_data_member_location: 2 byte block: 23 8 	(DW_OP_plus_uconst: 8)
 <2><1839>: Abbrev Number: 5 (DW_TAG_member)
    <183a>   DW_AT_name        : (indirect string, offset: 0x5fd): next	
    <183e>   DW_AT_decl_file   : 1	
    <183f>   DW_AT_decl_line   : 13	
    <1840>   DW_AT_type        : <0x185d>	
    <1844>   DW_AT_data_member_location: 2 byte block: 23 10 	(DW_OP_plus_uconst: 16)
 <2><1847>: Abbrev Number: 5 (DW_TAG_member)
    <1848>   DW_AT_name        : (indirect string, offset: 0x6a2): prev	
    <184c>   DW_AT_decl_file   : 1	
    <184d>   DW_AT_decl_line   : 14	
    <184e>   DW_AT_type        : <0x185d>	
    <1852>   DW_AT_data_member_location: 2 byte block: 23 14 	(DW_OP_plus_uconst: 20)
...
 <1><198a>: Abbrev Number: 14 (DW_TAG_variable)
    <198b>   DW_AT_name        : (indirect string, offset: 0x627): freeHead	
    <198f>   DW_AT_decl_file   : 1	
    <1990>   DW_AT_decl_line   : 18	
    <1991>   DW_AT_type        : <0x1804>	
    <1995>   DW_AT_location    : 5 byte block: 3 30 d 14 0 	(DW_OP_addr: 140d30)
...
The address 140d30 lies within the block of 5120 bytes starting at 140cd4. So obviously something is going wrong in the linking process. The question I have is what.
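The overlap claimed above can be checked with a little arithmetic. The following is only a sketch: the addresses, the 40-byte element size and the offset of next are copied from the readelf output in this post, while the helper name locate_in_files and the struct hit type are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the readelf -w output quoted in the post. */
#define FILES_ADDR      0x140cd4u  /* DW_OP_addr of files */
#define FREEHEAD_ADDR   0x140d30u  /* DW_OP_addr of freeHead */
#define DEVICEFILE_SIZE 40u        /* DW_AT_byte_size of struct devicefile */
#define NUM_DEV         128u
#define TIMER_NEXT_OFF  16u        /* offset of next inside struct timer */

/* Hypothetical result type: which element and which byte inside it. */
struct hit {
    int overlaps;
    unsigned index;
    unsigned offset;
};

/* Report which files[] element, and which byte of it, covers addr. */
struct hit locate_in_files(uint32_t addr)
{
    struct hit h = {0, 0, 0};
    uint32_t end = FILES_ADDR + NUM_DEV * DEVICEFILE_SIZE;

    if (addr >= FILES_ADDR && addr < end) {
        uint32_t delta = addr - FILES_ADDR;
        h.overlaps = 1;
        h.index = delta / DEVICEFILE_SIZE;
        h.offset = delta % DEVICEFILE_SIZE;
        assert(h.index < NUM_DEV);
    }
    return h;
}
```

Calling locate_in_files(FREEHEAD_ADDR + TIMER_NEXT_OFF) gives index 2 and offset 28, which agrees with the overrun showing up on the iteration with i=2.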

Makefile used:

Code: Select all

SRCFILES = $(shell find ./ -mindepth 1 -maxdepth 5 -name "*.c")
ASMFILES = $(shell find ./ -mindepth 1 -maxdepth 5 -name "*.s")
OBJFILES = $(patsubst %.s,%.o,$(ASMFILES)) $(patsubst %.c,%.o,$(SRCFILES))
DFILES   = $(patsubst %.c,%.d,$(SRCFILES))
AS = nasm
ASFLAGS = -f elf
CC = gcc
CFLAGS = -MMD -MT "$*.d" -MP -Wall -Wextra -Werror -nostdlib -nostartfiles -nodefaultlibs -I./include/ -m32 -g 
 
floppy.img: kernel.bin stage1 stage2 pad asmdum
	cat stage1 stage2 pad kernel.bin >floppy.img

kernel.bin: $(OBJFILES)
	ld -T linker.ld -o kernel.bin $(OBJFILES) -melf_i386 `gcc -m32 -print-libgcc-file-name`

asmdum: kernel.bin
	objdump -S kernel.bin >asmdum

.PHONY: clean
clean:
	rm -f $(OBJFILES) $(DFILES) kernel.bin floppy.img asmdum
Linkerscript used:

Code: Select all

ENTRY (loader)

SECTIONS{
	. = 0x00100000;
	
	.text ALIGN (0x1000) : {
		*(.text)
	}

	.rodata ALIGN (0x1000) : {
		*(.rodata)
	}
	
	.data ALIGN (0x1000) : {
		*(.data)
	}
	
	.bss ALIGN (0x1000) : {
		sbss = .;
		*(COMMON)
		*(.bss)
		ebss = .;
	}
	kernel_end = .;
}
I'd gladly hear any suggestions for workarounds and/or causes.

Re: Linker issues

Posted: Sun Dec 26, 2010 10:52 am
by NickJohnson
You have a line dereferencing a variable called "file": what is that variable?

Re: Linker issues

Posted: Sun Dec 26, 2010 11:12 am
by davidv1992
Here is the prototype of the function containing the loop. file is just a pointer to a struct devicefile.

Code: Select all

int devfs_reg_device(struct devicefile *file)

Re: Linker issues

Posted: Sun Dec 26, 2010 1:53 pm
by NickJohnson
Could the problem be in the code that calls devfs_reg_device? Where is "file" pointing when it breaks?

Re: Linker issues

Posted: Sun Dec 26, 2010 2:58 pm
by davidv1992
These are currently the only three call sites:

Code: Select all

void init_null()
{
	struct devicefile dev;
	
	// For all our devices
	dev.perm = 0777 | FLAG_STREAM | FLAG_DEV;
	dev.open = open;
	
	// /dev/null
	kmemcpy(dev.name, "null", 5);
	dev.data[0] = 2;
	devfs_reg_device(&dev);
	
	// /dev/zero
	kmemcpy(dev.name, "zero", 5);
	dev.data[0] = 3;
	devfs_reg_device(&dev);
	
	// /dev/full
	kmemcpy(dev.name, "full", 5);
	dev.data[0] = 1;
	devfs_reg_device(&dev);
}
The problem is, I don't think it is a code issue per se: the output of readelf -w (if I interpret it correctly) says that something goes horribly wrong during linking.

Edit:
Exact versions of the toolchain might be interesting here, just in case. I've learned to assume I'm the one causing the problem, but it might just be that I bumped into some issue in the toolchain.

OS: GNU/Linux (Ubuntu 10.10)
gcc: gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
ld: GNU ld (GNU Binutils for Ubuntu) 2.20.51-system.20100908

Re: Linker issues

Posted: Sun Dec 26, 2010 4:33 pm
by xenos
Probably one of the first suggestions you'd get here is to use a cross compiler. There are always some problems popping up as soon as one tries to use a Linux or Windows targeted compiler for OS development, and many of these can be solved by using a cross compiler. In your case, an ELF targeted version of gcc would probably be the best choice. You can find a complete guide on how to compile a cross compiler here:

GCC Cross-Compiler

I know that it might look complicated at first sight (and it took me a while when I started doing this myself), but it's quite simple once you worked through it. Just follow the steps of this tutorial, and get yourself some coffee while it's compiling ;)

Apart from that, I'm not quite sure how to interpret the output of readelf you have posted... So maybe you could explain: Which variable / pointer ends up with a wrong value / address? Which value / address do you expect, and which do you get?

Re: Linker issues

Posted: Sun Dec 26, 2010 4:37 pm
by gerryg400
Does your code compile and link without _any_ warnings ?

Re: Linker issues

Posted: Sun Dec 26, 2010 5:08 pm
by davidv1992
gerryg400 wrote:Does your code compile and link without _any_ warnings ?
Yes, and it should, because warnings usually point to hard-to-find mistakes.
XenOS wrote:Probably one of the first suggestions you'd get here is to use a cross compiler. There are always some problems popping up as soon as one tries to use a Linux or Windows targeted compiler for OS development, and many of these can be solved by using a cross compiler. In your case, an ELF targeted version of gcc would probably be the best choice. You can find a complete guide on how to compile a cross compiler here:

GCC Cross-Compiler

I know that it might look complicated at first sight (and it took me a while when I started doing this myself), but it's quite simple once you worked through it. Just follow the steps of this tutorial, and get yourself some coffee while it's compiling ;)
I know cross-compilers can solve trouble when having unreferenced labels, but I don't believe they will be the solution in this case. It might seem to work, but it wouldn't tell me the underlying cause (and, more importantly, whether the bug/problem was in anything I wrote). Back when I set up my build environment I tried and verified that no code from the host operating system came with the builds, so it shouldn't be the cause anyway.
XenOS wrote: Apart from that, I'm not quite sure how to interpret the output of readelf you have posted... So maybe you could explain: Which variable / pointer ends up with a wrong value / address? Which value / address do you expect, and which do you get?
The variable freeHead is overwritten (more specifically, it contains a field with a pointer called next that is overwritten), and the write in question ought to be a valid write inside the area of the files array (specifically the filling of the 2nd entry of it). The value which I get is junk; I didn't go so far as to track back what it is part of because it doesn't seem relevant. What I'd be expecting is a valid pointer in that field (to 140d60), and I get 0xf (clear junk).

Re: Linker issues

Posted: Mon Dec 27, 2010 2:12 am
by xenos
davidv1992 wrote:The variable freeHead is overwritten (more specifically, it contains a field with a pointer called next that is overwritten), and the write in question ought to be a valid write inside the area of the files array (specifically the filling of the 2nd entry of it). The value which I get is junk; I didn't go so far as to track back what it is part of because it doesn't seem relevant. What I'd be expecting is a valid pointer in that field (to 140d60), and I get 0xf (clear junk).
So in that case, I would try the following:
  • Determine the (physical) memory location where your freeHead variable resides.
  • Start your kernel in bochs with debugger enabled.
  • Place a memory write watch point at the location of the pointer that is being overwritten.
  • Run through your code and find out 1. which part of your code overwrites this location and 2. which value it puts in there.
  • As soon as you find the 0xf, there are two possibilities: Either some code modifies freeHead which should write the 0xf to some completely different memory location, or the code should write to freeHead but gets the wrong value 0xf from somewhere else. In the first case, you need to trace back where this code gets the wrong write address from. In the second case, you need to trace back the origin of the value 0xf.

Re: Linker issues

Posted: Mon Dec 27, 2010 8:23 am
by davidv1992
I already did just that: the code of the loop does the write of the 0xf, which is some legal value inside the devicefile struct. The address it uses is generated by the compiler and the linker (see the original post for the loop code).

The digging I then did through output of readelf -w and the addresses it gives for variables suggest to me that the actual space allocated by the linker and compiler overlaps. What I don't get is why the hell this happens.

Just in case it is helpful, the disassembled loop and read:

Code: Select all

...
ID = freeHead.next->ID;
  102ce9:	a1 40 0d 14 00       	mov    0x140d40,%eax
  102cee:	8b 00                	mov    (%eax),%eax
  102cf0:	89 45 f0             	mov    %eax,-0x10(%ebp)
...
int devfs_reg_device(struct devicefile *file)
{
  1010fe:	55                   	push   %ebp
  1010ff:	89 e5                	mov    %esp,%ebp
  101101:	83 ec 28             	sub    $0x28,%esp
	int i;
	
	if (find_file(file->name) > 0)
  101104:	8b 45 08             	mov    0x8(%ebp),%eax
  101107:	89 04 24             	mov    %eax,(%esp)
  10110a:	e8 95 ff ff ff       	call   1010a4 <find_file>
  10110f:	85 c0                	test   %eax,%eax
  101111:	7e 0a                	jle    10111d <devfs_reg_device+0x1f>
		return EEXIST;
  101113:	b8 13 00 00 00       	mov    $0x13,%eax
  101118:	e9 c7 00 00 00       	jmp    1011e4 <devfs_reg_device+0xe6>
		
	for (i=0; i<NUM_DEV; i++)
  10111d:	c7 45 f4 00 00 00 00 	movl   $0x0,-0xc(%ebp)
  101124:	e9 ac 00 00 00       	jmp    1011d5 <devfs_reg_device+0xd7>
	{
		if (file_used[i])
  101129:	8b 45 f4             	mov    -0xc(%ebp),%eax
  10112c:	8b 04 85 00 c0 10 00 	mov    0x10c000(,%eax,4),%eax
  101133:	85 c0                	test   %eax,%eax
  101135:	74 09                	je     101140 <devfs_reg_device+0x42>
	int i;
	
	if (find_file(file->name) > 0)
		return EEXIST;
		
	for (i=0; i<NUM_DEV; i++)
  101137:	83 45 f4 01          	addl   $0x1,-0xc(%ebp)
  10113b:	e9 95 00 00 00       	jmp    1011d5 <devfs_reg_device+0xd7>
	{
		if (file_used[i])
			continue;
		
		print_num(i);
  101140:	8b 45 f4             	mov    -0xc(%ebp),%eax
  101143:	89 04 24             	mov    %eax,(%esp)
  101146:	e8 0a 6b 00 00       	call   107c55 <print_num>
		print("\n");
  10114b:	c7 04 24 00 a0 10 00 	movl   $0x10a000,(%esp)
  101152:	e8 dc fc ff ff       	call   100e33 <print>
		
		file_used[i] = 1;
  101157:	8b 45 f4             	mov    -0xc(%ebp),%eax
  10115a:	c7 04 85 00 c0 10 00 	movl   $0x1,0x10c000(,%eax,4)
  101161:	01 00 00 00 
		files[i] = *file;
  101165:	8b 55 f4             	mov    -0xc(%ebp),%edx
  101168:	89 d0                	mov    %edx,%eax
  10116a:	c1 e0 02             	shl    $0x2,%eax
  10116d:	01 d0                	add    %edx,%eax
  10116f:	c1 e0 03             	shl    $0x3,%eax
  101172:	8b 55 08             	mov    0x8(%ebp),%edx
  101175:	8b 0a                	mov    (%edx),%ecx
  101177:	89 88 d4 0c 14 00    	mov    %ecx,0x140cd4(%eax)
  10117d:	8b 4a 04             	mov    0x4(%edx),%ecx
  101180:	89 88 d8 0c 14 00    	mov    %ecx,0x140cd8(%eax)
  101186:	8b 4a 08             	mov    0x8(%edx),%ecx
  101189:	89 88 dc 0c 14 00    	mov    %ecx,0x140cdc(%eax)
  10118f:	8b 4a 0c             	mov    0xc(%edx),%ecx
  101192:	89 88 e0 0c 14 00    	mov    %ecx,0x140ce0(%eax)
  101198:	8b 4a 10             	mov    0x10(%edx),%ecx
  10119b:	89 88 e4 0c 14 00    	mov    %ecx,0x140ce4(%eax)
  1011a1:	8b 4a 14             	mov    0x14(%edx),%ecx
  1011a4:	89 88 e8 0c 14 00    	mov    %ecx,0x140ce8(%eax)
  1011aa:	8b 4a 18             	mov    0x18(%edx),%ecx
  1011ad:	89 88 ec 0c 14 00    	mov    %ecx,0x140cec(%eax)
  1011b3:	8b 4a 1c             	mov    0x1c(%edx),%ecx
  1011b6:	89 88 f0 0c 14 00    	mov    %ecx,0x140cf0(%eax)
  1011bc:	8b 4a 20             	mov    0x20(%edx),%ecx
  1011bf:	89 88 f4 0c 14 00    	mov    %ecx,0x140cf4(%eax)
  1011c5:	8b 52 24             	mov    0x24(%edx),%edx
  1011c8:	89 90 f8 0c 14 00    	mov    %edx,0x140cf8(%eax)
		return 0;
  1011ce:	b8 00 00 00 00       	mov    $0x0,%eax
  1011d3:	eb 0f                	jmp    1011e4 <devfs_reg_device+0xe6>
	int i;
	
	if (find_file(file->name) > 0)
		return EEXIST;
		
	for (i=0; i<NUM_DEV; i++)
  1011d5:	83 7d f4 7f          	cmpl   $0x7f,-0xc(%ebp)
  1011d9:	0f 8e 4a ff ff ff    	jle    101129 <devfs_reg_device+0x2b>
		file_used[i] = 1;
		files[i] = *file;
		return 0;
	}
	
	return ENOMEM;
  1011df:	b8 2f 00 00 00       	mov    $0x2f,%eax
}
  1011e4:	c9                   	leave  
  1011e5:	c3                   	ret    
...

Re: Linker issues

Posted: Mon Dec 27, 2010 1:52 pm
by DavidCooper
davidv1992 wrote:Just in case it is helpful, the disassembled loop and read:

Code: Select all

...
ID = freeHead.next->ID;
  102ce9:	a1 40 0d 14 00       	mov    0x140d40,%eax
I was going to have a go at reading through this, but the first line of it has put me off - how can a disassembler make a mistake like that? A1 (161 in decimal) is an instruction to load eax from an immediate memory address. Either the "a1" should be "a3" or the "0x140d40,%eax" part is the wrong way round.

Re: Linker issues

Posted: Mon Dec 27, 2010 2:17 pm
by fronty
That means just what you described. In AT&T syntax the operands are in logical order: I don't say move to b a, I say move a to b. Thus mov addr, %rd means load from address addr to register %rd.

Re: Linker issues

Posted: Mon Dec 27, 2010 3:46 pm
by DavidCooper
fronty wrote:That means just what you described. In AT&T syntax the operands are in logical order: I don't say move to b a, I say move a to b. Thus mov addr, %rd means load from address addr to register %rd.
That's a relief to hear that: I was beginning to wonder if I was going mad. I had assumed that assemblers and disassemblers would all work the same way round, but it seems not. I thought a lot of the instructions were working in the other direction, but the compiler's produced code that's working the opposite way round from the way that I write machine code, for example by using 137 instead of 139 for reg to reg moves (the Intel manual I learned from made me think the higher value instructions should be used every time when there is a choice, but this compiler's using the lower value instruction instead, and the directionality is reversed).
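The 137 versus 139 point can be made concrete with a small sketch. Opcodes 0x89 (137) and 0x8B (139) are both 32-bit MOV; for a register-to-register move they differ only in which ModRM field names the source and which the destination. The function names below are invented for illustration and are not part of any assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as used in the ModRM byte. */
enum reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

/* ModRM byte for register-direct addressing (mod = 11b). */
static uint8_t modrm_reg(enum reg32 reg_field, enum reg32 rm_field)
{
    assert((unsigned)reg_field < 8u && (unsigned)rm_field < 8u);
    return (uint8_t)(0xC0u | ((unsigned)reg_field << 3) | (unsigned)rm_field);
}

/* Opcode 0x89 (MOV r/m32, r32): the reg field holds the source. */
void encode_mov_89(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    out[0] = 0x89;
    out[1] = modrm_reg(src, dst);
}

/* Opcode 0x8B (MOV r32, r/m32): the reg field holds the destination. */
void encode_mov_8b(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    out[0] = 0x8B;
    out[1] = modrm_reg(dst, src);
}
```

Encoding "copy edx into eax" with encode_mov_89 yields the bytes 89 d0, the same pair the disassembly above shows for mov %edx,%eax; encode_mov_8b yields 8b c2 for the same operation.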

Anyway, I doubt the error's in there if it's an issue of the same memory being allocated to two different things. I can't help with that, so I'll get out of the way.

Re: Linker issues

Posted: Tue Dec 28, 2010 6:12 am
by davidv1992
Got it figured out. I had two global variables of different types, both named files. One of them was put straight into the .bss section and the other was emitted as a common symbol (.comm). ld didn't warn or error on that, but went on and used only the one in the .bss section, which caused my troubles.
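One way to avoid this kind of clash, sketched here under assumptions, is to give module-private arrays internal linkage with static so that the name never reaches the linker. The struct below is a simplified stand-in for the real devicefile, and devfs_count and devfs_name are hypothetical accessors, not functions from the kernel in this thread. Separately, gcc's -fno-common and ld's --warn-common exist to make duplicate global names visible at link time.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_DEV 128

/* Simplified, hypothetical stand-in for the struct in the thread. */
struct devicefile {
    char name[16];
    int perm;
};

/* static gives these arrays internal linkage: no symbol named files is
 * exported, so an unrelated global files in another object file cannot
 * be merged with this one by the linker. */
static struct devicefile files[NUM_DEV];
static int file_used[NUM_DEV];

/* Copy *file into the first free slot; 0 on success, -1 when full. */
int devfs_reg_device(const struct devicefile *file)
{
    assert(file != NULL);
    for (int i = 0; i < NUM_DEV; i++) {
        if (file_used[i])
            continue;
        file_used[i] = 1;
        files[i] = *file;
        return 0;
    }
    return -1;
}

/* Number of occupied slots, exposed through a function, not the array. */
int devfs_count(void)
{
    int n = 0;
    for (int i = 0; i < NUM_DEV; i++)
        n += (file_used[i] != 0);
    return n;
}

/* Name stored in slot i, or NULL if the slot is free or out of range. */
const char *devfs_name(int i)
{
    if (i < 0 || i >= NUM_DEV || !file_used[i])
        return NULL;
    return files[i].name;
}
```

Other modules would then go through devfs_reg_device and the accessors instead of declaring their own extern files, which keeps the array's type and size in exactly one place.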

Re: Linker issues[Solved]

Posted: Tue Dec 28, 2010 10:20 am
by davidv1992
No, the implementation of them is bad, and the default access policy C has for them (ie, public)