Page 1 of 1

Strange problem

Posted: Mon Aug 21, 2006 7:37 am
by matthias
I have this problem that I don't know how to solve. I thought my stack was corrupted but it isn't. I have several functions which calculate something adding a #defined value. But when I compile it the value's don't seem to be right. In one function I needed to add the PAGE_SIZE, which is normally 0x1000, but when I disassembled the code it said 0x4000 :? I solved this by setting the page size to 0x400 which is not ok of course (temporary solution) But now I can not run from the problem anymore I have the same issue on another function in which I cannot simply use another number:

Code: Select all

/* user id is defined as address of structure - BASE */
/* this gives us random ID's but this allows for better security */
#define BASE	0xF0000000;

static user* head_user = 0;
static user* tail_user = 0;

size_t user_num = 0;

/* create a new user and add it to user list */
id_t CreateUser(char* name)
{
	user* new_user = malloc(sizeof(user));

	/* first clear structure */
	memset(new_user, 0, sizeof(user));

	/* copy user name */
	strcpy(new_user->name, name);

	/* copy remaining properties */
	new_user->uid = (id_t)new_user - BASE;

	new_user->prev = tail_user->prev;
	tail_user->prev = new_user;
	new_user->next = tail_user;

	new_user->next_group_member = 0;
	new_user->prev_group_member = 0;

	user_num++;

	printf("added user %s with uid %d\n", new_user->name, new_user->uid);

	return new_user->uid;
}

/* remove a user from system and destroy all it's running processes */
void DestroyUser(id_t id)
{
	user* destroy_user = (user*)id + BASE;

	/* update list */
	destroy_user->prev->next = destroy_user->next;
	destroy_user->next->prev = destroy_user->prev;

	/* free memory */
	free(destroy_user);

	user_num--;

	/* TODO kill processes */
}
Piece of code to look at:

Code: Select all

user* destroy_user = (user*)id + BASE;
Which is used to calculate the memory offset of the user structure. So when I use this code I'll get a page fault addressing to 0x800000xx. That is not ok. So I disassembled the code and guess what:

Code: Select all

000000a0 <_DestroyUser>:
  a0:	55                   	push   %ebp
  a1:	89 e5                	mov    %esp,%ebp
  a3:	83 ec 14             	sub    $0x14,%esp
  a6:	8b 55 08             	mov    0x8(%ebp),%edx
  a9:	81 ea 00 00 00 80    	sub    $0x80000000,%edx
  af:	8b 4a 1c             	mov    0x1c(%edx),%ecx
  b2:	8b 42 18             	mov    0x18(%edx),%eax
  b5:	89 41 18             	mov    %eax,0x18(%ecx)
  b8:	8b 42 18             	mov    0x18(%edx),%eax
  bb:	89 48 1c             	mov    %ecx,0x1c(%eax)
  be:	52                   	push   %edx
  bf:	e8 3c ff ff ff       	call   0 <.text>
  c4:	a1 00 00 00 00       	mov    0x0,%eax
  c9:	48                   	dec    %eax
  ca:	a3 00 00 00 00       	mov    %eax,0x0
  cf:	89 ec                	mov    %ebp,%esp
  d1:	5d                   	pop    %ebp
  d2:	c3                   	ret    
  d3:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
  d9:	8d bc 27 00 00 00 00 	lea    0x0(%edi,1),%edi
  e0:	69 6e 69 74 69 61 6c 	imul   $0x6c616974,0x69(%esi),%ebp
  e7:	69 7a 69 6e 67 20 75 	imul   $0x7520676e,0x69(%edx),%edi
  ee:	73 65                	jae    155 <_users_install+0x45>
  f0:	72 20                	jb     112 <_users_install+0x2>
  f2:	6d                   	insl   (%dx),%es:(%edi)
  f3:	61                   	popa   
  f4:	6e                   	outsb  %ds:(%esi),(%dx)
  f5:	61                   	popa   
  f6:	67                   	addr16
  f7:	65                   	gs
  f8:	6d                   	insl   (%dx),%es:(%edi)
  f9:	65 6e                	outsb  %gs:(%esi),(%dx)
  fb:	74 2e                	je     12b <_users_install+0x1b>
  fd:	2e 2e 20 00          	and    %al,%cs:(%eax)
 101:	69 6e 69 74 00 4f 4b 	imul   $0x4b4f0074,0x69(%esi),%ebp
 108:	0a 00                	or     (%eax),%al
 10a:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi

Code: Select all

a9:	81 ea 00 00 00 80    	sub    $0x80000000,%edx

should be:

a9:	81 ea 00 00 00 80    	add    $0xf0000000,%edx
I don't see why gcc does this :? For complete info I will also post my ld config here:

Code: Select all

OUTPUT_FORMAT("binary")
INPUT("Debug/loader.o")
INPUT("core.a")
INPUT("boot.a")
INPUT("drivers.a")
INPUT("libc.a")
INPUT("libmem.a")
ENTRY(start)
SECTIONS
{
    .text 0x100000 :
    {
        code = .; _code = .; __code = .;
        *(.text)
        . = ALIGN(4096);
    }

    .data :
    {
        data = .; _data = .; __data = .;
        *(.data)
        . = ALIGN(4096);
    }

    .bss :
    {
        bss = .; _bss = .; __bss = .;
        *(.bss)
        . = ALIGN(4096);
    }

    end = .; _end = .; __end = .;
}
edit: the only common factor there is between the first problem and this one is they both do a malloc() call which requests memory and then an invlpg is done to map it into virtual memory:

http://www.osdev.org/phpBB2/viewtopic.php?t=2833

Posted: Mon Aug 21, 2006 3:18 pm
by gaf

Code: Select all

user* destroy_user = (user*)((unsigned int)id + BASE)
Just an idea: Could it be that id_t is a signed integer ? As 0xf0000000 is too big for a signed 32bit value all kinds of overflow problems might result.
In one function I needed to add the PAGE_SIZE, which is normally 0x1000, but when I disassembled the code it said 0x4000
Multiplication with 4 ? Sounds as if the compiler took the page-size as an index to a field of 32bit values. Any way that your code gets interpreted as something like "id + sizeof(id_t)*BASE" ? What's the definition of your id type?

Code: Select all

c4:   a1 00 00 00 00          mov    0x0,%eax 
c9:   48                      dec    %eax 
ca:   a3 00 00 00 00          mov    %eax,0x0
That piece of code also looks quite strange to me. If your kernel is mapped to 0xF0000000 how can user_num be stored at address 0 ?
/* user id is defined as address of structure - BASE */
/* this gives us random ID's but this allows for better security */
Sorry, but this doesn't sound very secure and the resulting values should be rather easy to guess provided that you know how the kernel works internally. If the IDs must be kept secret you should probably implement some basic random number generator and seed it with the system clock. Another approach would be to store the ID data in the kernel itself. Tasks then no longer have to identify, the kernel just refers to an internal table to find out which IDs were granted to an application.

regards,
gaf

Posted: Mon Aug 21, 2006 3:37 pm
by matthias
typedef unsigned lond id_t; :wink:

I use malloc() to get memory from heap, heap starts @ 0xf0000000 so I can do this without any problem ;) Yeah I know the security idea is stupid, but still I find it very strange when I define a value as 0xf0000000 it gets compiled as 0x80000000. I'll change the id creation later on, but I want to solve this problem since I have the same problem in my sbrk() function.

Posted: Mon Aug 21, 2006 4:08 pm
by matthias
wtf, it works:

I changed a little bug in my code:

#define BASE 0xf0000000; -> #define BASE 0xf0000000

(notice the ; on the end)

And to calculate the pointer I now use:

Code: Select all

user* destroy_user = (user*)BASE + id;
Thanks gaf for your ideas, and also your notice on the security stuff

:)

Posted: Mon Aug 21, 2006 6:37 pm
by Daedalus
That's because #define's are preprocessor commands, not C statements.

When the compiler sees your BASE in the C code, it puts whatever you defined there - and since you had a semicolon, it put that in too.

Posted: Tue Aug 22, 2006 3:39 am
by matthias
Daedalus wrote:That's because #define's are preprocessor commands, not C statements.

When the compiler sees your BASE in the C code, it puts whatever you defined there - and since you had a semicolon, it put that in too.
Yeah ok, but it only works if BASE is put in front of gid/uid the semicolon only prevented me from doing that and I got a nice error while compiling so the real bug was that BASE should be in front of gid and uid ;)

Posted: Tue Aug 22, 2006 1:00 pm
by gaf
Good to see that I'm at least not the only one who seems to be a bit off these days :D

Some time after I posted my reply it suddenly became totally obvious to me what really caused the problem and I just couldn't believe that I really hadn't seen it. You know these bugs that you just don't seem to be able to track down because they they're just too trivial to even consider ? I'll leave you with some puzzle so that you can work it out yourself :twisted: :

Code: Select all

uint* array = malloc(sizeof(uint)*42);

array++;     // go to the next element

array += 1;  // increase by one (go to the next element)
array += 2;  // increase by two (skip an element)

uint* chosen_one = array + 2; // ???
Any bells ringing ? And why did everything work nicely after you changed the faulting line ?

regards,
gaf

(tip: pay attention to the types of the variables)