X86 instruction "lea" and segments

Post by CodEFouR »

Hi, my toy OS can now load an ELF binary. But I am totally confused about the "lea" instruction. "lea" calculates the effective address, but that appears to be only an offset, so we still need the correct segment to access the memory. Right now my toy OS does not support paging, and SS and DS are different (DS: 1M~3.5M, SS: 3M~3.5M). When the C program tries to pass a local variable (on the stack), I get this:

Code:

char msg[16];

/* Write through int pointers to avoid .rodata and .data, because my toy OS
 * does not load these sections. Little endian, so the characters are reversed. */
*(int *)msg = 'tset';
*((int *)msg + 1) = '\0';
print(msg);
The generated machine code looks like this:

Code:

lea 0xfffffff0(%ebp), %eax      # msg address (%ebp - 16)
movl $0x74736574, (%eax)        # store "test" at %ds:(%eax)
lea 0xfffffff0(%ebp), %eax
add $0x4, %eax
movl $0x0, (%eax)               # store '\0' at %ds:(%eax)
lea 0xfffffff0(%ebp), %eax
mov %eax, (%esp)                # pass the offset as the argument
call print
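
A note on the arithmetic here: `lea` only adds the displacement to the base register. It does not read memory and does not involve any segment register, which is why the result is a bare offset. A minimal sketch in C of that calculation, assuming 32-bit wraparound; the helper names `lea_disp_base` and `msg_offset` are made up for illustration and any %ebp value passed in is arbitrary:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `lea disp(%base), %dst` in 32-bit mode:
 * the result is base + disp, truncated to 32 bits. No segment base
 * is added and no memory is read. */
uint32_t lea_disp_base(uint32_t base, uint32_t disp)
{
    return base + disp; /* wraps modulo 2^32 once stored in uint32_t */
}

/* 0xfffffff0 is the two's-complement encoding of -16, so the lea in the
 * listing yields %ebp - 16, the start of the 16-byte msg array. */
uint32_t msg_offset(uint32_t ebp)
{
    uint32_t off = lea_disp_base(ebp, 0xfffffff0u);
    assert(off == (uint32_t)(ebp - 16u));
    return off;
}
```

For example, with a hypothetical %ebp of 0x0007fff0, `msg_offset` returns 0x0007ffe0. Whether that offset is then interpreted relative to DS or SS depends entirely on the instruction that dereferences it.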
The print function is written in C as well; its machine code looks like this:

Code:

mov 0x8(%ebp), %eax           # msg address (the offset passed by the caller)
movzwl (%eax), %eax           # %eax = 16-bit load from %ds:(%eax)
The code above prints "test" on the screen. However, "test" is stored in the data segment, NOT the stack segment, because a memory operand like (%eax) defaults to DS. The print function reads from the data segment as well, which is not what I want: I expected it to access the local variable on the stack. If I instead write the code like this:

Code:

msg[0] = 't';

The corresponding machine code is:

Code:

movb $0x74, 0xfffffff0(%ebp)      # store 't' at %ss:-16(%ebp)
As you can see, this code writes the byte to ss:[ebp-16] on the stack, which is what we want. But then the print function does NOT behave correctly, because it reads through DS and cannot reach [ss:eax]. So if I want to change a variable on the stack through a pointer, I can never do that. On Linux the code behaves correctly because SS and DS describe the same segment. Can anyone explain this? I use GCC 4.0 to compile the code.
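
To make the mismatch concrete, here is a rough model in C of how an offset becomes a linear address when paging is off. The segment bases and sizes come from the numbers above (DS: 1M~3.5M, SS: 3M~3.5M); the struct, the function names, and the assumption that both segments are plain expand-up segments are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a protected-mode segment: only base and size.
 * Field and function names are illustrative, not from any real kernel. */
struct segment {
    uint32_t base;  /* linear address where the segment starts */
    uint32_t size;  /* number of addressable bytes */
};

/* Layout from the post: DS covers 1M..3.5M, SS covers 3M..3.5M. */
static const struct segment DS_SEG = { 0x00100000u, 0x00280000u };
static const struct segment SS_SEG = { 0x00300000u, 0x00080000u };

/* Linear address = segment base + offset (no paging in this setup). */
uint32_t linear_address(struct segment seg, uint32_t offset)
{
    assert(offset < seg.size); /* the CPU would fault on a limit violation */
    return seg.base + offset;
}

/* Because SS lies inside DS here, an SS-relative offset can be turned into
 * the DS-relative offset that names the same byte. GCC never emits this. */
uint32_t ss_to_ds_offset(uint32_t ss_offset)
{
    return ss_offset + (SS_SEG.base - DS_SEG.base);
}
```

With a stack offset of 0x7fff0, `linear_address(SS_SEG, 0x7fff0)` gives 0x0037fff0 while `linear_address(DS_SEG, 0x7fff0)` gives 0x0017fff0, so a store through (%ebp) and a load through (%eax) hit different bytes. `ss_to_ds_offset` only works because SS happens to sit inside DS in this layout, and it would have to be applied by hand since the compiler knows nothing about it.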

Post by Combuster »

Short answer: GCC is written to assume that DS=ES=SS, and to my knowledge there is nothing you can do about it :(

Shorter answer: get a different compiler or change to DS=ES=SS
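
For the second option, one common way to get DS=ES=SS is a flat GDT in which the data descriptor has base 0 and covers the whole 4 GiB, with that one selector loaded into all three registers. A sketch of packing such descriptors in C, assuming the standard legacy 8-byte descriptor layout; the function names and the access/flag values chosen (ring 0, 32-bit, page granularity) are assumptions, not something from this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a legacy 8-byte x86 segment descriptor.
 * limit is 20 bits; flags is the upper nibble (G, D/B, L, AVL). */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu && flags <= 0xFu);

    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit 15:0        */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base 23:0         */
    d |= (uint64_t)access << 40;                  /* P, DPL, S, type   */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit 19:16       */
    d |= (uint64_t)flags << 52;                   /* G, D/B, L, AVL    */
    d |= (uint64_t)(base >> 24) << 56;            /* base 31:24        */
    return d;
}

/* Flat ring-0 segments: base 0, limit 0xFFFFF pages (4 GiB with G=1). */
uint64_t flat_code_descriptor(void)
{
    return make_descriptor(0, 0xFFFFFu, 0x9A, 0xC);
}

uint64_t flat_data_descriptor(void)
{
    return make_descriptor(0, 0xFFFFFu, 0x92, 0xC);
}
```

`flat_code_descriptor()` evaluates to 0x00CF9A000000FFFF and `flat_data_descriptor()` to 0x00CF92000000FFFF. Installing them with lgdt and reloading the segment registers is left out here.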
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
CodEFouR
Posts: 5
Joined: Wed Mar 07, 2007 10:37 am
Location: Beijing, China

Post by CodEFouR »

Combuster wrote:Short answer: GCC is written to assume that DS=ES=SS, and to my knowledge there is nothing you can do about it :(

Shorter answer: get a different compiler or change to DS=ES=SS

All right... Thank you very much for your answer. I think I now have to get the paging mechanism and memory management module working ASAP to satisfy GCC's assumption.

Post by mystran »

Doesn't GCC actually assume that CS=DS=ES=SS? That is, that code and data live in the same flat space?

Post by CodEFouR »

mystran wrote:Doesn't GCC actually assume that CS=DS=ES=SS? That is, that code and data live in the same flat space?

No, actually just DS = ES = SS. Under Linux there is a separate code segment selector, but all the segment descriptors have the same base linear address and the same limit. GCC seems to assume that DS, ES and SS use the same segment descriptor.
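
One way to check the "same base and limit" claim on a given GDT is to decode the descriptors and compare them. A small sketch in C, assuming the legacy 8-byte descriptor layout; all function names are hypothetical, and the descriptor values in the note after the code are typical flat-model values rather than something quoted from Linux in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 32-bit base from a legacy 8-byte segment descriptor. */
uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu)
         | (uint32_t)((d >> 56) & 0xFFu) << 24;
}

/* Extract the raw 20-bit limit field (not scaled by the G bit). */
uint32_t descriptor_limit(uint64_t d)
{
    return (uint32_t)(d & 0xFFFFu)
         | (uint32_t)((d >> 48) & 0xFu) << 16;
}

/* True for a code segment: S=1 and the executable bit set. */
bool descriptor_is_code(uint64_t d)
{
    uint8_t access = (uint8_t)(d >> 40);
    return (access & 0x18u) == 0x18u;
}

/* Two descriptors cover the same linear range when base, limit and
 * granularity match, even if one is code and the other is data. */
bool same_linear_range(uint64_t a, uint64_t b)
{
    bool same_granularity = ((a >> 55) & 1u) == ((b >> 55) & 1u);
    return descriptor_base(a) == descriptor_base(b)
        && descriptor_limit(a) == descriptor_limit(b)
        && same_granularity;
}

/* Sanity check for a flat setup: CS is code, DS is data, same range. */
void assert_flat_pair(uint64_t code, uint64_t data)
{
    assert(descriptor_is_code(code));
    assert(!descriptor_is_code(data));
    assert(same_linear_range(code, data));
}
```

With a code descriptor of 0x00CF9A000000FFFF and a data descriptor of 0x00CF92000000FFFF, `assert_flat_pair` passes: the descriptor types differ, but base, limit and granularity are identical.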

Post by mystran »

Well, the code segment is necessarily separate in the sense that you need a different kind of descriptor for it....

What I meant was: doesn't GCC assume that CS:X points to the same place as DS:X, ES:X, and SS:X?

I mean, for purposes like referencing constants or manipulating function pointers such things can be important. :)