OSDev.org

Posted: **Fri May 07, 2010 1:29 pm**

Question 1
How would I get the address of the currently executing instruction?
I don't think you can do

mov eax, eip

Question 2
How can I get a function's address into eax or some other register?
What I mean is, if I have a function

Code:

function1:
;my code
ret

How can I get the address to jump to in order to reach that function?
Note: it would be easy if I modified function1 to return the value in a register like eax, but I don't want to modify the function at all (for example by adding mov eax, [esp] to it).

I know that in C/C++ you can just use &function1, which gives you the address of the function.
But in asm I don't know how to get the address of the function without calling it, copying [esp] into eax and returning that value, which means modifying function1, which I don't want to do...

How does call know where to jump to after it pushes the return address on the stack?

Is it valid to just do mov eax, function1 or mov eax, [function1]? I don't know what these would do or what the difference between them is... (If that is the way to get the equivalent of '&' in asm, then I am being stupid.)

I have only used [] on variables to get the value stored there, but I am wondering whether [] used on function names gives their address.

So maybe all the call instruction does is
this:
push variables
jmp [function1]

If that is it, then I am all set with this question.

Question 3
For registers

Code:

General registers
EAX EBX ECX EDX

Segment registers
CS DS ES FS GS SS

Index and pointers
ESI EDI EBP EIP ESP

Indicator
EFLAGS

Undocumented or special-purpose registers
Control registers are CR0 to CR4, debug registers are DR0 to DR7, test registers are TR3 to TR7, and the protected-mode segmentation registers are GDTR (Global Descriptor Table Register), IDTR (Interrupt Descriptor Table Register), LDTR (Local Descriptor Table Register), and TR (Task Register).

I have tried reading and writing pretty much all of these registers at one time or another,
leaving out the debug registers and test registers (I have never used those, so I have no comment on DR0-DR7 and TR3-TR7).
I am just wondering whether the only read-only register is EIP (or IP), the instruction pointer, or whether you can modify the instruction pointer to run code at a specific address with something like mov eip, eax.

Also, on an operating system like Windows, when you create an .exe file, are you only allowed one code segment (cs), or can you have many code segments that you can switch between?
I know you can do mov cs, ax and save the original cs segment address on the stack, but how can you get eip to point at the correct offset where the starting instruction begins,
usually cs:0x00000000? That would mean you would have to be able to change/modify the eip pointer???

Posted: **Fri May 07, 2010 1:31 pm**

Answer 1:
If you can use the stack, you can call a function which copies the saved instruction pointer (the return address) from the stack and returns it in eax:

Code:

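get_eip:
  mov eax, [esp]   ; call pushed the return address; it is now at [esp]
  ret              ; eax = address of the instruction after 'call get_eip'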

Answer 2:
In NASM at least, you can simply have "mov eax, function" to get the address of function in eax.
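
To connect this with the &function1 question, here is a small C sketch (C rather than NASM so it can be compiled and checked; `function1`, `func_ptr`, `address_of_function1` and `call_through` are hypothetical names). Taking the function's address without calling it plays the role of `mov eax, function1`, and calling through the pointer plays the role of `call eax`.

```c
/* Hypothetical stand-in for function1 from the question. */
int function1(void)
{
    return 42;
}

/* Pointer type for functions shaped like function1. */
typedef int (*func_ptr)(void);

/* Roughly "mov eax, function1": get the address without calling it. */
func_ptr address_of_function1(void)
{
    return &function1;   /* plain "function1" means the same thing in C */
}

/* Roughly "call eax": transfer control to whatever address we were given. */
int call_through(func_ptr target)
{
    return target();
}
```

A compiler targeting 32-bit x86 will typically turn `call_through` into an indirect call (or, after optimization, an indirect jump), though the exact code depends on the compiler.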

Answer 3:
Instructions like jmp and call modify the instruction pointer. You can do things like "jmp eax", which is the logical equivalent of what "mov eip, eax" would be. Jumps are even better because they can be conditional, which is why they're separate instructions.

Posted: **Fri May 07, 2010 1:54 pm**

Answer 1:
If you can use the stack, you can call a function which copies the saved execution pointer from the stack and returns it in eax:

Code:
get_eip:
  mov eax, [esp]
  ret

Are you saying to use this function, but before calling it,
do a push eip? Is that even a valid instruction?

If so, I would think your function should be:

Code:

get_eip:
  mov eax, [esp+4] ; the 32-bit return address is at [esp], so skipping it gives you the pushed eip
  ret

But maybe you had something different in mind?
Either way, this seems like a good way to do it. Hopefully it works.

Question 2
Oops, my stupidity: mov eax, functionname should give you the address to jump to in order to reach the function. Anyway, is there any difference between functionname and [functionname]? I know that for variables there is (the name means the address and [name] means the value), but for a function I don't know whether there is anything other than an address???

Question 3
Just to confirm: you can never mov stuff into eip directly, like mov eip, someregister?
If that is the case, can you have more than one code segment in your program? If so, when switching to a different cs segment by doing mov cs, ax, does the eip or ip instruction pointer automatically get set back to the first instruction (0x00000000) in the new cs segment??? Because if not, there will be problems having more than one cs segment.

Posted: **Fri May 07, 2010 2:08 pm**

The point is that "call" pushes the current instruction pointer (which already holds the address of the instruction after the call) onto the stack before it jumps to the called function. That's the only way in which it is different from "jmp", and it is how the called function knows where to jump back to ("ret" pops the saved instruction pointer and jumps to it). The function I posted takes advantage of this fact by copying the saved instruction pointer into eax so it can be returned.
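
A toy model may make this concrete. The C sketch below is not real CPU code: it simulates only eip, eax and a downward-growing stack (indexed in 32-bit slots instead of bytes), and it assumes the `call` instruction is 5 bytes long, as a 32-bit `call rel32` is. The struct and function names are hypothetical. It shows why `get_eip` ends up with the address of the instruction right after `call get_eip` in eax.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine state, for illustration only. */
struct toy_cpu {
    uint32_t eip;                 /* address of the next instruction */
    uint32_t esp;                 /* simplified: index of the top slot in stack[] */
    uint32_t eax;
    uint32_t stack[STACK_WORDS];
};

/* "call target": push the address of the next instruction, then jump. */
void toy_call(struct toy_cpu *cpu, uint32_t call_insn_len, uint32_t target)
{
    uint32_t return_address = cpu->eip + call_insn_len;
    cpu->stack[--cpu->esp] = return_address;  /* the stack grows downward */
    cpu->eip = target;
}

/* "ret": pop the saved return address into eip. */
void toy_ret(struct toy_cpu *cpu)
{
    cpu->eip = cpu->stack[cpu->esp++];
}

/* Body of get_eip: "mov eax, [esp]" followed by "ret". */
void toy_get_eip(struct toy_cpu *cpu)
{
    cpu->eax = cpu->stack[cpu->esp];
    toy_ret(cpu);
}
```

For example, starting with eip = 0x1000 and an empty stack, `toy_call(&cpu, 5, 0x2000)` followed by `toy_get_eip(&cpu)` leaves both eax and eip at 0x1005, with the stack back where it started.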

[function] and function are different because the former accesses memory at function and the latter is the address of function. In this case, you really should read the NASM manual first: it's a basic syntax question.

In general, in protected mode, all code segments are base 0x00000000, so there are no problems switching cs in running code (eip is not changed when loading cs). If you need to jump to a new code segment that is at a different base (like when booting), you can jump far like this: "jmp <cs selector>:<address>" safely. (Also note that mov can't load cs at all; cs is only changed by instructions such as far jmp, far call, far ret and iret, or by interrupts.)
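
As a rough illustration of what a 0x00000000 base means, here is a C sketch of how a protected-mode segment turns an offset such as eip into a linear address. It ignores paging and treats the limit as a byte-granular, expand-up limit (both simplifying assumptions); `translate` is a hypothetical helper.

```c
#include <stdint.h>

/* Simplified segment translation: linear = base + offset, after a limit check.
   Returns 1 and stores the linear address on success, or 0 where a real CPU
   would raise a fault. Paging and expand-down segments are ignored. */
int translate(uint32_t base, uint32_t limit, uint32_t offset, uint32_t *linear)
{
    if (offset > limit)
        return 0;
    *linear = base + offset;   /* uint32_t keeps the result in the 32-bit range */
    return 1;
}
```

With a flat segment (base 0, limit 0xFFFFFFFF) the linear address is just the offset, which is why reloading cs with another flat selector does not disturb the running code.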

Posted: **Fri May 07, 2010 2:16 pm**

Please allow me to reiterate the basics:

Code:

get_eip:

is called a label. Once you have defined this label, you can use the letters "get_eip" to refer to the address where the label is sitting.

The square bracket operators act exactly like * in C. You use them to find out what data is stored at that memory location. So saying [get_eip] will get you some bytes of opcodes -- because that is what is stored at that memory address.
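
In C terms, a data label and the brackets map onto `&` and `*`. A minimal sketch, assuming a hypothetical NASM data label `my_var: dd 1234`, written in C so it can be compiled and checked (`label_address` and `label_contents` are hypothetical names):

```c
#include <stdint.h>

/* Hypothetical data label, like "my_var: dd 1234" in NASM. */
uint32_t my_var = 1234;

/* "mov eax, my_var": the address where the label sits. */
uint32_t *label_address(void)
{
    return &my_var;
}

/* "mov eax, [my_var]": the data stored at that address, like *p in C. */
uint32_t label_contents(void)
{
    uint32_t *p = label_address();
    return *p;
}
```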

If you don't have a label to use, then the function that Nick_Johnson gave you works.
And no, you can't move values into or out of EIP directly.

Posted: **Fri May 07, 2010 3:11 pm**

bewing wrote:... Nick_Johnson ...

Since when does anyone use camel case and underscores simultaneously?

Posted: **Fri May 07, 2010 6:29 pm**

Code:

In general, in protected mode, all code segments are base 0x00000000, so there are no problems switching cs in running code (eip is not changed when loading cs). If you need to jump to a new code segment that is at a different base (like when booting), you can jump far like this: "jmp <cs selector>:<address>" safely.

Regarding the quote: I am all set with questions 1 and 2, but here are my problems with question 3...

Question 1: real mode

OK,

In real mode, when I program, I can have different code segments that start at different places, so I need to reach them with a far jump (cs:ip).
When I do a far jump in 16-bit real mode, does the far jump load the cs register with the new segment and ip with the new offset?

In other words, is jmp segment:offset equivalent
to
mov cs, segment address
mov ip, offset address

Question 2: protected mode

Going by what you wrote about a 0x00000000-based cs:
I thought the cs value that you load is not zero-based; instead, the upper 13 bits of the 16-bit cs register correspond to the GDT entry it looks up to find info such as size, access attributes, read/write, etc., and the low 3 bits are for the privilege ring 0, 1, 2, etc.

eip on a 32-bit machine can in theory address all 4 GB of memory that a 32-bit computer can have.

So my question is: where are you getting this zero-based stuff from???

Question 3
I am assuming that when a userland program gets a memory access violation, it is because it is trying to access a part of memory that the GDT or LDT has protected.
If this is the case, is there a way for a userland program to make a call that changes, adds, or removes an entry in the GDT or LDT?

What would happen if somebody used inline asm to change the LDTR or GDTR to point to their own cooked-up LDT or GDT?
When I was creating a dummy OS, I created my own GDT and added my own entries. What is stopping me from doing the reverse and making Microsoft's GDTR point to my own table?

Basically, what is stopping a smart person in userland from giving complete access to whatever program he wants by changing the GDTR pointer, ...?

I am assuming it is illegal to do mov gdtr, myaddressofnewgdt, but what makes it (or lgdt, etc.) illegal?
How is the operating system preventing these instructions from being executed in userland?
I know the GDT can protect memory, but what structure allows your OS to prevent particular deadly asm instructions from executing?

Question 4
In userland programs, when I objdump the .o files, I see that there is
one segment labeled .data (this is for global variables, tables, etc., where ds points)
one segment labeled either .code or .text (which is usually read-only and is where your code executes)
one segment labeled .bss (is this the stack segment, i.e. where ss points?)
Sometimes I have an .rodata segment. Is this what they call the heap segment?

I am just wondering because I hear a lot about the heap as opposed to the stack. I don't know what exactly the heap is or which segment register points at it; maybe that is es's job or some other segment register's.

Posted: **Fri May 07, 2010 8:25 pm**

Sorry for mangling your name, Nick.

1. Yes, in real mode, doing a far jump sets the value of CS, and sets IP to the offset -- exactly as you said.
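
A small C sketch of the real-mode arithmetic, assuming the usual rule that the physical address is segment * 16 + offset. `real_mode_cpu`, `far_jump` and `next_fetch_address` are hypothetical names; they only model the two registers a far jump sets.

```c
#include <stdint.h>

/* Real mode: physical address = segment * 16 + offset. The result can exceed
   1 MB (up to 0x10FFEF); on real hardware those addresses wrap to the low
   64 KB unless the A20 line is enabled. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Only the two registers involved in a far jump. */
struct real_mode_cpu {
    uint16_t cs;
    uint16_t ip;
};

/* "jmp segment:offset" loads CS and IP together in one instruction. */
void far_jump(struct real_mode_cpu *cpu, uint16_t segment, uint16_t offset)
{
    cpu->cs = segment;
    cpu->ip = offset;
}

/* Where the next instruction will be fetched from. */
uint32_t next_fetch_address(const struct real_mode_cpu *cpu)
{
    return real_mode_address(cpu->cs, cpu->ip);
}
```

For example, `real_mode_address(0x07C0, 0x0000)` and `real_mode_address(0x0000, 0x7C00)` both give 0x7C00, the classic boot sector address, which shows that many segment:offset pairs name the same byte.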

2. When Nick was talking about a 0 base, he meant the base address that is loaded from the GDT entry into the hidden part of the segment register. He was not talking about the actual value of the segment register, which is often 8 or 0x10.
In the GDT entry for the segment, you often simply use a base of 0 and a limit of 4GB. This makes life easier because it is more compatible with long mode (which you will probably want to use later), and you do not have to save and restore the segment registers on task switches, because you know what their values should be.
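
For reference, here is a C sketch that packs a legacy 8-byte GDT descriptor from base, limit, access byte and flags (the field layout follows the Intel manuals; the function names are hypothetical). A flat ring-0 code segment is base 0, limit 0xFFFFF with 4 KB granularity, access byte 0x9A (present, DPL 0, code, readable) and flags 0xC (4 KB granularity, 32-bit).

```c
#include <stdint.h>

/* Pack a legacy 8-byte GDT descriptor. limit is the 20-bit limit field,
   access is the access byte, flags is the 4-bit nibble (G, D/B, L, AVL). */
uint64_t gdt_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);              /* bits 0-15:  limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* bits 16-39: base 23:0   */
    d |= (uint64_t)access << 40;                  /* bits 40-47: access byte */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;   /* bits 48-51: limit 19:16 */
    d |= (uint64_t)(flags & 0xF) << 52;           /* bits 52-55: flags       */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;   /* bits 56-63: base 31:24  */
    return d;
}

/* With G=1 the limit counts 4 KB pages, so 0xFFFFF covers the full 4 GB. */
uint32_t effective_limit(uint32_t limit, int granularity_4k)
{
    return granularity_4k ? (limit << 12) | 0xFFF : limit;
}
```

`gdt_descriptor(0, 0xFFFFF, 0x9A, 0xC)` yields 0x00CF9A000000FFFF, and the matching flat data segment with access byte 0x92 yields 0x00CF92000000FFFF.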

3. No -- LDT and GDT are called "segment-level protection mechanisms". Segment-level stuff is almost never used anymore. If a userland app gets a memory error, it is most likely from a virtual memory paging error. "Paged memory mapping" is completely separate from segments. You will need to read up about it later, when you implement pmode.
Start here: http://wiki.osdev.org/Memory_management

What stops a person from messing with memory is something called the CPL or Privilege level. Userland has a CPL of 3. The kernel has a CPL of 0. You can only change important system registers if the CPL is 0 at that moment. When userland apps are running, the CPL is never 0. If a userland app tries to do an LGDT command, the CPU will generate an instant GPF exception, and start your kernel running to terminate the user program.
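
A toy model of that check in C. This is a simplification, not how the CPU is implemented: it assumes the CPL is simply the low two bits of the loaded CS selector, and the selector values 0x08 (kernel code) and 0x1B (user code) are only common conventions, not requirements. The names are hypothetical.

```c
#include <stdint.h>

enum outcome { EXECUTED, GENERAL_PROTECTION_FAULT };

/* The current privilege level sits in the low two bits of CS. */
unsigned current_privilege_level(uint16_t cs)
{
    return cs & 3u;
}

/* Privileged instructions such as LGDT, LIDT or MOV to CR0 only run at CPL 0;
   anywhere else the CPU raises #GP and the kernel gets control. */
enum outcome try_privileged_instruction(uint16_t cs)
{
    return current_privilege_level(cs) == 0 ? EXECUTED : GENERAL_PROTECTION_FAULT;
}
```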

4. .data is global variables that are initialized to specific values.
.bss is global variables that are allocated, but are initialized to all zeroes.
.rodata stands for "read only data" -- and this is where constant globals go -- for example, all the strings in your code are usually stored there.

The stack is usually separate. It is generally allocated at runtime. This is OS-specific.

A stack is something that a single program has.
A "heap" is the pool of memory that gets allocated dynamically at runtime, for example with malloc(). The kernel hands that memory out to user apps, and it usually keeps a heap of its own for kernel allocations.
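
A minimal C sketch of where things typically end up with a GCC-style toolchain. The placement in the comments is conventional, not something the C standard guarantees, and the variable and function names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

int initialized_global = 5;        /* .data: has a specific initial value */
int zeroed_global;                 /* .bss: allocated, starts as all zeroes */
const char greeting[] = "hello";   /* .rodata (.rdata on PE): read-only constants */

/* Copies greeting into a fresh heap buffer; the caller must free() it. */
char *copy_greeting_to_heap(void)
{
    size_t len = strlen(greeting) + 1;   /* automatic variable: stack (or a register) */
    char *heap_copy = malloc(len);       /* heap: allocated at runtime */
    if (heap_copy != NULL)
        memcpy(heap_copy, greeting, len);
    return heap_copy;
}
```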

Posted: **Sat May 08, 2010 1:45 am**

A stack is something that a single program has.

So when you execute a userland .exe file,
I assume the OS loader program would
first set the ds register to the beginning of the .data segment.
Or does it set it to .rodata or .bss?
Either way, when you compile and build the .exe file, there must be a lot of ds switching between the three segments .data, .rodata and .bss to get/set the different variables used in the code...

Either way, there is only one .text (code) segment, so the loader loads it into memory and jumps to it, and that is it for the code segment. Basically it jumps to the entry point given by one of the header fields, usually _start or main, etc.

As for the stack, where does the loader create/put it, and how much stack space does it use? Is there some PE header setting for the size of the stack? There must be a default value...
As well as where to point the beginning of the stack (the ss register).

As for

What stops a person from messing with memory is something called the CPL or Privilege level. Userland has a CPL of 3. The kernel has a CPL of 0. You can only change important system registers if the CPL is 0 at that moment. When userland apps are running, the CPL is never 0. If a userland app tries to do an LGDT command, the CPU will generate an instant GPF exception, and start your kernel running to terminate the user program.

Correct me if I am wrong, but there are 3 bits in the cs register used for this, which means you can have a max of 8 different levels.
If this is true, then what is the difference between the levels, like 0 to 1, 0 to 2, or 1 to 4, etc.? There must be distinguishing features for each level????

Posted: **Sat May 08, 2010 2:19 am**

Correct me if I am wrong, but there are 3 bits in the cs register used for this, which means you can have a max of 8 different levels.
If this is true, then what is the difference between the levels, like 0 to 1, 0 to 2, or 1 to 4, etc.? There must be distinguishing features for each level????

There are only 2 bits used for privilege. Do you have copies of the Intel and/or AMD manuals? There is a great deal of information in them about this.
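
The selector layout is small enough to show in code. A C sketch (hypothetical helper names) that splits a selector into its 13-bit index, 1-bit table indicator and 2-bit requested privilege level; with only two privilege bits there are four levels, 0 to 3:

```c
#include <stdint.h>

/* Fields of a protected-mode segment selector (the value in cs, ds, ...). */
struct selector_fields {
    unsigned index;  /* bits 15-3: descriptor number in the GDT or LDT */
    unsigned ti;     /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 1-0: requested privilege level, 0..3 */
};

struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;
    f.index = selector >> 3;
    f.ti = (selector >> 2) & 1u;
    f.rpl = selector & 3u;
    return f;
}

/* Byte offset of the descriptor inside its table (each entry is 8 bytes). */
uint32_t descriptor_offset(uint16_t selector)
{
    return (uint32_t)(selector & 0xFFF8u);
}
```

For example, 0x08 decodes to GDT entry 1 at RPL 0, and 0x1B decodes to GDT entry 3 at RPL 3, the kind of selector a kernel might use for user-mode code.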

With regard to segments: most modern operating systems don't use segments the way you imagine. You are looking in .o files. If you look in a linked PE, ELF or even a binary file, you will see that the segments have been combined into a single 'segment'. You can of course use the segment information to make the code pages executable and not writable, etc.

The bases of cs, ds and ss are typically all set to the same (usually 0) value, and .text, .data and .rodata are all in the same memory space. The primary use of the segments is to set privilege (in the case of cs) or the segment limit (in the case of 32-bit cs, ds and ss). Segments are so rarely used that they have more or less been dropped in the x86_64 architecture.

Some executables do contain information about where the esp (or rsp) should initially point, but if you write your own OS you can put the stack where you wish.

Once again I stress that you should read the Intel and AMD books

- gerryg400

Posted: **Sat May 08, 2010 9:15 am**

Yes, you are getting confused about the use of the word "segments". There are some words that programmers reused many times to mean many different things, that have nothing to do with each other. "Segments" is the most-often reused word of all.

The CPU chip has "segment registers". You will almost never use these. You set them once when you start a program running. You typically set them all to the same value -- which basically turns them off.

When you compile a program, you get an object file. The object file has "segments" in it. The "segments" in an object file have nothing to do with any other kind of segments, anywhere. You do not need to match the segment registers to object file segments. Once an object file has been "linked" it doesn't have any "segments" left in it anymore, anyway.

Typically, you set the default stack size to 4K. That is the most convenient size. It can grow after that. Your OS also probably needs to enforce a maximum size. You get to decide how this is done.
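
One possible way to implement "it can grow, up to a maximum" is to grow the stack from the page-fault handler. The C sketch below is purely hypothetical (the struct, the function and the policy are assumptions, not any particular OS's design); it only does the bookkeeping and leaves the actual page mapping as a comment. It assumes `top` and `max_size` are page-aligned.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical per-thread stack bookkeeping. The stack grows downward
   from top; mapped_bottom is the lowest address currently mapped. */
struct user_stack {
    uint32_t top;            /* highest address, exclusive */
    uint32_t mapped_bottom;  /* lowest currently mapped address */
    uint32_t max_size;       /* limit the OS enforces, in bytes */
};

/* Called from a (hypothetical) page-fault handler. Returns 1 if the fault
   lies below the mapped stack but within max_size, after growing the stack
   to cover it; returns 0 if the program should be terminated instead. */
int grow_stack_on_fault(struct user_stack *s, uint32_t fault_addr)
{
    uint32_t lowest_allowed = s->top - s->max_size;
    uint32_t new_bottom;

    if (fault_addr >= s->mapped_bottom || fault_addr < lowest_allowed)
        return 0;                                /* not a stack-growth fault */
    new_bottom = fault_addr & ~(PAGE_SIZE - 1);  /* round down to a page */
    /* A real kernel would map zeroed pages for [new_bottom, mapped_bottom) here. */
    s->mapped_bottom = new_bottom;
    return 1;
}
```

Starting with one 4K page mapped, as suggested above, the first fault just below it grows the stack by one page, while a fault beyond `max_size` is refused.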

Posted: **Sat May 08, 2010 1:35 pm**

First off, I am confused about .exe files supposedly having everything in one place with no segments.
When I look at the format of a PE file, I see
the DOS header, PE header, etc., which tell the OS loader where to load the exe, where the code begins, the stack size, etc., as well as where the different segments or blocks of data/code begin.
The following are typical blocks/segments in a PE: .data, .text, .import, .export, .comment, .reloc, etc.

So I don't get where you are saying it is all one thing... It looks to me like it uses the same kind of structure that an object file uses, with different header structures so that the OS loader knows how to execute it.

I would think the only real difference between a standalone object file with no external references and the equivalent PE exe would be the header structure and maybe some addresses in the code...

But other than that, if you knew what the PE headers had to be, you should be able to add them to the top of the file using a hex editor, and then the program should run fine, provided there weren't any external references.

Basically, for a standalone object with no external references, what does the linker do to turn that object file into the PE .exe file? I would say it just has to create the DOS, PE and optional PE headers,
because there are no addresses it has to resolve...

Once again I stress that you should read the Intel and AMD books

I used to have the Intel manuals; back when I had them, there were 3 volumes.
I had them in PDF format, but I can't seem to find them anymore.
Does anybody have a link to all the Intel manuals (latest versions would be nice)?

As for AMD, I never looked into this because my machine has always been Intel, but just for
kicks, does anybody have all the current AMD manuals in PDF, or a link to them?

Thanks for your help

Posted: **Sat May 08, 2010 2:02 pm**

Basically, for a standalone object with no external references, what does the linker do to turn that object file into the PE .exe file? I would say it just has to create the DOS, PE and optional PE headers,
because there are no addresses it has to resolve...

It takes the segments (you saw these when you looked in the obj file !!) and combines them into a format that the OS can load into memory. Often this means 'fixups' in the code segment so it can access the relocated data. It also makes sure that the OS will know the start address.
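
To make "fixups" concrete, here is a simplified relocation pass in C. It is loosely modelled on PE base relocations but is not the real .reloc format (which groups entries by page and has several entry types); it assumes each fixup is the offset of a little-endian 32-bit absolute address that was computed for a preferred load address. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* x86 stores 32-bit values little-endian; read/write them byte by byte so
   this also works on other hosts and at unaligned offsets. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Add the load-address delta to every recorded absolute address. */
void apply_fixups(uint8_t *image, const uint32_t *fixup_offsets, size_t count,
                  uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;  /* unsigned math wraps correctly */
    size_t i;

    for (i = 0; i < count; i++) {
        uint8_t *slot = image + fixup_offsets[i];
        write_le32(slot, read_le32(slot) + delta);
    }
}
```

For example, the 5-byte instruction `mov eax, [0x00401010]` (A1 10 10 40 00) linked for 0x00400000 but loaded at 0x00500000 needs the fixup at offset 1, which turns it into `mov eax, [0x00501010]`.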

The concepts you are trying to understand are very simple. But they are not easily explained in a forum. Generally they require tables and figures with accompanying text. The wiki or the broader internet (try google) has all the information. You really need to study these things in detail.

You can also find the manuals with google. If you don't know how to use google you can follow these links

http://www.intel.com/products/processor/manuals/

http://developer.amd.com/documentation/ ... fault.aspx

- gerryg400

Posted: **Sat May 08, 2010 2:28 pm**

gerryg400 wrote:
Basically, for a standalone object with no external references, what does the linker do to turn that object file into the PE .exe file? I would say it just has to create the DOS, PE and optional PE headers,
because there are no addresses it has to resolve...
It takes the segments (you saw these when you looked in the obj file !!) and combines them into a format that the OS can load into memory. Often this means 'fixups' in the code segment so it can access the relocated data. It also makes sure that the OS will know the start address.

Both PE executables and ELF binaries (and pretty much any non-a.out format) keep their segment information.

After all, how else would the OS mark their permissions correctly? (For W^X and such)
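
As an illustration of that point, here is a C sketch of how a loader might turn ELF program-header flags into page permissions while enforcing W^X. The PF_R, PF_W and PF_X values are the ones from the ELF specification; `page_perms` and `perms_from_elf_flags` are hypothetical.

```c
#include <stdint.h>

/* ELF program-header permission bits (p_flags). */
#define PF_X 0x1u
#define PF_W 0x2u
#define PF_R 0x4u

/* Page permissions a hypothetical loader would request for one segment. */
struct page_perms {
    int read, write, execute;
};

/* Map p_flags to page permissions, refusing any segment that is both
   writable and executable (a W^X policy). Returns 0 on refusal. */
int perms_from_elf_flags(uint32_t p_flags, struct page_perms *out)
{
    if ((p_flags & PF_W) && (p_flags & PF_X))
        return 0;
    out->read = (p_flags & PF_R) != 0;
    out->write = (p_flags & PF_W) != 0;
    out->execute = (p_flags & PF_X) != 0;
    return 1;
}
```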

a few asm ?
