Data before code in segment

Posted: Wed Nov 13, 2013 5:42 pm
by yee1
Hey, take a look at this:

Code:

mov eax, 12h

program:

data:

I use segmentation. When I use this layout in my kernel, strange things happen after it enters protected mode: Bochs reports errors about the segment limit, and the code that runs after the switch (as seen in the Bochs debugger) looks scrambled in some strange way. I don't know why.

The "program" label is the base address of CS.
The "data" label is the base address of DS.

And when I do it like this (with the instruction moved below the "program" label):

Code:

program:
mov eax, 12h

data:

then everything is fine.



My kernel starts at 0:8000h with org 0 declared.

What is the cause of this?

Are there any limitations that I don't know about?

Re: Data before code in segment

Posted: Wed Nov 13, 2013 6:53 pm
by Brendan
Hi,
yee1 wrote: program label is base address of CS
data label is base address of DS

yee1 wrote: My kernel starts at 0:8000h with org 0 declared.

yee1 wrote: What is the cause of it? Are there any limitations that I don't know about?

In general, the virtual address that the assembler thinks a piece of code or data starts at must be equal to the address where that piece of code or data was actually loaded, minus the segment base address.

You told the assembler "org 0" and you're only using one section (the default ".text" section), so the assembler thinks that both labels (in the ".text" section) are offsets from 0x0000. When you set the CS base and the DS base to different addresses, this can't be right for both segments.
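
To put some numbers on that, here is a rough sketch of the first layout. It assumes NASM with flat binary output, 16-bit code at the point of the switch, and an image loaded at linear 0x8000; CODE_SEL, pm_entry and counter are made-up names, and the GDT setup and CR0 switch are left out. With the CS base set to the address of "program", every absolute code offset (such as a far jump target) is too large by the size of whatever sits in front of "program", so execution can start in the middle of an instruction. In the second layout "program" is at offset 0, so the code offsets happen to match.

```nasm
; Sketch only, not the original kernel. Assumes NASM (nasm -f bin) and an
; image loaded at linear 0x8000. CODE_SEL, pm_entry and counter are
; hypothetical names; GDT setup and the CR0.PE switch are omitted.

CODE_SEL equ 0x08               ; assumed selector of the code descriptor

        org 0
        bits 16

        mov eax, 12h            ; 6 bytes in 16-bit code: 66h, B8h, imm32

program:                        ; offset 0x0006 from "org 0" = linear 0x8006
        ; With CS base = linear address of "program" (0x8006):
        ;   jmp dword CODE_SEL:pm_entry
        ; would go to 0x8006 + pm_entry, but pm_entry really lives at
        ; 0x8000 + pm_entry, so execution starts 6 bytes too far along.
        ; An offset that satisfies the rule is "label minus base label":
        jmp dword CODE_SEL:(pm_entry - program)

        bits 32
pm_entry:
        ; Same rule for data when DS base = linear address of "data":
        mov eax, [counter - data]
        jmp $                   ; relative jumps are unaffected by the base

data:
counter: dd 0
```

The label arithmetic is shown only to illustrate the rule; doing it by hand for every absolute reference gets error-prone quickly.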

To fix the problem you have 3 choices:
  • set the CS segment base and the DS/ES/SS segment base to 0x8000, so that "org 0" is correct for all labels; or
  • use a CS segment override prefix on every instruction that accesses data you've placed in the ".text" (code) section; or
  • put code in the ".text" section and data in the ".data" section, tell the assembler or linker that both sections start at offset 0x0000 in their corresponding segments, then set the CS base to the address where the ".text" section was actually loaded and the DS/ES/SS base to the address where the ".data" section was actually loaded.
Please note that using segmentation causes unnecessary complications (and bad performance) and doesn't prevent any bugs (it just causes more bugs, while making a very small number of trivial bugs slightly easier to debug). For this reason I'd recommend forgetting about segmentation as much as possible, and using the first option (set CS segment base and DS/ES/SS segment base to 0x8000).
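
As a minimal sketch of that first option, something like the following should do. It assumes NASM with flat binary output, an image loaded at linear 0x8000, and real-mode CS = DS = 0x0800 on entry (so that "org 0" offsets are already valid before the switch). KERNEL_BASE, STACK_TOP, pm_entry, counter and the GDT labels are names invented for the example, and the stack location in particular is a guess that would need to fit your memory map.

```nasm
; Sketch only. Assumes NASM (nasm -f bin), an image loaded at linear 0x8000,
; and real-mode CS = DS = 0x0800 on entry so that "org 0" offsets are valid.
; KERNEL_BASE, STACK_TOP, pm_entry, counter and the gdt labels are hypothetical.

KERNEL_BASE equ 0x8000          ; load address from the original post
STACK_TOP   equ 0x10000         ; assumed free stack offset within the segment

        org 0
        bits 16
start:
        cli
        lgdt [gdtr]             ; DS:gdtr, valid because real-mode DS = 0x0800
        mov eax, cr0
        or  eax, 1              ; set CR0.PE
        mov cr0, eax
        jmp dword 0x08:pm_entry ; "org 0" offset is right: CS base = 0x8000

        bits 32
pm_entry:
        mov ax, 0x10            ; data selector, same base as the code segment
        mov ds, ax
        mov es, ax
        mov ss, ax
        mov esp, STACK_TOP
        mov eax, [counter]      ; a plain label works for data too
        jmp $

counter: dd 0

        align 8
gdt:
        dq 0                            ; null descriptor
        ; code descriptor, selector 0x08
        dw 0xFFFF                       ; limit 15:0
        dw KERNEL_BASE & 0xFFFF         ; base 15:0
        db (KERNEL_BASE >> 16) & 0xFF   ; base 23:16
        db 0x9A                         ; present, ring 0, code, readable
        db 0xCF                         ; G=1 (4 KiB), D=1 (32-bit), limit 19:16
        db (KERNEL_BASE >> 24) & 0xFF   ; base 31:24
        ; data descriptor, selector 0x10
        dw 0xFFFF                       ; limit 15:0
        dw KERNEL_BASE & 0xFFFF         ; base 15:0
        db (KERNEL_BASE >> 16) & 0xFF   ; base 23:16
        db 0x92                         ; present, ring 0, data, writable
        db 0xCF                         ; G=1 (4 KiB), B=1 (32-bit), limit 19:16
        db (KERNEL_BASE >> 24) & 0xFF   ; base 31:24
gdt_end:

gdtr:
        dw gdt_end - gdt - 1            ; size of the GDT minus one
        dd KERNEL_BASE + gdt            ; linear address, not a segment offset
```

The one place where the base still has to be added by hand is the GDTR, because the CPU treats that value as a linear address rather than as an offset into any segment.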

However, it's still a good idea to put code in the ".text" section and data in the ".data" section, as this prevents code and data from getting all mixed up in memory. Mixing them causes "false sharing" performance problems: the CPU sees you modifying a cache line that contains both code and data, and thinks you're doing some self-modifying code thing even when the code in that cache line wasn't modified.

Also note that once you've got paging set up it's even better to use ".text" (executable code), ".rodata" (read-only data), ".data" (read/write data) and ".bss" (read/write data that is expected to be initialised to zero), as this lets your OS set up each page's attributes so that no data can be executed, code and read-only data can't be modified, etc. It also allows you to implement "allocate on write" for the ".bss" section, where the first write to a page in the ".bss" causes that page to be allocated (so that you avoid wasting RAM on pages that aren't actually used). Of course there's a whole lot of tricks like that (memory-mapped files are the next one) which can end up saving a massive amount of RAM if they're done right, and using less RAM means you get more RAM left over for other things (like disk caches) that improve performance a lot.
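
The section split could look roughly like this in NASM. This is only a sketch: it assumes flat binary output and one common segment base (so all sections share "org 0"), the 64-byte alignment is an assumed cache line size, and ticks, banner and scratch are made-up names.

```nasm
; Sketch only. Assumes NASM (nasm -f bin) and one common segment base, so
; all sections share "org 0". The 64-byte alignment is an assumed cache
; line size; ticks, banner and scratch are hypothetical names.

        org 0
        bits 32

section .text                   ; executable code only
start:
        inc dword [ticks]       ; the write lands in .data, not next to code
        mov esi, banner
        mov [scratch], esi
        jmp $

section .rodata align=64        ; read-only data
banner: db "kernel", 0

section .data align=64          ; initialised read/write data
ticks:  dd 0

section .bss align=64           ; takes no file space; the loader or startup
scratch: resd 1                 ; code still has to zero it
```

Once per-page attributes matter, the alignment would need to go up to the page size (4096 bytes), since protection is applied to whole pages.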


Cheers,

Brendan