Why code and data segment starts from same address

0xnlbs
Posts: 9
Joined: Mon Apr 21, 2014 4:20 am

Why code and data segment starts from same address

Post by 0xnlbs »

I've checked a lot of guides on both bootloaders and kernels. Everywhere, they create two GDT entries, one for the code segment and another for the data segment, and load their selectors into the CS and DS registers. But both descriptors specify the same base address and the same limit. Why? If they are supposed to be the same, why are there two different segment registers?
alexfru
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Why code and data segment starts from same address

Post by alexfru »

You must have at least two distinct segment descriptors in protected mode, because code and data segments are described differently in those descriptors, and you generally can't use a code segment descriptor for data or a data segment descriptor for code (there are a few exceptions). So you create two descriptors, set CS to point to the one designated for code, and set DS, ES and SS to point to the other one, designated for data.

As for why they usually have the same base address (often 0) and limit (often the equivalent of 4G-1): it simply means that the authors don't want to deal with segmentation. They just want to use all of memory however they please and place code and data anywhere in the entire address space.
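To make the "same base, same limit, but still two descriptors" point concrete, here is a minimal sketch in Python that packs two flat descriptors in the legacy 8-byte GDT layout. The function name encode_descriptor and the constant names are made up for this illustration. The access bytes 0x9A and 0x92 are the values commonly seen in tutorials for ring 0 code and data, and are assumed here rather than taken from any particular guide.

```python
def encode_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a legacy 8-byte x86 segment descriptor into one 64-bit integer.

    Hypothetical helper for illustration; field positions follow the
    standard protected-mode descriptor layout.
    """
    desc = limit & 0xFFFF                  # limit bits 0..15
    desc |= (base & 0xFFFFFF) << 16        # base bits 0..23
    desc |= (access & 0xFF) << 40          # access byte: present, DPL, type
    desc |= ((limit >> 16) & 0xF) << 48    # limit bits 16..19
    desc |= (flags & 0xF) << 52            # flags: granularity, default size
    desc |= ((base >> 24) & 0xFF) << 56    # base bits 24..31
    return desc


FLAT_BASE = 0x00000000
FLAT_LIMIT = 0xFFFFF        # counted in 4 KiB pages, so it covers 4G-1
FLAGS_4K_32BIT = 0xC        # granularity = 4 KiB, 32-bit default size
ACCESS_CODE = 0x9A          # assumed: present, ring 0, code, readable
ACCESS_DATA = 0x92          # assumed: present, ring 0, data, writeable

code_desc = encode_descriptor(FLAT_BASE, FLAT_LIMIT, ACCESS_CODE, FLAGS_4K_32BIT)
data_desc = encode_descriptor(FLAT_BASE, FLAT_LIMIT, ACCESS_DATA, FLAGS_4K_32BIT)

print(f"{code_desc:#018x}")   # 0x00cf9a000000ffff
print(f"{data_desc:#018x}")   # 0x00cf92000000ffff

# The two descriptors differ in exactly one bit: the "executable" type bit.
assert code_desc ^ data_desc == 1 << 43
```

With these assumed values the base and limit fields are identical, and only the type bit that marks a segment as code differs. That one bit is why a single descriptor cannot serve both CS and the data segment registers.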
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Why code and data segment starts from same address

Post by Combuster »

If they are supposed to be same
In principle, they are not. It is simply a very common special case, and even modern OSes, while using it as a basis, don't strictly do it that way.

One of the fundamental security principles says that something should not be both writeable and executable. In the GDT design you see that as separate entry types: code is always executable, never writeable, and possibly readable. Data is always readable, never executable, and possibly writeable.
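The permission rules above can be read straight off the access byte of a descriptor. The following is a small sketch, not a full decoder: segment_permissions is a hypothetical name, and it only handles ordinary code/data descriptors (the ones with the S bit set), ignoring system descriptors, privilege levels and the conforming/expand-down bit.

```python
def segment_permissions(access: int) -> dict:
    """Derive read/write/execute rights from a descriptor access byte.

    Hypothetical helper: only valid for code/data descriptors (S bit = 1).
    """
    if not access & 0x10:
        raise ValueError("not a code/data descriptor (S bit is clear)")

    executable = bool(access & 0x08)   # type bit 3: 1 = code, 0 = data
    rw_bit = bool(access & 0x02)       # readable (code) or writeable (data)

    if executable:
        # Code: always executable, never writeable, readable only if bit set.
        return {"execute": True, "read": rw_bit, "write": False}
    # Data: always readable, never executable, writeable only if bit set.
    return {"execute": False, "read": True, "write": rw_bit}


print(segment_permissions(0x9A))  # typical code descriptor
print(segment_permissions(0x92))  # typical data descriptor
```

No value of the access byte yields both "write" and "execute" through the same descriptor, which is the constraint that forces two entries.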

Then it so happens that x86 has two forms of protection: segmentation and paging. People disliked segmentation so they mostly went for paging. But since segmentation is always enabled they wanted to make sure it didn't get in the way, so what people typically do is create an entry that maps whatever address comes in to the same address going out so that the only visible effect is paging. Then they realize they can't have both executable and writeable permissions with just one entry, so they make two, stick one in CS and the other in the other segment registers.
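As a rough model of "whatever address comes in is the same address going out": segmentation adds the segment base to the offset and checks the offset against the limit. The sketch below assumes a byte-granular effective limit and single-byte accesses. to_linear and SegmentLimitError are invented names for this example, and the non-flat bases are arbitrary.

```python
class SegmentLimitError(Exception):
    """Stands in for the fault the CPU raises on a limit violation."""


def to_linear(base: int, limit: int, offset: int) -> int:
    """Translate a segment offset into a 32-bit linear address.

    Simplified model: `limit` is the last valid byte offset of the segment.
    """
    if offset > limit:
        raise SegmentLimitError(f"offset {offset:#x} exceeds limit {limit:#x}")
    return (base + offset) & 0xFFFFFFFF


# Flat setup: base 0 and maximum limit, so the mapping is the identity.
assert to_linear(0x00000000, 0xFFFFFFFF, 0x1234) == 0x1234

# Non-flat setup (arbitrary example bases): the same offset now names
# different bytes depending on which segment register is used.
code_addr = to_linear(0x00100000, 0xFFFF, 0x1234)
data_addr = to_linear(0x00200000, 0xFFFF, 0x1234)
print(hex(code_addr), hex(data_addr))   # 0x101234 0x201234
```

With different, non-overlapping bases, a pointer to a function and a pointer to a variable with the same numeric value refer to different memory. That is legal, but most compilers targeting 32-bit protected mode assume the flat model, so the flat setup is the path of least resistance.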
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
0xnlbs

Re: Why code and data segment starts from same address

Post by 0xnlbs »

Still not very clear to me. What will happen if I put two different base addresses for the code segment and the data segment, e.g. if they don't overlap? I understand I may not be able to use the full 4GB of virtual memory.
The question sounds very noob, but I should ask: executables have their code and data in the same file, and while executing, the instructions don't get copied anywhere into different segments. So why two segments? The CPU will keep reading the instructions sequentially. What is the problem with that?
Combuster

Re: Why code and data segment starts from same address

Post by Combuster »

the executables have their code and data in same file.
And this is relevant how? Do you want to have to download several .exe files to be able to run a program? :wink:
executing instructions doesn't get copied
You have to copy the instructions from storage to RAM before you can execute them.

And because you are making hardly any sense now, let's check on the basics:
- what registers and table entries are used to determine what instruction is executed next? Describe how.
- what registers and table entries are used to actually execute an instruction? Describe how this works for "push [eax]"
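For readers following along, here is one hedged way to picture the second question. The sketch models which segment each memory access of a 32-bit "push dword [eax]" goes through, assuming default segment rules and no segment override prefix: the instruction fetch uses CS:EIP, the source operand is read through DS:EAX, and the stack write goes through SS:ESP. The function push_mem_eax, the dictionaries and the register values are all invented for illustration, and limit checks, paging and the EIP update are left out.

```python
def push_mem_eax(regs: dict, seg_base: dict, memory: dict) -> list:
    """Model the memory accesses of `push dword [eax]` (simplified).

    `seg_base` maps segment register names to descriptor base addresses,
    `memory` maps linear addresses to 32-bit values. Returns the list of
    (kind, linear_address) accesses in order.
    """
    mask = 0xFFFFFFFF
    accesses = []

    # 1. Fetch the instruction itself through the code segment.
    accesses.append(("fetch", (seg_base["cs"] + regs["eip"]) & mask))

    # 2. Read the operand through the data segment (default for [eax]).
    src = (seg_base["ds"] + regs["eax"]) & mask
    accesses.append(("read", src))
    value = memory.get(src, 0)

    # 3. Decrement ESP, then write the value through the stack segment.
    regs["esp"] = (regs["esp"] - 4) & mask
    dst = (seg_base["ss"] + regs["esp"]) & mask
    accesses.append(("write", dst))
    memory[dst] = value
    return accesses


flat = {"cs": 0, "ds": 0, "ss": 0}             # flat model: all bases are 0
regs = {"eip": 0x1000, "eax": 0x2000, "esp": 0x9000}
memory = {0x2000: 0xDEADBEEF}

trace = push_mem_eax(regs, flat, memory)
print([(kind, hex(addr)) for kind, addr in trace])
```

One instruction touches three segment registers here, which is why CS alone is not enough even when every base is zero.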