RealMode Segmentation questions for Emulator
Hey guys, I understand segmentation in real mode in general, but not so much when it comes to the actual opcodes and how it all works. I was testing my OS's opcode decoder against Bochs: I read back some code from the VGA BIOS with both and compared the results. My OS prints out
Mov ax, word[bp+18] for 8b 46 12
And Bochs prints out Mov ax, word ptr ss:[bp+18] for the same thing (8b 46 12)
But if I use the NASM disassembler, it chokes and fails to print the right instruction; it shows something like mov ax, 4090.
Anyway...
Aren't mov ax, word[bp+18] and mov ax, word ptr ss:[bp+18] the same thing?
My only worry is that my real-mode emulator will read the value at address [bp+18], whereas I think
Bochs is reading it from ss+bp+18?
Now, from what I understand, Bochs starts out with
CS at 0x9300
DS at 0x9300
SS at 0x9300
ES at 0x9300
Does this matter? I set my emulator to those values, but my emulator doesn't use CS or DS when it executes any opcodes. I'm used to 32-bit protected mode, where I just do mov some_register, dword [address], but I guess the compiler and CPU do more than I understand at the moment. So if you wouldn't mind explaining this to me, that would be great!
Thanks, Matt
Re: RealMode Segmentation questions for Emulator
Every memory access involves a segment register.
You don't notice it in protected mode, because most operating systems are nice and map the same addresses to CS, DS, ES, and SS.
When you fetch opcodes from memory, you must always use CS. This is important when a far jump, far call, or far return occurs, because those will change the value of CS.
When you access memory in general, the segment is usually DS. However, if the address calculation uses a stack register (BP, SP, EBP, or ESP) as its base, the segment will be SS instead. A segment override prefix overrides these defaults. (String instructions such as MOVS, CMPS, STOS, SCAS, and INS also address a memory operand at ES:DI; segment override prefixes have no effect on that operand.)
When you push or pop data on the stack, the address calculation involves SP so the segment will always be SS.
Your example compares "Mov ax, word[bp+18]" and "Mov ax, word ptr ss:[bp+18]". Those are equivalent, because the effective address calculation involves BP and therefore the default segment register will be SS. (NASM produces gibberish because NASM thinks you want to disassemble 32-bit code.)
You should also check out the wiki article: Segmentation
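To make the default-segment rule concrete for a 16-bit decoder, here is a minimal sketch in C that picks the default segment for a 16-bit ModRM memory operand. The enum and function names are hypothetical, and the sketch only covers 16-bit addressing (no 0x67 address-size prefix, no SIB byte). In 16-bit addressing the BP-based forms are r/m = 2 ([bp+si]), r/m = 3 ([bp+di]), and r/m = 6 with mod != 0 ([bp+disp]); mod = 0 with r/m = 6 is a plain [disp16] and uses DS. The enum order follows the x86 segment-register encoding (ES=0, CS=1, SS=2, DS=3, FS=4, GS=5).

```c
#include <stdint.h>

/* Hypothetical segment register indices, in x86 sreg encoding order. */
enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/*
 * Default segment for a 16-bit ModRM memory operand (mod != 3).
 * Any form that uses BP as a base defaults to SS; everything else to DS.
 */
enum seg_reg default_segment16(uint8_t modrm)
{
    uint8_t mod = (uint8_t)(modrm >> 6);
    uint8_t rm  = (uint8_t)(modrm & 7);

    if (rm == 2 || rm == 3)        /* [bp+si], [bp+di], with or without disp */
        return SEG_SS;
    if (rm == 6 && mod != 0)       /* [bp+disp8] or [bp+disp16] */
        return SEG_SS;
    return SEG_DS;                 /* [bx+si], [bx+di], [si], [di], [disp16], [bx] */
}
```

For the bytes 8b 46 12: opcode 0x8B is MOV r16, r/m16, and ModRM 0x46 gives mod = 01, reg = 000 (AX), r/m = 110, i.e. [bp+disp8] with disp8 = 0x12 (decimal 18). So default_segment16(0x46) returns SEG_SS, which is exactly what Bochs prints as ss:[bp+18].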
Re: RealMode Segmentation questions for Emulator
Oh! I get it now! Thank you so much for your reply; that makes perfect sense. So in that opcode, SS is the override? But how would you calculate the physical address of the value that will be moved into AX? Would it be the value in SS + the value in BP + 18? Or is there a formula for this?
Thanks a ton, Matt
Edit: I read the wiki again for the third time and I think I can answer my own question. So if you did mov ax, word ss:[bp + 18], would the equation be (ss * 0x10) + (bp + 18)?
Re: RealMode Segmentation questions for Emulator
PearOs wrote: So in that opcode SS is the override
The opcode is "8b 46 12", right? There's no prefix and the address includes BP, so SS is the default segment.
The override prefixes are 0x2E, 0x3E, 0x26, 0x64, 0x65, and 0x36 for CS, DS, ES, FS, GS, and SS. If the opcode does not have one of those prefixes, there is no override.
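One way an emulator might apply that prefix list is to scan the bytes before the opcode and remember any segment override it sees. This is a sketch with hypothetical names: it reuses the x86 sreg ordering (ES, CS, SS, DS, FS, GS), lets the last segment prefix win if several are present, and simply skips the other common prefixes (0x66, 0x67, LOCK, REP/REPNE) without acting on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment indices in x86 sreg order; NO_OVERRIDE means "use the default". */
enum { NO_OVERRIDE = -1, SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/*
 * Scan prefix bytes at code[0..len). Returns the segment chosen by the last
 * segment override prefix, or NO_OVERRIDE. *prefix_len receives the number of
 * prefix bytes, so code[*prefix_len] is the opcode byte.
 */
int scan_segment_override(const uint8_t *code, size_t len, size_t *prefix_len)
{
    int seg = NO_OVERRIDE;
    size_t i = 0;

    for (; i < len; i++) {
        switch (code[i]) {
        case 0x26: seg = SEG_ES; continue;
        case 0x2E: seg = SEG_CS; continue;
        case 0x36: seg = SEG_SS; continue;
        case 0x3E: seg = SEG_DS; continue;
        case 0x64: seg = SEG_FS; continue;
        case 0x65: seg = SEG_GS; continue;
        case 0x66: case 0x67:              /* operand-size / address-size */
        case 0xF0: case 0xF2: case 0xF3:   /* LOCK, REPNE, REP/REPE */
            continue;
        default:
            break;
        }
        break;   /* first non-prefix byte is the opcode */
    }
    *prefix_len = i;
    return seg;
}
```

If the result is NO_OVERRIDE, the decoder falls back to the default segment implied by the ModRM byte (SS for BP-based forms, DS otherwise).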
PearOs wrote: Edit: So I read the wiki again for the third time and I think I understand my question. So if you did mov ax, word ss:[bp + 18] would the equation be (ss * 0x10) + (bp + 18)?
Yes, that is correct. Make sure you truncate the offset to 16 bits before adding the segment.
Re: RealMode Segmentation questions for Emulator
Oh ok, I see. Out of curiosity, how do you truncate the offset? Do you mean keeping it as a word? Thanks, Matt
Re: RealMode Segmentation questions for Emulator
By truncate, I mean something like this:
(ss * 0x10) + ((bp + 18) & 0xFFFF)
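The same formula as a small C function, with a worked example, shows why the mask matters. The function name is hypothetical, and the displacement is assumed to be passed in already sign-extended (as a decoder would do for a disp8). Casting to uint16_t is equivalent to "& 0xFFFF".

```c
#include <stdint.h>

/*
 * Real-mode physical address of seg:[base + disp].
 * The offset wraps at 16 bits *before* the segment base is added.
 */
uint32_t real_mode_address(uint16_t seg, uint16_t base, int32_t disp)
{
    uint16_t offset = (uint16_t)(base + disp);   /* same as (base + disp) & 0xFFFF */
    return ((uint32_t)seg << 4) + offset;         /* seg * 0x10 + offset */
}
```

Worked example: with SS = 0x9300 and BP = 0xFFF0, BP + 18 = 0x10002, which truncates to 0x0002, so the address is 0x93000 + 0x0002 = 0x93002. Without the mask you would get 0x93000 + 0x10002 = 0xA3002, which is a different byte entirely.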
Re: RealMode Segmentation questions for Emulator
Oh ok. Thank you for helping me with this; I really appreciate it. My other question is: what should I set the segment registers to by default before running int 10h? I don't know what the BIOS code is expecting. Or should I just set them to whatever Bochs sets them to?
Thanks, Matt
Re: RealMode Segmentation questions for Emulator
The BIOS code is expecting to be called from an int 0x10 instruction within a real-mode program, so you must set up the registers to look like you're doing that.
A typical real-mode program will set up a stack, put the parameters for the BIOS into the appropriate registers, and then execute int 0x10. The int instruction will push FLAGS, CS, and IP to the stack, clear the IF, TF, and AC bits, and set CS:IP to the value in the IVT. When the BIOS code is done, it executes iret which pops IP, CS, and FLAGS.
The BIOS code must have CS and IP set to the correct values, and will be expecting a stack set up as I described, but any registers I didn't mention in the above paragraph can be whatever you want.
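A rough sketch of what the int and iret instructions do in real mode, following the description above. It assumes a flat byte array `mem` covering the first megabyte and a hypothetical `struct cpu16` holding only the registers involved; A20 and out-of-range addresses are ignored. AC lives in EFLAGS bit 18, outside the 16-bit FLAGS image, so only IF and TF are cleared here.

```c
#include <stdint.h>

#define FLAG_TF 0x0100u   /* trap flag, FLAGS bit 8 */
#define FLAG_IF 0x0200u   /* interrupt flag, FLAGS bit 9 */

/* Hypothetical register file with only what INT/IRET need. */
struct cpu16 {
    uint16_t cs, ip, ss, sp, flags;
};

static uint16_t read16(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));   /* little-endian */
}

static void push16(struct cpu16 *c, uint8_t *mem, uint16_t value)
{
    c->sp = (uint16_t)(c->sp - 2);                    /* SP wraps at 16 bits */
    uint32_t addr = ((uint32_t)c->ss << 4) + c->sp;
    mem[addr]     = (uint8_t)(value & 0xFF);
    mem[addr + 1] = (uint8_t)(value >> 8);
}

static uint16_t pop16(struct cpu16 *c, const uint8_t *mem)
{
    uint16_t value = read16(mem, ((uint32_t)c->ss << 4) + c->sp);
    c->sp = (uint16_t)(c->sp + 2);
    return value;
}

/* INT n: push FLAGS, CS, IP; clear IF and TF; load CS:IP from the IVT. */
void real_mode_int(struct cpu16 *c, uint8_t *mem, uint8_t vector)
{
    push16(c, mem, c->flags);
    push16(c, mem, c->cs);
    push16(c, mem, c->ip);                  /* IP already points past the INT */
    c->flags &= (uint16_t)~(FLAG_IF | FLAG_TF);

    uint32_t entry = (uint32_t)vector * 4;  /* 4-byte IVT entries starting at 0 */
    c->ip = read16(mem, entry);             /* offset word first ... */
    c->cs = read16(mem, entry + 2);         /* ... then segment word */
}

/* IRET: pop IP, CS, FLAGS. */
void real_mode_iret(struct cpu16 *c, const uint8_t *mem)
{
    c->ip = pop16(c, mem);
    c->cs = pop16(c, mem);
    c->flags = pop16(c, mem);
}
```

One possible way to call the video BIOS with this: point SS:SP at some free memory, load the BIOS parameters into AX, BX, etc., set CS:IP to a sentinel address, call real_mode_int(..., 0x10), and keep executing until the matching iret brings CS:IP back to the sentinel.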
Re: RealMode Segmentation questions for Emulator
Ok cool. This might sound stupid, but how do I know what to set CS to? Is it based on the physical address that INT 10h jumps to? So, take the segment out of the IVT entry for INT 10h and put it in CS?
Thanks a ton, Matt
Re: RealMode Segmentation questions for Emulator
CS comes from the segment for int 0x10 in the IVT. IP comes from the offset.
From your questions, it sounds like you're trying to use the physical address for the instruction pointer. I don't think that's a good idea; clever or buggy code may depend on the fact that IP is only 16 bits and wraps around when it overflows.
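Keeping CS and a 16-bit IP separate makes that wraparound fall out naturally. A minimal sketch, assuming a flat `mem` array and hypothetical names: the address of every byte is recomputed from CS and IP, so an instruction that runs past offset 0xFFFF continues at offset 0x0000 of the same segment. This models 8086-style wrapping; on 286 and later CPUs a fetch that runs past the CS limit faults instead, in line with the limit checks Brendan describes later in the thread.

```c
#include <stdint.h>

/* Hypothetical fetch state: the code segment and a 16-bit instruction pointer. */
struct fetch_state {
    uint16_t cs;
    uint16_t ip;
};

/* Fetch one code byte at CS:IP and advance IP with 16-bit wraparound. */
uint8_t fetch8(struct fetch_state *s, const uint8_t *mem)
{
    uint32_t addr = ((uint32_t)s->cs << 4) + s->ip;
    uint8_t byte = mem[addr];
    s->ip = (uint16_t)(s->ip + 1);     /* 0xFFFF + 1 wraps to 0x0000 */
    return byte;
}

/* Immediates and displacements are little-endian. */
uint16_t fetch16(struct fetch_state *s, const uint8_t *mem)
{
    uint8_t lo = fetch8(s, mem);
    uint8_t hi = fetch8(s, mem);
    return (uint16_t)(lo | (hi << 8));
}
```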
Re: RealMode Segmentation questions for Emulator
You're right, I am, which is kind of stupid of me, I agree. I used IP as a 32-bit number and incremented it when reading opcodes. This works, but I feel I may run into trouble later. Thank you for your reply; I know what to set CS to now. Should I make IP a 16-bit number and then compute the address from CS (segment) and IP (offset) whenever I read from memory, just in case?
Thanks, again
Matt
Re: RealMode Segmentation questions for Emulator
PearOs wrote: Should I make the IP a 16bit number and then just calculate it when I'm reading from memory by CS and then IP as the offset just in case?
That's the simplest way to make sure it will work correctly.
It's up to you if you want to find a more complex solution. (You probably will not use the video BIOS often enough for a more complex design to have any benefit, but it might be a fun side project.)
Re: RealMode Segmentation questions for Emulator
Haha, indeed. I already changed it so that CS and IP are now used for opcode fetches. One thing I'm curious about: DS (data segment). How will I find the value for this? Or will the video BIOS not care? I see Bochs uses 0x9300 or something; should I just mimic that?
Thanks, Matt
Re: RealMode Segmentation questions for Emulator
Hi,
PearOs wrote: Should I make the IP a 16bit number and then just calculate it when I'm reading from memory by CS and then IP as the offset just in case?
Make EIP a 32-bit integer, and use the lowest 16 bits of it for instruction fetch and for 16-bit instructions. Don't forget that nothing prevents people from using 32-bit calls and 32-bit jumps in real mode (with an operand-size override prefix). For example, "call dword 0x00001234" should push a 32-bit "return EIP" on the stack.
Octocontrabass wrote: (ss * 0x10) + ((bp + 18) & 0xFFFF)
That's technically correct for 16-bit code in real mode, but don't implement your emulator like that.
Don't forget that you still get protection faults in real mode (even with 16-bit instructions). For example, "mov ax,[ss:0xFFFF]" should fault, and if BP=0xFFED then "mov ax,[ss:bp + 18]" should also fault, because the second byte of the word would land at offset 0x10000. (Strictly, an access beyond the SS limit raises a stack fault, #SS; the same access through CS, DS, ES, FS, or GS raises a general protection fault, #GP.) For this reason you want to do "offset = (bp+18) & 0xFFFF;" then "if(offset+size-1 > 0xFFFF) do_GPF();" and then "address = ss*16 + offset" after that.
Also, (for future-proofing) I'd cache the "base address" and the "limit" (and some attributes) for each segment. This allows you to properly support the CS base when the CPU first starts (where CS.base = 0xFFFF0000 and not "cs * 16"), and makes it easy to support "unreal mode" and protected mode later. This means you want:
    offset = (bp + 18) & 0xFFFF;
    segment = &SS_segment_info;
    // Fault if the last byte of the access is beyond the limit
    if(offset + size - 1 > segment->limit) {
        do_GPF();
    }
    address = segment->base + offset;
Finally, because this sort of thing will be used a lot, you should have "do_virtual_read()" and "do_virtual_write()" functions. For example:
    #include <stdbool.h>
    #include <stdint.h>
    // Assumed definitions (not shown in the original post):
    #define SEG_READABLE 0x02
    struct segment_info {
        uint16_t value;       // segment register value / selector
        uint32_t base;        // cached base address
        uint32_t limit;       // cached limit (last valid offset)
        uint32_t attributes;  // cached attributes, e.g. SEG_READABLE
    };
    void do_GPF(uint16_t error_code);   // raises the fault; assumed not to return
    uint32_t do_physical_read(uint64_t physical_address, int size);
    uint32_t do_virtual_read(struct segment_info *segment, uint32_t offset, int size) {
        uint32_t linear_address;
        uint64_t physical_address;
        // Convert virtual address to linear address
        // (fault if the segment isn't readable or the last byte is past the limit)
        if( ((segment->attributes & SEG_READABLE) == 0) || ((uint64_t)offset + size - 1 > segment->limit) ) {
            do_GPF(segment->value);
        }
        linear_address = segment->base + offset;
        // Convert linear address to physical address
        if(false) {
            // For future, "if paging enabled"
        } else {
            physical_address = linear_address;
        }
        // Do read from physical address
        return do_physical_read(physical_address, size);
    }
In the same way you should have "do_physical_read()" and "do_physical_write()" functions that take care of things like how many bits of the physical address the CPU implements and whether A20 is enabled or disabled, plus memory-mapped I/O areas, etc. For example, imagine someone reading a word from 0x9FFFF, where the lowest byte comes from RAM and the highest byte comes from legacy VGA.
Cheers,
Brendan
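To make the "cache the base and the limit" suggestion concrete, here is a sketch of what loading a segment register might look like under that design. The struct and function names are hypothetical (they mirror the segment_info idea, not any real emulator's API). The key detail is that a real-mode segment load only rewrites the visible value and the base; the cached limit is left alone, which is the behaviour unreal mode relies on. The reset helper shows the special CS state mentioned above, where the base is 0xFFFF0000 even though the visible value is 0xF000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached segment state. */
struct seg_cache {
    uint16_t value;   /* visible segment register value */
    uint32_t base;    /* cached linear base */
    uint32_t limit;   /* cached limit (offset of the last valid byte) */
};

/* Real-mode load (MOV Sreg, far JMP/CALL, ...): only value and base change. */
void seg_load_real(struct seg_cache *s, uint16_t value)
{
    s->value = value;
    s->base  = (uint32_t)value << 4;
    /* limit deliberately left untouched */
}

/* CS after reset: value 0xF000 but base 0xFFFF0000, so with IP = 0xFFF0
 * the first fetch is at 0xFFFFFFF0 rather than 0xFFFF0. */
void seg_reset_cs(struct seg_cache *cs)
{
    cs->value = 0xF000;
    cs->base  = 0xFFFF0000u;
    cs->limit = 0xFFFF;
}

/* True if every byte of [offset, offset + size - 1] is within the limit. */
bool seg_access_ok(const struct seg_cache *s, uint32_t offset, unsigned size)
{
    return (uint64_t)offset + size - 1 <= s->limit;
}
```

After the first far jump, seg_load_real() gives CS an ordinary base of value * 16 while keeping the cached 0xFFFF limit, so a word read at offset 0xFFFE passes the check and one at 0xFFFF does not.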
Re: RealMode Segmentation questions for Emulator
PearOs wrote: DS (Data Segment) how will I find the value for this? Or will the Video Bios not care?
If it's not a parameter for the function you're calling, then the video BIOS won't care.