RealMode Segmentation questions for Emulator

PearOs
Member
Posts: 194
Joined: Mon Apr 08, 2013 3:03 pm
Location: Usually at my keyboard!

Post by PearOs »

Hey guys, I understand segmentation in real mode, but not a ton when it comes to the actual opcodes and how it all works. I was testing my OS's opcode reader against Bochs: I read back some code from the VGA BIOS with both and compared the results. My OS prints out

Mov ax, word[bp+18] for 8b 46 12
And Bochs prints out Mov ax, word ptr ss:[bp+18] for the same thing (8b 46 12)

Though if I use the NASM disassembler, it chokes and fails to print out the right instruction. It's something like mov ax, 4090.

Anyways..
Isn't mov ax, word[bp+18] and mov ax, word ptr ss:[bp+18] the same thing?
The only thing I'm worried about is that my real mode emulator will get the value at address [bp+18], whereas I think Bochs is grabbing it from ss+bp+18?

Now from what I understand Bochs starts out with
CS at 0x9300
DS at 0x9300
SS at 0x9300
ES at 0x9300
Does this matter? I set my emulator to those values, but my emulator isn't using CS or DS when I do any opcode operations. I am used to 32-bit protected mode, where I just do mov some register, dword[address], but I guess the compiler and CPU do more than I understand at the moment. So if you guys wouldn't mind explaining this to me, that would be great!

Thanks, Matt
Octocontrabass
Member
Posts: 5604
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

Every memory access involves a segment register.

You don't notice it in protected mode, because most operating systems are nice and map the same addresses to CS, DS, ES, and SS.

When you fetch opcodes from memory, you must always use CS. This is important when a far jump, far call, or far return occurs, because those will change the value of CS.

When you access memory in general, the segment is usually DS. However, if the address calculation involves a stack register (BP, SP, EBP, or ESP), the segment will be SS instead. A segment override prefix will override these defaults. (Some instructions also use ES for a second memory operand. Segment override prefixes have no effect on that operand.)

When you push or pop data on the stack, the address calculation involves SP so the segment will always be SS.


Your example compares "Mov ax, word[bp+18]" and "Mov ax, word ptr ss:[bp+18]". Those are equivalent, because the effective address calculation involves BP and therefore the default segment register will be SS. (NASM produces gibberish because NASM thinks you want to disassemble 32-bit code.)
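To make the default-segment rule concrete for 16-bit addressing, here is a minimal sketch that picks the default segment from the ModRM byte. The enum and the function name `default_segment16` are made up for illustration and are not taken from any emulator in this thread; the sketch assumes a memory operand (mod != 3) and no override prefix.

```c
#include <stdint.h>

/* Hypothetical segment indices for this sketch (same order as the sreg encoding). */
enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS };

/* Default segment for a 16-bit ModRM memory operand (mod != 3).
 * rm 2 and 3 are [BP+SI] and [BP+DI]; rm 6 is [BP+disp] unless mod == 0,
 * where it means a bare 16-bit displacement with no BP involved. */
enum seg default_segment16(uint8_t modrm)
{
    uint8_t mod = modrm >> 6;
    uint8_t rm  = modrm & 7;

    if (rm == 2 || rm == 3 || (rm == 6 && mod != 0))
        return SEG_SS;
    return SEG_DS;
}
```

For the bytes "8b 46 12", the ModRM byte is 0x46 (mod = 1, reg = AX, rm = 6), so this returns SEG_SS, matching what Bochs prints.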

You should also check out the wiki article: Segmentation

Post by PearOs »

Octocontrabass wrote:Every memory access involves a segment register.

You don't notice it in protected mode, because most operating systems are nice and map the same addresses to CS, DS, ES, and SS.

When you fetch opcodes from memory, you must always use CS. This is important when a far jump, far call, or far return occurs, because those will change the value of CS.

When you access memory in general, the segment is usually DS. However, if the address calculation involves a stack register (BP, SP, EBP, or ESP), the segment will be SS instead. A segment override prefix will override these defaults. (Some instructions also use ES for a second memory operand. Segment override prefixes have no effect on that operand.)

When you push or pop data on the stack, the address calculation involves SP so the segment will always be SS.


Your example compares "Mov ax, word[bp+18]" and "Mov ax, word ptr ss:[bp+18]". Those are equivalent, because the effective address calculation involves BP and therefore the default segment register will be SS. (NASM produces gibberish because NASM thinks you want to disassemble 32-bit code.)

You should also check out the wiki article: Segmentation

Oh! I get it now! Thank you so much for your reply. That makes perfect sense. So in that opcode SS is the override then, but how would you calculate the physical address of the value that will be moved into ax? Would it be the value in SS + the value in BP + 18? Or is there a formula for this?

Thanks a ton, Matt

Edit: So I read the wiki again for the third time :D and I think I understand my question. So if you did mov ax, word ss:[bp + 18] would the equation be (ss * 0x10) + (bp + 18)?

Post by Octocontrabass »

PearOs wrote:So in that opcode SS is the override
The opcode is "8b 46 12", right? There's no prefix and the address includes BP, so SS is the default segment.

The override prefixes are 0x2E, 0x3E, 0x26, 0x64, 0x65, and 0x36 for CS, DS, ES, FS, GS, and SS. If the opcode does not have one of those prefixes, there is no override.
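A decoder could recognise those prefix bytes with a simple lookup. This is only a sketch; the `OVR_*` constants and the function name `segment_override` are invented for the example.

```c
#include <stdint.h>

/* Hypothetical result codes; OVR_NONE means "not a segment override prefix". */
enum { OVR_NONE = -1, OVR_ES, OVR_CS, OVR_SS, OVR_DS, OVR_FS, OVR_GS };

/* Map a prefix byte to the segment it selects, or OVR_NONE. */
int segment_override(uint8_t byte)
{
    switch (byte) {
    case 0x26: return OVR_ES;
    case 0x2E: return OVR_CS;
    case 0x36: return OVR_SS;
    case 0x3E: return OVR_DS;
    case 0x64: return OVR_FS;
    case 0x65: return OVR_GS;
    default:   return OVR_NONE;
    }
}
```

The decoder would call this on each byte before the opcode and fall back to the default segment when it never sees anything other than OVR_NONE.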
PearOs wrote:Edit: So I read the wiki again for the third time :D and I think I understand my question. So if you did mov ax, word ss:[bp + 18] would the equation be (ss * 0x10) + (bp + 18)?
Yes, that is correct. Make sure you truncate the offset to 16 bits before adding the segment.

Post by PearOs »

Octocontrabass wrote:
PearOs wrote:So in that opcode SS is the override
The opcode is "8b 46 12", right? There's no prefix and the address includes BP, so SS is the default segment.

The override prefixes are 0x2E, 0x3E, 0x26, 0x64, 0x65, and 0x36 for CS, DS, ES, FS, GS, and SS. If the opcode does not have one of those prefixes, there is no override.
PearOs wrote:Edit: So I read the wiki again for the third time :D and I think I understand my question. So if you did mov ax, word ss:[bp + 18] would the equation be (ss * 0x10) + (bp + 18)?
Yes, that is correct. Make sure you truncate the offset to 16 bits before adding the segment.
Oh ok I see. Out of curiosity how do you truncate the offset? Do you mean keeping it as a word? Thanks, Matt

Post by Octocontrabass »

By truncate, I mean something like this:

(ss * 0x10) + ((bp + 18) & 0xFFFF)
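In C the truncation can fall out of the types, as in the sketch below. The function name `real_mode_address` and its parameters are assumptions made for this example; it covers only the plain 16-bit case with no limit checking.

```c
#include <stdint.h>

/* Segment:offset to linear address for a 16-bit access.
 * The offset is truncated to 16 bits BEFORE the segment is added. */
uint32_t real_mode_address(uint16_t seg, uint16_t base, int16_t disp)
{
    uint16_t offset = (uint16_t)(base + disp);   /* wraps at 64 KiB */
    return ((uint32_t)seg << 4) + offset;
}
```

With SS = 0x9300 and BP = 0xFFF0, a displacement of 18 wraps the offset to 0x0002, so the access lands at 0x93002 rather than 0xA3002.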

Post by PearOs »

Octocontrabass wrote:By truncate, I mean something like this:

(ss * 0x10) + ((bp + 18) & 0xFFFF)
Oh ok. Thank you for helping me with this. I really appreciate it. My other question is what I should set the segment registers to by default before running int 10h, because I don't know what the BIOS code is expecting. Or should I just set them to what Bochs is setting them to?

Thanks, Matt

Post by Octocontrabass »

The BIOS code is expecting to be called from an int 0x10 instruction within a real-mode program, so you must set up the registers to look like you're doing that.

A typical real-mode program will set up a stack, put the parameters for the BIOS into the appropriate registers, and then execute int 0x10. The int instruction will push FLAGS, CS, and IP to the stack, clear the IF, TF, and AC bits, and set CS:IP to the value in the IVT. When the BIOS code is done, it executes iret which pops IP, CS, and FLAGS.

The BIOS code must have CS and IP set to the correct values, and will be expecting a stack set up as I described, but any registers I didn't mention in the above paragraph can be whatever you want.
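The sequence described above could be sketched roughly as follows. Everything here (`struct cpu16`, `push16`, `do_int`) is hypothetical and only illustrates the order of operations. It models the 16-bit FLAGS register, so the AC bit (which lives in the upper half of EFLAGS) is not represented, and it ignores the corner case of a push that straddles the end of the stack segment.

```c
#include <stdint.h>

#define FLAG_TF 0x0100u
#define FLAG_IF 0x0200u

/* Hypothetical minimal CPU state for this sketch. */
struct cpu16 {
    uint16_t cs, ip, ss, sp, flags;
    uint8_t *mem;                 /* guest memory, at least 1 MiB + 64 KiB */
};

static uint16_t read16(const struct cpu16 *c, uint32_t addr)
{
    return (uint16_t)(c->mem[addr] | (c->mem[addr + 1] << 8));
}

static void write16(struct cpu16 *c, uint32_t addr, uint16_t v)
{
    c->mem[addr]     = (uint8_t)v;
    c->mem[addr + 1] = (uint8_t)(v >> 8);
}

static void push16(struct cpu16 *c, uint16_t v)
{
    c->sp = (uint16_t)(c->sp - 2);                    /* SP wraps at 16 bits */
    write16(c, ((uint32_t)c->ss << 4) + c->sp, v);
}

/* Emulate "int vector" in real mode. c->ip must already point past the
 * INT instruction so that iret resumes at the right place. */
void do_int(struct cpu16 *c, uint8_t vector)
{
    push16(c, c->flags);
    push16(c, c->cs);
    push16(c, c->ip);
    c->flags &= (uint16_t)~(FLAG_IF | FLAG_TF);
    c->ip = read16(c, vector * 4u);                   /* IVT entry: offset first */
    c->cs = read16(c, vector * 4u + 2);               /* then segment */
}
```

To call the video BIOS, the emulator would set up SS:SP and the parameter registers, point CS:IP at a known return location, and call `do_int(&cpu, 0x10)`; execution has finished when the BIOS's iret brings CS:IP back to that location.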

Post by PearOs »

Octocontrabass wrote:The BIOS code is expecting to be called from an int 0x10 instruction within a real-mode program, so you must set up the registers to look like you're doing that.

A typical real-mode program will set up a stack, put the parameters for the BIOS into the appropriate registers, and then execute int 0x10. The int instruction will push FLAGS, CS, and IP to the stack, clear the IF, TF, and AC bits, and set CS:IP to the value in the IVT. When the BIOS code is done, it executes iret which pops IP, CS, and FLAGS.

The BIOS code must have CS and IP set to the correct values, and will be expecting a stack set up as I described, but any registers I didn't mention in the above paragraph can be whatever you want.
Ok cool. This might sound stupid but how do I know what to set CS to? Is that based on the physical address of where INT10h is calling to? So like take the segment out of the IVT for Int 10h and put it in CS?

Thanks a ton, Matt

Post by Octocontrabass »

CS comes from the segment for int 0x10 in the IVT. IP comes from the offset.

From your questions, it sounds like you're trying to use the physical address for the instruction pointer. I don't think that's a good idea; clever or buggy code may depend on the fact that IP is only 16 bits and wraps around when it overflows.
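A minimal sketch of the alternative, keeping CS and a 16-bit IP separately and forming the address only at fetch time, might look like this. The struct and function names are invented for the example, and it does no segment limit checking.

```c
#include <stdint.h>

/* Hypothetical fetch state: CS and IP are kept separate, both 16 bits. */
struct fetch_state {
    uint16_t cs;
    uint16_t ip;
    const uint8_t *mem;           /* guest memory, at least 1 MiB + 64 KiB */
};

/* Fetch one opcode byte from CS:IP and advance IP. */
uint8_t fetch_byte(struct fetch_state *s)
{
    uint32_t addr = ((uint32_t)s->cs << 4) + s->ip;
    s->ip = (uint16_t)(s->ip + 1);   /* wraps 0xFFFF -> 0x0000, CS unchanged */
    return s->mem[addr];
}
```

With CS = 0x1000 and IP = 0xFFFF, the first fetch reads 0x1FFFF and the next one reads 0x10000, which a single incrementing physical address would get wrong.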

Post by PearOs »

Octocontrabass wrote:CS comes from the segment for int 0x10 in the IVT. IP comes from the offset.

From your questions, it sounds like you're trying to use the physical address for the instruction pointer. I don't think that's a good idea; clever or buggy code may depend on the fact that IP is only 16 bits and wraps around when it overflows.
You're right, I am, which is kinda stupid of me, I agree. I used the IP as a 32-bit number and increment it when reading opcodes. This works, but I feel I may run into trouble later. Thank you for your reply. I know what to set CS to now. :D Should I make the IP a 16bit number and then just calculate it when I'm reading from memory by CS and then IP as the offset just in case?

Thanks, again


Matt

Post by Octocontrabass »

PearOs wrote:Should I make the IP a 16bit number and then just calculate it when I'm reading from memory by CS and then IP as the offset just in case?
That's the simplest way to make sure it will work correctly.

It's up to you if you want to find a more complex solution. :wink: (You probably will not use the video BIOS often enough for a more complex design to have any benefit, but it might be a fun side project.)

Post by PearOs »

Octocontrabass wrote:
PearOs wrote:Should I make the IP a 16bit number and then just calculate it when I'm reading from memory by CS and then IP as the offset just in case?
That's the simplest way to make sure it will work correctly.

It's up to you if you want to find a more complex solution. :wink: (You probably will not use the video BIOS often enough for a more complex design to have any benefit, but it might be a fun side project.)
Haha. Indeed. I already changed it so that CS and IP are now used for opcodes. One thing I'm curious about. DS (Data Segment) how will I find the value for this? Or will the Video Bios not care? I see Bochs uses 0x9300 or something; should I just mimic that?

Thanks, Matt
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan »

Hi,
PearOs wrote:
Octocontrabass wrote:From your questions, it sounds like you're trying to use the physical address for the instruction pointer. I don't think that's a good idea; clever or buggy code may depend on the fact that IP is only 16 bits and wraps around when it overflows.
You're right, I am, which is kinda stupid of me, I agree. I used the IP as a 32-bit number and increment it when reading opcodes. This works, but I feel I may run into trouble later. Thank you for your reply. I know what to set CS to now. :D Should I make the IP a 16bit number and then just calculate it when I'm reading from memory by CS and then IP as the offset just in case?
Make EIP a 32-bit integer, and use the lowest 16 bits of it for instruction fetch and 16-bit instructions. Don't forget that nothing prevents people from using 32-bit calls and 32-bit jumps in real mode (with a size override prefix). For example, "call dword 0x00001234" should push a 32-bit "return EIP" on the stack.
Octocontrabass wrote:(ss * 0x10) + ((bp + 18) & 0xFFFF)
That's technically correct for 16-bit code in real mode; but don't implement your emulator like that.

Don't forget that you still get general protection faults in real mode (even with 16-bit instructions). For example, "mov ax,[ss:0xFFFF]" should cause a general protection fault; and if BP=0xFFED then "mov ax,[ss:bp + 18]" should also cause a general protection fault. For this reason you want to do "offset = (bp+18) & 0xFFFF;" then "if(offset+size-1 > 0xFFFF) do_GPF();" (the last byte accessed must not lie beyond the limit) then do "address = ss*16 + offset" after that.

Also, (for future-proofing) I'd cache the "base address" and the "limit" (and some attributes) for each segment. This allows you to properly support CS base when the CPU first starts (where CS.base = 0xFFFF0000 and not "cs * 16"); and makes it easy to support "unreal mode" and protected mode later. This means you want:

Code:

    offset = (bp+18) & 0xFFFF;
    segment = &SS_segment_info;
    // Fault if the last byte accessed lies beyond the segment limit
    if(offset+size-1 > segment->limit) {
        do_GPF();
    }
    address = segment->base + offset;
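One possible shape for the cached segment information used in the snippet above is sketched here. The field names follow the ones that appear in this post (`value`, `base`, `limit`, `attributes`), but the layout, the flag values, and the two helper functions are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical attribute bits. */
#define SEG_READABLE 0x01u
#define SEG_WRITABLE 0x02u

struct segment_info {
    uint16_t value;        /* visible selector / segment value */
    uint32_t base;         /* cached base address */
    uint32_t limit;        /* cached limit: highest valid offset */
    uint32_t attributes;   /* cached access rights */
};

/* Put a segment into a power-on style state. The base is passed in
 * explicitly so CS can start at 0xFFFF0000 instead of value * 16. */
void segment_reset(struct segment_info *seg, uint16_t value, uint32_t base)
{
    seg->value      = value;
    seg->base       = base;
    seg->limit      = 0xFFFFu;
    seg->attributes = SEG_READABLE | SEG_WRITABLE;
}

/* Real-mode load of a segment register: only value and base change,
 * the cached limit and attributes are left as they were. */
void segment_load_real_mode(struct segment_info *seg, uint16_t value)
{
    seg->value = value;
    seg->base  = (uint32_t)value << 4;
}
```

Leaving the limit untouched on a real-mode load is what would later make "unreal mode" work: a limit enlarged in protected mode simply stays in the cache.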
Finally, because this sort of thing will be used a lot, you should have "do_virtual_read()" and "do_virtual_write()" functions. For example:

Code:

uint32_t do_virtual_read(struct segment_info *segment, uint32_t offset, int size) {
    uint32_t linear_address;
    uint64_t physical_address;

    // Convert virtual address to linear address.
    // Fault if the segment is NOT readable, or if the last byte accessed
    // (offset + size - 1) lies beyond the segment limit.
    if( ((segment->attributes & SEG_READABLE) == 0) || ((uint64_t)offset + size - 1 > segment->limit) ) {
        do_GPF(segment->value);
    }
    linear_address = segment->base + offset;

    // Convert linear address to physical address
    if(false) {
        // For future, "if paging enabled"
    } else {
        physical_address = linear_address;
    }

    // Do read from physical address
    return do_physical_read(physical_address, size);
}
In the same way you should have "do_physical_read()" and "do_physical_write()" functions that take care of things like how many bits of the physical address the CPU implements and if A20 is enabled/disabled; plus memory mapped IO areas, etc. For example, imagine someone reading from 0x9FFFF where the lowest byte comes from RAM and the highest byte comes from legacy VGA.
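As a rough illustration of that last point, the sketch below reads one byte at a time so that a single access can straddle RAM and the legacy VGA window. The RAM size, the A20 flag, and the `legacy_vga_read` stub are all placeholders invented for this example; only the name and call shape of `do_physical_read` come from the code above.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_SIZE   0x200000u   /* 2 MiB of guest RAM, arbitrary for this sketch */
#define A20_BIT    0x100000u
#define VGA_START  0xA0000u    /* legacy VGA window: 0xA0000..0xBFFFF */
#define VGA_END    0xC0000u    /* exclusive */

static uint8_t ram[RAM_SIZE];
static bool a20_enabled = false;

/* Placeholder for a real VGA model; always returns 0xFF here. */
static uint8_t legacy_vga_read(uint32_t addr)
{
    (void)addr;
    return 0xFF;
}

static uint8_t read_physical_byte(uint64_t addr)
{
    if (!a20_enabled)
        addr &= ~(uint64_t)A20_BIT;           /* force address line 20 low */
    if (addr >= VGA_START && addr < VGA_END)
        return legacy_vga_read((uint32_t)addr);
    if (addr < RAM_SIZE)
        return ram[addr];
    return 0xFF;                              /* nothing mapped there */
}

/* Little-endian read of 1, 2 or 4 bytes, routed byte by byte. */
uint32_t do_physical_read(uint64_t physical_address, int size)
{
    uint32_t result = 0;
    for (int i = 0; i < size; i++)
        result |= (uint32_t)read_physical_byte(physical_address + (uint64_t)i) << (8 * i);
    return result;
}
```

A word read from 0x9FFFF then takes its low byte from `ram[0x9FFFF]` and its high byte from the VGA stub at 0xA0000. Going byte by byte is slow but keeps the routing obvious; a faster design would special-case accesses that stay inside one region.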


Cheers,

Brendan

Post by Octocontrabass »

PearOs wrote:DS (Data Segment) how will I find the value for this? Or will the Video Bios not care?
If it's not a parameter for the function you're calling, then the video BIOS won't care.