BIOS Calls in Emulators

Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

BIOS Calls in Emulators

Post by Octacone »

Hi,

I'm wondering how this works:
You write a real mode emulator and let it run in protected mode, you tell it to execute int 10h, and somehow it magically does a VBE mode switch.
Can somebody explain this magic part to me?

Also, what is the minimum set of instructions I would have to emulate in order to execute the int 10h interrupt handling code?
Why are you doing this? Well, this topic interests me and sounds like a fun project (x86 Real Mode Emulator for Mode Switching).
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: BIOS Calls in Emulators

Post by Combuster »

An emulator - such as libx86emu which was specifically written for just this - executes code pretending to be a real machine. It is a closed system - if you want it to affect the real machine, then the emulated one has to forward to the real one, often by sharing I/O ports, BIOS and video card memory ranges.

The instructions you need? No two video cards are equal. Be prepared to support at least everything a 486 has, protected mode and everything.
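
To make the forwarding idea concrete, here is a minimal sketch in Python (chosen for brevity; this is not libx86emu's API). The names `PortBus`, `emu_in` and `emu_out` are made up for illustration, and the two lambdas stand in for whatever real `in`/`out` port accessors the host kernel provides.

```python
# Hypothetical sketch: the emulated CPU never touches hardware directly.
# Every IN/OUT it executes is routed through callbacks supplied by the host.

class PortBus:
    """Routes emulated port accesses to host-provided handlers."""

    def __init__(self, read_port, write_port):
        self.read_port = read_port      # callable(port) -> value
        self.write_port = write_port    # callable(port, value) -> None

    def emu_out(self, port, value):
        # Called by the emulator when it executes a byte-sized OUT.
        self.write_port(port & 0xFFFF, value & 0xFF)

    def emu_in(self, port):
        # Called by the emulator when it executes a byte-sized IN.
        return self.read_port(port & 0xFFFF) & 0xFF


# Stand-in for real hardware: log writes, return a fixed value on reads.
log = []
bus = PortBus(read_port=lambda port: 0x5A,
              write_port=lambda port, value: log.append((port, value)))

bus.emu_out(0x3C8, 0x01)    # e.g. a VGA DAC index write made by BIOS code
status = bus.emu_in(0x3DA)  # e.g. a VGA input status read
```

The same pattern would apply to memory: accesses to the video memory and BIOS ROM ranges get passed through to the real address ranges, while everything else can live in a private buffer.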
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
zity
Member
Posts: 99
Joined: Mon Jul 13, 2009 5:52 am
Location: Denmark

Re: BIOS Calls in Emulators

Post by zity »

Hi Octacone,

I remember reading this older post, which contains a lot of explanation and useful information.

viewtopic.php?f=1&t=22363
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: BIOS Calls in Emulators

Post by Octacone »

Combuster wrote:An emulator - such as libx86emu which was specifically written for just this - executes code pretending to be a real machine. It is a closed system - if you want it to affect the real machine, then the emulated one has to forward to the real one, often by sharing I/O ports, BIOS and video card memory ranges.

The instructions you need? No two video cards are equal. Be prepared to support at least everything a 486 has, protected mode and everything.
Yup, after some deeper digging, it looks like I/O ports are the main thing used internally for mode setting. The interrupt code itself just generates the values that need to be forwarded through the ports.
I also found out that protected mode support is not required and that all the code is 16-bit for compatibility reasons.
zity wrote:Hi Octacone,

I remember reading this older post, which contains a lot of explanation and useful information.

viewtopic.php?f=1&t=22363
That was super useful.

I've started working on my emulator since. The thing that bothers me is: how do I handle a single opcode that maps to multiple instructions?
E.g. 0x80 can be add, adc, and, xor, or, sbb, sub, cmp.
Edit: I just discovered that opcodes are not just "randomly assigned numbers by Intel"; there is a whole lot going on. Prefix bytes, the ModR/M byte, SIB... a lot more than I initially thought.
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: BIOS Calls in Emulators

Post by Octacone »

mariuszp wrote:
Each opcode is one instruction (though it might have different mnemonics in assembly to make things more clear). Sometimes prefix + opcode is a different instruction; that's specified explicitly where necessary.

As for the 0x80 thing: the ModR/M byte following the 0x80 opcode has 3 unused bits (where normally you would specify a register operand) because it doesn't need 2 register operands. For example, for "XOR r/m8, imm8" the encoding is "80 /6 ib", meaning the byte 0x80 is followed by a ModR/M byte whose 3 otherwise unused bits (the reg field) hold the value 6, and then an immediate byte. The other instructions have different values in those 3 bits. As such, these 3 bits are actually an extension of the opcode, and that's how you differentiate them.

(I wrote an x86 assembler as part of a project once, these things do get quite confusing. Trying to actually emulate these instructions is a whole new level of complex altogether)
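
The opcode-extension idea for 0x80 can be sketched in a few lines. This is only an illustration in Python; `GROUP1`, `split_modrm` and `decode_group1` are names invented here, and the table order (/0 through /7) follows the group 1 listing in the Intel and AMD manuals.

```python
# The reg field (bits 5..3) of the ModR/M byte selects the operation
# for opcode 0x80 (the "group 1" instructions with r/m8, imm8 operands).
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7

def decode_group1(code):
    """Return the mnemonic for a byte sequence starting with opcode 0x80."""
    if code[0] != 0x80:
        raise ValueError("not a group 1 r/m8, imm8 opcode")
    mod, reg, rm = split_modrm(code[1])
    return GROUP1[reg]

# 80 F3 55: mod=11, reg=110 (/6), rm=011 (BL), imm8=0x55 -> xor bl, 0x55
mnemonic = decode_group1(bytes([0x80, 0xF3, 0x55]))
```

In an emulator, the same `reg` value would typically index a table of handler functions rather than a table of names.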
It is a real pain, so many variations. I've been reading on this topic for days and I'm still struggling to catch up.
There are just not many resources on 8086 instruction decoding.
azblue
Member
Posts: 147
Joined: Sat Feb 27, 2010 8:55 pm

Re: BIOS Calls in Emulators

Post by azblue »

Octacone wrote:
It is a real pain, so many variations. I've been reading on this topic for days and I'm still struggling to catch up.
There are just not many resources on 8086 instruction decoding.
The A86 assembler has a file called A86MANU.TXT; the section "The 86 Instruction Set" has always been the simplest to understand in my opinion.

Additionally, sandpile.org is indispensable (specifically look at their opcode encoding and opcode groups).
alexfru
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: BIOS Calls in Emulators

Post by alexfru »

Octacone wrote:It is a real pain, so many variations. I've been reading on this topic for days and I'm still struggling to catch up.
There are just not many resources on 8086 instruction decoding.
You don't need many. Just the CPU manual (have both Intel's and AMD's) and a way to experiment and check your understanding of the manual. For the latter, use an assembler and a disassembler. NASM (and its NDISASM) will work perfectly here. You may also want a hex file viewer and a programmer's calculator. That's all.

Start with e.g. the add instruction. Write a bunch of different variants of it like so:

Code:

; assemble: nasm -fbin file.asm -o file.bin
bits 16
add ax, bx
add ax, [bx]
add [bx], ax
add ax, [bx+di]
add [bx+di+2], ax
add [bx+di-2], ax
add [bx+di+1024], ax
add [bx+di-1024], ax
add word [bx+di-1024], 0x1234
Observe how they're encoded, either from the assembly listing or from a disassembly of the binary (e.g. "ndisasm -b 16 file.bin").

For fun try to do the reverse. Given an instruction description/encoding, try to encode it by hand and see that the disassembly of your bytes gives you the expected instruction.

Extend this to 32 bits, throw in segment override prefixes, etc.

Beware, some instructions may have alternative encodings.
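
As a way to check your reading of the manual against the disassembly, here is a small sketch of 16-bit ModR/M decoding in Python. The function name `decode_modrm16` and the output format are assumptions made for this example; the tables follow the 16-bit addressing-form table in the CPU manuals. It handles only the ModR/M byte and its displacement, not the opcode or any immediate.

```python
# 16-bit addressing forms selected by the rm field when mod != 3.
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_modrm16(code):
    """Decode a ModR/M byte plus displacement from `code`.

    Returns (reg_name, rm_operand_text, bytes_consumed).
    """
    modrm = code[0]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:                          # register-direct operand
        return REG16[reg], REG16[rm], 1
    if mod == 0 and rm == 6:              # special case: bare disp16
        disp = int.from_bytes(code[1:3], "little")
        return REG16[reg], f"[{disp:#x}]", 3
    if mod == 0:                          # no displacement
        return REG16[reg], f"[{RM16[rm]}]", 1
    size = 1 if mod == 1 else 2           # disp8 or disp16, both signed
    disp = int.from_bytes(code[1:1 + size], "little", signed=True)
    return REG16[reg], f"[{RM16[rm]}{disp:+d}]", 1 + size

# 01 81 00 FC is one encoding of: add [bx+di-1024], ax
# (01 is the opcode; the ModR/M byte and displacement are passed in here)
reg, operand, used = decode_modrm16(bytes([0x81, 0x00, 0xFC]))
```

Which operand is the source and which is the destination depends on the opcode (01 versus 03 for add), so that decision belongs to the caller.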
Antti
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: BIOS Calls in Emulators

Post by Antti »

I would recommend the disassembly method mentioned above for all assembly language programmers. It helps to get a grasp of instructions and their logic. No need to spend too much time inspecting the bytes, but maybe a few hours or a day? In addition to that, trying labels and seeing what values the assembler assigns to them could be enlightening, e.g. how "mov ax, my_label" or "jmp my_label" translate to bytes and how the values change when code is modified or assembler directives are used. As a more advanced topic, check how object files handle relocations.
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: BIOS Calls in Emulators

Post by Octacone »

azblue wrote: The A86 assembler has a file called A86MANU.TXT; the section "The 86 Instruction Set" has always been the simplest to understand in my opinion.

Additionally, sandpile.org is indispensable (specifically look at their opcode encoding and opcode groups).
+1, that file contains a metric ton of useful data, was looking for something like that.
alexfru wrote: You don't need many. Just the CPU manual (have both, intel and AMD) and a way to experiment and check your understanding of the manual. For the latter use an assembler and a disassembler. NASM (and its NDISASM) will work perfectly here. You may also want a hex file viewer and a programmer's calculator. That's all.
That's a smart idea. I didn't know ndisasm existed. Although it would be useful to have a program that could differentiate between prefixes, opcodes and other bytes, instead of having them all written together.
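
Separating the prefixes from the rest is the easy part and could look roughly like the sketch below (Python, with `PREFIXES` and `split_prefixes` as made-up names). Splitting the remainder into opcode, ModR/M, displacement and immediate is not shown, because that needs a per-opcode table.

```python
# Legacy prefix bytes; anything else is treated as the start of the opcode.
PREFIXES = {
    0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x64: "fs", 0x65: "gs",
    0x66: "opsize", 0x67: "addrsize", 0xF0: "lock", 0xF2: "repne", 0xF3: "rep",
}

def split_prefixes(code):
    """Return (list of prefix names, remaining bytes starting at the opcode)."""
    names = []
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        names.append(PREFIXES[code[i]])
        i += 1
    return names, code[i:]

# 26 66 8B 07 in 16-bit code: ES override + operand-size prefix,
# then 8B 07, i.e. mov eax, [es:bx]
prefixes, rest = split_prefixes(bytes([0x26, 0x66, 0x8B, 0x07]))
```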
Antti wrote: I would recommend the disassembly method mentioned above for all assembly language programmers. It helps to get a grasp of instructions and their logic. No need to spend too much time inspecting the bytes, but maybe a few hours or a day? In addition to that, trying labels and seeing what values the assembler assigns to them could be enlightening, e.g. how "mov ax, my_label" or "jmp my_label" translate to bytes and how the values change when code is modified or assembler directives are used. As a more advanced topic, check how object files handle relocations.
I'll definitely have to address jumps and calls sooner or later, since VBE code jumps around a lot.
Getting my code to recognize the instructions is the hardest part; emulating them is easy. After all, I don't need all of them.
nullplan
Member
Posts: 1770
Joined: Wed Aug 30, 2017 8:24 am

Re: BIOS Calls in Emulators

Post by nullplan »

Near and short direct jumps and calls are encoded relative to the end of the instruction. You get an opcode and then an offset, and you first set IP to the end of the instruction and then add the sign-extended operand to get the new IP.

So for instance, the following snippet:

Code:

hltloop:
  hlt
  jmp hltloop
is encoded:

Code:

F4 EB FD
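The last byte, FD, is -3 when sign-extended. Counting offsets from the start of the snippet, IP is 3 at the end of the jmp instruction, and 3 + (-3) = 0, which is the offset of hltloop.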
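
The short-jump arithmetic can be written out as a tiny function. This is a sketch in Python with an invented name (`short_jump_target`); it covers only the EB rel8 form, and offsets are counted from the start of the snippet.

```python
def short_jump_target(ip, code):
    """Compute the new IP for a short jump (EB rel8) located at `ip`."""
    if code[0] != 0xEB:
        raise ValueError("not a short jmp")
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    next_ip = ip + 2                     # IP after the 2-byte instruction
    return (next_ip + rel8) & 0xFFFF     # 16-bit IP wraps around

# hlt at offset 0, "jmp hltloop" at offset 1: EB FD jumps back to offset 0.
target = short_jump_target(1, bytes([0xEB, 0xFD]))
```

Near jumps (E9 rel16) work the same way, with a 3-byte instruction length and a 16-bit displacement.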

Far and indirect calls and jumps encode their target absolutely. So for instance, in 16-bit mode, the code bytes

Code:

FF 27
mean

Code:

jmp [bx]
And that means: Look in memory at the word BX is pointing to and copy that into IP.
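
In emulator terms, that could look like the sketch below (Python; `jmp_indirect_bx` is a name invented for this example). It assumes the default DS segment with no override prefix, and it ignores the corner case of a word that straddles the end of the segment.

```python
def jmp_indirect_bx(memory, ds, bx):
    """Emulate FF 27 (jmp [bx]): return the new IP read from DS:BX."""
    addr = ((ds << 4) + bx) & 0xFFFFF              # real-mode linear address
    return memory[addr] | (memory[addr + 1] << 8)  # little-endian word

memory = bytearray(0x100000)           # 1 MiB of emulated real-mode memory
memory[0x12345] = 0x78                 # low byte of the target offset
memory[0x12346] = 0x56                 # high byte
new_ip = jmp_indirect_bx(memory, ds=0x1234, bx=0x0005)
```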

VBE code might also use software interrupts. If you don't know the function called in that case, you might also just emulate that as an indirect far call that pushes flags.
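
A minimal sketch of that approach, in Python: INT n pushes FLAGS, CS and IP, then loads the new CS:IP from the interrupt vector table at linear address n * 4. The `Cpu` class and its method names are illustrative only, and the vector value C000:1000 is just an example, not something read from a real BIOS.

```python
class Cpu:
    """Just enough state to show INT n; all names here are illustrative."""

    def __init__(self):
        self.mem = bytearray(0x100000)   # 1 MiB of real-mode memory
        self.cs = self.ip = self.ss = self.sp = 0
        self.flags = 0x0002              # bit 1 always reads as set

    def read16(self, addr):
        return self.mem[addr] | (self.mem[addr + 1] << 8)

    def push16(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        addr = (self.ss << 4) + self.sp
        self.mem[addr] = value & 0xFF
        self.mem[addr + 1] = (value >> 8) & 0xFF

    def software_int(self, vector):
        """Emulate INT n; self.ip must already point past the instruction."""
        self.push16(self.flags)
        self.flags &= ~0x0300            # clear IF (bit 9) and TF (bit 8)
        self.push16(self.cs)
        self.push16(self.ip)
        self.ip = self.read16(vector * 4)        # IVT entry: offset first,
        self.cs = self.read16(vector * 4 + 2)    # then segment

cpu = Cpu()
cpu.mem[0x40:0x44] = bytes([0x00, 0x10, 0x00, 0xC0])   # INT 10h -> C000:1000
cpu.cs, cpu.ip, cpu.ss, cpu.sp, cpu.flags = 0x2000, 0x0102, 0x3000, 0x0100, 0x0202
cpu.software_int(0x10)
```

The matching IRET pops IP, CS and FLAGS in that order, which returns control to the instruction after the INT.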
Carpe diem!
drsly
Posts: 4
Joined: Sun Nov 04, 2018 5:20 am
Location: Sydney, Australia

Re: BIOS Calls in Emulators

Post by drsly »

I found this online tool quite useful for encoding / decoding x86 assembly. It works with both 32-bit and 64-bit code.

https://defuse.ca/online-x86-assembler.htm