Tomasz Grystar wrote: This is a topic I wanted to cover in a short talk at this year's fasmcon. However, since that event is most probably not going to happen, I am posting it here in text form.
It is the story of our adventures with FRM and unREAL modes, and by "we" I mean myself and Leonid Petroff, with whom I was actively discussing these topics back in the early 2000s. At that time I was using the nick Privalov, and Leonid was known as PetroffHeroj.
It all started because I made the first officially released versions of fasm (look here for some more history) use the so-called Flat Real Mode (FRM for short). Though the very first versions of fasm worked in protected mode (using my own DOS extender, which was based on Thomas Pytel's PMODE), I then adapted it to operate in FRM instead.
Just in case there is anyone who still doesn't know what FRM is, here comes a short explanation: it is real mode after switching back from protected mode, where the limit for all segment registers has been set to 4G and not reset upon changing back to real mode. This allows 32-bit offsets to be used in each real mode segment, so that you can access extended memory easily and use a contiguous block of physical memory up to gigabytes in size.
This made it possible to combine the power of 32-bit memory addressing with the simplicity and speed of real mode. Nowadays there is no such difference, but on early 32-bit processors like the 80386/80486, the same code executing in real mode was noticeably faster.
For this reason I chose to adapt fasm to use FRM instead of protected mode. And so I just did it the way I had seen it done in some earlier examples: set up a simple protected mode with all descriptors having a 4G limit, and go back to real mode without restoring those limits properly.
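To make the procedure concrete, here is a minimal sketch of such a switch in fasm syntax. It is not the code fasm actually uses; the labels (enable_frm, gdtr, gdt) are made up for illustration, and it assumes that the A20 line is already enabled, that the processor is in plain real mode (not V86), and that CS is the segment containing this code and data.

```asm
; Sketch only: enter flat real mode by caching 4G limits in the data segment registers.
use16
enable_frm:
        pushf                           ; remember the interrupt flag
        push    eax
        push    ds
        push    es
        push    fs
        push    gs
        cli                             ; no IDT is set up for protected mode
        xor     eax,eax
        mov     ax,cs
        shl     eax,4                   ; linear base of this segment
        add     eax,gdt                 ; linear address of the GDT
        mov     dword [cs:gdtr+2],eax
        lgdt    [cs:gdtr]
        mov     eax,cr0
        or      al,1                    ; set PE: protected mode on
        mov     cr0,eax
        jmp     $+2                     ; discard prefetched instructions
        mov     ax,8                    ; selector of the 4G data descriptor
        mov     ds,ax                   ; each load caches the 4G limit
        mov     es,ax
        mov     fs,ax
        mov     gs,ax
        mov     eax,cr0
        and     al,0FEh                 ; clear PE: back to real mode
        mov     cr0,eax
        jmp     $+2
        pop     gs                      ; real mode loads change only the base,
        pop     fs                      ; the cached 4G limit stays in place
        pop     es
        pop     ds
        pop     eax
        popf
        ret

gdtr:   dw 2*8-1                        ; GDT limit: two descriptors
        dd 0                            ; GDT linear base, filled in at run time
gdt:    dq 0                            ; null descriptor
        dw 0FFFFh,0                     ; limit 15..0, base 15..0
        db 0,92h,8Fh,0                  ; base 23..16, present writable data,
                                        ; G=1 with limit 19..16 = 0Fh (4G), base 31..24
```

After enable_frm returns, an access such as mov al,[ds:ebx] with EBX above 0FFFFh no longer faults, which is the whole point of the trick.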
And then I got an e-mail from PetroffHeroj informing me that there was a problem. He sometimes worked in DOS with Cubic Player playing music in the background. However, whenever he tried to use fasm in that situation, it made his system hang.
He investigated and found the reason. CP was a protected mode program. When it was playing music in the background, it was hooked to an IRQ, and on each interrupt it switched to protected mode, did its work, and then went back to real mode - but doing this properly, that is, restoring the 64K limits along the way. This of course messed fasm up, as it expected FRM to stay in effect the whole time once it had enabled it at startup. So sooner or later fasm tried to access some address above the 16-bit range, and this caused exception 13 (General Protection Fault).
So, what was the solution? Quite simple, really. Hook the exception 13 handler and make it switch to FRM and retry the instruction that caused the fault (in real mode a plain IRET is enough for that, since the saved return address points at the faulting instruction). Wait, there is a catch. Interrupt 13 in DOS is also the IRQ 5 vector. So you have to distinguish the exception from the hardware interrupt, otherwise CP would still not be able to play music if the sound card's IRQ was set to 5 (and that was a frequent choice for Sound Blasters). And so the exception 13 handler had to send OCW3 to the PIC to read its In-Service Register, check whether IRQ 5 was in service, and in that case call the previous handler of that interrupt - otherwise, switch FRM on and go back.
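The handler logic described above can be sketched roughly as follows. This is an illustration, not the actual fasm source: old_int13 and enable_frm are hypothetical names, with enable_frm standing for a routine (not shown here) that re-enters flat real mode and preserves all registers.

```asm
; Sketch only: tell a real exception 13 apart from IRQ 5 on the same vector.
use16
int13_handler:
        push    ax
        mov     al,0Bh                  ; OCW3: next read of port 20h returns the ISR
        out     20h,al
        in      al,20h                  ; In-Service Register of the master PIC
        test    al,20h                  ; bit 5 set = IRQ 5 is being serviced
        pop     ax                      ; POP leaves the flags from TEST intact
        jnz     hardware_irq
        call    enable_frm              ; genuine exception: restore the 4G limits
        iret                            ; retry the instruction that faulted
hardware_irq:
        jmp     far [cs:old_int13]      ; pass IRQ 5 on to the previous handler

old_int13 dd ?                          ; previous vector (offset:segment),
                                        ; saved when the handler is installed
```

One limitation of this kind of check: a fault raised while an IRQ 5 handler is itself running would be mistaken for the hardware interrupt.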
After I did that, I realized that with the interrupt 13 handler set up this way it is no longer even necessary to initialize FRM at startup. The first time the program tries to use FRM and access some address above the 16-bit range, the GPF comes in, FRM gets initialized and voilà! Everything is working.
This in fact meant that if DOS was already running in FRM, then the FRM initialization would never be called, as the GPF would not happen. I was already aware that DOS often has FRM enabled all the time, because the standard XMS driver in MS-DOS - HIMEM.SYS - used FRM for accessing extended memory and left the system operating in this mode thereafter. All this reassured me that this way of using FRM - by just hooking interrupt 13 - was not only the best one, but in fact the only correct one.
There was one more problem with FRM, however. When you use 32-bit registers/addresses in 16-bit mode, all the instructions get longer because of the 66h and 67h prefixes. In a larger program, like fasm, this made a huge difference. And so I started wondering - what if we could also execute 32-bit code in real mode?
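As a small illustration of the size difference (my own example, not taken from fasm), the same instruction assembled for both modes:

```asm
use16
        mov     eax,[ebx]               ; 4 bytes: prefixes 66h and 67h, then 8Bh 03h
use32
        mov     eax,[ebx]               ; 2 bytes: 8Bh 03h
```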
I shared this idea with PetroffHeroj and we started some tests. I tried to do it just like the FRM trick - set up protected mode with a 32-bit code segment, and then just "forget" to restore it to 16-bit mode when going back to real mode. And the first tests were successful! At least the Intel processor was able to do it. PetroffHeroj had access to many much more exotic DOS machines, so he did some more tests later. It turned out this trick worked not only on Intel, but also on AMD processors. Only later did he find out that there was some bizarre processor (I don't remember exactly, but I think it was manufactured by Cyrix) which was not able to execute 32-bit code in real mode. But by then it was too late, because I had already decided that this mode (I called it unREAL) was so awesome that I had to use it in fasm.
There is one serious problem with unREAL which was absent in the case of FRM. Namely: the interrupts. All the BIOS/DOS interrupt handlers were 16-bit, so they would have serious problems running when executed from a 32-bit code segment. The first tests we conducted were, of course, with the interrupts disabled. But to use it in a real application, like fasm, I had to make interrupts work.
The obvious solution was to replace all the interrupt vectors with pointers to "gate" handlers, which would switch to 16-bit code, call the original interrupt vector, switch back to 32-bit code and return. But what if another interrupt happens when we are in the 16-bit code already? The "gate" code is 32-bit - another failure.
So I came up with the idea of hybrid code - something that would work no matter whether we are in 16-bit or 32-bit mode, and that would perform the appropriate task in each of these cases. It looked like this:
Code:
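interrupt_gate:
use16
        cmp     ax,word 0               ; opcode + imm16; a 32-bit decoder reads cmp eax,imm32
        jmp     short we_are_16bit      ; 0EBh + rel8; taken in 16-bit mode, swallowed by imm32 in 32-bit mode
use32
        ; set up 16-bit mode,
        ; call the original interrupt vector,
        ; restore 32-bit mode and return
we_are_16bit:
use16
        ; just jump to the original vector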
In 32-bit mode the "cmp ax,word" instruction would become "cmp eax,dword" and be two bytes longer, since the immediate value would become two bytes longer. And the "jmp short" instruction (opcode 0EBh) is exactly two bytes long, so it would get "eaten" by the 32-bit immediate. Thus this code really forked correctly for the cases of 16-bit and 32-bit code execution. Also, the "cmp" instruction destroyed only the flags, and the flags were preserved on the stack when calling an interrupt handler anyway - so this solution seemed perfect.
Still, it was causing problems. PetroffHeroj reported to me that disk accesses were not working with this variant. Why? Well, we forgot about one tiny detail - not all of the interrupt vectors are actual pointers to interrupt handlers. Some of them are used by the BIOS or other programs as pointers to data structures!
So we started to make a growing list of exceptions - the interrupt numbers that should not get replaced with "gates". But it was still possible that some interrupt number would be a pointer to a data structure on one system and an interrupt handler on another. It was even possible that something would be both at the same time (for example, some data at negative offsets before the code). So I realized this was a serious obstacle to making the unREAL version of fasm universal enough to become official.
It appeared that the only correct solution would be to maintain two separate interrupt tables, one for 16-bit code and one for 32-bit code, and switch between them when changing mode. But exchanging 1024 bytes of memory upon each switch did not look like a nice thing. If only we could relocate the IDT as in protected mode. Wait... who said that we cannot?
Only then did I realize that it was possible to use "lidt" in real mode, too. So all I had to do was to initialize a dedicated interrupt table for 32-bit code (by default filled with simple gates switching to 16-bit mode and executing the 16-bit handlers) and then switch to it with "lidt" when entering unREAL. And restore the 0 base address for the IDT when switching back to 16-bit. I tried - it worked. I sent it to PetroffHeroj - he was impressed, too. Finally we got it all working perfectly!
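A minimal sketch of this table switching, in fasm syntax, might look like the following. All labels are hypothetical, the table itself is left to be filled with the gates, and the code assumes that unreal_idt lives in the same segment as the code (as in a .COM-style program) and that interrupts are disabled around each switch.

```asm
; Sketch only: relocate the real mode interrupt table with LIDT.
use16
prepare_unreal_idt:
        xor     eax,eax
        mov     ax,cs
        shl     eax,4                   ; linear base of this segment
        add     eax,unreal_idt          ; plus the offset of the table
        mov     dword [cs:unreal_idtr+2],eax
        ret

select_unreal_idt:                      ; call just before switching to 32-bit code
        lidt    [cs:unreal_idtr]
        ret

select_real_idt:                        ; call just after returning to 16-bit code
        lidt    [cs:real_idtr]
        ret

real_idtr:      dw 3FFh                 ; limit: 256 vectors * 4 bytes - 1
                dd 0                    ; base 0: the standard real mode table
unreal_idtr:    dw 3FFh
                dd 0                    ; linear base, set by prepare_unreal_idt
unreal_idt:     rd 256                  ; 256 offset:segment entries for the gates
```

In real mode the table entries keep the usual 4-byte offset:segment format; only the base address (and limit) of the table changes.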
Still, there were some exotic processors where it did not work - but they were so rare that we did not care much about it. Later people reported that fasm's unREAL mode made the Bochs emulator crash. Apparently 32-bit real mode is so little known (as opposed to FRM, which is quite popular) that even the emulator writers did not think (or realize) that it was something worth emulating. DOSBox does not emulate unREAL mode either. So I finally decided to add a check for this case and show an appropriate error message ("processor is not able to enter 32-bit real mode"). And this was the last correction I had to make - here ends the story of my adventures with unREAL.
There is one more note, however - the size of the return address, which is stored on the stack upon entering the interrupt handler, depends only on whether we are in real or protected mode, and is not affected by the bitness of the code. Thus there is always just CS:IP stored on the stack, and if the interrupt happened while the code was executing at addresses higher than 0FFFFh, the high bits of EIP are lost. This means that the code segment in unREAL mode is still limited to 64K. There are some methods to partially deal with this problem - you can disable interrupts each time you call some routine above 64K and re-enable them after it returns - but this is applicable only in some cases. You can also store the high half of EIP somewhere each time you call a routine in another 64K block of code - but this is in fact so similar to just having multiple code segments that it is not worth the effort, in my opinion. Thus the unREAL version of fasm still has to deal with the 64K limit for code.