MZ and PM16

kerravon
Member
Posts: 302
Joined: Fri Nov 17, 2006 5:26 am

MZ and PM16

Post by kerravon »

I had previously given up on MZ because I thought it was not technically possible to use it to migrate to PM16. I was going to use NE instead and be compatible between MSDOS 4 European and OS/2 1.x. Maybe only the latter. And this is for both 8086 and 80286. ie OS/2 1.x on 8086.

However, work on trying to create a mini Atari 68000 clone led to a new idea (ie not all of this came from me, and it's all tentative - and some of it came from previous discussion in this forum).

So the goal is to write an 8086 program circa 1984 that will survive the transition to PM16. But the PM16 target will be PDOS/286 or PDOS/PM16 with a linear address space for a single process/whatever you call it. The valid MSDOS program like microemacs will suddenly have access to nearly 16 MiB of memory (or much more when PM16 on the 80386 comes out). I am especially interested in huge memory model, including using industry standard Microsoft C 5.1 - now available for free under MIT license (binary only) which can run on a Book 8088.

Here is the MZ header:

/* Offsets in the comments are given in decimal, then hex. */
typedef struct {
    unsigned char  magic[2];            /*  0 00: "MZ" or "ZM" */
    unsigned short num_last_page_bytes; /*  2 02: bytes used in the last page (page = 512 bytes) */
    unsigned short num_pages;           /*  4 04 */
    unsigned short num_reloc_entries;   /*  6 06 */
    unsigned short header_size;         /*  8 08: in paragraphs (16 bytes each) */
    unsigned short min_alloc;           /* 10 0A */
    unsigned short max_alloc;           /* 12 0C */
    unsigned short init_ss;             /* 14 0E */
    unsigned short init_sp;             /* 16 10 */
    unsigned short checksum;            /* 18 12 */
    unsigned short init_ip;             /* 20 14 */
    unsigned short init_cs;             /* 22 16 */
    unsigned short reloc_tab_offset;    /* 24 18 */
    unsigned short overlay;             /* 26 1A */
    unsigned short reserved1[4];        /* 28 1C: first set of reserved words */
    unsigned short oem_id;              /* 36 24 */
    unsigned short oem_info;            /* 38 26 */
    unsigned short reserved2[10];       /* 40 28: second set of reserved words */
    unsigned long  e_lfanew;            /* 60 3C: offset to the PE header; must be 32 bits wide
                                           (true of unsigned long on 16-bit DOS compilers,
                                           not on LP64 hosts, where it is 64 bits) */
} Mz_hdr;


I don't think I should populate e_lfanew - that should be reserved for when a 32-bit Win32 version of my program comes along, and someone wants to have it multiplatform or whatever the term is. So they can have a PE signature and override my 16-bit version (that works on standard MSDOS 2.0 or PDOS/286).

So maybe the field before e_lfanew - ideally something that doesn't clash with anyone else's MZ extensions.
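As a rough sketch of that idea, the last word of reserved2[] (offset 0x3A, immediately before e_lfanew) could hold a pointer to the extension data. The field choice, the name mz_ext_offset and the meaning of the value (byte offset, paragraph number, or something else) are all assumptions made for illustration, not part of any existing MZ convention. The header is read byte by byte so the code does not depend on the host's struct layout or endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MZ_HDR_MIN_SIZE   64u    /* header bytes up to and including e_lfanew */
#define MZ_EXT_PTR_OFFSET 0x3Au  /* hypothetical: last word of reserved2[] */

/* Read a little-endian 16-bit word from a byte buffer. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return the raw extension word from an MZ image, or 0 if the buffer is
   too short, is not an MZ file, or carries no extension (word is zero). */
static uint16_t mz_ext_offset(const unsigned char *img, size_t len)
{
    assert(img != NULL);
    if (len < MZ_HDR_MIN_SIZE)
        return 0;
    if (!((img[0] == 'M' && img[1] == 'Z') || (img[0] == 'Z' && img[1] == 'M')))
        return 0;
    return rd16(img + MZ_EXT_PTR_OFFSET);
}
```

A stock MS-DOS loader ignores the reserved words, so an executable carrying this word should still load unchanged there.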

Ok, so I believe there are two ways that linkers can resolve calls in medium/large memory models.

1. Keep the offset to a minimum, and change the segment.
2. Minimize changes to the segment until a single object code would exceed the offset limit ffff and then break into a new segment.

Number 2, if used - even by whim - and I think Watcom does that for its MSDOS executables - has already created a segmented executable like NE needs. However, the segment markers have been lost in MZ because no-one cares. But now I care.

Rather than throw the baby (MZ) out with the bathwater and switch to NE, it can instead just be NE-inspired.

The new ex-reserved word can point to an extension that identifies each segment length and whether it is code or data.

So:

0xfe00, 0x1 (code)
0xfc08, 0x1 (code)
0x3456, 0x1 (code)
0x235, 0x2 (data)

The relocatable information is not changed - it's normal - set assuming an 8086. There are no intersegment gaps - once again, normal 8086. The PM16 loader can move this stuff around, ie align each segment on a 64k boundary, and make the adjustments, with just the existing relocatable information - even though it is totally inappropriate for the 80286. And only the segment needs to be changed, as usual.
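A minimal sketch of how a PM16 loader might use such a segment table when it processes an ordinary MZ relocation: the word at each relocation site holds a paragraph number relative to the start of the load image, and the loader needs to know which table entry that paragraph belongs to. The struct layout, the names, and the rule that each segment starts on the next paragraph boundary after the previous one are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SEG_CODE = 1, SEG_DATA = 2 };   /* type codes as in the example table */

struct seg_entry {
    uint16_t length;   /* segment length in bytes */
    uint16_t type;     /* SEG_CODE or SEG_DATA */
};

/* Find the table entry that contains a real-mode segment value (a paragraph
   number relative to the start of the load image). Segments are assumed to
   be laid out back to back, each starting on the next paragraph boundary.
   Returns the entry index, or -1 if the paragraph lies outside every segment.
   On success *delta receives the byte distance from the segment start; it is
   0 when the reference points exactly at a segment boundary. */
static int seg_lookup(const struct seg_entry *tab, size_t n,
                      uint16_t para, uint32_t *delta)
{
    uint32_t start = 0;
    uint32_t addr = (uint32_t)para << 4;
    size_t i;

    assert(tab != NULL && delta != NULL);
    for (i = 0; i < n; i++) {
        uint32_t end = start + tab[i].length;
        if (addr >= start && addr < end) {
            *delta = addr - start;
            return (int)i;
        }
        start = (end + 15u) & ~15u;   /* next segment starts paragraph-aligned */
    }
    return -1;
}
```

With the returned index the loader can substitute whatever selector it assigned to that segment. A non-zero delta would mean the linker did not keep references at segment starts, in which case changing only the segment word would not be enough.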

The next thing that is needed is support for Microsoft C's AHINCR/AHSHIFT. These only occur with huge memory model (or huge pointers, anyway).

So after the segment size information you have:

AHINCR info
segment 1
offset 1
offset 2
offset 3
segment 2
offset 1
offset 2

AHSHIFT info
segment 1
offset 1
offset 2

with all the places that need to be zapped on a PM16 environment - because the existing values are all set already, to values suitable for the 8086 - not appropriate for PM16.
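The zapping step could look roughly like the sketch below. It assumes each patch site is a 16-bit little-endian word identified by a segment index plus an offset, and that the loader already knows where it placed each segment; the struct, the function name and the word width are hypothetical, since the thread does not fix an encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One location to overwrite: segment index into the loader's table,
   plus a byte offset inside that segment (hypothetical encoding). */
struct patch_site {
    uint16_t seg;
    uint16_t off;
};

/* Overwrite every listed site with a new 16-bit value, for example
   replacing the real-mode AHINCR with the value the PM16 loader uses.
   seg_base[i] is where the loader placed segment i. Sites that name an
   unknown segment are skipped. No bounds check is made against segment
   lengths here. Returns the number of sites patched. */
static size_t patch_sites(unsigned char *const seg_base[], size_t nsegs,
                          const struct patch_site *sites, size_t nsites,
                          uint16_t value)
{
    size_t i, done = 0;

    assert(seg_base != NULL && sites != NULL);
    for (i = 0; i < nsites; i++) {
        unsigned char *p;
        if (sites[i].seg >= nsegs)
            continue;
        p = seg_base[sites[i].seg] + sites[i].off;
        p[0] = (unsigned char)(value & 0xFFu);   /* low byte first */
        p[1] = (unsigned char)(value >> 8);
        done++;
    }
    return done;
}
```

The loader would call this once with the AHINCR list and once with the AHSHIFT list, each with its own replacement value.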

Additionally, the PM16 loader will detect this "new format MZ" and set the first 4 bytes of BSS not to zero, but to a structure containing callback functions - mainly just two functions - an int86 and int86x.

The MZ application (I'm only trying to support new MZ executables - new rules for new executables - not trying to make existing programs run on somewhere other than MSDOS) is expected to call int86 to do its work, and it checks a flag (that variable in BSS), and if it is non-zero, it does a callback, otherwise it executes an INT instruction. This means a PM16 environment doesn't need to have interrupt handlers, and nor does some usermode clone need to have privilege to intercept real INT instructions. The application itself is supposed to gracefully return control to the person/system/OS/util who/that loaded the executable.
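The dispatch described here might be sketched as follows. Everything named below is hypothetical: the register block is a simplified stand-in for the usual union REGS, os_hooks stands in for the first 4 bytes of BSS (modelled as a pointer to a callback structure), and real_int86 is only a placeholder for code that would execute an actual INT instruction, which portable C cannot express.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified register block (hypothetical, not the compiler's union REGS). */
struct regs16 {
    unsigned short ax, bx, cx, dx, si, di, cflag;
};

typedef int (*int86_fn)(int intno, const struct regs16 *in, struct regs16 *out);

/* Callback table a PM16 loader would supply; int86x is omitted for brevity. */
struct os_callbacks {
    int86_fn int86;
};

/* Stands in for the start of BSS: NULL under plain MS-DOS,
   set by a loader that recognises the extended MZ format. */
static struct os_callbacks *os_hooks = NULL;

static int real_int_count = 0;   /* counts fallbacks to the "real" path */

/* Placeholder for the code that would issue a genuine INT instruction. */
static int real_int86(int intno, const struct regs16 *in, struct regs16 *out)
{
    (void)intno;
    *out = *in;
    real_int_count++;
    return out->ax;
}

/* The single place where the application decides how to reach the OS. */
static int app_int86(int intno, const struct regs16 *in, struct regs16 *out)
{
    assert(in != NULL && out != NULL);
    if (os_hooks != NULL && os_hooks->int86 != NULL)
        return os_hooks->int86(intno, in, out);
    return real_int86(intno, in, out);
}

/* Example of what a loader-side handler could look like: it just echoes
   the interrupt number back in AX. */
static int sample_loader_int86(int intno, const struct regs16 *in, struct regs16 *out)
{
    *out = *in;
    out->ax = (unsigned short)intno;
    out->cflag = 0;
    return out->ax;
}
```

The rest of the program only ever calls app_int86, so the conditional lives in one place.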

There may be an existing executable format that does some/all of that, which means I could hook into that existing format (or a subset of that format). I don't think "NE" fits the bill - it's a replacement for MZ.

Also note that I'm not interested in self-relocating executables. I only want extra data (no code) - minimal extra data (for PM16) added to a beautiful, pure, simple, stock-standard MZ executable. If Watcom was used as the compiler - or if no huge pointers are present, the extra information is very trivial - a handful of bytes showing the segment lengths (to identify boundaries).

And nor do I want conditional execution plastered through the code - that one place in int86 would be fine though. Basically the OS nominally tells you which method it would like the application to use to interact with it. It's still MSDOS INT 21H calls - but they don't result in a real interrupt. This is not the same as OS/2 1.x where it's the other way around - OS/2 dictates the API and you only have a subset available if you are using an MSDOS system. Not sure what else Family API involves - but it's not a simple, pure, MZ executable - it's NE that is smart enough to cope with being run under MSDOS. I think that's the situation, anyway.

Note that one of the things I do is use as86 and ld86 to produce MZ executables, and it uses a.out as the intermediate format. I thought that might have been technically impossible, but nope, it was able to cope WITH THE SUBSET I am willing to live with. So I was wondering how the AHINCR/SHIFT worked, and it appears that those go into the a.out symbol table as values. That was just from a quick look at the a.out from dossupa.asm - I may be mistaken.

The above proposal will require support from the linker - ie, ld86, to organize an appropriate BSS variable. Not exactly sure how that will work. Presumably some special name it needs to detect. It will need to detect AHINCR/SHIFT anyway.

Any thoughts?

Is there any existing executable format that suits all my needs without having been made overly complicated by unrelated additions (like DLLs)?

I don't mind the more complicated format so long as I can use a subset without too much fuss.

Oh - schemes other than the first 4 bytes of BSS being non-zero are possible - something in the PSP? - suggestions?

Thanks. Paul.

Re: MZ and PM16

Post by kerravon »

kerravon wrote: Wed May 22, 2024 11:36 am
"Ok, so I believe there are two ways that linkers can resolve calls in medium/large memory models.

1. Keep the offset to a minimum, and change the segment.
2. Minimize changes to the segment until a single object code would exceed the offset limit ffff and then break into a new segment."

Microsoft C 6.0 appears to instead do:

3. Start each new object file on its own segment.

However, there is a /packcode option to the linker that will combine code into a minimal number of segments, which is what we ideally want.

There is a new problem though - Microsoft C is generating a new segment for FAR_BSS which would be difficult to detect in the loader. Also, I was wondering how that was meant to be initialized - it turns out that that is bytes in the executable itself! And indeed, maybe that is exactly how to detect it.

"0xfe00, 0x1 (code)
0xfc08, 0x1 (code)
0x3456, 0x1 (code)
0x235, 0x2 (data)"

Otherwise this problem could likely be solved by just keeping a list of all unique segment references, which should be minimal (and this will give you the length too), and assuming that the last one is data.
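A rough sketch of that fallback: gather the segment values found at the MZ relocation sites, sort them, drop duplicates, and take the distance to the next segment (or to the end of the image) as an upper bound on each length. The function names are made up for illustration, and the result only covers segments that are actually referenced through a relocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparison for 16-bit segment values. */
static int cmp_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a;
    uint16_t y = *(const uint16_t *)b;
    return (x > y) - (x < y);
}

/* Sort the segment values in place and remove duplicates.
   Returns the number of unique values left at the front of the array. */
static size_t unique_segments(uint16_t *segs, size_t n)
{
    size_t i, out = 0;

    if (n == 0)
        return 0;
    qsort(segs, n, sizeof segs[0], cmp_u16);
    for (i = 0; i < n; i++) {
        if (out == 0 || segs[i] != segs[out - 1])
            segs[out++] = segs[i];
    }
    return out;
}

/* Span in bytes from segment k to the next segment, or to the end of the
   image for the last one. image_paras is the image size in paragraphs.
   Because segments start on paragraph boundaries this is the real length
   rounded up to a multiple of 16. */
static uint32_t segment_span(const uint16_t *segs, size_t count, size_t k,
                             uint16_t image_paras)
{
    uint16_t next;

    assert(k < count);
    next = (k + 1 < count) ? segs[k + 1] : image_paras;
    return (uint32_t)(next - segs[k]) << 4;
}
```

The last entry in the sorted list would then be treated as data, as suggested above.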

However, that problem, and also this problem:

"The next thing that is needed is support for Microsoft C's AHINCR/AHSHIFT. These only occur with huge memory model (or huge pointers, anyway)."

can be solved by switching to Watcom.
rdos
Member
Posts: 3371
Joined: Wed Oct 01, 2008 1:55 pm

Re: MZ and PM16

Post by rdos »

I don't think I understand what you are trying to do. I used MZ format in both device drivers and the kernel (still do for the kernel as a legacy), but my restriction was that they couldn't contain the SEG operator, which C compilers output in many places. Thus, I required executables without relocations, which could only be supported for assembler source.

The biggest problem I see is that you cannot access 16M in real-mode, since segment arithmetic is built into real mode. This is what AHINCR/AHSHIFT is all about. They must have fixed values in real mode, and I think they should be 4096 and 12. In protected mode, which must be used to access 16M, you need to allocate segments (selectors) in the GDT or LDT. The way these worked back then was that for segments larger than 64k, the OS would allocate multiple descriptors consecutively. This meant that AHINCR/AHSHIFT were now 8 and 3 instead. However, if you run in protected mode, you cannot call DOS or BIOS functions that must be run in real mode. You cannot pass pointers above 1M to real mode, even if you can allocate them in protected mode.
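To make the role of these two constants concrete, here is a small sketch of huge-pointer addition. It assumes AHINCR is always 1 << AHSHIFT, which holds for the pairs mentioned in this thread (4096/12 in real mode, 8/3 with consecutive descriptors). The struct and function names are illustrative and are not what any compiler's runtime actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* A huge pointer split into its segment (or selector) and offset halves. */
struct huge_ptr {
    uint16_t seg;
    uint16_t off;
};

/* Advance a huge pointer by n bytes. Each time the offset wraps past
   0xFFFF, the segment part moves forward by AHINCR = 1 << ahshift. */
static struct huge_ptr huge_add(struct huge_ptr p, uint32_t n, unsigned ahshift)
{
    uint32_t sum;
    uint16_t ahincr;

    assert(ahshift < 16 && n <= 0x00FFFFFFul);
    ahincr = (uint16_t)(1u << ahshift);
    sum = (uint32_t)p.off + n;
    p.seg = (uint16_t)(p.seg + (sum >> 16) * ahincr);
    p.off = (uint16_t)(sum & 0xFFFFu);
    return p;
}
```

For example, adding 0x20 to 2000:FFF0 with a shift of 12 gives 3000:0010, the same linear address that real-mode segment arithmetic would produce. With a shift of 3, a selector of 0x0047 becomes 0x004F, which is the next descriptor in the table.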

This problem was not solved until the 386, when virtual 8086 mode was introduced. In that environment, you can both virtualize IO and the interrupt flag, and use paging to pass pointers that reside above 1M (physical address). However, this mode still does not allow passing linear addresses above 1M to DOS or BIOS. This must be handled with paging.

Re: MZ and PM16

Post by kerravon »

rdos wrote: Sun Apr 27, 2025 7:39 am
"The biggest problem I see is that you cannot access 16M in real-mode, since segment arithmetic is built into real mode. This is what AHINCR/AHSHIFT is all about. They must have fixed values in real mode, and I think they should be 4096 and 12. In protected mode, which must be used to access 16M, you need to allocate segments (selectors) in the GDT or LDT. The way these worked back then was that for segments larger than 64k, the OS would allocate multiple descriptors consecutively. This meant that AHINCR/AHSHIFT were now 8 and 3 instead."

16 and 4 for my implementation.

"However, if you run in protected mode, you cannot call DOS or BIOS functions that must be run in real mode. You cannot pass pointers above 1M to real mode, even if you can allocate them in protected mode."

I use UEFI, not the BIOS.

"This problem was not solved until the 386, when virtual 8086 mode was introduced. In that environment, you can both virtualize IO and the interrupt flag, and use paging to pass pointers that reside above 1M (physical address). However, this mode still does not allow passing linear addresses above 1M to DOS or BIOS. This must be handled with paging."

There is more than one way to skin a cat. :-)

https://www.bttr-software.de/forum/boar ... p?id=22441

Re: MZ and PM16

Post by rdos »

UEFI cannot run on a 286 CPU and not even a 386 since back in the days of the 386 every PC had only BIOS. :mrgreen:

Also, there is no need to use extended 128 bit descriptors in 64 bit UEFI. It works just as well to allocate standard descriptors. Descriptors are a limited resource so no use to waste them by allocating two instead of only one. You can only set 32 bit bases regardless.

Re: MZ and PM16

Post by kerravon »

rdos wrote: Sun Apr 27, 2025 12:33 pm
"UEFI cannot run on a 286 CPU and not even a 386 since back in the days of the 386 every PC had only BIOS. :mrgreen:"

The pseudobios will need to be replaced for an 80286 port.

"Also, there is no need to use extended 128 bit descriptors in 64 bit UEFI. It works just as well to allocate standard descriptors. Descriptors are a limited resource so no use to waste them by allocating two instead of only one. You can only set 32 bit bases regardless."

I am using standard descriptors. One code and one data descriptor for every 64k chunk of memory starting at 0. So aliasing is just adding/subtracting 8.
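One way to picture that layout, as a sketch only: if each 64k chunk gets a code descriptor immediately followed by a data descriptor, then consecutive chunks are 16 apart in selector terms and the data alias of a code selector is 8 higher. The first selector value, the code-before-data ordering and the function names below are assumptions; the thread does not state them.

```c
#include <assert.h>
#include <stdint.h>

#define PM16_FIRST_SELECTOR 0x0010u /* hypothetical selector of chunk 0's code descriptor */
#define PM16_CHUNK_STRIDE   16u     /* two 8-byte descriptors per 64k chunk */
#define PM16_ALIAS_DELTA    8u      /* code selector + 8 = data alias (assumed order) */

/* Code selector covering a linear address below 16 MiB. */
static uint16_t pm16_code_selector(uint32_t linear)
{
    assert(linear < 0x01000000ul);
    return (uint16_t)(PM16_FIRST_SELECTOR + (linear >> 16) * PM16_CHUNK_STRIDE);
}

/* Data selector aliasing the same 64k chunk. */
static uint16_t pm16_data_selector(uint32_t linear)
{
    return (uint16_t)(pm16_code_selector(linear) + PM16_ALIAS_DELTA);
}

/* Offset of the address inside its 64k chunk. */
static uint16_t pm16_offset(uint32_t linear)
{
    return (uint16_t)(linear & 0xFFFFu);
}
```

Covering 16 MiB this way takes 256 chunks, or 512 descriptors, which fits well inside a single descriptor table.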

It's meant to be a simple, understandable system.

Re: MZ and PM16

Post by rdos »

No BIOS will solve the issue of passing protected mode segment registers to real mode. You cannot even switch back to real mode on a 286 processor since this is not supported. You will need a protected mode BIOS and DOS implementation, which never existed for 286.

Re: MZ and PM16

Post by kerravon »

rdos wrote: Mon Apr 28, 2025 2:26 pm
"No BIOS"

I said pseudobios - which is what I call my layer on top of the BIOS.

"will solve the issue of passing protected mode segment registers to real mode."

I already do "this" with PDOS/386. I switch from PM32 to RM16 in order to do the BIOS interrupt.

It is "just" a matter of repeating that with PM16.

"You cannot even switch back to real mode on a 286 processor since this is not supported."

That's not correct. Intel documented that you need to do a triple fault to get back to RM16. IBM engineers didn't notice that, and instead came up with a convoluted method involving the keyboard. And then that was hidden behind an official BIOS interrupt so that theoretically they should be able to switch to the documented method if they wish to.

"You will need a protected mode BIOS and DOS implementation, which never existed for 286."

There is more than one way to skin a cat. :-)

Re: MZ and PM16

Post by rdos »

Even if you can switch from PM16 to RM16 using some trickery, RM16 will still not be able to use linear addresses above 1M, so you cannot pass pointers to descriptors mapped above 1M. Which kind of means you will not be able to support 16M of memory. Unless you add some trickery with intermediate buffers below 1M. Or use a PM16 BIOS and DOS.
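The intermediate-buffer approach mentioned here can be sketched as a chunked copy: data living above 1M is copied piece by piece into a buffer below 1M, and the real-mode service is invoked once per piece. In this sketch the real-mode call is abstracted as a function pointer, and all names are hypothetical; a real implementation would switch modes or call the BIOS at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stands in for a real-mode service that can only see the low buffer.
   Returns 0 on success. */
typedef int (*low_io_fn)(const unsigned char *low_buf, size_t len, void *ctx);

/* Push len bytes from src (anywhere in memory) through a bounce buffer
   that the real-mode side can address, one chunk at a time.
   Returns 0 on success, -1 if the service reports an error. */
static int write_via_bounce(const unsigned char *src, size_t len,
                            unsigned char *bounce, size_t bounce_size,
                            low_io_fn io, void *ctx)
{
    assert(src != NULL && bounce != NULL && io != NULL && bounce_size > 0);
    while (len > 0) {
        size_t chunk = (len < bounce_size) ? len : bounce_size;
        memcpy(bounce, src, chunk);      /* copy into memory below 1M */
        if (io(bounce, chunk, ctx) != 0) /* service sees only the low copy */
            return -1;
        src += chunk;
        len -= chunk;
    }
    return 0;
}

/* Example sink used to exercise the loop: it only counts what it was given. */
struct sink_stats {
    size_t calls;
    size_t bytes;
};

static int count_sink(const unsigned char *low_buf, size_t len, void *ctx)
{
    struct sink_stats *st = (struct sink_stats *)ctx;
    (void)low_buf;
    st->calls++;
    st->bytes += len;
    return 0;
}
```

Reads work the same way in reverse: the service fills the low buffer and the caller copies each chunk out to its final destination.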

Re: MZ and PM16

Post by rdos »

kerravon wrote: Mon Apr 28, 2025 3:31 pm
"I already do 'this' with PDOS/386. I switch from PM32 to RM16 in order to do the BIOS interrupt."

For 386, there is no need for this kind of trickery. The V86 mode was added to handle this in a safe manner that did not require changing processor mode. By using paging, it's also possible to do BIOS calls that require buffers without a need to copy the content to an intermediate buffer below 1M.

Re: MZ and PM16

Post by rdos »

Early in my OS development, my goal was to support multithreaded (and multiprocess) DOS programs. I used V86 mode for this, and provided my own DOS implementation that was thread-safe. I even supported DPMI and dos-extenders. I also had an emulator that could emulate contents in the DOS and BIOS data area so these could be virtualized. This worked well, but I didn't find the DOS environment useful for more modern projects, so I implemented my own 32-bit system, and eventually DOS support has eroded and probably no longer works. So, it's possible to support multithreading in DOS, but switching processor modes to do BIOS interrupts simply breaks this. I never relied on BIOS for device-drivers, rather had my own native drivers. Relying on BIOS for devices simply would make the system just as unstable as MS-DOS and early Windows versions.

Re: MZ and PM16

Post by kerravon »

rdos wrote: Tue Apr 29, 2025 1:53 am
"Even if you can switch from PM16 to RM16 using some trickery, RM16 will still not be able to use linear addresses above 1M, so you cannot pass pointers to descriptors mapped above 1M."

Sure. Just like I can't in PM32 either, which is what PDOS/386 runs in. So?

"Which kind of means you will not be able to support 16M of memory."

Yes I can.

"Unless you add some trickery with intermediate buffers below 1M."

And you just explained how.

I don't consider that to be "trickery" though. Just design. Not necessarily the best design for commercial purposes though.

"For 386, there is no need for this kind of trickery. The V86 mode was added to handle this in a safe manner that did not require changing processor mode. By using paging, it's also possible to do BIOS calls that require buffers without a need to copy the content to an intermediate buffer below 1M."

Which is a lot more code and complication required.

Yes, if you're after a commercially viable solution, that would probably be the way to go.

If you're after an understandable system, my way I understand already. I needed that knowledge/understanding to get into PM32.

I would have to learn V86.

So would anyone else who wants to know how to write an OS.

And my way I just change the pseudobios (the same as CP/M had a BIOS). My PM32 OS, with PDOS-generic, is completely portable. And I've got it largely demonstrated on a modern x64 system (both CM16 and CM32 - the same OS code will run on both). The V86 code would be useless in that situation.

This way I just switch pseudobios, the same as CP/M did. CP/M wasn't written in assembler either. Although I'm not familiar with the internals of CP/M. They may be doing things differently to the way I do things.

Re: MZ and PM16

Post by kerravon »

BTW, section 9.6.2 of the "80286 and 80287 Programmer's Reference Manual":

https://bitsavers.org/components/intel/ ... l_1987.pdf

has the reference to exiting PM16 via triple fault on an 80286.

Now to find the BIOS interrupt ...

Re: MZ and PM16

Post by kerravon »

Looks like I might have been wrong about there being a BIOS interrupt.

Time for me to find out how to do a triple fault I guess!