OSDev.org

Posted: **Wed Aug 09, 2017 3:24 pm**

mallard wrote:Also, while it's more complex, it is possible to use V86 mode in a "mostly 64-bit" OS by switching to 32-bit pmode as an intermediate step.

In which case you can just switch to real mode itself and reduce hassle of implementing a v8086 monitor.

Posted: **Wed Aug 09, 2017 5:22 pm**

omarrx024 wrote:
mallard wrote:Also, while it's more complex, it is possible to use V86 mode in a "mostly 64-bit" OS by switching to 32-bit pmode as an intermediate step.
In which case you can just switch to real mode itself and reduce hassle of implementing a v8086 monitor.

Haha.. That's a joke, right?

Posted: **Wed Aug 09, 2017 5:26 pm**

LtG wrote:
omarrx024 wrote:
mallard wrote:Also, while it's more complex, it is possible to use V86 mode in a "mostly 64-bit" OS by switching to 32-bit pmode as an intermediate step.
In which case you can just switch to real mode itself and reduce hassle of implementing a v8086 monitor.
Haha.. That's a joke, right?

16 bit mode is a joke by itself. (COM files included)
It is 2017, why do people still use 16 bit stuff anyway?

Posted: **Wed Aug 09, 2017 7:15 pm**

Octacone wrote:16 bit mode is a joke by itself. (COM files included)
It is 2017, why do people still use 16 bit stuff anyway?

They shouldn't.

I wouldn't be surprised if soon (with the help of UEFI) AMD/Intel drop support for prot mode, and thus for real mode as well.

The only reason for v86 is for really really old games, but those can easily be supported thru emulation, given that modern computers are so much faster than any real mode game would expect so there's no issue. So no modern OS should provide any direct support for real mode, only thru emulation. So there's also no use for v86.

Beyond that the only thing that comes to mind is the ROM on peripherals that expect real mode, which basically boils down to graphics cards. Brendan has explained quite a few times why that's almost completely unnecessary, just set the graphics mode at boot to either the highest possible (and let the display downscale if it doesn't match) or set it to match the display and hope the display isn't changed without rebooting. If you ever get into the position that this stuff actually matters, then you already have quite a few users and can provide native drivers to some extent.

Personally I have no plans to support "downgrading" from long mode to prot mode just to support v86. If I end up using some type of bytecode then I plan not to support prot mode with x86_64 at all. Even prot mode is legacy these days..

Posted: **Wed Aug 09, 2017 11:41 pm**

Octacone wrote:16 bit mode is a joke by itself. (COM files included)
It is 2017, why do people still use 16 bit stuff anyway?

I am working on a multitasking Desktop system running on (Free)DOS, so I thought it would be a good idea to actually support DOS-programs.

I know I know, a desktop doesn't count as an OS and doesn't belong to this forum.

About the "joke": At my work, DOS is running on a lot of systems to control our milling machines and lathes (is lathe the right word? I'm no native english speaker).

Posted: **Wed Aug 09, 2017 11:47 pm**

If your desktop is running on FreeDOS then you have no problem. It already has the ability to run 16-bit programs.

Posted: **Wed Aug 09, 2017 11:54 pm**

But FreeDOS isn't a multitasking system and I can't just start a new task. I have to write my own protected mode "kernel", which handles processes (something like Windows 3).

Posted: **Thu Aug 10, 2017 1:06 am**

So you're not running on FreeDOS?

Posted: **Thu Aug 10, 2017 1:50 am**

I am. My desktop is a program running on FreeDOS

Posted: **Thu Aug 10, 2017 4:25 pm**

Getting back to the original question - assuming that @Postmann isn't satisfied with the existing answers - you need to understand that .COM files are purely executable image files, and don't separate executable code from data in the least - there are no separate sections for them, so the loader isn't aware of which is which in the file. This is markedly different from the MZ (MS-DOS .EXE) format, or later formats such as PE (Windows .EXE), COFF (early 1990s Unix) and ELF (modern Unix, including Linux), which structure the files themselves to allow relocation, permit them to handle data separately, have BSS sections for defining uninitialized variables, etc. A .COM file is always loaded at offset 0x0100 in the segment it is loaded into, and must fit entirely in one 64K segment - for both code and data.
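
To make the "no sections" point concrete, here is a minimal Python sketch of what a .COM loader does. The function name `load_com` and the demo bytes are made up for illustration, and the size check is simplified (a real DOS loader also leaves room for the stack at the top of the segment).

```python
SEGMENT_SIZE = 0x10000   # one real-mode segment: 64 KiB
COM_LOAD_OFFSET = 0x100  # DOS puts the 256-byte PSP below this offset


def load_com(image: bytes) -> bytearray:
    """Return a 64 KiB segment with the .COM image copied to offset 0x100."""
    if len(image) > SEGMENT_SIZE - COM_LOAD_OFFSET:
        raise ValueError("image does not fit in a single 64 KiB segment")
    segment = bytearray(SEGMENT_SIZE)
    # The file is copied byte for byte; nothing marks code versus data.
    segment[COM_LOAD_OFFSET:COM_LOAD_OFFSET + len(image)] = image
    return segment


# mov ah, 9 / mov dx, 0x0109 / int 0x21 / int 0x20, then the string "Hi$"
demo = bytes([0xB4, 0x09, 0xBA, 0x09, 0x01, 0xCD, 0x21, 0xCD, 0x20]) + b"Hi$"
segment = load_com(demo)
entry_ip = COM_LOAD_OFFSET  # execution starts at the first byte of the file
```

In the demo image the string sits directly behind the last instruction, at offset 0x0109; only the program's own control flow keeps the CPU from running into it.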

It might help to understand the purpose of the .COM file a little better. The format was devised - based on the executable format of 8080/Z80 CP/M - and included in MS-DOS for three primary reasons. The first two are directly related to the segmented memory model, and are basically the same reasons Intel used it to begin with: to make it easier to port 8080 code to the 8086, and to make it easier for assembly-language programmers to migrate their 8080 coding skills. A lot of thought went into making the 8086 familiar to 8080 programmers, in part because they expected both to be used for the same purpose: embedded systems. They thought of both of them as short-term designs, and assumed most programming for both would be in assembly.

The third reason is simple efficiency, especially for the original 64KiB and 256KiB IBM PC models. The assumption was that, at least initially, most programs would fit inside a single segment, so in order to avoid doing unnecessary segment relocations, they had the .COM format that always loaded the whole program in exactly one way at exactly one (segment-relative) location. Why did Bill Gates (allegedly) say 640KiB was 'enough for anybody'? Because it was ten times what the baseline PC shipped with for that first year, which was already more than most small computers of the time had.

It was never expected that either the format, or the PC itself, would be around ten years later, never mind 36 (in fact, IBM's primary goal in releasing it was to kill the home computer market for good, and Intel didn't want to be in the home market at all). Seriously, if you had told Gordon Moore or Stephen Morse in 1977 that the 8086 would still be in use - in any form at all - forty years after it was designed, they would have thought you were insane.

Posted: **Fri Aug 11, 2017 3:15 am**

One thing to add on the origins of MS/PC-DOS executable formats is that .EXE was very much a "last minute addition". Even pre-release versions of DOS from very close to the final release have no support for .EXE files. .COM clearly wasn't just for 8080/Z80 ports or "familiarity", it was the only executable format until just weeks before release.

Schol-R-LEA wrote:in fact, IBM's primary goal in releasing it was to kill the home computer market for good

I'm not sure there's much evidence of that. IBM was never really interested in the home market (1984's PCjr was about the only home-oriented IBM product and was so much of a failure that the improved graphics capabilities it introduced became known as "Tandy-compatible" after the far more successful range of clones from Tandy). They were far more interested in the business market and were afraid that "cheap" (by the standards of the time) 8-bit CP/M-based business micros would eat into their minicomputer and even mainframe business.

Also note we're talking about "true" .COM files here. Many later .COM files are more than 64KB in size and are basically an arbitrary executable format joined to a .COM-compatible loader. You've even less chance of distinguishing code from data in one of these files, since you'd need to know details about the "real" executable format which are unlikely to be easy to find out.

Posted: **Fri Aug 11, 2017 7:50 am**

mallard wrote:One thing to add on the origins of MS/PC-DOS executable formats is that .EXE was very much a "last minute addition". Even pre-release versions of DOS from very close to the final release have no support for .EXE files. .COM clearly wasn't just for 8080/Z80 ports or "familiarity", it was the only executable format until just weeks before release.

I was unaware of this. That's an interesting fact, as it really indicates that the assumption was that most programs would be under 64KiB, as larger programs would have to perform explicit loading and manipulation of any additional segments (via, e.g., separate binary files, or one of the .COM-compatible extended formats you mention later). It is also a bit surprising that the use of a linkable and relocatable format was such a low priority at such a late date, when such were in regular use for mainframes by 1966 and minis by about ten years later.

mallard wrote:
Schol-R-LEA wrote:in fact, IBM's primary goal in releasing it was to kill the home computer market for good
I'm not sure there's much evidence of that. IBM was never really interested in the home market (1984's PCjr was about the only home-oriented IBM product and was so much of a failure that the improved graphics capabilities it introduced became known as "Tandy-compatible" after the far more successful range of clones from Tandy). They were far more interested in the business market and were afraid that "cheap" (by the standards of the time) 8-bit CP/M-based business micros would eat into their minicomputer and even mainframe business.

It is a bit... controversial, and I doubt it was ever an articulated strategy even internally, but several IBMers from the time have gone on record either opining that this is what they thought the company was doing, or what they thought it should have been doing. My understanding is that there was a lot of opposition to the Boca Raton group's proposal from the start, out of fear that it would 'legitimize' microcomputers; while the Boca Raton group themselves clearly thought it was something the company needed to be doing, there have been some claims that Armonk only greenlit the development project to shut the 'troublemakers' up.

According to some claims (I'd need to look them up), one of the arguments in favor of releasing it as a product was that they could use it as a way of shifting the 'home market' towards remote use of timesharing (which was itself still seen by some IBMers - including at least two former employees whom I personally spoke to in the early 1990s - as an undesirable intrusion into a 'properly controlled' batch-processed computing environment).

This particular view of the PC would later be embodied in the 3270 PC, which was basically a PC configured to act as a smart terminal, with the PC's CPU used only for local editing and display functions, as well as in the PC/mainframe hybrids such as the XT/370, AT/370, 7437 VM/SP, and Personal/370, all of which were meant to work in conjunction with a System/370 or System/3090 series mainframe (or later, a System/390; it seems they'd given up by the time the zSeries was developed), and to run some 370 programs in emulation.

So it seems that, while I can't say if this was the actual goal of upper management, a number of people at IBM thought it was, and it is likely that at least a faction of the management wanted it to be the primary strategy.

Posted: **Fri Aug 11, 2017 9:06 am**

I probably don't understand ELF's design goals as well as I should, so I would like to ask a related question here. What was the original motivation for separating the code and the read-only data? I can see the benefit for security mitigations like data execution prevention, but does this choice affect performance one way or another?

I am asking because I just realized that the ELF approach to structuring information into sections and segments impacts the compiler design as well. Even if you produce a binary blob at the final linking stage, the intermediate object files already separate the output according to its function. Wouldn't some architectures benefit from interleaving constants and code for better locality? Or even variable data and code, if the product won't be on a device that is facing a network, is simple enough, etc. I know that gcc sometimes does that by placing constants near the code, but what is ELF's design philosophy on the subject? Say, reading from the same page in RAM (i.e. open device rows) incurs smaller latency than reading from different pages, so wouldn't interleaving code and data, in say 64-byte blocks to keep the cache lines separate from each other, help avoid an occasional tRCD? That is, if we are looking at the problem performance-wise.

Sorry if I am hijacking, but thought the question is relevant enough.

P.S. The memory pages context is naive, in the sense that tRCD can be masked by the parallel operation of the banks on the memory device, by prefetching, etc, but it is just an example context.

Posted: **Fri Aug 11, 2017 12:14 pm**

I would need to do more research on the subject, but I can tell you a few things.

First, it is entirely possible for the compiler or assembler to emit blocks of data inside of the code sections, but IIUC, in the majority of cases, doing so would be counter-productive.

Now, to be fair, I am not entirely clear if by 'interleaved data' you mean just immediate operands, or non-immediate constants embedded in code, with the flow of execution going around or stopping prior to reaching the data block. Immediate addressing for most ISAs means that the data is part of the instruction encoding itself, and does not need to be fetched separately, so it wouldn't be a concern for the object format at all.

Note that it is also the case that most ISAs' immediate addressing modes can only load a value that fits into a single general register, and are often limited to something significantly smaller - for 32-bit MIPS, for example, immediate values cannot be larger than two bytes - half of one 32-bit instruction - as the immediate has to fit into the same instruction word as the opcode proper. The usual workaround in MIPS - where 'load immediate' is actually a pseudo-instruction (for small values, 'OR immediate against the Zero register') - is to load the upper half of a larger value with 'lui', which places its 16-bit operand in the high half of the register, and then OR in the lower half of the large immediate value; the pseudo-instruction handling in the assembler is smart enough to emit this whenever 'li' is used with a larger value.
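
As a rough illustration of that assembler behaviour, the following Python sketch picks an expansion for `li`. The function name `expand_li` and the exact selection rules are assumptions for illustration; a real assembler may choose differently in some cases.

```python
from typing import List


def expand_li(reg: str, value: int) -> List[str]:
    """Pick a plausible MIPS32 instruction sequence for 'li reg, value'."""
    value &= 0xFFFFFFFF
    hi, lo = value >> 16, value & 0xFFFF
    if hi == 0:
        # Fits in 16 bits zero-extended: one ORI against $zero.
        return [f"ori {reg}, $zero, 0x{lo:04x}"]
    if hi == 0xFFFF and lo & 0x8000:
        # Small negative value: fits in 16 bits sign-extended.
        return [f"addiu {reg}, $zero, {lo - 0x10000}"]
    if lo == 0:
        # Low half is zero: LUI alone fills the high half.
        return [f"lui {reg}, 0x{hi:04x}"]
    # General case: LUI for the high half, ORI for the low half.
    return [f"lui {reg}, 0x{hi:04x}", f"ori {reg}, {reg}, 0x{lo:04x}"]
```

Note that no separate shift instruction appears: `lui` itself puts its operand into bits 31..16 and clears the low half.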

EDIT: apparently, the way 32-bit ARM does immediates is rather different from what I misunderstood it to be; it actually uses 12 bits, four for the position within the 32-bit final value and 8 for the value itself. So, loading a 32-bit value could take up to four instructions, but doesn't need any explicit shift instructions. If anyone can clarify this further, please do so.
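
To clarify the 8+4 scheme a little: in the classic 32-bit ARM encoding, the constant is the 8-bit value rotated right by twice the 4-bit field. The Python sketch below checks whether a value can be encoded that way; the function names are made up for illustration.

```python
from typing import Optional, Tuple


def rotate_left32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by 'amount' bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value: int) -> Optional[Tuple[int, int]]:
    """Return (rot, imm8) such that imm8 rotated right by 2*rot == value."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Undo the right rotation; if the result fits in 8 bits, it encodes.
        imm8 = rotate_left32(value, 2 * rot)
        if imm8 <= 0xFF:
            return rot, imm8
    return None


def single_instruction_load(value: int) -> Optional[str]:
    """Say whether one MOV or one MVN can load the value, else None."""
    if encode_arm_immediate(value) is not None:
        return "MOV"
    if encode_arm_immediate(~value) is not None:
        return "MVN"  # MVN loads the bitwise complement of its operand
    return None
```

For example, 0xFF000000 encodes (0xFF rotated right by 8), while 0x101 does not, since its two set bits cannot both fall inside any 8-bit window.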

Few instruction sets have 'short' addressing for data accesses, the way some do for jumps. As I understand it, the way associative caching works, there can be separate locality for instructions and data, or even multiple locality 'nodes' for both. Indeed, it is my understanding that most modern CPUs use separate caching for instructions and data, so I don't know if data locality relative to the instruction stream is even a factor, regardless of whether the data is writable or not. I may be wrong about this, however, so any corrections on this would be welcome.

I know that it is not uncommon on some systems - 6502 for the Apple II and C-64 in particular - for assembly coders to place certain kinds of data (e.g., strings) in the assembly code stream itself in the point of the code where it is used, and manually jump around that data when it is reached, but that approach really can't be applied in a meaningful way on more modern CPUs - it would probably degrade performance rather than improve it, if I understand this correctly. I don't know if this is at all what you had in mind, though.

Second, most systems use the same overall object format for several different kinds of object files, such as intermediate linkable object files, executable object files, static library files, and shared library files, with the header indicating which type a given file is. This is certainly the case with ELF, which has several different sub-types. The sections are designed so that, when linking several object files together, only those parts which the linker needs to patch for relocations and other link-time changes need to be altered, with the rest simply copied to the final executable file.
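
As a small illustration of "the header indicating which type a given file is", this Python sketch reads the `e_type` field from the start of an ELF file. The offsets and type constants follow the ELF specification; the function name `elf_file_type` is hypothetical.

```python
import struct

ELF_TYPES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_file_type(header: bytes) -> str:
    """Return the ELF file type name from the first bytes of a file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, f"unknown (0x{e_type:04x})")
```

Linkable object files report ET_REL, executables ET_EXEC, and shared libraries ET_DYN.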

Third, depending on the memory architecture, the operating system, the program's needs, and several other factors, the loader may need to perform load-time relocation, of either the code or data in the executable file, or of some form of shared/dynamic-link libraries. While most current systems use the paging mechanism to eliminate load-time relocation in the majority of cases, by mapping the code into the correct memory locations independent of the physical addressing, and relative addressing modes can reduce the need for it even further, there are usually at least some programs where load-time relocation is needed.
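
In its simplest form, load-time relocation just means patching every stored absolute address by the difference between the address the image was linked for and the address it was actually loaded at. The Python sketch below is a generic illustration, not the ELF relocation scheme; the function name and the flat list of fixup offsets are assumptions.

```python
import struct


def relocate(image: bytes, fixup_offsets, link_base: int, load_base: int) -> bytearray:
    """Patch 32-bit little-endian absolute addresses for a new load address."""
    delta = (load_base - link_base) & 0xFFFFFFFF
    patched = bytearray(image)
    for offset in fixup_offsets:
        # Each fixup names a place in the image that holds an absolute address.
        (address,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (address + delta) & 0xFFFFFFFF)
    return patched
```

Only the listed words change; everything else is copied untouched, which is why a format needs to record where those words are.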

Fourth, most relocatable object formats - not just ELF - allow read-only data sections, read-write data, and BSS (uninitialized data space allocation declarations) and even auxiliary sections such as comments, stack frames, etc., to be separated from the text (code) sections in order to allow the linker to treat them separately for the purposes of managing relocation (among other things). The loader also needs to be able to manipulate them independently for similar reasons.

Finally, there is no reason why the loader could not merge them at run time, assuming that it would provide an advantage, and doing so would not add any overhead not inherent in a relocatable executable format already (no matter what else, the loader has to parse the headers of the object file, and find the individual sections, compute the sizes and assign addresses for the BSS variables, determine if there are any load-time relocations for either the code or the data, request that the paging system map any sections of the file that don't need relocation into memory pages, figure out which non-relocating sections need to be paged in immediately and which can be kept paged out, generate temporary pages for any that do need relocation, and do several other housekeeping tasks).

Posted: **Fri Aug 11, 2017 2:47 pm**

Note that it is also the case that most ISAs' immediate addressing modes can only load a value that can fit into a single general register, and often is limited to one significantly smaller - for 32-bit MIPS, for example, immediate values cannot be larger than two bytes - half of one 32-bit instruction - as it has to fit into the same instruction as the opcode proper. The usual workaround in MIPS - where the 'load immediate' is actually a pseudo-instruction for 'OR immediate against the Zero register' - is to load the upper part of a larger value, then perform a 16-bit left shift, then OR in the rest of the large immediate value; the pseudo-instruction handling in the assembler is smart enough to emit this whenever 'li' is used with a larger value. I believe that it is similar on ARM, which optimizes it a little - all instructions have an optional shift operand, which means that the first and second operation can be done in one instruction.

Just a note. I don't think ARM is better here. MIPS manages to encode a 16-bit immediate directly in one instruction without stupid trickery, with the same 4-byte instruction length, while still having twice as many registers! What ARM does is just pfff, it's brain damaging. It has only 12 bits for the encoded value and rotation (8+4). All those rotations mean it can only load a subset of values (with encoding redundancy on top), and this makes assembly programming less enjoyable than one might think. So on ARM you often still need a movt/movw pair (or literal pool trickery, which is even more "fun") to load immediates and addresses. The interesting thing is that you only find out exactly what you need either by doing the rotations by hand every time or by wading through the assembler's complaints.

And next, ARM has a more limited offset field, which results in an even more brain damaging thing - those infamous "literal pools". I was amazed how much easier all these things are on MIPS compared to ARM, even though they are almost twins.
Here is what I mean about offsets. Suppose you have a symbol gMySymbol somewhere and you want to read from it into a register. The same applies to loading immediates into a register. On MIPS you just do:

Code: Select all

/* for loading an address, for further ordinary variable manipulations */
la $t0, gMySymbol    /* this might result in just one instruction if the low 16 bits of gMySymbol's address are zero */
lw $t1, 0($t0)

/* for loading an immediate */
li $t0, MY_IMMEDIATE

It's the best; it's all that's needed on a load/store RISC architecture. You have a chance to load a 32-bit address in just one instruction if you are lucky enough to have its low 16 bits zero (what I call "16-bit aligned"), or an immediate in one instruction if it fits into 16 bits (taking possible sign extension into account, so, for example, -1 fits into 16 bits). Otherwise it takes two at most. The la pseudo-instruction expands into either

Code: Select all

lui $t0, %hi(gMySymbol)

or

Code: Select all

lui $t0, %hi(gMySymbol)
ori $t0, $t0, %lo(gMySymbol)

li differs only in its treatment of sign extension: as it is intended for immediates, it can expand into sign-aware instructions where possible, to minimize the number of instructions needed.
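
One detail worth spelling out about splitting an address into halves: the split depends on whether the low half is later ORed in (zero-extended) or added (sign-extended, as with addiu or a load/store offset). As far as I know, GNU as's %hi() produces the carry-adjusted high half meant for the sign-extended pairing. The Python sketch below shows both conventions; all function names are made up for illustration.

```python
def split_for_ori(address: int):
    """High/low halves for 'lui' followed by 'ori' (low half zero-extended)."""
    address &= 0xFFFFFFFF
    return address >> 16, address & 0xFFFF


def split_for_addiu(address: int):
    """High/low halves for 'lui' followed by 'addiu' (low half sign-extended)."""
    address &= 0xFFFFFFFF
    # Add 0x8000 first so the high half absorbs the borrow caused by a
    # low half that will be treated as negative.
    return ((address + 0x8000) >> 16) & 0xFFFF, address & 0xFFFF


def sign_extend16(value: int) -> int:
    return value - 0x10000 if value & 0x8000 else value


def rebuild_with_ori(hi: int, lo: int) -> int:
    return ((hi << 16) | lo) & 0xFFFFFFFF


def rebuild_with_addiu(hi: int, lo: int) -> int:
    return ((hi << 16) + sign_extend16(lo)) & 0xFFFFFFFF
```

For 0x1234ABCD the plain split is (0x1234, 0xABCD) and the adjusted split is (0x1235, 0xABCD); mixing the two conventions yields a value that is off by 0x10000.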
But on ARM, with its PC-relative addressing and very limited offsets, you get something like this. First, your symbol lies somewhere outside of your code section; obviously, it's data. So at the end of your code section (or not even at the end) you, or your compiler, put an indirect pointer to your symbol into a "literal pool". This is a distinguishing ARM "feature" that not only conceptually messes things up by adding an additional level of indirection, but can also mislead the CPU's branch predictor as an interesting side effect, because the predictor thinks these are instructions when they are not. Obviously, it's not that easy to follow ARM's recommendation to not make the data in a literal pool "look like jumps".
So you put this:

Code: Select all

LITERAL_POOL_ITEM: .long gMySymbol    @ that's right, a local, near pointer to gMySymbol, for the cpu to reach it with the limited pc-addressing

and then do:

Code: Select all

LDR r0, [pc, #(LITERAL_POOL_ITEM - . - 8)]  @ the arithmetic inside is yet another fun bit of ARM
LDR r1, [r0]

The immediate here - a "label", as ARM calls it, even though it's an offset from the current instruction to the literal pool label - is limited to 12 bits plus a direction bit. So the pool item can only be placed within about ±4095 bytes of the current location, hence the need for a literal pool nearby. Indeed, the size limitation is not even the main problem; rather, it's the whole idea. There is no way to know where the symbol will end up in the resulting image, since it's in a different section, so with this approach you need to put indirection pointers in the code section, as this is the only way to know the offset (offsets to symbols in different sections are resolved only at link time, and most probably will not fit into the 12-bit limit).
With immediates, you end up with one instruction only if your immediate can be encoded with rotations. Otherwise you need either a movt/movw pair or literal pool indirection.
To me, the ARM approach sucks compared to the elegant MIPS one.
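
The `LITERAL_POOL_ITEM - . - 8` arithmetic in the LDR above comes from the fact that, in ARM state, reading pc yields the address of the current instruction plus 8. A minimal Python sketch of the offset calculation and range check follows; the function name and the example addresses are made up for illustration.

```python
def literal_pool_offset(instruction_address: int, literal_address: int) -> int:
    """Offset to put in 'LDR rX, [pc, #offset]' in 32-bit ARM state."""
    pc_value = instruction_address + 8  # pc reads two instructions ahead
    offset = literal_address - pc_value
    # 12-bit magnitude plus an add/subtract bit gives a range of +/-4095.
    if not -4095 <= offset <= 4095:
        raise ValueError("literal pool item is out of range for this LDR")
    return offset
```

With the LDR at 0x8000 and the pool item at 0x8010, the encoded offset is 8, not 16.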

However, of course, it is possible to manually recreate MIPS-like behavior on ARM: in this example, you just use a movt/movw pair to load the address of the variable you want to reach (or an immediate). But that's you doing it manually; what the compiler does is up to it. And, judging by the ARM documentation, they think placing a literal pool item is the better choice. They have an LDR pseudo-instruction (yes, LDR can be both a pseudo-instruction and an instruction), which deals with all this hassle for an ordinary assembly writer (above, I did it manually to show how it works; there, LDR is an instruction), and this is what they write about its preferences:

armasm doc wrote: When using the LDR pseudo-instruction:
• If the value of expr can be loaded with a valid MOV or MVN instruction, the assembler uses that instruction.
• If a valid MOV or MVN instruction cannot be used, or if the label_expr syntax is used, the assembler places the constant in a literal pool and generates a PC-relative LDR instruction that reads the constant from the literal pool.
Note
— An address loaded in this way is fixed at link time, so the code is not position-independent.
— The address holding the constant remains valid regardless of where the linker places the ELF section containing the LDR instruction.
The assembler places the value of label_expr in a literal pool and generates a PC-relative LDR instruction that loads the value from the literal pool.
If label_expr is an external expression, or is not contained in the current section, the assembler places a linker relocation directive in the object file. The linker generates the address at link time.

Only immediate loading can take advantage of the rotational encoding and result in one instruction. Loading the address of a variable in order to read or write it always results in an additional level of indirection through literal pool items, because ARM dislikes using the movt/movw pair. What's better to have?
1 or 2 instructions that don't touch memory ("lui" or "lui/ori" on MIPS; "movt/movw" on ARM), though in the ARM case there is, artificially, no way to pick just a "movt" when that alone would suffice - the assembler follows ARM's preference for literal pools and doesn't care about a nicely behaving "la/li" analog. I forgot, there is a "mov32" pseudo-instruction, but it always generates a movt/movw pair, unlike MIPS's la/li, because lui zeroes the lower 16 bits of the destination register and movt doesn't.
or
1 memory-touching instruction (ldr rX, [pc, #offset]) on ARM?
At least at the ideological level, and the gastroenterological one as well, the MIPS approach seems to be cleaner.

COM File, where is data/code located?
