COM File, where is data/code located?

Postmann
Posts: 17
Joined: Wed Jul 05, 2017 9:39 pm
Libera.chat IRC: Postmann

Post by Postmann »

Hello :)
I have a problem loading DOS .COM binaries. How do I know where the data is located in the file, and where the code is? I always thought the data would be at the end, which holds when my code looks like this:

Code: Select all

section .code
	les bx, [ptrstr]
	ret
section .data
	str2 db 'Hello Hallo Hola. $'
	ptrstr dd str2

But when I swap the sections, the code ends up at the end of the binary (assembled with NASM):

Code: Select all

section .data
	str2 db 'Hello Hallo Hola. $'
	ptrstr dd str2
section .code
	les bx, [ptrstr]
	ret

In my execution handler, I simply "call" the address where the binary was loaded. But this doesn't work when the data comes first.
Any ideas?
alexfru
Member
Posts: 1118
Joined: Tue Mar 04, 2014 5:27 am

Re: COM File, where is data/code located?

Post by alexfru »

Code and data can be anywhere in the file. The only requirement is that the file begins with an instruction where execution starts. It's often a jump to some other code. Between the jump and that code there can be code or data. DOS .EXE files have a file header, which tells where the first executable instruction is in the file and it doesn't have to be right after the header. But even in .EXEs there's no requirement on how code and data are arranged. You can have any sequence of code and data, e.g. code, data, code again, data again, etc. DOS executables were designed to run on processors without memory protection and DOS did not support any kind of virtual memory, so there was never a need to distinguish code from data, load code or data on demand, share pieces of code or data (DLLs), etc.
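To make the "jump over the data" layout concrete, here is a minimal sketch in Python that builds such an image and decodes where the leading jump lands. The helper names (`build_com_image`, `short_jmp_target`) are made up for illustration, and it assumes the image is loaded at offset 0x100 and that the data is short enough for a short JMP (`EB rel8`).

```python
DATA = b"Hello Hallo Hola. $"
CODE = bytes([0xC3])  # near RET


def build_com_image(data: bytes, code: bytes) -> bytes:
    """Return an image laid out as: short JMP, data, code (hypothetical helper)."""
    if len(data) > 127:
        raise ValueError("a short JMP (EB rel8) can skip at most 127 bytes")
    # EB rel8: the displacement is counted from the end of the 2-byte JMP.
    jmp = bytes([0xEB, len(data)])
    return jmp + data + code


def short_jmp_target(image: bytes, load_offset: int = 0x100) -> int:
    """Decode a leading EB rel8 and return the in-memory offset it jumps to."""
    if image[0] != 0xEB:
        raise ValueError("image does not start with a short JMP")
    rel8 = image[1] - 256 if image[1] >= 0x80 else image[1]  # sign-extend
    return load_offset + 2 + rel8


image = build_com_image(DATA, CODE)
target = short_jmp_target(image)
```

Here `target` is the in-memory offset of the first real instruction, so `image[target - 0x100]` is the RET byte. A loader that always starts execution at the first byte of the file never needs to know this; only the program itself does.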
Postmann
Posts: 17
Joined: Wed Jul 05, 2017 9:39 pm
Libera.chat IRC: Postmann

Re: COM File, where is data/code located?

Post by Postmann »

But how do I tell what's data and what's code?
My loader converts the 16-bit real-mode code to 32-bit protected-mode code, since I don't want to play with v8086 mode. But it can't distinguish code from data. :?
mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: COM File, where is data/code located?

Post by mariuszp »

Postmann wrote:But how do I check, what's data and what's code?
My loader is converting the 16bit rm-code to 32bit pm-code, since I don't want to play with vm8086. But it can't distinguish code and data. :?
You can't tell. Code is just a special kind of data, and COM files do not make the distinction (in fact, they might even use data as code, etc.). What you're trying to do is impossible in the general case.
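A small sketch of why the bytes alone don't settle it: the ASCII bytes of "Hello" are also valid opcode bytes. The table below is a tiny, hand-picked subset of the 16-bit opcode map (the name `OPCODES_16BIT` is just for this example), and reading one byte at a time is a deliberate simplification, since real instructions are often several bytes long.

```python
# Partial, illustrative opcode table; not a real decoder.
OPCODES_16BIT = {
    0x48: "dec ax",
    0x65: "gs: (segment override prefix, 386+)",
    0x6C: "insb (186+)",
    0x6F: "outsw (186+)",
    0xC3: "ret",
}


def as_instructions(data: bytes) -> list:
    """Interpret each byte as if it were an opcode; unknown bytes become 'db'."""
    return [OPCODES_16BIT.get(b, f"db {b:#04x}") for b in data]


text = b"Hello"
listing = as_instructions(text)
for byte, meaning in zip(text, listing):
    print(f"{byte:#04x}  {chr(byte)!r:5}  {meaning}")
```

Every byte of the string maps to something the CPU would accept, so a translator that walks the file linearly has no way of knowing it has wandered into data.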
Postmann
Posts: 17
Joined: Wed Jul 05, 2017 9:39 pm
Libera.chat IRC: Postmann

Re: COM File, where is data/code located?

Post by Postmann »

So, back to vm8086 I guess. Fun :(
Thanks anyway
zaval
Member
Posts: 673
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut

Re: COM File, where is data/code located?

Post by zaval »

mariuszp wrote:
Postmann wrote:But how do I check, what's data and what's code?
My loader is converting the 16bit rm-code to 32bit pm-code, since I don't want to play with vm8086. But it can't distinguish code and data. :?
You can't tell. Code is just a special type of data. And COM files do not make the distinction (in fact, they might even use data as code, etc). What you're trying to do is impossible in the general case.
Really? What if I told you there are jump instructions? :D If one needs data to come before code in an unstructured file (like a COM file), he/she (<- yes, I am for diversity by both hands) needs to insert a jump instruction at the very beginning, before the data, that jumps over it.
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to lost of all the target park boards (russians destroyed our town).
TheCool1Kevin
Posts: 24
Joined: Fri Oct 14, 2016 7:37 pm
Location: Canada

Re: COM File, where is data/code located?

Post by TheCool1Kevin »

Why would you need to distinguish between code and data? Since COM is 16-bit, you need to set up a segment, and the whole program must fit within that segment. Let's say the segment for the COM file starts at CS:0x0000 and the file is loaded at offset 0x100 within it; then you would execute a

Code: Select all

call 0x100	; near call; assumes CS already holds the program's segment
to run the program and the program will terminate with a

Code: Select all

ret
instruction. Note the offset of 0x100!
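For what it's worth, here is a rough Python model of that memory layout. Under DOS the first 0x100 bytes of the segment hold the PSP, which begins with an `INT 20h` instruction, and DOS pushes a zero word before starting the program, so a final `ret` lands on that `INT 20h`. The names `load_com` and `file_to_memory_offset` are invented for this sketch.

```python
SEGMENT_SIZE = 0x10000    # one 64 KiB real-mode segment
COM_LOAD_OFFSET = 0x100   # the PSP occupies offsets 0x00..0xFF under DOS


def load_com(image: bytes) -> bytearray:
    """Place a COM image at offset 0x100 of a fresh segment (hypothetical helper)."""
    if len(image) > SEGMENT_SIZE - COM_LOAD_OFFSET:
        raise ValueError("COM image does not fit in one segment")
    segment = bytearray(SEGMENT_SIZE)
    segment[0:2] = b"\xCD\x20"  # the PSP starts with INT 20h (terminate program)
    segment[COM_LOAD_OFFSET:COM_LOAD_OFFSET + len(image)] = image
    return segment


def file_to_memory_offset(file_offset: int) -> int:
    """Byte N of the file ends up at offset 0x100 + N inside the segment."""
    return COM_LOAD_OFFSET + file_offset


segment = load_com(b"\xC3")  # a one-byte program: RET
```

This is also why absolute addresses inside the program (a label like `str2`) only come out right if the file was assembled with a matching origin, e.g. `org 0x100` in NASM.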
http://i.imgur.com/cuyliWz.png
LiquiDOS, my weird hobbyist OS.
"Strive for progress, not perfection" - Anonymous
Postmann
Posts: 17
Joined: Wed Jul 05, 2017 9:39 pm
Libera.chat IRC: Postmann

Re: COM File, where is data/code located?

Post by Postmann »

Ya, but during loading, I am converting the 16bit instructions to 32bit (some sort of recompiling) and changing JMPs, CALLs and pointers, so I can execute them in protected mode.
BrightLight
Member
Posts: 901
Joined: Sat Dec 27, 2014 9:11 am
Location: Maadi, Cairo, Egypt

Re: COM File, where is data/code located?

Post by BrightLight »

Postmann wrote:Ya, but during loading, I am converting the 16bit instructions to 32bit (some sort of recompiling) and changing JMPs, CALLs and pointers, so I can execute them in protected mode.
There's no real reason to translate the 16-bit instructions of a COM binary into 32-bit instructions. If you insist on running COM binaries from protected mode, you really have two options.
  • Use v8086.
  • Write a software CPU implementation.
The first option is somewhat easier because the CPU can do most of the dirty work, and all you need to do is handle interrupts and other v8086 GPFs. However, you can't do this in 64-bit mode, because v8086 mode is not available in long mode. The second option is mostly a project of its own and doesn't really belong with OSDev. In both cases, you need to implement the DOS API (INT 0x21) functions.
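To give a feel for the second option, here is a toy interpreter in Python. It is only a sketch: it understands four opcodes (`MOV AH, imm8`, `MOV DX, imm16`, `INT imm8`, near `RET`) and one DOS call (INT 0x21 with AH=0x09, print a `$`-terminated string at DS:DX), and it assumes all segment registers point at the load segment. The function name `run_com` and the step limit are made up for the example.

```python
def run_com(image: bytes, max_steps: int = 1000) -> str:
    """Interpret a tiny subset of 16-bit code and return what it printed."""
    mem = bytearray(0x10000)
    mem[0x100:0x100 + len(image)] = image  # COM images load at offset 0x100
    regs = {"ah": 0, "dx": 0}
    ip = 0x100
    out = []
    for _ in range(max_steps):
        op = mem[ip]
        if op == 0xB4:                      # MOV AH, imm8
            regs["ah"] = mem[ip + 1]
            ip += 2
        elif op == 0xBA:                    # MOV DX, imm16 (little-endian)
            regs["dx"] = mem[ip + 1] | (mem[ip + 2] << 8)
            ip += 3
        elif op == 0xCD:                    # INT imm8
            if mem[ip + 1] != 0x21 or regs["ah"] != 0x09:
                raise NotImplementedError("only INT 21h / AH=09h is handled")
            end = mem.index(b"$", regs["dx"])  # string ends at the '$'
            out.append(mem[regs["dx"]:end].decode("ascii"))
            ip += 2
        elif op == 0xC3:                    # RET: treated as program exit here
            return "".join(out)
        else:
            raise NotImplementedError(f"opcode {op:#04x}")
    raise RuntimeError("step limit reached")


# mov ah, 9 / mov dx, 0x108 / int 0x21 / ret / db "Hi$"
hello = bytes([0xB4, 0x09, 0xBA, 0x08, 0x01, 0xCD, 0x21, 0xC3]) + b"Hi$"
printed = run_com(hello)
```

Note that the interpreter never has to decide up front which bytes are code: it only decodes what the instruction pointer actually reaches, which is the property a load-time translator lacks.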

Unless your goal is to run 16-bit binaries in 32-bit mode, this is mostly pointless and will never get anywhere.
You know your OS is advanced when you stop using the Intel programming guide as a reference.
Postmann
Posts: 17
Joined: Wed Jul 05, 2017 9:39 pm
Libera.chat IRC: Postmann

Re: COM File, where is data/code located?

Post by Postmann »

I guess I will write an emulator for 16-bit code then. :?
mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: COM File, where is data/code located?

Post by mariuszp »

zaval wrote:
mariuszp wrote:
Postmann wrote:But how do I check, what's data and what's code?
My loader is converting the 16bit rm-code to 32bit pm-code, since I don't want to play with vm8086. But it can't distinguish code and data. :?
You can't tell. Code is just a special type of data. And COM files do not make the distinction (in fact, they might even use data as code, etc). What you're trying to do is impossible in the general case.
really? what if I told you there are jump instructions? :D if one needs data coming before code in a non-structured file (like COM files), he/she (<- yes, I am for diversity by both hands) needs to insert a jump instruction at the beginning of data, which will jump over.
But that wasn't the question. The question was whether you can tell what is "data" and what is "code", and I answered that unless there is ancillary information, you can't. So you cannot write a program which, in the general case, translates 16-bit instructions into 32-bit ones while skipping over data (and even if it succeeded, it would be a mess, since it would have to adjust all offsets, etc.).
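As a rough illustration of the offset problem, suppose a translator already knew every instruction boundary (which is the part that can't be done in general). Re-encoding changes instruction lengths, so it would still need a map from old offsets to new ones. The function name and the lengths below are hypothetical, chosen only to show the shift.

```python
def remap_offsets(instructions):
    """Map each old instruction offset to its offset after re-encoding.

    `instructions` is a list of (old_offset, new_length) pairs in file order.
    """
    mapping = {}
    new_offset = 0
    for old_offset, new_length in instructions:
        mapping[old_offset] = new_offset
        new_offset += new_length
    return mapping


# Hypothetical numbers: a 4-byte instruction at offset 0 grows to 6 bytes,
# so the 1-byte instruction that followed it at offset 4 moves to offset 6.
mapping = remap_offsets([(0, 6), (4, 1)])
```

Every jump, call and pointer that refers to old offset 4 must then be patched to 6. For jumps that is at least conceivable, but a pointer stored in data (such as `ptrstr dd str2` in the first post) looks like any other four bytes, so the translator cannot know it needs patching.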
~
Member
Posts: 1228
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: COM File, where is data/code located?

Post by ~ »

Wherever you want and wherever it works.

It's just a binary that needs to start with valid code, and you can do anything from there.
iansjack
Member
Posts: 4811
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: COM File, where is data/code located?

Post by iansjack »

Wouldn't it be easier just to compile the programs as 32-bit protected-mode code in the first place? Compiling them as real-mode code and converting them as you load them sounds crazy.
Postmann
Posts: 17
Joined: Wed Jul 05, 2017 9:39 pm
Libera.chat IRC: Postmann

Re: COM File, where is data/code located?

Post by Postmann »

I want to be able to execute 16-bit DOS programs, though. I am going for the emulator now. :wink:
mallard
Member
Posts: 280
Joined: Tue May 13, 2014 3:02 am
Location: Private, UK

Re: COM File, where is data/code located?

Post by mallard »

omarrx024 wrote:The second option is mostly a project of its own, and doesn't belong with OSDev, really.
If you're not averse to using existing, well-tested, third-party code in your OS (I'm not going to get into the debate about whether you should do this, but I have no issue with it; I see myself as the "architect" of my OS, not the "bricklayer"), the emulation option becomes by far the simplest and most compatible option. The "libx86emu" emulator is pretty trivial to port and has been in widespread use for some time (forming part of SciTech's graphics driver products and the driver layers of XFree86/Xorg; used extensively on non-x86 platforms to run video BIOS code). It's not exactly the most performant emulator ever, but there's not a lot of real-mode x86 code that actually needs to be run at full speed on a modern CPU.

Also, while it's more complex, it is possible to use V86 mode in a "mostly 64-bit" OS by switching to 32-bit pmode as an intermediate step. Unofficial patches exist to implement this on Linux. However, some recent CPUs have shipped with bugs in their V86 mode implementations, so I'd still recommend emulation as a more future-proof solution.