Hi all,
I've been working on some routines to check whether a file is executable (currently only .com files are supported). I also read on the OS dev resource center that a .com file is loaded at offset 100h and executed from there.
Can anyone tell me how to identify a .com file? (I don't want my operating system to run text files.)
(Note: my OS is operating in real mode.)
Thank you
Hello joke,
There is no reliable way to identify .com files, as they don't have any header that could be checked. Wotsit.org mentions that most executables start with a jump and that they should normally not exceed the 64 KiB limit of a real-mode segment. These checks will, however, only give you a rough idea of whether the file is executable, as there are probably many executable files that do not keep to these rules. Another simple check would be to look at the file name extension and only run executables that end with .com.
In any case, your operating system should never keep the user from executing something that looks dubious, as you just can't tell for sure that the file really isn't an executable. If you're not sure, just print a prompt asking the user if he wants to proceed with running a file that might be corrupted/misnamed/whatever.
regards,
gaf
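For what it's worth, the three rough checks gaf lists (extension, size, leading jump) could be combined into a simple score. This is only a sketch in C: the names `com_plausibility`, `has_com_extension`, `starts_with_jump` and `COM_MAX_SIZE` are made up for illustration, and the size limit assumes the image is loaded at 100h inside a single 64 KiB segment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: image loaded at offset 100h of one 64 KiB segment.
   Real loaders allow slightly less, since the stack shares the segment. */
#define COM_MAX_SIZE (0x10000u - 0x100u)

/* Case-insensitive check that a file name ends in ".com". */
static bool has_com_extension(const char *name)
{
    size_t len = 0;
    while (name[len] != '\0')
        len++;
    if (len < 5)                 /* need at least one character before ".com" */
        return false;
    const char *ext = name + len - 4;
    return ext[0] == '.' &&
           (ext[1] | 0x20) == 'c' &&
           (ext[2] | 0x20) == 'o' &&
           (ext[3] | 0x20) == 'm';
}

/* True if the first byte is a JMP opcode: EBh (short), E9h (near), EAh (far). */
static bool starts_with_jump(const uint8_t *image, size_t size)
{
    return size > 0 &&
           (image[0] == 0xEB || image[0] == 0xE9 || image[0] == 0xEA);
}

/* Heuristic score from 0 to 3. A low score means "ask the user",
   not "refuse to run", because none of these checks is reliable. */
int com_plausibility(const char *name, const uint8_t *image, size_t size)
{
    int score = 0;
    if (has_com_extension(name))
        score++;
    if (size > 0 && size <= COM_MAX_SIZE)
        score++;
    if (starts_with_jump(image, size))
        score++;
    return score;
}
```

A loader might run the file directly when the score is 3 and print the confirmation prompt otherwise. Plenty of valid .com programs do not start with a jump, so treating a lower score as a hard failure would reject real programs.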
- smiddy
- Member
- Posts: 127
- Joined: Sun Oct 24, 2004 11:00 pm
- Location: In my cube, like a good leming. ;-)
I agree with gaf; however, I want to point out that the .COM extension is derived from DOS and, I believe, CP/M, which is where the 64 kB limitation comes from. You can write COM files for 32-bit OSes that exceed 64 kB.
Another thing to consider is that script files are executable too. Initially, after reading your post, it seemed you could scan a file for FFh, as there isn't any text I am aware of that uses it as a displayable symbol, though I could be wrong; I don't know all languages.
I suppose if your scheduler can isolate the EIP for each process, then you can see whether the process is locked up and drop it as non-executable (again, I agree with gaf: you don't want to limit execution, but rather be able to recover from errors).
Just my own thoughts...
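A slightly broader variant of the FFh idea is to turn the question around and ask whether the file looks like plain text. The sketch below is an assumption, not something proposed in the thread: it treats a buffer as text if every byte is printable ASCII or tab/LF/CR. The function name `looks_like_plain_text` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every byte is printable ASCII (20h..7Eh) or common
   whitespace (tab, LF, CR). An empty buffer counts as text. */
bool looks_like_plain_text(const uint8_t *data, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        uint8_t b = data[i];
        bool printable = (b >= 0x20 && b <= 0x7E);
        bool whitespace = (b == '\t' || b == '\n' || b == '\r');
        if (!printable && !whitespace)
            return false;
    }
    return true;
}
```

This only catches the plain ASCII case. Text in an extended code page would be reported as non-text, and it is possible to write a working .com program using nothing but printable bytes, so the result is a hint for a prompt rather than a verdict.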
- carbonBased
- Member
- Posts: 382
- Joined: Sat Nov 20, 2004 12:00 am
- Location: Wellesley, Ontario, Canada
- Contact:
smiddy wrote: Another thing to consider is that script files are executable too. Initially, after reading your post, it seemed you could scan a file for FFh, as there isn't any text I am aware of that uses it as a displayable symbol, though I could be wrong; I don't know all languages.
Technically, the scripts aren't executable; the programs that parse them are.
Also, are you implying that all binary/executable files will have a 0xff character in them? You definitely cannot rely on this, if that's the intention.
smiddy wrote: I suppose if your scheduler can isolate the EIP for each process, then you can see whether the process is locked up and drop it as non-executable (again, I agree with gaf: you don't want to limit execution, but rather be able to recover from errors).
An attempt to execute a non-executable file will more likely generate an invalid opcode exception, or a GPF, rather than a loop. Some form of exception will be generated, at least (most likely).
.com files are a bad example of an executable, really. Any modern executable format will have a defined header and substantial information describing the content of the file (i.e., sections, locations, external link libraries, etc.). These executables are much easier to spot (just look for the header).
--Jeff
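To illustrate "just look for the header": two well-known formats begin with fixed magic bytes, "MZ" for DOS .exe files and 7Fh 'E' 'L' 'F' for ELF. A minimal sketch of such a check in C follows; the enum and function names are made up for this example, and a real loader would go on to validate the rest of the header.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { FMT_UNKNOWN, FMT_MZ, FMT_ELF } exe_format;

/* Identify a file by the magic bytes at its very start. */
exe_format detect_format(const uint8_t *data, size_t size)
{
    /* ELF: 7Fh followed by the letters "ELF". */
    if (size >= 4 && data[0] == 0x7F &&
        data[1] == 'E' && data[2] == 'L' && data[3] == 'F')
        return FMT_ELF;

    /* DOS MZ executable: the two letters "MZ". */
    if (size >= 2 && data[0] == 'M' && data[1] == 'Z')
        return FMT_MZ;

    /* Flat binaries such as .com files end up here: nothing to match. */
    return FMT_UNKNOWN;
}
```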
- smiddy
- Member
- Posts: 127
- Joined: Sun Oct 24, 2004 11:00 pm
- Location: In my cube, like a good leming. ;-)
carbonBased,
I was only making suggestions (brainstorming; what did you offer?); nothing is certain. If you want 100% reliability, then you may want to ignore flat file formats altogether, though that is another limiting factor. The question was how to detect whether a file is executable and how to identify a .com file. What I suggested are ways to do so; they may not be reliable, but reliability wasn't a requirement in the question asked.
As for text files or scripts being executable, I will add that opcodes are interpreted, as are segments of text, in order to carry out a logical routine. One is interpreted directly by the CPU; the other is interpreted by software and then by the CPU. There is a layer between the text and the CPU, but the text is nonetheless executable. Compilers interpret code and arrange it in an opcode layout for the CPU to interpret and execute (not dissimilar to text script interpreters). A software layer interpreting text does the same thing, though it also executes the routine. Most modern OSes have an engine to interpret text scripts in order to execute them. Even legacy OSes had engines to do this. Technically, text files are executable!
I agree that an invalid opcode exception will likely be generated (if you run the file directly as opcodes instead of going through your text interpreter first); however, you can use that exception to stop the process from running (this is recoverable).
- AndrewAPrice
- Member
- Posts: 2309
- Joined: Mon Jun 05, 2006 11:00 pm
- Location: USA (and Australia)
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
I'd suggest you concentrate on the two instructions that can terminate your .com program: INT 20h or INT 21h.
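As a rough sketch of that scan: INT 20h encodes as the bytes CDh 20h and INT 21h as CDh 21h, so the loader can look for either pair anywhere in the image. The function name `contains_dos_exit_int` is hypothetical, and the check assumes your OS offers those DOS-style interrupts to .com programs in the first place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the image contains the byte pair for INT 20h (CD 20)
   or INT 21h (CD 21) at any offset. */
bool contains_dos_exit_int(const uint8_t *image, size_t size)
{
    for (size_t i = 0; i + 1 < size; i++) {
        if (image[i] == 0xCD &&
            (image[i + 1] == 0x20 || image[i + 1] == 0x21))
            return true;
    }
    return false;
}
```

The scan matches bytes, not decoded instructions, so the pair may also turn up inside data. Under DOS a .com program can also end with a plain RET, in which case neither pair needs to appear at all.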
Btw, that would just catch the obvious case of hello.txt renamed to hello.com ... chances are that a .jpg or .wav file could make it through. Seriously, .com files are just flat binary code, usually written in assembly (thus with no specific structure), and they could contain anything. If you want somewhat more serious detection, enforce a proper format (e.g. prepending magic numbers and checksums) that wraps .com programs and can be recognized. Now, if Mr. Anykey types "com2prg hello.txt", he'll be the only one to blame.
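One possible shape for such a wrapper, purely as an illustration: an 8-byte header with a magic string, the payload length and a simple checksum, followed by the unmodified .com image. The layout, the magic "PRG1" and the names `prg_unwrap`, `read_le16` and `byte_sum16` are all assumptions invented for this sketch, not an existing format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper header, all fields little-endian:
   bytes 0..3  magic "PRG1"
   bytes 4..5  payload length in bytes
   bytes 6..7  16-bit sum of all payload bytes */
#define PRG_HEADER_SIZE 8u

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t byte_sum16(const uint8_t *data, size_t size)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < size; i++)
        sum = (uint16_t)(sum + data[i]);
    return sum;
}

/* Returns a pointer to the wrapped .com image and stores its size,
   or returns NULL if the magic, length or checksum does not match. */
const uint8_t *prg_unwrap(const uint8_t *file, size_t size, size_t *payload_size)
{
    if (size < PRG_HEADER_SIZE)
        return NULL;
    if (file[0] != 'P' || file[1] != 'R' || file[2] != 'G' || file[3] != '1')
        return NULL;

    size_t len = read_le16(file + 4);
    if (len != size - PRG_HEADER_SIZE)
        return NULL;

    const uint8_t *payload = file + PRG_HEADER_SIZE;
    if (byte_sum16(payload, len) != read_le16(file + 6))
        return NULL;

    *payload_size = len;
    return payload;
}
```

The loader would copy the returned payload to offset 100h and run it as usual. A byte sum is a weak checksum and was picked only to keep the sketch short; a CRC would catch more kinds of corruption.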
If you want to be sure, why not say that any .com file that is to run on your OS MUST start like this:
use16
ORG 0x100 ; offset where our program is loaded
jmp start ; jump over the signature to the start of the program
db 'COM1' ; the loader checks for this to make sure it is a valid .com file
;----------------------------------------------------;
; Start of program. ;
;----------------------------------------------------;
start:
Then, on loading, you look at the bytes just after the jump and check for 'COM1' before you run the file.
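On the loader side, the check could look roughly like the sketch below. One detail to watch is that the assembler may encode `jmp start` either as a short jump (EBh plus a 1-byte displacement) or as a near jump (E9h plus a 2-byte displacement), so the signature sits at offset 2 or 3 of the file. The function name `has_com1_signature` is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the image starts with a short (EBh) or near (E9h) jump
   that is immediately followed by the four bytes "COM1". */
bool has_com1_signature(const uint8_t *image, size_t size)
{
    size_t sig_offset;

    if (size > 0 && image[0] == 0xEB)
        sig_offset = 2;          /* opcode + 8-bit displacement */
    else if (size > 0 && image[0] == 0xE9)
        sig_offset = 3;          /* opcode + 16-bit displacement */
    else
        return false;

    if (size < sig_offset + 4)   /* file too short to hold the signature */
        return false;

    return memcmp(image + sig_offset, "COM1", 4) == 0;
}
```

A stricter loader might also verify that the jump displacement lands exactly past the signature, which this sketch leaves out.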