OSDev.org

Posted: **Sun Oct 08, 2006 2:29 am**

Hi all,

I've been working on some routines to check whether a file is executable (currently only .com files are supported). I also read on the OS dev resource center that a .com file is loaded at offset 100h and executed.

Can anyone tell me how to identify a .com file? (I don't want my operating system to run text files.)
(Note: my OS is operating in real mode.)
Thank you

Posted: **Sun Oct 08, 2006 7:39 am**

Hello joke,
There is no reliable way to identify .com files, as they don't have any headers that could be checked. Wotsit.org mentions that most executables start with a jump and that they should normally not exceed the 64 KiB barrier of a real-mode segment. These checks will, however, only give you a rough idea of whether the file is executable, as there are probably many executable files that do not keep to these rules. Another simple check would be to look at the file name extension and only run executables that end with *.com.
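For what it's worth, the three checks above (extension, size, leading jump) could be combined roughly as in the sketch below. It is written in C for readability rather than in real-mode assembly, and every name in it (`has_com_extension`, `starts_with_jump`, `com_plausibility`, `COM_MAX_SIZE`) is made up for illustration. The size limit assumes the image is loaded at offset 100h of a single 64 KiB segment. The result is only a hint, not a verdict.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the image must fit in one 64 KiB segment above offset 100h. */
#define COM_MAX_SIZE (0x10000u - 0x100u)

/* Hypothetical helper: case-insensitive check for a ".com" suffix. */
static bool has_com_extension(const char *name)
{
    size_t len = strlen(name);
    if (len < 4)
        return false;
    const char *ext = name + len - 4;
    return ext[0] == '.' &&
           tolower((unsigned char)ext[1]) == 'c' &&
           tolower((unsigned char)ext[2]) == 'o' &&
           tolower((unsigned char)ext[3]) == 'm';
}

/* Hypothetical helper: true if the first byte is a short (EB) or near (E9) jump. */
static bool starts_with_jump(const uint8_t *image, size_t size)
{
    return size > 0 && (image[0] == 0xEB || image[0] == 0xE9);
}

/* Returns a score from 0 to 3, one point per check that passes. */
static int com_plausibility(const char *name, const uint8_t *image, size_t size)
{
    int score = 0;
    if (has_com_extension(name))
        score++;
    if (size > 0 && size <= COM_MAX_SIZE)
        score++;
    if (starts_with_jump(image, size))
        score++;
    return score;
}
```

A low score would be a reason to ask the user before running the file, not a reason to refuse outright.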

In any case, your operating system should never keep the user from executing something that looks dubious, as you just can't tell for sure that the file really isn't an executable. If you're not sure, just print a prompt asking the user if he wants to proceed with running a file that might be corrupted/misnamed/whatever.

regards,
gaf

Posted: **Sun Oct 08, 2006 8:41 am**

I agree with gaf; however, I want to point out that *.COM extensions are derived from DOS and, I believe, CP/M, which is where the 64 kB limitation comes from. You can write COM files for 32-bit OSes that exceed 64 kB.

Another thing to consider is scripting files, which are executable too. Initially, after reading your post, it seemed you could scan a file for FFh, as there isn't any text that I am aware of that uses it as a displayable symbol, though I could be wrong; I don't know all languages.

I suppose if your scheduler can isolate the EIP for each process, then you can see if the process is locked up and drop the process as a non-executable (again I agree with gaf: you don't want to limit execution, but rather be able to recover from errors).

Just my own thoughts...

Posted: **Sun Oct 08, 2006 11:16 am**

smiddy wrote: Another thing to consider is scripting files, which are executable too. Initially, after reading your post, it seemed you could scan a file for FFh, as there isn't any text that I am aware of that uses it as a displayable symbol, though I could be wrong; I don't know all languages.

Technically, the scripts aren't executable, the programs that parse them are.

Also, are you implying that all binary/executable files will have a 0xff character in them? You definitely cannot rely on this, if that's the intention.
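To make that concrete, here is a tiny sketch in C. The helper `contains_byte` and the array `tiny_exit` are names invented for this example. The five bytes encode `mov ax, 4C00h` followed by `int 21h`, which is a complete (if boring) DOS-style program, and none of them is FFh.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the buffer contains at least one byte equal to `value`. */
static bool contains_byte(const uint8_t *buf, size_t size, uint8_t value)
{
    for (size_t i = 0; i < size; i++) {
        if (buf[i] == value)
            return true;
    }
    return false;
}

/* B8 00 4C = mov ax, 4C00h ; CD 21 = int 21h. No FFh anywhere. */
static const uint8_t tiny_exit[] = { 0xB8, 0x00, 0x4C, 0xCD, 0x21 };
```

So a scan for FFh would reject this program, which is the kind of false negative I mean.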

smiddy wrote: I suppose if your scheduler can isolate the EIP for each process, then you can see if the process is locked up and drop the process as a non-executable (again I agree with gaf: you don't want to limit execution, but rather be able to recover from errors).

An attempt to execute a non-executable file will more likely generate an invalid opcode exception, or a GPF, rather than a loop. Some form of exception will be generated, at least (most likely).

.com files are a bad example of an executable, really. Any modern executable format will have a defined header, and substantial information describing the content of the file (e.g., sections, locations, external link libraries, etc). These executables are much easier to spot (just look for the header).

--Jeff

Posted: **Mon Oct 09, 2006 5:07 am**

carbonBased,

I was only making suggestions (brainstorming, offering suggestions, what did you offer?), nothing is certain. If you want 100% reliability then you may want to ignore flat file formats altogether, though that is another limiting factor. The question was how you can detect whether a file is executable and how to identify a .com file. What I suggested are ways to do so; they may not be reliable, but that wasn't a requirement in the question asked.

As for text files or scripts being executable, I will add that opcodes are interpreted, as are segments of text, in order to facilitate a logical routine.

One is directly interpreted by the CPU; the other is interpreted by software and then by the CPU. There is a layer between the text and the CPU, but it is nonetheless executable. Compilers interpret code and arrange it in an opcode layout for the CPU to interpret and execute (not dissimilar to text script interpreters). A software layer interpreting text does the same thing, though it executes the routine. Most modern OSes have an engine to interpret text scripts in order to execute them. Even legacy OSes had engines to do this. Technically, text files are executable!

I agree that an invalid opcode exception will likely be generated (if you don't go through your text routine interpreter first, instead of directly running it as opcodes); however, you can use that to stop the process from running (this is recoverable).

Posted: **Wed Oct 18, 2006 5:42 am**

If there's no header, you could try scanning the first 1 kB of the file to see whether it contains valid instructions, although this method would require writing a parser that recognises every possible IA-32 instruction.

Posted: **Wed Oct 18, 2006 7:44 am**

I'd suggest you concentrate on the two instructions that can terminate your .com program: INT 20h or INT 21h.
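A rough sketch of such a scan, in C. `INT n` is encoded as the byte CD followed by the interrupt number, so the loop simply looks for the pairs CD 20 and CD 21. The function name `has_dos_int` is just something picked for this example, and a plain byte scan like this cannot tell a real instruction from data that happens to contain the same two bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the image contains the byte pair CD 20 (INT 20h) or CD 21 (INT 21h). */
static bool has_dos_int(const uint8_t *image, size_t size)
{
    for (size_t i = 0; i + 1 < size; i++) {
        if (image[i] == 0xCD &&
            (image[i + 1] == 0x20 || image[i + 1] == 0x21))
            return true;
    }
    return false;
}
```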

Btw, that would just catch the obvious case of hello.txt renamed to hello.com ... chances are that a .jpg or .wav file could make it through. Seriously, .com files are just flat binary code, usually written in assembly (thus with no specific structure), and they could contain anything. If you want a somewhat more serious detection, enforce a proper format (e.g. prepending magic numbers and checksums) that wraps .com programs and that can be recognized. Now, if Mr. Anykey types "com2prg hello.txt", he'll be the only one to blame.
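To give an idea of what such a wrapper might look like, here is a sketch in C of the loader side. The whole layout is invented for this example: a 4-byte magic `PRG1`, a 16-bit little-endian payload length, and a 16-bit little-endian checksum that is just the sum of the payload bytes modulo 65536. The names `prg_checksum`, `prg_unwrap` and `PRG_HEADER_SIZE` are hypothetical too. A real format would probably want a stronger checksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: "PRG1", length (u16 LE), checksum (u16 LE). */
#define PRG_HEADER_SIZE 8u

/* Sum of all payload bytes, modulo 65536. */
static uint16_t prg_checksum(const uint8_t *data, size_t size)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < size; i++)
        sum = (uint16_t)(sum + data[i]);
    return sum;
}

/* Returns a pointer to the wrapped .com image and stores its size in
   *payload_size, or returns NULL if the wrapper does not check out. */
static const uint8_t *prg_unwrap(const uint8_t *file, size_t size,
                                 size_t *payload_size)
{
    if (size < PRG_HEADER_SIZE)
        return NULL;
    if (memcmp(file, "PRG1", 4) != 0)
        return NULL;

    size_t len = (size_t)file[4] | ((size_t)file[5] << 8);
    uint16_t want = (uint16_t)(file[6] | (file[7] << 8));

    if (len != size - PRG_HEADER_SIZE)
        return NULL;

    const uint8_t *payload = file + PRG_HEADER_SIZE;
    if (prg_checksum(payload, len) != want)
        return NULL;

    *payload_size = len;
    return payload;
}
```

The loader would copy the returned payload to offset 100h as usual; anything that fails the check never gets that far.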

Posted: **Wed Oct 18, 2006 9:56 am**

If you want to be sure, why not require that any .com file that runs on your OS MUST start like this:

use16
        ORG   0x100   ; Offset at which our program is loaded.
        jmp   start   ; Jump to the start of the program.
        db    'COM1'  ; We check for this to make sure it is a valid COM file.

 ;----------------------------------------------------;
 ; Start of program.                                  ;
 ;----------------------------------------------------;
start:

Then, on loading, you look at the address right after the jump and check for 'COM1'.
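The loader-side check could look something like the sketch below, written in C with a made-up function name (`has_com1_signature`). One thing to watch: depending on the assembler and the distance to `start`, `jmp start` may be encoded as a 2-byte short jump (EB) or a 3-byte near jump (E9), so the signature sits at offset 2 or 3 of the file. The sketch assumes one of those two encodings and accepts either.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the image starts with a jump that is directly followed by "COM1". */
static bool has_com1_signature(const uint8_t *image, size_t size)
{
    size_t sig_off;

    if (size == 0)
        return false;

    if (image[0] == 0xEB)
        sig_off = 2;            /* jmp short: opcode + 8-bit offset  */
    else if (image[0] == 0xE9)
        sig_off = 3;            /* jmp near:  opcode + 16-bit offset */
    else
        return false;

    if (size < sig_off + 4)
        return false;

    return memcmp(image + sig_off, "COM1", 4) == 0;
}
```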