OSDev.org

Posted: **Tue Jun 20, 2006 9:26 am**

This isn't for any practical application, it's just a question for interest.

The Intel manuals detail various steps that should be taken if code is self-modifying. These can be easily implemented in assembly. However, do compilers make any attempts to recognise that you're writing self-modifying code and account for this? Obviously there are cases where the compiler can't tell, but if you do:

Code:

void f2(void);

void f(void)
{
    unsigned short *smc = (unsigned short*) (void*) &f2;
    // Write something to that address (in this case, the UD2 instruction)
    *smc = 0x0B0F;
    // ^-- This will (I imagine?) probably segfault except in some circumstances (like kernel code?)
    f2();
    // ^-- Should generate invalid-opcode exception
}

Then it's quite clear that you're calling memory which has been modified. This is actually a really poor example, because I'm guessing the processor's caches wouldn't have a problem here - but it gets the general principle across.

I'm guessing the answer is 'No' and that it will conceivably throw a warning (at least), but I'm just curious (and don't have a compiler to hand here).

Posted: **Tue Jun 20, 2006 10:13 am**

Since you are technically speaking relying on undefined behaviour of your C compiler, it will not help you.

You will have to take care of everything manually. Inline assembly can help with this, of course.

That said, self-modifying code isn't such a huge problem unless you try to make local modifications to the code you are currently executing. If you generate new blocks of code at runtime and call them only once they are ready, there shouldn't be many problems.

If you try to do something clever like branch out of an inner loop by adding a jump instruction in the middle of the loop, then you are going to hit problems.

I guess the main problem here is that nobody likes to implement superscalar processors which deal with modifications to instructions they've already decoded and scheduled and which might be currently executing. If one bothers to detect that, the only reasonable thing to do would be clearing the whole pipeline, and the performance hit for doing that is quite nasty.

I'm not sure if there are further issues with the instruction cache, but I'd guess processors deal with that one properly (I might be wrong though). In any case, if you don't modify the code you are currently executing, then the memory you are modifying is just data until you branch into it, so for practical purposes it need not be considered "self-modifying" in the sense of causing problems with the processor.

If you need to generate code that you then immediately jump into, consider using inline assembly for the necessary synchronization or whatever.
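As a rough sketch of that last suggestion: rather than hand-written inline assembly, GCC and Clang offer the builtin `__builtin___clear_cache`, which does whatever instruction-cache synchronisation the target needs for a given address range. The helper name `publish_code` is made up for illustration, and relying on the builtin is an assumption about your toolchain. On x86 it usually emits no instructions at all, so if you follow the Intel manual's advice to the letter you may still want a jump or a serialising instruction before running the new code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy freshly generated machine code into dst and
 * synchronise instruction fetch with the new bytes before anything
 * branches to them. */
static void publish_code(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    /* Until something branches here, these bytes are ordinary data. */
    memcpy(dst, src, len);

#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang builtin: synchronises the instruction cache for the
     * range [dst, dst + len) on targets that need it. */
    __builtin___clear_cache((char *)dst, (char *)(dst + len));
#endif
}
```

Note that this only covers the "make the new bytes visible" part. The buffer still has to live in memory you are allowed to execute.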

Posted: **Tue Jun 20, 2006 10:37 am**

I'd say if you try to mess with self-modifying code using a high-level compiler, you're about to face big issues. You should at least rely on some sort of processor-abstraction library to notify the system that some code has changed, and you're probably better off having the modifiable code written in assembly rather than in an HLL.

On the other hand, there are many cases where you see programs writing programs and directly running them: just-in-time interpreters/compilers. Still, in that case, the global environment is designed to face such issues (or at least it should be).

Posted: **Tue Jun 20, 2006 9:06 pm**

Midas wrote: However, do compilers make any attempts to recognise that you're writing self-modifying code and account for this?

I don't think the latest C compilers have brains of their own yet.

But you never know... it depends on how advanced the compiler is, so the question is more compiler-specific than generic. The code you've given here, with its type casts, makes it quite hard for the compiler to detect that it's self-modifying. Perhaps the feature is more common in higher-level languages that are typesafe. If the compiler could detect self-modifying code, I would imagine compile times would be long, since it would have to search the address range of all code on every occurrence of data being modified. To make things more complicated, what if you have an external assembly function that uses data and code in the same segment?

Edit: Come to think of it, it's probably impossible for a C compiler to detect self-modifying code, since compilers only deal with symbols. I don't see how it could be detected at compile time anyway. It may seem that a linker could detect this, but linkers aren't exactly smart enough to figure out what is going on in the code after it has already been compiled.

Posted: **Wed Jun 21, 2006 9:45 am**

Thanks for the informative replies, guys. I was aware that doing it in an HLL was a bad idea, but was just curious.

I reckoned that it wouldn't happen - but wasn't entirely certain why, as I have only the vaguest of ideas as to how the compiler works (I know what comes out if I put something in, but don't know how it lexically analyses the code, parses it, whatever). I can see a variety of reasons, but the simplest one that I can see that absolutely puts a bullet in it is the external code section that uses code and data from the same segments.

Thanks!

(This isn't something I've ever wanted to do, it has to be said, JIT compilers feel a bit beyond me at the moment!)

Posted: **Thu Jun 22, 2006 5:09 am**

Another problem could be that the compiler can place the function code inline, and you then modify just one of the various copies.

Posted: **Sat Jul 22, 2006 8:24 pm**

the compiler can place the function code inline

Might it still inline the function if you make it `volatile`? (I think that just means that it doesn't get optimised away.) Can anyone shed some light?

Posted: **Sun Aug 06, 2006 4:59 am**

Couldn't tell for sure about "volatile". I feel like a "volatile function" doesn't exist in C (though I may be wrong).

Yet if the function is in an external .o, you're pretty certain the compiler cannot inline it, because at the time of generating code, it doesn't know what the function does.

Posted: **Sun Aug 06, 2006 11:00 am**

Pype.Clicker wrote: ... is in an external .o, you're pretty certain the compiler cannot inline it, ...

Until you use a compiler like the Microsoft one, which can inline such functions when whole-program optimisation is enabled...
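On the inlining question, a hedged sketch: as far as I know, ISO C gives no meaning to a `volatile`-qualified function type, so `volatile` is not the tool here. What GCC and Clang do offer is `__attribute__((noinline))` (MSVC has `__declspec(noinline)`), which asks the compiler to keep one out-of-line copy of the function. The names `NOINLINE`, `patch_target`, `caller` and `noinline_demo` below are made up for illustration.

```c
#include <assert.h>

#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE /* assumption: no known attribute, the compiler may inline */
#endif

/* The function we would hypothetically patch at run time. */
NOINLINE int patch_target(int x)
{
    return x + 1;
}

int caller(int x)
{
    /* With NOINLINE this should stay a real call to the single copy
     * of patch_target instead of a pasted-in duplicate of its body. */
    return patch_target(x) * 2;
}

/* Usage: the result is the same either way; only the generated code differs. */
void noinline_demo(void)
{
    assert(caller(1) == 4);
}
```

This only addresses the "several copies" problem. Patching the single remaining copy is still outside anything the C standard promises.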

Posted: **Mon Aug 07, 2006 10:46 pm**

Instead of changing a function that the compiler has produced, why not create your own function?

Just create an array of bytes, add your code there, and then set the last byte to a return instruction.

For example:

Code:

#include <stdlib.h>

#define CODE_SIZE 1024
#define OP_NOP 0x90       /* x86 NOP */
#define OP_RET_NEAR 0xC3  /* x86 near RET */
#define OP_UD2 0x0B0F     /* x86 UD2, stored little-endian as 0F 0B */

int main(void)
{
   int i;
   unsigned char *CodeData = malloc(CODE_SIZE);
   if (CodeData == NULL)
      return 1;

   void (*Func)(void) = (void (*)(void)) CodeData;
   for (i = 0; i < CODE_SIZE; ++i)
      CodeData[i] = OP_NOP;

   // the last byte is the return instruction
   CodeData[CODE_SIZE-1] = OP_RET_NEAR;

// if i comment this line out it works
   *((unsigned short *)CodeData) = OP_UD2;

   Func();
   return 0;
}

@self: I hate vi. When I type in another editor I get into the habit of typing i before every insert, then Escape once I have finished, and :wq at the end. I wish the default mode was insert!!!

Posted: **Tue Aug 08, 2006 5:24 am**

Because you're allocating memory from a non-executable data area. Most systems have a clear split between executable and non-executable memory areas. You'd cause a fault if you tried to run code in the non-exec area.

It would vary from system to system.
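For a POSIX-style system, one way around that is to ask the OS for the pages yourself and change their protection once the code is written. This is only a sketch under that assumption: `make_executable_copy` is a hypothetical helper, `mmap`/`mprotect` are the standard POSIX calls, and a system with a stricter policy may still refuse the request.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: returns an executable, read-only copy of
 * code[0..len), or NULL on failure. Release it with munmap(ptr, len). */
static void *make_executable_copy(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);

    /* 1. Ask the OS for fresh pages that are writable but not executable. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    /* 2. Fill them in while they are still plain data. */
    memcpy(mem, code, len);

    /* 3. Flip the pages to read + execute; the OS may refuse by policy. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```

On x86, passing the six bytes `B8 2A 00 00 00 C3` (`mov eax, 42` followed by `ret`) and calling the result through an `int (*)(void)` pointer should return 42. Writing first and only then making the pages executable means the memory is never writable and executable at the same time.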

Posted: **Tue Aug 08, 2006 8:13 am**

So your only option would be to couple your code-generator function with an ELF-header library, so that you write a valid ELF object that you then dynamically load ::)

that's no longer much of "self-modifying" but rather "generated code" though...

Posted: **Wed Aug 09, 2006 4:20 am**

I would imagine this is how JIT compilers would work.

@df: The operating system would have the ability to change the flags of the page; Windows, for example, allows you to change them. BTW, with Midas's example, if you have two instances of the program, I'm assuming that the operating system would map the same pages of the application code into both. This means that if you change the function, side effects would occur in the other process.

@Pype.Clicker: I had to read your post many times. If you mean that you would have to create something like a shared library on disk and then dynamically load it, there would be no need to. All you would have to do is modify the buffer and then execute it. If you want to be able to save and restore the code, then all you would have to do is save the buffer as a flat binary file. Loading it would be as simple as reading the file into the buffer.

@both: Having said that, there are a few things to consider. First, both Midas's example and mine would only work on a flat memory model. Second, you would have to know something about the processor: the examples would only work on x86.
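The save/restore idea can be sketched with nothing but `<stdio.h>`. The helper names `save_code` and `load_code` are hypothetical, and they take an already opened stream so the caller decides where the file lives. One caveat that applies to any flat binary: the bytes have to be position-independent, or be loaded back at the same address, and the destination buffer still needs to be executable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write len bytes of generated code to an open stream.
 * Returns 0 on success, -1 if the write came up short. */
static int save_code(FILE *out, const unsigned char *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    return fwrite(buf, 1, len, out) == len ? 0 : -1;
}

/* Read up to cap bytes back into buf.
 * Returns the number of bytes actually read. */
static size_t load_code(FILE *in, unsigned char *buf, size_t cap)
{
    assert(in != NULL && buf != NULL);
    return fread(buf, 1, cap, in);
}
```

A typical round trip would open the file with `fopen(path, "wb")`, call `save_code`, and later reopen it with `"rb"` and call `load_code` on the code buffer.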

Posted: **Fri Aug 11, 2006 4:43 am**

B.E wrote: @Pype.Clicker: I had to read your post many times. If you mean that you would have to create something like a shared library on disk and then dynamically load it.

Yes, that was the idea... maybe making sure that your system uses something like 'tmpfs' so that it does not waste time writing the file to disk.

B.E wrote: There would be no need to. All you would have to do is modify the buffer and then execute it. If you want to be able to save and restore the code, then all you would have to do is save the buffer as a flat binary file. Loading it would be as simple as reading the file into the buffer.

True, at least as long as your OS doesn't enforce something like "NX" (no-execute) pages to make sure no one overwrites data buffers with code. Most of the security threats came from the lack of such protection (e.g. every time you see a security alert about a "buffer overflow"), and Java JIT compilers typically fail to run on systems with such protection (until they've been ported to use the new "allocate-executable-buffer" call, or are run with root privilege or whatever).

Probably there were too many assumptions in my post. Sorry.

Self modifying code
