Hi,
Pype.Clicker wrote:Yes, that could be done. That'd look much like a kernel where you'd have e.g. a memman module compiled for i386, another one compiled for i686, etc. and having the loader figure out what to link with what (or having a meta-module sensing the CPU features and exporting a different set of functions).
Though, most of the time, your module has relocation info, right? So there's no need for an actual "jump table": you know in advance where all the function calls are and how to patch the "call xxxxxxx" with the actual location of the callee.
For the kernel API, user-level code needs to know how to access kernel functions. A common way to do this is to have an entry point (software interrupt, SYSCALL/SYSENTER, etc.) that does something like "call [kernelAPItable + eax * 4]". It's this call table that is wide open for boot-time configuration.
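As a rough sketch only (all the names here - kernelAPItable, kernel_api_install, kernel_api_dispatch - are invented for this post, not taken from any real kernel), the same idea in C is just a table of function pointers that boot code fills in, with the entry point dispatching through it:

Code:
/* Hypothetical sketch of a kernel API call table. The syscall entry
 * stub effectively does "call [kernelAPItable + eax * 4]"; which
 * functions end up in the table is decided once at boot time. */

typedef long (*kernel_api_fn)(long arg1, long arg2, long arg3);

#define KERNEL_API_MAX 256

static kernel_api_fn kernelAPItable[KERNEL_API_MAX];

/* Called during boot to (re)configure an entry in the table */
static void kernel_api_install(int number, kernel_api_fn fn)
{
    if (number >= 0 && number < KERNEL_API_MAX) {
        kernelAPItable[number] = fn;
    }
}

/* What the entry point effectively does after the software interrupt
 * or SYSENTER: dispatch through the table, so callers never know (or
 * care) which implementation was installed at boot. */
long kernel_api_dispatch(int number, long arg1, long arg2, long arg3)
{
    if (number < 0 || number >= KERNEL_API_MAX || kernelAPItable[number] == 0) {
        return -1;              /* "bad function number" */
    }
    return kernelAPItable[number](arg1, arg2, arg3);
}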
The IDT is also quite similar - e.g. it's not hard to have several "device not available" exception handlers and decide which one to install depending on whether or not FPU, FXSAVE, MMX/SSE, etc. is supported. None of this reduces performance, or requires any boot-time linker or auto-configure mess...
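For example (another rough sketch - cpu_has_fxsave, idt_set_handler and the handler names are assumptions made up for illustration), selecting the "device not available" handler at boot might look like:

Code:
/* Hypothetical sketch: pick which #NM ("device not available", vector 7)
 * handler to install at boot, based on feature flags gathered from CPUID. */

extern int cpu_has_fxsave;                    /* filled in from CPUID at boot */
extern void idt_set_handler(int vector, void (*handler)(void));

void nm_handler_fsave(void);                  /* saves FPU state with FSAVE/FRSTOR */
void nm_handler_fxsave(void);                 /* saves FPU state with FXSAVE/FXRSTOR */

void install_nm_handler(void)
{
    if (cpu_has_fxsave) {
        idt_set_handler(7, nm_handler_fxsave);
    } else {
        idt_set_handler(7, nm_handler_fsave);
    }
}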
For my OS, there are kernel modules which also use a call table to access each other's functions (the "internal kernel API"). On a larger scale this means I can have a "plain paging" module and a "PAE paging" module and still use the same "scheduler" module without caring which paging module is being used. On a smaller scale, it means that a module can select which versions of its functions to install into the internal kernel API depending on what features, etc. are present. Despite this, none of my binaries ever use run-time linking and there is no relocation information, etc. The only disadvantage is the additional cost of indirect calls, and that it requires the "internal kernel API" to be well documented (something I consider a good thing in any case).
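To give a rough idea of the shape of this (again, every name here is invented for illustration, not copied from my actual code), the paging module could install its own set of functions into a shared structure at boot, and the scheduler would only ever call through that structure:

Code:
/* Hypothetical sketch of an "internal kernel API" slot: the paging
 * module installs whichever versions of its functions suit the
 * hardware, and other modules (e.g. the scheduler) only call through
 * the table and never care which module is behind it. */

struct paging_api {
    void *(*map_page)(void *virt, unsigned long phys, unsigned flags);
    void  (*unmap_page)(void *virt);
    void  (*switch_address_space)(unsigned long top_level_phys);
};

static struct paging_api paging;              /* the shared API slot */

extern const struct paging_api plain_paging_api;   /* from the "plain paging" module */
extern const struct paging_api pae_paging_api;     /* from the "PAE paging" module */

extern int cpu_has_pae;                       /* from CPUID at boot */

void paging_module_init(void)
{
    /* Select which set of functions becomes "the" paging API */
    paging = cpu_has_pae ? pae_paging_api : plain_paging_api;
}

/* The scheduler doesn't know or care which paging module was installed */
void scheduler_switch_to(unsigned long next_address_space)
{
    paging.switch_address_space(next_address_space);
}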
Pype.Clicker wrote:I guess arguing more about "we could have something that patches 'bne $+4 mov eax, ebx' into 'cmoveq eax, ebx'" without actually coming up with a tool that produces "diffs" between memman-i386.o and memman-i686.o (so that we can add this into the .patch-cpu-i686 section of the final "memman.o") is just speaking into the void. So I'll be back on it when I have the "diff" tool -- and the accompanying patcher.
Hehehee - I'll expect your return somewhere near the end of the century...
Will your "memman.o" patch also replace occurances of "mov eax,cr3; mov cr3,eax" with "invlpg [???]" instructions, re-optimize register usage now that "eax" isn't used, change all of the relative and fixed offsets (for both code and data) to account for the differences in instruction sizes, and still not mess things up when presented with multi-CPU "TLB shootdown"?
After you've spent many years perfecting this method, will anyone be able to notice the performance difference compared to run-time decisions, like:
Code:
if (CPU.features.invlpg == 1) {
    /* CPU supports INVLPG - invalidate just this one TLB entry */
    invalidate(address);
} else {
    /* No INVLPG (e.g. an 80386) - reload CR3 to flush the whole TLB */
    reload_CR3();
}
And finally, if you had tools capable of doing this patching now, would you decide to use them, or would it increase maintenance and testing hassles too much to consider? I would assume it wouldn't take that long before you're applying several different patches to the same base code, and would need to find any incompatibilities and/or dependencies this causes...
The way I see it, you could spend a little time creating tools that allow very minor changes, with very minor performance differences and minor hassles, or you could spend a huge amount of time creating tools that allow large changes, with large performance differences and large hassles...
Cheers,
Brendan