Brendan wrote:I think you're confusing "statically allocated virtual space" with "statically allocated physical pages". If an OS uses statically allocated physical pages and maps them at random virtual addresses (e.g. KASR without PASR) then this would be a potential problem. If an OS uses randomly allocated physical pages and maps them at statically allocated virtual addresses (PASR without KASR) then this isn't a potential problem.
Indeed, my original version required that the physical allocator be deterministic, or at least that it offer a large-page allocation interface to the user process. In my last proposal, however, the only requirement is that the kernel pages are resident and stationary, which is not unreasonable in general.
Here is another, even more concise illustration of the idea. Say a process calls the kernel API GetRandomInteger 1000 times and concludes that, after this call, accesses to some 20 virtual addresses (in different pages of that process) are consistently slower. From then on, every time 18 or more of those 20 pages exhibit high latencies, the process will assume that someone has called GetRandomInteger. That is the gist of it. For this to work, the program must guarantee that the 20 pages it owns (not their kernel counterparts, which are most likely resident anyway) stay resident in memory and are not relocated. A high enough access frequency should avoid paging, so unless the kernel actively relocates pages as a security measure, or there is extreme memory pressure, there won't be a problem. We also need regularly occurring eviction from the cache, to force direct RAM accesses. This may be possible by vacuously writing to the process memory in high volume, assuming the CPU load is favorable, the monitored event frequency (GetRandomInteger in our case) is sufficiently low, and preferably that write-combining optimization is available. Otherwise, system-wide cache flushing must somehow be triggered, which, if not possible, foils the plan. The information obtained in this way can, I imagine, narrow down a brute-force cryptographic attack, but don't quote me on that.
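To make that concrete, here is a minimal sketch of the monitoring loop I have in mind, assuming an x86 machine with a usable TSC and a POSIX environment. The probe addresses would really come from the 1000-call training phase; here they are just dummy pages so the snippet compiles and runs, and the cache-eviction traffic is omitted entirely.

Code:
/* Sketch of the monitoring loop described above.  The probe pages stand in
 * for the addresses found during the training phase; mlock keeps them
 * resident and unrelocated, as the argument requires. */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <x86intrin.h>          /* __rdtscp */

#define NPAGES      20          /* pages that were consistently slower    */
#define THRESHOLD   18          /* the 18-of-20 rule from the text        */
#define SLOW_CYCLES 300         /* latency treated as "served from RAM"   */

static volatile uint8_t *probe[NPAGES];

/* Time a single read of one probe address, in TSC cycles. */
static uint64_t access_latency(volatile uint8_t *p)
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;                               /* the access we are timing */
    return __rdtscp(&aux) - t0;
}

/* One monitoring round: do we believe GetRandomInteger was just called? */
static int looks_like_kernel_call(void)
{
    int slow = 0;
    for (int i = 0; i < NPAGES; i++)
        if (access_latency(probe[i]) > SLOW_CYCLES)
            slow++;
    return slow >= THRESHOLD;
}

int main(void)
{
    for (int i = 0; i < NPAGES; i++) {
        probe[i] = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mlock((const void *)probe[i], 4096);
    }

    if (looks_like_kernel_call())
        puts("someone probably called GetRandomInteger");
    return 0;
}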
Brendan wrote:Essentially, a thread would tell kernel "do this list of things" and the kernel would split more expensive things into smaller pieces and give all the smaller pieces a priority (that is derived directly from the priority of the thread that asked the kernel to do the work); and all of the CPUs would do "highest priority piece first" in parallel (whenever running a user-space thread isn't more important); so that something like creating a process might use 5 CPUs at the same time (and finish faster). For dependencies; there's no problem - if "small piece B" can't be started until "small piece A" has finished, then initially only "small piece A" would be put onto the kernel's work queue, and when "small piece A" finished the kernel would put "small piece B" onto the kernel's work queue. Of course with many user-space threads running on multiple CPUs (plus work the kernel does on its own) the kernel's work queues end up being a mixed bag of many small unrelated pieces.
I remember your mentioning that scheme in a different post, and it seems well suited to your architecture. I am slightly curious why you emphasize the performance of the service calls. I imagine process synchronization and communication will be the most demanding kernel activities. However, if the primitives are user-space objects, kernel involvement is only required for blocking and unblocking under contention in the mutex case, and for starvation in the producer-consumer case (see the sketch below). And I wonder whether frequent blocking under full workload is not essentially a software issue: insufficient parallelism, excessive parallelism, or excessive synchronization. That is, can't it be argued that in a perfect world, performance-sensitive software should not trigger these events frequently? It does in practice. There is also memory management, process and thread management, and preemption, but I would consider those rare in comparison. And interrupt services are not in the same basket, I believe. (They cannot be interleaved with the other tasks.)
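For what it's worth, here is roughly what I mean by kernel involvement only under contention: a minimal futex-style mutex for Linux, in the spirit of Drepper's "Futexes Are Tricky". The three-state scheme and the names are mine, not anything from your design, and it is only a sketch.

Code:
/* Contention-only mutex sketch (Linux futex, GCC/Clang atomic builtins).
 * The uncontended lock and unlock never enter the kernel; only a waiter,
 * and the wake that releases it, do. */
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct { uint32_t state; } mutex_t; /* 0 free, 1 locked, 2 contended */

static long futex(uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void mutex_lock(mutex_t *m)
{
    uint32_t c = 0;
    /* Fast path: 0 -> 1 without any system call. */
    if (__atomic_compare_exchange_n(&m->state, &c, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return;
    /* Slow path: mark the lock contended and sleep in the kernel. */
    if (c != 2)
        c = __atomic_exchange_n(&m->state, 2, __ATOMIC_ACQUIRE);
    while (c != 0) {
        futex(&m->state, FUTEX_WAIT, 2);
        c = __atomic_exchange_n(&m->state, 2, __ATOMIC_ACQUIRE);
    }
}

static void mutex_unlock(mutex_t *m)
{
    /* Fast path: nobody waited, no system call needed. */
    if (__atomic_exchange_n(&m->state, 0, __ATOMIC_RELEASE) == 2)
        futex(&m->state, FUTEX_WAKE, 1);    /* slow path: wake one waiter */
}

int main(void)
{
    mutex_t m = { 0 };
    mutex_lock(&m);     /* fast path, never enters the kernel */
    mutex_unlock(&m);
    return 0;
}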
Brendan wrote:With all of this in mind; what if I added a random "boost or nerf" adjustment to the priority of the smaller pieces (instead of the priority of the smaller pieces being derived directly from the priority of the thread that asked the kernel to do the work)? With a very minor change, the kernel would execute the "mixed bag of many small unrelated pieces" in a semi-randomised order.
How would one correctly calibrate the standard deviation of the boost? I mean, it sounds sensible, but can it ever be proven correct and random enough?
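For concreteness, this is the kind of knob I am asking about; the distribution, bounds, and names below are invented for illustration, not taken from your design.

Code:
/* Toy illustration of the "boost or nerf" idea: a piece's priority is the
 * requesting thread's priority plus bounded noise.  SIGMA is exactly the
 * knob my question is about. */
#include <stdio.h>
#include <stdlib.h>

#define PRIO_MIN   0
#define PRIO_MAX 255
#define SIGMA      8        /* how much randomness is "random enough"? */

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Sum of a few uniform samples in [-SIGMA, SIGMA]: roughly bell-shaped,
 * centred on zero. */
static int noise(void)
{
    int n = 0;
    for (int i = 0; i < 4; i++)
        n += rand() % (2 * SIGMA + 1) - SIGMA;
    return n / 4;
}

static int piece_priority(int thread_priority)
{
    return clamp(thread_priority + noise(), PRIO_MIN, PRIO_MAX);
}

int main(void)
{
    for (int i = 0; i < 10; i++)
        printf("%d ", piece_priority(128));
    putchar('\n');
    return 0;
}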
Brendan wrote:Last year I was experimenting with graphics; and wanted to convert a buffer of pixel data that was in a standard format (96-bits per pixel in the CIE XYZ colour space, with no gamma) into whatever the video mode and monitor happens to want (e.g. maybe 24-bit per pixel in one of the RGB colour spaces, with gamma), including doing dithering. Normal optimised assembly wasn't really fast enough (there were too many branches and too many memory accesses for local variables) so my solution was dynamic code generation. Specifically, a collection of pre-optimised snippets that would be stitched together and then patched, so that half the branches disappeared and half of the local variables effectively became constants.
The performance improvement from code generation is understandable. Prepackaged code, by contrast, has to make unnecessary generalizations about the execution environment.
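Just to ground the terminology, here is a toy x86-64/Linux sketch of the stitch-and-patch idea: a canned machine-code snippet is copied into an executable buffer and its immediate operand is patched, so a would-be variable becomes a constant. It only gestures at what you describe, of course, and error handling is omitted.

Code:
/* Copy a tiny machine-code template into an executable buffer and patch
 * its 32-bit immediate. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* mov eax, imm32 ; ret  -- the imm32 placeholder is patched below. */
static const uint8_t snippet[] = { 0xB8, 0, 0, 0, 0, 0xC3 };

typedef int (*fn_t)(void);

static fn_t make_constant_fn(int32_t value)
{
    uint8_t *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memcpy(buf, snippet, sizeof snippet);       /* "stitch" the snippet    */
    memcpy(buf + 1, &value, sizeof value);      /* patch the immediate     */
    mprotect(buf, 4096, PROT_READ | PROT_EXEC); /* then make it executable */
    return (fn_t)buf;
}

int main(void)
{
    fn_t f = make_constant_fn(42);
    printf("%d\n", f());                        /* prints 42 */
    return 0;
}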
Indeed, patience-testing build times are not unusual for larger projects. But those involve source parsing and whole-program optimization. Ahead-of-time compilers optimize the entire code base, even though the bulk of it runs only a small fraction of the time (although admittedly the profile can sometimes be more evenly distributed). What a JIT can do instead is perform a generic first compilation pass, usually on demand to reduce load times, and then follow it with an optimizing pass for hot spots detected at run time. Those hot spots are far fewer and smaller (even if a few calls deep) than the scope of a link-time optimization pass, so they should compile within the first seconds of execution, while the slow generic version of the code keeps running in the meantime. The generated code can also be cached, so compilation needn't restart every time the software loads.
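A rough sketch of the two-tier dispatch I have in mind follows; the two "compilers" are dummies standing in for the generic pass and the hot-spot pass, and the threshold is absurdly low just for the demonstration.

Code:
/* Tier-up dispatch: callers go through a per-function record; the baseline
 * tier is installed at load time and an optimizing tier is swapped in once
 * the call counter proves the function hot. */
#include <stdatomic.h>
#include <stdio.h>

typedef void (*code_t)(void);

typedef struct {
    _Atomic(code_t) entry;        /* what callers actually invoke     */
    atomic_long     calls;        /* run-time hot-spot detection      */
} func_t;

#define HOT_THRESHOLD 3           /* absurdly low, just for the demo  */

static void baseline_version(void)  { puts("baseline code"); }
static void optimized_version(void) { puts("optimized code"); }

/* Stand-ins for the quick generic pass and the slower hot-spot pass. */
static code_t compile_baseline(void)  { return baseline_version; }
static code_t compile_optimized(void) { return optimized_version; }

static void call(func_t *f)
{
    /* Tier up once the counter passes HOT_THRESHOLD. */
    if (atomic_fetch_add(&f->calls, 1) == HOT_THRESHOLD)
        atomic_store(&f->entry, compile_optimized());
    atomic_load(&f->entry)();
}

int main(void)
{
    func_t f = { .entry = compile_baseline(), .calls = 0 };
    for (int i = 0; i < 6; i++)
        call(&f);
    return 0;
}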
Also, some of the work is done at the development site, such as source parsing and most of the global and local optimizations: inlining, constant folding, common subexpression elimination, loop-invariant code motion, and so on. Backend tasks, such as register allocation, vectorization, and instruction scheduling, are done by the JIT at the user's site. During hot-spot optimization it can also try to further inline one component's functions into another component. This need not be done globally, as the goal is to eliminate the ABI cost, dead code branches from argument checks, and the like.
So, a JIT may be slower all things being equal, but 500%? And it can leverage other tactical advantages. We are not talking about managed languages in general here, though. They suffer from a lack of compact memory layout (as every object lives in a separate allocation) and can introduce sparse access patterns; this stems from automatic memory management and the object-oriented facilities. Managed languages are also heavier on dynamic dispatch, especially when programmers don't bother with type sealing.
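To illustrate the layout point, here is a contrived C contrast between a contiguous array of values and an array of individually heap-allocated objects, which is roughly the access pattern a managed heap tends to impose.

Code:
/* Summing a contiguous array touches memory sequentially; summing through
 * an array of per-object allocations chases pointers into scattered heap
 * blocks. */
#include <stdlib.h>

#define N 1000000

typedef struct { double x, y; } point_t;

static double sum_contiguous(const point_t *pts, size_t n)
{
    double s = 0;
    for (size_t i = 0; i < n; i++)
        s += pts[i].x + pts[i].y;      /* sequential, cache-friendly  */
    return s;
}

static double sum_scattered(point_t **pts, size_t n)
{
    double s = 0;
    for (size_t i = 0; i < n; i++)
        s += pts[i]->x + pts[i]->y;    /* one dereference per object  */
    return s;
}

int main(void)
{
    point_t *flat   = calloc(N, sizeof *flat);
    point_t **boxed = malloc(N * sizeof *boxed);
    for (size_t i = 0; i < N; i++)
        boxed[i] = calloc(1, sizeof **boxed);  /* one allocation per object */

    volatile double sink = sum_contiguous(flat, N) + sum_scattered(boxed, N);
    (void)sink;
    return 0;
}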
Brendan wrote:You're mostly saying that self modifying code is less secure (which is arguably correct); and then assuming that AI always means "intelligently self modifying". From my perspective; "intelligently self modifying" is the least plausible scenario. For example; the idea of using an extremely expensive and extremely unintelligent "try all the things until something works" approach every time anyone decides to (e.g.) open a text editor (and then not saving the generated code after you've done the extremely expensive/unintelligent code generation, and not ending up with "AI generated code that is not self modifying at all") is far fetched at best.
Well, I am talking about either self-modifying code or the liberal use of code generation in software. AI is just the context that draws my curiosity to the security aspect, but it is otherwise irrelevant. More concretely, the question is how freely software should be allowed to dynamically generate control-flow logic. If we reach a solution for AGI someday, possibly even after our lifetime, then by the same line of reasoning our security principles will be completely inapplicable to it.
Brendan wrote:So, given that saying "don't worry, you'll do better next time" while patting them on the back in sympathy has failed to make any difference for decades; how can we encourage people to try harder to avoid releasing exploitable software? Threatening to destroy their business (by blacklisting all their products and guaranteeing zero sales forever) is one way to encourage prevention (but perhaps that's a little extreme?).
Software correctness is plain difficult. It is true that companies are complacent and that market expectations have been molded accordingly. But with present-day tools, correctness as an absolute goal is far too costly to be feasible; the skills and processes are insufficient to make it viable. My opinion is that unless we evolve the tools, defect-free software will remain financially infeasible, and we will continue to take risks with defect-laden products instead. Alternatively, if we take the stance that only non-exploitable software will be viable in the hostilities of our cyber-future, then software will simply not be viable at all, unless revolutionary changes in the way we develop it take place.