You vastly overestimate the impact paging has on speed: next to none. It adds latency, yes, but it does not reduce speed.
bontanu wrote: Yes... but it is not really needed, just an old custom, and it does occupy a lot of area in the CPU core; as the number of cores increases, this area becomes a drag.
Owen wrote: Complicating the processor if the application needs/benefits from it is not an issue.
And there is another problem: those additional layers reduce speed by approximately 15% - 30%, and in a world where CPU frequency has reached its top limit this is a further problem.
My estimate is that paging and memory protection as we know them today will be dropped, and instead we will see many more simple, faster cores that can be configured by a master CPU to execute a single process in a "zone".
This trend is already visible.
Please stop assuming I don't understand electronics. I do, very well.
bontanu wrote: I suggest that you understand electronics and CPU architecture better.
Owen wrote: Why must the address adders be able to perform 64-bit addition? You can cut out most of the logic for the bits above 47, reduce gate delays, and speed things up.
The address calculations for instructions and operands are performed by the CPU BEFORE the paging mechanism comes into play.
Paging is only a translation mechanism that converts a linear address into a physical address.
Hence, if you think about it a little, you will see that the adders already perform full 64-bit additions (and yes, this does slow down 64-bit long mode).
For your example: mov rax, [rsi + 4*rbx + my_offset_32]
Here I can set up values in RSI and RBX that, added together, generate a linear address of your choice, with whatever bits set or clear... obviously the adder must be able to perform the full addition (think about LEA), even IF, later on, the paging mechanism finds that the address is not canonical and eventually raises an exception.
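To make that concrete, here is a small C sketch of the idea (my own illustration, not anything from the manuals; the names effective_address and is_canonical are made up): the address unit forms the full 64-bit sum base + index*scale + disp, exactly as LEA would, and only a separate test on the result decides whether a later load/store would fault.

#include <stdint.h>
#include <stdio.h>

/* The address unit adds base + index*scale + sign-extended disp modulo 2^64,
   exactly what LEA would produce; no fault is raised at this stage. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  uint64_t scale, int32_t disp)
{
    return base + index * scale + (uint64_t)(int64_t)disp;
}

/* Canonical form on a 48-bit implementation: bits 63..47 must all be equal,
   i.e. the top 17 bits of the address are all zeros or all ones. */
static int is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1FFFF;
}

int main(void)
{
    uint64_t rsi = 0x0000700000000000ull;   /* arbitrary register values chosen */
    uint64_t rbx = 0x0000080000000000ull;   /* so that the sum is NOT canonical */
    uint64_t lin = effective_address(rsi, rbx, 4, 0x1000);

    printf("linear address = %#018llx, canonical: %s\n",
           (unsigned long long)lin, is_canonical(lin) ? "yes" : "no");
    /* Only a load/store through this address faults (#GP); the addition
       itself is always carried out over the full 64 bits. */
    return 0;
}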
Besides, the current address limitations will most likely be removed over time, and they will not redesign the CPU with each address bit that is added.
Paging is not there to simplify the CPU. The limits on address width and the canonical form are, but they will be slowly removed.
Bits propagate through the adders via carries, and NO, you cannot cut off the bits above 47.
You can set up the logic in order to detect faults without calculating the results. This can be very fast. The knowledge required to do such things is very well known in the semiconductor industry, and I would expect AMD and Intel to have it.
As for performance? 64-bit arithmetic does not affect load-store performance. Most of the maths is done in parallel with fetching the TLB entries; the other bits only need to match the latency of the TLB. You can delay taking the exception for quite a while.
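A small illustration of the argument in the two paragraphs above, in C (my own construction, not a description of any actual Intel/AMD circuit): whether base + index lands outside canonical space can be decided from the operands' top 17 bits plus the single carry out of the low 47 bits, so the checking logic only has to wait for that carry, never for the full 64-bit result.

#include <assert.h>
#include <stdint.h>

#define LOW47 ((1ull << 47) - 1)

/* Decide canonicality using only the top 17 bits of each operand and the
   carry out of the low 47 bits (which a carry-lookahead network provides
   early); the low-order sum bits are never formed here. */
static int violates_canonical_fast(uint64_t a, uint64_t b)
{
    uint64_t carry = ((a & LOW47) + (b & LOW47)) >> 47;          /* 0 or 1 */
    uint64_t top   = ((a >> 47) + (b >> 47) + carry) & 0x1FFFF;  /* bits 63..47 of a+b */
    return !(top == 0 || top == 0x1FFFF);
}

/* Reference version: full 64-bit add, then the canonical test. */
static int violates_canonical_full(uint64_t a, uint64_t b)
{
    uint64_t top = (a + b) >> 47;
    return !(top == 0 || top == 0x1FFFF);
}

int main(void)
{
    uint64_t cases[][2] = {
        { 0x00007FFFFFFFFFFFull, 0x1ull },                 /* crosses into non-canonical space  */
        { 0xFFFF800000000000ull, 0x123ull },               /* stays canonical (upper half)      */
        { 0x0000400000000000ull, 0x0000400000000000ull },  /* 2^46 + 2^46 = 2^47, non-canonical */
    };
    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++)
        assert(violates_canonical_fast(cases[i][0], cases[i][1]) ==
               violates_canonical_full(cases[i][0], cases[i][1]));
    return 0;
}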
You're missing something: Intel and AMD have to test their CPUs to make sure that under no conditions does a valid instruction cause an error. In other words, the fact that an OS generally does this only once at startup does not reduce the testing load. Prediction does not come into this; enforcing full pipeline serialization does.
bontanu wrote: Yes, but the mode change is only done once at OS startup and left in place forever after. The circuits are bad at predicting this even today; you are required to help them a little by writing bits in control registers and performing a long jump. It is not something that is expected to happen at every instruction, not even twice in an hour, hence not a big problem.
Owen wrote: Every mode switch requires circuitry to appropriately maintain the pipelines to preserve the illusion of in-order execution. That's a non-trivial cost, particularly in testing.
Stop trying to justify forcing paging onto long mode from an architectural point of view. There is no logical reason for paging to be forced there: nothing other than a very small benefit, and mostly because "AMD said so" when they designed long mode.
Cell's design has nothing like zones or segmentation; nor is the SPE design general-purpose. You might see SPE-style coprocessors developing, but they'll have a limited niche.
bontanu wrote: Well, you do not understand the concept of segmentation.
Owen wrote: Segmentation will only be resurrected if you kill C, C++ and Fortran.
It is already "resurrected" in the new Cell CPUs and in Sun's CPUs. They have a "zone" that is pretty much the same as a segment (and more), reserved for the process / CPU. Yes, the compilers have to adapt a little (more for Cell, less for Sun)... Not much has to change if the segments are FLAT. In practice the application does not know or care, but the security is better.
Our applications compile in C/C++ with no problem on Sun CPUs; no code change is required for this feature. Hence I think you are treating segmentation as the thing you learned about 8086 real mode back in 1990...
These things evolve, you know.
Hence there is no connection with killing C/C++ or any other HLL of your choice.
The general-purpose CPU rules.
As to Sun's "zones": care to link to a product? This is certainly not a feature of any product on their roadmap. Unless you're referring to Solaris Zones (which would mean Sun were being very confusing); those are a completely unrelated, OS-level feature.
Paging's effect on memory bandwidth is laughably small; CPUs have highly optimized TLBs, and there are various paging-related structures inside modern CPUs precisely to maximize performance. A reliable OS cannot be built without paging unless you use managed code, and unmanaged code is still king.
bontanu wrote: I do program every day in C... it is my job.
Owen wrote: Well... C was the fastest-growing programming language this year. I think you may be waiting a while here...
Yes, of course... unless it is not. Modern segmentation does not break any backward compatibility. Paging is also transparent to applications and even to ASM code, hence with or without it, application land does not care...
Owen wrote: Even on new architectures, backwards compatibility is king.
But you do have a point that most mainstream OSes rely heavily on paging for memory management. However, this can be changed underneath (as IBM does, or did), and the gain in CPU speed and simplicity might be worth complicating the OS memory management code a little... and it will allow more cores per CPU... we will see.
Yes, I agree that this was probably the reason behind AMD dropping it. And yes, there is a minimal gain... a very small gain in the front end of the CPU.
Owen wrote: As for my claim of unpaged access existing for backwards compatibility - you just proved my point! No mainstream OS requires it, therefore there is no reason to complicate the processor by supporting it in new models.
But your claim that it complicates things does not stand, because this "mode" is the simple base on which the "paging" mode is built.
Hence long mode merely hides the simpler mode from the "user"; the CPU still has to contain that mode internally as the foundation for the more complicated paging mode to exist.
I will give you a much "better" argument: you can enable paging BUT make it identity-mapped in order to simulate non-paged access... (of course this also has a cost, but a much smaller one).
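For what it's worth, here is roughly what that looks like: a minimal C sketch assuming x86-64 long mode with 1 GiB pages available (CPUID leaf 0x80000001, EDX bit 26) and assuming the tables' link-time addresses equal their physical addresses. Loading CR3 and the rest of the paging enable sequence are left to the boot code; build_identity_map is a name I made up for illustration.

#include <stdint.h>

#define PTE_P   (1ull << 0)   /* present                      */
#define PTE_RW  (1ull << 1)   /* writable                     */
#define PTE_PS  (1ull << 7)   /* in a PDPT entry: 1 GiB page  */

/* One PML4 and one PDPT are enough to identity-map the first 512 GiB. */
static uint64_t pml4[512] __attribute__((aligned(4096)));
static uint64_t pdpt[512] __attribute__((aligned(4096)));

void build_identity_map(void)
{
    for (int i = 0; i < 512; i++)   /* virtual == physical, 1 GiB per entry */
        pdpt[i] = ((uint64_t)i << 30) | PTE_P | PTE_RW | PTE_PS;

    pml4[0] = (uint64_t)pdpt | PTE_P | PTE_RW;  /* assumes table address == physical address */

    /* After loading this PML4 into CR3, every linear address maps to itself,
       so software effectively sees "no paging" even though the paging
       hardware is active. */
}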
Really, paging eats up a lot of resources, in speed / latency / delays and in CPU die area... and this is NO longer acceptable now that CPU clock speeds have hit their ceiling... hence in time it will disappear. The single-address-space research Microsoft did with Singularity shows this trend.
Consequently, I would not base my OS architecture on paging and/or on today's memory protection schemes.
As for the latency issues of paging? For most code, about 1 or 2 cycles; maybe 10 if you have to hit the L2 cache. When you're talking about memory latencies of 100s to 1000s of cycles... well, it's in the noise floor.