OSDev.org

Posted: **Sat Jul 31, 2010 5:36 am**

It seems from all descriptions (and the manual) that one needs to enable paging in order to enable long mode (on x86-64). Or is there a way to have paging disabled in 64-bit mode? Is there a known reason why AMD chose to do it this way, or is it only the fact that the OSes that are available already used paging anyway?

Posted: **Sat Jul 31, 2010 6:03 am**

Long mode has 48 bit virtual addresses, but 52 bit physical addresses.

Therefore, paging is required.

(I also expect that, in a few years, we will start to see things like graphics cards getting moved up to the top of the address space as they outgrow the <4GB hole)
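
To put rough numbers on the two widths quoted above, here is a minimal Python sketch (not from the thread; the helper name `space_in_tib` is made up for illustration). It only does the arithmetic: how large a 48-bit and a 52-bit address space are, and how many times the first fits into the second.

```python
VIRT_BITS = 48   # virtual address width quoted in the post above
PHYS_BITS = 52   # physical address width quoted in the post above


def space_in_tib(bits: int) -> int:
    """Size of a 2**bits byte address space, in TiB (2**40 bytes)."""
    return (1 << bits) >> 40


virt_tib = space_in_tib(VIRT_BITS)   # reachable through 48-bit virtual addresses
phys_tib = space_in_tib(PHYS_BITS)   # addressable with 52-bit physical addresses
ratio = phys_tib // virt_tib         # how many virtual spaces fit in the physical one

print(f"virtual:  {virt_tib} TiB")
print(f"physical: {phys_tib} TiB")
print(f"physical space is {ratio}x the virtual space")
```

With these widths a 48-bit address cannot name every physical byte directly, so some translation step is needed to reach the rest.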

Posted: **Sat Jul 31, 2010 6:14 am**

Owen wrote: Long mode has 48 bit virtual addresses, but 52 bit physical addresses.

Therefore, paging is required.

(I also expect that, in a few years, we will start to see things like graphics cards getting moved up to the top of the address space as they outgrow the <4GB hole)

But some 32-bit processors have 36-bit physical addresses, and paging is still not required in their 32-bit modes (also, 52-bit physical addressing would be possible anyway, since long mode has 64-bit registers).

Posted: **Sat Jul 31, 2010 6:21 am**

skyking wrote:
Owen wrote:Long mode has 48 bit virtual addresses, but 52 bit physical addresses.

Therefore, paging is required.

(I also expect that, in a few years, we will start to see things like graphics cards getting moved up to the top of the address space as they outgrow the <4GB hole)
But some 32-bit processors have 36-bit physical addresses, and paging is still not required in their 32-bit modes (also, 52-bit physical addressing would be possible anyway, since long mode has 64-bit registers).

Much of the circuitry involved in address calculations lacks the upper 16 bits. Much of the circuitry required for operating with paging disabled (i.e. bypass for bits above 31) is missing.

This greatly simplifies the processor.

Basically:

Every operating system uses paging
Therefore unpaged addressing is not required
Therefore don't implement it and simplify the processor (Save $$$ in design, manufacturing and testing)

(And by the way: You can use all the processor's address bits via PAE. Unpaged access primarily exists these days for backwards compatibility.)

Posted: **Sat Jul 31, 2010 9:09 am**

skyking wrote: It seems from all descriptions (and the manual) that one needs to enable paging in order to enable long mode (on x86-64). Or is there a way to have paging disabled in 64-bit mode? Is there a known reason why AMD chose to do it this way, or is it only the fact that the OSes that are available already used paging anyway?

I guess you are asking about the point of paging: why don't they make a CPU with direct, flat access to the 64-bit address space? The answer is that this is simply the situation. This is what they decided, and we have to comply with these requirements. Personally, I sometimes ask myself why they don't just make 10 GHz CPUs instead of multiple cores with lower clock rates, but this is the situation, and this is the direction newer processors are expanding in: more cores instead of more GHz. There must be some good reason for things to be done this way, you can be sure.

Good luck!

Posted: **Sat Jul 31, 2010 10:09 am**

nikito wrote: Personally, I sometimes ask myself why they don't just make 10 GHz CPUs instead of multiple cores with lower clock rates...

That can be answered in a single word: Physics.

Modern CPUs are walking the line of what's possible. You can't get much smaller structures than you have today, due to the limits of the production process and due to the quantum effects that come into play as the structures get smaller. But if you want to make CPUs even faster, you have to make the structures smaller, or your die will simply melt under the heat.

Besides, the usual workload of today's computers benefits more from multiple cores than from a single high-speed CPU. It's very seldom that you have one thread that requires all the oomph your system can provide, and it will become even more so in the future when programmers finally realize that the time of single-threaded apps is over.

Posted: **Sat Jul 31, 2010 10:28 am**

skyking wrote: It seems from all descriptions (and the manual) that one needs to enable paging in order to enable long mode (on x86-64)

Yes, it is required and mandatory. It is a must.

skyking wrote: or is there a way to have paging disabled in 64-bit mode?

No, there is no way to avoid paging in x86-64 long mode.

skyking wrote: Is there a known reason why AMD chose to do it this way

No. There is no logical reason.

skyking wrote: or is it only the fact that the OSes that are available already used paging anyway?

This is the most likely "guessed" explanation. Profit making dictates that you remove circuits that are not used by mainstream OSes.

Posted: **Sat Jul 31, 2010 10:46 am**

Owen wrote:
Much of the circuitry involved in address calculations lacks the upper 16 bits. Much of the circuitry required for operating with paging disabled (i.e. bypass for bits above 31) is missing.

Not true if you know electronics. Paging just complicates things a lot by adding a few extra layers of adders and delays to the circuit.

The 64 bits have to be there in the registers, temporary buffers and adders anyway, and the address adders must be able to perform 64-bit additions anyway... not to mention 128-bit or wider arithmetic in SSE.

However -- slow as it is -- paging also helps a lot in memory management algorithms, and this is what the software market demands today in the x86 world.

Owen wrote: This greatly simplifies the processor.

Not exactly. It is true that removing a "mode of operation" does simplify the CPU, BUT unfortunately removing this simple mode does not "greatly simplify" it... it simplifies it just a very little bit.

Alternatively, dropping paging would have simplified a HUGE amount of circuitry, and it would also have sped up the CPU a lot.

Owen wrote: Unpaged access primarily exists these days for backwards compatibility.

Not true. I am unaware of any mainstream commercial 32-bit OS that requires non-paged direct memory access (aka a single address space OS).

Some research does exist and some hobby OSes do it... like my own Solar_OS, but other than this there is NO backward compatibility issue here. (No relation to the "Solar" user here on this forum, just an old name clash.)

In fact this "paging" is an obsolete and backward-compatible 1970s-like technology (not necessarily bad... but old).

The new segmentation or SAS concepts did not have success in the software developers' world, and because of this they got dropped by the x86-64 hardware creators...

Do not worry... probably those advanced concepts will get resurrected in the future when people are ready for them, or maybe they are already partly resurrected in the new "zones" concept of Sun's CPUs.

Posted: **Sat Jul 31, 2010 11:01 am**

Solar wrote:
nikito wrote: Personally, I sometimes ask myself why they don't just make 10 GHz CPUs instead of multiple cores with lower clock rates...
That can be answered in a single word: Physics.

Correct. We have reached the limits of our technology and we cannot do much better. Hence the vendors want to maintain profit and sell us the "next best thing", and that is having multiple cores on the same CPU. A kind of 4-in-1 or 8-in-1 product sales trick.

Solar wrote: Besides, the usual workload of today's computers benefits more from multiple cores than from a single high-speed CPU.

Not true. It benefits only because of bad initial design (too much multithreading in their mindset).

Solar wrote: It's very seldom that you have one thread that requires all the oomph your system can provide,

Not really. Most of the important algorithms are sequential in nature and cannot be adapted to parallelization, since they need the previous result in order to compute the next step of the algorithm.

The very few that can benefit from parallelization will also have to handle the "doom" of a new kind of bug: timing and synchronization bugs that are almost impossible to reproduce and debug, and that arise from real multiple cores as opposed to single cores simulating multithreading.

Solar wrote: and it will become even more so in the future when programmers finally realize that the time of single-threaded apps is over.

Well... everybody in professional software development knows this already... We will do what we can with the current limits of our technology and this common multithreading mindset... (which is wrong IMHO)

But since many important algorithms are serial and non-parallelizable in nature, we will just "steal our own hat", pretend we are intelligent, and go for this multi-core "trend".

While in the "background task" we are waiting for the next technological breakthrough that will once again enable the dreams of 1000 GHz+ CPUs... if ever.

-------
PS: I do work in a domain where we benefit a lot from multicore and the tasks are kind of independent and hence parallelizable... hence I am lucky and happy with this multicore trend... BUT I am objective and I can also see many non-parallel algorithms.

Posted: **Sat Jul 31, 2010 11:15 am**

Solar wrote: That can be answered in a single word: Physics.

Modern CPUs are walking the line of what's possible. You can't get much smaller structures than you have today, due to the limits of the production process and due to the quantum effects that come into play as the structures get smaller. But if you want to make CPUs even faster, you have to make the structures smaller, or your die will simply melt under the heat.

Being a physicist, I have a small comment on this.

The reason that faster CPUs (i.e. CPUs with a higher clock rate) must be smaller is not heat dissipation, but the finite signal speed: No signal can be faster than light, and if you have a 10GHz CPU, a signal can travel only 3cm within a clock cycle. That means that the whole CPU must be significantly smaller than 3cm, otherwise you have to deal with synchronisation problems.
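
The 3cm figure can be checked with a couple of lines of Python. This is only a sketch of the arithmetic; the function name and the `velocity_factor` parameter are assumptions added for illustration (real on-chip signals travel slower than light in vacuum, so 1.0 is an upper bound, not a measured value).

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition of the metre


def distance_per_cycle_cm(clock_hz: float, velocity_factor: float = 1.0) -> float:
    """Upper bound on how far a signal travels in one clock cycle, in cm.

    velocity_factor is a hypothetical knob: the fraction of c at which the
    signal actually propagates (1.0 = vacuum light speed).
    """
    seconds_per_cycle = 1.0 / clock_hz
    metres = SPEED_OF_LIGHT_M_PER_S * velocity_factor * seconds_per_cycle
    return metres * 100.0


at_10_ghz = distance_per_cycle_cm(10e9)
at_3_ghz = distance_per_cycle_cm(3e9)
print(f"10 GHz: {at_10_ghz:.2f} cm per cycle")
print(f" 3 GHz: {at_3_ghz:.2f} cm per cycle")
```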

Indeed, heat dissipation is a problem, and it gets even worse when CPUs get smaller. The heat dissipation is limited by the surface of the die, and a smaller CPU can dissipate less power.

Posted: **Sat Jul 31, 2010 12:09 pm**

But smaller structures allow dies that generate less heat.

@ bontanu: I don't necessarily mean multithreading within a single application. I mean that I have, in ten years in the trade, not seen a single server that had only one CPU-bound process running on it - several users are running their service at the same time, or there's more than one job to be done. It's a bit different for workstations, but not much.

Posted: **Sat Jul 31, 2010 12:38 pm**

bontanu wrote:
Owen wrote:
Much of the circuitry involved in address calculations lacks the upper 16 bits. Much of the circuitry required for operating with paging disabled (i.e. bypass for bits above 31) is missing.
Not true if you know electronics. Paging just complicates things a lot by adding a few extra layers of adders and delays to the circuit.

Complicating the processor if the application needs/benefits from it is not an issue.

bontanu wrote: The 64 bits have to be there in the registers, temporary buffers and adders anyway, and the address adders must be able to perform 64-bit additions anyway... not to mention 128-bit or wider arithmetic in SSE.

Why must the address adders be able to perform 64-bit addition? You can cut out most of the logic for the bits above 47, reduce gate delays and speed things up.

bontanu wrote: However -- slow as it is -- paging also helps a lot in memory management algorithms, and this is what the software market demands today in the x86 world.

Owen wrote: This greatly simplifies the processor.
Not exactly. It is true that removing a "mode of operation" does simplify the CPU, BUT unfortunately removing this simple mode does not "greatly simplify" it... it simplifies it just a very little bit.

Alternatively, dropping paging would have simplified a HUGE amount of circuitry, and it would also have sped up the CPU a lot.

Every mode switch requires circuitry to appropriately maintain the pipelines to preserve the appearance of in-order execution. That's a non-trivial cost, particularly in testing.

Owen wrote: Unpaged access primarily exists these days for backwards compatibility.
Not true. I am unaware of any mainstream commercial 32-bit OS that requires non-paged direct memory access (aka a single address space OS).

Some research does exist and some hobby OSes do it... like my own Solar_OS, but other than this there is NO backward compatibility issue here. (No relation to the "Solar" user here on this forum, just an old name clash.)

In fact this "paging" is an obsolete and backward-compatible 1970s-like technology (not necessarily bad... but old).

The new segmentation or SAS concepts did not have success in the software developers' world, and because of this they got dropped by the x86-64 hardware creators...

Do not worry... probably those advanced concepts will get resurrected in the future when people are ready for them, or maybe they are already partly resurrected in the new "zones" concept of Sun's CPUs.

Segmentation will only be resurrected if you kill C, C++ and Fortran. Well... C was the fastest growing programming language this year. I think you may be waiting a while here...

Even on new architectures, backwards compatibility is king.

As for my claim of unpaged access existing for backwards compatibility - you just proved my point! No mainstream OS requires it, therefore there is no reason to complicate the processor supporting it in new models.

Solar wrote: But smaller structures allow dies that generate less heat.

@ bontanu: I don't necessarily mean multithreading within a single application. I mean that I have, in ten years in the trade, not seen a single server that had only one CPU-bound process running on it - several users are running their service at the same time, or there's more than one job to be done. It's a bit different for workstations, but not much.

You would think so. They actually often generate more - the smaller structures mean we are now at the point where transistors can't actuate each other into fully saturated/desaturated states - i.e. they are always generating heat - without interconnect damage. There are other issues like quantum tunnelling affecting things too.

At the 90nm node this was half of all the power most ICs were generating. Some process improvements have reduced this issue, but shrinks will increase the minimum leakage for the process, and there is nothing that can be done to stop it.

Most die shrinks are about cramming everything together to reach latency requirements and increase speed. For example, a die shrink may allow you to trim a cycle or two off the time taken for a cache snoop (and therefore reduce memory latency)

Posted: **Sat Jul 31, 2010 12:51 pm**

Solar wrote: But smaller structures allow dies that generate less heat.

Of course, that's right. But that doesn't limit the overall size of the die. One could easily spread those small structures over a large die, so they have enough space in between for dissipating heat. This is certainly inefficient as hell, but not a physical constraint.

Posted: **Sat Jul 31, 2010 1:22 pm**

Solar wrote: @ bontanu: I don't necessarily mean multithreading within a single application. I mean that I have, in ten years in the trade, not seen a single server that had only one CPU-bound process running on it - several users are running their service at the same time, or there's more than one job to be done. It's a bit different for workstations, but not much.

Oh, I see. Mea culpa. In this case I completely agree with you.

Multi-core does help a lot in these areas: servers serving multiple users and/or multiple applications performing relatively independent tasks at the same time. Workstations benefit too, because even there quite a few applications and drivers run in parallel.

Additionally, it also helps a lot if you can remove other I/O bottlenecks by having multiple separate RAMs/caches, multiple network cards, multiple hard drives, etc.

I also estimate that in time -- as the CPU core count increases -- we will move away from the "thread" as an abstraction model for multitasking. We will replace threads with "processes", and in time we will be running a single process per core in a single address space.

This will make memory protection "as we know it" obsolete for most cases and will most likely also remove "the need for paging" as we know it.

Instead it will be replaced with configurable "unbreakable zones" for each slave CPU and a master CPU that controls the tasks executed and configures the zones for each slave CPU/core.

Posted: **Sat Jul 31, 2010 2:13 pm**

Owen wrote: Complicating the processor if the application needs/benefits from it is not an issue

Yes... but it is not really needed, it is just an old custom. It occupies a lot of area in the CPU core, and as the number of cores increases this area becomes a drag.

And there is another problem: those additional layers reduce speed by approximately 15% to 30%, and in a world where the CPU frequency has reached its upper limit this is a further problem.

My estimate is that paging and memory protection as we know it today will be dropped and instead we will see many more simple and faster cores that can be configured to execute a single process into a "zone" by a master CPU.

This trend is already visible.

Owen wrote: Why must the address adders be able to perform 64-bit addition? You can cut out most of the logic for the bits above 47, reduce gate delays and speed things up

I suggest that you get a better understanding of electronics and CPU architecture.

The address calculations for the instructions and operands are performed by the CPU BEFORE the paging mechanism.

Paging is only a translation mechanism that converts a linear address into a physical address.

Hence if you think a little you will observe that the adders already perform full 64-bit additions (and yeah, this does slow down 64-bit long mode).

For example: mov rax, [rsi + 4*rbx + my_offset_32]

Here I can set up values in RSI and RBX that, added together, will generate a linear address of your choice... with whatever bits set or reset... Obviously the adder must be able to perform the addition (think about LEA) even IF later on the resulting address turns out not to be canonical and eventually raises an exception.
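
The carry argument can be modelled in a few lines of Python. This is only an illustrative sketch, not how any real CPU is built: `effective_address` and `is_canonical` are made-up helper names, and 48-bit canonical addresses are assumed as discussed earlier in the thread. It shows two in-range register values whose sum carries into bit 47 and produces a non-canonical result.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int, index: int, scale: int, disp32: int) -> int:
    """Model base + index*scale + disp32, truncated to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    if not -(1 << 31) <= disp32 < (1 << 31):
        raise ValueError("displacement must fit in a signed 32-bit value")
    # Masking with MASK64 models the wrap-around of a 64-bit adder,
    # including for negative displacements.
    return (base + index * scale + disp32) & MASK64


def is_canonical(addr: int, virt_bits: int = 48) -> bool:
    """True if bits 63..virt_bits-1 are all zeros or all ones."""
    top = addr >> (virt_bits - 1)
    return top == 0 or top == (1 << (64 - virt_bits + 1)) - 1


rsi = 0x0000_7FFF_FFFF_F000   # canonical, near the top of the lower half
rbx = 0x400                   # 4 * 0x400 = 0x1000
ea = effective_address(rsi, rbx, 4, 0)
print(hex(ea), is_canonical(rsi), is_canonical(ea))
```

Both inputs are canonical, yet the sum sets bit 47 alone, so the adder has to track the carry at least that far before the canonical check can reject the address.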

Besides, the current address limitations will most likely be removed, and they will not redesign the CPU with each address bit added.

Paging is not there to simplify the CPU. The limits imposed by the canonical address form are, but they will slowly be removed.

Bits propagate through the adders via carry, and NO, you cannot cut off the bits above 47.

Owen wrote: Every mode switch requires circuitry to appropriately maintain the pipelines to preserve the image of in-order execution. That's a non-trivial cost, particularly in testing

Yes, but the mode change is only done once at OS startup and left in place forever after. The circuits are bad at predicting this even today. You are required to help them a little by writing bits in control registers and performing a long jump. It is not something that is expected to happen at every instruction... not even twice in an hour... hence not a big problem.

Stop trying to justify forcing paging on long mode from an architectural point of view. There is no logical reason to have paging forced in there... nothing other than a very small benefit and mostly because "AMD said so" when they designed long mode.

Owen wrote: Segmentation will only be resurrected if you kill C, C++ and Fortran.

Well, you do not understand the concept of segmentation.

It is already "resurrected" in the new Cell CPUs and in Sun's CPUs. They have a "zone" that is pretty much the same as a segment (and more) reserved for the process / CPU. Yeah, the compilers have to adapt a little (more for Cell, less for Sun)... Not much has to change if the segments are FLAT. Practically, the application does not know or care, but the security is better.

Our applications compile in C/C++ with no problem on Sun CPUs... no change is required in code for this feature. Hence I think you are considering segmentation as something you learned about 8086 real mode in 1990... those things evolve, you know.

Hence there is no connection with killing C/C++ or whatever HLL programming language of your choice.

Owen wrote: Well... C was the fastest growing programming language this year. I think you may be waiting a while here...

I do program every day in C... it is my job.

Owen wrote: Even on new architectures, backwards compatibility is king.

Yes of course unless it is not... modern segmentation does not break any backward compatibility. Paging is also transparent to applications and even to ASM code hence with or without it the application land does not care...

But you do have a point in the fact that most mainstream OSes rely heavily on paging for memory management. However, this can be changed underneath (as IBM does or did), and the gain in CPU speed and simplicity might be worth complicating the OS memory management code a little... and it will allow more cores per CPU... we will see.

Owen wrote: As for my claim of unpaged access existing for backwards compatibility - you just proved my point! No mainstream OS requires it, therefore there is no reason to complicate the processor supporting it in new models.

Yes, I agree that this was probably the reason behind AMD dropping it. And yes, there is a minimal gain... a very small gain in the front end of the CPU.

But your claim that it complicates things does not stand, because this "mode" is the simple base on which the "paging" mode is built.

Hence long mode just does not give the "user" access to the simpler mode, but the CPU has to have this mode inside as a foundation for the more complicated paging mode to exist.

I will give you a much "better" argument: you can enable paging BUT make it identity-mapped in order to simulate non-paged access...

(Of course, this also has a problem, but a much smaller one.)
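
For readers who have not seen an identity map, here is a small Python simulation of the idea. It is a sketch under stated assumptions, not boot code: it models 4-level x86-64 page tables with 2 MiB pages as plain lists, the table location `0x1000` and the names `build_identity_map` and `translate` are hypothetical, and the flag bits used (present = bit 0, writable = bit 1, page size = bit 7) follow the standard x86-64 entry layout.

```python
PRESENT, WRITABLE, HUGE = 1 << 0, 1 << 1, 1 << 7   # page-table entry flag bits
ADDR_MASK = 0x000F_FFFF_FFFF_F000                  # bits 51:12 of an entry
PAGE_2M = 2 * 1024 * 1024


def build_identity_map(gib: int = 1):
    """Return (simulated memory holding the tables, address of the PML4).

    Maps the first `gib` GiB (at most 512) so that virtual == physical.
    """
    mem = {}
    next_free = 0x1000                  # hypothetical place to put the tables

    def alloc() -> int:
        nonlocal next_free
        addr = next_free
        mem[addr] = [0] * 512           # one 4 KiB table = 512 eight-byte entries
        next_free += 0x1000
        return addr

    pml4 = alloc()
    pdpt = alloc()
    mem[pml4][0] = pdpt | PRESENT | WRITABLE
    for g in range(gib):
        pd = alloc()
        mem[pdpt][g] = pd | PRESENT | WRITABLE
        for i in range(512):
            phys = (g * 512 + i) * PAGE_2M          # identity: frame == page
            mem[pd][i] = phys | PRESENT | WRITABLE | HUGE
    return mem, pml4


def translate(mem, pml4: int, vaddr: int) -> int:
    """Walk the simulated tables roughly the way the MMU would."""
    table = pml4
    for shift in (39, 30):              # PML4 index, then PDPT index
        entry = mem[table][(vaddr >> shift) & 0x1FF]
        if not entry & PRESENT:
            raise LookupError("page fault")
        table = entry & ADDR_MASK
    entry = mem[table][(vaddr >> 21) & 0x1FF]       # page directory index
    if not entry & PRESENT:
        raise LookupError("page fault")
    frame = entry & ADDR_MASK & ~(PAGE_2M - 1)
    return frame | (vaddr & (PAGE_2M - 1))


mem, pml4 = build_identity_map(1)
print(hex(translate(mem, pml4, 0x123456)))
```

Every mapped address translates to itself, so software sees what looks like unpaged memory even though the translation hardware is still active on every access.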

Really, paging eats up a lot of resources, both in speed / latency / delays and in CPU die area... and this is NO longer acceptable with the upper limit we have on CPU speeds... hence in time it will disappear. The Single Address Space research done by Microsoft with Singularity shows this trend.

In consequence, I would not base my OS architecture on paging and/or on the memory protection schemes of today.

Why does long mode require paging.
