General protection in 64bit mode (Intel)

nulik · Post by **nulik** » Fri Oct 21, 2011 7:01 am

Combuster wrote: Basically, your assumption only holds for simple embedded devices

It is not my assumption, it is my goal. This is not an OS for desktops.

nulik · Post by **nulik** » Fri Oct 21, 2011 7:12 am

Casm wrote:
bluemoon wrote: It should be at least a little while before anybody needs more than the 64Gb they can get with physical address extension in 32 bit mode.

32-bit is not a solution for me either: I would lose the additional registers, including the SSE ones, and I need those.

bluemoon · Post by **bluemoon** » Fri Oct 21, 2011 7:21 am

It is worth considering that the raw access speed of individual bytes in memory matters less to overall performance.
Without paging, a system will most likely need many memory copies all over the place, so the overall workload increases.

Chandra · Post by **Chandra** » Fri Oct 21, 2011 7:26 am

32bit is not a solution either for me

Are we allowed to know what solution fits for you? Did you consider the advantages of paging?

Besides, what trouble are you having setting up paging?

nulik · Post by **nulik** » Fri Oct 21, 2011 7:52 am

Chandra wrote:
32bit is not a solution either for me
Are we allowed to know what solution fits for you? Did you consider the advantages of paging?

Besides, what trouble are you having setting up paging?

No troubles yet. I just wanted to do segment limit checking in 64-bit mode (because it is faster), but as far as I can see that is not possible.

I also want to note that the removal of this feature had a big impact on some applications. For example, VMware was not able to run in 64-bit mode until virtualization instructions were released in later processor revisions:

http://www.pagetable.com/?p=25

Regards

Solar · Post by **Solar** » Fri Oct 21, 2011 8:05 am

@ nulik: Dear Sir, if you look into the mirror, do you - by any chance - look somewhat like rdos, perhaps with a sock pulled over his hand?

I just ask because you are the second person insisting on using segmentation instead of paging for protection, and using quite similar arguments (pointing to years-old papers).

Segmentation "being faster"? Assumptions and presumptions prove nothing. Blaming the CPU designers for taking away your toys solves nothing. It is as it is, cope, or design your own CPU, and toolchain, and language, capable of coping with segmentation.

Oh wait, you want to code your OS in ASM only, too, do you?

As for the VMware article, did you catch the last paragraph where the author shakes his head in wonder why VMware didn't use paging for protection in the first place...?

rdos · Post by **rdos** » Fri Oct 21, 2011 8:18 am

Solar wrote: @ nulik: Dear Sir, if you look into the mirror, do you - by any chance - look somewhat like rdos, perhaps with a sock pulled over his hand?

I just ask because you are the second person insisting on using segmentation instead of paging for protection, and using quite similar arguments (pointing to years-old papers).

But I've used paging for 23 years. I use both paging and segmentation.

Solar wrote: Segmentation "being faster"? Assumptions and presumptions prove nothing. Blaming the CPU designers for taking away your toys solves nothing. It is as it is, cope, or design your own CPU, and toolchain, and language, capable of coping with segmentation.

Segmentation is a debugging-tool primarily. And it avoids putting device-drivers in their own address-spaces. As soon as the debugging-phase is over, much of the segment-based protection can be disabled. I think I posted code on just how to do this previously for ACPI.

Solar wrote: Oh wait, you want to code your OS in ASM only, too, do you?

I will most likely write some major pieces of C-code in the kernel in the near future. Not that I will ever provide the scheduler or memory manager in C, but some fairly complex device-drivers would quite likely be in C or C++.

Solar · Post by **Solar** » Fri Oct 21, 2011 8:34 am

Uh... I understood your memory model to be non-flat?

rdos · Post by **rdos** » Fri Oct 21, 2011 9:02 am

Solar wrote: Uh... I understood your memory model to be non-flat?

The kernel device-driver model uses one code selector and one data selector (DGROUP) per device-driver for protection. All pointers to data are 48-bit. Today all applications are flat (slightly modified PE-format), but segmented applications are still supported. Paging is used for virtual/physical memory allocation, and for running applications in separate address spaces.
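For readers unfamiliar with the layout: a 48-bit far pointer on 32-bit x86 is a 16-bit segment selector plus a 32-bit offset, and the selector itself encodes a descriptor-table index, a table indicator bit and a requested privilege level. The following is only an illustrative sketch of that layout; the function names and the example value are made up here and are not taken from RDOS.

```python
def split_far_pointer(ptr48):
    """Split a 48-bit far pointer into (selector, offset).

    The pointer is treated as one little-endian integer, so the 32-bit
    offset sits in the low bits and the 16-bit selector in the high bits.
    """
    selector = (ptr48 >> 32) & 0xFFFF
    offset = ptr48 & 0xFFFFFFFF
    return selector, offset


def decode_selector(selector):
    """Decode the three fields of an x86 segment selector."""
    return {
        "index": selector >> 3,       # entry number in the GDT or LDT
        "ti": (selector >> 2) & 1,    # 0 = GDT, 1 = LDT
        "rpl": selector & 3,          # requested privilege level
    }


# Hypothetical example value: selector 0x0023, offset 0x00401000.
example_selector, example_offset = split_far_pointer(0x0023_0040_1000)
example_fields = decode_selector(example_selector)
```

With one data selector per driver, every dereference goes through a descriptor with its own base and limit, which is what gives the per-driver protection described above.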

DavidCooper · Post by **DavidCooper** » Fri Oct 21, 2011 12:53 pm

I don't use paging at all at the moment, but I certainly do plan to add it at some stage, so that when the OS runs out of usable memory space due to fragmentation it will be able to switch to using paging. That will slow it down a little (I don't know how much, though), but it will enable more things to be packed into the extra space made available, and it is well worth compromising on raw speed at that point. However, it seems to me that 3GB of usable space is a massive amount to play with, so if your OS and the apps you're using are guaranteed safe and stable and you don't need any protection features, why use paging before you're actually forced to? If the speed gain is significant, it's clearly worth having. Exactly the same would apply to running in 64-bit mode if the facility to run in that mode without paging had been provided.

gerryg400 · Post by **gerryg400** » Fri Oct 21, 2011 3:06 pm

rdos wrote: Even if it were 30 years old it would still show the obvious that a system without paging is faster than a system with paging. I don't understand how anybody can argue otherwise.

I can argue it like this:

The key here is the word 'system'. Faster can be measured lots of ways. The perceived speed of modern operating systems benefits from the OS features that can be built on paging, like mmap-ing files, demand loading executable images, copy-on-write, forking, passing data by paging, remapping DMA buffers, etc. The speed-up provided by these features outweighs the occasional slowdowns caused by having to do page translations.

There may be a penalty at the level of a single instruction (mitigated, to be sure, by the TLBs working in parallel with the rest of the core, and by other tricks), but overall the operating systems of the primary OS vendors, running on chips supplied by Intel and AMD, are faster with paging.

If your OS really doesn't need paging and the OS features that can be built on it then I guess none of this argument applies. But for most of us it certainly does apply.
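To make one of those features concrete, here is a minimal Python sketch of mmap-ing a file on a hosted OS: the file's pages are mapped into the process and only touched on access, rather than being copied into a separate buffer with read(). The helper name `first_line_via_mmap` and the throwaway temporary file are assumptions made purely for illustration.

```python
import mmap
import os
import tempfile


def first_line_via_mmap(path):
    """Return the first line of a file by mapping it instead of read()-ing it."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ requests a read-only mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            end = m.find(b"\n")
            if end == -1:
                end = len(m)
            # Only this slice is copied out of the mapping.
            return bytes(m[:end])


# Build a small throwaway file so the sketch is self-contained.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello paging\nsecond line\n")

first_line = first_line_via_mmap(demo_path)
os.remove(demo_path)
```

How much this saves in practice depends on the OS and the access pattern; the sketch only shows the mechanism, not a benchmark.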

nulik · Post by **nulik** » Fri Oct 21, 2011 7:52 pm

gerryg400 wrote: I can argue it like this:

The key here is the word 'system'. Faster can be measured lots of ways. The perceived speed of modern operating systems benefits from the OS features that can be built on paging, like mmap-ing files, demand loading executable images, copy-on-write, forking, passing data by paging, remapping DMA buffers, etc. The speed-up provided by these features outweighs the occasional slowdowns caused by having to do page translations.

It is not occasional; it should happen a lot, given the size of the TLB.

From the Intel 64 and IA-32 Architectures Optimization Reference Manual (page 2-19):

DTLB can perform three linear to physical address translations every cycle, two for load address and one for a store address. If the address is missing in the DTLB, the processor looks for it in the STLB, which holds data and instruction address translations. The penalty of a DTLB miss that hits the STLB is seven cycles.

Here you lose 7 CPU cycles.

Page 8-26:

The next largest set of memory access delays are associated with the TLBs when linear-to-physical address translation is mapped with a finite number of entries in the TLBs. A miss in the first level TLBs results in a very small penalty that can usually be hidden by the OOO execution and compiler's scheduling. A miss in the shared TLB results in the Page Walker being invoked and this penalty can be noticeable in the execution.

Now here you will lose a lot more. To read a cache line the processor may take about 200 clock cycles if it reads memory contiguously and the memory latency is hidden.

So, it is 207 cycles against 1 cycle to use physical addresses directly without any paging.

For example, to open the web page of this thread the server probably incurred tens of page walks. There are memory copies from disk to memory, then from the user app (web server) to the kernel; inside the kernel the data is copied to the device buffer, and from the buffer it goes to the NIC and then to your PC over the network. Multiply this by the thousands of users who open this website. If someone calculated how many clock cycles are wasted supporting virtual memory that is no longer needed because memory is very cheap, he would probably shoot himself in the head.

nulik · Post by **nulik** » Fri Oct 21, 2011 8:00 pm

So, it is 207 cycles against 1 cycle to use physical addresses directly without any paging.

Ahh! And I forgot that a page walk may evict your variables from the cache, so it is probably much worse: another 40 cycles may be wasted bringing your lost variables back into the L1 cache from the L2 cache (if they are still there).
Note that I am not even giving you the worst-case scenario, and we are already talking about 247 cycles here against only 1 cycle.
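The 247-versus-1 comparison is a per-miss figure, so what it means for a whole workload depends on how often that miss actually happens. A rough sketch of the arithmetic in Python follows; the cycle counts are the ones quoted in this post, while the miss rates are hypothetical values picked only to show how the average moves, not measurements.

```python
def average_access_cycles(miss_rate, hit_cycles=1, miss_cycles=247):
    """Expected cycles per memory access for a given miss rate.

    hit_cycles and miss_cycles default to the figures quoted in the post
    (1 cycle in the good case, 247 cycles in the bad case described).
    miss_rate is the fraction of accesses that pay the full miss cost;
    it is an assumption here, not a measured value.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return (1.0 - miss_rate) * hit_cycles + miss_rate * miss_cycles


# Hypothetical miss rates, chosen only for illustration.
for rate in (0.0, 0.001, 0.01, 0.1, 1.0):
    print(f"miss rate {rate:>5}: {average_access_cycles(rate):7.2f} cycles per access")
```

Which miss rate applies to a given system can only be settled by measuring the real workload, for instance with the CPU's performance counters.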

Rusky · Post by **Rusky** » Fri Oct 21, 2011 8:31 pm

That's not the only way to measure performance.

You have to take into account the performance increases due to paging, the security improvements due to paging, and the performance losses due to lack of paging.

Microbenchmarks out of context are useless.

gerryg400 · Post by **gerryg400** » Fri Oct 21, 2011 8:41 pm

Did you read the entire paragraph?

Intel's 64-ia-32 optimization manual actually wrote: A DTLB0 miss and STLB hit causes a penalty of 7 cycles. Software only pays this penalty if the DTLB0 is used in some dispatch cases. The delays associated with a miss to the STLB and PMH are largely non-blocking.

But that is beside the point.

For example, to open the web page of this thread the server probably incurred tens of page walks. There are memory copies from disk to memory, then from the user app (web server) to the kernel; inside the kernel the data is copied to the device buffer, and from the buffer it goes to the NIC and then to your PC over the network. Multiply this by the thousands of users who open this website. If someone calculated how many clock cycles are wasted supporting virtual memory that is no longer needed because memory is very cheap, he would probably shoot himself in the head.

That's an argument in favour of using paging. Pageable memory allows the OS to reduce the number of copies and speeds up disk access because DMA'ed data can be mapped to exactly where it's needed by the OS or the application. All that copying that you describe would only need to happen if paging weren't supported.

You are describing exactly what must happen on a processor that doesn't support paging. A well written full-size OS on a processor that supports paging needn't do any of those copies. And it will be faster because it uses paging.

A small, embedded or less-than-fully-featured OS may get better performance by disabling paging but a desktop or server OS must have paging.

OSDev.org
