Re: General protection in 64bit mode (Intel)
Posted: Fri Oct 21, 2011 7:01 am
Combuster wrote:
Basically, your assumption only holds for simple embedded devices.

It is not my assumption, it is my goal. It is not an OS for desktops.
Casm wrote:
bluemoon wrote:
It should be at least a little while before anybody needs more than the 64GB they can get with physical address extension in 32-bit mode.

32-bit is not a solution for me either; I would lose the additional registers, including the extra SSE registers, and I need those.
Quote:
32-bit is not a solution for me either.

Are we allowed to know what solution suits you? Have you considered the advantages of paging?
Chandra wrote:
Are we allowed to know what solution suits you? Have you considered the advantages of paging? Besides, what trouble are you having setting up paging?

No troubles yet. I just wanted to do segment limit checking in 64-bit mode (because it is faster), but as I can see, it is not possible.
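For reference, the reason it is not possible: in 64-bit (long) mode the processor treats segment bases as zero and performs no limit checks for CS, DS, ES and SS, so the limit field in the descriptor is simply ignored. A minimal sketch of the two descriptor layouts, assuming GCC-style packed structs (the field names are my own, not from any particular OS):

Code:
#include <stdint.h>

/* One 8-byte GDT descriptor, packed the way the CPU reads it. */
struct gdt_entry {
    uint16_t limit_low;   /* limit bits 0..15 */
    uint16_t base_low;    /* base bits 0..15 */
    uint8_t  base_mid;    /* base bits 16..23 */
    uint8_t  access;      /* type, DPL, present */
    uint8_t  gran_limit;  /* limit bits 16..19 plus flags G, D/B, L */
    uint8_t  base_high;   /* base bits 24..31 */
} __attribute__((packed));

/* 32-bit code segment, 4 KiB granularity, limit 0xFFFFF:
   every fetch beyond the limit raises #GP. */
static const struct gdt_entry code32 = {
    .limit_low = 0xFFFF, .access = 0x9A, .gran_limit = 0xCF  /* G=1, D=1 */
};

/* 64-bit code segment: the L flag (0x20 here) selects long mode,
   and the processor then ignores the base and limit fields entirely. */
static const struct gdt_entry code64 = {
    .limit_low = 0xFFFF, .access = 0x9A, .gran_limit = 0xAF  /* G=1, L=1 */
};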
Solar wrote:
@nulik: Dear Sir, if you look into the mirror, do you - by any chance - look somewhat like rdos, perhaps with a sock pulled over his hand? I just ask because you are the second person insisting on using segmentation instead of paging for protection, and using quite similar arguments (pointing to years-old papers).

But I've used paging for 23 years. I use both paging and segmentation.
Solar wrote:
Segmentation "being faster"? Assumptions and presumptions prove nothing. Blaming the CPU designers for taking away your toys solves nothing. It is as it is; cope, or design your own CPU, and toolchain, and language, capable of coping with segmentation.

Segmentation is primarily a debugging tool, and it avoids putting device drivers in their own address spaces. As soon as the debugging phase is over, much of the segment-based protection can be disabled. I think I previously posted code on just how to do this for ACPI.
Solar wrote:
Oh wait, you want to code your OS in ASM only, too, do you?

I will most likely write some major pieces of C code in the kernel in the near future. Not that I will ever write the scheduler or memory manager in C, but some fairly complex device drivers would quite likely be in C or C++.
Solar wrote:
Uh... I understood your memory model to be non-flat?

The kernel device-driver model uses one code selector and one data selector (DGROUP) per device driver for protection. All pointers to data are 48-bit. Today all applications are flat (using a slightly modified PE format), but segmented applications are still supported. Paging is used for virtual/physical memory allocation, and for running applications in separate address spaces.
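For readers unfamiliar with the model: a 48-bit pointer is a 16-bit selector plus a 32-bit offset, so every dereference is limit-checked by the CPU against that driver's own descriptor. A rough sketch of the in-memory layout (illustrative only; compilers such as Open Watcom generate these through their far-pointer support):

Code:
#include <stdint.h>

/* A 48-bit far pointer as stored in memory: the 32-bit offset comes
   first, then the 16-bit selector, matching the m16:32 operand that
   LFS/LGS/LES expect. Dereferencing through it makes the CPU check
   the offset against the selector's descriptor limit and raise #GP
   on overflow. */
struct far_ptr {
    uint32_t offset;    /* offset within the driver's segment */
    uint16_t selector;  /* selector of the driver's data segment */
} __attribute__((packed));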
rdos wrote:
Even if it were 30 years old, it would still show the obvious: that a system without paging is faster than a system with paging. I don't understand how anybody can argue otherwise.

I can argue it like this:
gerryg400 wrote:
I can argue it like this:

The key here is the word 'system'. Faster can be measured in lots of ways. The perceived speed of modern operating systems benefits from the OS features that can be built on paging, like mmap-ing files, demand-loading executable images, copy-on-write, forking, passing data by paging, and remapping DMA buffers. The speed-up provided by these features outweighs the occasional slow-downs caused by having to do page translations.

It is not occasional; it should be happening a lot, given the size of the TLB.
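As a concrete illustration of the first two items on gerryg400's list, here is a hedged POSIX sketch: mmap installs page-table entries instead of copying file data, and pages are demand-loaded only when first touched (error handling trimmed; /etc/hosts is just a stand-in file):

Code:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hosts", O_RDONLY);  /* any readable file */
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;

    /* No data is copied here: the kernel only installs mappings.
       Pages are faulted in from the page cache on first access. */
    const char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    fwrite(p, 1, st.st_size, stdout);       /* first touch = demand load */

    munmap((void *)p, st.st_size);
    close(fd);
    return 0;
}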
Quote:
DTLB can perform three linear-to-physical address translations every cycle, two for load addresses and one for a store address. If the address is missing in the DTLB, the processor looks for it in the STLB, which holds data and instruction address translations. The penalty of a DTLB miss that hits the STLB is seven cycles.

Here you lost 7 CPU cycles.
Quote:
The next largest set of memory access delays are associated with the TLBs, when linear-to-physical address translation is mapped with a finite number of entries in the TLBs. A miss in the first-level TLBs results in a very small penalty that can usually be hidden by OOO execution and the compiler's scheduling. A miss in the shared TLB results in the Page Walker being invoked, and this penalty can be noticeable in the execution.

Now here you will lose a lot more. Reading a cache line from memory may take the processor about 200 clock cycles, even when memory is read contiguously and the latency can be hidden. Ah! And I forgot that doing a page walk may evict your variables from the cache, so it is probably much worse, since another 40 cycles may be wasted bringing your lost variables back into the L1 cache from the L2 cache (if they are still there).
So, it is 207 cycles (7 + 200) against 1 cycle when using physical addresses directly, without any paging.
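Numbers like these are easy to measure rather than assert. A rough microbenchmark sketch, assuming x86-64 and GCC/Clang (__rdtsc from x86intrin.h): one pass hits the same cache line every time (the TLB always hits), the other touches one byte per 4 KiB page across a region far larger than the DTLB, so translations miss and page walks dominate:

Code:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>              /* __rdtsc() */

#define PAGE  4096
#define PAGES (64 * 1024)           /* 256 MiB: far beyond any DTLB */

/* Average cycles per access for n accesses at the given stride. */
static uint64_t cycles_per_access(volatile char *buf, size_t stride, size_t n)
{
    uint64_t t0 = __rdtsc();
    for (size_t i = 0; i < n; i++)
        (void)buf[(i * stride) % ((size_t)PAGES * PAGE)];
    return (__rdtsc() - t0) / n;
}

int main(void)
{
    volatile char *buf = malloc((size_t)PAGES * PAGE);
    if (!buf)
        return 1;
    for (size_t i = 0; i < (size_t)PAGES * PAGE; i += PAGE)
        buf[i] = 1;                 /* pre-fault: measure TLB misses, not page faults */

    size_t n = 1u << 24;            /* 16M accesses per pass */
    printf("same line, TLB hits  : ~%llu cycles/access\n",
           (unsigned long long)cycles_per_access(buf, 0, n));
    printf("new page, TLB misses : ~%llu cycles/access\n",
           (unsigned long long)cycles_per_access(buf, PAGE, n));
    return 0;
}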
Intel's 64-ia-32 optimization manual actually wrote:
A DTLB0 miss and STLB hit causes a penalty of 7 cycles. Software only pays this penalty if the DTLB0 is used in some dispatch cases. The delays associated with a miss to the STLB and PMH are largely non-blocking.

But that is beside the point.
Quote:
For example, to open the web page of this thread, the server probably incurred tens of page walks. There are memory copies from disk to memory, then from the user app (the web server) to the kernel; from the kernel it is copied into the device buffer, and from the buffer it goes to the NIC and then to your PC over the network. Multiply this by the thousands of users who open this website. If someone calculated how many clock cycles are wasted supporting virtual memory, which is no longer needed because memory is very cheap, he could probably shoot himself in the head.

That's an argument in favour of using paging. Pageable memory allows the OS to reduce the number of copies, and it speeds up disk access because DMA'ed data can be mapped to exactly where it's needed by the OS or the application. All the copying that you describe would only need to happen if paging weren't supported.
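To put a concrete face on "reduce the number of copies": on Linux, for instance, a web server can serve that very page with sendfile(2), so the file bytes go from the page cache to the NIC without ever being copied into the process at all. A hedged sketch (serve_file and its parameters are hypothetical):

Code:
#include <fcntl.h>
#include <sys/sendfile.h>
#include <unistd.h>

/* Hypothetical helper for a web server: push 'len' bytes of 'path'
   down an already-connected socket with no user-space copy. */
static ssize_t serve_file(int sock, const char *path, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The kernel hands page-cache pages straight to the socket (and
       ultimately the NIC's DMA engine); the bytes never enter this
       process's buffers. */
    ssize_t sent = sendfile(sock, fd, NULL, len);

    close(fd);
    return sent;
}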