Long Mode

Posted: Mon Mar 02, 2009 2:19 am
by johnsa
Hey all,

Was wondering if anyone working on a 64bit kernel had tried to see if there were any "backdoors" (as is usual with Intel/AMD x86) to get into Long Mode without then using paging?
TBH I can't see a good reason why paging is the only option available in 64bit as opposed to offering flat as well (as it was with 32bit pmode).
Obviously almost all OSes will use paging, but surely there could be some good reasons NOT to use it?

John

Re: Long Mode

Posted: Mon Mar 02, 2009 5:34 am
by jal
johnsa wrote:I can't see a good reason why paging is the only option available in 64bit as opposed to offering flat as well (as it was with 32bit pmode).
No doubt it simplifies processor design, and allows certain optimizations.


JAL

Re: Long Mode

Posted: Mon Mar 02, 2009 5:38 am
by johnsa
Speaking of which.. has anyone done any performance tests on flat vs paged memory to see how much difference is made by having the CPU have to look everything up in the page tables?
Perhaps for some different operations.. like copying/filling memory for a block which is sub-page-size (like 1kb) and doing large ops like 1meg copy from array 1 to array 2.. etc

I remember in the 486 days paging was much slower, around 15-20%.. curious if this is still the case or if the CPUs have evolved and optimized away the performance issues of paging in 32bit pmode.

Re: Long Mode

Posted: Mon Mar 02, 2009 6:34 am
by stlw
johnsa wrote:Speaking of which.. has anyone done any performance tests on flat vs paged memory to see how much difference is made by having the CPU have to look everything up in the page tables?
Perhaps for some different operations.. like copying/filling memory for a block which is sub-page-size (like 1kb) and doing large ops like 1meg copy from array 1 to array 2.. etc

I remember in the 486 days paging was much slower, around 15-20%.. curious if this is still the case or if the CPUs have evolved and optimized away the performance issues of paging in 32bit pmode.
Don't forget about the TLB and the other paging-structure caches that exist in all modern CPUs. It is a simple arithmetic task to show how often you are actually going to walk your page tables and how often you hit in the TLBs. The largest load you can do on x86 is 16 bytes long. The page size is 4K, and I am not even talking about 2MB or 1GB pages. So how many page walks will you do for one page copy?
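The arithmetic stlw is hinting at can be sketched roughly as follows. This is only a back-of-the-envelope model: it assumes a purely sequential copy using 16-byte loads, and that a translation stays in the TLB once it has been loaded, so at most one page walk is needed per page touched. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of loads of `load_size` bytes that fall inside one page of
 * `page_size` bytes. Under the stated assumptions, at most one of these
 * loads needs a page walk; the rest hit in the TLB. */
uint64_t loads_per_page(uint64_t page_size, uint64_t load_size)
{
    assert(load_size != 0);
    return page_size / load_size;
}
```

For a 4K page and 16-byte loads this gives 256 loads per page, so under this model no more than about one load in 256 could trigger a walk; with 2MB or 1GB pages the ratio grows accordingly.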

When you copy large blocks of memory (say 1MB, although 1MB is not large enough because it fits completely in L2/L3) you will be completely limited by cache/memory bandwidth. The Core 2 Duo L2 can't handle more than 16 bytes per cycle, with a latency of 18 clocks. The buffers inside the core are not large enough to handle it without stalling the whole pipeline. Barcelona is even worse.

It looks like you are too lazy to understand it, prefer a no-paging mode instead, and are looking for an excuse :)

Stanislav

Re: Long Mode

Posted: Mon Mar 02, 2009 10:16 am
by johnsa
I do understand it and have implemented it before. But I don't necessarily see a lot of benefit to paging except when building an OS for end users, where you want user processes separated and the ability to fault on page access etc. I.e. it's all about it being a security mechanism rather than a critical feature. Yes, for PAE it's essential if you want more than 4GB, but that is no longer an issue as it was when PAE originated pre-64bit.. if you want more than 4GB.. go 64, imho. Mapping each process into a virtual 0-4GB space isn't a necessity for me, nor is protecting memory from being overwritten. If I were building an end-user OS it would be, and I'd agree with paging 100%.

I'm just curious if anyone had at least tried to see if there was a back door mode in long mode (ala unreal mode etc) to switch off paging.

Re: Long Mode

Posted: Mon Mar 02, 2009 11:31 am
by JohnnyTheDon
it's all about it being a security mechanism rather than a critical feature.
That's not the only reason. Paging allows you to run multiple programs that expect to be in the same address space without moving around a ton of memory. It also allows for disk mapping and virtual memory. Don't dismiss it because it's inconvenient sometimes :)

Re: Long Mode

Posted: Mon Mar 02, 2009 11:40 am
by nekros
I don't really see its inconvenience (then again, I'm writing a general-purpose OS). If you're going to do multitasking, it's actually convenient to use paging. :wink:

Re: Long Mode

Posted: Mon Mar 02, 2009 1:05 pm
by johnsa
I personally think virtual memory is a lousy idea.. (imho) ... at least when implemented à la Windows. Machines of other architectures managed multitasking and dealt with relocations with no trouble and without a paging mechanism.. Amiga/68k, e.g.?

In any event I guess we could debate pros and cons for hours.. My only two real questions were, first, whether anyone had measured the performance difference (if any), i.e. use rdtsc, switch to non-paged pmode, do some memory copies/writes of various sizes, then do the same in paged mode and see if there is any difference at all; and secondly, whether anyone working on a 64bit kernel had tried looking for backdoors to get long mode up without paging.. I wouldn't put it past AMD/Intel to have a way to do it that just isn't documented.
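The measurement loop johnsa describes might look roughly like the sketch below. It is a user-space harness only: it cannot switch paging on or off, so a real flat-vs-paged comparison would need the same loop run from a 32-bit kernel, once with CR0.PG clear and once with it set. The names `ticks` and `time_copy` are hypothetical, `malloc` stands in for whatever allocator a kernel would provide, and the `__rdtsc()` intrinsic assumes GCC or Clang on x86 (other targets fall back to `clock()`).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time-stamp counter. */
static uint64_t ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 builds: processor time from the C library. */
static uint64_t ticks(void) { return (uint64_t)clock(); }
#endif

/* Copy `size` bytes `reps` times and return the smallest tick count seen.
 * Returns UINT64_MAX if allocation fails or the copy does not verify. */
uint64_t time_copy(size_t size, int reps)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    uint64_t best = UINT64_MAX;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return UINT64_MAX;
    }
    memset(src, 0xA5, size);
    memset(dst, 0, size);   /* touch dst so first-use page faults are not timed */

    for (int i = 0; i < reps; i++) {
        uint64_t start = ticks();
        memcpy(dst, src, size);
        uint64_t elapsed = ticks() - start;
        if (elapsed < best)
            best = elapsed;
    }

    /* Verify the copy so the result depends on the memcpy having happened. */
    if (memcmp(dst, src, size) != 0)
        best = UINT64_MAX;

    free(src);
    free(dst);
    return best;
}
```

Calling `time_copy(1024, 100)` and `time_copy(1 << 20, 100)` would cover the sub-page and 1MB cases mentioned earlier in the thread. Taking the minimum over several repetitions is a design choice to reduce noise from interrupts and cold caches.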

Re: Long Mode

Posted: Mon Mar 02, 2009 1:19 pm
by 01000101
I've developed a few 64-bit kernels.

The only real speed difference I've seen is from optimized memcpy's using 64-bit registers instead of 32-bit ones, but even then I usually use SSE for those memory functions anyway if possible, so it's hard to judge.

Getting an accurate comparison between paging vs. non-paging kernels would be near-impossible. Also, comparing 64-bit to 32-bit is difficult as well, as I'd imagine it would run much differently in emulators than on real hardware, since they usually associate speed with code size.

I know of no tricks to get into Long Mode without enabling paging as the core concept of Long Mode relies on the PML4 giving the addressable area an extension.

btw, I just thought of something that *may* be an arguable speed increase on 64-bit systems: the use of scratch registers for MEMORY/INTEGER function arguments instead of the stack.

Re: Long Mode

Posted: Mon Mar 02, 2009 2:40 pm
by JohnnyTheDon
johnsa wrote:I personally think virtual memory is a lousy idea..
Uh... why?

It lets a program map in any file it needs, including its code, and share it with other programs that are using the same data. What's not to love?

Re: Long Mode

Posted: Mon Mar 02, 2009 5:57 pm
by nekros
It might put a slight damper on performance, but it makes memory management much easier. =D>

Re: Long Mode

Posted: Mon Mar 02, 2009 6:34 pm
by Brendan
Hi,
stlw wrote:When you copy large blocks of memory (say 1MB, although 1MB is not large enough because it fits completely in L2/L3) you will be completely limited by cache/memory bandwidth.
For long mode paging, you can copy or move PML4 entries (or PDP entries, or page directory entries, or page table entries) instead of copying/moving individual bytes. For "best case" this works out to about 30 billion times faster than memory bandwidth.
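To put rough numbers on this: with 4-level long mode paging each table holds 512 eight-byte entries, so the span covered by one entry grows by a factor of 512 per level. The sketch below only counts bytes; it is not a timing measurement, and the function name and level numbering are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of virtual address space covered by a single entry at each level
 * of 4-level paging. Level numbering is this sketch's own convention:
 * 1 = page table entry, 2 = page directory entry,
 * 3 = PDP entry, 4 = PML4 entry. */
uint64_t bytes_per_entry(int level)
{
    assert(level >= 1 && level <= 4);
    return 1ull << (12 + 9 * (level - 1));
}
```

By this count one PML4 entry covers 512 GiB, so moving a single 8-byte entry stands in for moving 2^36 times as many bytes of data, which is the same order of magnitude as the "best case" figure quoted above. The count ignores the cost of invalidating stale TLB entries, and the trick only applies when the source and destination ranges are aligned to the size that the entry covers.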

If you think paging is slow, then you're not using it right... ;)


Cheers,

Brendan

Re: Long Mode

Posted: Tue Mar 03, 2009 1:47 am
by johnsa
Surely if you copy the PML or table entries you'd kind of be defeating the objective of having a copy.. as you'd land up with two different ranges of addresses mapped to the same physical data? You'd seldom copy something if you weren't going to do anything destructive to it.

Re: Long Mode

Posted: Tue Mar 03, 2009 2:31 am
by Brendan
Hi,
johnsa wrote:Surely if you copy the PML or table entries you'd kind of be defeating the objective of having a copy.. as you'd land up with two different ranges of addresses mapped to the same physical data? You'd seldom copy something if you weren't going to do anything destructive to it.
See "copy-on-write". Now consider how you'd implement "fork()" for potentially huge (64-bit) processes without an insane amount of overhead.
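For readers who have not met copy-on-write, here is a toy model of the idea in plain C. It is only a sketch under heavy assumptions: the "frames" are heap blocks rather than physical pages, `struct pte` is a made-up stand-in for a real page table entry, and `cow_write` plays the role that a page-fault handler would play in a kernel. None of these names come from any real OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Toy "physical frame": a block of data plus a reference count. */
struct frame {
    unsigned char data[PAGE_SIZE];
    int refs;
};

/* Toy page table entry: which frame is mapped, and whether writes are allowed. */
struct pte {
    struct frame *frame;
    bool writable;
};

/* What fork() would do per page: copy the entry, not the data.
 * Both mappings become read-only and share the same frame. */
void cow_share(struct pte *parent, struct pte *child)
{
    parent->writable = false;
    child->frame = parent->frame;
    child->writable = false;
    parent->frame->refs++;
}

/* What the write-fault path would do: copy the frame only if it is
 * still shared, then allow the write. Returns false if allocation fails. */
bool cow_write(struct pte *e, size_t offset, unsigned char value)
{
    assert(offset < PAGE_SIZE);
    if (!e->writable) {
        if (e->frame->refs > 1) {
            struct frame *copy = malloc(sizeof *copy);
            if (copy == NULL)
                return false;
            memcpy(copy->data, e->frame->data, PAGE_SIZE);
            copy->refs = 1;
            e->frame->refs--;   /* drop our share of the old frame */
            e->frame = copy;
        }
        e->writable = true;     /* sole owner now, writes may proceed */
    }
    e->frame->data[offset] = value;
    return true;
}
```

The point for fork() is that `cow_share` costs the same regardless of how much data the page holds, and the actual copy is deferred until one side writes, and skipped entirely for pages nobody modifies.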


Cheers,

Brendan

Re: Long Mode

Posted: Wed Mar 04, 2009 2:16 pm
by Sly
I heard that you can get to long mode by changing a bit in an MSR.
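The bit in question is presumably EFER.LME (Long Mode Enable). Setting it on its own only enables long mode; the CPU reports long mode as active (EFER.LMA) once paging is also switched on with PAE set, which fits 01000101's earlier point that no paging-free route is known. The sketch below models just that condition with plain integers. The bit positions are the architectural ones; the function name is made up, and nothing here touches real control registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_EFER  0xC0000080u   /* Extended Feature Enable Register */
#define EFER_LME  (1ull << 8)   /* Long Mode Enable (set by software) */
#define EFER_LMA  (1ull << 10)  /* Long Mode Active (set by the CPU) */
#define CR0_PG    (1ull << 31)  /* paging enable */
#define CR4_PAE   (1ull << 5)   /* Physical Address Extension */

/* Simplified model of when long mode is active: LME must be set,
 * and paging must be enabled with PAE. */
bool long_mode_active(uint64_t efer, uint64_t cr0, uint64_t cr4)
{
    return (efer & EFER_LME) && (cr0 & CR0_PG) && (cr4 & CR4_PAE);
}
```

In this model `long_mode_active(EFER_LME, 0, CR4_PAE)` is false: the MSR bit alone is not enough without CR0.PG.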