how wide AMD64's vir and phy address are?

lemonyii · Post by **lemonyii** » Thu Dec 01, 2011 9:57 am

in the topic "What does your OS look like?" http://forum.osdev.org/viewtopic.php?f= ... start=1095 (page 74), in a picture posted by Farok, i saw something i can't understand.
at the 4th line of the picture's text, the entry point value is 0xFFFFFF0000006820, which i think is a 64-bit virtual address sign-extended from a 48-bit virtual address.
i remember testing this a long time ago (long enough that i'm not going to repeat it), and i concluded that AMD64 won't work if the virtual address is larger than 0x7FFFFFFFFFFF, the largest 47-bit (NOT 48-bit) value. i believed this was confirmed by the AMD64 Architecture Programmer's Manual Volume 2: System Programming.
i checked the document again after i ran into this problem, and this is what i found:

5.1 Page Translation Overview

The AMD64 architecture enhances this support to allow translation of 64-bit virtual addresses into 52-bit physical addresses, although processor implementations can support smaller virtual-address and physical-address spaces.

The AMD64 architecture enhances the legacy translation support by allowing virtual addresses of up to 64 bits long to be translated into physical addresses of up to 52 bits long.

Currently, the AMD64 architecture defines a mechanism for translating 48-bit virtual addresses to 52-bit physical addresses. The mechanism used to translate a full 64-bit virtual address is reserved and will be described in a future AMD64 architectural specification.

i think this means that only 47 bits of virtual address are available, and that addresses larger than 0x7FFFFFFFFFFF must be sign-extended to 64 bits in "canonical address form". with canonical address form we CANNOT access any address beyond 47 bits (128TB), because the last 128TB of the 64-bit space is mapped to the first 128TB, and the addresses between them cannot be referenced because of the sign extension that canonical address form requires. so the address form used in Farok's sample has no benefit, because it doesn't actually move the system space to a higher part of the 64-bit space. am i right?
i still have no idea how to access anything beyond 128TB. i didn't write the MSR to enable the NX bit.
thank you!

Combuster · Post by **Combuster** » Thu Dec 01, 2011 11:27 am

because the last 128TB of the 64-bit space is mapped to the first 128TB

Wrong.

The only thing sign extension does is copy the top bit provided into the remaining bits. If you sign-extend a 48-bit number to 64 bits, the bottom 48 bits stay exactly the same, and these are the bits used for the levels of page tables: 12 offset bits + 4 * 9 index bits = 48 bits. The top-level page table is not magically halved because addresses stop at 2^47; that limit has a completely different reason.
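To make the 12 + 4 * 9 split concrete, here is a minimal C sketch, assuming 4-level paging with 4 KiB pages. The struct and function names (`va_parts`, `split_va`) are made up for illustration and do not come from any manual or from the posts above.

```c
#include <stdint.h>

/* Hypothetical helper: fields of a virtual address under 4-level
 * paging with 4 KiB pages. Only bits 47..0 take part in translation. */
struct va_parts {
    unsigned pml4;    /* bits 47..39: index into the PML4            */
    unsigned pdpt;    /* bits 38..30: index into the PDPT            */
    unsigned pd;      /* bits 29..21: index into the page directory  */
    unsigned pt;      /* bits 20..12: index into the page table      */
    unsigned offset;  /* bits 11..0:  byte offset inside the page    */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* 9 bits */
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);  /* 9 bits */
    p.pd     = (unsigned)((va >> 21) & 0x1FF);  /* 9 bits */
    p.pt     = (unsigned)((va >> 12) & 0x1FF);  /* 9 bits */
    p.offset = (unsigned)(va & 0xFFF);          /* 12 bits */
    return p;
}
```

Calling `split_va` on 0xFFFFFF0000006820 and on 0x0000FF0000006820 gives the same five fields, because the two values differ only in bits 63..48, which the split never looks at.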

The reason is that previously, 8-bit register writes only overwrote part of a larger register, and 16-bit writes did the same. If you are using chars and ints at the same time, registers get reused and the processor has to keep track of the previous int value long after the register has been used as a character. This dependency causes extra operations when the register use is widened again and the processor has to internally combine the values used.

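Since small numbers are more memory efficient, 64-bit mode fixed the register stall by always overwriting the entire 64 bits when a 32-bit operation is performed (the upper half of the register is zeroed). The 32-bit displacements and immediates encoded in instructions are a different matter: since negative numbers are far more likely than very high numbers, sign extension was used as the method of choice for those.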

That has the effect that using a 32-bit offset will yield addresses in 0x0000000000000000-0x000000007fffffff and in 0xffffffff80000000-0xffffffffffffffff. For optimum usability, both sets need to be valid, so 0xffffffffffffffff needs to point somewhere unique. To prevent people from doing stupid things, there should not be any two addresses that always point to the same memory. Therefore they closed off the range in between by requiring the top 17 bits (bits 63 through 47) to be identical: this is the canonical address rule. Behind that, just the bottom 48 bits are used for the actual translation.
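The two ranges reachable through a sign-extended 32-bit offset can be reproduced with a few lines of C. This is only a sketch; `sext32` is a hypothetical helper name, and it models the arithmetic, not any particular instruction.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend a 32-bit offset to 64 bits by
 * copying bit 31 into bits 63..32. Written with explicit masks so it
 * does not rely on implementation-defined signed conversions. */
static uint64_t sext32(uint32_t disp)
{
    uint64_t wide = disp;
    if (disp & 0x80000000u)
        wide |= 0xFFFFFFFF00000000ULL;
    return wide;
}
```

Offsets 0x00000000 through 0x7FFFFFFF stay in the bottom 2 GiB, while 0x80000000 through 0xFFFFFFFF land in the top 2 GiB of the 64-bit space, for example `sext32(0x80000000)` is 0xFFFFFFFF80000000.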

Qeroq · Post by **Qeroq** » Thu Dec 01, 2011 2:33 pm

Great insight, didn't know this before, Combuster, partly because I just assumed the 48-bit limit was only implemented to reduce costs.

I always imagine the AMD64 address space as two 128TB halves that are mapped from 0 to +128TB and from -128TB to 0 (that's the sign extension). Every address in -128TB to +128TB is valid and unique; all other addresses violate the canonical form, and bad things will happen if you try to access them (IIRC #GP).

lemonyii · Post by **lemonyii** » Thu Dec 01, 2011 9:14 pm

i don't quite understand Combuster's idea. i've been talking about the situation in LONG MODE.
1st, do you agree that the physical address limit is 0x7fffffffffff (47 bits), and that anything beyond it is inaccessible?
2nd, where are virtual addresses beyond 47 bits mapped to, and do they cause #GP or #PF?
3rd, do i have to care about addresses beyond 47 bits, since the present implementation supports only 47 bits?

Brendan · Post by **Brendan** » Thu Dec 01, 2011 10:22 pm

Hi,

I know it's AMD's fault, but I personally find the idea of "signed addresses" to be unnecessary and confusing, so...

lemonyii wrote: i don't quite understand Combuster's idea. i've been talking about the situation in LONG MODE.
1st, do you agree that the physical address limit is 0x7fffffffffff (47 bits), and that anything beyond it is inaccessible?

No.

First, physical addresses have nothing to do with paging and are not limited to 48-bit. The physical address size limit is different for different CPUs and can only be determined by CPUID (including a work-around for errata in some Pentium 4 chips that mis-report it), and can vary from 32-bit up to 56-bit (although I don't think any CPU has been made that handles more than 52-bit physical addresses yet).
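As a rough illustration of the CPUID query: function 0x80000008 returns the physical address width in EAX bits 7..0 and the linear (virtual) address width in EAX bits 15..8. The sketch below only decodes an EAX value that has already been read. The names `addr_widths` and `decode_addr_widths` are hypothetical, how EAX is obtained is left to the caller, and the erratum work-around mentioned above is not covered.

```c
#include <stdint.h>

/* Hypothetical container for the two widths reported by CPUID. */
struct addr_widths {
    unsigned phys_bits;  /* physical address width in bits          */
    unsigned virt_bits;  /* linear (virtual) address width in bits  */
};

/* Decode EAX as returned by CPUID function 0x80000008.
 * Assumption: the caller has already checked that this extended
 * function is available and has executed CPUID itself. */
static struct addr_widths decode_addr_widths(uint32_t eax)
{
    struct addr_widths w;
    w.phys_bits = eax & 0xFF;         /* EAX[7:0]  */
    w.virt_bits = (eax >> 8) & 0xFF;  /* EAX[15:8] */
    return w;
}
```

For instance, an EAX value of 0x3028 decodes to 40 physical address bits and 48 virtual address bits.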

For virtual addresses in 64-bit code, valid (canonical) addresses are split into 2 ranges:

From 0x0000000000000000 to 0x00007FFFFFFFFFFF, and
From 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF

These are entirely independent (not copies of each other). Typically you'd use one area for "kernel space" and another for "user space".

The virtual addresses are 48 bits wide, and the 48th bit (bit 47) is copied to all higher bits. For example, the 48-bit address 0x800000000000 would be extended to the 64-bit address 0xFFFF800000000000. Of course software never uses 48-bit addresses, so this way of looking at it is awkward at best (even without the confusion of "negative addresses").
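The extension step can be written out as a small C sketch, assuming the current 48-bit implementation; `sign_extend48` is a hypothetical helper name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: turn a 48-bit address into its canonical
 * 64-bit form by copying bit 47 into bits 63..48. */
static uint64_t sign_extend48(uint64_t addr48)
{
    addr48 &= 0x0000FFFFFFFFFFFFULL;      /* keep only bits 47..0 */
    if (addr48 & (1ULL << 47))            /* bit 47 set?          */
        addr48 |= 0xFFFF000000000000ULL;  /* fill bits 63..48     */
    return addr48;
}
```

With this, 0x800000000000 becomes 0xFFFF800000000000, while 0x7FFFFFFFFFFF is returned unchanged because bit 47 is clear.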

In practice, a much better way of looking at it is to call it a 64-bit address space with a massive hole in the middle, like this:

From 0x0000000000000000 to 0x00007FFFFFFFFFFF = usable
From 0x0000800000000000 to 0xFFFF7FFFFFFFFFFF = not usable (massive hole)
From 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF = usable
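The three ranges above translate directly into a classification routine. The following is a sketch under the assumption of 48-bit virtual addresses; the enum and function names are invented for this example.

```c
#include <stdint.h>

/* Hypothetical labels for the three regions of the 64-bit space. */
enum va_region {
    VA_LOW_HALF,   /* 0x0000000000000000 .. 0x00007FFFFFFFFFFF */
    VA_HOLE,       /* 0x0000800000000000 .. 0xFFFF7FFFFFFFFFFF */
    VA_HIGH_HALF   /* 0xFFFF800000000000 .. 0xFFFFFFFFFFFFFFFF */
};

/* Classify a 64-bit virtual address, assuming 48 implemented bits. */
static enum va_region classify_va(uint64_t va)
{
    if (va <= 0x00007FFFFFFFFFFFULL)
        return VA_LOW_HALF;
    if (va >= 0xFFFF800000000000ULL)
        return VA_HIGH_HALF;
    return VA_HOLE;  /* non-canonical */
}
```

An address is canonical exactly when `classify_va` does not return `VA_HOLE`. The entry point from the original question, 0xFFFFFF0000006820, falls in the high half.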

lemonyii wrote: 2nd, where are virtual addresses beyond 47 bits mapped to, and do they cause #GP or #PF?

Typically, using a "non-canonical" address (or, trying to access things in that massive hole) causes a general protection fault. Any other access (not in the massive hole) depends on paging (e.g. whether the page is present, writable, executable, etc.), and you may or may not get a page fault.

lemonyii wrote: 3rd, do i have to care about addresses beyond 47 bits, since the present implementation supports only 47 bits?

The present implementation supports 48 bit virtual addresses (think of the 48th bit as a flag that determines if you want the lower 47-bit area or the higher 47-bit area if you like). You can choose to leave the entire upper 47-bit area as "not present" if you want to (and get "page not present" page faults if anyone tries to access it). Most people would use it for "kernel space" though, so normal processes have the entire lower 47-bit area to use.

Cheers,

Brendan

lemonyii · Post by **lemonyii** » Thu Dec 01, 2011 11:25 pm

oh, i did mean virtual but wrote physical address in the 1st question, and i think i've got the idea now.
it means that virtual addresses within 0-0x7fffffffffff use the first 256 entries of the PML4, and those within 0xffff800000000000-0xffffffffffffffff use the other 256 entries.
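The 256/256 split of the PML4 can be checked by going the other way, from an entry index to the first canonical address it covers. This is a sketch assuming 4-level paging, where each PML4 entry spans 2^39 bytes (512 GiB); `pml4_entry_base` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: first canonical virtual address covered by
 * PML4 entry `index` (0..511). Entries 256..511 have bit 47 set,
 * so bits 63..48 must be filled with ones. */
static uint64_t pml4_entry_base(unsigned index)
{
    uint64_t base = (uint64_t)(index & 0x1FF) << 39;
    if (index & 0x100)                   /* index >= 256: bit 47 set */
        base |= 0xFFFF000000000000ULL;
    return base;
}
```

Entry 255 starts at 0x00007F8000000000, the last 512 GiB of the lower half, and entry 256 starts at 0xFFFF800000000000, the first address of the upper half.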
the mistake i made in my test was that i didn't notice the word "canonical" and the sign extension, which made me think i could use 0-0xffffffffffff. i'm going to treat addresses beyond 47 bits as invalid because of the NX bit (which means i cannot use 0xffff800000000000-0xffffffffffffffff as kernel space if i'm going to use NX), for easier memory management, and for compatibility with my previous work.
thank you all.
and i'm wondering where Brendan's keyboard is, because i think it is still night in Europe and the US when he replies. sorry if that counts as private info.

Brendan · Post by **Brendan** » Thu Dec 01, 2011 11:33 pm

Hi,

lemonyii wrote: and i'm wondering where Brendan's keyboard is, because i think it is still night in Europe and the US when he replies. sorry if that counts as private info.

My keyboard is in Australia. I'm also in Australia, but I tend to drift into other time zones without travelling anywhere.

Cheers,

Brendan

lemonyii · Post by **lemonyii** » Fri Dec 02, 2011 5:27 am

Australia = +10, got it.
