x86 virtual address space

Scoopta
Member
Posts: 26
Joined: Tue Jun 13, 2017 3:17 am
Libera.chat IRC: Scoopta

x86 virtual address space

Post by Scoopta »

I'm not sure whether this belongs here or on the general programming forum, but can someone explain the virtual address space column in the table on the x86 Wikipedia page? It has me thoroughly confused, and I'm not even sure it's accurate. It lists the 286 as having a 30-bit virtual address space and all IA-32 CPUs as having a 46-bit one, and I'm trying to understand what it's even talking about and whether it's right. I was always under the impression that linear space and virtual space were the same thing, that is, the amount of memory that can be accessed without switching segments or page tables. https://en.wikipedia.org/wiki/X86#Chronology

Re: x86 virtual address space

Post by Scoopta »

OK, so I think I figured it out: the virtual address space on the page appears to just be the linear space + 14 bits. In protected mode you can have 16383 segments, or in other words 14 bits' worth of segments. This also explains why they don't list it separately for 64-bit CPUs, which don't use segmentation when running in their native long mode.
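
The arithmetic behind the 30-bit and 46-bit figures can be sketched roughly as follows. This is only an illustration of the counting described above; the function and constant names are invented for the sketch and do not come from the Wikipedia article or any vendor manual.

```python
# Illustrative sketch; all names here are made up for this example.
SELECTOR_BITS = 16  # width of a protected-mode segment selector
RPL_BITS = 2        # low two bits hold the requested privilege level

# The remaining 14 bits are a 13-bit descriptor index plus a 1-bit
# table indicator (GDT or LDT). GDT entry 0 is the null descriptor,
# which is why the usable count is 16383 rather than 16384.
USABLE_SEGMENTS = 2 * 8192 - 1


def virtual_address_bits(offset_bits: int) -> int:
    """Width of a selector:offset pair, ignoring the two RPL bits."""
    return (SELECTOR_BITS - RPL_BITS) + offset_bits


print(virtual_address_bits(16))  # 286 protected mode, 16-bit offsets
print(virtual_address_bits(32))  # 386 and later, 32-bit offsets
print(USABLE_SEGMENTS)
```

With 16-bit offsets this gives 30, and with 32-bit offsets it gives 46, which matches the two numbers in the table.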
Octocontrabass
Member
Posts: 5446
Joined: Mon Mar 25, 2013 7:01 pm

Re: x86 virtual address space

Post by Octocontrabass »

Scoopta wrote: Mon Aug 05, 2024 8:09 pm the virtual address space on the page appears to just be the linear space + 14 bits
It's not the linear space, it's the offset within a segment. On the 286, the linear space is 24 bits, but offsets are only 16 bits.

Re: x86 virtual address space

Post by Scoopta »

Octocontrabass wrote: Mon Aug 05, 2024 8:51 pm It's not the linear space, it's the offset within a segment. On the 286, the linear space is 24 bits, but offsets are only 16 bits.
The physical space would be 24 bits. Linear space is the amount of memory that can be contiguously addressed within a segment. I'm specifically sticking to the terms in the article that is being discussed.
nullplan
Member
Posts: 1743
Joined: Wed Aug 30, 2017 8:24 am

Re: x86 virtual address space

Post by nullplan »

Octocontrabass wrote: Mon Aug 05, 2024 8:51 pm It's not the linear space, it's the offset within a segment. On the 286, the linear space is 24 bits, but offsets are only 16 bits.
But that makes no sense. The base is 24 bits, the offset is 16, but when you add a 24-bit number and a 16-bit number, you get at most a 25-bit number.

What they are writing makes no sense to me, either. I am familiar with the concepts of linear, virtual, and physical addresses, but virtual addresses on x86 were never bigger than 32 bits until x86-64 came around. PowerPC had a linear->virtual translation whose result was larger, yes, but not x86. On x86, the linear->virtual translation is just adding the segment base, which was a 24-bit number on the 286 and has been a 32-bit number ever since. Even adding two 32-bit numbers gives a 33-bit number at most, and the 386 has no way to translate a 33-bit virtual address to a physical one, so the result can only have been truncated to 32 bits again.
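
A minimal sketch of the base-plus-offset addition and the truncation described above. It assumes 386-style 32-bit bases and offsets and ignores segment limit checks entirely; the function name is hypothetical.

```python
MASK_32 = 0xFFFF_FFFF  # keep only bits 0..31


def add_segment_base_386(segment_base: int, offset: int) -> int:
    """Add a 32-bit segment base to a 32-bit offset.

    The carry out of bit 31 is dropped, so the result wraps around
    instead of becoming a 33-bit address. Limit checks are not modeled.
    """
    return (segment_base + offset) & MASK_32


# A 24-bit base plus a 16-bit offset needs at most 25 bits.
print((0xFF_FFFF + 0xFFFF).bit_length())

# A sum that overflows 32 bits wraps back to a low address.
print(hex(add_segment_base_386(0xFFFF_F000, 0x2000)))
```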
Carpe diem!

Re: x86 virtual address space

Post by Octocontrabass »

Scoopta wrote: Mon Aug 05, 2024 9:08 pm I'm specifically sticking to the terms in the article that is being discussed.
The article itself doesn't stick to those terms. (If it did, it wouldn't have the same number in all three columns for every x86-64 CPU.)
nullplan wrote: Tue Aug 06, 2024 1:50 am On x86, the linear->virtual translation
I think this is a problem of conflicting terminology. Intel uses "linear" to refer to the intermediate address that paging translates into a physical address after the segment base has been added to the offset, but it sounds like you're using "virtual" to refer to the same thing. Intel uses "virtual" to refer to a segment+offset pair. It sounds like you're using "linear" to refer to just the offset without a segment.

Re: x86 virtual address space

Post by nullplan »

Octocontrabass wrote: Tue Aug 06, 2024 12:24 pm I think this is a problem of conflicting terminology. Intel uses "linear" to refer to the intermediate address that paging translates into a physical address after the segment base has been added to the offset, but it sounds like you're using "virtual" to refer to the same thing. Intel uses "virtual" to refer to a segment+offset pair. It sounds like you're using "linear" to refer to just the offset without a segment.
The AMD APM uses the terms "linear" and "virtual" interchangeably. I mistook the article's "linear" for the "effective" address (because that's the only one left after taking care of the virtual and physical addresses). The effective address is the one that programs generate (and yes, that is the offset without the segment base).

Still, it makes no sense for the page to claim the 286 had a virtual address space, since it didn't have paging. And the claim of a 46-bit virtual address space in the 32-bit CPUs makes no sense.

Re: x86 virtual address space

Post by Octocontrabass »

nullplan wrote: Tue Aug 06, 2024 12:51 pm Still, it makes no sense for the page to claim the 286 had a virtual address space, since it didn't have paging.
But it did have a virtual address space. Paging is not the only form of address space virtualization.
nullplan wrote: Tue Aug 06, 2024 12:51 pm And the claim of a 46-bit virtual address space in the 32-bit CPUs makes no sense.
The segment selector contains 14 bits of the virtual address.
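
To make the 14 bits concrete, here is a rough sketch of how a protected-mode selector splits into fields. The field layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) is the standard one; packing the result into a single 46-bit integer is purely an assumption made for illustration, not something the CPU does.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit selector into its three fields."""
    return {
        "index": (selector >> 3) & 0x1FFF,            # 13-bit descriptor index
        "table": "LDT" if selector & 0b100 else "GDT",  # table indicator bit
        "rpl": selector & 0b11,                        # requested privilege level
    }


def pack_virtual_address(selector: int, offset: int) -> int:
    """Hypothetical packing: 14 selector bits above a 32-bit offset."""
    return ((selector >> 2) << 32) | (offset & 0xFFFF_FFFF)


print(decode_selector(0x002B))
print(pack_virtual_address(0xFFFF, 0xFFFF_FFFF).bit_length())
```

The largest packed value is 46 bits wide, because the two RPL bits are discarded before the selector is placed above the offset.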
rdos
Member
Posts: 3247
Joined: Wed Oct 01, 2008 1:55 pm

Re: x86 virtual address space

Post by rdos »

Octocontrabass wrote: Tue Aug 06, 2024 5:22 pm The segment selector contains 14 bits of the virtual address.
Not really. The GDT and LDT only take 32-bit base addresses, and if base + offset exceeds 32 bits, bit 32 is ignored.

It would be true if the 64-bit CPUs added new descriptors that could take a 64-bit base, but this is not the case.

Also, the paging implementations of protected mode can only translate 32-bit linear addresses.

Re: x86 virtual address space

Post by nullplan »

rdos wrote: Thu Aug 08, 2024 1:08 am Not really. The GDT and LDT only take 32-bit base addresses, and if base + offset exceeds 32 bits, bit 32 is ignored.
No, you're making the same mistake I did. What they mean is that the offset by itself is the "linear" address, and "segment:offset" is the virtual address. In protected mode, two bits of the 16-bit segment selector are the requested privilege level and the offset is 32 bits, so this is a 14+32-bit "virtual" address, which is then translated through segmentation and paging to another 32-bit address. And I suspect the 36-bit physical addresses they list a bit later under PAE were not architectural but rather the limits of those CPUs, because PAE is capable of producing 52-bit physical addresses (which is the actual architectural limit).

This is not what the AMD APM contains, and I suspect also not what the Intel SDM says. This is purely the creation of the Wikipedia authors. Both of these simply start with the offset as the "effective" address, then segmentation turns it into the linear or virtual address, and then paging turns it into the physical one. That is the content of Figure 2-3 in AMD APM vol 1.
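
The two-stage translation described above (effective address, then linear, then physical) can be sketched as below. This is a toy model under loud assumptions: the page table is a flat Python dict from page number to frame number rather than the real multi-level x86 structure, limit and permission checks are omitted, and all names are invented.

```python
PAGE_SIZE = 4096  # 4 KiB pages


def effective_to_linear(segment_base: int, effective: int) -> int:
    """Segmentation stage: add the segment base, wrap at 32 bits."""
    return (segment_base + effective) & 0xFFFF_FFFF


def linear_to_physical(linear: int, page_table: dict) -> int:
    """Paging stage: swap the page number for a frame number.

    A missing entry raises KeyError, standing in for a page fault.
    """
    page, offset = divmod(linear, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


toy_page_table = {0x10: 0x80}  # hypothetical mapping: page 0x10 -> frame 0x80
linear = effective_to_linear(0x0001_0000, 0x234)
print(hex(linear))
print(hex(linear_to_physical(linear, toy_page_table)))
```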

Re: x86 virtual address space

Post by rdos »

nullplan wrote: Thu Aug 08, 2024 1:36 am
rdos wrote: Thu Aug 08, 2024 1:08 am Not really. The GDT and LDT only take 32-bit base addresses, and if base + offset exceeds 32 bits, bit 32 is ignored.
No, you're making the same mistake I did. What they mean is that the offset by itself is the "linear" address, and "segment:offset" is the virtual address. In protected mode, two bits of the 16-bit segment selector are the requested privilege level and the offset is 32 bits, so this is a 14+32-bit "virtual" address, which is then translated through segmentation and paging to another 32-bit address. And I suspect the 36-bit physical addresses they list a bit later under PAE were not architectural but rather the limits of those CPUs, because PAE is capable of producing 52-bit physical addresses (which is the actual architectural limit).
I don't think this is the way it should be interpreted. In real mode, segments increase the address range from 16 bits to 20 bits, since the contents of the segment register are shifted left four positions before the offset is added. So the virtual address space is 20 bits, even if offsets are only 16 bits. In protected mode, the base is 32 bits, and so the virtual address space is only 32 bits, and the selector only functions to limit access to a smaller range within the 32-bit address space. When you create flat selectors you essentially bypass segmentation and create a 32-bit address space.
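
The real-mode calculation can be sketched like this. The sketch assumes 8086 behavior, where only 20 address lines exist and the sum wraps at 1 MiB; later CPUs with the A20 line enabled do not wrap. The function name is made up.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Shift the 16-bit segment left by four bits and add the offset.

    The mask models the 8086's 20 address lines (wraparound at 1 MiB).
    """
    return ((segment << 4) + offset) & 0xF_FFFF


print(hex(real_mode_address(0xB800, 0x0010)))
print(hex(real_mode_address(0xFFFF, 0x0010)))  # wraps past the 1 MiB boundary
```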

PAE is not at all related to this. PAE uses 64-bit page table entries, as long mode does, to map a 32-bit linear address space onto a physical address space wider than 32 bits. Initially, PAE was documented to support only a limited number of physical address bits, but in essence it supports the same number of bits as long mode, and how many are actually implemented depends on the processor. So PAE makes it possible to use all available physical memory in protected mode, but it doesn't make all of it addressable in the virtual address space at once; to reach all of it, memory needs to be selectively mapped or spread across many different processes. Segmentation does not make it possible to access more physical memory than a flat memory model.
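
As a rough illustration of why PAE leaves the linear address at 32 bits, the sketch below splits a linear address into the table indices used with 4 KiB pages under PAE (2 bits, 9 bits, 9 bits, and a 12-bit page offset). The function name is hypothetical, and large pages are not modeled.

```python
def split_pae_linear(linear: int) -> tuple:
    """Split a 32-bit linear address into PAE indices for 4 KiB pages."""
    pdpt_index = (linear >> 30) & 0x3    # bits 31..30: one of 4 PDPT entries
    pd_index = (linear >> 21) & 0x1FF    # bits 29..21: one of 512 PD entries
    pt_index = (linear >> 12) & 0x1FF    # bits 20..12: one of 512 PT entries
    page_offset = linear & 0xFFF         # bits 11..0: byte within the page
    return pdpt_index, pd_index, pt_index, page_offset


# 2 + 9 + 9 + 12 = 32 bits of linear address; only the frame numbers
# stored in the 64-bit entries are wider.
print(split_pae_linear(0xC020_1003))
```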

Re: x86 virtual address space

Post by Octocontrabass »

rdos wrote: Thu Aug 08, 2024 4:02 am In protected mode, the base is 32 bits, and so the virtual address space is only 32 bits, and the selector only functions to limit access to a smaller range within the 32-bit address space.
You don't have to have all segments loaded at the same time. If you use #NP (segment not present) exceptions to swap segments the same way you can use #PF (page fault) exceptions to swap pages, the segment selector becomes part of the virtual address.
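
The idea of swapping segments on demand can be modeled very loosely as below. This is a toy simulation, not how any real kernel does it: the descriptor table is a plain dict, the exception class stands in for #NP, and the `swap_in` handler is a hypothetical callback that picks a base and marks the segment present.

```python
class SegmentNotPresent(Exception):
    """Stand-in for the #NP exception on a not-present descriptor."""


def load_segment(descriptors: dict, index: int) -> int:
    """Return the segment base, or raise if the present flag is clear."""
    entry = descriptors[index]
    if not entry["present"]:
        raise SegmentNotPresent(index)
    return entry["base"]


def access(descriptors: dict, index: int, offset: int, swap_in) -> int:
    """Retry the access after the handler has brought the segment in."""
    try:
        base = load_segment(descriptors, index)
    except SegmentNotPresent:
        swap_in(descriptors, index)  # hypothetical handler, like a #PF handler for pages
        base = load_segment(descriptors, index)
    return (base + offset) & 0xFFFF_FFFF
```

A caller would supply its own `swap_in`, for example one that sets `descriptors[index] = {"present": True, "base": some_base}` after loading the segment's contents from a backing store.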
sh42
Posts: 13
Joined: Sat Aug 17, 2024 4:45 pm

Re: x86 virtual address space

Post by sh42 »

Scoopta wrote: Mon Aug 05, 2024 8:09 pm OK, so I think I figured it out: the virtual address space on the page appears to just be the linear space + 14 bits. In protected mode you can have 16383 segments, or in other words 14 bits' worth of segments. This also explains why they don't list it separately for 64-bit CPUs, which don't use segmentation when running in their native long mode.
As you can see from the others debating this, Wikipedia is hard: articles are often based on different sources that use subtly different terminology, and the writers can mix terms, perhaps assuming readers already know the differences.

For x86 I'd recommend the Intel developer manuals. They are quite clear on how page translation etc. works. For Intel 64 and AMD64 there are also other documents, for example:
https://www.amd.com/content/dam/amd/en/ ... /24593.pdf

They can be treated as ground truth, since the vendor's CPUs will work the way the manuals describe.
You will find similar confusions bubbling up and popping in your understanding as you go between these documents.

Linear addresses, logical addresses, virtual addresses, effective addresses: it's all a mess :D.

I try to stick with the vendor of the CPU I am using and just grab the documents each time to make sure I'm not confusing myself. It's a good habit to go to the manuals rather than other people's summaries of what's in there. It takes a while to get used to reading these guides, but it's worth taking the time to learn to reference them.