I see that most PCI devices in QEMU do not support 64-bit addressing, so I always have to manually search for memory under 4 GB.
Is there a better way? What is DMA remapping?
Devices that don't support 64 bit addresses
Re: Devices that don't support 64 bit addresses
If your (virtual) hardware supports it, you can use the IOMMU to translate the device's 32-bit DMA addresses to 64-bit physical addresses. (Microsoft calls this DMA remapping but hardware vendors might use different names.)
MSI and MSI-X use DMA, so those will also be remapped by the IOMMU. Some IOMMUs also support remapping legacy PCI interrupts. Depending on the (virtual) hardware topology, multiple devices may be forced to share a single mapping.
Address translation has runtime overhead. You may find situations where you get better overall performance by spending the extra effort to allocate memory below 4GB instead of using the IOMMU.
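In driver terms, the fallback might look something like this. A minimal sketch only, with made-up helper names (alloc_phys, iommu_map, iommu_available are illustrative, not a real kernel API):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical kernel helpers -- illustrative names, not a real API. */
uint64_t alloc_phys(size_t size, uint64_t max_addr); /* physical pages below max_addr */
uint64_t iommu_map(uint64_t phys, size_t size);      /* returns device-visible (bus) address */
bool     iommu_available(void);

/* Hand a 32-bit-only device a DMA address it can actually reach. */
uint64_t dma_alloc_for_32bit_device(size_t size, uint64_t *phys_out)
{
    if (iommu_available()) {
        /* Any physical page works; the IOMMU remaps it into the
         * device's 32-bit address space. */
        *phys_out = alloc_phys(size, UINT64_MAX);
        return iommu_map(*phys_out, size);
    }

    /* No IOMMU: the buffer itself must live below 4 GB, and the
     * device uses the physical address directly. */
    *phys_out = alloc_phys(size, 0xFFFFFFFFull);
    return *phys_out;
}
```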
Re: Devices that don't support 64 bit addresses
I've seen in some Windows examples that some devices only support 16-bit addresses.
I think I will just split images in my new (optimized) heap management API (which supports page frame allocations and heaps) into a below-4 GB image and an above-4 GB image. I call the context a "heap image", idk lol.
And by the way, my algorithm works; I've never seen anyone use it before.
Can I ask you something? Like every allocator, I keep a "recent heap" variable that I allocate from until a request comes in that is bigger than it. Then I use a function to find the biggest chunk available. I tested it on my 3.6 GHz Xeon E5-1620 and it runs 150 million times per second (that is the worst-case function), and it is designed so the timing is fixed no matter how many heaps exist. Is that considered fast? The best case is more like 600 million allocations per second, when you just take from the last free block you found.
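Hard to tell without seeing the code, but the fast path you describe sounds like a "carve from the last block, rescan only when it stops fitting" pattern. A rough sketch under that assumption (names and data structures are made up, not your actual implementation):

```c
#include <stddef.h>
#include <stdint.h>

/* Rough sketch only -- not the poster's actual data structures. */
typedef struct free_block {
    size_t size;                 /* usable bytes after the header */
    struct free_block *next;
} free_block_t;

static free_block_t *free_list;  /* all free blocks */
static free_block_t *recent;     /* "recent heap": last block we carved from */

/* Worst-case path: scan every block for the biggest chunk. */
static free_block_t *find_biggest(void)
{
    free_block_t *best = NULL;
    for (free_block_t *b = free_list; b; b = b->next)
        if (!best || b->size > best->size)
            best = b;
    return best;
}

void *heap_alloc(size_t size)
{
    /* Fast path: keep carving from the block used last time, and only
     * fall back to the full scan when the request no longer fits. */
    if (!recent || recent->size < size)
        recent = find_biggest();
    if (!recent || recent->size < size)
        return NULL;                         /* out of memory */

    recent->size -= size;                    /* carve from the top of the block */
    return (uint8_t *)(recent + 1) + recent->size;
}
```

As for whether 150 million calls per second is fast: at 3.6 GHz that is roughly 24 cycles per call, which is a respectable fast path, but micro-benchmarks like this are easy to skew (everything hot in cache, the compiler hoisting work out of the loop), so treat the number with some caution.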
Re: Devices that don't support 64 bit addresses
devc1 wrote: I've seen in some Windows examples that some devices only support 16-bit addresses.

Device capabilities vary. ISA DMA devices can only take 24-bit addresses and need buffers that don't cross a 64 KB boundary. And if you want to start a secondary CPU, you need to put the SMP trampoline into a page in the low 1 MB (a 20-bit address). My strategy for dealing with these is to always allocate the highest addresses that can satisfy a given physical memory request. Low addresses are less plentiful by definition (there is obviously only 1 MB of memory available in the low 1 MB of address space, while there are multiple GB available above it), and more valuable, since some hardware has restrictions.
I had the idea of using different zones too, but then I read in the Linux kernel source code that some devices have weird limitations. Certain EHCI models can only deal with 31 bit addresses, for example. So now I have a unified physical memory allocator that is customizable with a callback function, and usually returns the highest address that can fulfill a request. That keeps the low memory free until it is needed.
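Something along these lines, perhaps: a minimal sketch of a top-down allocator driven by a constraint callback. The region array, the names, and the example ISA DMA callback (below 16 MB, no 64 KB boundary crossing, as mentioned above) are all illustrative, not the actual allocator described in this thread:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef struct {
    uint64_t base;
    uint64_t length;          /* free bytes remaining in this region */
} pm_region_t;

/* Free regions sorted by base address, highest last. */
static pm_region_t regions[64];
static size_t region_count;

/* Caller-supplied constraint on the candidate range. */
typedef bool (*pm_constraint_t)(uint64_t base, uint64_t length);

/* Allocate 'length' bytes from the highest region that satisfies the constraint. */
uint64_t pm_alloc(uint64_t length, pm_constraint_t ok)
{
    for (size_t i = region_count; i-- > 0; ) {              /* walk top-down */
        pm_region_t *r = &regions[i];
        if (r->length < length)
            continue;
        uint64_t candidate = r->base + r->length - length;  /* take from the top */
        if (ok && !ok(candidate, length))
            continue;
        r->length -= length;
        return candidate;
    }
    return 0;   /* no region can satisfy the request */
}

/* Example constraint: ISA DMA wants < 16 MB and no 64 KB boundary crossing. */
static bool isa_dma_ok(uint64_t base, uint64_t length)
{
    return base + length <= (1ull << 24) &&
           (base >> 16) == ((base + length - 1) >> 16);
}
```

Unconstrained callers pass a NULL callback and get the highest free address; a floppy driver, say, would call pm_alloc(4096, isa_dma_ok), so low memory is only touched when something actually needs it.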
Carpe diem!
Re: Devices that don't support 64 bit addresses
You're right, but that sacrifices performance.
But guess what: modern hardware devices have 64-bit support.
So I think I will just require that; I guess all computers nowadays should have it.
Because my algorithm depends on length, not on searching; it's a bit complicated.
Re: Devices that don't support 64 bit addresses
devc1 wrote: You're right, but that sacrifices performance.

You don't even know how little I care about performance. On my list of priorities, performance ranks third, behind correctness and ease of understanding. If the code isn't correct, it doesn't matter how fast it is; and if I can't understand it, then I can't maintain or improve it, so again it doesn't matter how fast it is. I will care about performance once I see there is an issue.
For stuff like the physical memory allocator, making it return the highest address that fits instead of the lowest one was simply a matter of reversing the order of the memory blocks and rethinking the logic a little. It sacrifices nothing in my PMM.
devc1 wrote: But guess what, modern hardware devices have 64-bit support.

Yes, but modern computers also contain old hardware. Do you even know how much ISA stuff is rolling around in that modern system you are using? I will not constrain my OS architecturally to support only 64-bit-capable devices, even if I only support 64-bit CPUs.
Carpe diem!
Re: Devices that don't support 64 bit addresses
"reversing the order of the memory blocks and rethinking the logic a little"

That's what I did before, when I had a simple allocator, but I will use this in a specialized (slow) function that goes through the lowest heaps.
If I'm working with the absolute biggest memory areas in the memory map, they should normally always be at the highest addresses, don't you think? They always look like this:
< 4 GB: some fragmented memory and system I/O
> 4 GB: maybe some MMIO, then a big contiguous chunk of, say, 10 GB of RAM.
So I'm forcibly consuming memory from the highest address.
And even in malloc, instead of going through a bunch of lists, there is a very easy method I discovered that will return the biggest chunk, probably faster than your best-case scenario lol.
The SMP boot area is preallocated by the bootloader, though.
Well, what if some PC has 6 GB of RAM (it will probably contain bigger chunks in the low 4 GB)? Drivers run first, so I don't think the low 4 GB would be eaten. If you install a driver, it can run at runtime, or if it quits because of "no memory below 31 bits", let's say, we can advise the user to reboot.