NUMA

Posted: Wed May 12, 2004 7:26 am
by Brendan
Hi,

Ok, I've been re-writing my OS and I'm wondering if it'd be worthwhile adding support for NUMA. So far it's supporting PAE, page colouring, MTRRs & PAT, MP, ACPI.

I do want to support NUMA eventually (and I don't want to end up re-writing again to add this support), but I have no way of testing NUMA. Should I write code to support NUMA while "blindfolded" and try to fix all the bugs when I can test it, or should I skip NUMA for now? I intend to purchase a 2 or 4 way Opteron based machine in late 2005 (for both NUMA and AA-64)...

If I skip NUMA for now what would be involved in retro-fitting it? Does anyone have any suggestions for making retro-fitting easier (e.g. "Don't use <METHOD> for physical memory management because <REASON> and the kernel will need to support <FEATURE/S>")?

I know that the kernel should support processor affinity, but I intend supporting that anyway (it can improve CPU cache re-use for normal MP too). I'd also need separate physical memory managers for each memory domain. Is this all or have I missed something?
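To make the "separate physical memory managers for each memory domain" idea concrete, here is a minimal sketch in C. Everything in it is an assumption made up for illustration (the names pmm_alloc_page, pmm_free_page, MAX_DOMAINS, the tiny fixed-size free page stack); it isn't taken from any spec or existing kernel. Each domain keeps its own stack of free physical pages, and an allocation tries the caller's preferred domain first before falling back to the others:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DOMAINS      4           /* assumption: small fixed number of memory domains */
#define PAGES_PER_DOMAIN 8           /* assumption: tiny pool, for illustration only */
#define NO_PAGE          UINT64_MAX  /* returned when every domain is empty */

/* One independent physical memory manager per memory domain. */
struct domain_pmm {
    uint64_t free_pages[PAGES_PER_DOMAIN]; /* stack of free physical page addresses */
    size_t   free_count;                   /* number of valid entries on the stack */
};

static struct domain_pmm domains[MAX_DOMAINS];

/* Give a page back to the domain it belongs to. Returns 0 on success,
 * -1 if this toy free page stack is already full. */
int pmm_free_page(unsigned domain, uint64_t phys)
{
    assert(domain < MAX_DOMAINS);
    struct domain_pmm *d = &domains[domain];
    if (d->free_count == PAGES_PER_DOMAIN)
        return -1;
    d->free_pages[d->free_count++] = phys;
    return 0;
}

/* Allocate a page, preferring the caller's own domain. If that domain is
 * empty, try the remaining domains in simple round-robin order (a real
 * kernel would more likely order the fallbacks by distance). */
uint64_t pmm_alloc_page(unsigned preferred)
{
    assert(preferred < MAX_DOMAINS);
    for (unsigned i = 0; i < MAX_DOMAINS; i++) {
        struct domain_pmm *d = &domains[(preferred + i) % MAX_DOMAINS];
        if (d->free_count > 0)
            return d->free_pages[--d->free_count];
    }
    return NO_PAGE;
}
```

On a machine without NUMA the same code would simply run with a single domain, which is one way of keeping a later retro-fit small.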

Is there any detailed information on this topic (for 80x86 PCs) available anywhere? I've already got MS's SRAT details (couldn't find anything from AMD though). Google comes up with some Linux chatter and "National Underwater and Marine Agency" :)


Thanks,

Brendan

Re:NUMA

Posted: Wed May 12, 2004 10:09 am
by Candy
I did find a NUMA how-do-I-start-opterons manual on the AMD site, for the rest, use your head :). What's NUMA based on, what makes it strong, what makes it weak.

Re:NUMA

Posted: Thu May 13, 2004 12:39 am
by Brendan
Hi,
Candy wrote: I did find a NUMA how-do-I-start-opterons manual on the AMD site, for the rest, use your head :). What's NUMA based on, what makes it strong, what makes it weak.
Is this the "BIOS and Kernel Developer's Guide for AMD Athlon™ 64 and AMD Opteron™ Processors"?

Cheers,

Brendan

Re:NUMA

Posted: Thu May 13, 2004 1:18 am
by Candy
Brendan wrote: Is this the "BIOS and Kernel Developer's Guide for AMD Athlon™ 64 and AMD Opteron™ Processors"?
yea, that's the one. Contains a procedure (crappy one IMO, but you can figure out a better one based on that) to start NUMA MP systems, and it contains some info on how the opterons map memory. Didn't read that too thoroughly though, so can't help with that yet.

Re:NUMA

Posted: Thu May 13, 2004 5:35 am
by Brendan
Hi,
Candy wrote: yea, that's the one. Contains a procedure (crappy one IMO, but you can figure out a better one based on that) to start NUMA MP systems, and it contains some info on how the opterons map memory. Didn't read that too thoroughly though, so can't help with that yet.
I haven't read it properly either, but it seems to be directed more to BIOS writers than kernel developers - most of the CPU & memory information in this manual would be handled by the BIOS and passed to the kernel in ACPI & MP spec tables.

Anyway, I've decided not to support NUMA yet. It's getting too close to CPU clustering and other "big iron" things. I'll create 32 bit kernels (UP & HT/MP) without NUMA and leave NUMA to the 64 bit version only. The price of Opterons should have dropped by then.

Thanks,

Brendan

Re:NUMA

Posted: Thu May 13, 2004 6:00 am
by Candy
Yes and no.

Yes, it's a good plan only to do MP/SMP. You will probably be able to use some form of SMP emulation that the opteron might have (something in the back of my head says this but I can't confirm nor deny), and you can probably get all memory as it's mapped by the BIOS to a linear segment.

No, it's a bad plan. Using those registers you can map memory everywhere you like it to be, offer advanced memory configuration and allow use of all memory without losing valuable megabytes on PCI mmapped IO. You can also create more optimized routing tables for the opterons and support weird configurations, and display the configuration in a more clear way.

Re:NUMA

Posted: Thu May 13, 2004 7:13 am
by Brendan
Hi,
Candy wrote: Yes, it's a good plan only to do MP/SMP. You will probably be able to use some form of SMP emulation that the opteron might have (something in the back of my head says this but I can't confirm nor deny), and you can probably get all memory as it's mapped by the BIOS to a linear segment.
A NUMA/Opteron/64bit computer will work fine with a non-NUMA MP 32bit OS, the OS just won't take full advantage of the computer's features. Access to memory won't be optimized and the OS wouldn't take advantage of the 64 bit registers (and additional registers). The 32 bit OS would still be able to access all physical memory though (PAE) and use all CPUs (MP/SMP).
Candy wrote: No, it's a bad plan. Using those registers you can map memory everywhere you like it to be, offer advanced memory configuration and allow use of all memory without losing valuable megabytes on PCI mmapped IO. You can also create more optimized routing tables for the opterons and support weird configurations, and display the configuration in a more clear way.
My 32 bit kernel could optimize for NUMA memory accesses, but IMHO people with Opteron servers will want a 64 bit OS anyway. On my OS linear memory isn't so valuable, as a device driver has an entire address space (3 GB) for its own private use. Access to the physical address space in 32 bit with PAE has the same restrictions as access in 64 bit.

Eventually I'll write a kernel for 64 bit. This kernel would support NUMA, but the NUMA support will be handled by the kernel (it'd be invisible to applications) and wouldn't affect the kernel API.

Therefore what I think I need (for the 32 bit kernel) is support for CPU affinity and CPU clustering. IMHO supporting CPU affinity is a good idea anyway, as it can make better use of CPU caches. CPU clustering is required for decent hyper-threading support - e.g. an SMP computer with 2 physical CPUs that contain 2 logical CPUs each would be considered 2 CPU clusters. In this case the CPU affinity support would try to make a thread run on the same CPU it ran on last time, or if that's not practical it would try to make the thread run on a CPU in the same cluster (although this would depend on CPU load balancing). It's all about recycling shared caches, and making sure the kernel API is consistent across different kernel versions.
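A rough sketch of that selection order in C, for the 2 physical CPUs with 2 logical CPUs each example. The names (pick_cpu, cluster_of, NUM_CPUS) and the "idle" array are assumptions made up for illustration, and so is the idea that logical CPUs in one cluster are numbered consecutively (real APIC IDs won't necessarily be laid out that way). Load balancing is reduced to "is this CPU idle":

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPUS         4    /* 2 physical CPUs with 2 logical CPUs each */
#define CPUS_PER_CLUSTER 2    /* assumption: consecutive numbering inside a cluster */
#define NO_CPU           (-1) /* returned when no CPU is idle */

/* Map a logical CPU number to its cluster (physical CPU). */
static int cluster_of(int cpu)
{
    return cpu / CPUS_PER_CLUSTER;
}

/* Choose a CPU for a thread that last ran on last_cpu.
 * Preference order: the same CPU, then a CPU in the same cluster
 * (shared caches), then any idle CPU at all. */
int pick_cpu(int last_cpu, const bool idle[NUM_CPUS])
{
    assert(last_cpu >= 0 && last_cpu < NUM_CPUS);

    /* 1. The CPU the thread ran on last time. */
    if (idle[last_cpu])
        return last_cpu;

    /* 2. Another CPU in the same cluster. */
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        if (idle[cpu] && cluster_of(cpu) == cluster_of(last_cpu))
            return cpu;

    /* 3. Any idle CPU, even though its caches will be cold. */
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        if (idle[cpu])
            return cpu;

    return NO_CPU;
}
```

With CPUs 0 and 3 idle, a thread that last ran on CPU 2 would be given CPU 3 (same cluster) rather than CPU 0. A NUMA-aware version could add one more level between steps 2 and 3 for CPUs in the same memory domain.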

Also I fully expect AMD to add "hyper-threading" like technology to Opterons, and wouldn't be surprised if Intel started using NUMA. This could easily lead to an 8 way motherboard with 32 logical CPUs (8 clusters of 4). Please note I'm talking about CPU clustering where all memory is directly accessible by all CPUs, not the CPU clustering done by mega-beasts (e.g. Intel & IBM blade servers).


Cheers,

Brendan