x2APIC/NUMA Emulator

Posted: Mon Feb 16, 2009 6:27 pm
by JohnnyTheDon
Does anyone know of a (free) emulator that will simulate a NUMA system? Possibly with an x2APIC?

Re: x2APIC/NUMA Emulator

Posted: Mon Feb 16, 2009 11:17 pm
by bewing
I'm working on it. :D (A complete rewrite of bochs with x2APIC support, that is.)
A basic version of it should be ready for alpha testing in a couple weeks. But adding the NUMA stuff to it may take another month after that.

Re: x2APIC/NUMA Emulator

Posted: Tue Feb 17, 2009 5:33 am
by stlw
bewing wrote:I'm working on it. :D (A complete rewrite of bochs with x2APIC support, that is.)
A basic version of it should be ready for alpha testing in a couple weeks. But adding the NUMA stuff to it may take another month after that.
Adding x2APIC to Bochs should only take a couple of days.
But is anybody interested in it?

Stanislav

Re: x2APIC/NUMA Emulator

Posted: Tue Feb 17, 2009 5:44 am
by kmtdk
Well, yeah,
it would be fun (especially since many people are trying to implement it now...)

[How are you doing with the Bochs GUI? (I know this should be private.) Also, could I have a copy? (I got lost in my own compiling and stopped.)] :oops:

KMT dk

Re: x2APIC/NUMA Emulator

Posted: Tue Feb 17, 2009 8:22 am
by stlw
kmtdk wrote:Well, yeah,
it would be fun (especially since many people are trying to implement it now...)

[How are you doing with the Bochs GUI? (I know this should be private.) Also, could I have a copy? (I got lost in my own compiling and stopped.)] :oops:

KMT dk
The Bochs GUI is integrated into Bochs CVS and fairly well tested. Download the latest CVS snapshot and run it.

Stanislav

Re: x2APIC/NUMA Emulator

Posted: Tue Feb 17, 2009 11:47 am
by JohnnyTheDon
bewing wrote:I'm working on it. :D (A complete rewrite of bochs with x2APIC support, that is.)
A basic version of it should be ready for alpha testing in a couple weeks. But adding the NUMA stuff to it may take another month after that.
Awesome. I'd be happy to alpha test.

Re: x2APIC/NUMA Emulator

Posted: Wed Feb 18, 2009 3:57 pm
by 01000101
Likewise. I'd definitely love to check out x2APIC functionality, as I've already written some code for it in the past.

Re: x2APIC/NUMA Emulator

Posted: Thu Feb 19, 2009 12:04 am
by bewing
Good, because I may well need to tap your expertise (and Brendan's) -- or get example code from you to do my initial debugging. I'm concentrating first on other devices (such as the hard disk) that I know fairly well, but my knowledge of x2APIC is mighty iffy. I'm intending to max out at modeling 65,536 CPUs (multithreaded, one CPU per thread), so an x2APIC model is necessary.

Re: x2APIC/NUMA Emulator

Posted: Thu Feb 19, 2009 7:08 am
by stlw
bewing wrote:Good, because I may well need to tap your expertise (and Brendan's) -- or get example code from you to do my initial debugging. I'm concentrating first on other devices (such as the hard disk) that I know fairly well, but my knowledge of x2APIC is mighty iffy. I'm intending to max out at modeling 65,536 CPUs (multithreaded, one CPU per thread), so an x2APIC model is necessary.
65,536 CPUs is fun, but the rewrite does not need to wait until you (or I) code x2APIC.
The xAPIC interface in Bochs today allows up to 256 CPUs; anyway, nobody could run all of them in the same machine without multithreading.
The multithreading is the really complicated part here, not the x2APIC or other APIC changes, and it could be tested just as well with 256 CPUs (or even with 2 CPUs).

Stanislav

Re: x2APIC/NUMA Emulator

Posted: Thu Feb 19, 2009 10:48 am
by Brendan
Hi,
stlw wrote:The xAPIC interface in Bochs today allows up to 256 CPUs; anyway, nobody could run all of them in the same machine without multithreading.
In the past I've run large numbers of CPUs in Bochs (> 64 if I remember right) without any performance problems, and you've done a lot of work to improve performance since. The current Bochs code should handle 255 CPUs, as long as the user realizes that each CPU will be 255 times slower than it would be if Bochs only emulated one CPU.

Note: this is quite acceptable for someone who's testing how well their OS scales (lock contention, etc). Even if you need to run the emulator for 8 hours instead of 8 minutes, it's still far more practical than buying a real computer with a few hundred CPUs... :D


Cheers,

Brendan

Re: x2APIC/NUMA Emulator

Posted: Thu Feb 19, 2009 11:20 am
by stlw
Brendan wrote:Hi,

In the past I've run large numbers of CPUs in Bochs (> 64 if I remember right) without any performance problems, and you've done a lot of work to improve performance since. The current Bochs code should handle 255 CPUs, as long as the user realizes that each CPU will be 255 times slower than it would be if Bochs only emulated one CPU.

Note: this is quite acceptable for someone who's testing how well their OS scales (lock contention, etc). Even if you need to run the emulator for 8 hours instead of 8 minutes, it's still far more practical than buying a real computer with a few hundred CPUs... :D

Brendan
I agree, the limitation of 8 CPUs in Bochs is quite artificial. Nothing in Bochs itself limits you to this particular value.
The limitation was added after the Bochs BIOS started to become an MP BIOS + ACPI BIOS.
The BIOS needed some kind of limit on the number of entries in the MP and ACPI tables, and a limit of 8 CPUs was chosen.
Chosen more or less at random, I'd say.
I suggest you talk to the Bochs BIOS experts to see how many CPUs the Bochs BIOS could actually support (without overlapping tables, at least).
I think there is no real problem with increasing the number.

Anyway, even a complete Bochs code rewrite and parallelization of all CPUs will not help you to increase their number.
The bottleneck is still in the BIOS.

Stanislav

Re: x2APIC/NUMA Emulator

Posted: Thu Feb 19, 2009 12:09 pm
by JohnnyTheDon
stlw wrote: Anyway, even a complete Bochs code rewrite and parallelization of all CPUs will not help you to increase their number.
The bottleneck is still in the BIOS.
Maybe a bochs bios patch could fix this? It just seems like you would need to change where ACPI and MP tables are allocated.

Re: x2APIC/NUMA Emulator

Posted: Fri Feb 20, 2009 7:16 am
by Brendan
Hi,
stlw wrote:
Brendan wrote:In the past I've run large numbers of CPUs in Bochs (> 64 if I remember right) without any performance problems, and you've done a lot of work to improve performance since. The current Bochs code should handle 255 CPUs, as long as the user realizes that each CPU will be 255 times slower than it would be if Bochs only emulated one CPU.

Note: this is quite acceptable for someone who's testing how well their OS scales (lock contention, etc). Even if you need to run the emulator for 8 hours instead of 8 minutes, it's still far more practical than buying a real computer with a few hundred CPUs... :D
I agree, the limitation of 8 CPUs in Bochs is quite artificial. Nothing in Bochs itself limits you to this particular value.
The limitation was added after the Bochs BIOS started to become an MP BIOS + ACPI BIOS.
The BIOS needed some kind of limit on the number of entries in the MP and ACPI tables, and a limit of 8 CPUs was chosen.
Chosen more or less at random, I'd say.
From memory (and possibly wrong, but)...

Once upon a time the Bochs BIOS used conditional code to build either a single-CPU BIOS, a 2-way BIOS, a 4-way BIOS or an 8-way BIOS. To get it to generate a BIOS suitable for any other number of CPUs you had to hack the BIOS code because the MP specification tables were statically defined. There was no ACPI at all in this old version of the BIOS. I assume this situation originated from the original addition of SMP support, and Bochs was limited to 8 CPUs at this time.

This is how the BIOS looked when I got sick of it and decided to write my own BIOS that generated MP specification tables and ACPI tables dynamically. My BIOS (and my hacks) worked fine for me, and I stopped working on my BIOS because it did everything I needed for testing my OS at the time (I've always been an OS developer, only interested in Bochs as a means to an end).

Eventually the Qemu people got sick of the old BIOS too, and they added code to dynamically generate the MP specification tables and ACPI tables to the old BIOS. Qemu does support up to 255 CPUs (at least it did when I tried it last, which is a relatively long time ago because I can't get Qemu to compile anymore), and because Bochs is using Qemu's BIOS code the BIOS should mostly be fine with 255 CPUs.

From this I assumed that (at least most of) the problem is Bochs and not the BIOS.

Today I downloaded the latest Bochs code from CVS. I changed the following lines:

config.cc @ line 428
  #define BX_CPU_PROCESSORS_LIMIT 255
  #define BX_CPU_CORES_LIMIT 8
  #define BX_CPU_HT_THREADS_LIMIT 4

config.h.in @ line 57
  #define BX_MAX_SMP_THREADS_SUPPORTED 255

bochs.h @ line 359
  #define MAX_LOGFNS 1024
After these changes Bochs seems to work fine on my computer with up to 48 CPUs. With more than this Bochs fails, sometimes with a dialog box that says "Device: [BIOS ]", "Message: Keyboard error: 21", but sometimes it just locks up.
stlw wrote:I suggest you talk to the Bochs BIOS experts to see how many CPUs the Bochs BIOS could actually support (without overlapping tables, at least).
I think there is no real problem with increasing the number.
For up to 255 CPUs it costs 20 bytes per CPU, or about 5 KiB in the EBDA (or wherever the MP specification tables are placed), which shouldn't be a problem at all. If there's more than 255 CPUs, then the MP specification tables only support 8-bit APIC IDs and the best thing to do would be to skip the MP specification tables completely (they're mostly obsolete now anyway).

The ACPI tables are typically placed at the end of memory below 4 GiB. This means if the user puts "megs: 4" in "bochsrc.txt" then the ACPI tables should be able to use up to 3 MiB of RAM. For 8-bit APIC IDs 255 CPUs would cost less than 2 KiB of space. For 32-bit APIC IDs (for x2APIC) each CPU costs 16 bytes, so with "megs: 4" you could probably handle around 190000 CPUs before you run out of space for "Processor x2APIC Structures".
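
The space estimates above can be checked with a few lines of C. This is only a sketch of the arithmetic: the function and constant names are made up for illustration, the 20-byte and 16-byte entry sizes are the ones quoted above, and the 8-byte size for the 8-bit-ID case is assumed from the ACPI Processor Local APIC structure (it is consistent with the "less than 2 KiB" figure, but is not stated in the post).

```c
#include <assert.h>
#include <stdint.h>

/* Per-CPU entry sizes. 20 and 16 are quoted in the post; 8 is assumed
 * from the ACPI Processor Local APIC structure (8-bit APIC ID). */
enum {
    MP_PROCESSOR_ENTRY_BYTES = 20, /* MP specification processor entry   */
    MADT_LOCAL_APIC_BYTES    = 8,  /* ACPI local APIC structure          */
    MADT_X2APIC_BYTES        = 16  /* ACPI Processor x2APIC structure    */
};

/* Bytes needed to describe `cpus` CPUs with entries of `entry_bytes`. */
static uint64_t table_bytes(uint64_t cpus, uint64_t entry_bytes)
{
    return cpus * entry_bytes;
}

/* Number of whole entries of `entry_bytes` that fit in `space_bytes`. */
static uint64_t max_entries(uint64_t space_bytes, uint64_t entry_bytes)
{
    assert(entry_bytes != 0);
    return space_bytes / entry_bytes;
}
```

With these helpers, `table_bytes(255, MP_PROCESSOR_ENTRY_BYTES)` is 5100 bytes (about 5 KiB), `table_bytes(255, MADT_LOCAL_APIC_BYTES)` is 2040 bytes (just under 2 KiB), and `max_entries(3 * 1024 * 1024, MADT_X2APIC_BYTES)` is 196608, which matches the "around 190000 CPUs" estimate once some room is left for the table headers.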

However, for x2APIC and ACPI things get more complex. From the "Intel® 64 Architecture x2APIC Specification (June 2008)":
Intel wrote:A.2.1 x2APIC Structure

The Processor X2APIC structure (type 9) is very similar to the processor local APIC structure (type 0). When using the X2APIC interrupt model, logical processors with APIC ID values of 255 and greater in the system are required to have a Processor X2APIC record and an ACPI Device object. OSPM does not expect the information provided in this table to be updated if the processor information changes during the lifespan of an OS boot. While in the sleeping state, logical processors are not allowed to be added, removed, nor can their X2APIC ID or x2APIC Flags change. When a logical processor is not present, the Processor X2APIC information is either not reported or flagged as disabled.
Of course now we're back to the original problem - statically defined structures being used to describe extremely flexible/configurable hardware. Basically the ACPI AML code (currently in the file "acpi-dsdt.hex") needs to be generated dynamically during boot, which is exactly what I told Bochs developers several years ago when I was writing my BIOS.
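
For reference, the two MADT record types mentioned in the quote could be laid out roughly as below. This is a sketch, not Bochs BIOS code: the struct, macro and function names are hypothetical, the field layout follows the ACPI definitions of the type 0 and type 9 structures as I understand them, and it covers only the MADT record, not the ACPI Device object (AML) that also has to exist for each such processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MADT_TYPE_LOCAL_APIC   0    /* Processor Local APIC structure */
#define MADT_TYPE_LOCAL_X2APIC 9    /* Processor x2APIC structure     */
#define MADT_FLAG_ENABLED      0x1u /* bit 0 of the flags field       */

struct madt_local_apic {            /* type 0: 8-bit APIC ID */
    uint8_t  type;
    uint8_t  length;
    uint8_t  acpi_processor_id;
    uint8_t  apic_id;
    uint32_t flags;
};

struct madt_local_x2apic {          /* type 9: 32-bit x2APIC ID */
    uint8_t  type;
    uint8_t  length;
    uint16_t reserved;
    uint32_t x2apic_id;
    uint32_t flags;
    uint32_t acpi_processor_uid;
};

/* All fields are naturally aligned, so no padding is expected. */
_Static_assert(sizeof(struct madt_local_apic) == 8, "type 0 is 8 bytes");
_Static_assert(sizeof(struct madt_local_x2apic) == 16, "type 9 is 16 bytes");

/* Per the quoted text, APIC IDs of 255 and greater need a type 9 record. */
static int needs_x2apic_entry(uint32_t apic_id)
{
    return apic_id >= 255;
}

/* Append one type 9 record to `buf`; returns the new number of bytes used. */
static size_t append_x2apic_entry(uint8_t *buf, size_t used, size_t cap,
                                  uint32_t apic_id, uint32_t uid, int present)
{
    struct madt_local_x2apic e = {
        .type = MADT_TYPE_LOCAL_X2APIC,
        .length = (uint8_t)sizeof e,
        .reserved = 0,
        .x2apic_id = apic_id,
        .flags = present ? MADT_FLAG_ENABLED : 0,
        .acpi_processor_uid = uid,
    };
    assert(used + sizeof e <= cap);
    memcpy(buf + used, &e, sizeof e);
    return used + sizeof e;
}
```

Generating these records in a loop at boot is the easy part; the matching Device objects are the reason a static "acpi-dsdt.hex" does not scale to a configurable number of CPUs.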


Cheers,

Brendan

Re: x2APIC/NUMA Emulator

Posted: Fri Feb 20, 2009 2:02 pm
by JohnnyTheDon
ugh....

Will you be rewriting the Bochs BIOS as well, bewing?

Re: x2APIC/NUMA Emulator

Posted: Fri Feb 20, 2009 4:33 pm
by stlw
I changed Bochs CVS so that it can be configured with up to 255 CPUs, using Brendan's changes.
So that's at least one limitation fewer :)

About x2APIC... If you watch the Bochs CVS commits, you could guess that I've been looking at it recently.
I'd say I'm mostly done.
Only a couple of questions are left before it's final:

- Could somebody explain the CPUID leaf 0xB output to me? Especially the EAX and EBX registers.
- What about the I/O APIC?
It has an APIC ID as well and knows how to send IPIs on the APIC bus.
But I have never seen any spec explaining whether something has to change in the I/O APIC when the local APICs are in x2APIC mode.
So it will keep sending the IPIs as usual and have the same 8-bit APIC ID...
I have no idea how it could all work.

Some notes:
- I hope nobody is going to configure some of the APICs in the system in x2APIC mode and others in legacy mode - undefined behavior guaranteed :)

I'd like it if somebody could send me a small disk image, even of a UP system, with the CPU in x2APIC mode.
Or I could post a patch for testing.
But please, not like with VMX - I expected that 2 weeks of playing with a patch would be more than enough to publish some input/comments/results.

Stanislav