Re:clear-cut 64 bits architecture ? (splitted)
Posted: Tue Dec 06, 2005 6:29 pm
Hi,
Candy wrote: Although a machine with 16 cores would be pretty impossible to afford at this time. I'm hoping I can test at my work before I leave (got a dual Xeon myself, sadly the model without EM64T, friend's work box is a dual Athlon, Athlon MP without EM64T again).
IMHO a good free emulator is a better option - otherwise you'd probably need to spend half a million to cover every combination.
I'm hoping to get basic ACPI (including SRAT) into my Bochs BIOS by the end of this month/year. Then we'd at least be able to emulate a NUMA system with eight 64-bit CPUs. My own patches add hyper-threading to Bochs (and the new BIOS handles that too), so it'll also emulate the dual-NUMA space, dual-CPU HT machine. Dual core isn't supported yet though (although it wouldn't be that hard to add - tweaking "cpuid.cc" and adding the option to the configure script is all that it'd take).
I also intend to do some research on the number of CPUs Bochs can emulate - I want to push it up to 64 or 128 CPUs.
Candy wrote: I'm not counting this as especially important, since I see the future holding mainly identical processors (8-core cpu's will have mostly identical cores I hope). The access to the hardware should occur through asynchronous device drivers which work using memory anyway. It should be possible to hand the read function a page on which it can read, so it's in your local space. (of course, this is in atlantisos already).
I still want to support different CPUs in the same computer - the idea is that anyone who takes advantage of this feature will be "locked in" (they'll be unable to change back to a less capable OS without hardware changes). It also makes the OS more future-proof - if a manufacturer wants to produce a CPU with one full core and several reduced cores then the OS should handle it without change.
Candy wrote: 3. Exclusive is a server-only thing. I can't help with that since I'm not going for servers. On the idea itself, that's a subrequest for the above. You must be able to indicate which processors a given process can run on based on any metric, so even user-specified checkboxes. So, see 2.
I'm not convinced virtualization will remain "server-only", and I'm hoping it won't. One problem I'll have is that my OS won't run Windows software, but if my OS can run Windows itself then the problem is partially solved - it gives users a better upgrade path.
The "exclusive" requirement isn't quite the same as CPU affinity. For example, if you've got several threads already running and decide that a CPU should become "exclusive", then you may need to migrate existing threads to other CPUs (and may not be able to if too many threads are using "this CPU only" affinity). Also, when/if the CPU stops being exclusive you'd want to restore the original CPU affinities and/or migrate threads back.
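To make the difference from plain affinity concrete, here is a minimal sketch of the bookkeeping involved. Everything in it (the `Thread` and `Scheduler` classes, `make_exclusive`, `release_exclusive`) is hypothetical and invented for illustration, not taken from any real kernel. It also assumes the simplest policy: the request is refused if any other thread can only run on the CPU in question.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    affinity: set  # CPU numbers this thread is allowed to run on
    cpu: int       # CPU the thread is currently assigned to


class Scheduler:
    """Toy model only: tracks assignments, does no real scheduling."""

    def __init__(self):
        self.threads = []
        self.saved = {}  # cpu -> [(thread, original affinity, original cpu)]

    def make_exclusive(self, cpu, owner):
        others = [t for t in self.threads
                  if t is not owner and cpu in t.affinity]
        # Assumed policy: a thread pinned to "this CPU only" blocks the request.
        stuck = [t.name for t in others if t.affinity == {cpu}]
        if stuck:
            raise RuntimeError(f"cannot migrate pinned threads: {stuck}")
        saved = []
        for t in others:
            # Remember the original state so it can be restored later.
            saved.append((t, set(t.affinity), t.cpu))
            t.affinity = t.affinity - {cpu}
            if t.cpu == cpu:
                t.cpu = min(t.affinity)  # arbitrary choice of new CPU
        self.saved[cpu] = saved

    def release_exclusive(self, cpu):
        # Restore the original affinities and migrate threads back.
        for t, affinity, old_cpu in self.saved.pop(cpu, []):
            t.affinity = affinity
            t.cpu = old_cpu
```

The point of the sketch is the `saved` table: ordinary affinity has no equivalent, because nothing needs to be undone later.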
Candy wrote: 4. Variable speed is hard to do, although I have at one time intended to make the cpu allocations roughly in cycles, which would account for this entirely (except for the case of #STPCLK, but I'm not sure the tsc will keep counting then... I hope not).
5. Well.. see 4.
From Intel's manual:
"Following reset, the counter is incremented every processor clock cycle, even when the processor is halted by the #HLT instruction or external STPCLK# pin. However, the assertion of the external DPSLP# pin may cause the time-stamp counter to stop and Intel SpeedStep technology transitions may cause the frequency at which the time-stamp counter increments to change in accordance with the processor's internal clock frequency."
This means reduced CPU performance caused by STPCLK# doesn't affect the TSC, while reduced CPU performance caused by SpeedStep does. For example, if an operation costs 1000 cycles on a CPU when STPCLK# is used for a 50% duty cycle, then the same operation would probably only cost 500 cycles if STPCLK# isn't being used.
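The arithmetic in that example can be written out as a small helper. This is only a sketch: `work_cycles` and its `run_fraction` parameter are names made up here, and it assumes the OS knows the throttling duty cycle and that the throttling is spread evenly over the measured interval.

```python
def work_cycles(tsc_delta, run_fraction):
    """Estimate cycles spent doing work from a raw TSC delta.

    tsc_delta    -- difference between two RDTSC readings
    run_fraction -- assumed fraction of time the clock was not stopped
                    by STPCLK# (0.5 for a 50% duty cycle, 1.0 for none)
    """
    if not 0.0 < run_fraction <= 1.0:
        raise ValueError("run_fraction must be in (0, 1]")
    # The TSC keeps counting while the clock is stopped, so only
    # run_fraction of the measured cycles were available for work.
    return tsc_delta * run_fraction


print(work_cycles(1000, 0.5))  # 500.0
print(work_cycles(1000, 1.0))  # 1000.0
```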
I'd prefer it if STPCLK# stopped the TSC. That way you could always measure how many cycles were used and easily estimate how much CPU time the thread/software would need on other CPUs.
The other problem with the TSC is that it's shared by logical CPUs (for hyper-threading). If RDTSC says a thread used 1000 cycles, then it may have used 1000 cycles (if the other logical CPU in the core is halted) or it could have used about 500 cycles (if both logical CPUs are sharing). Accurately measuring relative time used in all cases is going to be a lot of work.
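As a rough illustration of the two cases above, the sketch below interpolates between them. The function name `ht_share` and the `sibling_busy_fraction` input are hypothetical, and the even 50/50 split while both logical CPUs are running is an assumption; real hyper-threading doesn't divide execution resources that neatly, which is part of why this is a lot of work.

```python
def ht_share(tsc_delta, sibling_busy_fraction):
    """Estimate the cycles a thread actually got on a shared core.

    tsc_delta             -- difference between two RDTSC readings
    sibling_busy_fraction -- assumed fraction of the interval during which
                             the other logical CPU was running (0.0 means
                             halted throughout, 1.0 means always sharing)
    """
    if not 0.0 <= sibling_busy_fraction <= 1.0:
        raise ValueError("sibling_busy_fraction must be in [0, 1]")
    # Assumption: while both logical CPUs run, each gets half the core.
    return tsc_delta * (1.0 - 0.5 * sibling_busy_fraction)


print(ht_share(1000, 0.0))  # 1000.0 (sibling halted)
print(ht_share(1000, 1.0))  # 500.0  (both sharing the whole time)
```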
Cheers,
Brendan