Re:Multi-CPU question
Posted: Wed Jul 13, 2005 10:49 am
by Colonel Kernel
Brendan wrote:
My OS uses 2 different types of locks, one which doesn't affect IRQs and the other that causes any received IRQs to be placed into a buffer and handled when the lock is freed. This still affects interrupt latency, but only when a CPU is using code that an IRQ handler needs to use (not for all locks).
Do you mean that your second type of lock is somehow better than just disabling interrupts, or that the fact that you have two kinds of locks allows you to avoid disabling interrupts much of the time...?
Re:Multi-CPU question
Posted: Wed Jul 13, 2005 11:36 am
by Brendan
Hi,
Colonel Kernel wrote:Brendan wrote:My OS uses 2 different types of locks, one which doesn't affect IRQs and the other that causes any received IRQs to be placed into a buffer and handled when the lock is freed. This still affects interrupt latency, but only when a CPU is using code that an IRQ handler needs to use (not for all locks).
Do you mean that your second type of lock is somehow better than just disabling interrupts, or that the fact that you have two kinds of locks allows you to avoid disabling interrupts much of the time...?
Both!
The second type of lock causes most IRQs to be queued, but still allows some IRQs (IRQ 8 and IRQ 0) to do their thing, which makes it a little better than disabling interrupts. The lock also messes with the "task priority register" so IRQs (that use the "send to lowest priority" mode) are normally sent to CPUs that aren't within a critical section.
As most of the kernel can use the first type of lock, it also means that interrupts are handled immediately most of the time. Because of the micro-kernel design, the only code that needs the second type of lock is the "send message" code (as it's the only code called by IRQ handlers) and the code that does the thread switches (as IRQs can't be handled when the CPU's state is inconsistent).
Combining both of these means that it's unlikely that an IRQ won't be handled immediately, and it gets more unlikely with more CPUs.
Cheers,
Brendan
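A minimal sketch of the second kind of lock described above, where IRQs that arrive while the lock is held are buffered and replayed on release. This is a single-CPU simulation only, not Brendan's code: the names (deferring_lock, irq_arrived, handle_irq), the buffer size and the log array are all made up for illustration, and a real SMP kernel would also need an atomic spinlock underneath plus the task priority register handling he mentions. The only details taken from the post are that most IRQs get queued while IRQ 0 and IRQ 8 still go through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16              /* assumed buffer size, not from the post */

/* Hypothetical lock that buffers IRQs received while it is held. */
typedef struct {
    bool   held;
    int    pending[MAX_PENDING];    /* IRQ numbers queued while held */
    size_t count;
} deferring_lock;

/* Stand-in for the real IRQ handlers: just records the order of handling. */
static int    handled_log[MAX_PENDING];
static size_t handled_count;

static void handle_irq(int irq)
{
    assert(handled_count < MAX_PENDING);
    handled_log[handled_count++] = irq;
}

static void lock_acquire(deferring_lock *l)
{
    assert(!l->held);               /* single-CPU sketch: no contention */
    l->held = true;
}

/* Called from the (simulated) interrupt entry path. */
static void irq_arrived(deferring_lock *l, int irq)
{
    /* IRQ 0 and IRQ 8 are let through even while the lock is held. */
    if (l->held && irq != 0 && irq != 8) {
        assert(l->count < MAX_PENDING);
        l->pending[l->count++] = irq;
    } else {
        handle_irq(irq);
    }
}

/* Free the lock, then handle everything that was queued, in arrival order. */
static void lock_release(deferring_lock *l)
{
    l->held = false;
    for (size_t i = 0; i < l->count; i++)
        handle_irq(l->pending[i]);
    l->count = 0;
}
```

With the lock held, calling irq_arrived() for IRQs 1, 0 and 14 handles only IRQ 0 straight away; IRQs 1 and 14 are handled, in that order, when lock_release() runs.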
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 1:57 am
by Kemp
Ok, one last question (I hope)... What is the actual difference between a 32 bit cpu and a 64 bit one? It seems such a fundamental question and yet there's no real discussion of it. I read in another post that the addressing doesn't do full 64 bit memory addresses, and the only other thing I can think of is 64 bit registers, and if that's the only change it doesn't seem like a big deal all of a sudden ???
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 3:04 am
by AR
64bit CPUs have 3 additional CPU Modes added: Longmode, 32bit compatibility longmode and 16bit compatibility longmode. In Longmode there are an additional 8 general registers and 8 SSE registers. All registers are 64bit (SSE is of course 128bit) with a 64bit address space (unless you're in 32/16 compatibility mode where you will of course only have a 32/16 bit address space and registers). Longmode also brings about a nice standard in CPU features finally (after so long just bolting new features on) where you are guaranteed SYSCALL, MMX, SSE, SSE2 (there are probably some others, SSE3 as well IIRC) - it also removes segmentation. Essentially it leads to the same difference as 16bit protected mode from 32bit protected mode (more address space, able to manipulate larger values faster).
You won't be able to multiprocess in 64bit mode unless the whole system is in 64bit mode. It may actually be plausible to mix them, but you would need a double kernel image (one for the 64bit CPU, one for the 32bit CPU). The 32bit CPU will not be able to run 64bit software but the 64bit CPU can run 32bit software, which may present some rather uneven distribution of labour, especially if most of the software for the OS is designed for 64bit, although it would nicely defeat the lack of V8086 in longmode.
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 3:11 am
by Kemp
Ok thanks, that helped a lot. My next step in making sure I can move the code to 64 bit easily when the time comes - Make sure I don't use any features that were dropped. IIRC, hardware task switching was dropped (in AMD64 systems at least), right?
Extra registers make Kemp a very happy person
Plus no segmentation is good, I don't use it anyway but at least there's no way I can be swayed to the dark side accidentally anymore.
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 3:31 am
by AR
Hardware task switching is gone. The TSS still exists, though, for the ring 0 stack pointer (RSP0 in long mode, taking the place of SS0 and ESP0); all switching has to be done manually with software task switching (which has always been faster anyway).
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 3:34 am
by Brendan
Hi,
Kemp wrote:Ok thanks, that helped a lot. My next step in making sure I can move the code to 64 bit easily when the time comes - Make sure I don't use any features that were dropped. IIRC, hardware task switching was dropped (in AMD64 systems at least), right?
Extra registers make Kemp a very happy person
Plus no segmentation is good, I don't use it anyway but at least there's no way I can be swayed to the dark side accidentally anymore.
Hardware task switching and virtual 8086 mode were dropped, along with most (but not all) segmentation - IIRC FS and GS still use segmentation in some form. Paging is very similar to 32 bit paging with PAE.
@AR: long mode was clean, until Intel added "LaGrande" and AMD added "Pacifica". Intel is also planning some digital rights management stuff, and I wouldn't be surprised if "SSE4" (with 256 bit registers) is announced in the next 12 months. Long mode won't be clean for long (the "bolting new features on" will continue).
[edit]I almost forgot: Intel's 64 bit chips are likely to have hyper-threading too - something you won't find in AMD chips. Multi-socket AMD computers are also NUMA, which you won't find in Intel computers for a few years yet.[/edit]
Cheers,
Brendan
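To make the "very similar to 32 bit paging with PAE" remark concrete, here is a small sketch of how a long mode virtual address splits into table indexes. It assumes 4-level paging with 4 KiB pages and 48 implemented virtual address bits (which also ties in with Kemp's earlier note that addressing isn't a full 64 bits); the names va_parts, is_canonical and split_va are made up for this example. Each level uses 9 index bits, the same as PAE's page directories and page tables, with a PML4 level added on top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index into each paging level for one virtual address. */
typedef struct {
    unsigned pml4;    /* bits 47..39 */
    unsigned pdpt;    /* bits 38..30 */
    unsigned pd;      /* bits 29..21 */
    unsigned pt;      /* bits 20..12 */
    unsigned offset;  /* bits 11..0, byte offset within the 4 KiB page */
} va_parts;

/* With 48 implemented bits, bits 63..48 must all be copies of bit 47. */
static bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;            /* bit 47 and the 16 bits above it */
    return top == 0 || top == 0x1FFFF;
}

/* Split a canonical address into its four table indexes and page offset. */
static va_parts split_va(uint64_t va)
{
    va_parts p;
    assert(is_canonical(va));
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

For example, split_va(0xFFFF800000000000) lands in PML4 entry 256, the first entry of the upper half of the address space, while 0x0000800000000000 fails is_canonical().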
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 3:50 am
by AR
Brendan wrote:@AR: long mode was clean, until Intel added "LaGrande" and AMD added "Pacifica". Intel is also planning some digital rights management stuff, and I wouldn't be surprised if "SSE4" (with 256 bit registers) is announced in the next 12 months. Long mode won't be clean for long (the "bolting new features on" will continue).
It always does, but it's a guarantee that "you will always have at least XYZ". Before Longmode was added, XYZ was about 32bit PMode with an FPU (486); this at least brings it up to a decent list of things to make use of, without the "does it have X? If not then does it have Y? If not then just use Z".
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 4:23 am
by Brendan
Hi,
AR wrote:It always does, but it's a guarantee that "you will always have at least XYZ". Before Longmode was added, XYZ was about 32bit PMode with an FPU (486); this at least brings it up to a decent list of things to make use of, without the "does it have X? If not then does it have Y? If not then just use Z".
Even then I'm not so sure, and I'd still rely on CPUID. For example, in a few years' time someone might decide not to bother with FPU, MMX or virtual 8086 mode extensions and just leave the corresponding CPUID feature flag clear. If it helps cram more CPU cores onto a chip, then dropping everything that isn't used anymore would seem like a good option.
What I'd really like to see is something like the Cell chip - one main CPU core with all the bells & whistles (including hyper-threading), and about 8 reduced CPU cores (long mode integer and SSE only) on the same chip.
Cheers,
Brendan
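A rough sketch of what "still rely on CPUID" could look like: decode the feature flags and pick a code path, rather than assuming a feature is there because the CPU supports long mode. The bit positions are the standard ones for EDX of CPUID leaf 1 (FPU bit 0, VME bit 1, MMX bit 23, SSE bit 25, SSE2 bit 26). The function and enum names are invented for the example, and the EDX value is passed in as a parameter so the logic can be shown without inline assembly; on GCC or Clang it could come from __get_cpuid() in <cpuid.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EDX after executing CPUID with EAX = 1. */
#define CPUID_EDX_FPU   (UINT32_C(1) << 0)   /* x87 FPU on chip */
#define CPUID_EDX_VME   (UINT32_C(1) << 1)   /* virtual 8086 mode extensions */
#define CPUID_EDX_MMX   (UINT32_C(1) << 23)
#define CPUID_EDX_SSE   (UINT32_C(1) << 25)
#define CPUID_EDX_SSE2  (UINT32_C(1) << 26)

/* Hypothetical code paths an OS or library might choose between. */
typedef enum { MATH_SOFT, MATH_X87, MATH_SSE2 } math_path;

/* True only if every bit in 'features' is set in the reported flags. */
static bool cpu_has(uint32_t edx, uint32_t features)
{
    return (edx & features) == features;
}

/* Choose from the flags actually reported, never from "it's a 64 bit CPU". */
static math_path choose_math_path(uint32_t edx)
{
    if (cpu_has(edx, CPUID_EDX_SSE | CPUID_EDX_SSE2))
        return MATH_SSE2;
    if (cpu_has(edx, CPUID_EDX_FPU))
        return MATH_X87;
    return MATH_SOFT;               /* fall back to software emulation */
}
```

A chip that reported SSE and SSE2 but left the FPU flag clear, as in the scenario above, would still get MATH_SSE2 here, and code that wants x87 would find cpu_has(edx, CPUID_EDX_FPU) false instead of faulting.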
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 6:26 am
by Solar
...which is a great thing in a game / multimedia machine. But look at the pathetic advantage HT or dual-CPU systems give you today even when your apps actually use the additional cores. Many desktop applications don't benefit at all, some only to very limited extents. Then again, for desktop uses we've hit the "well enough" mark in hardware performance long ago, much to the chagrin of the CPU manufacturers. People stormed the shops to get the first 386. Same with the 486. Less so with the Pentium. And with every generation after that - diminishing returns. If it weren't for ever-more-careless OS coding and ever-more-performance-eating games, the desktop CPU market would have stalled long ago.
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 7:14 am
by Kemp
Something just occurred to me. We use atomic instructions for the various types of lock to make sure code that isn't fully re-entrant can't be run twice at the same time, but what happens if both cpus in a dual/quad/whatever system hit that instruction at exactly the same time? Do they both succeed in getting the right to run the code? Also, if you have a LOCKed instruction and they both execute it at the same time do they both try to lock the bus and end up killing each other? My mind is currently filling with race conditions and worst-case scenarios :-\
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 9:52 am
by Brendan
Hi,
Solar wrote:...which is a great thing in a game / multimedia machine. But look at the pathetic advantage HT or dual-CPU systems give you today even when your apps actually use the additional cores. Many desktop applications don't benefit at all, some only to very limited extents. Then again, for desktop uses we've hit the "well enough" mark in hardware performance long ago, much to the chagrin of the CPU manufacturers. People stormed the shops to get the first 386. Same with the 486. Less so with the Pentium. And with every generation after that - diminishing returns. If it weren't for ever-more-careless OS coding and ever-more-performance-eating games, the desktop CPU market would have stalled long ago.
Most desktop software spends most of the time waiting for the user to press a key or click the mouse, and can be run adequately on a 100 MHz Pentium (if you ignore OS bloat).
Some software is IO bound, where no amount of CPU power will make any difference (but faster hard drives will, for example).
For some software, using multi-threading (or designing for multi-CPU) isn't very practical because there's too much communication needed between the threads (e.g. compiler).
Some software takes a huge amount of design work to get any benefit from multi-CPU. IMHO the "secret" for good multi-CPU performance is to keep all CPUs busy while minimizing communication - often it's a tricky balancing act. An example would be the multi-threaded CPU emulator prototype I've been playing with - regardless of what I do, the best I can get out of it is around 15 times slower than QEMU (it's doing dynamic translation now). This means that unless the host computer has more than 15 CPUs (and it's emulating a computer with 15 or more CPUs) single-threaded would perform better. I'm still working on this one (trying to think of a better way of assigning work to threads).
For the remainder, multi-CPU (and good design) can make a lot of difference - 3D games, Apache, pmake, etc.
IMHO most 80x86 software is designed to be as fast as possible on single-CPU, and then modified to use multi-CPU after - it (currently) makes the most sense, but this will change as multi-CPU becomes more common.
Cheers,
Brendan
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 10:05 am
by Brendan
Hi,
Kemp wrote:Something just occurred to me. We use atomic instructions for the various types of lock to make sure code that isn't fully re-entrant can't be run twice at the same time, but what happens if both cpus in a dual/quad/whatever system hit that instruction at exactly the same time? Do they both succeed in getting the right to run the code? Also, if you have a LOCKed instruction and they both execute it at the same time do they both try to lock the bus and end up killing each other? My mind is currently filling with race conditions and worst-case scenarios :-\
When you use the "lock" prefix (or the XCHG instruction) the CPU asserts a signal that locks the bus and prevents anything else (other CPUs and other devices) from using the bus until the CPU removes the signal. I'm not sure what happens if more than one CPU tries to lock the bus at the exact same time (I assume there'd be some sort of low-level arbitration), but it's guaranteed to work correctly.
Given that the atomic instructions are guaranteed to work correctly by CPU manufacturers, re-entrancy locking code can also be guaranteed to work correctly (unless it's buggy).
Cheers,
Brendan
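For reference, a minimal spinlock built on one atomic read-modify-write, which is the kind of re-entrancy lock being discussed. This sketch uses C11's atomic_flag rather than hand-written assembly; on 80x86 compilers typically turn the test-and-set into an XCHG (implicitly locked), but that is up to the compiler. The names spinlock, spin_try_lock and locked_add are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Initialise with: spinlock l = { ATOMIC_FLAG_INIT }; */
typedef struct {
    atomic_flag taken;
} spinlock;

/* One atomic test-and-set: returns true only for the caller that flipped
 * the flag from clear to set. If two CPUs execute this "at the same time",
 * the hardware orders the two operations, so exactly one of them sees the
 * flag clear and wins. */
static bool spin_try_lock(spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->taken, memory_order_acquire);
}

/* Keep retrying until this CPU is the one that wins. */
static void spin_lock(spinlock *l)
{
    while (!spin_try_lock(l)) {
        /* busy-wait; a real kernel might put a PAUSE instruction here */
    }
}

static void spin_unlock(spinlock *l)
{
    atomic_flag_clear_explicit(&l->taken, memory_order_release);
}

/* Example critical section: a non-atomic update protected by the lock. */
static void locked_add(spinlock *l, long *counter, long n)
{
    assert(l != NULL && counter != NULL);
    spin_lock(l);
    *counter += n;
    spin_unlock(l);
}
```

The point for Kemp's question is in spin_try_lock(): the load and the store happen as one indivisible step, so there is no window in which a second CPU can also see the lock as free.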
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 1:36 pm
by mystran
About several processors using LOCK at the same time:
One of them goes first while the others wait. No deadlocks occur, as signal-level arbitration is used to rotate the access (in one form or other).
This arbitration actually has to happen every time a processor talks to a bus, because you (normally) can't allow two transfers using the same bus at the same time anyway. Normally you release the bus after you've done your load or store, at which point the next pending processor gets to do its operation.
What the LOCK prefix does is simply keep the bus reserved until the instruction has completed, instead of letting other processors use it between the load and the store.
In reality the whole process can be a lot more complicated if you allow queuing of load/store requests to memory and such, but the principle is the same anyway.
There are various ways to do arbitration on the signal level. One simple way is to have one line per processor, and a separate "busy" line. When a processor wants to send something, it asserts its own line. Then it checks whether the bus is busy and, if not, whether a processor with higher priority has asserted its line. If it's the highest-priority processor trying to access a free bus, it knows that on the next clock it can assert the busy line, and assume that nobody else will mess with the bus while the busy line is asserted.
This will work since we have a synchronously clocked bus (so we have semi-discrete time units called "clocks") and we can deal with several lines in parallel during the same clock. To get rid of "priorities", you can rotate them among the processors with some kind of scheme. For anything that needs only a single clock of bus time, you don't really need the busy line at all... which is actually kinda like the LOCK line that the LOCK prefix is supposed to assert.
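The scheme above can be modelled in a few lines of C. This is a toy simulation of one arbitration step, not a description of any real bus: NUM_CPUS, the bitmask encoding of the request lines and the function name arbitrate are assumptions made for the example, and the "some kind of scheme" for rotating priorities is taken to be plain round-robin starting after the previous winner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CPUS 4      /* assumed number of processors on the bus */

/*
 * One clock of arbitration.
 *   requests: bit i set means processor i is asserting its request line
 *   busy:     state of the shared "busy" line (e.g. held during a LOCKed op)
 *   last:     processor that was granted the bus most recently
 * Returns the processor allowed to assert the busy line next clock, or -1
 * if the bus stays with its current owner or nobody is asking.
 */
static int arbitrate(uint8_t requests, bool busy, int last)
{
    assert(last >= 0 && last < NUM_CPUS);

    if (busy)
        return -1;      /* nobody may take the bus while busy is asserted */

    /* Rotating priority: the processor after the last winner is checked
     * first, and the last winner itself is checked last. */
    for (int step = 1; step <= NUM_CPUS; step++) {
        int cpu = (last + step) % NUM_CPUS;
        if (requests & (1u << cpu))
            return cpu;
    }
    return -1;          /* no request lines asserted */
}
```

If processors 0 and 2 both request a free bus and processor 0 went last, arbitrate(0x05, false, 0) picks 2; on the next round arbitrate(0x05, false, 2) picks 0, so neither can starve the other. While busy is true (for instance during a LOCKed instruction) every request simply waits.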
Re:Multi-CPU & 32/64 bits
Posted: Tue Jul 19, 2005 2:29 pm
by Kemp
Ok, that sounds sane. Thanks for all this info guys, it's invaluable.