Multi processor boards
There are some new and not too expensive motherboard designs that have more than one processor. Most of them seem to target the server market, but I think they can be useful for demanding tasks, and they should certainly be interesting for OS developers.
On the boards I've seen, memory and I/O (PCI cards) are connected to one of the processors, so the physical memory a process or thread uses should be attached to the processor the thread is running on. That would also mean threads should only (or at least primarily) be moved between cores within the same processor.
I don't quite understand how things work for PCI devices. If I enumerate PCI devices on one processor would I see those on the other processors too? Or are those truly local? As for memory. I suppose it should be mapped to unique locations and that one processor can access the memory of the others by some inter-chip communication mechanism, but that this is slower than accessing the memory that is directly connected to the processor?
Re: Multi processor boards
Multi-CPU boards have existed longer than multi-core CPUs, and from a software perspective there's no difference between the two (except that there's usually better inter-core latency when all of the cores are inside a single CPU).
rdos wrote: If I enumerate PCI devices on one processor would I see those on the other processors too? Or are those truly local?
You'll see all of them no matter which CPU you use to enumerate them.
rdos wrote: As for memory. I suppose it should be mapped to unique locations and that one processor can access the memory of the others by some inter-chip communication mechanism, but that this is slower than accessing the memory that is directly connected to the processor?
Accessing memory attached to the local memory controller will be faster than accessing memory on a different CPU's memory controller. The same applies to PCI devices as well.
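For illustration, here is a minimal brute-force enumeration sketch over the legacy 0xCF8/0xCFC configuration mechanism; the outl/inl port-I/O helpers are assumed to be provided by your kernel. Whichever CPU or core runs this loop, the same set of devices answers.

Code:

#include <stdint.h>

/* Port-I/O helpers assumed to exist elsewhere in the kernel. */
extern void outl(uint16_t port, uint32_t value);
extern uint32_t inl(uint16_t port);

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Read one 32-bit register from configuration space (legacy mechanism #1). */
static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    uint32_t addr = 0x80000000u             /* enable bit */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)dev << 11)
                  | ((uint32_t)fn  << 8)
                  | (off & 0xFC);
    outl(PCI_CONFIG_ADDRESS, addr);
    return inl(PCI_CONFIG_DATA);
}

/* Probe every bus/device/function; the resulting list is identical
   regardless of which CPU executes the loop. */
void pci_enumerate(void)
{
    for (unsigned bus = 0; bus < 256; bus++)
        for (unsigned dev = 0; dev < 32; dev++)
            for (unsigned fn = 0; fn < 8; fn++) {
                uint32_t id = pci_cfg_read32(bus, dev, fn, 0x00);
                if ((id & 0xFFFF) == 0xFFFF)
                    continue;               /* no function here */
                /* vendor = id & 0xFFFF, device = id >> 16 */
            }
}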
Re: Multi processor boards
Octocontrabass wrote: Multi-CPU boards have existed longer than multi-core CPUs, and from a software perspective there's no difference between the two (except that there's usually better inter-core latency when all of the cores are inside a single CPU).
Maybe for advanced servers, but not for more mainstream designs. Although you still need to go to the server market to buy multiprocessor motherboards, a few more mainstream retailers also have them.
Octocontrabass wrote:
rdos wrote: If I enumerate PCI devices on one processor would I see those on the other processors too? Or are those truly local?
You'll see all of them no matter which CPU you use to enumerate them.
That's a bit bad. How do I know which processor has them connected then? The ACPI tables? It would have been far easier if they only responded to PCI enumeration when they were connected to the processor doing the enumeration.
Re: Multi processor boards
I'm confused.
Surely PCI devices connect to the PCI bus, as do processors. So both processors can communicate with a PCI device.
Re: Multi processor boards
rdos wrote: How do I know which processor has them connected then? The ACPI tables?
The ACPI tables are supposed to tell you.
rdos wrote: It would have been far easier if they only responded to PCI enumeration when they were connected to the processor doing the enumeration.
This would break backwards compatibility with existing operating systems. Multi-CPU designs have been around since the 486, back when the northbridge PCI controller was shared equally by all CPUs.
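For what it's worth, the table that carries this information for CPUs and memory is the SRAT (System Resource Affinity Table); PCI host bridges get their node from _PXM objects in the ACPI namespace, which needs an AML interpreter. Below is a rough sketch of walking the SRAT subtables, with simplified packed structs written from my reading of the spec, so verify the offsets against the revision you target. It assumes you have already located the table and that the subtable list starts right after its 48-byte fixed header.

Code:

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {                  /* type 0: processor local APIC affinity */
    uint8_t  type, length;
    uint8_t  proximity_lo;        /* proximity domain, bits 7:0  */
    uint8_t  apic_id;
    uint32_t flags;               /* bit 0: entry enabled        */
    uint8_t  sapic_eid;
    uint8_t  proximity_hi[3];     /* proximity domain, bits 31:8 */
    uint32_t clock_domain;
} srat_cpu_t;

typedef struct {                  /* type 1: memory affinity */
    uint8_t  type, length;
    uint32_t proximity;
    uint16_t reserved1;
    uint64_t base;                /* physical base of the range  */
    uint64_t size;                /* length of the range         */
    uint32_t reserved2;
    uint32_t flags;               /* bit 0: entry enabled        */
    uint64_t reserved3;
} srat_mem_t;
#pragma pack(pop)

/* Walk the SRAT subtables and record which proximity domain (NUMA node)
   each local APIC and each physical memory range belongs to. */
void srat_walk(const uint8_t *sub, uint32_t bytes)
{
    const uint8_t *end = sub + bytes;

    while (sub + 2 <= end && sub[1] >= 2 && sub + sub[1] <= end) {
        if (sub[0] == 0) {
            const srat_cpu_t *c = (const srat_cpu_t *)sub;
            uint32_t node = c->proximity_lo
                          | (uint32_t)c->proximity_hi[0] << 8
                          | (uint32_t)c->proximity_hi[1] << 16
                          | (uint32_t)c->proximity_hi[2] << 24;
            /* remember: APIC ID c->apic_id belongs to node 'node' (if enabled) */
            (void)node;
        } else if (sub[0] == 1) {
            const srat_mem_t *m = (const srat_mem_t *)sub;
            /* remember: [m->base, m->base + m->size) belongs to node m->proximity */
            (void)m;
        }
        sub += sub[1];
    }
}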
Schol-R-LEA
Re: Multi processor boards
Octocontrabass wrote:
rdos wrote: How do I know which processor has them connected then? The ACPI tables?
The ACPI tables are supposed to tell you.
rdos wrote: It would have been far easier if they only responded to PCI enumeration when they were connected to the processor doing the enumeration.
This would break backwards compatibility with existing operating systems. Multi-CPU designs have been around since the 486, back when the northbridge PCI controller was shared equally by all CPUs.
As an aside out of curiosity, does anyone know of any ISA or EISA multiprocessors back in the day?
Multi-processor systems were quite common in the workstation world starting around 1988. These were mostly M68020/30/40-based systems, such as the Sony NeWS or some of the early SGI Iris systems, or custom designs such as the BeBox with its Hobbit chips (and maybe some of the HP PA-RISC systems, I'm not sure), and earlier there were a number of custom multi-processor systems built around 8-bit and 16-bit CPUs. And of course there were the dedicated multiprocessor systems such as the Connection Machine or the InMOS Transputer cards.
However, I seem to recall that there were at least a few which used 286 or 386 CPUs (I never heard of any using 8086s or 80186s, though I vaguely recall that someone was going to do one which used NEC V30s, which were '186 clones), but I don't know of any of those which followed the PC design, which is why I am asking.
I expect that someone tried to make one, but I have no idea when or if they ever brought them to market. They certainly didn't have any real impact, and I'd say multiprocessor x86 systems in general probably didn't really go very far until Intel started adding multiprocessor support to the chips themselves (around the time the Pentium Pro came out, I think), by which time ISA and EISA were dead letters.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Multi processor boards
By "mainstream", you seem to mean "consumer-oriented". But there was a time before consumer oriented designs were mainstream. There was a time, in fact, before consumer-oriented designs.rdos wrote:Maybe for advanced servers, but not for more mainstream designs. Although you still need to go to the server market to buy multiprocessor motherboards, a few more mainstream retailers also have them.Octocontrabass wrote:Multi-CPU boards have existed longer than multi-core CPUs, and from a software perspective there's no difference between the two (except that there's usually better inter-core latency when all of the cores are inside a single CPU).
Multi-CPU systems have been a thing since servers *were* mainstream designs. Actually, since *before* servers were mainstream designs, since you need to have networks before you can have servers.
But here's the thing: as time goes on, boards with more than one processor are becoming farther and farther away from mainstream, because with time, mainstream *has* come to mean consumer, and the processing power that a single consumer needs has plateaued while the number of cores that will fit on a chip has gone *up*. So to actually need that second processor slot, you need to be higher and higher above "mainstream".
When Multi-CPU systems first showed up, a single CPU didn't even fit on a board. Then came the point where a CPU could finally be squeezed onto a board, and eventually the point where one could be squeezed onto a single chip. Sometime after that, boards with multiple processor slots started showing up, and then, finally, we got to the point where multi-core processors arrived.
Re: Multi processor boards
Schol-R-LEA wrote: As an aside out of curiosity, does anyone know of any ISA or EISA multiprocessors back in the day?
[...]
However, I seem to recall that there were at least a few which used 286 or 386 CPUs (I never heard of any using 8086s or 80186s, though I vaguely recall that someone was going to do one which used NEC V30s, which were '186 clones), but I don't know of any of those which followed the PC design, which is why I am asking.
There were some multiprocessor 386 and 486 systems that predate Intel's multiprocessor specification. They could boot and run ordinary PC software, which would use only one CPU, but required special drivers and a capable OS to use more than one. They typically had (E)ISA or MCA slots.
They're interesting from a historical perspective, but the incompatible designs make them irrelevant to an OS developer looking at modern hardware.
Re: Multi processor boards
For multiprocessor boards to be really useful, they must offload much of the traffic between cores and peripherals; otherwise you might just as well buy a multicore processor with more cores instead. I suppose that if the operating system does a good job of keeping most memory references local, handles PCI devices from the processor they are connected to, and maps their memory-mapped I/O into local memory, then inter-processor communication will not become a bottleneck.
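A sketch of what "keeping references local" might look like on the allocation side, assuming the SRAT has already been parsed into per-node free lists; current_cpu_node() and the structures here are made up for illustration, and a real allocator would of course need locking.

Code:

#include <stddef.h>

#define MAX_NODES 8

typedef struct page { struct page *next; } page_t;

/* One free list per NUMA node, filled from the SRAT memory ranges at boot. */
static page_t *free_list[MAX_NODES];

/* Assumed helper: map the executing CPU (e.g. by APIC ID) to its proximity domain. */
extern int current_cpu_node(void);

/* Hand out a physical page, preferring memory attached to the local CPU
   and falling back to remote nodes only when the local one runs dry. */
page_t *alloc_page_local(void)
{
    int local = current_cpu_node();

    for (int i = 0; i < MAX_NODES; i++) {
        int n = (local + i) % MAX_NODES;
        page_t *pg = free_list[n];
        if (pg) {
            free_list[n] = pg->next;
            return pg;
        }
    }
    return NULL;   /* out of memory on every node */
}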
Re: Multi processor boards
iansjack wrote: Surely PCI devices connect to the PCI bus, as do processors. So both processors can communicate with a PCI device.
I forgot to reply to this post earlier. PCI is a shared bus, but PCIe is a direct point-to-point connection. Each CPU will have a handful of PCIe lanes that can be connected directly to PCIe devices, and any communication from one CPU to a PCIe device connected to a different CPU will have to go through the inter-CPU connection (whatever it may be).
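(As a point of comparison with an existing OS: Linux exposes the firmware-reported locality of each PCI device through sysfs, so on a NUMA box you can check which socket a card hangs off without writing any kernel code. The device address below is just a placeholder.)

Code:

#include <stdio.h>

int main(void)
{
    /* Placeholder address; substitute the bus/device/function of the card you care about. */
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/numa_node";
    FILE *f = fopen(path, "r");
    int node;

    if (!f) {
        perror(path);
        return 1;
    }
    if (fscanf(f, "%d", &node) == 1)
        /* -1 means the firmware didn't report a node (typical on single-socket boards). */
        printf("%s -> node %d\n", path, node);
    fclose(f);
    return 0;
}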
Schol-R-LEA
Re: Multi processor boards
linguofreak wrote: When Multi-CPU systems first showed up, a single CPU didn't even fit on a board. Then came the point where a CPU could finally be squeezed onto a board, and eventually the point where one could be squeezed onto a single chip. Sometime after that, boards with multiple processor slots started showing up, and then, finally, we got to the point where multi-core processors arrived.
Interestingly, one of the first - if not the first - multiprocessor systems specifically designed for SIMD processing (that is, applying several processors to a single task, as opposed to 'merely' having more overall throughput for separate tasks) was ILLIAC IV, a gargantuan collection of 64, then later 128, multi-board minicomputers (the projected design called for four 'quadrants' of 64 CPUs each, though cost overruns led to it being cancelled before the third and fourth quadrants were completed). While it ran from 1972 to 1980, this was at the culmination of a long design phase, with the project originally put forward in 1952.
Note that, according to the Wicked-pedo article, vector processing systems were also being developed at the same time, which was the line of research which led to the Cyber and eventually Cray supercomputers.
For a brief period, the half-completed system was the fastest computer in the world. While the project was considered a failure, it did contribute to the general understanding of parallel algorithms and operational techniques, some of which are currently applied by modern systems - though mainly in GPUs rather than multi-core CPUs. Much of the work would also go into the massively parallel designs of the 1980s and early 1990s such as the Connection Machine, as well as into the Transputer add-on co-processor systems.
While SIMD, vector, and systolic array were all competing design approaches in the 1970s, modern CPUs and GPUs apply all three to varying degrees for different workloads.
According to Ted Nelson, in the 1987 commemorative edition of Computer Lib/Dream Machines, the academic community saw things such as parallel processing, distributed processing, security based on hardware capability addressing, and hardware language support (e.g., LispMs and the like) as the ways forward. As a result, it by and large ignored the rise of microprocessors until the hobby community was well established and commercial and home use of 'microcomputers' began to grow, leaving most academics blind-sided by the actual big development of the day. I can personally recall the dismissive attitude that many academics - and businesspeople coming out of the IBM sphere of influence, where mainframes were always seen as the 'real' computers and PCs were regarded as overgrown terminals even by those producing them - had towards personal computers even into the 1990s.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Multi processor boards
Massively parallel designs are interesting. If I have an ADC running at 1G samples per second, then this will cause a lot of load on the memory bus, and even a very high end processor core cannot use more than a few instructions per sample to keep up with it in realtime. However, if I distribute the load over 32 cores, then the local caches will keep the data, and I can do up to 32 times more processing per sample. I think this should scale well.
So, essentially, problems that needed extremely expensive and complicated designs in the 90s can now be handled by a mainstream server machine. The only problem is that existing operating systems like Windows and Linux "kill" the whole concept by not allowing users to monopolize cores and do direct memory access.
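Rough numbers behind that, with the sample width and core clock picked purely for illustration:

Code:

#include <stdio.h>

int main(void)
{
    /* Assumed figures, for illustration only. */
    const double sample_rate  = 1e9;    /* 1 GS/s ADC       */
    const double sample_bytes = 2;      /* 16-bit samples   */
    const double core_hz      = 3e9;    /* 3 GHz core clock */
    const int    cores        = 32;

    double stream_bytes_per_s       = sample_rate * sample_bytes;
    double cycles_per_sample_single = core_hz / sample_rate;
    double cycles_per_sample_split  = core_hz / (sample_rate / cores);
    double bytes_per_core_per_s     = stream_bytes_per_s / cores;

    printf("total stream:            %.1f GB/s\n", stream_bytes_per_s / 1e9);      /* 2.0 GB/s  */
    printf("cycles/sample, 1 core:   %.0f\n", cycles_per_sample_single);           /* 3         */
    printf("cycles/sample, %d cores: %.0f\n", cores, cycles_per_sample_split);     /* 96        */
    printf("per-core bandwidth:      %.1f MB/s\n", bytes_per_core_per_s / 1e6);    /* 62.5 MB/s */
    return 0;
}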
Re: Multi processor boards
That depends on what you're doing with the data, and how it's getting moved around. Looking at the datasheet of the second 1 GS/s ADC I saw on Mouser, I see an 8-bit part that does 250 MSPS across 4 channels, or 1000 MSPS on a single channel, output over 4 LVDS lines. At that point you're just moving 32 bits at 250 MHz, which, while not an insignificant amount of bandwidth, is still well within what you can move around on even DDR3. It's about 4 lanes of PCIe 1.0, or about 1 lane of PCIe 3.0. If you bumped it up to a full 32-bit 1 GSPS, that's about the rate of a PCIe 1.0 class graphics card, or about a quarter of a PCIe 3.0 graphics card.
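Back-of-the-envelope on those lane counts, using nominal line rates and only accounting for the line-coding overhead:

Code:

#include <stdio.h>

int main(void)
{
    /* 32 bits toggling at 250 MHz, as in the ADC example above. */
    double stream = 32.0 * 250e6 / 8.0;                /* bytes/s -> 1.0e9 */

    /* Usable per-lane throughput after line coding (nominal figures). */
    double pcie1_lane = 2.5e9 * (8.0 / 10.0) / 8.0;    /* ~250 MB/s */
    double pcie3_lane = 8.0e9 * (128.0 / 130.0) / 8.0; /* ~985 MB/s */

    printf("ADC stream:            %.2f GB/s\n", stream / 1e9);   /* 1.00 */
    printf("PCIe 1.0 lanes needed: %.1f\n", stream / pcie1_lane); /* 4.0  */
    printf("PCIe 3.0 lanes needed: %.1f\n", stream / pcie3_lane); /* 1.0  */
    return 0;
}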
Of course, what's on the other end of those LVDS links is usually going to be an FPGA or microcontroller to preprocess things a bit, but I could see a situation where you had a particularly bare-bones PCIe card that could spit it out. Whether it gets buffered to allow for access jitter or not is a quality-of-implementation issue.
As for the topic at hand, dual-processor Pentium II machines were on the mid to high end of workstations around '98 or so. I had one, Linux supported it just fine, and I ended up writing an emulator that managed to depend on it for performance. AFAIK the CPUs just sat on one PCI bus, and the usual bus-mastering negotiations let them share. Larger server boards with more than two exist, but while everything can talk to everything, they aren't all on equal footing, hence the push for NUMA support in Linux. Since everything old is new again, that part has started to apply at the workstation level again, even in the single-slot form factor, thanks to core complexes and clusters with some of the newer 8+ core chips grouping things into partitions.
If you're interested in non-x86 massively multiprocessor stuff, you might look at either Chuck Moore's GreenArrays stuff, with massive arrays of forth CPUs in a grid, or the Intellasys SEAforth 40C18 chips that it evolved away from. IIRC they are fairly explicit about peripherals only being attached to a particular core, and needing you to manually route and relay the data through the grid.
Slightly less exotic would be the things Adapteva has been working on. They kickstarted a 64-core parallel grid processor. The configuration there is a bit more approachable than the forth ones.
Amusingly, dealing with that sort of grid mess got gamified by Zachtronics in TIS-100.
Re: Multi processor boards
I've looked a lot at the market for high-speed ADCs, but it's a bit strange. It's easy to find an affordable 1 GS/s 16-bit ADC with an LVDS or JESD204B interface, but I cannot find anything that connects to PCIe. Texas Instruments has several such parts and also evaluation boards that run on PCs, but their solutions are based on slow PC connections over USB and so cannot operate in real time.
Acqiris is the only one I've found that has an ADC on a PCIe board (https://acqiris.com/#tab1), but it costs ten times too much and they don't give out the specifications that would let me write my own driver.
Hitech appears to have some solutions with FMC boards. http://www.hitechglobal.com/FMCModules/16-bit_AD-DA.htm. They seem to connect the FMC-based ADC to a high-speed network card that has an extra FMC connection. Not sure if they have specifications either.
Re: Multi processor boards
Yeah, at that rate, you're almost always going to be looking at a dedicated FPGA or other embedded setup to do some processing and reduce the actual data rate immediately, which usually means custom-ish boards. Best bet will be evaluation boards for the chips in question, though most of those are going to be aimed at connecting to an FPGA dev board, which is what the F in FMC is for.
DSP at that speed and scale isn't a great fit for your usual CPU if you want to do anything other than straight recording it to disk or something, whether there's an OS in the way or not.
AFAIK the use cases tend to fall into one of "Making an oscilloscope", "Software defined radio", "Developing new and interesting wireless communication protocols", or "Weird custom DSP". Two of those imply budgets way in excess of an average hobbyist setup. One of those is almost always going to be immediately downsampling to something more reasonable from a high carrier frequency. Not sure what else comes up, other than esoteric stuff like radio astronomy, but I'd suspect that falls mostly into a custom deal to record the signal to disk, then throwing millennia of CPU time at it after the fact. The cynical view would be that if you had the budget and knowhow to move that signal around without noise destroying it, you have the budget for a 4-5 digit price tag on the eval hardware.
Those HiTech boards are all set up to slot into FPGA based dev boards of varying capabilities, which is what that list at the bottom is for. Judging by the chips they're throwing on there, none will be cheap. The PCIe on it is likely more for configuring and powering the thing than necessarily being the primary destination for the data, though if you're configuring the FPGA you can almost certainly make your own PCIe driver for it.
Terasic may have some cheaper options, but I don't think they'll have anything for 1 GSPS, and the affordable ones will be standalone boards with the low-cost FPGA lines.