IO Questions

Posted: Tue Oct 25, 2011 1:55 pm
by cxzuk
Hi. I have some questions about IO and drivers.

The NUMA wiki page mentions IO hubs. Is PCI an example of this? Is PCI a chip within a NUMA domain?

Is PCIe basically this IO chip built into the CPU?

And is this a good definition of a driver?

A driver connects two architectures together, e.g. AC '97 to ALSA.

Thank you.

Re: IO Questions

Posted: Tue Oct 25, 2011 4:46 pm
by JackScott
cxzuk wrote:Is PCIe basically this IO chip built into the CPU?
PCI Express is exactly the same as PCI from a software point of view (it is designed to be backwards compatible). Only the hardware is different. How they managed to go from a bus topology to a star topology and still do that is beyond me, but they did.
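
To make "the same from software's point of view" concrete: the legacy configuration mechanism (IO ports 0xCF8/0xCFC) still works on PCI Express chipsets, so a sketch like the following runs unchanged on both. It assumes outl/inl port-IO helpers exist elsewhere in the kernel:

Code: Select all

#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Assumed to exist elsewhere in the kernel (e.g. inline asm wrappers
   around the IN/OUT instructions). */
extern void outl(uint16_t port, uint32_t value);
extern uint32_t inl(uint16_t port);

/* Read a 32-bit value from PCI configuration space. Works unchanged on
   PCI Express systems, because the chipset turns these accesses into
   configuration packets on the serial links. */
uint32_t pci_config_read32(uint8_t bus, uint8_t dev, uint8_t func, uint8_t offset)
{
    uint32_t address = (1u << 31)               /* "enable" bit */
                     | ((uint32_t)bus  << 16)
                     | ((uint32_t)dev  << 11)
                     | ((uint32_t)func << 8)
                     | (offset & 0xFC);         /* dword-aligned */
    outl(PCI_CONFIG_ADDRESS, address);
    return inl(PCI_CONFIG_DATA);
}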
cxzuk wrote:And is this a good definition of a driver?

A driver connects two architectures together, e.g. AC '97 to ALSA.
A driver, in its simplest form, is a piece of software that interfaces between the hardware (an AC '97 device) and application software (VLC, iTunes, etc.). It's a bit more complicated than that in the real world, due to multiple layers being used. For sound it is usually something like: sound card driver --> sound mixer (ALSA, for example) --> application software. Video is similar: applications call the GUI, and then the GUI calls a card-specific driver. Whether you call only the bottom layer a driver, or the bottom and middle layers drivers, is up to you.
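
As a sketch of that layering (all names here are hypothetical, not ALSA's real API): the bottom layer implements a small card-agnostic interface, and the mixer layer only ever talks to hardware through it:

Code: Select all

#include <stddef.h>
#include <stdint.h>

/* The interface every sound card driver implements; the mixer layer
   only ever touches hardware through these function pointers. */
struct sound_driver {
    const char *name;
    int (*set_format)(unsigned sample_rate, unsigned channels);
    int (*write_samples)(const int16_t *samples, size_t count);
};

/* The middle layer: applications hand buffers to the mixer, the mixer
   combines them and pushes the result through whichever card-specific
   driver was registered at boot. */
static struct sound_driver *active_driver;

void mixer_register_driver(struct sound_driver *drv)
{
    active_driver = drv;
}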

Re: IO Questions

Posted: Tue Oct 25, 2011 7:31 pm
by NickJohnson
I would define a driver as a piece of software that gives a device a consistent interface, which can then be used by other software. It's simply abstraction: drivers make a SCSI SSD, an ATA hard disk and a USB flash drive all look and act alike.
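
A minimal sketch of that idea (hypothetical names, not any particular OS's API): once every storage driver fills in the same structure, everything above it is device-agnostic:

Code: Select all

#include <stdint.h>

/* The consistent interface: every storage driver - SCSI, ATA, USB -
   fills in one of these, and everything above it (filesystems, disk
   caches) is completely device-agnostic. */
struct block_device {
    const char *name;
    uint64_t    sector_count;
    uint32_t    sector_size;
    int (*read_sectors)(struct block_device *dev, uint64_t lba,
                        uint32_t count, void *buffer);
    int (*write_sectors)(struct block_device *dev, uint64_t lba,
                         uint32_t count, const void *buffer);
};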

Re: IO Questions

Posted: Wed Oct 26, 2011 6:03 am
by cxzuk
My intention is to capture as much design information as I can in my code. The reason I ask about PCI is from a hardware point of view.

I think the chip I'm after is the northbridge?

Am I right in saying that, with regard to PCIe, the northbridge is now inside the CPU package?

I'm interested because I want to identify the roles all the chips play and how they interact.

Re: IO Questions

Posted: Wed Oct 26, 2011 6:15 am
by cxzuk
As for drivers: what would the sound driver and the ALSA mixer each do?

Would the sound driver just be a binding from the hardware to something useful to a program? If so, wouldn't that mean ALSA mixers would have to understand every hardware architecture?

Re: IO Questions

Posted: Wed Oct 26, 2011 6:44 am
by Combuster
A typical sound card allows a certain number of streaming channels and a certain number of looping channels, and the driver exposes them to the application.

An operating system, however, expects that multiple applications can be playing multiple bits of audio at the same time - something which can't be done directly over a lame soundcard with only a single stereo output stream. The mixer is a service that provides a practically unlimited number of sound channels on top of hardware that can only do a limited number.
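
The core of such a mixer is small: sum the active channels into one stream and saturate the result so it still fits in a sample. A minimal sketch, with made-up names:

Code: Select all

#include <stdint.h>

/* Mix 'channel_count' 16-bit mono streams into one output buffer.
   Samples are summed in 32 bits, then saturated back to 16 bits so
   that loud passages clip cleanly instead of wrapping around. */
void mix_channels(const int16_t *channels[], int channel_count,
                  int16_t *out, int frames)
{
    for (int i = 0; i < frames; i++) {
        int32_t sum = 0;
        for (int c = 0; c < channel_count; c++)
            sum += channels[c][i];
        if (sum >  32767) sum =  32767;
        if (sum < -32768) sum = -32768;
        out[i] = (int16_t)sum;
    }
}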

ALSA is an architecture. It does not just cover the drivers, but also provides a generic system call interface, emulations of other sound interfaces, libraries in userland to use those interfaces, and several applications that deal with installation and configuration.

Re: IO Questions

Posted: Wed Oct 26, 2011 1:05 pm
by Brendan
Hi,
cxzuk wrote:The NUMA wiki page mentions IO hubs. Is PCI an example of this?
For an example, a NUMA system could look like this:
[Image: diagram of the Tyan Transport VX50's topology]

This is actually a diagram of a real Tyan Transport VX50 system (using Opterons). The "AMD 8111" and the "nVidia CK04" are both IO hubs, and both include PCI host controllers. The "groups of vertical bars" to the left and right are RAM chips.

For a system like this, the device drivers for hardware attached to the "AMD 8111" may run better when they're running on the bottom left CPU, and device drivers for hardware attached to the "nVidia CK04" may run better when they're running on the bottom right CPU. Traffic from one of the top CPUs has to go over 3 different hyper-transport links to get to the bottom (where the IO hubs are), which would increase latency and also consume bandwidth on all those links (which can affect other traffic); so the top CPUs would be the worst CPUs to run device drivers on.
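
An OS could encode that preference as a simple distance lookup: given the NUMA domain a device's IO hub hangs off, pick the CPU whose domain is cheapest to reach. A sketch, assuming a SLIT-style distance matrix has already been filled in at boot:

Code: Select all

#include <stdint.h>

#define MAX_DOMAINS 8

/* distance[a][b] is the relative cost of domain 'a' accessing domain
   'b' (e.g. taken from the ACPI SLIT table, filled in at boot). */
extern uint8_t distance[MAX_DOMAINS][MAX_DOMAINS];
extern int cpu_domain[];    /* NUMA domain each CPU belongs to */
extern int cpu_count;

/* Pick the CPU closest to the NUMA domain a device's IO hub is in;
   pinning the driver's thread there avoids crossing extra links. */
int best_cpu_for_device(int device_domain)
{
    int best = 0;
    for (int cpu = 1; cpu < cpu_count; cpu++)
        if (distance[cpu_domain[cpu]][device_domain] <
            distance[cpu_domain[best]][device_domain])
            best = cpu;
    return best;
}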


Cheers,

Brendan

Re: IO Questions

Posted: Wed Oct 26, 2011 2:32 pm
by cxzuk
Combuster wrote:A typical sound card allows a certain number of streaming channels and a certain number of looping channels, and the driver exposes them to the application.

An operating system, however, expects that multiple applications can be playing multiple bits of audio at the same time - something which can't be done directly over a lame soundcard with only a single stereo output stream. The mixer is a service that provides a practically unlimited number of sound channels on top of hardware that can only do a limited number.

ALSA is an architecture. It does not just cover the drivers, but also provides a generic system call interface, emulations of other sound interfaces, libraries in userland to use those interfaces, and several applications that deal with installation and configuration.
I know very little about this subject, so your comment is good food for thought.



i must say, i was under the assumption that channels where what was taken

Re: IO Questions

Posted: Tue Nov 01, 2011 6:11 am
by cxzuk
Sorry, I was typing on my phone and it seems to have scrambled my last message.

I think the difference in our understanding, judging from the replies so far, is that I believe a greater understanding of the collaborating objects is needed.

@Brendan:
Thank you for the Tyan Transport VX50 picture.

I think the "CPU" actually hides many other objects beneath it: most likely the jobs of the northbridge (a memory controller?) and southbridge (a HyperTransport controller?), and most definitely caches and cores. Cores are again built up from other objects such as ALUs, FPUs and MMUs.

As for the AMD 8111: is it an IO hub because it contains many controllers? I'm unsure about the PCI and LPC - while they are buses, they must have controllers, surely?

As for device drivers on the Tyan: would the top left CPU be informed of (i.e. detect) the presence of the AMD 8111? Would even the bottom left CPU be informed?

The same question goes for the top left CPU detecting other CPUs. Will it be informed of the connections to the other three CPUs - all of them, or none? I think understanding their internals and what controllers they contain is key to this, but I'm unsure.

I currently think I will need to specify some CPU information about its ports and the objects it contains. Then I would explore these ports (which I think are actually just controllers inside the CPU) on initialisation to see which controllers I am connected to. Once I have them, I initialise those controllers and then explore what each controller is connected to, and so on.

Mike Brown

Re: IO Questions

Posted: Tue Nov 01, 2011 11:14 pm
by Brendan
Hi,
cxzuk wrote:I think the "CPU" actually hides many other objects beneath it: most likely the jobs of the northbridge (a memory controller?) and southbridge (a HyperTransport controller?), and most definitely caches and cores. Cores are again built up from other objects such as ALUs, FPUs and MMUs.
It's probably easier to think of it as a hierarchical tree of "things", where all NUMA domains are children of the root (regardless of the relationships between NUMA domains) and everything else is a descendant of one of the NUMA domains.

For example:

Code: Select all

Computer
  |
  |__ NUMA Domain #0
  |    |__ CPU #0
  |    |__ CPU #1
  |    |__ Memory bank #0
  |    |__ PCI Host Controller #0
  |         |__ SATA controller #0
  |         |    |__ Hard Disk #0
  |         |    |__ Hard Disk #1
  |         |__ SATA controller #1
  |         |    |__ CD_ROM #0
  |         |__ PCI to PCI bridge
  |              |__ USB controller #0
  |              |    |__ Keyboard
  |              |    |__ Mouse
  |              |__ Ethernet card #0
  |              |__ PCI to LPC Bridge
  |                   |__ PIC
  |                   |__ PIT
  |                   |__ Floppy disk controller
  |                        |__ Floppy disk #0
  |
  |__ NUMA Domain #1
  |    |__ CPU #2
  |    |__ CPU #3
  |    |__ Memory bank #1
  |    |__ PCI Host Controller #1
  |         |__ Video card #0
  |         |    |__ Monitor #0
  |         |__ Video card #1
  |         |    |__ Monitor #1
  |         |__ USB controller #1
  |         |    |__ Flash memory stick #0
  |         |    |__ Flash memory stick #1
  |         |__ USB controller #2
  |
  |__ NUMA Domain #2
  |    |__ CPU #4
  |    |__ CPU #5
  |    |__ Memory bank #2
  |
  |__ NUMA Domain #3
       |__ CPU #6
       |__ CPU #7
       |__ Memory bank #3
Notes:
  • NUMA domains may have zero or more CPUs, zero or more memory banks and zero or more "IO hubs". For example, it's entirely possible for a NUMA domain to have memory banks and IO hubs but no CPUs; or CPUs but no memory or IO hubs; or any other combination.
  • This is only an example I made up. It includes devices connected to devices that are connected to devices that are connected to NUMA domains; because this type of tree is needed by an OS for things like power management later anyway. For example, it would be bad for an OS to put "USB controller #0" to sleep and then expect to be able to talk to the mouse.
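
One straightforward way to represent such a tree in a kernel is a node with a type tag, a parent pointer and a sibling-linked child list; everything from NUMA domains down to floppy drives becomes a node. A sketch (the type list is illustrative, not exhaustive):

Code: Select all

struct hw_node {
    enum {
        HW_COMPUTER, HW_NUMA_DOMAIN, HW_CPU, HW_MEMORY_BANK,
        HW_PCI_HOST, HW_PCI_BRIDGE, HW_DEVICE
    } type;
    const char     *name;           /* e.g. "USB controller #0" */
    struct hw_node *parent;         /* NULL for the "Computer" root */
    struct hw_node *first_child;
    struct hw_node *next_sibling;   /* children form a linked list */
};

/* Walking up the parent pointers answers "which NUMA domain is this
   device in?"; walking down the children tells power management what
   must stay awake (e.g. don't sleep USB controller #0 while the mouse
   attached to it is in use). */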
cxzuk wrote:As for device drivers on the Tyan: would the top left CPU be informed of (i.e. detect) the presence of the AMD 8111? Would even the bottom left CPU be informed?
I don't think "informed" is the right word. Real hardware is more like a system of routers.

For example (for the tree above), CPU #1 might write to physical address 0x87654321; something in NUMA Domain #0 determines that the write should be forwarded to NUMA Domain #1. NUMA Domain #1 looks at the address and decides the write should be forwarded to "PCI Host Controller #1". That PCI host controller looks at the address and decides the write should be forwarded to "PCI bus 1", and something on "PCI bus 1" (e.g. Video card #0) accepts that write. The only thing that knows what is at address 0x87654321 is the video card itself - everything else only knows where to forward it.

When the computer first starts, hardware/firmware is responsible for setting up the routing. For example, for AMD/hyper-transport there's a negotiation phase where each "agent" discovers whether each of its hyper-transport links is connected to anything; then firmware does things like RAM detection, etc. and configures the routing in each NUMA domain. Firmware is also responsible for configuring PCI host controllers and PCI bridges so that they route requests within certain ranges to the correct PCI buses; and for initialising the BARs in PCI devices to tell each device which accesses it should accept.
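
Conceptually, every hop does the same job: compare the address against a table of ranges and forward the access to whichever link owns the matching range. A toy model of one such router (not any real chipset's registers):

Code: Select all

#include <stdint.h>

struct route {
    uint64_t base;    /* first physical address of the range */
    uint64_t limit;   /* last physical address of the range  */
    int      link;    /* outgoing link to forward onto       */
};

/* Each NUMA domain's router (and each PCI bridge) holds a table like
   this, programmed by firmware at boot. Note that the router never
   knows *what* is at the address - only where to send it next. */
int route_access(const struct route *table, int entries, uint64_t addr)
{
    for (int i = 0; i < entries; i++)
        if (addr >= table[i].base && addr <= table[i].limit)
            return table[i].link;
    return -1;    /* no route - the access is aborted */
}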
cxzuk wrote:The same question goes for the top left CPU detecting other CPUs. Will it be informed of the connections to the other three CPUs - all of them, or none? I think understanding their internals and what controllers they contain is key to this, but I'm unsure.
Ignoring caching (controlled by MTRRs, etc.), each CPU knows nothing. It forwards all accesses (reads, writes) to something in its NUMA domain that is responsible for routing.

In practice (for AMD/hyper-transport and Intel/QuickPath) you might have a single chip that contains one or more CPUs and a "memory controller" (which is the part that handles the routing), which is connected to one or more (hyper-transport or QuickPath) links and also connected to RAM slots. For example, a single chip might look like this:

Code: Select all

         ----------
 CPU --- |        | --- link #0
 CPU --- | ROUTER |
 CPU --- |        | --- link #1
         ----------
              |
              |
          RAM Slots
Each of these links might be connected to other "Router + CPUs" chips or connected to an IO hub (e.g. "AMD 8111"), like in the Tyan Transport VX50 diagram.

However, this "single chip containing router/memory controller and CPUs" is only how AMD and Intel have been doing it lately. There are 80x86 NUMA systems (from before hyper-transport/QuickPath) where the routing is done by the chipset alone (no "on chip" routing); and there are also larger 80x86 NUMA systems that use special routers in the chipset in addition to the "on chip" routers.

Also don't forget that even for "single chip containing router/memory controller and CPUs" nothing says there has to be RAM present in the RAM slots (you can have NUMA domains with no memory), and (at least for AMD) you can get special chips that don't have any CPUs in them and are mostly used to increase the number of RAM slots (you can have NUMA domains with RAM and no CPUs).

For detection (on 80x86), you'd use the ACPI "SRAT" table to determine which NUMA domain/s CPUs and areas of RAM are in; and the ACPI "SLIT" table to determine the relationships between NUMA domains (represented as a table containing the relative cost of accessing "domain X" from "domain Y"). Unfortunately, there isn't a standardised way to determine which NUMA domains IO hubs and/or devices are in or where they're connected - if you go that far, then you need to resort to specialised code (e.g. different pieces of code for different chipsets and/or CPUs that extracts the information from whatever happens to be used as the "router" in each NUMA domain); or perhaps just let the user configure it instead of auto-detecting.
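
For reference, the SRAT's body is a sequence of variable-length entries after the table header; type 0 describes a CPU's proximity domain and type 1 describes a range of RAM. A minimal parsing sketch (struct layouts abridged from the ACPI specification, error handling omitted):

Code: Select all

#include <stdint.h>

/* Header shared by every entry in the SRAT's body. */
struct srat_entry {
    uint8_t type;      /* 0 = CPU affinity, 1 = memory affinity */
    uint8_t length;    /* total size of this entry in bytes */
} __attribute__((packed));

struct srat_cpu_affinity {         /* type 0, 16 bytes */
    struct srat_entry header;
    uint8_t  proximity_domain_lo;  /* NUMA domain (low 8 bits) */
    uint8_t  apic_id;              /* which CPU this entry describes */
    uint32_t flags;                /* bit 0 set = entry is enabled */
    uint8_t  sapic_eid;
    uint8_t  proximity_domain_hi[3];
    uint32_t clock_domain;
} __attribute__((packed));

struct srat_mem_affinity {         /* type 1, 40 bytes */
    struct srat_entry header;
    uint32_t proximity_domain;     /* NUMA domain of this RAM range */
    uint16_t reserved1;
    uint64_t base_address;
    uint64_t length;
    uint32_t reserved2;
    uint32_t flags;                /* bit 0 set = entry is enabled */
    uint64_t reserved3;
} __attribute__((packed));

/* Walk the entries between 'body' and 'end', dispatching on type. */
void parse_srat_body(const uint8_t *body, const uint8_t *end)
{
    while (body < end) {
        const struct srat_entry *e = (const struct srat_entry *)body;
        if (e->length == 0)
            break;                 /* malformed table - stop */
        if (e->type == 0) { /* record CPU -> domain mapping here */ }
        if (e->type == 1) { /* record RAM range -> domain mapping here */ }
        body += e->length;
    }
}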


Cheers,

Brendan

Re: IO Questions

Posted: Wed Nov 02, 2011 12:30 pm
by cxzuk
Thank you for that reply; my first thought is "ugh".

From your diagram, I believe the NUMA domain nodes must be the routing chips, and I feel we should have this information in our tree. If you consider connecting RAM to PCI slot 1, the PCI controller is acting just like a NUMA domain - because the whole reason for the latency is the routing?

I'm very interested in this auto-routing bootstrapping stuff, but I feel it's for another time. I don't know much about ACPI, but it currently looks awful :|

Would I be able to manually program these objects, and at a later date get this information automatically, via ACPI etc.?

Thank you.

Re: IO Questions

Posted: Wed Nov 02, 2011 11:50 pm
by Brendan
Hi,
cxzuk wrote:From your diagram, I believe the NUMA domain nodes must be the routing chips, and I feel we should have this information in our tree. If you consider connecting RAM to PCI slot 1, the PCI controller is acting just like a NUMA domain - because the whole reason for the latency is the routing?
The latency would be from the routing and from the link/s themselves. For example, the routing might be very fast, but if the link is under heavy load (congested) then it could take a while to transfer something across it.
cxzuk wrote:Would I be able to manually program these objects, and at a later date get this information automatically, via ACPI etc.?
Using ACPI, you can find out which NUMA domain CPUs and areas of memory are in (e.g. from the SRAT table), and the relative costs between NUMA domains (e.g. from the SLIT table). Most of this information depends on how the hardware is designed. For example, you can't shift a CPU from one NUMA domain to another with software. The only thing you could change is how things get mapped in the physical address space; but it's hard to see an advantage in doing that (given that you mostly only care about the virtual address space and can use paging to map anything anywhere in the virtual address space), and it'd be relatively complex to do (for example, you'd have to make sure you don't shift RAM that your code is relying on, as that would break your code).

For IO hubs (where ACPI doesn't give you information about which NUMA domain the IO hubs and devices attached to it are in), you'd have to extract the information from hardware itself. This can't be done in a standard way - you'd need different code for different variations of "AMD", different code for different variations of "Intel", plus different code for different chipsets.

Also note that in all of these cases you need fall-backs. For example, an OS has to be able to work when it doesn't know which NUMA domain IO hubs are in, in case it's unable to find out. In the same way, an OS has to be able to work when it doesn't know the relative costs between NUMA domains (e.g. there's no SLIT), and when it doesn't know there are NUMA domains or what their contents are (there's no SRAT). In all these cases, "lack of information" just means the OS can't optimise things as well - it still works correctly, but is a little less efficient.

For this reason, your "hardware tree" should probably have a dummy "unknown" NUMA domain. For example, the tree from before could be like this:

Code: Select all

Computer
  |
  |__ NUMA Domain UNKNOWN
  |    |__ PCI Host Controller #0
  |    |    |__ SATA controller #0
  |    |    |    |__ Hard Disk #0
  |    |    |    |__ Hard Disk #1
  |    |    |__ SATA controller #1
  |    |    |    |__ CD_ROM #0
  |    |    |__ PCI to PCI bridge
  |    |         |__ USB controller #0
  |    |         |    |__ Keyboard
  |    |         |    |__ Mouse
  |    |         |__ Ethernet card #0
  |    |         |__ PCI to LPC Bridge
  |    |              |__ PIC
  |    |              |__ PIT
  |    |              |__ Floppy disk controller
  |    |                   |__ Floppy disk #0
  |    |__ PCI Host Controller #1
  |         |__ Video card #0
  |         |    |__ Monitor #0
  |         |__ Video card #1
  |         |    |__ Monitor #1
  |         |__ USB controller #1
  |         |    |__ Flash memory stick #0
  |         |    |__ Flash memory stick #1
  |         |__ USB controller #2
  |
  |__ NUMA Domain #0
  |    |__ CPU #0
  |    |__ CPU #1
  |    |__ Memory bank #0
  |
  |__ NUMA Domain #1
  |    |__ CPU #2
  |    |__ CPU #3
  |    |__ Memory bank #1
  |
  |__ NUMA Domain #2
  |    |__ CPU #4
  |    |__ CPU #5
  |    |__ Memory bank #2
  |
  |__ NUMA Domain #3
       |__ CPU #6
       |__ CPU #7
       |__ Memory bank #3
Or it could look like this:

Code: Select all

Computer
  |
  |__ NUMA Domain UNKNOWN
       |__ CPU #0
       |__ CPU #1
       |__ Memory bank #0
       |__ CPU #2
       |__ CPU #3
       |__ Memory bank #1
       |__ CPU #4
       |__ CPU #5
       |__ Memory bank #2
       |__ CPU #6
       |__ CPU #7
       |__ Memory bank #3
       |__ PCI Host Controller #0
       |    |__ SATA controller #0
       |    |    |__ Hard Disk #0
       |    |    |__ Hard Disk #1
       |    |__ SATA controller #1
       |    |    |__ CD_ROM #0
       |    |__ PCI to PCI bridge
       |         |__ USB controller #0
       |         |    |__ Keyboard
       |         |    |__ Mouse
       |         |__ Ethernet card #0
       |         |__ PCI to LPC Bridge
       |              |__ PIC
       |              |__ PIT
       |              |__ Floppy disk controller
       |                   |__ Floppy disk #0
       |__ PCI Host Controller #1
            |__ Video card #0
            |    |__ Monitor #0
            |__ Video card #1
            |    |__ Monitor #1
            |__ USB controller #1
            |    |__ Flash memory stick #0
            |    |__ Flash memory stick #1
            |__ USB controller #2
That last "hardware tree" might be for a NUMA system where you couldn't get any NUMA information; but it could also be for an SMP system where there's no NUMA at all.


Cheers,

Brendan