Hi,
Combuster wrote:Brendan wrote:Combuster wrote:I'd add sparse port sets to the list. Many ISA devices use ports in the style of 0x??nn, where n is constant, and only the top x bits change. To make matters worse, some PCI devices do the exact same thing, and do not report that in the BARs (and can therefore thoroughly screw up your system when something locates a set of ports and they start overlapping - I even know of a well known VM that emulates one such card).
Some old ISA devices only decode the least significant 10 bits of an I/O port address, so that (for example) if the device uses I/O ports from 0x0200 to 0x0208 you get annoying shadows (at I/O ports 0x0600 to 0x0608, 0x0A00 to 0x0A08, 0x0E00 to 0x0E08, etc). In this case the device driver itself doesn't need access to the shadow ranges (e.g. it would only need to use I/O ports from 0x0200 to 0x0208).
Also, for PCI devices that support a legacy/ISA interface and a native interface, you'd expect that a native device driver written specifically for the device would use the device's native interface rather than worrying about the legacy/ISA interface.
I know those, and they do not form a problem with your system. However, there is the set of ISA (compatible) cards that decode the full address, and use the top bits as an offset while the bottom bits are constant (most likely a compensation for having register files that cooperate with the mirrored register files you mentioned).
As an example:
S3 Trio/Virge series of cards. MMIO is barely covered, and in fact not enough to make it work. And even the X.org drivers use the sparse register file for some cards that have PCI versions (including the one that VirtualPC emulates)
Oh my - unfortunately you're right.
I've got a book here ("Programmer's Guide to the EGA, VGA, and Super VGA Cards (3rd Edition)") which has an (unfortunately partial) description of S3 video card registers. This book says that setting bit 4 in CR53 (I/O port 0x03D4, index 0x53) enables memory mapped I/O mode, where the extended registers become memory mapped like this:
- I/O port 0x8?E8 accessible via memory address 0x000A8?E8
- I/O port 0x9?E8 accessible via memory address 0x000A9?E8
- I/O port 0xA?E8 accessible via memory address 0x000AA?E8
- I/O port 0xB?E8 accessible via memory address 0x000AB?E8
In addition, pixel data can be written to the memory area from 0x000A0000 to 0x000A7FFF (instead of using the "Pixel Data Transfer Registers" - I/O ports 0xE2E8 and 0xE2EA).
It also says that "I/O port 0xBEE8 index 0x0F, Read Register Select Register" is the only extended register that can't be memory mapped, and that the memory mapped registers are "write-only". This shouldn't be a problem though (keep a copy of the video card's registers in RAM so that you can read from RAM instead of reading from the device's registers directly).
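For illustration, this is roughly what that might look like from the driver's side (a minimal sketch based on the book's description above; the helper names, the shadow array, and the assumption that the legacy 0x000A0000 area is mapped one-to-one are all mine, and outb()/inb() are whatever port I/O wrappers your kernel provides):
Code: Select all
#include <stdint.h>

extern void outb(uint16_t port, uint8_t value);   /* assumed port I/O wrappers */
extern uint8_t inb(uint16_t port);

static uint16_t reg_shadow[0x10000];              /* RAM copy of the write-only registers */

/* Set bit 4 of CR53 (index 0x53 via the CRTC index/data ports at 0x03D4/0x03D5)
   to enable memory mapped I/O, assuming the S3 extended registers are unlocked */
static void s3_enable_mmio(void)
{
    outb(0x03D4, 0x53);
    outb(0x03D5, inb(0x03D5) | 0x10);
}

/* I/O port 0x8?E8..0xB?E8 becomes memory address 0x000A0000 + port */
static void s3_write_reg(uint16_t port, uint16_t value)
{
    reg_shadow[port] = value;                     /* remember what we wrote... */
    *(volatile uint16_t *)(uintptr_t)(0x000A0000u + port) = value;
}

static uint16_t s3_read_reg(uint16_t port)
{
    return reg_shadow[port];                      /* ...so reads never touch the write-only hardware */
}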
It seems to me that the problem is the poor quality of available documentation for these cards, rather than the cards' actual abilities.
A more significant problem with these cards (and similar cards) is that it's impossible for the BIOS to know which I/O ports the device is using, and therefore impossible for the BIOS to avoid assigning these I/O ports to a different PCI card (e.g. a sound card); and likewise it's impossible for the BIOS to correctly configure any PCI to PCI bridges to forward I/O port access to the correct PCI bus segment (to ensure I/O port accesses that are meant to be forwarded to the video card actually are forwarded to the video card). Basically it breaks all of the "plug and play" principles of the PCI bus.
Because of this, IMHO memory mapping the extended registers is the only way this video card can be used safely (and even then you'd need to make sure that the legacy VGA I/O ports and legacy VGA "display memory window" addresses are forwarded to the video card).
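For the bridge configuration part, a rough sketch of what I mean (the pci_read16()/pci_write16() helpers are assumed and not part of any particular API; the Bridge Control register and its VGA Enable bit come from the PCI-to-PCI bridge specification):
Code: Select all
#include <stdint.h>

#define PCI_BRIDGE_CONTROL   0x3E          /* Bridge Control register, type 1 header */
#define PCI_BRIDGE_CTL_VGA   (1 << 3)      /* forward legacy VGA I/O and 0xA0000-0xBFFFF */

extern uint16_t pci_read16(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t offset);
extern void pci_write16(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t offset, uint16_t value);

/* Call this for every PCI to PCI bridge on the path between the host bridge
   and the video card, so legacy VGA accesses actually reach the card */
static void bridge_enable_vga_forwarding(uint8_t bus, uint8_t dev, uint8_t fn)
{
    uint16_t ctl = pci_read16(bus, dev, fn, PCI_BRIDGE_CONTROL);
    pci_write16(bus, dev, fn, PCI_BRIDGE_CONTROL, ctl | PCI_BRIDGE_CTL_VGA);
}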
Of course you already knew this; however, I still don't think that such an appalling mess justifies sparse port sets - if sparse I/O ports are necessary for a specific device, then that device should go on the OS's "currently not supported by this OS, and never will be supported by this OS" hardware compatibility blacklist (where if the device is detected during boot, the OS panics with a "Rip that skanky piece of crud out of your computer NOW!" error message).
Combuster wrote:Combuster wrote:To prevent repeated context switches just to push some polygons through a hardware FIFO? (especially when the device doesn't have DMA access to command buffers and you can only have two commands queued at any one time)
Edit: and before you come with the security argument - consider the case where the app has exclusive use of the screen device, and you give it only access to the command fifo (and not the CRTC regs to blow up the monitor, or DMA regs to screw with your memory, and so on)
One of the main goals of having device drivers is to prevent the need for applications to contain lots and lots of code to support lots and lots of devices themselves. Allowing a process to access a video card's command queue tends to defeat the purpose of having a device driver.
To avoid repeated context switches (e.g. for polygon drawing), a process can give the video driver an entire list of polygons that need to be drawn, then do one context switch to the video driver (which draws all polygons in the list).
In my design, the driver will be able to download itself into a process, so that those functions can be called directly, as an optimisation sacrificing security for speed (and if you take care selecting the proper subset of controls, security may not even be an issue).
And there's still the FIFO problem you have yet to address (for the demonstration, the FIFO can hold one command; in real life the FIFO holds ~2 while a third is executed):
Code: Select all
Naive approach:      Batch processing:     Direct access:
create               create                create
-switch-             create                render
render               create                create
-switch-             -switch-              render
create               render                create
-switch-             -wait-                render
render               render                create
-switch-             -wait-                render
create               render                create
.                    -switch-              .
.                    .                     .
.                    .                     .

2x switch / cmd      2x switch / n cmds    no overhead
                     lockstep delays per   (cycle waste can
                     n cmds                 reach zero)
                     (cpu cycles wasted)
True, that doesn't hold with the recent cards that have comparatively huge FIFOs and/or DMA commands.
Code: Select all
Elite processing:
create
render (using GPU)
create
render (using GPU)
create
render (using CPU - FIFO full)
create
render (using GPU)
create
render (using CPU - FIFO full again)
Better performance than just using the GPU alone (or the CPU alone), with no unnecessary context switching...
More seriously, AFAIK for most video cards the FIFOs are meant to be IRQ driven. For example, when the FIFO becomes empty enough the video card generates an IRQ and the IRQ handler shoves more commands into the FIFO. The idea is that the CPU spends very little time handling the IRQ and is therefore free to spend most of its time doing other useful work while the video card does its work in parallel. In this case a micro-kernel design does give you lots of context switches, but this is usually true for all IRQ driven devices (floppy, serial, hard disk, etc). When you decide to make a micro-kernel you decide to sacrifice performance for things like security/reliability (which IMHO is a fair compromise), and this is no different (but the same idea of getting the video card to do work in parallel still holds, and avoiding the graphics acceleration and the context switches will probably make performance worse).
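For example, the refill part might look something like this (a rough sketch only; the command queue, the register pointers and the "free slots" test are invented for illustration and don't match any real card):
Code: Select all
#include <stdint.h>

#define QUEUE_SIZE 1024

static uint32_t command_queue[QUEUE_SIZE];    /* commands waiting in RAM */
static unsigned queue_head, queue_tail;

static volatile uint32_t *fifo_port;          /* hypothetical MMIO register: write commands here */
static volatile uint32_t *fifo_free_slots;    /* hypothetical MMIO register: entries the FIFO will accept */

/* Called from the video card's IRQ handler when the FIFO drains below its threshold */
void video_irq_refill_fifo(void)
{
    /* push queued commands until the hardware FIFO is full or the queue is empty */
    while (queue_head != queue_tail && *fifo_free_slots > 0) {
        *fifo_port = command_queue[queue_head];
        queue_head = (queue_head + 1) % QUEUE_SIZE;
    }
    /* then return - the card raises another IRQ when it wants more */
}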
Combuster wrote:Of course I'd go a step further - I'm planning to use resolution independence and colour depth independence, so that normal processes don't need to care about these device specific (video mode specific?) details either.
Exokernel here.
*feeds output to /dev/blasphemy*
*ducks and runs*
Ok, let's think about this... How about an example?
There are 4 monitors and 3 video cards. The first monitor is connected to the first video card, which is an old S3 running at 800 * 600 * 15-bpp. The second monitor is connected to onboard Intel video and is running at 1024 * 768 * 16-bpp. The third and fourth monitors are connected to a nice ATI video card (with dual output), and the third monitor is running at 1600 * 1200 * 32-bpp while the fourth monitor is running at 1280 * 1024 * 32-bpp.
The 4 monitors are arranged in a square, like this:
Code: Select all
+-----+-----+
| 1 | 2 |
+-----+-----+
| 3 | 4 |
+-----+-----+
The 4 monitors are setup as a single virtual screen, and act like one big monitor.
You want to draw a green rectangle in the center of the virtual screen. To do this you create a "video script" that describes the virtual co-ordinates for the top left corner and the virtual co-ordinates for the bottom right corner of the rectangle, and the rectangle's colour. You send this script to a "virtual screen" layer, which creates 4 copies of this script (one for each monitor) and clips them, so that each of the 4 new scripts describes a quarter of the original script/rectangle. Then these 4 new scripts are sent to the corresponding video device drivers, and each video driver renders its script (including converting from virtual co-ordinates into video mode specific co-ordinates, and converting colours from the standardized representation into the video mode's pixel format).
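For example, the clipping step in the "virtual screen" layer might look something like this (a very rough sketch; the script format, the structures and the send_script callback are all invented for illustration):
Code: Select all
#include <stdint.h>

typedef struct {
    int32_t x1, y1, x2, y2;    /* rectangle in virtual screen co-ordinates */
    uint32_t colour;           /* device independent colour */
} script_cmd_t;

typedef struct {
    int32_t x, y;              /* monitor's position within the virtual screen */
    int32_t width, height;     /* monitor's area, in virtual co-ordinates */
    void (*send_script)(const script_cmd_t *cmds, int count);   /* hand off to that monitor's video driver */
} monitor_t;

/* Clip one command against one monitor and forward whatever is left */
static void clip_and_send(const script_cmd_t *cmd, const monitor_t *mon)
{
    script_cmd_t clipped = *cmd;

    if (clipped.x1 < mon->x) clipped.x1 = mon->x;
    if (clipped.y1 < mon->y) clipped.y1 = mon->y;
    if (clipped.x2 > mon->x + mon->width)  clipped.x2 = mon->x + mon->width;
    if (clipped.y2 > mon->y + mon->height) clipped.y2 = mon->y + mon->height;

    if (clipped.x1 < clipped.x2 && clipped.y1 < clipped.y2)
        mon->send_script(&clipped, 1);   /* driver converts to its own mode's co-ordinates and pixel format */
}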
Of course if there's only one video card and one monitor, the "virtual screen" layer can be skipped entirely (the script from the GUI/application can go directly to the video driver); and one green rectangle is fairly boring (but the exact same idea would work for complex 3D scenes, or windows, dialog boxes, icons, etc, complete with full hardware acceleration where possible); and there's no reason the same script couldn't be sent to a printer or something else instead (or stored "as is" in a file for later, or maybe sent to a ray-caster to create an insanely high-quality image rather than an acceptable "real time" image).
To me, this sort of arrangement is both flexible and elegant.
Now, try describing how something like this would work when the application is plotting pixels directly into the video card's (or cards') frame buffer (and please realize that I deliberately didn't say that all of the video cards are in the *same* computer)...
Cheers,
Brendan