pci bus mastering for video card

999999999
Posts: 6
Joined: Wed Jan 07, 2009 1:02 pm

pci bus mastering for video card

Post by 999999999 »

Hi everyone!
I have been spending some time on these forums, and let me say that they are the best source of information for anyone attempting to develop an operating system!
I am an Italian student and I study physics at university, but I have been a hobby programmer for the past 10 years.
Some time ago I started a sort of "operating system". So far I have not done much more than a bootloader, but my real goal is not an OS in the usual sense of the word. I am planning to create a single-tasking operating system core (32-bit protected mode): my aim is to have complete control of the hardware so that I can use 100% of its resources (no other processes running). I would like to use this system for computer simulations and "heavy" calculations (e.g. complex multi-particle systems). So I am more interested in developing specific hardware drivers than a complete kernel with memory management, IPC, etc. I don't need generic drivers, since I can choose the peripherals I want to use! But I do need drivers for those, and here is my problem.

So far I have an alpha PS/2 keyboard driver, and from reading this forum I have concluded that if I dedicated some time to it, I could develop USB and SATA drivers, since documentation for them seems to be available on the web. I could also get a specific Intel chipset to have full access to its open manuals. A few weeks ago I began investigating PCI device enumeration.

At the moment I have a very simple video driver that works with a linear framebuffer (set up during boot using VESA 2.0) and emulates a text console. I have tried various VESA modes, but I am interested in 1024x768 or 1280x1024. The PCs I have used for testing support both 24-bit and 32-bit modes, but of course I prefer the latter for performance. I have not done any precise benchmarks, but a full-screen blit (RAM to framebuffer) is quite slow at high resolutions. I have read everything I could find here and have spent months searching the web with no result. I understand that the problem is related to the bus transfer speed, etc.
I am aware of burst writes on PCI, cache types, etc. In fact, the only useful thing I found is setting the cache type of the framebuffer memory area to "write combining" using the MTRRs, and it has really improved speed A LOT!!! It is still not great, though, because the main problem is that even when I get a good 30 FPS, the CPU is always busy transferring data over the bus! I have read that modern video cards support bus mastering, and I am pretty sure that even the graphics controller embedded in one of my Intel chipsets supports it...
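As a rough sketch of the MTRR step mentioned above, the C below computes the values for one variable-range MTRR pair (IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn, MSRs 0x200/0x201 for n = 0) that mark a framebuffer as write-combining. It assumes the range size is a power of two and the base is aligned to it (a 3 MiB 1024x768x32 framebuffer would have to be covered as 4 MiB or split across two MTRRs), and that the physical address width comes from CPUID leaf 0x80000008. The helper name is hypothetical, and actually writing the MSRs (ring 0 `wrmsr`, choosing a free pair, and following the Intel SDM's MTRR update sequence with caches disabled and flushed on every CPU) is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSRs for the first variable-range MTRR pair (Intel SDM). */
#define IA32_MTRR_PHYSBASE0 0x200u
#define IA32_MTRR_PHYSMASK0 0x201u

#define MTRR_TYPE_WC    0x01u          /* memory type: write-combining */
#define MTRR_MASK_VALID (1ull << 11)   /* V bit in IA32_MTRR_PHYSMASKn */

/* Hypothetical helper: compute the PHYSBASE/PHYSMASK values that mark
 * [base, base + size) as write-combining.
 * Assumes size is a power of two >= 4 KiB, base is aligned to size, and
 * phys_bits (MAXPHYADDR, CPUID 0x80000008 EAX[7:0]) is below 64.
 * Returns false if one variable MTRR cannot describe the range. */
static bool mtrr_wc_values(uint64_t base, uint64_t size, unsigned phys_bits,
                           uint64_t *physbase, uint64_t *physmask)
{
    if (size < 0x1000 || (size & (size - 1)) != 0 || (base & (size - 1)) != 0)
        return false;

    /* Only address bits MAXPHYADDR-1 .. 12 take part in the MTRR match. */
    uint64_t addr_bits = ((1ull << phys_bits) - 1) & ~0xFFFull;

    *physbase = (base & addr_bits) | MTRR_TYPE_WC;
    *physmask = (~(size - 1) & addr_bits) | MTRR_MASK_VALID;
    return true;
}
```

For a 16 MiB framebuffer at 0xE0000000 with a 36-bit physical address width, this yields 0xE0000001 for PHYSBASE and 0xFFF000800 for PHYSMASK; those would then be written with `wrmsr` to a free MTRR pair.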
Now the main question is:

What should I do to activate the bus mastering functionality of my video card (or chip)? I think it could be something to look for in the Intel chipset manual, but so far I have found nothing! I read everywhere about IDE bus mastering, but nothing at all about PCI bus mastering for video. My goal is to let the GPU copy the framebuffer while the CPU continues with other tasks. I understand that the technical details of graphics cards are kept fairly "secret" by the manufacturers, but this may be a general PCI function. Alternatively, I could use an integrated graphics controller, but the Intel manuals have not helped me so far.
Any ideas?
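For reference, the generic PCI part of "enabling bus mastering" is just bit 2 (Bus Master Enable) of the Command register at configuration offset 0x04. The sketch below assumes PCI configuration mechanism #1 (ports 0xCF8/0xCFC) and computes the configuration address and the value to write back. The helper names are hypothetical, and the `outl`/`inl` port accessors are assumed to come from your own kernel. Setting this bit only permits the device to initiate transfers; the device-specific DMA engine still has to be programmed, and that is the card-dependent part.

```c
#include <stdint.h>

/* PCI configuration mechanism #1: address port and data port. */
#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

#define PCI_COMMAND        0x04     /* Command (low 16) + Status (high 16) dword */
#define PCI_COMMAND_MEMORY 0x0002u  /* Memory Space Enable */
#define PCI_COMMAND_MASTER 0x0004u  /* Bus Master Enable */

/* Dword to write to 0xCF8 to select a configuration register. */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func, uint8_t offset)
{
    return 0x80000000u                        /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1F) << 11)
         | ((uint32_t)(func & 0x07) << 8)
         | (uint32_t)(offset & 0xFC);         /* dword-aligned register */
}

/* Given the dword read from offset 0x04, return the dword to write back
 * with memory decoding and bus mastering enabled. The Status error bits
 * are write-1-to-clear, so the upper half is written back as zeros. */
static uint32_t pci_enable_bus_master(uint32_t cmd_status)
{
    uint16_t command = (uint16_t)(cmd_status & 0xFFFFu);
    command |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
    return command;
}
```

Intel integrated graphics normally sits at bus 0, device 2, function 0, so you would write `pci_config_address(0, 2, 0, PCI_COMMAND)` to 0xCF8, read the dword from 0xCFC, and write `pci_enable_bus_master(value)` back to 0xCFC.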

Thank you very much in advance for your kind help and thank you for keeping such a great forum!

Best regards!
Giovanni
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: pci bus mastering for video card

Post by Combuster »

Enabling bus mastering depends on the exact video card involved. It is probably more worthwhile to try hardware acceleration, which saves bus bandwidth by doing the graphics work on the video card itself.
Also, I know of video cards that can only do bus mastering for command queues (meaning no host-to-VRAM transfers), and some that can't do bus mastering at all (but still do hardware acceleration). Since I don't know the details of the Intel GMAs, I can't tell what they can and cannot do.

What I do know is that while Intel chips have better documentation, ATI cards have decent documentation as well, and generally outperform Intel parts. (And ATIs are true GPUs, in case you are interested in programming them.) Either way, graphics drivers are complex beasts.

I can't comment on which card would be best. So far I've only attempted graphics drivers for older chips (vanilla VGA, VGA with LFB, and BGA, which work; Mach64, which is a work in progress but successful so far; and the V2000, which has failed so far due to a total lack of documentation), so it'd probably be best to grab a manual for both types of card and see which one you get on with best.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
999999999
Posts: 6
Joined: Wed Jan 07, 2009 1:02 pm

Re: pci bus mastering for video card

Post by 999999999 »

Thank you very much for your answer! At the moment I don't have an ATI card to experiment with (I wish NVidia would release some information...), but I will try to read the Intel manuals for my chipset. The problem is that hardware acceleration is a good thing, but it is not very appropriate for my needs: line rendering, polygons and blits are useful for GUIs and maybe games, but I want complete control, because I will draw things the card cannot imagine :twisted: such as solutions of differential equations. At the moment the only function I need is DrawPoint() :wink: or something like that. I would like to fill a buffer in system memory and then tell the video card to copy it into its framebuffer while the CPU proceeds with other tasks. If, say, I wanted to play a video file, hardware acceleration would not help much... Should I look in the Intel manuals for "bus mastering", or is there a more specific feature I should look for? I fear that I may get distracted by something unrelated to my problem, as those manuals seem very heavy.
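The approach described here (draw points into a buffer in system RAM, then copy it to the framebuffer) can be sketched in C as follows. The struct and function names are hypothetical; the sketch assumes a 32 bpp VESA linear framebuffer whose pitch (bytes per scanline, from the VBE mode info) may be larger than width * 4, and a back buffer with the same dimensions. `present()` is still a CPU copy, i.e. exactly the work that bus mastering would move off the CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a 32 bpp VBE linear framebuffer. */
struct framebuffer {
    uint8_t *lfb;            /* mapped LFB, ideally write-combining */
    uint32_t width, height;
    uint32_t pitch;          /* bytes per scanline; may exceed width * 4 */
};

/* Back buffer in ordinary cacheable RAM, rows tightly packed. */
struct backbuffer {
    uint32_t *pixels;
    uint32_t width, height;
};

/* The DrawPoint() primitive: clip, then store one 32-bit pixel. */
static void draw_point(struct backbuffer *bb, uint32_t x, uint32_t y, uint32_t argb)
{
    if (x < bb->width && y < bb->height)
        bb->pixels[(size_t)y * bb->width + x] = argb;
}

/* Copy the back buffer to the LFB one scanline at a time, honouring the
 * framebuffer pitch; the padding at the end of each LFB row is untouched. */
static void present(const struct backbuffer *bb, const struct framebuffer *fb)
{
    size_t row_bytes = (size_t)bb->width * sizeof(uint32_t);
    for (uint32_t y = 0; y < bb->height; y++)
        memcpy(fb->lfb + (size_t)y * fb->pitch,
               bb->pixels + (size_t)y * bb->width, row_bytes);
}
```

Copying row by row matters because many modes report a pitch larger than the visible width; a single `memcpy` of the whole buffer would skew every line after the first.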

Thank you very much!

All the best!
Giovanni
Combuster

Re: pci bus mastering for video card

Post by Combuster »

I poked through the Intel reference and reminded myself that Intel graphics chips are part of the northbridge.
In other words, they share main memory with the CPU. Therefore it should be fastest to operate on video memory directly with caching enabled, force a cache flush, then swap pages.
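For the "force a cache flush" step, one x86-specific option is to `clflush` every cache line of the back buffer and then fence, as in this sketch using the SSE2 intrinsics `_mm_clflush` and `_mm_mfence` from `<emmintrin.h>`. The 64-byte line size is an assumption (CPUID leaf 1 reports the real CLFLUSH line size), `flush_range` is a hypothetical name, and the "swap pages" step itself is chip-specific and not shown. Whether the flush is needed at all depends on whether the graphics engine snoops the CPU caches; mapping the buffer as write-combining or uncached avoids it at the cost of slower CPU reads.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2, x86 only) */

/* Assumed cache-line size; query CPUID leaf 1, EBX[15:8] * 8 on real hardware. */
#define CACHE_LINE 64u

/* Write back and invalidate every cache line overlapping [buf, buf + len)
 * so that a device reading RAM directly sees the CPU's latest writes.
 * Returns the number of lines flushed. */
static size_t flush_range(const void *buf, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE, lines++)
        _mm_clflush((const void *)p);

    _mm_mfence();   /* order the flushes before telling the GPU to flip */
    return lines;
}
```

A single `wbinvd` would flush everything instead, but it is privileged and stalls for far longer than flushing just the buffer.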

The ATI R5xx supports bus mastering, but from a quick look I can't tell for sure whether it can do host-to-video blits via bus-mastering DMA, as opposed to command-buffer DMA.
999999999

Re: pci bus mastering for video card

Post by 999999999 »

Combuster wrote:it should be fastest to operate on video memory directly with caching enabled, force a cache flush, then swap pages.
So I should not worry about the CPU having to send data to the graphics card? When you say 'caching enabled', do you mean 'write combining' or something like that on the video framebuffer memory area? As for the 'cache flush' and 'page swap', how should I perform them? Should I look into CPU instructions, or should I study the chipset manual? Please forgive me: I am quite advanced in programming itself, but I am quite new to OS and hardware development :)


Thank you very much!

All the best!
Giovanni
Dex
Member
Posts: 1444
Joined: Fri Jan 27, 2006 12:00 am

Re: pci bus mastering for video card

Post by Dex »

Hi 999999999, your OS design is similar to my OS 8) , and your question is the same one I have been trying to work out.
For the moment I would forget third-party drivers (Intel chips etc.), along with the PC, and maybe look at what ARM has to offer you.
If you must have x86, then maybe you could try to get the best FPS you can at a given screen resolution and see whether it's fast enough without doing anything else.
Then maybe look into using multiple cores, or even making your own video card, as members of my OS team are trying to do.
You could also look at old video cards with more documentation available (e.g. the Voodoo 3): http://homepage.swissonline.ch/tinyasm/v3.htm

Then there's the original Xbox to look into: http://www.xbdev.net/non_xdk/nasm_xbe/xbe_050/index.php
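Measuring the best frame rate you can reach, as Dex suggests, can be done with a simple timing loop like the hypothetical sketch below. It is written for a hosted environment: it uses standard C `clock()` (which measures processor time, not wall-clock time) and copies into an ordinary buffer for testing. In the OS, `dst` would be the write-combining LFB and the timer would be your own (TSC, PIT, or similar).

```c
#include <stddef.h>
#include <stdlib.h>   /* malloc/calloc/free for the usage example */
#include <string.h>
#include <time.h>

/* Hypothetical helper: estimate full-frame copies per second from a back
 * buffer in system RAM to dst by repeating the copy for ~100 ms of
 * processor time. */
static double measure_blit_fps(void *dst, const void *src, size_t frame_bytes)
{
    clock_t start = clock();
    clock_t now;
    unsigned long frames = 0;

    do {
        memcpy(dst, src, frame_bytes);
        frames++;
        now = clock();
    } while (now - start < CLOCKS_PER_SEC / 10);

    return frames / ((double)(now - start) / CLOCKS_PER_SEC);
}
```

With `frame_bytes` set to 1280 * 1024 * 4, a result comfortably above the refresh rate would suggest the plain CPU copy can keep up, before spending time on anything more elaborate.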
Combuster

Re: pci bus mastering for video card

Post by Combuster »

999999999 wrote:
Combuster wrote:it should be fastest to operate on video memory directly with caching enabled, force a cache flush, then swap pages.
So I should not worry about the CPU having to send data to the graphics card? When you say 'caching enabled', do you mean 'write combining' or something like that on the video framebuffer memory area? As for the 'cache flush' and 'page swap', how should I perform them? Should I look into CPU instructions, or should I study the chipset manual? Please forgive me: I am quite advanced in programming itself, but I am quite new to OS and hardware development :)
Intel graphics chips don't have the standard FSB-PCI-graphics card arrangement (they aren't cards!). Instead:


+-------+   +-----------+---+   +--------+
|       |   |           |GPU|   |        |
|  CPU  <---> North     +---+   | Memory |
|       |   | bridge        <--->        |
+-------+   +---------------+   +--------+
In other words, the GPU is about as close to memory as the CPU itself. It even uses the same memory the CPU uses, and a write to video memory has no different properties than a write to any other part of RAM. So if you put your back buffer in video memory (not a performance hit, since it is the same RAM as the rest of the system), you don't need to copy things around as much as you would when pushing chunks over the PCI bus. You'd only need to force whatever is still in the CPU cache out to main memory before telling the graphics chip to pull its data from there (so that the area that was previously the front buffer can be used as the back buffer; Wikipedia link).
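The page-flipping idea above can be illustrated with a minimal two-buffer sketch. All names are hypothetical: `flip()` only swaps which buffer is the front buffer and returns the address the display engine should scan out next. Writing that address to the real display base register (ideally during vertical blank) is chip-specific and left to the caller.

```c
#include <stdint.h>

/* Hypothetical flip chain: two full-screen surfaces in (shared) video memory. */
struct flip_chain {
    uintptr_t surface[2];  /* graphics addresses of the two buffers */
    int front;             /* index of the buffer currently scanned out */
};

/* The buffer the CPU should draw into next. */
static uintptr_t back_buffer(const struct flip_chain *fc)
{
    return fc->surface[fc->front ^ 1];
}

/* Call after drawing into back_buffer() and flushing the CPU caches.
 * The finished buffer becomes the front buffer and the old front buffer
 * becomes the next drawing target, so nothing is copied.
 * Returns the address to program into the chip's display base register. */
static uintptr_t flip(struct flip_chain *fc)
{
    fc->front ^= 1;
    return fc->surface[fc->front];
}
```

The design point is that a flip costs one register write instead of a full-frame copy; the CPU's only per-frame obligation is making sure its writes have reached RAM first.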
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: pci bus mastering for video card

Post by Brendan »

Hi,

Just a quick note...

I took a quick look at (selected parts of) Intel's documentation, and it looks to me like the graphics hardware supports (its own version of) paging, complete with TLBs and both "global virtual memory" and "per-process virtual memory". From this I wonder whether the idea is to use "GPU paging" to map 4 KiB pages of RAM (from the host) directly into the GPU's virtual address space.

I'd also assume that it could take 6 months of studying the hardware and documentation and trying different things, just to learn enough about the hardware to be able to decide what the best way of implementing things might be...


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
999999999

Re: pci bus mastering for video card

Post by 999999999 »

I am aware of double buffering, and I'll do some benchmarks on the Intel chipset (I don't have that PC here right now). But suppose for a moment that I want to use an AGP/PCI Express card: apart from enabling write-combining cache mode on the LFB memory area, should I look for information about bus mastering etc. in the card's manual (if it exists at all :roll: ), or is it a function I should look for in the chipset's manual and/or the AGP/PCI-E specifications? What would you suggest searching for?

Thank you very much for your answers and contributions to the thread!


All the best!
Giovanni
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: pci bus mastering for video card

Post by Owen »

Brendan: The GPU having some form of paging doesn't sound unusual. I know that the nVIDIA drivers map part of the graphics card's RAM into the process address space; if libGL is talking to the graphics card directly, it would make sense to have system memory paging on the graphics card, firstly to stop the process from doing anything silly, and secondly to reconcile the process's and the graphics card's views of what is where.

Is this in addition to, or part of, the GART?
Brendan

Re: pci bus mastering for video card

Post by Brendan »

Hi,
It seems similar to the GART, but different. AFAIK, the GART is normally used only as a (more flexible) form of scatter-gather bus mastering, e.g. as a way to transfer data to/from the video card's own memory, rather than as a complete replacement for the video card's memory.

For example, it looks like you could do page flipping just by changing the page tables, where the back buffer is normal system RAM (and becomes the displayed front buffer after it's paged into the GPU's address space). It also looks like an OS might need to provide physical memory manager hooks: for example, so that the video driver can release a page of "video RAM" and tell the OS that the page is now free RAM (usable as normal by the OS), or so that the video driver can take a page of normal RAM and tell the OS that the page is now unusable (e.g. it is now part of video display memory). This applies to all video RAM (e.g. the BIOS setting for the "amount of RAM to use for onboard video" can be completely overridden dynamically, one page at a time). However, I wouldn't assume this level of flexibility would be easy to support; there are probably some major complications with cache coherency and MTRRs to consider.

Of course I should point out that I'm just guessing - I've never written any code for Intel's graphics controllers... ;)
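The "page flipping by changing the page tables" idea might look roughly like this. The page-table entry format here is entirely hypothetical (bit 0 = valid, bits 31:12 = physical page address) and chosen only for illustration; the real Intel GTT entry layout differs between generations and has to be taken from the programmer's reference manual, and the GPU's TLBs would also need invalidating after the update, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PTE_VALID 0x1u   /* hypothetical: bit 0 = valid, bits 31:12 = page address */

/* Point `count` consecutive entries of a GPU page table (for example the
 * range the display engine scans out from) at the physical pages of a
 * buffer in system RAM. The pages need not be physically contiguous. */
static void gpu_map_pages(uint32_t *table, size_t first,
                          const uint32_t *pages, size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[first + i] = (pages[i] & ~(PAGE_SIZE - 1)) | PTE_VALID;
}
```

Flipping would then mean calling `gpu_map_pages()` with the back buffer's page list over the scanout range, so the display engine reads the new frame without any pixel data being copied.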


Cheers,

Brendan
djmauretto
Member
Posts: 116
Joined: Wed Oct 22, 2008 2:21 am
Location: Roma, Italy

Re: pci bus mastering for video card

Post by djmauretto »

999999999 wrote:I am aware of double buffering, and I'll do some benchmarks on the Intel chipset (I don't have that PC here right now). But suppose for a moment that I want to use an AGP/PCI Express card: apart from enabling write-combining cache mode on the LFB memory area, should I look for information about bus mastering etc. in the card's manual (if it exists at all :roll: ), or is it a function I should look for in the chipset's manual and/or the AGP/PCI-E specifications? What would you suggest searching for?
Hi,
you are creating unnecessary problems for yourself. Once you set aside a full operating system with its multitasking, i.e. you no longer run inside one but develop a mini operating system, you have the whole CPU to yourself. Enable write combining on the video memory, work on it directly or use double buffering, and you will get sufficient, if not excellent, graphics performance.
Rather, I would worry about the algorithms you use for your heavy calculations. I don't know which language you use, but if you are a good programmer, my advice is assembly.
Do not waste time on nonsense about bus mastering, and start optimizing the algorithms for your engineering calculations. You can't even imagine the power of today's computers; I still marvel at it.
Always remember that the most important thing is the algorithms you use; optimization comes after, if necessary :wink:
Combuster

Re: pci bus mastering for video card

Post by Combuster »

Do not waste time on nonsense about bus mastering
If I got the math right, having a framebuffer in memory that is bus-mastered to the video card would save you 14% of total CPU time (for AGP 8X), or 7% (for x16 PCI Express 2.0), assuming 1280x1024 at 32 bpp and 60 Hz.

That is hundreds of millions of wasted operations per second, since the vast majority of that time is spent in wait states.

I would not call that nonsense: that factor is much larger than what you can sensibly gain by hand-optimizing things (which I'd put at 5% at most). Now, if you could manage to improve the algorithmic complexity of your problem, that would be really interesting.
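The arithmetic behind these percentages can be reproduced with a few lines of C. The bus rates are nominal peak figures and are assumptions of this sketch: AGP 8X at about 2133 MB/s and PCI Express 2.0 x16 at about 8 GB/s per direction. With them, 1280x1024 at 32 bpp and 60 Hz needs about 315 MB/s, roughly 14.7% of AGP 8X, consistent with the 14% above. For PCIe 2.0 x16 the same calculation gives about 3.9%; the 7% figure corresponds to a link of roughly 4 GB/s, such as PCIe 1.x x16.

```c
#include <stdint.h>

/* Bytes per second needed to push every frame over the bus. */
static double frame_bandwidth(uint32_t width, uint32_t height,
                              uint32_t bytes_per_pixel, uint32_t hz)
{
    return (double)width * height * bytes_per_pixel * hz;
}

/* Best-case fraction of wall-clock time a CPU-driven copy occupies if it is
 * limited only by the bus running at its nominal peak rate. */
static double bus_time_fraction(double bytes_per_second, double bus_bytes_per_second)
{
    return bytes_per_second / bus_bytes_per_second;
}
```

For example, `bus_time_fraction(frame_bandwidth(1280, 1024, 4, 60), 2133e6)` is about 0.147; real transfers rarely reach the peak rate, so the average-case share is higher.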
djmauretto

Re: pci bus mastering for video card

Post by djmauretto »

Today the hardware and software industry focuses on headline performance, but often you won't get the benefit of these features; in most cases things will actually run slower.
Combuster, your theory is just like the claims of manufacturers trying to sell you their products. I could give a thousand examples; the first that comes to mind is LCD monitors: many manufacturers advertise a 2 ms response time, but you never actually get it.
Instead of preaching theory, do your experiments and show the test results.
Combuster

Re: pci bus mastering for video card

Post by Combuster »

many manufacturers advertise a 2 ms response time, but you never actually get it.
Okay, let me add that what I computed is the theoretical best-case limit for both buses. So let's add another 20% on top of what I mentioned before to get the average case. Happy now?