pci bus mastering for video card
Hi everyone!
I have been spending some time on these forums and let me say that they are the best source of information for everyone attempting to develop an operating system!
I am an Italian student and I study physics at university, but I have been a hobby programmer for the past 10 years.
Some time ago I started a sort of "operating system". Well, so far I have not done much more than a bootloader. But my real goal is not an OS in the common meaning of the word. I am planning to create a single-tasking core of an operating system (32-bit protected mode): my aim is to have complete control of the hardware so that I can use 100% of its resources (no other processes running). I would like to use this system for computer simulations and "heavy" calculations (e.g. complex multi-particle systems). So I am looking more at developing specific hardware drivers than a complete kernel with memory management, IPC etc. I don't need to create generic drivers, as I am able to choose the peripherals I want to use! But for those I need a driver, and here is my problem.
So far I have the alpha of a PS/2 keyboard driver, and by reading this forum I have come to the conclusion that if I dedicated some time to it, I would be able to develop USB and SATA drivers, as documentation for them seems to be available on the web. I could also get a specific Intel chipset, to have complete access to open manuals. Some weeks ago I began to investigate PCI device enumeration.
At the moment I have a very simple video driver which works with a linear framebuffer (set up during boot using VESA 2.0) and simulates a text console. I have tried various VESA modes but I am interested in 1024x768 or 1280x1024. The PCs I have used for testing support both 24-bit and 32-bit modes, but of course I prefer the latter for performance. I have not done any precise benchmarks, but a full-screen blit (RAM -> framebuffer) is quite slow at high resolution. I have read everything I could find here and I have spent months searching the web with no result. I understand that the problem is related to the bus transfer speed etc.
I am aware of burst writes on PCI, cache types etc.; in fact the only useful thing I found is setting the cache type of the framebuffer memory area to "write combining" using MTRRs, and it has really improved speed A LOT!!! It is not so great though, because the main problem is that even if I get a good 30 FPS, the CPU is always busy transferring data over the bus! I have read that modern video cards support bus mastering, and I am pretty sure that even the graphics controller embedded in one of my Intel chipsets supports it...
Now the main question is:
What should I do to activate the bus mastering functionality of my video card (or chip)? I think it could be something to look for in the Intel chipset manual, but so far I have found nothing! I read about IDE bus mastering everywhere, but really nothing about video PCI bus mastering. My goal is to let the GPU copy the framebuffer while the CPU continues to do other tasks. I understand that technical details of gfx cards are kept quite "secret" by the manufacturers, but this may be a general PCI function. Or at least I could use an integrated gfx controller, but the Intel manuals have not been able to help me so far.
Any ideas?
Thank you very much in advance for your kind help and thank you for keeping such a great forum!
Best regards!
Giovanni
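For readers following along, the write-combining setup mentioned in the post above can be sketched roughly as follows. This is only a sketch under stated assumptions: it computes the two values for one variable-range MTRR and leaves out the actual WRMSR writes and the cache-disable/flush sequence that the CPU manuals require around an MTRR update. The helper name `mtrr_wc_range` and the struct are made up for illustration; `phys_bits` is assumed to come from CPUID leaf 0x80000008.

```c
#include <stdint.h>

#define MTRR_TYPE_WC     0x01u          /* write-combining memory type */
#define MTRR_MASK_VALID  (1ull << 11)   /* "valid" bit in IA32_MTRR_PHYSMASKn */

/* Hypothetical container for the two MSR values of one variable-range MTRR. */
struct mtrr_pair {
    uint64_t physbase;   /* value for IA32_MTRR_PHYSBASEn (MSR 0x200 + 2n) */
    uint64_t physmask;   /* value for IA32_MTRR_PHYSMASKn (MSR 0x201 + 2n) */
};

/* Compute the MSR values that mark [base, base + size) as write-combining.
 * Returns 0 on success, -1 if the range cannot be described by one MTRR
 * (size must be a power of two >= 4 KiB and base must be size-aligned). */
static int mtrr_wc_range(uint64_t base, uint64_t size, unsigned phys_bits,
                         struct mtrr_pair *out)
{
    uint64_t addr_mask;

    if (phys_bits < 32 || phys_bits > 52)
        return -1;
    addr_mask = ((1ull << phys_bits) - 1) & ~0xFFFull;

    if (size < 4096 || (size & (size - 1)) != 0)
        return -1;
    if (base & (size - 1))
        return -1;

    out->physbase = (base & addr_mask) | MTRR_TYPE_WC;
    out->physmask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID;
    return 0;
}
```

A 1280x1024x32 framebuffer is 5 MiB, which is not a power of two, so one would typically cover the next larger power-of-two region (for example 8 MiB) or split the range across several MTRRs.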
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: pci bus mastering for video card
Enabling bus mastering depends on the exact video card involved. It is probably more interesting to try hardware acceleration, to save bus bandwidth and do whatever graphics work on the video card itself.
Also, I know of video cards that can only do busmastering for command queues (that means no host-to-VRAM transfers), and some that can't do busmastering at all (but still do hardware acceleration). Since I don't know the details of the Intel GMAs, I can't tell what they can and cannot do.
What I do know is that while Intel chips have better documentation, ATI cards have decent documentation as well, and generally outperform Intel stuff. (And ATIs are true GPUs, in case you are interested in programming them.) Either way, graphics drivers are complex beasts.
I can't comment on which card would be best. Right now I've only attempted graphics drivers for older chips (vanilla VGA, VGA with LFB, and BGA, which work; Mach64, which is WIP but a success so far; and the V2000, which has failed so far due to a total lack of documentation), so it would probably be best to grab a manual for both types of card and see which one you get along with best.
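The one part of bus mastering that is generic to PCI is the Bus Master Enable bit in the device's command register; everything after that (programming the card's DMA engine) is device-specific, as the reply above says. A minimal sketch of the generic part, assuming the legacy 0xCF8/0xCFC configuration mechanism. The port I/O itself is left out so the helpers stay testable on a hosted system; the function names are made up for illustration.

```c
#include <stdint.h>

#define PCI_CONFIG_ADDRESS  0xCF8u   /* I/O port: selects bus/device/function/register */
#define PCI_CONFIG_DATA     0xCFCu   /* I/O port: data window for the selected register */
#define PCI_COMMAND         0x04u    /* 16-bit command register offset */
#define PCI_COMMAND_MEMORY  0x0002u  /* bit 1: respond to memory-space accesses */
#define PCI_COMMAND_MASTER  0x0004u  /* bit 2: allow the device to act as bus master */

/* Build the dword to write to port 0xCF8 for a given register. */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func,
                                   uint8_t offset)
{
    return 0x80000000u                      /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1F) << 11)
         | ((uint32_t)(func & 0x07) << 8)
         | (offset & 0xFCu);                /* dword-aligned register offset */
}

/* Return the command register value with memory decoding and bus mastering
 * enabled. Write it back as a 16-bit access only: the upper half of the same
 * dword is the status register, whose bits are cleared by writing 1. */
static uint16_t pci_command_with_master(uint16_t command)
{
    return command | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
}
```

In a kernel this would be wrapped in a read-modify-write: write the address to port 0xCF8, read 16 bits from 0xCFC, set the bits, and write the result back. Setting the bit only permits the device to master the bus; it does not start any transfer by itself.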
Re: pci bus mastering for video card
Combuster wrote: Enabling bus mastering depends on the exact video card involved. It is probably more interesting to try hardware acceleration to save bus bandwidth and do whatever graphics stuff on the video card itself.
Thank you very much for your answer! At the moment I don't have an ATI card to do experiments with (I wish NVidia released some information...), but I will try to read the Intel manuals for my chipset.
The problem is that hardware acceleration is a good thing, but it is not very appropriate for my needs: line rendering, polygons and blits are useful for a GUI and maybe games, but I would like to have complete control, as I will draw things the card cannot imagine, e.g. solutions of differential equations. At the moment the only function I need is DrawPoint() or something like that. I would like to fill a buffer in system memory and then tell the video card to copy it into its framebuffer, while the CPU proceeds with other tasks. Say I want to play a video file: then hardware acceleration is quite unhelpful...
Should I look in the Intel manuals for "bus mastering", or is there a more specific feature I should look for? I fear that I may get distracted by something unrelated to my problem, as those manuals seem very heavy..
Thank you very much!
All the best!
Giovanni
- Combuster
Re: pci bus mastering for video card
I poked through the Intel reference, and reminded myself that Intel chips are northbridge chips.
In other words, they have shared memory access. Therefore it should be fastest to operate on video memory directly with caching enabled, force a cache flush, then swap pages.
The ATI R5xx supports busmastering, but from the quick look I don't know for sure whether it can do host-to-video blits via busmastering DMA, as opposed to the command buffer DMA.
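The sequence suggested above for shared-memory chips (draw with caching enabled, flush, swap pages) might look roughly like this. It is a sketch, not a driver: the two hooks are hypothetical stand-ins for whatever the real hardware needs (for example a CLFLUSH loop or WBINVD for the flush, and a write to the display controller's base-address register for the swap), and the stub versions exist only so the logic can be exercised on a hosted system.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks (names invented for this sketch). */
typedef void (*flush_fn)(const void *start, size_t len);
typedef void (*set_scanout_fn)(const void *buffer);

struct flipper {
    uint32_t *buf[2];            /* two full frames in shared memory */
    size_t bytes;                /* size of one frame in bytes */
    int back;                    /* index of the frame being drawn into */
    flush_fn flush;              /* pushes dirty cache lines out to memory */
    set_scanout_fn set_scanout;  /* tells the display engine which frame to show */
};

/* Frame the CPU may draw into right now. */
static uint32_t *back_buffer(struct flipper *f)
{
    return f->buf[f->back];
}

/* Make the finished back buffer visible and start drawing into the other one. */
static void flip(struct flipper *f)
{
    f->flush(f->buf[f->back], f->bytes);  /* the display engine reads memory, not the cache */
    f->set_scanout(f->buf[f->back]);
    f->back ^= 1;
}

/* Stand-in hooks that only record what was requested. */
static int flush_calls;
static const void *scanout;

static void stub_flush(const void *start, size_t len)
{
    (void)start;
    (void)len;
    flush_calls++;
}

static void stub_set_scanout(const void *buffer)
{
    scanout = buffer;
}
```

The point of the design is that no frame is ever copied: the CPU only writes each pixel once, into ordinary cacheable memory, and the "transfer" is a pointer change.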
Re: pci bus mastering for video card
Combuster wrote: it should be fastest to operate on video memory directly with caching enabled, force a cache flush, then swap pages.
So I should not care about the CPU having to send data to the gfx card? When you say 'caching enabled', do you refer to "write combining" or something like that on the video framebuffer memory area? About 'cache flush' and 'page swap': how should I perform them? Should I look into CPU functions or should I study the chipset manual? Please forgive me, I am quite advanced in programming itself, but I am quite new to OS and hardware development.
Thank you very much!
All the best!
Giovanni
Re: pci bus mastering for video card
Hi 999999999, your OS design is similar to my OS, and your question is the same as the one I am trying to work out.
For the moment I would forget third-party drivers (Intel chips etc.), along with the PC, and maybe look at what ARM offers you.
If you must have x86, then maybe you could try to get the best FPS you can for a given screen resolution, and see if it's fast enough without doing anything else.
Then maybe look into multi-core use, or even making your own video card, as members of my OS team are trying to do.
You could also look at old video cards with more information available (Voodoo 3), e.g.: http://homepage.swissonline.ch/tinyasm/v3.htm
Then there's the old Xbox to look into: http://www.xbdev.net/non_xdk/nasm_xbe/xbe_050/index.php
- Combuster
Re: pci bus mastering for video card
999999999 wrote: So I should not care about cpu having to send data to gfx card? When you say 'caching enabled' do you refer to "write combining" or something like that on video framebuffer memory area? About 'cache flush' and 'page swap' - how should I perform them? Should I look into cpu functions or should I study chipset manual?
Intel graphics chips don't have the standard FSB-PCI-Graphics card construction (they aren't cards!). Instead:
Code:
+-------+    +-----------+---+    +--------+
|       |    |           |GPU|    |        |
|  CPU  <--->|   North   +---+    | Memory |
|       |    |   bridge      <--->|        |
+-------+    +---------------+    +--------+
Re: pci bus mastering for video card
Hi,
Just a quick note...
I took a quick look at (selected parts of) Intel's documentation, and it looks to me like the graphics hardware supports (its own version of) paging, complete with TLBs and both "global virtual memory" and "per process virtual memory". From this I wonder if the idea is to use "GPU paging" to map 4 KiB pages of RAM (from the host) directly into the GPU's virtual address space.
I'd also assume that it could take 6 months of studying the hardware and documentation and trying different things, just to learn enough about the hardware to be able to decide what the best way of implementing things might be...
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: pci bus mastering for video card
I am aware of double buffering, and I'll do some benchmarks on the Intel chipset (I don't have that PC here right now). But let's say for a moment that I want to use an AGP/PCI Express card: apart from enabling the write-combining cache mode on the LFB memory area, should I look for information about bus mastering etc. in the card's manual (if it exists at all), or is it a function I should look for in the chipset's manual and/or the AGP/PCI-E specifications? What should I search for?
Thank you very much for your answers and contributions to the thread!
All the best!
Giovanni
- Owen
- Member
- Posts: 1700
- Joined: Fri Jun 13, 2008 3:21 pm
- Location: Cambridge, United Kingdom
- Contact:
Re: pci bus mastering for video card
Brendan: The GPU having some form of paging doesn't sound unusual. I know that the nVIDIA drivers map part of the graphics card's RAM into the process address space; if libGL is talking to the graphics card directly, it would make sense to have system memory paging on the graphics card in order to firstly stop the process from doing anything silly, and secondly reconcile the process and graphics card's opinions of what was what.
Is this in addition to - or part of - the GART?
Re: pci bus mastering for video card
Hi,
Owen wrote: Brendan: The GPU having some form of paging doesn't sound unusual. I know that the nVIDIA drivers map part of the graphics card's RAM into the process address space; if libGL is talking to the graphics card directly, it would make sense to have system memory paging on the graphics card in order to firstly stop the process from doing anything silly, and secondly reconcile the process and graphics card's opinions of what was what.
Is this in addition to - or part of - the GART?
It seems similar to the GART, but different. AFAIK, the GART is normally only used like a (more flexible) form of scatter-gather bus mastering - e.g. used as a way to transfer data to/from the video card's own memory, rather than as a complete replacement of the video card's memory.
For example, it looks like you could do page flipping just by changing the page tables, where the backbuffer is normal system RAM (and becomes the displayed frontbuffer after it's paged into the GPU's address space). It also looks like an OS might need to provide physical memory manager hooks - for example, so that the video driver can release a page of "video RAM" and tell the OS that the page is now free RAM (that can be used as normal by the OS), or so that the video driver can take a page of normal RAM and tell the OS that the page is now unusable (e.g. part of video display memory now); and this applies to all video RAM (e.g. the "amount of RAM to use for onboard video" BIOS setting can be completely overridden dynamically, one page at a time). However, I wouldn't assume this level of flexibility would be easy to support - there are probably some major complications with cache coherency and MTRRs to consider.
Of course I should point out that I'm just guessing - I've never written any code for Intel's graphics controllers...
Cheers,
Brendan
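To make the "page flipping by changing the page tables" idea above concrete, here is a small sketch. The entry layout (bit 0 = valid, upper bits = 4 KiB-aligned physical address) and all names are assumptions made for illustration only; the real table format, where the table lives, and how the GPU's TLBs are invalidated would have to come from the chipset documentation.

```c
#include <stddef.h>
#include <stdint.h>

#define GPU_PAGE_SIZE  4096u
#define GPU_PTE_VALID  0x1u   /* assumed layout: bit 0 = valid, upper bits = page address */

/* Number of 4 KiB pages needed to hold `bytes` bytes. */
static size_t pages_for(size_t bytes)
{
    return (bytes + GPU_PAGE_SIZE - 1) / GPU_PAGE_SIZE;
}

/* Point `count` consecutive GPU pages, starting at `first_page`, at
 * consecutive physical pages starting at `phys_base` (4 KiB aligned). */
static void gpu_map_range(uint32_t *table, size_t first_page,
                          uint32_t phys_base, size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[first_page + i] =
            (phys_base + (uint32_t)(i * GPU_PAGE_SIZE)) | GPU_PTE_VALID;
}
```

Under these assumptions, a flip means calling `gpu_map_range` on the scanned-out GPU range with the physical address of the other buffer. A 1280x1024x32 frame is 1280 pages, so that is 1280 table writes per flip instead of copying 5 MiB of pixels.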
-
- Member
- Posts: 116
- Joined: Wed Oct 22, 2008 2:21 am
- Location: Roma,Italy
Re: pci bus mastering for video card
999999999 wrote: I am aware of double buffering and on the intel chipset I'll do some benchmarks (I don't have that pc here right now). But let's say for a moment that I want to use an agp/pci-express card: apart from enabling write combining cache mode on lfb memory area, should I look for information about bus/mastering etc in the card manual (if it exists at all ) or it is a function I should look for in the chipset's manual and/or AGP/PCI-E specifications? What search do you suggest?
Hello,
you are creating unnecessary problems for yourself. Once you set aside a conventional multitasking operating system, that is, once you no longer work inside one but develop your own mini operating system, you have the whole CPU to yourself. Enable write combining on the video memory, then either work on it directly or use the double buffering technique, and you will get adequate, if not excellent, graphics performance.
I would rather worry about the algorithms you use for your heavy calculations. I don't know which language you use, but if you are a good programmer my advice is assembly.
Do not waste time on nonsense about the bus mastering, and start optimizing the algorithms for your engineering calculations. You cannot even imagine the power of today's computers; I am still amazed by it.
Always remember that the most important thing is the algorithms you use; optimization comes afterwards, if necessary.
- Combuster
Re: pci bus mastering for video card
Quote: Do not waste time on nonsense about the bus mastering
If I got the math right, having a framebuffer in memory which is busmastered to the video card will save you 14% of the total CPU time (for AGP 8x), or 7% (for 16x PCI Express 2.0), at 1280x1024x32@60.
Or hundreds of millions of wasted operations per displayed frame, since the vast majority of the time is spent on wait-states.
I would not call that nonsense, as that factor is much more than you can sensibly gain from hand-optimizing things (which I give 5% max). Now if you could manage to improve the algorithmic complexity of your problem, that would be really interesting.
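The arithmetic behind those percentages can be reproduced with a few lines. This is a sketch under assumptions that are not stated in the thread: the CPU pushes every frame itself and stalls for the whole transfer, the bus runs at its nominal peak rate, and the peak rates are taken as about 2133 MB/s for AGP 8x and 4000 MB/s for a 16-lane PCI Express link at 250 MB/s per lane. At 500 MB/s per lane the second figure would be roughly halved.

```c
#include <stdint.h>

/* Bytes per second needed to push every frame across the bus. */
static uint64_t frame_bytes_per_second(uint32_t width, uint32_t height,
                                       uint32_t bits_per_pixel, uint32_t hz)
{
    return (uint64_t)width * height * (bits_per_pixel / 8) * hz;
}

/* Fraction of the bus (and, if the CPU does the copy and stalls on it,
 * of CPU time) consumed at a given peak rate in MB/s (1 MB = 10^6 bytes). */
static double bus_share(uint64_t bytes_per_second, double bus_mb_per_second)
{
    return (double)bytes_per_second / (bus_mb_per_second * 1e6);
}
```

For 1280x1024x32 at 60 Hz the first function gives 314,572,800 bytes per second, which is a share of about 0.147 at 2133 MB/s and about 0.079 at 4000 MB/s.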
-
Re: pci bus mastering for video card
Today the hardware and software industry focuses entirely on performance, but often you will not get the benefit of these things; in most cases it will run slower.
Combuster, your theory is no different from that of the producers trying to sell you their products. I could give a thousand examples; the first that comes to my mind is LCD monitors: many producers say 2 milliseconds, but you do not ever get them.
Instead of preaching theory, do your experiments and show the results of the tests.
- Combuster
Re: pci bus mastering for video card
Quote: many producers say monitor 2 milliseconds, but you do not ever get them.
Okay, let me add that what I computed is the theoretical best-case limit for both buses. So let's add another 20% on top of what I mentioned before to get the average case. Happy now?