GPU based?

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

I agree with Brendan: the extra latency and overhead of bus transfers outweigh the benefits. Imagine trying to use a 1 Gbps network while listening to an MP3 and playing a game... you are wasting tons of valuable resources by doing so many transfers over the PCI bus. Unless you could do PCI-to-PCI DMA, it's just not worth it. I can see a point for Folding@home using it, since it's supposed to run in spare time; it's something that can run while nothing is going on and then pause when you are playing games or doing something intensive. Now, you CAN perform things like MPEG decoding on the video card, but there are already provisions for such things. If you were running very large scientific research simulations it'd be helpful, but in that case it'd typically be a single implementation/program that uses the GPU rather than a kernel-supplied service. The point is, the time when your computer is stressed the most is typically when you'd want to hand things back to the CPU, which undercuts doing stuff on the GPU even during the unimportant times. And just another note: using the GPU on a laptop for something will kill the battery much faster than running it on the CPU, which end users tend to dislike. At this time, I just don't see a use for it, except for one-off special cases in which it's better to just let the application developer have access to the video card like the other OSes do.
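To put rough numbers on the bus argument (a sketch assuming the classic 32-bit, 33 MHz PCI bus mentioned above, which is shared by every device on it):

\[
32\,\text{bit} \times 33\,\text{MHz} \approx 133\ \text{MB/s total bus bandwidth}, \qquad 1\ \text{Gbps NIC} \approx 125\ \text{MB/s}
\]

So a busy gigabit NIC alone nearly saturates the shared bus before any extra GPU traffic is added; AGP and PCIe give far more headroom, but the round trips still aren't free.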
grover
Posts: 17
Joined: Wed Apr 30, 2008 7:20 am

Post by grover »

MarkOS wrote:How can we implement drivers for NVIDIA's CUDA and AMD's Close-to-the-Metal architectures? Are their sources open to the public? If not, how can we do it?
CUDA is nothing but a software layer on top of the classic texture memory and shaders. So you're not looking at something special there, it's just sugar coating.

With ATI you're in luck: they opened their GPU specs, so you should be able to find what you need there. I assume the registers/memory layout/commands are very close to Direct3D.
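For what it's worth, the "sugar coating" looks roughly like this. This is a hypothetical CUDA fragment (the names are made up for illustration, not taken from any real driver or SDK sample); nvcc compiles it, and each thread ends up running on the same stream/shader processors the 3D pipeline uses:

Code: Select all
// Hypothetical CUDA kernel: y[i] = a * x[i] + y[i].
// Each thread handles one array element; the hardware spreads the
// threads across the card's shader/stream processors, and CUDA hides
// all of the graphics-pipeline setup that would otherwise be needed.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n)                                      // last block may be partial
        y[i] = a * x[i] + y[i];
}

// Host-side launch, assuming dev_x and dev_y already live in video memory:
//     saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dev_x, dev_y);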
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Post by Jeko »

grover wrote:
MarkOS wrote:How can we implement drivers for NVIDIA's CUDA and AMD's Close-to-the-Metal architectures? Are their sources open to the public? If not, how can we do it?
CUDA is nothing but a software layer on top of the classic texture memory and shaders. So you're not looking at something special there, it's just sugar coating.
What? Operations on nvidia cards are done with pixel shaders?
pcmattman
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)

Post by pcmattman »

MarkOS wrote:What? Operations on nvidia cards are done with pixel shaders?
You've obviously never written shaders in GPU assembly...
karloathian
Posts: 22
Joined: Fri Mar 28, 2008 12:09 am

Post by karloathian »

GPU shaders written in assembly are pretty much the way to go, and I find them easier to write than any high-level implementation.
einsteinjunior
Member
Posts: 90
Joined: Tue Sep 11, 2007 6:42 am

Post by einsteinjunior »

Hi,

I wish to know where I can find some information about writing pixel shaders in assembly, as you said above. I am highly interested.
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM »

einsteinjunior wrote:Hi,

I wish to know where I can find some information about writing pixel shaders in assembly, as you said above. I am highly interested.
Google GLSL and Cg.
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

einsteinjunior wrote:Hi,

I wish to know where I can find some information about writing pixel shaders in assembly, as you said above. I am highly interested.
I have a few on my hard drive that I wrote. They're for an older version of pixel shaders that doesn't include branching, so they're pretty useless for OS dev in most cases, unless you are trying to write a simulation that has to process tons of data in a similar fashion.
suthers
Member
Posts: 672
Joined: Tue Feb 20, 2007 3:00 pm
Location: London UK

Re:

Post by suthers »

MarkOS wrote:
grover wrote:
MarkOS wrote:How can we implement drivers for NVIDIA's CUDA and AMD's Close-to-the-Metal architectures? Are their sources open to the public? If not, how can we do it?
CUDA is nothing but a software layer on top of the classic texture memory and shaders. So you're not looking at something special there, it's just sugar coating.
What? Operations on nvidia cards are done with pixel shaders?
I think he might have been referring to the unified shader architecture; in modern GPUs there is no such thing as a dedicated pixel shader, all the shaders are capable of executing all the different types of instructions...

As for the 280GTX Tri-SLI :shock: =P~ , 6 months ago I bought two 8800 GTS 512MB cards and I was nearly at the top of what technology could provide... 6 months later I'm one graphics card and 2 series down... :( :cry:
Jules
grover
Posts: 17
Joined: Wed Apr 30, 2008 7:20 am

Re: GPU based?

Post by grover »

Yes, I was talking about the shader processors, more commonly referred to as unified shaders (that being the DirectX term).

Anyway, the current AMD/ATI Radeon 4870 has a performance of 1.2 teraflops in a single-card configuration, which beats anything NVIDIA has to offer from a price, power and performance perspective.

The 4870 consists of 10 shader cores, each having 80 32-bit stream processors, for a total of 800 stream processors on a single card. So you could schedule up to 10 processes on the GPU...
suthers
Member
Posts: 672
Joined: Tue Feb 20, 2007 3:00 pm
Location: London UK

Re: GPU based?

Post by suthers »

Well, though the ATI Radeon HD 4870 produces 1.2 TF and the Nvidia GeForce 280GTX only produces 0.933 TF, the 280GTX outperforms the 4870 on most tests....
Probably because it has a higher number of ROPs, a faster shader clock and more TMUs, so though it's slower at generating the 'skeleton' image, it will be faster at mapping textures and actually getting it onto the screen...
Meaning that if you want an OS that runs on the GPU, the 4870 is probably faster, but not for most games....
Jules
AndrewAPrice
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: GPU based?

Post by AndrewAPrice »

The DirectX SDK (Windows only, although they have online documentation on MSDN) is a good resource, but you need to trick it into doing GPGPU (general-purpose computing on the GPU), since Direct3D is only designed to use shaders for manipulating vertices, geometry, and pixels. You can pass it virtually any kind of data to work with.

If you wanted to do repetitive calculations on a large data set, you could store all the data in a vertex buffer and let it work through it all in a few calls :D
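The same pattern can be sketched with the CUDA layer mentioned earlier in the thread instead of Direct3D (hypothetical names and per-element math, just to illustrate the "one big buffer, a few calls" idea):

Code: Select all
#include <cuda_runtime.h>

// Stand-in for whatever repetitive per-element calculation you need.
__global__ void crunch(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = sqrtf(data[i]) * 0.5f + 1.0f;
}

// Copy the whole data set over once, run one kernel across all of it,
// and copy the results back: a handful of calls for n elements.
void process_on_gpu(float *host_data, int n)
{
    float *dev_data = 0;
    size_t bytes = n * sizeof(float);

    cudaMalloc((void **)&dev_data, bytes);            // the "vertex buffer" analogue
    cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);

    crunch<<<(n + 255) / 256, 256>>>(dev_data, n);    // one launch covers every element

    cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
}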
My OS is Perception.
suthers
Member
Posts: 672
Joined: Tue Feb 20, 2007 3:00 pm
Location: London UK

Re: GPU based?

Post by suthers »

Just wanted to know, out of interest: is it possible for a video card to fetch data from main memory?
Because then couldn't you run the kernel in video memory and make it fetch instructions by itself, or is it necessary for the instructions to be fetched by the CPU?
Jules
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: GPU based?

Post by Brendan »

Hi,
suthers wrote:Because then couldn't you run the kernel in video memory and make it fetch instructions by itself, or is it necessary for the instructions to be fetched by the CPU?
A GPU is designed to do the same operation on a large array of data, and gets really high FLOPS by doing the operations in parallel.

A kernel (especially a micro-kernel) never does the same operation on a large array of data, and therefore being able to do lots of operations in parallel is entirely useless. I did the math and found that at one operation per instruction (rather than 80 operations per instruction), an old 1 GHz Pentium III has better performance than one of the cores in your 1.2 teraflops GPU.
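For reference, the back-of-envelope numbers behind that (assuming the 1.2 teraflop figure counts one multiply-add, i.e. two FLOPs, per stream processor per clock) work out to roughly:

\[
\frac{1.2 \times 10^{12}\ \text{FLOPS}}{800\ \text{stream processors} \times 2\ \text{FLOPs per clock}} \approx 750\ \text{MHz}
\]

So one scalar operation per clock on a single core is about 0.75 billion operations per second, already below what a 1 GHz Pentium III manages on plain scalar code, and that's before the memory and instruction-set problems below.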

Of course this is based on theoretical maximums, which has nothing to do with the performance you'd get in practice. In practice a lot of time is spent fetching data from RAM (or for your GPU, fetching data from video RAM after you've dragged it across the PCI bus from main RAM). Therefore, in practice, if you try running general purpose code on a GPU I'd expect you to get less performance from all the cores of your 1.2 teraflop GPU than you would from an old 1 GHz Pentium III.

However, I'm ignoring scalability problems - ten cores don't give the same performance as a single core that's ten times faster (bus sharing problems, lock contention, etc.). Because of this, I'd estimate that your 1.2 teraflop GPU would perform about as well as a 600 MHz Pentium II. But...

This assumes that the instruction sets are roughly equal, and they're not. Doing a floating point addition quickly won't help when you're having trouble figuring out how to do a basic function call when there are no CALL/RET instructions and no stack. Because of this I'd expect some seriously ugly hacks to be needed to make any general purpose code run on a GPU, and those ugly hacks are going to cost you. Because of this, I'd estimate that your 1.2 teraflop GPU would perform about as well as a 266 MHz Pentium.

Of course I've overlooked something. You'd need a separate kernel for each type of GPU (which would be entirely insane - the kernel would be obsolete before it's written) or some sort of run-time compiler that compiles "generic shader-language" into something the GPU can actually execute. Because of this I'd estimate that your 1.2 teraflop GPU would perform about as well as a 66 MHz 80486.

Unfortunately, it'll be like a 66 MHz 80486 without an MMU or protection (or even basic IRQ handling) that's sitting next to a general purpose CPU that's capable of executing general purpose code thousands of times faster... :lol:


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.