
Posted: Wed May 07, 2008 3:33 am
by Ready4Dis
I agree with Brendan: the extra latencies and overhead of bus transfers would outweigh the benefits. Imagine trying to use a 1 Gbps network while listening to an MP3 and playing a game... you'd be wasting tons of valuable resources by doing so many transfers over the PCI bus. Unless you could do PCI->PCI DMA, it's just not worth it. I can see a point for Folding@home using it: it's supposed to run in spare time, so it's something that can run while nothing is going on and then pause when you're playing games or doing something intensive. Now, you CAN perform things like MPEG decoding on the video card, but there are already provisions for such things. If you were running very large scientific research simulations it'd be helpful, but in that case it'd typically be a single implementation/program that uses the GPU rather than a kernel-supplied service.

The point is, the time when your computer is stressed the most is typically when you'd want to hand things back to the CPU, which negates doing stuff on the GPU even during the unimportant times. And just another note: using the GPU on a laptop for something will kill the battery much faster than running it on the CPU, which end users tend to dislike. At this time I just don't see a use for it, except for one-off special cases where it's better to just let the application developer have access to the video card like the other OSes do.

Posted: Wed May 07, 2008 11:33 am
by grover
MarkOS wrote:How can we implement drivers for NVIDIA's CUDA and AMD's Close-to-the-Metal architectures? Are their sources open to the public? If not, how can we do it?
CUDA is nothing but a software layer on top of the classic texture memory and shaders. So you're not looking at something special there, it's just sugar coating.

With ATI you're in luck: they opened their GPU specs, so you should be able to find what you need there. I'd assume the registers/memory layout/commands are very close to Direct3D.
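
To give an idea of what that software layer looks like from the programmer's side, here's a minimal sketch of a CUDA kernel (a hypothetical example, not anything from NVIDIA's sources): a C-like function that the nvcc toolchain compiles down to run on the same stream processors the shaders use.

Code:
// Hypothetical CUDA kernel: each GPU thread scales one array element,
// the same kind of per-element work a pixel shader does per pixel.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the final partial block
        data[i] *= factor;
}
// The host launches it over the whole array in one call, e.g.:
//   scale<<<(n + 255) / 256, 256>>>(dev_data, 2.0f, n);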

Posted: Thu May 08, 2008 6:46 am
by Jeko
grover wrote:
MarkOS wrote:How can we implement drivers for NVIDIA's CUDA and AMD's Close-to-the-Metal architectures? Are their sources open to the public? If not, how can we do it?
CUDA is nothing but a software layer on top of the classic texture memory and shaders. So you're not looking at something special there, it's just sugar coating.
What? Operations on nvidia cards are done with pixel shaders?

Posted: Thu May 08, 2008 3:11 pm
by pcmattman
MarkOS wrote:What? Operations on nvidia cards are done with pixel shaders?
You've obviously never written shaders in GPU assembly...

Posted: Thu May 08, 2008 10:01 pm
by karloathian
GPU shaders written in assembly are pretty much the way to go, and I find them easier to write than any high-level implementation.

Posted: Fri May 09, 2008 8:09 am
by einsteinjunior
Hi,

I wish to know where I can find some information about writing pixel shaders in assembly, as you said above. I am highly interested.

Posted: Fri May 09, 2008 8:11 am
by JamesM
einsteinjunior wrote:Hi,

I wish to know where I can find some information about writing pixel shaders in assembly, as you said above. I am highly interested.
Google GLSL and Cg.

Posted: Thu Jun 05, 2008 10:58 pm
by Ready4Dis
einsteinjunior wrote:Hi,

I wish to know where I can find some information about writing pixel shaders in assembly, as you said above. I am highly interested.
I have a few on my hard drive that I wrote. They target an older pixel shader version that doesn't support branching, so they're pretty useless for OS dev in most cases, unless you're trying to write a simulation that has to process tons of data in the same way.

Re: GPU based?

Posted: Tue Jun 24, 2008 9:59 pm
by 01000101

Re:

Posted: Wed Jun 25, 2008 5:53 am
by suthers
MarkOS wrote:
grover wrote:
MarkOS wrote:How can we implement drivers for NVIDIA's CUDA and AMD's Close-to-the-Metal architectures? Are their sources open to the public? If not, how can we do it?
CUDA is nothing but a software layer on top of the classic texture memory and shaders. So you're not looking at something special there, it's just sugar coating.
What? Operations on nvidia cards are done with pixel shaders?
I think he might have been referring to the unified shader architecture: in modern GPUs there is no such thing as a dedicated pixel shader, all the shaders are capable of executing all the different types of instructions...

As for the 280GTX Tri-SLI :shock: =P~ , six months ago I bought two 8800 GTS 512MB cards and was nearly at the top of what technology could provide... six months later I'm one graphics card and two series down... :( :cry:
Jules

Re: GPU based?

Posted: Tue Jul 08, 2008 3:03 am
by grover
Yes, I was talking about the shader processors, more commonly referred to as unified shaders (that being the DirectX term).

Anyway, the current AMD/ATI Radeon 4870 has a peak performance of 1.2 teraflops in a single-card configuration, which beats anything NVIDIA has to offer from a price, power and performance perspective.

The 4870 consists of 10 shader cores, each with 80 32-bit stream processors, for a total of 800 stream processors on a single card. So you could schedule up to 10 processes on the GPU...

Re: GPU based?

Posted: Tue Jul 08, 2008 5:37 am
by suthers
Well, though the ATI Radeon HD 4870 produces 1.2 TF and the NVIDIA GeForce 280GTX only produces 0.933, the 280GTX outperforms the 4870 on most tests....
Probably because it has more ROPs, a faster shader clock and more TMUs, so though it's slower at generating the 'skeleton' image, it's faster at mapping textures and actually getting the result onto the screen...
Meaning that if you want an OS that runs on the GPU, the 4870 is probably faster, but not for most games....
Jules

Re: GPU based?

Posted: Wed Jul 09, 2008 6:35 am
by AndrewAPrice
The DirectX SDK (Windows only, although the documentation is available online on MSDN) is a good resource, but you need to trick it into doing GPGPU (general-purpose computing on the GPU), since Direct3D is only designed to use shaders for manipulating vertices, geometry, and pixels. That said, you can pass it virtually any kind of data to work with.

If you wanted to do repetitive calculations on a large data set, you could store all the data in a vertex buffer and let it churn through it all in a few calls :D
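
For comparison, here's roughly what that same "repetitive calculation on a large data set" looks like through CUDA (mentioned earlier in the thread) rather than the Direct3D vertex-buffer trick. This is only a hypothetical sketch, assuming NVIDIA's toolchain; the names are made up:

Code:
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical example: apply the same calculation to every element of a large array.
__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];
}

int main(void)
{
    const int n = 1 << 20;                     // about a million elements
    size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; i++)
        host[i] = (float)i;

    float *dev;
    cudaMalloc((void **)&dev, bytes);                       // buffer in video RAM
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);   // the costly trip across the bus

    square<<<(n + 255) / 256, 256>>>(dev, n);               // one call covers the whole data set

    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);   // and the trip back
    printf("%f\n", host[1000]);                             // prints 1000000.000000

    cudaFree(dev);
    free(host);
    return 0;
}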

Re: GPU based?

Posted: Wed Jul 09, 2008 10:49 am
by suthers
Just out of interest, is it possible for a video card to fetch data from main memory?
Because then couldn't you run the kernel on the video card and make it fetch instructions by itself, or is it necessary for the instructions to be fetched by the CPU?
Jules

Re: GPU based?

Posted: Wed Jul 09, 2008 1:00 pm
by Brendan
Hi,
suthers wrote:Because then couldn't you run the kernel on the video card and make it fetch instructions by itself, or is it necessary for the instructions to be fetched by the CPU?
A GPU is designed to do the same operation on a large array of data, and gets really high FLOPS by doing the operations in parallel.

A kernel (especially a micro-kernel) never does the same operation on a large array of data, and therefore being able to do lots of operations in parallel is entirely useless. I did the math and found that at one operation per instruction (rather than 80 operations per instruction), an old 1 GHz Pentium III has better performance than one of the cores in your 1.2 teraflops GPU.
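
(Reconstructing that arithmetic as a rough sketch from the figures quoted earlier in the thread: 1.2 teraflops spread over 800 stream processors is about 1.5 GFLOPS each, and that figure counts a multiply-add as two operations, so each stream processor issues roughly 0.75 billion instructions per second. Use only one operation per instruction instead of 80 and that 0.75 billion per second is all a core gives you, a little less than a 1 GHz Pentium III issuing one instruction per cycle.)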

Of course this is based on theoretical maximums, which has nothing to do with the performance you'd get in practice. In practice a lot of time is spent fetching data from RAM (or for your GPU, fetching data from video RAM after you've dragged it across the PCI bus from main RAM). Therefore, in practice, if you try running general purpose code on a GPU I'd expect you to get less performance from all the cores of your 1.2 teraflop GPU combined than you would from an old 1 GHz Pentium III.

However, I'm ignoring scalability problems - ten cores don't give the same performance as a single core that's ten times faster (bus sharing problems, lock contention, etc). Because of this, I'd estimate that your 1.2 teraflop GPU would perform about as well as a 600 MHz Pentium II. But...

This assumes that the instruction sets are roughly equal, and they're not. Doing a floating point addition quickly won't help when you're having trouble figuring out how to do a basic function call when there are no CALL/RET instructions and no stack. Because of this I'd expect some seriously ugly hacks to make any general purpose code run on a GPU, and these ugly hacks are going to cost you. Because of this, I'd estimate that your 1.2 teraflop GPU would perform about as well as a 266 MHz Pentium.

Of course I've overlooked something. You'd need a separate kernel for each type of GPU (which would be entirely insane - the kernel would be obsolete before it's written) or some sort of run-time compiler that compiles "generic shader-language" into something the GPU can actually execute. Because of this I'd estimate that your 1.2 teraflop GPU would perform about as well as a 66 MHz 80486.

Unfortunately, it'll be like a 66 MHz 80486 without an MMU or protection (or even basic IRQ handling) that's sitting next to a general purpose CPU that's capable of executing general purpose code thousands of times faster... :lol:


Cheers,

Brendan