Combuster wrote:
rdos wrote: your extremely slow command lists
Biased remark based on your own ignorance. In fact, manual pixelpumping in my OS is slower than sending render commands since the driver gets to convert that to accelerator instructions.
A GPU-accelerated path is faster than a non-accelerated path? What a surprise.
Sending generic commands to a graphics driver, which must then translate them into device-specific commands, is slower than calling a driver function that just places a GPU command straight into a command buffer and, when that buffer is full, asks the driver to take it and pass it to the hardware (and hand back another buffer)? Sounds perfectly logical.
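For concreteness, here is a minimal sketch of that "place a command into a buffer, hand it to the driver when full" path. All names (`drv_submit_buffer`, `emit_cmd`, the packed command) are made up for illustration, not any real driver's API:

[code]
/* Minimal sketch of a driver-side command buffer. Purely illustrative. */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CMD_BUF_SIZE 4096

struct cmd_buffer {
    unsigned char data[CMD_BUF_SIZE];
    size_t        used;
};

/* Stand-in for the driver entry point that hands a full buffer to the
 * hardware ring and gives the caller an empty one back. */
static void drv_submit_buffer(struct cmd_buffer *buf)
{
    printf("submitting %zu bytes of GPU commands\n", buf->used);
    buf->used = 0;
}

/* Fast path: usually just a memcpy and a counter bump. */
static void emit_cmd(struct cmd_buffer *buf, const void *cmd, size_t len)
{
    if (buf->used + len > CMD_BUF_SIZE)
        drv_submit_buffer(buf);            /* buffer full: flush to hardware */
    memcpy(buf->data + buf->used, cmd, len);
    buf->used += len;
}

int main(void)
{
    struct cmd_buffer buf = { .used = 0 };
    unsigned char packed_cmd[16] = { 0 };  /* pretend this is a packed draw command */
    for (int i = 0; i < 1000; i++)
        emit_cmd(&buf, packed_cmd, sizeof packed_cmd);
    drv_submit_buffer(&buf);               /* flush the tail */
    return 0;
}
[/code]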
Brendan wrote: I thought I should probably clarify what I meant, with a little experiment.
Start "glxgears" in a large window, and write down the frame rate you get out of it (I get about 2000 frames per second).
Now open some other window (doesn't matter what it is as long as there's no transparency - I used a text editor) and arrange that other window so that it covers almost all of the "glxgears" window, so that only a thin strip of black pixels from "glxgears" is showing. Based on the original "entire window visible" frame rate and the amount that is now invisible, write down an estimate of how much you think the frame rate should've improved. My guess was about 10000 times faster (given that all of the polygons should've been clipped and nothing needed to be drawn at all). Check what frame rate you're actually getting now (I get about 4000 frames per second) and see how much the frame rate improved (about 2 times faster). Is this anywhere near what you expected (no); or is it many orders of magnitude worse than what I should've been able to expect (yes)?
Now arrange the other window so that the entire "glxgears" window is entirely covered. Write down a new estimate of how much you think the frame rate should've improved. I've got 16 CPUs that are capable of doing "nothing" about 4 billion times per second, so my estimate was 64 billion frames per second. Check what frame rate you're actually getting now (I still get about 4000 frames per second). Is this anywhere near what you expected; or have you realised why I call it retarded?
Now minimise the "glxgears" window and try again. My estimate was that it should've reported the frame rate as "NaN". I still get about 4000 frames per second. I was expecting "retarded" from the start, so does this mean I'm an optimist?
So you're taking GLXGears (the height of 1992 3D graphics) running on top of X11 (the height of 1984 windowing systems), and declaring that because it sucks (and people who don't think X11 sucks are a rarity), all graphics systems suck?
GLXGears has one mission in life: push as many frames as possible. It's dumb, it's stupid, it doesn't listen for minimize events, etc. Yeah. We get it. Way to choose a terrible example.
Yes, I expected it to do 4000 frames a second. GLXGears is terrible: it's entirely CPU-bound, and it's using crappy old OpenGL immediate mode from 1992. Your GLXGears framerate is a better proxy for how quickly your machine can context switch than for anything graphics related.
If you'd chosen a decent example (say, something built upon a widget toolkit), you would have found that it stops rendering when it isn't visible, because it never receives paint events.
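For illustration, here is roughly what that paint-on-demand pattern looks like at the raw Xlib level (a toolkit hides this behind its own event loop; the drawing itself is just a placeholder). A window that is fully covered or minimised simply gets no Expose events, so it burns no CPU or GPU time:

[code]
#include <X11/Xlib.h>

static void redraw(Display *dpy, Window win, GC gc)
{
    /* whatever the widget actually draws */
    XFillRectangle(dpy, win, gc, 10, 10, 100, 100);
}

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy)
        return 1;
    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 300, 200, 0, 0, 0xffffff);
    GC gc = XCreateGC(dpy, win, 0, NULL);
    XSelectInput(dpy, win, ExposureMask);
    XMapWindow(dpy, win);

    for (;;) {
        XEvent ev;
        XNextEvent(dpy, &ev);          /* blocks: no events, no work */
        if (ev.type == Expose && ev.xexpose.count == 0)
            redraw(dpy, win, gc);
    }
}
[/code]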
Of course, if you were implementing a modern GUI system, you might decide to be smart and take away GLXGears' OpenGL context whenever it was minimized (or whenever a full-screen application was running in front of it). You know, like Android does. Actually, at that point GLXGears would crash, but that's only because it's a crappy program (and because X11 never takes away your context).
Brendan wrote: I've got 2 completely separate ATI video cards (both driving separate monitors). For fun, I positioned a "glxgears" window so that half was on each monitor. With 2 entire video cards doing half the work each, the efficiency "improved" from 4000 frames per second all the way "up" to 200 frames per second! This is obvious proof that I'm deluded, and that it's impossible to do anything more efficiently...
So I take it you're using an old and crappy non-compositing window manager then? Because that's the only situation in which it matters that the window is straddling two displays.
Of course, in that case the two graphics cards aren't doing half the work each. Assuming the commands are going to both cards, they're both rendering the whole scene. Even if you managed to divide the work spatially between the cards, they'd still each be rendering more than half of the triangles (because of overlap).
Whatever is going on, it's not smart, and it says more about the X server than about GLXGears.
What I suspect is happening is that the "primary" card is rendering the image. Because the window is split between the two screens, X11 can't do its normal non-composited thing and render directly into a scissor rect of the desktop. Therefore, it's falling back to rendering into a buffer, and it's probably picked something crappy like a pixmap (allocated in system memory), then copying out of that pixmap to the framebuffer on the CPU.
Even if you're using a compositing window manager, X11 barely counts as modern. If you're not using one, then yeah, expect these problems, because that's just how X11 is.
Brendan wrote: Hi,
Owen wrote: So you think that you can produce the one final retained mode 3D engine to rule all 3D engines?
No; that's why I want to provide an adequate abstraction, where applications only describe what to draw and the 3D rendering engine (built into video drivers, etc) can be improved without breaking everything.
OK, so you do expect to make the one final retained mode 3D engine to rule all 3D engines then.
[quote="Brendan"
Owen wrote:Oh, and you plan to use the same assets for raytracing? You're deluded. Rasterization and raytracing take very different geometry formats.
Is there any technical reason why the same "description of what to draw" can't be used by both rasterisation and ray tracing? If there is, I'd like to know what they might be.[/quote]
Because efficient and pretty rasterization uses triangle meshes, normal and/or displacement maps, fragment shaders with atomic counters for real-time global illumination, stencil-buffer "hacks" for shadows, etc., all of which are engineered for the rasterization pipeline.
"Efficient" and pretty raytracing, on the other hand, uses NURBS and other mathematical models of shapes, which avoid the discontinuities of such vertex-based geometry.
(Also note that, for a lot of materials, the "rasterization plus a lot of trickery" model works a lot better than raytracing, because raytracing breaks down for diffuse illumination)
Brendan wrote: If the graphics are completely ignored, then in both cases the application decides what to draw. After that, in my case the application appends some tiny little commands to a buffer (which is sent, then ignored). In your case the application does a massive amount of pointless pixel pounding (which is sent, then ignored). If you think my way is slower, you're either a moron or a troll.
GLXGears is hardly doing any pixel pounding. In fact, it's doing none at all. It's building command lists which, if your driver is good, go straight to the hardware. If your driver is mediocre, then... well, all sorts of shenanigans can occur.
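For reference, the immediate-mode style GLXGears is built on looks roughly like this (an illustrative fragment in the spirit of glxgears, not its actual code, with the GLX window and context setup omitted): one library call per vertex, every frame, which is why the CPU rather than the GPU sets the ceiling.

[code]
/* OpenGL 1.x immediate mode: the driver turns these per-vertex calls
 * into hardware command buffers behind the scenes. */
#include <GL/gl.h>

static void draw_gear_face_quad(void)
{
    glBegin(GL_QUADS);
    glNormal3f(0.0f, 0.0f, 1.0f);
    glVertex3f(-1.0f, -1.0f, 0.0f);   /* one function call per vertex */
    glVertex3f( 1.0f, -1.0f, 0.0f);
    glVertex3f( 1.0f,  1.0f, 0.0f);
    glVertex3f(-1.0f,  1.0f, 0.0f);
    glEnd();
}
[/code]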
[quote="Brendan"
rdos wrote:In the partial case, your commands must employ some smart tactics as to which commands to ignore, which to modify and which to send. This requires extra processing (slows down the fully visible case). It is also possible to do the same thing without commands, but that also slows down the typical case.
The "smart tactics" are a small amount of work that avoids a much larger amount of work. For the fully visible case, the only extra processing is appending commands to a list and decoding commands from the list (all the rendering will cost the same, it just happens in a different place). If nothing happens to the graphics after the application creates it, then your way would avoid the negligible extra work of creating the list of commands and decoding it. However, if anything does needs to happen to the data afterwards (e.g. rotating, scaling, etc; or even just copying it "as is" from the application's buffer into a GUI's buffer) then your way ends up slower.[/quote]
Assuming that the application just draws its geometry to the window with no post-processing, yes, you will get a negligible speed boost (any 3D accelerator can scale and rotate bitmaps at blistering speed).
Of course, for any modern 3D game (where "modern" covers the last 7 years or so), what you'll find is that the geometry you're scaling is a quad with one texture attached and a fragment shader doing the last post-processing step.
And that speed boost only applies if you're drawing the contents of that window once; most non-game windows don't change every frame (and we've already established that for game windows it's pointless).
Basically, for the common case (where something does need to happen to the application's graphics after the application has created it, regardless of what that "something" is), your way is slower. However, for the special case (e.g. where the application is full screen)... oh wait, your way is slower.
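As a rough illustration, the "quad with one texture" compositing step mentioned above amounts to something like this (fixed-function GL for brevity; `composite_window`, `win_tex` and the geometry arguments are made up for the example):

[code]
/* A compositor presenting a client window: the window's rendered output
 * lives in a texture, and drawing it scaled/moved is just one textured quad. */
#include <GL/gl.h>

static void composite_window(GLuint win_tex, float x, float y,
                             float win_w, float win_h)
{
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, win_tex);
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(x,         y);
    glTexCoord2f(1.0f, 0.0f); glVertex2f(x + win_w, y);
    glTexCoord2f(1.0f, 1.0f); glVertex2f(x + win_w, y + win_h);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(x,         y + win_h);
    glEnd();
    glDisable(GL_TEXTURE_2D);
}
[/code]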
Brendan wrote: Also note that you are completely ignoring the massive amount of extra flexibility that "list of commands" provides (e.g. trivial to send the "list of commands" to multiple video cards, over a network, to a printer, to a file, etc).
What's more efficient:
- Executing the same commands on multiple independent GPUs
- Executing them on one GPU (or multiple coupled GPUs, i.e. CrossFire/SLI)
Answer: The latter, because it uses less power (and because it lets you use the second GPU for other things, maybe accelerating physics or running the GUI).
Also, I don't know how you intend to do buffer readbacks when you're slicing the commands between the GPUs, or how you'll deal with the case where one GPU only supports the fixed-function pipeline and the other supports all the latest shiny features.
Any GUI lets you send the commands over a network or to a file; you just need to implement the appropriate "driver".
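A minimal sketch of that "appropriate driver" idea, assuming a made-up backend interface (`render_backend`, `file_submit`, and the toy command stream are all illustrative): the consumer of the command stream is just a function pointer, so a file, a socket, or a GPU backend all look the same to the rest of the system.

[code]
#include <stdio.h>
#include <string.h>

struct render_backend {
    void (*submit)(void *ctx, const void *cmds, size_t len);
    void *ctx;
};

/* File backend: just append the raw command stream to a file. */
static void file_submit(void *ctx, const void *cmds, size_t len)
{
    fwrite(cmds, 1, len, (FILE *)ctx);
}

/* A network backend would look the same with send() instead of fwrite();
 * a GPU backend would translate the stream into hardware packets. */

int main(void)
{
    FILE *out = fopen("frame.cmd", "wb");
    if (!out)
        return 1;
    struct render_backend be = { file_submit, out };

    const char cmds[] = "CLEAR;RECT 10 10 100 100;";   /* toy command stream */
    be.submit(be.ctx, cmds, strlen(cmds));

    fclose(out);
    return 0;
}
[/code]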
Sending to a printer is a special case. Actually, an OpenVG-like API to drive a printer isn't a bad idea... Cairo essentially implements that (with its PostScript/PDF backends). It makes sense for 2D, not so much for 3D. But that's nothing new; Microsoft managed that abstraction decades ago; you've been using GDI to talk to printers in Windows since forever.
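For example, Cairo's PostScript backend really does let the same 2D drawing calls target a printer-ready file (a trivial sketch; error handling omitted):

[code]
#include <cairo.h>
#include <cairo-ps.h>

int main(void)
{
    /* A4 page in PostScript points (595 x 842). */
    cairo_surface_t *surface = cairo_ps_surface_create("out.ps", 595, 842);
    cairo_t *cr = cairo_create(surface);

    cairo_set_source_rgb(cr, 0.0, 0.0, 0.0);
    cairo_rectangle(cr, 100, 100, 200, 150);   /* same API as on-screen drawing */
    cairo_fill(cr);

    cairo_show_page(cr);          /* emit the page to the PostScript stream */
    cairo_destroy(cr);
    cairo_surface_destroy(surface);
    return 0;
}
[/code]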
3D rendering to a printer? There is no "one size fits all" solution, just like there isn't for converting a 3D scene to an image today.
Every good system is built upon layers. For graphics, I see three:
- The low-level layer: OpenGL and similar; direct access for maximum performance. Similarly, you might allow direct PostScript access to PostScript printers (for example).
- The mid-level layer: a 2D vector API, useful for drawing arbitrary graphics. Cairo or OpenVG.
- The high-level layer: your GUI library, which uses the vector API to draw widgets.
You'll find all competent systems are built this way today, and all incompetent systems evolve to look like it (e.g. X11 has grown Cairo on top of it, because X11's rendering APIs suck).
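To show how those layers stack in practice, here is a hypothetical high-level widget (layer three) drawing itself purely through the Cairo vector API (layer two), which in turn renders via whatever low-level backend the surface is bound to; `draw_button` and its arguments are made up for the example.

[code]
#include <cairo.h>

static void draw_button(cairo_t *cr, double x, double y,
                        double w, double h, const char *label)
{
    cairo_set_source_rgb(cr, 0.85, 0.85, 0.85);    /* button background */
    cairo_rectangle(cr, x, y, w, h);
    cairo_fill(cr);

    cairo_set_source_rgb(cr, 0.0, 0.0, 0.0);       /* label text */
    cairo_select_font_face(cr, "sans", CAIRO_FONT_SLANT_NORMAL,
                           CAIRO_FONT_WEIGHT_NORMAL);
    cairo_set_font_size(cr, 12.0);
    cairo_move_to(cr, x + 8, y + h / 2 + 4);
    cairo_show_text(cr, label);
}

int main(void)
{
    cairo_surface_t *surface =
        cairo_image_surface_create(CAIRO_FORMAT_ARGB32, 200, 60);
    cairo_t *cr = cairo_create(surface);

    draw_button(cr, 20, 14, 160, 32, "OK");

    cairo_surface_write_to_png(surface, "button.png");
    cairo_destroy(cr);
    cairo_surface_destroy(surface);
    return 0;
}
[/code]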
Cheers,
Owen