Brendan wrote:Hi,
Owen wrote:I was lazy and it was an easy test to do. I downloaded "glxgears" for Windows (Vista) and did the experiment again - exactly the same speed when everything except a thin black strip is covered, and 5 times faster when the window is minimised. This shows that both X and Vista suck (but doesn't rule out the possibility that glxgears itself sucks).
I don't have any 3D games for Linux. Instead I tested the frame rate of Minecraft on Windows. I set the rendering distance to "far" and stood on top of a very tall tower looking out towards the horizon, so that there's lots of graphics on the bottom of the screen and nothing but blue sky at the top of the screen. With the game paused and the window entirely visible I got 25 frames per second, and with the game paused and everything except for part of the sky (and the FPS display) covered up I got the same 25 frames per second - no speedup at all. I didn't/couldn't measure it minimised. Then I decided to try Crysis and got the same frame rates (for both the menu and while playing the game) regardless of how much of the game's window is obscured.
Finally (for fun) I decided to try Minecraft and Crysis running (and both visible) at the same time. Minecraft was paused and Crysis seemed playable. After loading a saved game in Crysis, I ran up to a beach and decided to unload some ammo into a passing turtle. As soon as I pressed the trigger the graphics froze, then Minecraft and Crysis both crashed at the same time. Three cheers for mouldy-tasking OSs!
Feel free to conduct your own tests using whatever games and/or 3D applications you like, running on whatever OS you like, if you need further confirmation that existing graphics systems suck.
There's no denying that in some regards they suck. Actually, modern versions of Windows are handicapped in this regard - if you hover over the Crysis icon in your task bar you'll probably find an animated thumbnail of it.
Owen wrote:So I take it you're using an old and crappy non-compositing window manager then? Because that's the only situation in which it would matter that the window was split between two displays.
I'm using KDE 4, which is meant to support compositing (but every time I attempt to enable compositing effects it complains that something else is using the graphics accelerator and refuses). To be honest, it was hard enough just getting it to run OpenGL in a window (different versions of X and ATI drivers and libraries and whatever, all with varying degrees of "unstable"), and once I found a magic combination that seemed to work I stopped updating any of it in case I upset something.
Got it. Crappy video drivers.
Brendan wrote:Owen wrote:Of course, in that case the graphics cards aren't doing half the work each. Assuming the commands are going to both cards, they're both rendering the scene. Even if you managed to spatially divide the work up precisely between the cards, they'd still be rendering more than 1/2 of the triangles each (because of overlap)
True; but I'd expect worst case to be the same frame rate when the window is split across 2 video cards, rather than 50 times slower.
Owen wrote:Whatever is going on, it's not smart, and it says more about the X server than GLXGears.
What I suspect is happening is that the "primary" card is rendering the image. Because it's split between the two screens, X11 can't do its normal non-composited thing and render directly into a scissor rect of the desktop. Therefore, it's falling back to rendering to a buffer, and it's probably picked something crappy like a pixmap (allocated in system memory), and is then copying out of said pixmap to the framebuffer on the CPU.
Even if you're using a compositing window manager, X11 barely counts as modern. If you're not using a compositing window manager, yeah, expect these problems because that's just how X11 is.
Yes, but I doubt Wayland (or Windows) will be better. The problem is that the application expects to draw to a buffer in video display memory, and that buffer can't be in 2 different video cards at the same time. There's many ways to make it work (e.g. copy from one video card to another) but the only way that doesn't suck is my way (make the application create a "list of commands", create 2 copies and do clipping differently for each copy).
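To make that concrete, here's a minimal C sketch of the duplication idea (every type and function name here is invented purely for illustration; none of it is a real driver API):

Code: Select all
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { int x, y, w, h; } ClipRect;

typedef struct {
    const uint8_t *commands;  /* opaque, device-independent command stream */
    size_t         length;
} CommandList;

/* Stand-in for a per-card driver entry point. */
static void driver_submit(int card, const CommandList *list, const ClipRect *clip)
{
    printf("card %d: %zu bytes of commands, clipped to %dx%d at (%d,%d)\n",
           card, list->length, clip->w, clip->h, clip->x, clip->y);
}

/* The GUI/video layer (not the application) decides how the window is split:
 * the same commands go to every card, each copy with its own clip rectangle. */
static void submit_to_all_cards(const CommandList *list,
                                const ClipRect *per_card_clip, int cards)
{
    for (int i = 0; i < cards; i++)
        driver_submit(i, list, &per_card_clip[i]);
}

int main(void)
{
    uint8_t cmds[2048] = {0};                 /* a small "list of commands"  */
    CommandList list = { cmds, sizeof cmds };
    ClipRect clips[2] = { {0, 0, 960, 1080},  /* left half on one card...    */
                          {960, 0, 960, 1080} };  /* ...right half on the other */
    submit_to_all_cards(&list, clips, 2);
    return 0;
}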
Except this model is already in use today, on laptops and machines where all the outputs are connected to the IGP, such that the discrete graphics card can be powered down completely when you're not gaming. Surely you've heard of things like nVIDIA's Optimus and LucidLogix' Virtu?
For the former case the overhead is generally about 2%. Whether they're doing DMA from video RAM to system RAM, or just allocating the colour buffer in system RAM (it's the depth buffer that gets most of the traffic), I don't know. The latter option would actually work quite well with modern games, which write the colour buffer exactly once (because they have various post-processing effects).
Brendan wrote:Owen wrote:OK, so you do expect to make the one final retained mode 3D engine to rule all 3D engines then.
I expect to create a standard set of commands for describing the contents of (2D) textures and (3D) volumes; where the commands say where (relative to the origin of the texture/volume being described) different things (primitive shapes, other textures, other volumes, lights, text, etc) should be, and some more commands set attributes (e.g. ambient light, etc). Applications create these lists of commands (but do none of the rendering).
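A rough C sketch of what such a command list might look like (all names and fields below are hypothetical, chosen only to make the idea concrete):

Code: Select all
#include <stdint.h>

typedef enum {
    CMD_SET_AMBIENT_LIGHT,  /* attribute: ambient light for this texture/volume */
    CMD_PLACE_PRIMITIVE,    /* a primitive shape at some offset from the origin */
    CMD_PLACE_TEXTURE,      /* another (2D) texture, described elsewhere        */
    CMD_PLACE_VOLUME,       /* another (3D) volume, described elsewhere         */
    CMD_PLACE_LIGHT,
    CMD_PLACE_TEXT
} CommandType;

typedef struct {
    CommandType type;
    float       pos[3];      /* where, relative to the origin of the texture/volume being described */
    uint32_t    resource_id; /* which primitive/texture/volume/font/string, etc.                    */
    float       params[4];   /* colour, intensity, scale, ... meaning depends on type               */
} Command;

/* The application only appends Command entries to a list; all rendering is
 * left to whatever ends up consuming the list (video driver, printer, file). */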
So the final retained mode 3D engine to rule all 3D engines. Except yours is going to be more abstracted and therefore lower performance than most, never mind the fact that developers aren't going to be able to do anything innovative on it because you haven't implemented any of that yet...
Sure, you can raytrace polygon meshes. You'd never want to.
As I said: Raytracing provides very good reproduction for specular lighting effects and terrible reproduction for diffuse effects. Rasterization natively provides nothing, but the shading systems modern engines use mean that, for diffuse lighting - the majority of lighting in the real world - they're more convincing.
Brendan wrote:Owen wrote:Brendan wrote:If the graphics are completely ignored; then in both cases the application decides what to draw. After that, in my case the application appends some tiny little commands to a buffer (which is sent, then ignored). In your case the application does a massive amount of pointless pixel pounding (which is sent, then ignored). If you think my way is slower, you're either a moron or a troll.
GLXGears is hardly doing any pixel pounding. In fact, it's doing none at all. It's building command lists which, if your driver is good, go straight to the hardware. If your driver is mediocre, then... well, all sorts of shenanigans can occur.
I don't think we were talking about GLXGears here; but anyway...
For GLXGears, the application sends a list of commands to the video card's hardware, and the video card's hardware does a massive pile of pointless pixel pounding, then the application sends the results to the GUI (which ignores/discards it)?
Why not just send the list of commands to the GUI instead, so that the GUI can ignore it before the prodigious pile of pointless pixel pounding occurs?
Why does the GUI not just take the application's framebuffer away?
Brendan wrote:Owen wrote:Assuming that the application just draws its geometry to the window with no post processing, yes, you will get a negligible speed boost (any 3D accelerator can scale and rotate bitmaps at a blistering speed).
Of course, for any modern 3D game (where modern covers the last 7 years or so) what you'll find is that the geometry you're scaling is a quad with one texture attached and a fragment shader doing the last post-processing step.
So for a modern 3D game; you start with a bunch of vertexes and textures, rotate/scale/whatever the vertexes wrong (e.g. convert "3D world-space co-ords" into "wrong 2D screen space co-ords"), then use these wrong vertexes to draw the screen wrong, then rotate/scale/whatever a second time to fix the mess you made; and for some amazing reason that defies any attempt at logic, doing it wrong and then fixing your screw-up is "faster" than just doing the original rotating/scaling/whatevering correctly to begin with? Yay!
Deferred shading is quite common in modern games. Rather than rendering the geometry with all the final shading effects, you render it with a shader which outputs the details relevant to the final shading effects. This buffer might contain data like:
Code: Select all
ubyte[3] rgb_ambient;
ubyte[3] rgb_diffuse;
ubyte[3] rgb_specular;
half_float[2] normal; // 3rd component of the normal is reconstructed later, because normals are normalized
plus of course don't forget the depth buffer
Next, the game will render, to either the actual frame buffer or (more likely) an HDR intermediate buffer, a full-screen quad with a fragment shader attached which reads this data, reconstructs the original position from the depth value and the inverse of the transformation matrix, and does the lighting that the game developer desired.
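For illustration, here's roughly the per-pixel work that lighting pass does, written as ordinary C standing in for the fragment shader (the names, the G-buffer layout and the OpenGL-style depth convention are all assumptions, not taken from any particular engine):

Code: Select all
#include <math.h>

typedef struct { float x, y, z, w; } Vec4;

/* column-major 4x4 matrix times column vector (OpenGL convention) */
static Vec4 mat4_mul(const float m[16], Vec4 v)
{
    Vec4 r;
    r.x = m[0]*v.x + m[4]*v.y + m[8]*v.z  + m[12]*v.w;
    r.y = m[1]*v.x + m[5]*v.y + m[9]*v.z  + m[13]*v.w;
    r.z = m[2]*v.x + m[6]*v.y + m[10]*v.z + m[14]*v.w;
    r.w = m[3]*v.x + m[7]*v.y + m[11]*v.z + m[15]*v.w;
    return r;
}

/* Rebuild the world-space position of a pixel from its depth value and the
 * inverse of the view-projection matrix used by the geometry pass
 * (assumes OpenGL-style [0,1] window depth). */
static Vec4 reconstruct_position(float u, float v, float depth,
                                 const float inv_view_proj[16])
{
    Vec4 ndc = { u * 2.0f - 1.0f, v * 2.0f - 1.0f, depth * 2.0f - 1.0f, 1.0f };
    Vec4 p = mat4_mul(inv_view_proj, ndc);
    p.x /= p.w; p.y /= p.w; p.z /= p.w; p.w = 1.0f;   /* perspective divide */
    return p;
}

/* Rebuild the normal's third component from the two stored components;
 * possible because normals are unit length (sign assumed positive, i.e.
 * view-space normals facing the camera). */
static float reconstruct_normal_z(float nx, float ny)
{
    float z2 = 1.0f - nx*nx - ny*ny;
    return z2 > 0.0f ? sqrtf(z2) : 0.0f;
}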
HDR lighting will then require at least 2 more passes over the data in order to work out the average intensity and then apply it. If the application is doing bloom, expect another couple of passes.
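A sketch of the "work out the average intensity" pass, again on the CPU for clarity (a real game does this with repeated downsampling passes on the GPU; the log-average and the Rec. 709 luminance weights are common choices, not anything specific to one engine):

Code: Select all
#include <math.h>

/* hdr: width*height pixels, 3 floats per pixel, linear RGB */
static float log_average_luminance(const float *hdr, int width, int height)
{
    double sum = 0.0;
    const double tiny = 1e-4;                    /* avoids log(0) on black pixels */
    for (int i = 0; i < width * height; i++) {
        const float *p = &hdr[i * 3];
        double lum = 0.2126 * p[0] + 0.7152 * p[1] + 0.0722 * p[2];  /* Rec. 709 luminance */
        sum += log(tiny + lum);
    }
    return (float)exp(sum / (width * height));
}

/* The "apply it" pass then scales every pixel by (key / log_average) before
 * the final tone-mapping curve; 0.18 is a commonly quoted key value. */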
Why bother with deferred shading? Well, consider a lighting system which requires 1 pass per light, with 4 lights, over 500 objects. With deferred shading, that amounts to 504 GPU state changes (500 geometry draws plus 4 full-screen lighting passes). Without it, that becomes 2000 (500 objects redrawn for each of the 4 lights).
Deferred shading isn't perfect, of course, because it doesn't work for transparent objects (so everyone fills those in with a standard forward rendering pass later - which they do anyway, because transparent objects are expensive and have to be rendered back to front, the least efficient order, so you render them last and hope most of them end up occluded).
Brendan wrote:Owen wrote:Brendan wrote:Also note that you are completely ignoring the massive amount of extra flexibility that "list of commands" provides (e.g. trivial to send the "list of commands" to multiple video cards, over a network, to a printer, to a file, etc).
What's more efficient:
- Executing the same commands on multiple independent GPUs
- Executing them on one GPU (or multiple coupled GPUs, i.e. CrossFire/SLI)
Answer: The latter, because it uses less power (and because it lets you use the second GPU for other things, maybe accelerating physics or running the GUI)
I don't know what your point is. Are you suggesting that "list of commands" is so flexible that one GPU can execute the list of commands once and generate graphics data in 2 different resolutions (with 2 different pixel formats) and transfer data to a completely separate GPU instantly?
What I'm saying is that sending the same commands to both GPUs is a waste of a bunch of CPU time (particularly bad when most 3D is CPU bound these days) and a bunch of power and GPU time (~half of the objects on each GPU will be culled, making the entire geometry transformation a waste of time)
Buffer copying from one GPU to another is cheap (they have fast DMA engines). For two different pixel densities... render in the higher density. For two different gamuts... render into the higher gamut format.
Brendan wrote:I have no intention of supporting buffer readbacks. When one GPU only supports the fixed function pipeline and the other supports all the latest shiny features; one GPU's device driver will do what its programmer programmed it to do and the other GPU's device driver will do what its programmer programmed it to do.
OK, so any hope of doing GPGPU is gone, as is any hope of performance when you drag a window between your shiny GeForce Titan and the monitor plugged into your Intel IGP (because the IGP's driver just fell back to software rendering for everything, since it doesn't support much, and what it does support is slow anyway).
Brendan wrote:Owen wrote:Any GUI lets you send the commands over a network or to a file; you just need to implement the appropriate "driver"
Yes; and sending 2 KiB of "list of commands" 60 times per second is a lot more efficient than sending 8 MiB of pixel data 60 times per second; so that "appropriate driver" can just do nothing.
Did I say otherwise?
Actually, the one saving grace of X11 is that it manages to do this with somewhat reasonable efficiency, but AIGLX is still significantly behind direct rendering. However, even removing the X server overhead couldn't hope to save it.
Brendan wrote:Every bad system is also built on layers. The most important thing is that application developers get a single consistent interface for all graphics (not just video cards), rather than a mess of several APIs and libraries that all do similar things in different ways.
Sure.
If you want to do realtime 3D, use OpenGL or DirectX.
If you want to do some 2D, use Cairo, or CoreGraphics, or GDI.
If you want a UI... use the UI toolkit.
Nobody doing a UI wants to use the same API as someone rendering a 3D scene. That would just be mad. And maddening.