Hi Brendan,
This thread has been getting off topic, so I'll try to get back to what I understand your original post to be about: rather than a graphics buffer being a set of pixels, it contains a list of drawing commands, in the hope that far less data will have to be transferred to the parent. I think it has the potential to be a great idea.
I have some topics I'd like to bring up, not all bad:
1) You can do vector graphics!
You don't need to issue a redraw event when you resize a window! Some applications may still want one, of course (if you resize a Firefox window, Firefox will want to use that extra space to show more of the webpage, rather than have everything scale larger).
2) Asynchronous rendering
A really simple example - I am writing an Emacs-like text editor that is scripted in Lua (I was using Google's V8 JavaScript, but I ran into overhead issues switching between sandboxed instances). These high-level languages are very slow at bit twiddling, so that's something I have to avoid altogether. Instead I use Cairo. When it comes time to redraw the window, I have a loop that goes over each frame (displayed document) and says (pseudocode):
Code: Select all
foreach(Frame f):
    if(!f.redraw && !screen.fullredraw) continue;   // nothing changed, skip this frame
    cairo_clip(f.x, f.y, f.width, f.height);        // restrict drawing to the frame's rectangle
    lua_call(f.draw_func);                          // let the frame's Lua code draw itself
I wrote a C wrapper around Cairo, so inside each frame's draw_func, Lua calls the high-level Cairo wrapper (select pen, draw line, and so on), and it works pretty well.
(I must note that I also wrap cairo_clip in Lua, but in the wrapper I must do additional bounds checking to ensure the 'sub'-clip stays within the frame - otherwise it would be possible for a frame to draw at arbitrary positions all over the screen!)
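To make that concrete, here is a minimal sketch (in C, using the Lua C API and Cairo) of what such a clamped clip wrapper might look like - the Frame struct, the globals and the function names are invented for illustration, they are not how my editor is actually structured:
Code: Select all
#include <lua.h>
#include <lauxlib.h>
#include <cairo.h>

typedef struct { double x, y, width, height; } Frame;   /* illustrative frame record */

static cairo_t *g_cr;      /* Cairo context for the window being redrawn */
static Frame   *g_frame;   /* frame whose draw_func is currently running */

static double clampd(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Lua-callable clip(x, y, w, h): like cairo_clip, but the rectangle is
 * clamped so the frame can never draw outside its own area. */
static int l_frame_clip(lua_State *L)
{
    double x = luaL_checknumber(L, 1);
    double y = luaL_checknumber(L, 2);
    double w = luaL_checknumber(L, 3);
    double h = luaL_checknumber(L, 4);

    double x0 = clampd(g_frame->x + x, g_frame->x, g_frame->x + g_frame->width);
    double y0 = clampd(g_frame->y + y, g_frame->y, g_frame->y + g_frame->height);
    double x1 = clampd(x0 + w, g_frame->x, g_frame->x + g_frame->width);
    double y1 = clampd(y0 + h, g_frame->y, g_frame->y + g_frame->height);

    cairo_rectangle(g_cr, x0, y0, x1 - x0, y1 - y0);
    cairo_clip(g_cr);
    return 0;   /* nothing returned to Lua */
}
It would be registered with something like lua_register(L, "clip", l_frame_clip) before each frame's draw_func runs.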
This may not work for general-purpose operating system graphics - it requires all graphics to be synchronous. You need to wait for the lower level to finish drawing before the higher level can use the results, so one bad application that takes too long to draw can make the entire system unresponsive.
If you do asynchronous graphics (everything running at its own speed), then there can be problems if you try to access the results while the child is halfway through drawing. To see this in action, take any low-level rendering framework, clear the screen red, and draw a photo on top:
Code: Select all
while(!keyDown(key_esc)):
    draw_rect(0, 0, width, height, RED);            // clear the whole screen to red
    draw_image(0, 0, width, height, loaded_photo);  // then draw the photo over it
You will see the image flash between red and the photo. That's because the graphics driver or window manager is reading the result at arbitrary points in the drawing code's execution (if you have components that never overlap, such as rendering a single image or video to the screen, this issue does not occur). As a result, most modern window managers have a "double buffer" flag you can set - otherwise GUI components will flicker all over the place as you resize the window. You have a "drawing" buffer you draw into and a "rendering" buffer the graphics driver reads from, and when you finish drawing you copy the "drawing" buffer into the "rendering" buffer, which prevents the flickering overlaps. You actually have to copy the buffers, you can't do a simple pointer swap - if you think about this, you will figure out why. You can optimize it with a pointer swap, at the expense of some memory, if you do triple buffering.
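As a rough sketch of what that double-buffered presentation looks like (assuming a simple mutex and 32-bit pixels; the buffer names and sizes are illustrative):
Code: Select all
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define WIDTH  640   /* illustrative window size */
#define HEIGHT 480

/* The application draws into draw_buf, then copies the finished frame into
 * render_buf under a lock; the window manager only ever reads render_buf,
 * so it never sees a half-drawn frame. */
static uint32_t draw_buf[WIDTH * HEIGHT];
static uint32_t render_buf[WIDTH * HEIGHT];
static pthread_mutex_t render_lock = PTHREAD_MUTEX_INITIALIZER;

void present_frame(void)            /* application side */
{
    pthread_mutex_lock(&render_lock);
    memcpy(render_buf, draw_buf, sizeof render_buf);   /* the unavoidable copy */
    pthread_mutex_unlock(&render_lock);
}

void compose(uint32_t *screen)      /* window manager side */
{
    pthread_mutex_lock(&render_lock);
    memcpy(screen, render_buf, sizeof render_buf);     /* copy the complete frame out */
    pthread_mutex_unlock(&render_lock);
}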
You will need to think about how you will implement double buffering. With pure pixels, you can do a lockless copy, because each pixel fits in a 32-bit word and, on most systems, copying a memory-aligned word is atomic.
In 99% of applications it doesn't matter if the old buffer is halfway through being copied into the new buffer when the screen updates (hard-core 3D gamers will sometimes notice tearing - if you rotate a camera fast, the top part of the screen shows the next frame while the bottom part still contains the previous frame; you need to synchronize the buffer copying to avoid this - look up "v-sync"). However, if your buffer contains a "command list" like you're proposing, instead of pixels, then half of the old command list mixed with half of the new command list doesn't give you half of the previous frame and half of the next frame - you'll see a completely different final image, because the renderer's state could have changed in a way that dramatically changes how the rest of the buffer is executed.
To avoid both of these issues (drawing a half-written buffer to the screen, and having to lock while copying), I would recommend a triple buffer approach. You will need three 'command list' buffers:
1. Currently being drawn into (by the application)
2. Frame ready to render
3. Currently being drawn to the screen
Now you can have two loops running completely asynchronously. In your application:
Code: Select all
// real-time: we want to keep updating the screen (video game, playing media, etc.)
while(running):
    Draw Into 1;
    Atomically Swap Pointers 1 And 2;

// or event based (only update when something has changed)
on button.click:
    label.text = "Clicked";
    Draw Into 1;
    Atomically Swap Pointers 1 And 2;
And in your window manager/graphics driver/anything reading the buffer:
Code: Select all
while(running):
    Draw 3 To The Screen;
    Atomically Swap Pointers 2 And 3;
You still have to perform a lock, but only to safely swap the pointers (and depending on your architecture you may have an atomic "swap" instruction) - which is arguably a lot faster than locking a buffer for the entire duration of copying to it, or drawing to the screen.
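A minimal sketch of the pointer swap itself, assuming C11 atomics and a single producer (the application) and single consumer (the window manager) - the names and the CommandList type are mine, not part of your proposal:
Code: Select all
#include <stdatomic.h>
#include <stdbool.h>

typedef struct CommandList CommandList;    /* opaque here; holds one frame's commands */

static CommandList *draw_buf;              /* 1: application draws into this          */
static _Atomic(CommandList *) ready_buf;   /* 2: last complete frame                  */
static CommandList *screen_buf;            /* 3: currently being drawn to the screen  */
static atomic_bool frame_ready;

/* Application side: publish the frame it just finished drawing. */
void publish_frame(void)
{
    draw_buf = atomic_exchange(&ready_buf, draw_buf);          /* swap 1 and 2 */
    atomic_store(&frame_ready, true);
}

/* Window manager side: pick up the newest complete frame, if there is one. */
void acquire_frame(void)
{
    if (atomic_exchange(&frame_ready, false))
        screen_buf = atomic_exchange(&ready_buf, screen_buf);  /* swap 2 and 3 */
    /* ...then execute/draw screen_buf to the screen... */
}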
3) Dynamic allocation/memory usage
You'll have to consider that the buffers may rapidly change size between frames. If you add a new GUI component, that may mean an extra 10 commands. It's unlikely that an application will know how many commands it is about to draw until it draws them, so you'll likely have to dynamically expand the buffer as you write to it.
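For example, an amortised growable command buffer might look something like this (the Command layout and the names are purely illustrative):
Code: Select all
#include <stdlib.h>

/* Illustrative only: a command buffer that doubles its allocation when full,
 * so the cost of growing is amortised over many appended commands.
 * Assumes the CommandBuffer starts out zero-initialised. */
typedef struct { int opcode; int args[4]; } Command;

typedef struct {
    Command *data;
    size_t   count;
    size_t   capacity;
} CommandBuffer;

int cmdbuf_push(CommandBuffer *buf, Command cmd)
{
    if (buf->count == buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity * 2 : 64;
        Command *p = realloc(buf->data, new_cap * sizeof *p);
        if (!p)
            return -1;                  /* out of memory, command dropped */
        buf->data = p;
        buf->capacity = new_cap;
    }
    buf->data[buf->count++] = cmd;      /* append the new drawing command */
    return 0;
}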
In an event-based GUI system I would consider this a non-issue, since you only need to redraw when something changes (when the user provides input, for example), but in a real-time application like a video game, every character that enters the screen may mean a dozen new draw calls, and you might end up resizing the command buffer every frame.
In a pixel-based system, your buffers have a fixed size (width*height) unless the user resizes the window.
4) Performance
Your main concern is to prevent the copying of pixels. Above, I discussed how you would need triple buffering (at the expense of some memory) to avoid locking while copying between your drawing and screen buffers.
You can even triple buffer pixel buffers to avoid copying them, until the final moment when you want to compose everything on the screen into one image to send to your video card.
At some point, you will have to execute your command list. If your whole OS graphics stack is abstracted away from the pixel, then this will be done at the final stage in your graphics driver. Will you be able to execute this command buffer fast enough during the v-sync period to avoid the 'flickering' I mentioned earlier, where a half-drawn image is displayed on the screen?
If not, you will have to double buffer in the graphics driver (one buffer that you're executing the command list into, one buffer that's ready for rendering), so you'll be copying pixels during this final double buffer.
If you use traditional pixel buffers, and can triple buffer all the way from your application into the graphics driver using pointer swaps, you don't have to do any pixel copying until the final moment when you compose everything into a single buffer, then copy that buffer into the final graphics card buffer (and you can skip this double buffering if you don't have any overlapping windows, such as a single full-screen application).
Next thing - copying pixels is blazingly fast. Will executing your command list also be blazingly fast? If the command lists are executed in the graphics driver, does that mean the graphics driver will have to redraw all of the fonts, redraw all of the components on the webpage, redraw all of that video game you have paused on the side of the screen?
Rather than executing the command buffer in the graphics driver - rendering everything on the screen and recalculating the final colour of every pixel every time you want to update the screen - would it not be better for performance to let each application have its own pixel buffer, with the final colour of every pixel already precalculated (which the application was able to do in its own time), so all you have to do is copy the precalculated pixels onto the screen?
I think this is more obvious in 3D. Calculating a pixel in 3D means performing lighting equations, calculating shadows that could have been projected onto it from multiple sources, applying refraction/reflection textures depending on the material, and doing texture lookups.
It's often desirable to let the program choose when to perform these calculations.
In a 3D modelling program (Blender, Autodesk's tools, Google Sketchup, etc.) you can work with complicated models containing millions of vertices and extremely detailed textures. However, the UI of those tools stays responsive, because the 3D scene is only redrawn when something changes (the camera moves, or we apply a change to the model), not while the user is clicking on toolbars and working in sub-windows. You can have multiple 3D modelling programs open side by side or on different monitors, and the entire system is fully responsive, because only the modelling program you're working with is updating its 3D scene.
The system is able to do this optimisation because all the video card driver cares about is receiving the final pixels from each window. The application is intelligent enough to know not to redraw the 3D scene unless something has changed in it.
If we're actually sending the 3D scene as a command list for the graphics driver to draw for us, the driver may not know that the scene has not changed, and will try to redraw every 3D scene every time the monitor refreshes.
There are also times when sending individual pixels is desirable. A custom video decoder? Sending millions of "SetPixel(x,y,colour)" commands would be significantly slower than just letting it write the pixels itself. When I worked on VLC, I saw that they do a lot of low-level assembly optimization for unpacking pixels in bulk. You could take the OpenGL route and say "Yes, you can draw pixels into a texture, then send a command to draw that texture to the screen" - but is that not just causing more pixel copying - the very thing you're trying to avoid?
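To make the difference concrete, a deliberately naive sketch - the emit_command function and the SET_PIXEL opcode are invented for this example, not part of any real API:
Code: Select all
#include <stdint.h>
#include <string.h>

void emit_command(int opcode, int x, int y, uint32_t colour);   /* hypothetical */
enum { SET_PIXEL = 1 };

/* One command per pixel: a 1920x1080 frame at 30 fps is over 60 million
 * commands per second just for the video window. */
void blit_as_commands(const uint32_t *frame, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            emit_command(SET_PIXEL, x, y, frame[y * w + x]);
}

/* Letting the decoder write pixels itself: a single bulk copy. */
void blit_as_pixels(uint32_t *dest, const uint32_t *frame, int w, int h)
{
    memcpy(dest, frame, (size_t)w * h * sizeof *frame);
}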
As a note to what I said above - if you have a full-screen application, you can often triple buffer straight from the application into the graphics card's buffer without dealing with the window manager. This is called 'exclusive mode' in the Windows world and is often used for full-screen applications. Starting in Windows Vista, all window compositing is done on the GPU, but in earlier versions, full-screen 'exclusive mode' (where the Direct3D surface would render directly to the screen) was significantly faster than full-screen or windowed 'non-exclusive mode' (where the Direct3D surface would be drawn on the GPU, then copied back into main memory, where the CPU would perform the window compositing (other windows, popups, etc.) and re-upload the final screen back to the GPU). Gamers will probably remember those days, because popup notifications would force full-screen games to minimize. If you're worried about raw pixel-pushing speed, consider having an 'exclusive mode' for media players, video games, etc.
5) Multi-monitor rendering
I think this is unrelated to command lists vs. pixels. Every multi-monitor machine I've used has allowed me to have windows span more than one monitor.
6) Shaders
This is unrelated. If anything, I would see your system as more favourable to shaders, if your command list is executed on the GPU.
Brendan wrote:The physics of light do not change. The only thing that does change is the "scene"; and this is the only thing an application needs to describe.
It is true, the physics of light do not change. However, computers have yet to reach the stage where we are able to simulate even just a room at a fine-grained atomic level in real time. Real-time ray tracing is barely reaching that level, and real-time photon tracing is still way off. Rendering graphics is all about using approximations that give an effect that is "good enough" while maintaining acceptable performance.
Yes, the programmable pipeline (the name given to programmable shader-based rendering) is harder to learn than the fixed-function pipeline (non-shader-based rendering), because GPUs have become significantly more complicated. But if you're working on a game that requires 3D graphics, there are libraries out there to help you, like Ogre 3D and Irrlicht. With these frameworks, you don't have to worry about shaders (but you can if you want to) - you just create a scene, load in a model, attach a material, set up some lights, and away you go. You don't have to worry about shadow projection, morphing bones to animate characters, occlusion culling (octrees and anti-portals), or writing custom shaders.
Shaders have taken off because they're fast. Modern graphics cards can run billions of pixel shader invocations per second, using specialized units that run in parallel and are designed specifically for those sorts of operations.
What are some effects that are hard to do without shaders? Rim lighting, cel shading, water (particles as metaballs calculated in screen space, foam and mist), soft particles, ambient occlusion, motion blur - and custom lighting pipelines like deferred lighting and pre-pass lighting (which would require a render pass per light in traditional rendering), whichever is best suited to a particular application. Vertex shaders let you do animation on the GPU (so you're not always having to upload new vertices with new positions each frame) - flapping flags, water waves, skeletal animation. Tessellation shaders allow you to insert vertices at runtime, purely on the GPU, based on dynamic parameters such as textures and the camera position, preventing you from having to a) continuously stream new vertices to the GPU, and b) upload large geometry to the GPU, when you can just upload a flat plane and let the tessellation shader do the rest.
Conclusion
A call-based graphics stack surely is interesting, and I applaud your innovative ideas.
It really depends on your specific application. If you're developing an industrial OS that uses specific 'standardized' GUI components, it may be better to let your application send those GUI components directly to the OS.
However, a general-purpose operating system is likely to want to support all kinds of dynamic content, including media, web pages and video games - in which case, after thinking deeply about it, I still think you would gain better performance using traditional pixel-based buffers with a fast lockless triple buffering system that prevents any copying until the final image is composed onto the screen.
As far as vector drawing APIs to draw resolution independent user interfaces, and scene-based APIs to simplify 3D graphics - I think these would be better suited as libraries that wrap around your low-level graphics stack - be it a pixel buffer, or GPU-accelerated OpenGL calls.
Windows wraps around these with its own APIs like GDI, WPF, Direct2D, and DirectWrite. Mac OS X offers Quartz 2D. Cairo and Skia are cross-platform solutions that can optionally use an OpenGL backend or write directly to a pixel buffer. As far as 3D goes, there are hundreds of cross-platform scene APIs out there - Irrlicht offers a software backend that doesn't require GPU acceleration.
Personally, if I were you, I'd first focus on optimizing pushing the pixel buffer from your application to your screen in as few calls as possible. Then focus on your high-level vector drawing or scene-based 3D as a user library.