OS Graphics


OS Graphics

Post by Brendan »

Hi,

It seems people had difficulty understanding what I was trying to describe in another topic; so I thought I'd put together a much easier to understand example.

Imagine you have 2 applications ("foo" and "bar"). The first application creates a window with some text and 2 fancy icons/textures; it looks like this:
foo.png
The second application creates a window with some text and 2 fancy icons/textures; and it looks like this:
bar.png
Both applications send their graphics to the GUI. The GUI adds window decorations to both applications' windows; then puts them on top of the desktop/background. The final result (what the user sees) looks like this:
gui1.png
The GUI sends its graphics to a "virtual screen" layer. The virtual screen happens to be 2 monitors connected to 2 different video cards; each running at a different resolution and a different colour depth. To make it more interesting, the monitor on the right is on its side.

The virtual screen layer sends this to the first video card (which displays it on the left monitor):
m1.png
The virtual screen layer also sends this to the second video card (which displays it on the right monitor):
m2.png
Now, some people (and most existing graphics systems) work on "raw pixels". My proposal is that it's better to work on "lists of commands". Let's see how both approaches work for this example.

Raw Pixels

In this case; the first application draws texture 1 (64 * 32 = 2048 pixels) and texture 2 (40*40 = 1600 pixels). The text is sent to a font engine which converts it into a texture (40 * 16 = 640 pixels). Then the application creates a buffer for its window (white background) and adds all the textures to it, creating another texture (200*100 = 20000 pixels). In total, the first application has drawn 2048+1600+640+20000 = 24288 pixels.

The second application draws texture 3 (120 * 60 = 7200 pixels), texture 4 (50*30 = 1500 pixels) and gets the font engine to create a texture for its text (40 * 16 = 640 pixels); then creates a texture for its window's content (200*100 = 20000 pixels). In total, the second application has drawn 7200+1500+640+20000 = 29340 pixels.

The GUI gets the first application's window and has to add window borders; so it creates a larger texture for the first application's "decorated window" (205*105 = 21525 pixels), and the same for the second application's window (another 205*105 = 21525 pixels). It also draws its background/desktop, which is another texture (300*150 = 45000 pixels). Finally the GUI combines all of these into its output (300*150 = 45000 pixels). In total, the GUI has drawn 21525 + 21525 + 45000 + 45000 = 133050 pixels.

The virtual screen layer gets the GUI's output. It creates one texture for the first video card (150*150 = 22500 pixels) and another texture for the second video card (150*150 = 22500 pixels) and sends the textures to the video cards. In total the virtual screen layer has drawn 22500+22500 = 45000 pixels.

For the complete thing, the total number of pixels drawn by both applications, the GUI and the virtual screen layer is 24288+29340+133050+45000 = 231678 pixels.

Of course all of this work wouldn't need to be done every time anything changes. For example; if the first application changes texture 1, then the first application would need to redraw texture 1 (2048 pixels) and redraw its window (20000 pixels), the second application wouldn't redraw anything, the GUI would need to redraw the first application's "decorated window" (21525 pixels) and its own buffer (45000 pixels), and the virtual screen layer would draw the textures for both video cards (45000 pixels). That all adds up to a total of 2048+20000+21525+45000+45000 = 133573 pixels drawn because texture 1 changed.

List of Commands

For this case; the first application has 3 "lists of commands" (describing the first texture, the second texture and its text). Then it creates a fourth "main list of commands" describing its window. It draws nothing and just sends these lists of commands to the GUI.

The second application does the same thing; sending its 4 lists of commands to the GUI.

The GUI gets the first application's "main list of commands" and modifies it to include commands for the window decoration. It does the same for the second application's "main list of commands". The GUI also has its own list of commands for its background, plus its "main list of commands". The GUI sends a total of 3+1+3+1+2 = 10 lists of commands to the virtual screen layer; but draws nothing.

The virtual screen layer looks at all these lists of commands and works out what needs to go to each video card. It can easily (and quickly) determine that the first video card needs 8 of them (GUI's background, GUI's main list, app1's main list, app1's text, texture 1, app2's main list, app2's text and texture 3) and determine that the second video card needs 6 of them (GUI's background, GUI's main list, app1's main list, texture 2, app2's main list and texture 4). The virtual screen layer sends the lists of commands to the video drivers (and draws nothing).
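
To make "list of commands" a little more concrete, here's a rough sketch of what one of these lists might look like in C. None of this is prescriptive - the structure, the names and the command set are invented purely for illustration - but it shows how a list can refer to other lists (like texture 1 or app1's text) by ID:

Code:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical command types - a real system would have many more. */
typedef enum {
    CMD_FILL_RECT,     /* fill a rectangle with a solid colour            */
    CMD_DRAW_TEXT,     /* draw a UTF-8 string using some font             */
    CMD_DRAW_LIST      /* draw another command list at the given position */
} cmd_type_t;

typedef struct {
    cmd_type_t type;
    int32_t x, y, width, height;   /* position/size in the parent's coordinates */
    uint32_t colour;               /* used by CMD_FILL_RECT                     */
    uint32_t list_id;              /* used by CMD_DRAW_LIST, e.g. "texture 1"   */
    const char *text;              /* used by CMD_DRAW_TEXT                     */
} command_t;

typedef struct {
    uint32_t id;                   /* so other lists can refer to this one      */
    int32_t width, height;         /* bounding size of whatever the list draws  */
    size_t count;
    command_t *commands;           /* the commands themselves                   */
} command_list_t;

A virtual screen layer can walk these lists recursively, intersect each command's rectangle with each monitor's area, and forward only the lists that are actually referenced by something visible on that monitor - which is all the "easily (and quickly) determine" step above needs.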

The driver for the first video card has to use the lists of commands it received to draw everything for the first monitor (150*150 = 22500 pixels). While doing this drawing it decides to create "cached textures" for texture 3 (7200 pixels) and app2's text (640 pixels). It knows that texture 1 and app1's text aren't visible and doesn't bother creating "cached textures" for them (at this time). For the remaining lists of commands it decides they're so simple that it can draw them directly without using any cached textures. That adds up to 7200+640 = 7840 pixels for cached textures plus 22500 pixels to send to the monitor; or a total of 30340 pixels drawn by the first video card's driver.

The driver for the second video card does something similar; creating 22500 pixels to send to the second monitor; and "cached textures" for both texture 2 (1600 pixels) and texture 4 (1500 pixels). It ends up drawing a total of 1600+1500+22500 = 25600 pixels.

For the complete thing, only the video drivers do any drawing; and the total number of pixels drawn is 30340+25600 = 55940 pixels. This is about a quarter of the 231678 pixels drawn by the "raw pixel" approach, so this is approximately 4 times faster. However, the lists of commands had to be created, decoded and manipulated, and this adds a negligible amount of extra overhead, so it might only be 3.95 times faster.

Of course all of this work wouldn't need to be done every time anything changes. For example; if the first application changes texture 1, then the list of commands for texture 1 (and nothing else) would have to be sent from app1 to GUI to virtual screen layer to the first video card's driver; but the first video card's driver knows that texture 1 isn't visible anyway and draws nothing. For this case, drawing nothing is a lot less than the "raw pixel" approach which drew 231678 pixels; and (accounting for list creating, decoding, etc) this might be a million times faster.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: OS Graphics

Post by bluemoon »

What you describe is how OpenGL and modern graphics cards work. Only 2 actions are required:
1. Manipulate buffers, be they shaders, textures, indices, attribute arrays, etc.
2. Execute commands (e.g. run a shader)

However, not everyone is able to work with OpenGL acceleration, so raw pixels become the only choice.

Re: OS Graphics

Post by Jezze »

I think I understand the concept but I might be way off.

I just wonder: what if you create a bunch of layers stacked on top of each other, and when an application wants to draw something it passes a pointer to its buffer down through all these layers, where each layer crops the buffer further depending on its own properties, until it reaches the bottom layer (the graphics card driver), which actually writes only the cropped result? Wouldn't that produce the same number of pixels drawn as the final result?
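
Something like this is what I have in mind - a rough sketch (types and names invented) where each layer only narrows a clip rectangle and the driver at the bottom is the only thing that touches pixels:

Code:

typedef struct { int x, y, w, h; } rect_t;

/* Intersect two rectangles; a zero-sized result means nothing to draw. */
static rect_t rect_intersect(rect_t a, rect_t b)
{
    int x0 = (a.x > b.x) ? a.x : b.x;
    int y0 = (a.y > b.y) ? a.y : b.y;
    int x1 = (a.x + a.w < b.x + b.w) ? a.x + a.w : b.x + b.w;
    int y1 = (a.y + a.h < b.y + b.h) ? a.y + a.h : b.y + b.h;
    rect_t r = { x0, y0, (x1 > x0) ? x1 - x0 : 0, (y1 > y0) ? y1 - y0 : 0 };
    return r;
}

/* The buffer pointer travels down the layer stack; each layer only shrinks
   the clip, and the driver at the bottom writes pixels inside the final rect. */
rect_t clip_through_layers(rect_t request, const rect_t *layer_bounds, int layers)
{
    for (int i = 0; i < layers; i++)
        request = rect_intersect(request, layer_bounds[i]);
    return request;
}
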
Fudge - Simplicity, clarity and speed.
http://github.com/Jezze/fudge/

Re: OS Graphics

Post by Brendan »

Hi,
bluemoon wrote:What you describe is how OpenGL and modern graphics cards work. Only 2 actions are required:
1. Manipulate buffers, be they shaders, textures, indices, attribute arrays, etc.
2. Execute commands (e.g. run a shader)

However, not everyone is able to work with OpenGL acceleration, so raw pixels become the only choice.
For most graphics systems; the application would ask OpenGL to create/load pixel data for its textures (even if/when the pixels aren't visible), then the application would ask OpenGL to create pixel data for its window (even if/when the pixels aren't visible); then the application sends the pixel data to the GUI. The GUI asks OpenGL to.... (even if/when the pixels aren't visible), etc.

This is the "raw pixel" case (where OpenGL/video card does the rendering all over the place, even if/when most of pixels drawn at each place aren't visible).
Jezze wrote:I just wonder: what if you create a bunch of layers stacked on top of each other, and when an application wants to draw something it passes a pointer to its buffer down through all these layers, where each layer crops the buffer further depending on its own properties, until it reaches the bottom layer (the graphics card driver), which actually writes only the cropped result? Wouldn't that produce the same number of pixels drawn as the final result?
If the bottom layer (e.g. graphics card driver) is the only thing drawing in the application's buffer; then it'd make more sense for the application's buffer to be "owned" by the graphics card driver.

Now think about the function calls that the application uses to ask the GUI to ask the "virtual screen layer" to ask the graphics card driver to draw something - that looks like a lot of context switches to me, especially if the application is drawing lots of things at once and causing context switches for each single thing it draws (where a "context switch" might be task switches between processes, or kernel API calls, or RPC over a network, or...). To reduce the number of context switches, why not send something like a "batch of commands" so that you only need one set of context switches to draw lots of things?
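
As a rough sketch (reusing the hypothetical command_t type from the earlier sketch, with send_to_gui() standing in for whatever IPC mechanism the OS provides), the application queues drawing operations locally and pays for one set of context switches per batch instead of one per operation:

Code:

#include <stddef.h>

/* Application-side batching: build the commands locally ... */
static command_t batch[64];
static size_t batch_len = 0;

void queue_command(command_t cmd)
{
    if (batch_len < sizeof(batch) / sizeof(batch[0]))
        batch[batch_len++] = cmd;
}

/* ... then pay for a single context switch / IPC message for the lot. */
void flush_batch(void)
{
    /* send_to_gui(batch, batch_len * sizeof(command_t)); */
    batch_len = 0;
}
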

Now; what if one application sends a "batch of commands" asking the graphics card driver to draw something and the graphics card driver knows it's not visible and doesn't draw anything; but then the user shifts some other window exposing the "previously not drawn" part? You're going to have to ask the application to send the "batch of commands" again; or (to avoid that) the graphics card driver could keep the previous "batch of commands" (and draw things whenever they become visible, even if something doesn't become visible until later on).

With these 3 improvements, you've mostly got what I described.. ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: OS Graphics

Post by bluemoon »

Brendan wrote:This is the "raw pixel" case (where OpenGL/video card does the rendering all over the place, even if/when most of pixels drawn at each place aren't visible).
With hardware acceleration this is not a problem: the graphics card can render millions of textures a second, and as long as the textures do not change (and stay in display memory) you only need to call "draw array" for each object, with almost no traffic across the bus.
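
For example, once a texture is uploaded (glTexImage2D) and kept resident in video memory, redrawing a widget each frame is roughly one bind plus one small draw call, with no pixel data crossing the bus (sketch only; assumes a GL context and that vertex state was set up earlier):

Code:

#include <GL/gl.h>

/* "tex" was created and uploaded earlier; vertex state was set up once.
   Per frame, each widget costs one bind and one tiny draw call. */
void draw_widget(GLuint tex, GLint first_vertex, GLsizei vertex_count)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glDrawArrays(GL_TRIANGLE_STRIP, first_vertex, vertex_count);
}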

Now, I don't know how to implement OpenGL on a modern card, so...

Re: OS Graphics

Post by Brendan »

Hi,
bluemoon wrote:
Brendan wrote:This is the "raw pixel" case (where OpenGL/video card does the rendering all over the place, even if/when most of pixels drawn at each place aren't visible).
With hardware acceleration this is not a problem: the graphics card can render millions of textures a second, and as long as the textures do not change (and stay in display memory) you only need to call "draw array" for each object, with almost no traffic across the bus.
If there happens to be a video card and a native video driver for your hobby OS that's capable of rendering 60 million textured polygons per second (or 1 million polygons 60 times per second); then for my example, the first application can have 700 thousand polygons and the second application can have another 700 thousand polygons; and because I'd only be drawing 900 thousand polygons you'll get your 60 frames per second and the GPU will be idle for 10% of the time (which will help make the laptop's battery last longer).

Using "hardware is fast and/or cheap" to justify poor efficiency is a lame excuse. It's not an acceptable way to reward the user for providing good/fast hardware.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: OS Graphics

Post by Owen »

Brendan wrote:Now, some people (and most existing graphics systems) work on "raw pixels". My proposal is that it's better to work on "lists of commands". Let's see how both approaches work for this example.

Raw Pixels

In this case; the first application draws texture 1 (64 * 32 = 2048 pixels) and texture 2 (40*40 = 1600 pixels). The text is sent to a font engine which converts it into a texture (40 * 16 = 640 pixels). Then the application creates a buffer for its window (white background) and adds all the textures to it, creating another texture (200*100 = 20000 pixels). In total, the first application has drawn 2048+1600+640+20000 = 24288 pixels.

The second application draws texture 3 (120 * 60 = 7200 pixels), texture 4 (50*30 = 1500 pixels) and gets the font engine to create the text for its text (40 * 16 = 640 pixels); then creates a texture for its window's content (200*100 = 20000 pixels). In total, the second application has drawn 7200+1500+640+20000 = 29340 pixels.
Reasonable so far...
Brendan wrote:The GUI gets the first application's window and has to add window borders; so it creates a larger texture for the first application's "decorated window" (205*105 = 21525 pixels), and the same for the second applications window (another 205*105 = 21525 pixels). It also draws its background/desktop, which is another texture (300*150 = 45000 pixels). Finally the GUI combines all of these into it's output (300*150 = 45000 pixels). In total, the GUI has drawn 21525 + 21525 + 45000 + 45000 = 133050 pixels.
No newly designed compositing window system does server-side decoration (Quartz and Wayland follow this) - that is to say, the decorations are already drawn around the buffers that the client submits (often by the mandatory-to-use windowing library). So, let's go with the decorated window sizes; 21525px each. The 20000px per window cost previously accounted for can therefore be disregarded (because you would just draw into the buffer eventually presented).
Brendan wrote:The virtual screen layer gets the GUI's output. It creates one texture for the first video card (150*150 = 22500 pixels) and another texture for the second video card (150*150 = 22500 pixels) and sends the textures to the video cards. In total the virtual screen layer has drawn 22500+22500 = 45000 pixels.
The first video card had a non-rotated image, so you would simply set the row stride to be 300 pixels (i.e. ignore the second half of the source texture)

Now, the second half (the rotated screen) depends upon the framebuffer hardware... however, rotation capability is quite common. The OMAP 3's DSS, for example, has [PLANE]_PIXEL_SKIP and [PLANE]_ROW_SKIP registers, which each contain a signed number of pixels to add to the framebuffer address between pixels and rows respectively. Appropriate programming of these can rotate the screen in each of the two directions.
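
In software the same trick is just a copy loop whose increments you choose; a rough sketch (illustrative only, not the OMAP programming model):

Code:

#include <stdint.h>

/* For an unrotated copy of a W x H source (stride == W):
   start = 0, pixel_inc = 1, row_inc = 0.
   For a 90-degree clockwise rotation (output is H wide, W tall):
   start = (H - 1) * W, pixel_inc = -W, row_inc = H * W + 1. */
void blit_with_increments(uint32_t *dst, int out_w, int out_h,
                          const uint32_t *src, long start,
                          long pixel_inc, long row_inc)
{
    const uint32_t *s = src + start;
    for (int y = 0; y < out_h; y++) {
        for (int x = 0; x < out_w; x++) {
            *dst++ = *s;
            s += pixel_inc;         /* added after every pixel      */
        }
        s += row_inc;               /* added after every output row */
    }
}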

I believe the two different display depths to largely be a red herring: when did you last see a 16bpp monitor?

If both screens are on one GPU, that's 45000 pixels that can be discounted. If they're on two, and it's impossible to allocate memory acceptable to both, that's 22500 pixels to discount.
Brendan wrote:For the complete thing, the total number of pixels drawn by both applications, the GUI and the virtual screen layer is 24288+29340+133050+45000 = 231678 pixels.
Assuming good GPUs, 187178; assuming bad 209678
Brendan wrote:The virtual screen layer looks at all these lists of commands and works out what needs to go to each video card. It can easily (and quickly) determine that the first video card needs 8 of them (GUI's background, GUI's main list, app1's main list, app1's text, texture 1, app2's main list, app2's text and texture 3) and determine that the second video card needs 6 of them (GUI's background, GUI's main list, app1's main list, texture 2, app2's main list and texture 4). The virtual screen layer sends the lists of commands to the video drivers (and draws nothing).
Given the following simple OpenGL snippet, and assuming everything else is in its default case - i.e. no vertex buffers/etc bound to the pipeline

Code:

glUseProgram(aShader);
glDrawArrays(GL_TRIANGLES,  0, 3);
Can you discern what is going to be drawn? Note that I just provoked the processing of 3 vertices with no buffers bound - yes this is legal. Actually, even if I'd bound a bunch of buffers that wouldn't help, because all the render system would know is something along the lines of "The buffer has a stride of 8 bytes and offset 0 in each stride contains a 4-vector of half floats bound to shader input slot 0"

And even if I was using the old glVertexPointer buffer bindings... that still wouldn't help because vertex shaders are free to do whatever they want to the position (it is after all kind of their whole point).

So any pretense that the "virtual screen layer" can, in the general case determine where anything is being drawn is a complete and utter falsehood.
Brendan wrote:The driver for the first video card has to use the lists of commands it received to draw everything for the first monitor (150*150 = 22500 pixels). While doing this drawing it decides to create "cached textures" for texture 3 (7200 pixels) and app2's text (640 pixels). It knows that texture 1 and app1's text isn't visible and doesn't bother creating "cached textures" for them (at this time). For the remaining lists of commands it decides they're so simple that it can draw them directly without using any cached textures. That adds up to 7200+640 = 7840 pixels for cached textures plus 22500 pixels to send to the monitor; or a total of 30340 pixels drawn by the first video card's driver.

The driver for the second video card does something similar; creating 22500 pixels to send to the second monitor; and "cached textures" for both texture 2 (1600 pixels) and texture 4 (1500 pixels). It ends up drawing a total of 1600+1500+22500 = 25600 pixels.

For the complete thing, only the video drivers do any drawing; and the total number of pixels drawn is 30340+25600 = 55940 pixels. This is about a quarter of the 231678 pixels drawn by the "raw pixel" approach, so this is approximately 4 times faster. However, the lists of commands had to be created, decoded and manipulated, and this adds a negligible amount of extra overhead, so it might only be 3.95 times faster.
Except this is an utter falsehood. Any practical GUI involves alpha blending, because otherwise things like curved window borders look awful and you can't have things like drop shadows which users appreciate for giving the UI a sense of depth. So, actually, display 1 draws
22500 (background) + ~8000 (background window) + 2048px (texture 1) + ~16000 (decorated window) + 7200 (texture 3) + 640 (app2's text) = ~56388
while display 2 draws
22500 (background) + ~15000 (window containing T2) + 1600 (Texture 2) + ~7200 (window containing T4) + 1500 (T4) = ~47800
That gives a total of 104188px. We know from above that each window is 21515px, so the actual figure is 10138. Add in the caching you proposed and you're at 104138. Note that you can't optimize out any of those draws unless you know that nothing being drawn has any translucent segments.

So, now you're just over twice as good as my worst case estimate of 209678px, but the best is yet to come!
Brendan wrote:Of course all of this work wouldn't need to be done every time anything changes. For example; if the first application changes texture 1, then the list of commands for texture 1 (and nothing else) would have to be sent from app1 to GUI to virtual screen layer to the first video card's driver; but the first video card's driver knows that texture 1 isn't visible anyway and draws nothing. For this case, drawing nothing is a lot less than the "raw pixel" approach which drew 231678 pixels; and (accounting for list creating, decoding, etc) this might be a million times faster.
The application changed texture/image 1. It set a clipping rectangle in its graphics toolkit, and redrew the window background plus texture for a cost of 4096px drawn. It then sent a damage event to the GUI server and the GUI server then recomposited the associated rectangle of the display, drawing the background+window1+window2 at a cost of 6144px. Total cost 10240px. Your display list system also received a texture update for texture 1. It also set a clipping rectangle and then drew the background+window1bg+texture1+window2bg+texture3, at a total cost of... 10240px. Of course, the compositor scales O(n) in the number of windows, while your display list system scales O(n) in the number of layers.

An alternate situation: the user picks up one of the windows and drags it.

The damage rectangle per frame is going to be slightly higher than 20000px because that is the size of the window, so we will take that figure (the actual number is irrelevant). None of the window contents are changing in this scenario, and we'll assume the user is moving the topmost window left, and 16000px of the background window are covered. The display list system draws the 20000px of background, 16000px of background window decorations/background, 2048px of texture 1, 20000px of foreground window decorations/background, 7200+1500px of T3+T4: 66748px

The compositor draws 20000px of background, 16000px of background window and 20000px of foreground window: 56000px.

Again, the compositor scales better as the window stack gets deeper. Of course, the display list system can keep up by... decaying into a compositor.

Of course, all of this is largely academic for any hardware-accelerated video system: pixel fillrate is so much higher than any desktop needs for compositing, even on obsolete IGPs. Of course, we should also consider the performance of the most demanding apps: games.

And Brendan has yet to answer how, in his scenario (particularly considering he has said he doesn't plan to allow buffer readbacks), he intends to support composition effects like dynamic exposure HDR.

In HDR, the scene is rendered to a HDR buffer, normally in RGB_11f_11f_10f floating point format. A shader then takes exposure information to produce an LDR image suitable for a feasible monitor by scaling the values from this buffer. It often also reads a texture in order to do "tonemapping", or create the curve (and sometimes colouration) that the developers intended.

In dynamic exposure HDR, the CPU (or on modern cards a compute shader) averages all the pixels on screen in order to calculate an average intensity, and from this calculates an exposure value. This is then normally integrated via a weighted and/or moving average and fed back to the HDR shader (with a few frames delay for pipelining reasons)
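
A CPU-side sketch of that feedback loop (illustrative only; the log-average and the constants are just the usual choices, not anything any particular engine mandates):

Code:

#include <math.h>
#include <stddef.h>

/* Average the HDR buffer's luminance, derive a target exposure, and
   smooth it over a few frames so the image doesn't flicker. */
float update_exposure(const float *luminance, size_t pixel_count,
                      float previous_exposure)
{
    double sum = 0.0;
    for (size_t i = 0; i < pixel_count; i++)
        sum += log(1e-4 + luminance[i]);     /* log-average resists outliers */
    double avg = exp(sum / (double)pixel_count);

    float target = (float)(0.18 / avg);      /* aim for "middle grey"        */
    float k = 0.05f;                         /* temporal smoothing rate      */
    return previous_exposure + k * (target - previous_exposure);
}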

In his system, either the entire scene needs to get rendered on both GPUs (so the adjustment can be made on both), or you'll get different exposures on each and everything is going to look awful.

Aside: Earlier I rendered a single triangle without an associated buffer. Here is the associated vertex shader:

Code:

varying vec2 texCoord;

void main()
{
    texCoord = vec2((gl_VertexID << 1) & 2, gl_VertexID & 2);
    gl_Position = vec4(texCoord * vec2(2,-2) +vec2(-1,1), 0, 1);
}
Yes, it created a right angled triangle with the right angle top-left on your screen, and the midpoint of the hypotenuse bottom-right. Once you clip that... you get a fullscreen rectangle. The texture coordinates are helpful too.

It's a "fancy hack" :)

Re: OS Graphics

Post by zeitue »

Brendan, your design reminds me of Nitpicker, used in TUDOS, which runs on L4.
It uses a buffer-type structure like what you're describing here, so I thought I'd share it; it might be helpful.

The following information was taken from this link: Nitpicker
TUDOS wrote: Even though there are several interesting technical aspects of Nitpicker, let me explain only the most important one: How can we display multiple different windowing systems together at one desktop?

Nitpicker deals with only two kinds of objects: buffers and views.

A buffer is a memory region that holds two-dimensional pixel data. The memory region is provided by the client to Nitpicker via shared memory.

Nitpicker has no notion of windows. A window is expected to have window decorations and policies, for example a window can be moved by dragging the window title with the mouse. Nitpicker provides a much simpler object type that we call view. A view is a rectangular area on screen presenting a region of a buffer. Each view has an arbitrary size and position on screen, defined by the client. If the view's size on screen is smaller than its assigned buffer, the client can define the viewport on the buffer by specifying a vertical and horizontal offset. Note that there may exist multiple views on one and the same buffer whereas each view can have an individual size and position on screen and presents a different region of the buffer.

Each time a client changes the content of a buffer, it informs Nitpicker about the affected buffer region. Nitpicker then updates all views that display the specified buffer region. Views may overlap on screen. A client can define the stacking position of a view by specifying an immediate neighbor in the view stack. While each client manages the local stacking order of its views, the global stacking order of all views is only known to Nitpicker.

The following Figure illustrates the use of buffers and views for merging two legacy window systems into one Nitpicker screen:


Each client performs drawing operations on its buffer and tells Nitpicker about the buffer regions to display on screen by creating views on the buffer. In the Figure above, each client creates one view per client window and keeps the view configuration always consistent with the window configuration. For example, when the user brings a legacy window ontop, we also bring the corresponding view ontop.

To integrate a legacy window system into a Nitpicker session, we need to do the following things:

Enable the legacy window system to perform drawing operations into a Nitpicker-buffer instead on screen. This can be done by providing a pseudo-frame-buffer driver

Feed input events from Nitpicker to the legacy window system, for example by using a custom driver or emulating input devices, for which drivers are available

Apply legacy window-state changes to corresponding Nitpicker-views

Whereas the first two points are straight-forward, the third one depends on the actual legacy window system. On X11, we can simply record all window events by using a X11 client program (250 lines of code) that registers for all window events at the X11 root window. Therefore we do not need to modify the X-Server at all. For supporting DOpE, I added ca. 200 lines of support code to the DOpE window server. A more ancient example is the Atari GEM GUI running in the Hatari emulator. I managed to integrate GEM into Nitpicker by installing a small patch program (250 lines of C/ASM code) on the unmodified (closed-source) Atari TOS and adding a new virtual device to the Hatari emulator.
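
A rough sketch of the buffer/view interface described above might look like this - the names are invented for illustration and are not Nitpicker's actual API, but the split is the same: the client owns the pixels, the server only manages views onto them.

Code:

typedef struct { int handle; } buffer_t;  /* client-provided shared-memory pixels            */
typedef struct { int handle; } view_t;    /* an on-screen rectangle showing part of a buffer */

buffer_t buffer_create(int width, int height);
view_t   view_create(buffer_t buf);

/* Place the view on screen and choose which part of the buffer it shows. */
void view_place(view_t v, int screen_x, int screen_y, int w, int h,
                int buf_x_offset, int buf_y_offset);

/* Client-side stacking: position the view relative to a neighbour. */
void view_stack(view_t v, view_t in_front_of);

/* Tell the server which buffer region changed; it updates every view
   that shows that region. */
void buffer_refresh(buffer_t buf, int x, int y, int w, int h);
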
### Z++; && S++; ###
zeitue is pronounced zeɪtə
Web Site::Bit Bucket
Programming Languages: C, C++, Java, Ruby, Common Lisp, Clojure
Languages: English, zɪ̀ŋ, 日本語, maitraiuen

Re: OS Graphics

Post by bluemoon »

Brendan wrote:the GPU will be idle for 10% of the time (which will help make the laptop's battery last longer).
You assumed the screen needs to be redrawn 60 times a second. However, any smart renderer will skip rendering if there is no change on screen, and for a general UI the screen is quite static. Furthermore, the number of objects in a UI manager is tiny, usually in the tens-of-thousands-of-polygons range (2 triangles per widget minimum) - compare that with the tens of millions of polygons with multi-pass shaders for a modern game.

So, in the above case you might save an extra 0.001% of juice, but not 10%.


Now, let's consider a few more cases:

1. One of the windows redraws its scene, for example the animated emoticon on the left side of the browser while I'm typing this reply.
--------------------
For a traditional OpenGL-style renderer, it would require a full redraw of every object (the renderer could skip things that are invisible); however, since most things are unchanged, only the emoticon texture needs to be updated (a better-designed UI would accept an image list, and only adjust UVs).

For Brendan's method, you would need to render the emoticon again; however, I suppose you still need a back buffer to avoid flicker, so a full-screen redraw is still required.
Furthermore, it is impossible for the driver to keep an "image list" and do the UV trick, since it only has partial information.

2. Resize window.
--------------------
For a traditional OpenGL-style renderer, you execute shaders on all visible objects, with no change to textures unless it is an "owner-drawn window".
For Brendan's method, you need to redraw the resized window; however, I suppose you still need a back buffer to avoid flicker, so a full-screen redraw is still required.

Re: OS Graphics

Post by Brendan »

Hi,
Owen wrote:
Brendan wrote:The GUI gets the first application's window and has to add window borders; so it creates a larger texture for the first application's "decorated window" (205*105 = 21525 pixels), and the same for the second applications window (another 205*105 = 21525 pixels). It also draws its background/desktop, which is another texture (300*150 = 45000 pixels). Finally the GUI combines all of these into it's output (300*150 = 45000 pixels). In total, the GUI has drawn 21525 + 21525 + 45000 + 45000 = 133050 pixels.
No newly designed compositing window system does server side decoration (Quartz and Wayland follow this) - that is to say, the decorations are already drawn around the buffers that the client submits (often by the mandatory to use windowing library). So, lets go with the decorated window sizes; 21525px each. The 20000px per window cost previously accounted for can therefore be disregarded (because you would just draw into the buffer eventually presented)
So to work around the problem of being badly designed, they force applications to use a mandatory windowing library and make things worse in a different way? This is good news! :)
Owen wrote:
Brendan wrote:The virtual screen layer gets the GUI's output. It creates one texture for the first video card (150*150 = 22500 pixels) and another texture for the second video card (150*150 = 22500 pixels) and sends the textures to the video cards. In total the virtual screen layer has drawn 22500+22500 = 45000 pixels.
The first video card had a non-rotated image, so you would simply set the row stride to be 300 pixels (i.e. ignore the second half of the source texture).
I tried to make the example simple/generic, so that people can see the point clearly. I was hoping that the point I was trying to make wouldn't be taken out back and bludgeoned to death with implementation details. Let's just assume the video driver is using an LFB buffer that was set up by the boot loader and all rendering is done in software (unless you're volunteering to write hardware-accelerated video drivers for all of our OSs).
Owen wrote:Now, the second half (the rotated screen) depends upon the framebuffer hardware... however, rotation capability is quite common. The OMAP 3's DSS, for example, has [PLANE]_PIXEL_SKIP and [PLANE]_ROW_SKIP registers, which each contain a signed number of pixels to add to the framebuffer address between pixels and rows respectively. Appropriate programming of these can rotate the screen in each of the two directions.
I see you're using the new "quantum entanglement" video cards where the same texture magically appears in 2 completely separate video cards' memory at the same time. Nice... ;)
Owen wrote:I believe the two different display depths to largely be a red herring: when did you last see a 16bpp monitor?
I didn't specify any specific colour depth for either display. Let's assume one is using 24-bpp XvYCC and the other is using 30-bpp "deep colour" sRGB.
Owen wrote:
Brendan wrote:The virtual screen layer looks at all these lists of commands and works out what needs to go to each video card. It can easily (and quickly) determine that the first video card needs 8 of them (GUI's background, GUI's main list, app1's main list, app1's text, texture 1, app2's main list, app2's text and texture 3) and determine that the second video card needs 6 of them (GUI's background, GUI's main list, app1's main list, texture 2, app2's main list and texture 4). The virtual screen layer sends the lists of commands to the video drivers (and draws nothing).
Given the following simple OpenGL snippet, and assuming everything else is in its' default case - i.e. no vertex buffers/etc bound to the pipeline

Code:

glUseProgram(aShader);
glDrawArrays(GL_TRIANGLES,  0, 3);
Can you discern what is going to be drawn? Note that I just provoked the processing of 3 vertices with no buffers bound - yes this is legal. Actually, even if I'd bound a bunch of buffers that wouldn't help, because all the render system would know is something along the lines of "The buffer has a stride of 8 bytes and offset 0 in each stride contains a 4-vector of half floats bound to shader input slot 0"
Have I ever done or said anything to make you think I care about OpenGL compatibility?

Note: Current graphics systems are so lame that application/game developers feel the need to cobble together their own shaders. On one hand this disgusts me (in a "how did potentially smart people let things get this bad" way), but on the other hand it makes me very very happy (in a "Hahaha, I can make this so much easier for application developers" way).
Owen wrote:
Brendan wrote:For the complete thing, only the video drivers do any drawing; and the total number of pixels drawn is 30340+25600 = 55940 pixels. This is about a quarter of the 231678 pixels drawn by the "raw pixel" approach, so this is approximately 4 times faster. However, the lists of commands had to be created, decoded and manipulated, and this adds a negligible amount of extra overhead, so it might only be 3.95 times faster.
Except this is an utter falsehood. Any practical GUI involves alpha blending, because otherwise things like curved window borders look awful and you can't have things like drop shadows which users appreciate for giving the UI a sense of depth. So, actually, display 1 draws
22500 (background) + ~8000 (background window) + 2048px (texture 1) + ~16000 (decorated window) + 7200 (texture 3) + 640 (app2's text) = ~56388
while display 2 draws
22500 (background) + ~15000 (window containing T2) + 1600 (Texture 2) + ~7200 (window containing T4) + 1500 (T4) = ~47800
That gives a total of 104188px. We know from above that each window is 21515px, so the actual figure is 10138. Add in the caching you proposed and you're at 104138. Note that you can't optimize out any of those draws unless you know that nothing being drawn has any translucent segments.
It's obvious (from the pictures I created) that nothing was transparent except for each application's text. If you want to assume that the window borders had rounded corners, and that each application's smaller textures also had transparency, then that changes nothing anyway. The only place where transparency would make a difference is if the second application's background was transparent; but that's a plain white rectangle.
Owen wrote:
Brendan wrote:Of course all of this work wouldn't need to be done every time anything changes. For example; if the first application changes texture 1, then the list of commands for texture 1 (and nothing else) would have to be sent from app1 to GUI to virtual screen layer to the first video card's driver; but the first video card's driver knows that texture 1 isn't visible anyway and draws nothing. For this case, drawing nothing is a lot less than the "raw pixel" approach which drew 231678 pixels; and (accounting for list creating, decoding, etc) this might be a million times faster.
The application changed texture/image 1. It set a clipping rectangle in its' graphics toolkit, and redrew the window background plus texture for a cost of 4096px drawn. It then sent a damage event to the GUI server and the GUI server then recomposited the associated rectangle of the display, drawing the background+window1+window2 at a cost of 6144px. Total cost 10240px. Your display list system also received a texture update for texture 1. It also set a clipping rectangle and then drew the background+window1bg+texture1+window2bg+texture3, at a total cost of... 10240px. Of course, the compositor scales O(n) in the number of windows, while your display list system scales O(n) in the number of layers
Um, why is my system suddenly drawing things that it knows aren't visible?
Owen wrote:An alternate situation: the user picks up one of the windows and drags it.

The damage rectangle per frame is going to be slightly higher than 20000px because that is the size of the window, so we will take that figure (the actual number is irrelevant). None of the window contents are changing in this scenario, and we'll assume the user is moving the topmost window left, and 16000px of the background window are covered. The display list system draws the 20000px of background, 16000px of background window decorations/background, 2048px of texture 1, 20000px of foreground window decorations/background, 7200+1500px of T3+T4: 66748px
I'm not too sure what's going on with the 2 completely different systems that you seem to have invented.

For my "list of commands" (as described), if the second application's window (the topmost window) is being dragged to the left; then the GUI would send its main list of commands each frame, causing the first video driver to redraw 22500 pixels and the second video driver to redraw 22500 pixels for most frames (there are 2 textures that were never drawn, that would need to be drawn once when they become exposed).

However, my "list of commands" (as described) is only a simplified description because I didn't want to bury the concept with irrelevant details. Nothing prevents the video driver's from comparing the GUI's main list of commands with the previous version and figuring out what needs to be redrawn and only redrawing the minimum it has to. This would make it as good as your system/s without any "damage events" or toolkits or whatever other extra burdens you want to force the unfortunate application developers to deal with.
Owen wrote:And Brendan has yet to answer how, in his scenario (particularly considering he has said he doesn't plan to allow buffer readbacks), he intends to support composition effects like dynamic exposure HDR.

In HDR, the scene is rendered to a HDR buffer, normally in RGB_11f_11f_10f floating point format. A shader then takes exposure information to produce an LDR image suitable for a feasible monitor by scaling the values from this buffer. It often also reads a texture in order to do "tonemapping", or create the curve (and sometimes colouration) that the developers intended.

In dynamic exposure HDR, the CPU (or on modern cards a compute shader) averages all the pixels on screen in order to calculate an average intensity, and from this calculates an exposure value. This is then normally integrated via a weighted and/or moving average and fed back to the HDR shader (with a few frames delay for pipelining reasons)

In his system, either the entire scene needs to get rendered on both GPUs (so the adjustment can be made on both), or you'll get different exposures on each and everything is going to look awful.
You're right - the video drivers would need to send a "max brightness for this video card" back to the virtual screen layer, and the virtual screen layer would need to send a "max brightness for all video cards" back to the video cards. That's going to cost a few extra cycles; but then there's no reason the virtual screen layer can't check for bright light sources beforehand and tell the video cards not to bother if it detects that everything is just ambient lighting (e.g. normal applications rather than games).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: OS Graphics

Post by rdos »

I have no idea why anybody would want to do pixel graphics directly with the graphics API, as Brendan proposes.

Foo is an application that has a white background with two controls inside it (let's say T1 is a text label control and T2 is a bitmap icon). That is one base control + 2 child controls. The GUI (not the video driver, nor the application code) calls the Paint method of the base control, which will redraw its background if it has changed (it has the first time only). Then it will call the Paint methods of the child controls, which will render and redraw their content.

Bar is similar, and is created by one base control and 2 child controls, and is drawn the same way.

If text T1 changes, a Paint event to the child control is generated, and the T1 area is redrawn.

In all, the application creates 3 controls, and then does "Redraw". Nothing is device-dependent, and there is no OpenGL or low-level graphics work to do as the class library already contains ready-to-use image and label controls.
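
A rough sketch of that control/Paint arrangement (hypothetical names; a real class library would wrap this in classes, but the idea is the same):

Code:

typedef struct control control_t;

struct control {
    void (*paint)(control_t *self);   /* e.g. label_paint or icon_paint          */
    control_t *children;              /* first child control                     */
    control_t *next_sibling;
    int dirty;                        /* set when the control's content changes  */
};

/* "Redraw": repaint anything marked dirty, then recurse into the children. */
void control_redraw(control_t *c)
{
    if (c->dirty) {
        c->paint(c);
        c->dirty = 0;
    }
    for (control_t *child = c->children; child; child = child->next_sibling)
        control_redraw(child);
}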

The typical way this is done in RDOS is that either application 1 or application 2 has focus, and only one of them will output something to the video card (while the other one only buffers in memory). Thus, RDOS uses fewer writes to the LFB than any of the other solutions, including command lists, as you cannot optimize away the visible graphics. In addition to that, either application 1 or 2 can decide to use a lower resolution than the maximum, like 640x480, and thus speed up the process by reducing the number of pixels that need to be output.

With two-monitor support, the sane way would be to run application 1 on one monitor and application 2 on the other. No user would want the presentation that Brendan illustrates.

Re: OS Graphics

Post by Owen »

Brendan wrote:Hi,
Owen wrote:
Brendan wrote:The GUI gets the first application's window and has to add window borders; so it creates a larger texture for the first application's "decorated window" (205*105 = 21525 pixels), and the same for the second applications window (another 205*105 = 21525 pixels). It also draws its background/desktop, which is another texture (300*150 = 45000 pixels). Finally the GUI combines all of these into it's output (300*150 = 45000 pixels). In total, the GUI has drawn 21525 + 21525 + 45000 + 45000 = 133050 pixels.
No newly designed compositing window system does server side decoration (Quartz and Wayland follow this) - that is to say, the decorations are already drawn around the buffers that the client submits (often by the mandatory to use windowing library). So, lets go with the decorated window sizes; 21525px each. The 20000px per window cost previously accounted for can therefore be disregarded (because you would just draw into the buffer eventually presented)
So to work around the problem of being badly designed, they force application's to use a mandatory windowing library and make things worse in a different way? This is good news! :)
So to work around the problem of being badly designed, you force every application to re-implement its own GUI support library?
Brendan wrote:
Owen wrote:
Brendan wrote:The virtual screen layer gets the GUI's output. It creates one texture for the first video card (150*150 = 22500 pixels) and another texture for the second video card (150*150 = 22500 pixels) and sends the textures to the video cards. In total the virtual screen layer has drawn 22500+22500 = 45000 pixels.
The first video card had a non-rotated image, so you would simply set the row stride to be 300 pixels (i.e. ignore the second half of the source texture).
I tried to make the example simple/generic, so that people can see the point clearly. I was hoping that the point I was trying to make wouldn't be taken out back and bludgeoned to death with implementation details. Let's just assume the video driver is using an LFB buffer that was setup by the boot loader and all rendering is done in software (unless you're volunteering to write hardware accelerated video drivers for all of our OSs).
OK, so your only consideration is software rendering because you can't be bothered to write hardware-accelerated graphics drivers?
Brendan wrote:
Owen wrote:Now, the second half (the rotated screen) depends upon the framebuffer hardware... however, rotation capability is quite common. The OMAP 3's DSS, for example, has [PLANE]_PIXEL_SKIP and [PLANE]_ROW_SKIP registers, which each contain a signed number of pixels to add to the framebuffer address between pixels and rows respectively. Appropriate programming of these can rotate the screen in each of the two directions.
I see you're using the new "quantum entanglement" video cards where the same texture magically appears in 2 completely separate video card's memory at the same time. Nice... ;)
You pretend that DMA and GPU IOMMUs (or for obsolete AGP machines, the GART) don't exist
Brendan wrote:
Owen wrote:I believe the two different display depths to largely be a red herring: when did you last see a 16bpp monitor?
I didn't specify any specific colour depth for either display. Let's assume one is using 24-bpp XvYCC and the other is using 30-bpp "deep colour" sRGB.
So you run the framebuffer in 30-bit sRGB and convert once.

And, among dual monitor setups, how common is this scenario? :roll: It's an intentionally manufactured scenario which is comparatively rare.

Of course, nothing is stopping the compositor from being smart and rendering directly to a framebuffer for each display (N.B. rendering to and from system memory is normally supported, while framebuffer scanout from system memory is slightly less common).
Brendan wrote:
Owen wrote:Given the following simple OpenGL snippet, and assuming everything else is in its' default case - i.e. no vertex buffers/etc bound to the pipeline

Code:

glUseProgram(aShader);
glDrawArrays(GL_TRIANGLES,  0, 3);
Can you discern what is going to be drawn? Note that I just provoked the processing of 3 vertices with no buffers bound - yes this is legal. Actually, even if I'd bound a bunch of buffers that wouldn't help, because all the render system would know is something along the lines of "The buffer has a stride of 8 bytes and offset 0 in each stride contains a 4-vector of half floats bound to shader input slot 0"
Have I ever done or said anything to make you think I care about OpenGL compatibility?

Note: Current graphics systems are so lame that application/game developers feel the need to cobble together their own shaders. On one hand this disgusts me (in a "how did potentially smart people let things get this bad" way), but on the other hand it makes me very very happy (in a "Hahaha, I can make this so much easier for application developers" way).
I was saying nothing about OpenGL support. I was using it as an example of one way in which the commands that people pass to modern renderers are difficult to make sense of from the perspective of the people receiving them.

So what is your proposed alternative to developers' shaders?

Note: Current operating systems are so lame that users feel the need to cobble together their own applications. On one hand this disgusts me (in a "how did potentially smart people let things get this bad" way), but on the other hand it makes me very happy (in a "Haha, I can make this so much easier for users" way)

N.B. shaders originated in non-realtime renderers to allow artists then-unprecedented control over the appearance of the objects in their scenes. The first system to introduce them, IIRC, was Pixar's PhotoRealistic RenderMan, many years before even the simplest programmable shading systems (things like NVIDIA's register combiners) appeared in GPUs.
Brendan wrote:
Owen wrote:Except this is an utter falsehood. Any practical GUI involves alpha blending, because otherwise things like curved window borders look awful and you can't have things like drop shadows which users appreciate for giving the UI a sense of depth. So, actually, display 1 draws
22500 (background) + ~8000 (background window) + 2048px (texture 1) + ~16000 (decorated window) + 7200 (texture 3) + 640 (app2's text) = ~56388
while display 2 draws
22500 (background) + ~15000 (window containing T2) + 1600 (Texture 2) + ~7200 (window containing T4) + 1500 (T4) = ~47800
That gives a total of 104188px. We know from above that each window is 21515px, so the actual figure is 10138. Add in the caching you proposed and you're at 104138. Note that you can't optimize out any of those draws unless you know that nothing being drawn has any translucent segments.
It's obvious (from the pictures I created) that nothing was transparent except for each application's text. If you want to assume that the window borders had rounded corners, and that each application's smaller textures also had transparency, then that changes nothing anyway. The only place where transparency would make a difference is if the second application's background was transparent; but that's a plain white rectangle.
How does that change nothing? If you have transparency, you need to alpha blend things.

Either every surface is opaque (and therefore there is no overdraw because you can render front to back), or some surfaces are transparent, and you need to render back to front (or do two passes; front to back for the opaque ones, back to front for the transparent ones, but this requires use of a Z-Buffer and therefore will add 16-bits minimum of writes per pixel)
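
For reference, the per-pixel "source over destination" blend that forces that back-to-front ordering looks roughly like this (plain C sketch; real compositors use premultiplied alpha and SIMD, but the dependency on whatever is already underneath is the same):

Code:

#include <stdint.h>

/* 0xAARRGGBB pixels; alpha 255 = fully opaque. */
static uint8_t blend_channel(uint8_t s, uint8_t d, uint8_t a)
{
    return (uint8_t)((s * a + d * (255 - a) + 127) / 255);
}

static uint32_t blend_over(uint32_t src, uint32_t dst)
{
    uint8_t a = (uint8_t)(src >> 24);
    uint8_t r = blend_channel((src >> 16) & 0xFF, (dst >> 16) & 0xFF, a);
    uint8_t g = blend_channel((src >> 8)  & 0xFF, (dst >> 8)  & 0xFF, a);
    uint8_t b = blend_channel( src        & 0xFF,  dst        & 0xFF, a);
    return 0xFF000000u | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}
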
Brendan wrote:
Owen wrote:The application changed texture/image 1. It set a clipping rectangle in its' graphics toolkit, and redrew the window background plus texture for a cost of 4096px drawn. It then sent a damage event to the GUI server and the GUI server then recomposited the associated rectangle of the display, drawing the background+window1+window2 at a cost of 6144px. Total cost 10240px. Your display list system also received a texture update for texture 1. It also set a clipping rectangle and then drew the background+window1bg+texture1+window2bg+texture3, at a total cost of... 10240px. Of course, the compositor scales O(n) in the number of windows, while your display list system scales O(n) in the number of layers
Um, why is my system suddenly drawing things that it knows aren't visible?
Because what was the last UI system that users actually wanted to use (as opposed to things which look like Motif which only programmers want to use) which didn't have rounded corners, alpha transparency or drop shadows somewhere?
Brendan wrote:
Owen wrote:An alternate situation: the user picks up one of the windows and drags it.

The damage rectangle per frame is going to be slightly higher than 20000px because that is the size of the window, so we will take that figure (the actual number is irrelevant). None of the window contents are changing in this scenario, and we'll assume the user is moving the topmost window left, and 16000px of the background window are covered. The display list system draws the 20000px of background, 16000px of background window decorations/background, 2048px of texture 1, 20000px of foreground window decorations/background, 7200+1500px of T3+T4: 66748px
I'm not too sure what's going on with the 2 completely different systems that you seem to have invented.

For my "list of commands" (as described), if the second application's window (the topmost window) is being dragged to the left; then the GUI would send its main list of commands each frame, causing the first video driver to redraw 22500 pixels and the second video driver to redraw 22500 pixels for most frames (there are 2 textures that were never drawn, that would need to be drawn once when they become exposed).

However, my "list of commands" (as described) is only a simplified description because I didn't want to bury the concept with irrelevant details. Nothing prevents the video driver's from comparing the GUI's main list of commands with the previous version and figuring out what needs to be redrawn and only redrawing the minimum it has to. This would make it as good as your system/s without any "damage events" or toolkits or whatever other extra burdens you want to force the unfortunate application developers to deal with.
So your application developers are going to write their UI drawing code from scratch for every application they develop? :roll:

Of course not, they're going to use a widget toolkit that, hopefully, your OS will provide. Supporting said damage events in a widget toolkit isn't difficult. In the worst case, implementing it is no more complex than implementing the same feature in the GUI system.
Brendan wrote:
Owen wrote:And Brendan has yet to answer how, in his scenario (particularly considering he has said he doesn't plan to allow buffer readbacks), he intends to support composition effects like dynamic exposure HDR.

In HDR, the scene is rendered to an HDR buffer, normally in RGB_11f_11f_10f floating point format. A shader then takes exposure information and scales the values from this buffer to produce an LDR image that a real monitor can display. It often also reads a texture in order to do "tonemapping", i.e. to apply the curve (and sometimes colouration) that the developers intended.

In dynamic exposure HDR, the CPU (or, on modern cards, a compute shader) averages all the pixels on screen to calculate an average intensity, and from this calculates an exposure value. This is then normally smoothed via a weighted and/or moving average and fed back to the HDR shader (with a few frames' delay, for pipelining reasons).
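In other words the feedback loop boils down to something like this (a CPU-side sketch; the Rec.709 luminance weights are standard, but the key value and adaptation rate are just typical choices, not anything from Brendan's proposal):

Code: Select all
#include <math.h>
#include <stddef.h>

typedef struct { float r, g, b; } HdrPixel;

/* geometric mean of the scene luminance, the usual "how bright is it" metric */
float average_log_luminance(const HdrPixel *buf, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        /* Rec.709 luminance; the small epsilon avoids log(0) on black pixels */
        double lum = 0.2126 * buf[i].r + 0.7152 * buf[i].g + 0.0722 * buf[i].b;
        sum += log(lum + 1e-4);
    }
    return (float)exp(sum / (double)count);
}

float update_exposure(float current_exposure, float avg_lum, float dt)
{
    const float key = 0.18f;              /* middle-grey target              */
    const float adaptation_rate = 1.5f;   /* higher = the "eye" adapts faster */

    float target = key / avg_lum;
    /* exponential moving average, so the exposure drifts toward the target
       instead of snapping (which would make the image "pump") */
    return current_exposure +
           (target - current_exposure) * (1.0f - expf(-dt * adaptation_rate));
}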

In his system, either the entire scene needs to get rendered on both GPUs (so the adjustment can be made on both), or you'll get different exposures on each and everything is going to look awful.
You're right - the video drivers would need to send a "max brightness for this video card" back to the virtual screen layer, and the virtual screen layer would need to send a "max brightness for all video cards" back to the video cards. That's going to cost a few extra cycles; but then there's no reason the virtual screen layer can't check for bright light sources beforehand and tell the video cards not to bother if it detects that everything is just ambient lighting (e.g. normal applications rather than games).
So now your graphics drivers need to understand all the intricacies of HDR lighting? What about all the different variations of tonemapping that applications could want?

What about global illumination, bokeh, dynamic particle systems, cloth, etc? Are you going to offload those, in their infinite variations, onto the graphics driver too?

As if graphics drivers weren't complex enough already...
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: OS Graphics

Post by bluemoon »

rdos wrote:I
Forgive me, I'm going to get off-topic but...
Hey, welcome back. Although I sometimes have different opinions from you, you brought interesting topics and insights along with the discussions; I missed that.
rdos
Member
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: OS Graphics

Post by rdos »

I probably won't post a lot for a while because I have too much to do at work, mostly related to RDOS. We will hire another worker who will help me, so I need to fix the build system (OpenWatcom) and the like. I anticipate that a few thousand installations of RDOS will be running in a few years' time.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: OS Graphics

Post by Brendan »

Hi,
rdos wrote:I have no idea why anybody would want to do the pixel-graphics as Brendan proposes directly with the graphics API.
Because it avoids drawing pixels for no reason; while also providing a clean abstraction for applications to use; while also providing device independence; while also allowing a massive amount of flexibility.

Rather than explaining the advantages again and again and again just so you can ignore all of them and/or fail to understand them; perhaps I should do the opposite and describe all of the disadvantages of your way:

a) Applications have to know/care about things like resolution and colour depth. If they don't, you need extra processing for scaling and conversions, which increases overhead and reduces graphics quality. Because applications need to care about resolution and colour depth, you have to add a pile of bloat to your applications that application developers should never have needed to bother with. This pile of bloat could be "hidden" by a library or something, but that just shifts the bloat into the library, and also means that applications need to bother with the hassle of libraries/toolkits/puss.

b1) If the user wants to send a screenshot to their printer, the GUI and all applications have to create completely different graphics to send to the printer; which adds another pile of bloat to your applications (and/or whatever libraries/toolkits/puss they're using to hide problems that shouldn't have existed). Most OSs deal with this problem by failing to deal with the problem at all (e.g. they'll scale and convert graphics intended for the video card and produce crappy quality images). The "failing to deal with the problem" that most OSs do includes either scaling the image (which reduces quality) or not scaling the image (e.g. a small rectangle in the middle of the printed page that's the wrong size). Also note that converting colours from one colour space to another (e.g. sRGB to CMYK) means that colours that weren't possible in the source colour space can't appear in the result, even when the destination colour space could have represented them (for example; if the application draws a picture with shades of cyan that can't be represented by sRGB but can be represented by CMYK, then you end up screwing those colours up, then converting the screwed-up colours to screwed-up CMYK, even though the colours should've been correct in CMYK).

b2) If the user wants to record a screenshot as a file, or record a video of the screen to a file; then it's the same problem as above. For example; I do want to be able to record 5 minutes of me playing a game or using any application (with crappy low quality software rendering at 640*480), and then (afterwards) tell the OS to spend 6 hours meticulously converting that recording into an extremely high quality video at 1920*1600 (for demonstration purposes).

c) Because performance would suck due to applications drawing lots of pixels for no reason; you need ugly hacks/work-arounds in an attempt to reduce the stupidity (e.g. "damage events"). This adds yet another pile of bloat and hassles for the application/GUI developers (and/or whatever libraries/toolkits/puss they're using to hide problems that shouldn't have existed) to bother with.

d) If the graphics have to be sent over a network, sending "raw pixels" is a huge waste of network bandwidth and is extremely stupid. It'd be like using BMP (raw pixel data) instead of HTML (a description of what to draw) for web pages; see the first sketch after this list for a rough idea of the difference. In addition to wasting lots of bandwidth, it doesn't distribute load well (the server both generates the content and renders it, rather than the load being shared between server and client). I'm doing a distributed system so this is very important to me; but even for a "normal" OS it can be important (e.g. remote management/administration, businesses using desktop virtualisation, etc). Note: If I'm playing a 3D game (or doing whatever) in my lounge room, I want to be able to transfer my "virtual screen" to a computer at the other end of the house and keep playing that game (or doing whatever), without the game/applications knowing or caring that everything (video card, monitor, etc) changed. This sort of thing would be easy for my OS to support (for all applications and GUIs, without any extra code in any application or GUI).

e) If an application or GUI is using multiple monitors then "raw pixels" can't cope well with different resolutions and/or different colour depths (the application can only handle one resolution and one colour depth). To cope with this most OSs will suck - e.g. they'll tell the application to use the same resolution for everything and then scale the image for the other monitor (which increases overhead and/or reduces graphics quality). Note that this includes the GUI itself; and while "different colour depths" is usually avoidable, "different resolutions" is quite common. E.g. I'm currently using one monitor at 1920*1600 and another at 1600*1200; both monitors are different physical sizes (one is smaller in both directions) and have different aspect ratios (16:10 and 16:9). Sadly, both of my monitors have different "white points" and different brightness, which is annoying. I've tried adjusting them to get colours on both to match and it simply doesn't work (the range of adjustments is too limited), and the OS I'm using is too stupid to solve the problem simply by generating different colours for different monitors. Note: What I'd really like is to be able to point a web-cam at both monitors and let the OS auto-calibrate the colours to match (e.g. using a short series of patterns).

f) The same "different resolutions" problem occurs for other things that have nothing to do with multiple monitors. For example, most OSs have a "magnifying glass" utility which increases the size of a certain area of the screen (here's an example). In this case you get poor quality scaled graphics because it's too hard for applications/GUIs to do it properly.

g) For real-time graphics (e.g. games) the person writing the application can't know how fast the computer is in advance. To work around this they add a pile of stupid/annoying/silly controls to their application (e.g. to set up/change resolution, to set the texture detail level, to enable/disable shadows, etc), which just adds up to extra hassle for the application developer and extra bloat. Even with all that trouble, it doesn't actually work. For example, the user might get 60 frames per second on a virtual battlefield, but then a building will explode, the complexity of the scene will increase, and the user will end up with 20 frames per second. To avoid this, the user (who shouldn't have needed to waste their time fiddling with controls that the application shouldn't have needed) could reduce the quality of graphics so they do get 60 frames per second for the complex scenes, but then they just get worse quality graphics for no reason for the simpler scenes. The main problem here is that those stupid/annoying/silly controls that the application developer wasted their time bothering with can't/won't dynamically adjust to changes in load. To avoid this problem, in theory, it would be possible for games developers to add even more bloat to dynamically adjust detail to cope with changes in load; but I have never seen any game actually do this (game developers don't want the hassle). For my way, the OS (e.g. video drivers) should be able to take care of this without too much problem; the second sketch after this list shows the sort of feedback loop I mean.

f1) For real-time graphics (e.g. games) the person writing the application/game can't know which features the video card supports. Because the graphics API the applications/games have to use doesn't provide an adequate abstraction; this means that some applications/games won't run on some video cards (e.g. because the game requires "shader puke version 666" and the video card only supports "shader puke version 665"); and it also means that older games won't benefit from newer features. In an attempt to cope with this, games developers spend a massive amount of time and money creating bloat to support many different pieces of hardware; which is incredibly retarded (e.g. as retarded as having to write 20 different pieces of code just to read a file because the OS failed to abstract the differences between AHCI and USB3, or between FAT and ISO9660).

f2) Of course normal application developers won't waste years attempting to handle the idiotic/excessive complexity of 3D graphics. They all decide it's too hard and don't bother doing more than the rusty old 2D stuff they've been doing for 20 years. For a simple example, an "OK" button will be a 2D texture (with a "pretend 3D look" as a lame attempt to make it seem more modern) and won't actually be a 3D "OK" button. What this means is that if the GUI developer bothers to implement anything interesting (like lighting/shadow, fancy compositing effects, or even just something like "Aero Flip 3D"), then the application's window will look like the flat piece of crap that it is.

All of these problems can be solved easily, without making software developers deal with hacks/bloat/workarounds/hassles.
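As a rough illustration of the network point in d); this is the kind of wire format a "list of commands" could use. The opcodes and field layout here are hypothetical (they're not from my actual design), and textures/fonts would still need to be sent once and cached at the other end:

Code: Select all
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wire format: a handful of bytes per drawing operation,
   instead of width*height*bytes_per_pixel of raw pixels. */
enum { CMD_FILL_RECT = 1, CMD_DRAW_TEXTURE = 2, CMD_DRAW_TEXT = 3 };

#pragma pack(push, 1)
typedef struct {
    uint8_t  op;
    uint16_t x, y, w, h;
    uint32_t resource;        /* texture handle, colour, or string handle */
} WireCommand;                /* 13 bytes on the wire */
#pragma pack(pop)

int main(void)
{
    /* Roughly the first application's window from the example: a white
       background, two textures and one line of text => 4 commands. */
    WireCommand window[] = {
        { CMD_FILL_RECT,    0,  0, 200, 100, 0xFFFFFFFF },
        { CMD_DRAW_TEXTURE, 10, 10,  64,  32, 1 },
        { CMD_DRAW_TEXTURE, 10, 50,  40,  40, 2 },
        { CMD_DRAW_TEXT,    90, 40,  40,  16, 3 },
    };

    size_t commands = sizeof window;        /* 4 * 13 = 52 bytes            */
    size_t raw      = 200 * 100 * 4;        /* 80000 bytes at 32 bpp        */
    printf("commands: %zu bytes, raw pixels: %zu bytes\n", commands, raw);
    return 0;
}

Four commands at 13 bytes each is 52 bytes; the same 200*100 window as raw 32-bpp pixels is 80000 bytes, and the raw pixels have to be resent whenever anything changes.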
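And for g); the sort of feedback loop I mean is nothing fancy. This sketch (thresholds and step sizes are illustrative only) just nudges a single detail level up/down to chase a target frame time; the video driver already knows how long each frame took, so it's in a far better position to do this than the application is:

Code: Select all
typedef struct {
    float detail;          /* 0.0 = lowest quality, 1.0 = highest */
    float target_ms;       /* e.g. 16.7 for 60 frames per second  */
} DetailController;

float adjust_detail(DetailController *c, float last_frame_ms)
{
    if (last_frame_ms > c->target_ms * 1.05f)        /* too slow: drop detail       */
        c->detail -= 0.05f;
    else if (last_frame_ms < c->target_ms * 0.80f)   /* lots of headroom: raise it  */
        c->detail += 0.01f;                          /* slowly, to avoid oscillating */

    if (c->detail < 0.0f) c->detail = 0.0f;
    if (c->detail > 1.0f) c->detail = 1.0f;
    return c->detail;      /* drives mipmap bias, shadow resolution, LOD, ... */
}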
rdos wrote:With two monitor support, the sane way would be to run application 1 on one monitor and application 2 on the other. No user would want to have the presentation that Brendan illustrates.
Let's have a vote! How many people here would want to use something like this?
Image

Note: This is a single application (some sort of simulator) running on at least 7 monitors at the same time. I'd love to be able to use something like this, with normal applications (text editors, web browsers, etc) running in windows that happen to be spread across 3 or more screens.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Post Reply