
Re: OS Graphics

Posted: Tue Aug 06, 2013 7:43 pm
by Brendan
Hi,
MessiahAndrw wrote:1) You can do vector graphics!
You don't need to issue a redraw event when you resize a window! Some applications may still want to do this of course (you resize a Firefox window, Firefox will want to use that extra space to show more of the webpage, rather than make everything scale larger.)
Yes; and this includes all sorts of things. E.g. if the user changes video mode the GUI and all applications won't need to know; if the user (or GUI) zooms in or out on a window (or rotates a window) the application won't need to know, etc. This means less communication between processes (fewer task switches, etc). More importantly; for a distributed system like mine it also means faster response times (as you can redraw without sending messages to other computers and having network latency in both directions).
MessiahAndrw wrote:2) Asynchronous rendering
A really simple example - I am writing an Emacs-like text editor that is using Lua (it was using Google's V8 JavaScript but I ran into overhead issues switching between sandboxed instances) - these high level languages are very slow at bit twiddling, so that's something I have to avoid altogether. Instead I use Cairo. When it comes time to redraw the window, I have a loop that goes over each frame (displayed document) and says (pseudocode):
Yes - it can be 100% "tear free", without having something that causes delays (e.g. a lock shared by application and GUI, where GUI has to wait until application releases the lock) and without having extra pixel buffers.
MessiahAndrw wrote:2. Dynamic allocation/memory usage
You'll have to consider that the buffers may rapidly change size between frames. If you add a new GUI component, that may be an extra 10 commands. It's unlikely that an application will know how many commands it is about to draw, until it draws them, so likely you'll have to dynamically expand the buffer as you write to it.
For my messaging system, the application builds messages (which can be up to 2 MiB) in a special "message buffer" area; and the kernel's virtual memory manager knows about the message buffer area and does "allocation on demand" to automatically allocate pages if/when needed. When the message is sent the pages are moved (not copied), leaving the message buffer area empty (as if the pages were freed).
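As a very rough sketch, the application side might look something like this (MESSAGE_BUFFER_BASE and send_message() are made-up names, purely for illustration):

Code: Select all

#include <stddef.h>
#include <string.h>

/* Hypothetical fixed virtual area backed by "allocation on demand" pages */
#define MESSAGE_BUFFER_BASE ((void *)0x700000000000ULL)
#define MESSAGE_BUFFER_MAX  (2 * 1024 * 1024)   /* 2 MiB message size limit */

/* Hypothetical kernel call: moves (doesn't copy) the buffer's pages to the
 * receiver, leaving the sender's message buffer area empty again */
extern int send_message(int destination, size_t length);

int send_command_list(int gui, const void *commands, size_t size)
{
    if (size > MESSAGE_BUFFER_MAX)
        return -1;                  /* would have to be split into sub-lists */

    /* Touching the pages faults them in; no malloc()/free() is involved */
    memcpy(MESSAGE_BUFFER_BASE, commands, size);
    return send_message(gui, size);
}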

For graphics, an application builds a "list of commands" in the message buffer and doesn't have to worry about memory management for that at all. The only real problem is that the size of a message is limited to 2 MiB (e.g. if you assume that each command costs an average of 40 bytes you end up with about 50000 commands in a list). For a 3D game (e.g. with a huge number of polygons on screen) this might sound like a problem; but it's not. For example, you might have an army of 1000 people, but that might be one list describing an arm, one list describing a leg, one list describing a head, etc; with a separate list describing each person (where to put arms, legs, etc), then a master list saying where each person is. For applications it'd be the same (e.g. one list for a widget, one list for a toolbar, one list for a status bar, etc; with a master list saying where each of the smaller pieces are). This also means that applications/games only need to send lists of commands that changed - you don't send all lists every time. For example (an army of 1000 people) one soldier might move his arm, so you don't send lists to describe an arm or leg, or the landscape, buildings, tanks, trucks, etc because none of them changed.
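To make the "lists referring to other lists" idea more concrete, a list entry might look something like this (the opcodes and field names are invented just for the example):

Code: Select all

#include <stdint.h>

/* Invented opcodes - a real protocol would define many more */
enum cmd_op {
    CMD_SET_TRANSFORM,    /* position/orientation for following commands   */
    CMD_DRAW_TRIANGLES,   /* raw geometry owned by this list               */
    CMD_DRAW_LIST         /* draw another (previously sent) list by handle */
};

struct cmd {
    uint32_t op;          /* one of enum cmd_op                            */
    uint32_t handle;      /* which list to draw, for CMD_DRAW_LIST         */
    float    transform[12];
    /* ... payload for the other commands ... */
};

/* A "person" list is a handful of CMD_DRAW_LIST entries placing the shared
 * arm/leg/head lists; the master list is 1000 CMD_DRAW_LIST entries placing
 * each person. Moving one soldier's arm only means resending that soldier's
 * small list - not the arm/leg lists, the other 999 soldiers or the master
 * list. */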

Basically; as far as the applications, GUI, etc are concerned most of the memory management happens automatically. For the video drivers it's a different story though (they'd need something more advanced than just "malloc()" because they'd be managing "cached textures", etc).
MessiahAndrw wrote:3. Performance
Your main concern is to prevent the copying of pixels. Above, I discussed how we would need to do triple buffering (at the expense of some memory) to avoid locking while copying between your drawing and screen buffers.

You can even triple buffer pixel buffers to avoid copying them, until the final moment when you want to compose everything on the screen into one image to send to your video card.

At some point, you will have to execute your command list. If your whole OS graphics stack is abstracted away from the pixel, then this will be done at the final stage in your graphics driver. Will you be able to execute this command buffer fast enough during the v-sync period, to avoid the 'flickering' I mentioned earlier about having a semi-drawn image displayed on the screen?
For a native video driver I'd use page flipping (2 buffers, one being displayed and the other being drawn into, where you switch from one to the other); and for software rendering (e.g. "LFB only video") you'd need a buffer in RAM (and blit that buffer to display memory). Either case doesn't seem like a problem.
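As a rough sketch of both cases (heavily simplified; program_crtc_base() stands in for whatever register write a specific card needs):

Code: Select all

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct framebuffer {
    uint32_t *pixels;          /* one 32 bpp buffer */
    int       width, height;
};

/* Stand-in for the card-specific "scan out this buffer next vsync" poke */
extern void program_crtc_base(uint32_t *base);

/* Native driver: page flipping - display one buffer while drawing the other */
void present_flip(struct framebuffer buffers[2], int drawn_index)
{
    program_crtc_base(buffers[drawn_index].pixels);
}

/* "LFB only" driver: draw into a buffer in RAM, then blit it to display memory */
void present_blit(const struct framebuffer *shadow, uint32_t *lfb)
{
    memcpy(lfb, shadow->pixels,
           (size_t)shadow->width * shadow->height * sizeof(uint32_t));
}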

For how long it'd take to draw a frame, it depends too much on too many things to say. There are lots of opportunities for optimisation within the video driver (e.g. not drawing obscured things, using caches to avoid redrawing pieces that didn't change, etc) so you'd rarely need to redraw everything from scratch. Note: I imagine it as a recursive thing, where you start with the list of commands for the entire screen and "draw" it to determine which child lists are needed, then draw the children to find out which of their children are needed, etc; where you might have a suitable cached texture that prevents the need to draw a list of commands again (or its children, or its children's children).
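In pseudo-C, that recursive "reuse cached results for lists that didn't change" idea might look something like this (all the helper functions are stand-ins):

Code: Select all

#include <stddef.h>

struct texture;                       /* whatever the renderer caches        */

struct list_node {
    struct list_node **children;      /* lists referenced by this list       */
    size_t             child_count;
    struct texture    *cached;        /* result of the last draw, if any     */
    int                dirty;         /* set when the owner resends the list */
};

/* Stand-ins for the real rendering work */
extern struct texture *draw_own_commands(struct list_node *node);
extern void            compose_child(struct texture *dest, struct texture *src);

struct texture *render_list(struct list_node *node)
{
    if (node->cached && !node->dirty)
        return node->cached;                     /* reuse, don't redraw       */

    struct texture *result = draw_own_commands(node);

    /* A real implementation would also skip children that the parent draw
     * found to be obscured or entirely off screen */
    for (size_t i = 0; i < node->child_count; i++)
        compose_child(result, render_list(node->children[i]));  /* recurse    */

    node->cached = result;
    node->dirty  = 0;
    return result;
}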

On the other side of this, the video driver is able to draw things again if it wants to. For example, the video driver might reduce rendering detail in an attempt to get a frame done in 1/60th of a second, and then (if nothing changes) do it again with higher detail. For normal applications and GUI this can mean faster response times (e.g. get it displayed fast, then get it displayed with higher detail before the user has had a chance to realise the first frame was low detail). For 3D games it could mean everything is moving around and you didn't need higher detail anyway, but then the user pauses the game.
MessiahAndrw wrote:4. Multi-monitor rendering
I think this is unrelated to command lists vs. pixels. Every multi-monitor machine I've used has allowed me to have windows span more than one monitor.
I don't think it's unrelated at all. Most OSs allow you to have windows span more than one monitor; but there's severe restrictions and work-arounds for those restrictions.

For example, one of my monitors is smaller than the other (less wide and slightly less tall), and the smaller monitor has a "16:10" aspect ratio and the larger monitor has a "16:9" aspect ratio. If both monitors used the same resolution it'd look like crap - a window spanning both monitors would be large on one monitor and smaller on the other, and half of it would be stretched a little in one direction. To cope with that; I've got the larger monitor set to 1920*1200 and the smaller monitor set to 1600*1200, just so that the OS doesn't screw up the sizes and also screw up the aspect ratios. This only sort of works - it's not perfect and windows on the smaller monitor are slightly larger than they should be. Worse; the smaller monitor is not using its native/preferred resolution, which means that the monitor itself has to scale the image to suit its LCD screen (which is guaranteed to cause a "worse than perfect" image).

Basically most OSs do support multiple monitors; but unless both monitors are the same size and using the same resolution, it does not work properly. For my OS I'd be able to have both monitors running at their native resolution without any problem at all.

MessiahAndrw wrote:
Brendan wrote:The physics of light do not change. The only thing that does change is the "scene"; and this is the only thing an application needs to describe.
It is true, the physics of light do not change. However, computers are yet to reach the stage where we are able to simulate, even just a room, at a fine-grained atomic level in real time. Real-time ray tracing is barely reaching that level, real-time photon tracing is still way off. Rendering graphics is all about using approximations that give an effect that is "good enough" while maintaining acceptable performance.
If you've got an application that describes a scene and a renderer that draws that scene; then the application has no reason to care how the renderer works internally. My OS will work like this, which means that the only person that needs to care how any renderer actually works is the person writing the code for that specific renderer. Everyone else can just assume that all renderers "attempt to loosely approximate the results of applying physical model of light to the scene" without giving a rat's arse how the renderer actually does this (or whether the renderer actually does do photon tracing, or something radically different like rasterisation, or anything else).

Note: The only thing I was trying to say here (and am continually trying to say) is that nobody needs to care how any renderer works (unless they're writing that renderer).

Of course not needing to care how any renderer works includes not needing to care if your "lists of commands" are being sent to a video driver and used for a 2D monitor, or being sent to a video driver and used for some kind of a "3D stereoscopic" display, or being sent to a 2D printer, or being sent to a 3D printer, or being sent to a file, or being sent across the internet to a user on the other side of the world, or...
MessiahAndrw wrote:What are some effects that are hard to do without shaders? Rim lighting, cel shading, water (particles as metaballs that are calculated in screen space, foam and mist), soft particles, ambient occlusion, motion blur - and custom lighting pipelines like deferred lighting and pre-pass lighting (which would require a render pass per light in traditional rendering) that are best suited to a particular application. Vertex shaders let you do animation on the GPU (so you're not always having to upload new vertices with new positions each frame) - flapping flags, water waves, skeletal animation. Tessellation shaders allow you to insert vertices at runtime, purely on the GPU, based on dynamic parameters such as textures and the camera position, preventing you from having to a) continuously stream new vertices to the GPU, and b) upload large geometry to the GPU, when you can just upload a flat plane and let the tessellation shader do the rest.
Sure; and I'll probably (eventually) want to allow video drivers to have built-in shaders for all these things. There's no reason for each different application/game to provide them.
MessiahAndrw wrote:However, a general purpose operating system is likely to want to support all kinds of dynamic content, including media, web pages, video games - in which case, after thinking deeply into it, I still think you would gain better performance using traditional pixel-based buffers implementing a fast lockless triple buffering system that prevents any copying until the final image is composed on to the screen.
Let's test this. Get 5 computers connected via gigabit ethernet. Only 2 of these computers need to have video cards and the rest shouldn't have any, to save costs. Plug one monitor into one computer and another monitor into another computer. Then run a GUI on the third computer (such that the GUI is using both monitors). Start a 3D game on the fourth computer and another 3D game on the fifth computer (so those games are running in windows on the GUI). Arrange the games' windows so they're split across both monitors. Let me know what sort of performance you get pushing massive quantities of pixels around the network; for both Windows and Linux. 8)


Cheers,

Brendan

Re: OS Graphics

Posted: Tue Aug 06, 2013 10:03 pm
by Brendan
Hi,
Owen wrote:
Gigasoft wrote:The original post actually mixes up many unrelated problems.

- Should 2D GUIs be drawn completely into a rectangular buffer and then clipped, or should clipping be applied while drawing? Obviously, most existing windowing systems do the latter. The notion that "most existing systems" draw the entire thing to a buffer and then work with the resulting pixels is simply a lie.
Most existing crappy graphics stacks (e.g. Windows with DWM disabled, X11 without Composite, Mac OS 9) apply clipping while drawing.

More modern graphics stacks render the whole window and don't bother with clipping it (because they want to be able to do things like realtime previews - observe what happens when you hover over a taskbar icon on Windows Vista/7/8: a live preview of the window can be seen in the popup).
There's also another side to this - the complexity of the graphics API. If an application draws all pixels (regardless of whether they're visible or not) it avoids a lot of complications for the application's programmers (at the expense of potentially wasted effort). There's only one way that I can think of to avoid complicating things for the application programmer while also avoiding the wasted effort.
Gigasoft wrote:- How about 3D games? Of course you can do the same for 3D with very minor changes to existing code, but most games don't bother, as they are designed to run in the foreground. With software vertex processing, you can avoid uploading textures that are not used. Some cards may also issue an IRQ to load textures when needed when using hardware vertex processing.
My problem here is that (regardless of what other people may or may not want for their OS) I want 3D applications for my OS. This means that I can't assume "3D stuff" is only ever in the foreground.

I'm also mostly worried about software rendering (it needs to be usable without a GPU) because I don't think I'll be able to finish writing all the native video drivers the OS could ever want before I die of old age. If there's a list of commands that describe a texture, then I want to avoid generating that texture (with my software renderer) if/when possible, especially when the software renderer is trying to get everything done in 1/60th of a second. For a native driver with full GPU support you might be able to just draw everything without caring if it's visible or not, but that's not what I'm worried about.

If you combine these things; you end up with "3D stuff" that is much more likely to be in the background (not visible), and a software renderer (that's going to be "pushing mud up a hill" to get it all done fast enough to begin with) that really can't afford to waste time drawing things for no reason.
Owen wrote:
Gigasoft wrote:- When pixels are being generated to be used as an input for a later drawing operation, should only pixels that are going to be used be generated? Yes, if possible. But then we need to know in advance which pixels will be used. The application knows this, but the system does not. If this is handled automatically by the system, all drawing operations must be specified before drawing can begin. If handled by the application, there is no problem.
- is the cost of this optimization greater than the cost of generating all the pixels?
It depends on the renderer (e.g. software rendering vs. GPU). Why not let the renderer decide which method is the most efficient method for the renderer?


Cheers,

Brendan

Re: OS Graphics

Posted: Wed Aug 07, 2013 12:57 pm
by Gigasoft
Brendan wrote:There's also another side to this - the complexity of the graphics API. If an application draws all pixels (regardless of whether they're visible or not) it avoids a lot of complications for the application's programmers (at the expense of potentially wasted effort).
How so? The application typically doesn't draw pixels itself, it just tells the system to draw various shapes. Ensuring that the correct pixels are updated is the system's job. This holds true whether the application performs drawing in response to a request from the system, or asynchronously. The application does not need to do anything special if the entire window is not visible. An application can optionally check which parts need to be drawn, but it doesn't really have to.
Brendan wrote:My problem here is that (regardless of what other people may or may not want for their OS) I want 3D applications for my OS. This means that I can't assume "3D stuff" is only ever in the foreground.
That's not a problem. You can automatically clip what is drawn to front buffers and their associated buffers. For additional buffers created by the application whose pixels correspond to the final output in a 1:1 manner, provide an easy way to associate these buffers with the window so that clipping is applied.

Re: OS Graphics

Posted: Wed Aug 07, 2013 2:34 pm
by AndrewAPrice
Brendan - What you are describing for 3D (uploading a scene, and let the driver deal with rendering it) is very much like the Retained Mode from very early versions (1990s) of Direct3D. The theory was that you load in your scene, your lights, your camera, and away you go - let the drivers deal with optimising and rendering it in a way that it decides is best.

The problem was that many developers chose not to use it. Especially back in those days when 3D hardware was fairly weak, so it was often better for the developer to do application-specific optimisation:
  • In games like Quake that are made up of many small connected rooms, it's often good to split the rooms up via 'visibility portals' to only draw the room that you're currently in (and none of the adjacent rooms unless you can see their portal).
  • For a general-purpose 3D platformer, an octree would work fine.
  • A 3D RTS game that takes place on a fairly flat landscape could use a 2D variant of an octree called a quadtree.
  • In a racing game, it was common to have all of your objects stored in a sorted list, based on their distance along the track, and you could easily optimize rendering by saying "render from player's position to xxxx metres in front."
  • 3D space sims often switch units to preserve precision (planetary units, solar system units, galactic units) - using various rendering tricks to make this appear seamless.
  • Open-world RPGs will often page their landscape to/from disk as you're moving around.
Computers are much more powerful today, but if you present any one-size-fits-all approach for a scene graph, I'm sure I could think of a case where it would be inefficient.

You can't make everyone happy, and because of this, Microsoft eventually removed Retained Mode because not many people used it, and told us to do it ourselves.

Nobody wants to rewrite an octree implementation, lighting algorithms, mesh loading, bone animation for every project - that's why there are many wonderful highly-optimized scene graph APIs (many open source too). Most have large communities, lots of tutorials, and are very easy to get started with.

Even then, there's no one-size-fits-all approach of doing things, so scene graphs tend to be highly extensible. As an example, Ogre has a dozen scene managers to choose from (with many more community contributed) - or you can extend Ogre with your own by writing a C++ class. Could you load your own scene manager or procedurally tessellated mesh object into a driver?

This does not mean providing a bundled kernel mode scene graph API that comes with your OS as a 'standard API' is a bad idea - Apple did it for QuickDraw 3D, Microsoft has a simple 3D scene graph built into WPF for rendering 3D GUIs. Perhaps it will be sufficient for the majority of applications that will be developed for your OS. With a simple 3D Mahjong game or 3D graph visualisation, all you really want to do is create a cube, define some lines, set up a camera, and away you go.

In most cases, this scene graph is simply a bundled wrapper on top of an underlying low-level API such as OpenGL, but they still expose that low-level API for applications that would benefit from it.

I would view bundling a scene graph library like bundling a standard GUI library with your OS. It's there, you can easily add list boxes, buttons, draw lines, without the user having to install dependencies, but sometimes people want to provide their own (QT, GDK, custom drawing).

Re: OS Graphics

Posted: Wed Aug 07, 2013 9:12 pm
by Brendan
Hi,
Gigasoft wrote:
Brendan wrote:There's also another side to this - the complexity of the graphics API. If an application draws all pixels (regardless of whether they're visible or not) it avoids a lot of complications for the application's programmers (at the expense of potentially wasted effort).
How so? The application typically doesn't draw pixels itself, it just tells the system to draw various shapes. Ensuring that the correct pixels are updated is the system's job. This holds true whether the application performs drawing in response to a request from the system, or asynchronously. The application does not need to do anything special if the entire window is not visible. An application can optionally check which parts need to be drawn, but it doesn't really have to.
Think of it as 3 different systems.

The first is "raw pixels" like I described in the example in my initial post; where the application doesn't know or care what is visible and simply draws everything in its window regardless of whether (e.g.) other windows obscure those pixels, and sends the raw pixels to the GUI. This involves drawing pixels that aren't visible.

The second system is also "raw pixels". However, to avoid drawing pixels that aren't visible you add extra hassles on top of it, such that the GUI has to figure out which pieces of the application's window are visible and inform the application, and the application has to use this information when drawing (or tell a library that uses this information when drawing) its pixels. This way you can avoid drawing pixels that aren't visible, but it is more complex than the very simple "draw everything" case.

The third system is "lists of commands" where the application doesn't need to care and just creates a list/s of commands; the GUI doesn't care and just sends list/s of commands. In this case the applications and the GUI don't have extra complexity to determine what is/isn't visible; and the video driver can do whatever it likes (including not drawing pixels that aren't visible).
Gigasoft wrote:
Brendan wrote:My problem here is that (regardless of what other people may or may not want for their OS) I want 3D applications for my OS. This means that I can't assume "3D stuff" is only ever in the foreground.
That's not a problem. You can automatically clip what is drawn to front buffers and their associated buffers. For additional buffers created by the application whose pixels correspond to the final output in a 1:1 manner, provide an easy way to associate these buffers with the window so that clipping is applied.
I'm not sure I understand what you're saying.

Imagine a single 3D window in the middle of the screen at a slight angle (not quite parallel with the screen), with a dialog box in front of it; with a mirror on the right of the screen at an angle so that you can see the side/back of the window and the side of the dialog box in the mirror. Also imagine a light in front of everything (e.g. behind the user's head) at the top left that causes the dialog box to cast a shadow on the window, and causes the mirror and the window to cast a shadow on the desktop/background. Light also reflects off the mirror onto the bottom left of the desktop/background.

The dialog box has a raised button in it, so in the mirror the side of the dialog box looks like this:

Code: Select all

     #
   I #
     #
    [#
     #
Where '#' is the side of the dialog box, '[' is the side of a raised button and the 'I' is the side of the mouse pointer. Of course when the button in the dialog box is pressed you see the button sink down level with the front surface of the dialog box.

Now imagine the user is adjusting the angle of the mirror, and explain exactly what is clipping what to which edge/s while this happens.


Cheers,

Brendan

Re: OS Graphics

Posted: Wed Aug 07, 2013 9:53 pm
by Brendan
Hi,
MessiahAndrw wrote:Brendan - What you are describing for 3D (uploading a scene, and let the driver deal with rendering it) is very much like the Retained Mode from very early versions (1990s) of Direct3D. The theory was that you load in your scene, your lights, your camera, and away you go - let the drivers deal with optimising and rendering it in a way that it decides is best.

The problem was that many developers chose not to use it. Especially back in those days when 3D hardware was fairly weak, so it was often better for the developer to do application-specific optimisation:
For my system, the application can still determine what is worth putting in the "lists of commands" and what isn't. Most of these different techniques would still be used.
MessiahAndrw wrote:
  • In games like Quake that are made up of many small connected rooms, it's often good to split the rooms up via 'visibility portals' to only draw the room that you're currently in (and none of the adjacent rooms unless you can see their portal).
For this the application would have a list of commands describing a "volume"; with one volume per room.
MessiahAndrw wrote:
  • For a general-purpose 3D platformer, an octree would work fine.
  • A 3D RTS game that takes place on a fairly flat landscape could use a 2D variant of an octree called a quadtree.
  • In a racing game, it was common to have all of your objects stored in a sorted list, based on their distance along the track, and you could easily optimize rendering by saying "render from player's position to xxxx metres in front."
An application can still use all of these techniques to avoid telling the video card/GUI about things that are too far away to matter.
MessiahAndrw wrote:
  • 3D space sims often switch units to preserve precision (planetary units, solar system units, galactic units) - using various rendering tricks to make this appear seamless.
For my system each "volume" has it's own local co-ordinate system; where one main volume (e.g. the world) says where the other volumes appear (and these other volumes say where even more volumes appear within their local co-ordinate system, and so on recursively). Essentially the "lack of precision" problems don't occur in the first place.
MessiahAndrw wrote:
  • Open-world RPGs will often page their landscape to/from disk as you're moving around.
This is entirely possible with "volumes".
MessiahAndrw wrote:Computers are much more powerful today, but if you present any one-size-fits-all approach for a scene graph, I'm sure I could think of a case where it would be inefficient.
In theory, you might be able to think of a case where my system loses a "non-negligible" amount of efficiency. In practice, I can think of cases where my system is more efficient by a significant amount.

In general (for a definition of "better" that includes more than just efficiency); my system might be better for 10 cases, equal for 10 cases and worse for 3 cases (and therefore be "better on average" despite the existence of 3 cases where it's worse); and it'd be foolish to find one specific case where my system would be worse and decide that my system is always worse based on that one specific case. Basically what I'm saying is that if you actually do manage to find a case where my system is worse, it doesn't necessarily matter anyway.


Cheers,

Brendan

Re: OS Graphics

Posted: Thu Aug 08, 2013 8:17 am
by AndrewAPrice
Brendan, I do see how it could be useful at a high level if you're doing remote rendering - conventional low-level graphics APIs like Direct3D and OpenGL allow you to upload your models into index/vertex buffers once and then simply say "draw this buffer"; it's bad practice to push 25,000 vertices to the GPU each frame, which is what I think you are imagining. Combine this with an OpenGL-style display list (a display list allows you to record calls, and then play them back in the driver, reducing driver calls) where you simply set your camera matrix (and other frame-specific parameters) and play back your draw list.
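For reference, the classic OpenGL 1.x display list mechanism looks roughly like this (fixed-function GL, no shaders):

Code: Select all

#include <GL/gl.h>

static GLuint soldier_list;

/* Record once: calls between glNewList()/glEndList() are stored by the
 * driver and can be replayed later without re-issuing them from the app */
void record_soldier(void)
{
    soldier_list = glGenLists(1);
    glNewList(soldier_list, GL_COMPILE);
    /* ... glBegin(GL_TRIANGLES); glVertex3f(...); ... glEnd(); ... */
    glEndList();
}

/* Each frame: set the per-frame state (camera, position) and replay */
void draw_soldier(float x, float y, float z)
{
    glPushMatrix();
    glTranslatef(x, y, z);
    glCallList(soldier_list);
    glPopMatrix();
}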

I still think actual scene management (choosing what to put into the display list, optimizing what is visible, etc.) is better suited to a user library.

Perhaps you are seeing something that I do not. Perhaps you've had an epiphany that, due to my limited imagination, I cannot fully comprehend. In any case, I wish you the best, and I still applaud you for thinking against the norm and I look forward to seeing any designs or potential implementations you may develop.

Re: OS Graphics

Posted: Thu Aug 08, 2013 11:34 am
by Gigasoft
The third system is "lists of commands" where the application doesn't need to care and just creates a list/s of commands; the GUI doesn't care and just sends list/s of commands. In this case the applications and the GUI don't have extra complexity to determine what is/isn't visible; and the video driver can do whatever it likes (including not drawing pixels that aren't visible).
This would approximately describe many existing graphics systems. The main difference with your system seems to be that you are queuing up these commands in a buffer that is kept for later use, while most other systems don't.

For example, in my system, what happens is as follows: There is a system component that handles high level drawing operations. It lets you create Graphics Contexts and associate them with a surface. Two different clipping regions may be set, which require different levels of access to change. The resulting intersection of these regions is used to clip operations. When the window system finds that something must be painted, the window is put on the application's dirty list. When an application listens for GUI events and the dirty list has entries, the first entry is removed. The system then creates a graphics context associated with the screen, or optionally with another intermediate surface, and sets region 1 according to the area that is to be painted. Then a paint event is returned to the application. The application then invokes methods implemented by the graphics context to perform its drawing. It can set region 2 at will if it needs to. It can also optionally inspect the bounding box of the region to be painted to skip invisible parts. When the application fetches the next GUI event, the system finishes up by copying from the intermediate surface to the screen, if one was used. The system performs all clipping and passes the resulting shapes to the video driver. Later, I might implement support for accelerated clipping.
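Roughly, as pseudo-C (the names here are invented just to illustrate the flow described above):

Code: Select all

#include <stdint.h>

struct rect { int x, y, w, h; };
struct graphics_context;             /* clips to the intersection of
                                        region 1 and region 2              */

struct gui_event {
    int                      type;   /* e.g. EVENT_PAINT                   */
    struct graphics_context *gc;     /* region 1 already set by the system */
    struct rect              bounds; /* bounding box of the area to paint  */
};

#define EVENT_PAINT 1

extern int  get_gui_event(struct gui_event *ev);  /* fetching the next event
                                                     also finishes the
                                                     previous paint         */
extern void gc_fill_rect(struct graphics_context *gc,
                         const struct rect *r, uint32_t colour);

void event_loop(void)
{
    struct gui_event ev;

    while (get_gui_event(&ev)) {
        if (ev.type == EVENT_PAINT) {
            /* Optionally inspect ev.bounds to skip invisible parts */
            gc_fill_rect(ev.gc, &ev.bounds, 0x00FFFFFF);
            /* ... draw shapes; the system clips and passes the resulting
             * shapes to the video driver ... */
        }
    }
}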

Re: OS Graphics

Posted: Thu Aug 08, 2013 12:20 pm
by bluemoon
I also designed a UI framework; it's used with OpenGL in game development. Although adding OpenGL support to my OS is not possible in the near future, I think the concept still applies.

Each widget gets its region, texture (or sub-texture), and a default UI shader. Most widgets use something like nine-patch to scale their size. In general, the whole desktop can be drawn with relatively few vertices and a few combined textures, and since the buffers are more or less static, it results in minimal traffic between main memory and the display card.
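As a rough illustration of the nine-patch part (a made-up helper, just to show the idea): the four corner quads keep their size, the edge quads stretch along one axis, and the centre stretches in both:

Code: Select all

struct quad { float x0, y0, x1, y1; };     /* destination rectangle */

/* Split a widget rectangle into the 9 quads of a nine-patch; matching UVs
 * use the same split over the source texture */
void nine_patch(float x, float y, float w, float h,
                float left, float right, float top, float bottom,
                struct quad out[9])
{
    float xs[4] = { x, x + left, x + w - right,  x + w };
    float ys[4] = { y, y + top,  y + h - bottom, y + h };
    int   n = 0;

    for (int row = 0; row < 3; row++) {
        for (int col = 0; col < 3; col++) {
            out[n].x0 = xs[col];
            out[n].y0 = ys[row];
            out[n].x1 = xs[col + 1];
            out[n].y1 = ys[row + 1];
            n++;
        }
    }
}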

For a custom-drawn widget, the application draws to its own texture instead of using the static textures.

As for window clipping, it's trivial to adjust vertices & UVs, and the shader can do off-screen clipping, where the display card automatically skips coordinates that are transformed outside [-1,1].

If a widget is totally off the screen, you may wonder whether it's a waste to even invoke redraws for it.
In my opinion, in some situations you may still want to invoke the redraw to maintain consistency of application logic; on the other hand, you may also remove the widget from the redraw list if you detect it is totally off the screen.

If I can use hardware-accelerated OpenGL, I will surely go with this approach.

Re: OS Graphics

Posted: Fri Aug 09, 2013 1:36 pm
by XanClic
Brendan wrote:No, what has happened (at least from my perspective) is that I tried to say something obvious (e.g. "the goal of the renderer is to generate a picture as if the physical model of light was followed; regardless of whether the physics of light actually were followed or not, and regardless of how any specific renderer is implemented")
What happened from my perspective is that you said "Rasterizing and ray tracing both try to implement a model of light; the difference is that rasterizers tend to ignore many aspects." I then tried to find the implications of that statement and wasn't successful. So it actually had none, and I'm sorry to have made such a long discussion out of this point - I just thought "I'll correct that and it'll be over". Sorry.
Brendan wrote:I'm some moron that has no idea how the implementation of (e.g.) a rasterizer and a ray tracer is very different
I did indeed infer that from your original point. Sorry again.
Brendan wrote:I attempt to explain my original extremely obvious point
Yeah, the point you've mentioned in this post is obvious. But
Brendan wrote:For photo-realistic rendering you create a very accurate model of light; and for real-time rendering you skip a lot of things to save time (e.g. severely limit reflection and ignore refraction).
(from one of your previous posts) was the point I got and this still isn't obvious to me (for the very reasons I explained to you, which was obviously unsuccessful, since you already knew all that - though that original statement implied to me you didn't).
Brendan wrote:patronising instead
So why didn't you just say "Yeah, I know, rasterizing and ray tracing are fundamentally different things; I know rasterizing itself does not actually use the physical model of light at all and therefore inherently cannot compute reflection and refraction; but that wasn't my point anyway"? You just silently shifted your point to "It's the result that matters", though the original post I quoted above did not seem to be that specific; therefore it seems/seemed to me you were avoiding the discussion.

In fact, your statement now:
Brendan wrote:(e.g. "the goal of the render is to generate a picture as if the physical model of light was followed; regardless of whether the physics of light actually were followed or not, and regardless of how any specific renderer is implemented")
Stands (in my opinion) in contrast to your previous one:
Brendan wrote:In both cases you try to implement a renderer that tries to follow the physical model of light.
But maybe it's just something I've (mistakenly) interpreted in there.

If your statement now is what you actually meant, I'd still dispute it, though. You're right, 3D engines often try to generate a picture as close to reality as possible (or as you said "as if the physical model of light was followed"); but my original point was exactly that this is not always the case. By saying that this is the only type of rendering that matters, you're (in my eyes) severely limiting the choices of game designers etc. But we already discussed this to the point where (I guess) we both agreed to disagree.

Brendan wrote:The basic steps were:
  • Part A:
Also known as vertex shading
Brendan wrote:
  • Part B:
    Part C:
Also known as rasterizing + fragment shading/filling
Brendan wrote:For Part A it would've been trivial for each CPU to do each "Nth" object in parallel
Obviously, since vertex shading can be parallelized pretty well.
Brendan wrote:For Part B and Part C, it would've been trivial to divide the screen into N horizontal bands where each CPU does one horizontal band; with per CPU buffers.
Only that you would have to draw a polygon multiple times if it appeared on multiple bands. Once your polygons are (on average) drawn on as many bands as you have cores, performance will be worse than not having threaded at all (at the latest).
Also, rasterization itself will be slower than ever, since you have to clip multiple times; filling depends on memory bandwidth only. The most processing time expensive operation is the fragment shading, which again can be parallelized trivially (even as SIMD) without having to split the screen into different areas.
Brendan wrote:For Part D you could still do the "N horizontal bands" thing, but you're going to be limited by bus bandwidth so there's probably no point.
Agreed.

So I doubt your idea will actually work out, but I have to admit that indeed shading is perhaps the most expensive operation (at least today); therefore, today's rasterization (including shading) can actually be parallelized pretty well. So I was wrong there, since the cost of rasterization itself doesn't actually matter today.

However, ray tracing still is vastly superior in this regard (at least in my opinion); it's not that easy to shade multiple polygons at once, so you pretty much can only parallelize by the number of fragments per polygon; whereas for basic ray tracing, every pixel of the screen is independent, therefore one can use as many cores as there are pixels on the screen.

Generally, rasterizing sucks behinds and in my opinion we're only still using it because of legacy issues.

Brendan wrote:Too abstract interfaces lead to code duplication; which can be solved in many ways (e.g. static libraries, shared libraries, services, etc) that don't include using a lower level abstraction for the video driver's API.
Oh, great! :) I just assumed you don't like libraries at all, since I believe I remember you being very sceptical about libraries in general. Also, I inferred from your (previous) statement:
Brendan wrote:Also note that I couldn't care less how the video driver actually implements rendering (e.g. if they use rasterization, ray tracing, ray casting or something else). That's the video driver programmer's problem and has nothing to do with anything important (e.g. the graphics API that effects all applications, GUIs and video drivers).
that the task of providing the highly abstract interfaces fell to every video driver programmer separately.

But since that's obviously not the case: Yes, of course you're right. Libraries (etc.) easily solve this problem or rather allow avoiding it in the first place. :)

Re: OS Graphics

Posted: Fri Aug 09, 2013 9:42 pm
by Brendan
Hi,
XanClic wrote:
Brendan wrote:For Part B and Part C, it would've been trivial to divide the screen into N horizontal bands where each CPU does one horizontal band; with per CPU buffers.
Only that you would have to draw a polygon multiple times if it appeared on multiple bands. Once your polygons are (on average) drawn on as many bands as you have cores, performance will be worse than not having threaded at all (at the latest).
Also, rasterization itself will be slower than ever, since you have to clip multiple times; filling depends on memory bandwidth only. The most processing time expensive operation is the fragment shading, which again can be parallelized trivially (even as SIMD) without having to split the screen into different areas.
You don't draw any part of any polygon multiple times. You might draw the top half once and the bottom half once; but "1/2*1 + 1/2*1 = 1".

The only extra work you do is clipping polygons to the top/bottom edges (instead of doing it once for the screen's edges you'd do it once for each horizontal band), but this is almost nothing. Most of the work is calculating the start/end of each fragment, inserting the fragment into the sorted list for the screen line, and converting the sorted list for each screen line into pixels; but all of this is only done once (on whichever CPU is responsible for the corresponding horizontal band).

Of course for a more complex renderer it'd work even better. For example, if it did textures then the inner loop of Part B would have to calculate/store the texture co-ords at the start/end of a fragment and Part C would have to find the colour of each texel as it converts fragments into pixels, but both these parts are very scalable.

If it takes one CPU N ms to draw the scene (not including blitting the final buffer to display memory) then I'd expect 2 CPUs to be almost (but not quite) twice as fast; and 32 CPUs to be about 30 times faster. The real problem (for scalability) isn't drawing the pixels, it's blitting pixels to display memory (Amdahl's Law).
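As a sketch of the "one horizontal band per CPU" split (plain pthreads, with the actual per-scanline work stubbed out):

Code: Select all

#include <pthread.h>

#define NUM_BANDS     4        /* e.g. one band per CPU */
#define SCREEN_HEIGHT 1080

struct band { int first_line, last_line; };

/* Stand-in for Parts B and C: clip polygons to the band's top/bottom edges,
 * build the per-line sorted fragment lists, convert them into pixels */
extern void draw_band(int first_line, int last_line);

static void *band_worker(void *arg)
{
    struct band *b = arg;
    /* A polygon spanning two bands has its top half drawn by one CPU and
     * its bottom half by another - no pixel is ever drawn twice */
    draw_band(b->first_line, b->last_line);
    return NULL;
}

void render_frame(void)
{
    pthread_t   thread[NUM_BANDS];
    struct band bands[NUM_BANDS];

    for (int i = 0; i < NUM_BANDS; i++) {
        bands[i].first_line = i * SCREEN_HEIGHT / NUM_BANDS;
        bands[i].last_line  = (i + 1) * SCREEN_HEIGHT / NUM_BANDS - 1;
        pthread_create(&thread[i], NULL, band_worker, &bands[i]);
    }
    for (int i = 0; i < NUM_BANDS; i++)
        pthread_join(thread[i], NULL);

    /* The remaining serial part (blitting to display memory) is what limits
     * scalability - Amdahl's Law */
}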
XanClic wrote:However, ray tracing still is vastly superior in this regard (at least in my opinion); it's not that easy to shade multiple polygons at once, so you pretty much can only parallelize by the number of fragments per polygon; whereas for basic ray tracing, every pixel of the screen is independent, therefore one can use as many cores as there are pixels on the screen.
That depends on what sort of shading. My code used the angle between the polygon's normal and the camera to determine the shade for the entire polygon (essentially; shading the entire polygon from one calculation), and this was very simple to do. It'd be just as trivial to determine the angle between the polygon's normal and a light source. Having a single shade for the entire polygon is crappy though, especially for large polygons. A better idea would've been to determine the angle between the polygon's normal and the light source/s at each vertex and interpolate, or to calculate the shade for each individual pixel, or to split the fragments into smaller pieces of "close enough angle". All of these things are relatively easy to do. However; none of them give you shadows.
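For example, the "one shade for the entire polygon" case is just a dot product (a minimal sketch, assuming unit-length vectors):

Code: Select all

struct vec3 { float x, y, z; };

static float dot(struct vec3 a, struct vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* One shade for the whole polygon, from the angle between its normal and
 * the light direction (both unit length). Doing the same per vertex and
 * interpolating gives smoother shading; per pixel is better still. None of
 * these give shadows. */
float flat_shade(struct vec3 normal, struct vec3 light_dir)
{
    float d = dot(normal, light_dir);
    return (d > 0.0f) ? d : 0.0f;      /* clamp faces pointing away */
}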

For ray tracing, each pixel is independent (which makes it even easier than "very easy" to do in parallel). The problem is that it's several orders of magnitude slower; which is why it's (mostly) never used for real-time rendering and (mostly) only ever used when you can afford to spend 30 minutes or more generating a single frame.
XanClic wrote:Generally, rasterizing sucks behinds and in my opinion we're only still using it because of legacy issues.
No. Ray tracing is older than rasterisation, and everyone shifted from ray tracing to rasterisation because ray tracing was far too slow for real-time graphics. It's only recently that we've seen specialised hardware capable of handling the massive amount of overhead involved.

Of course (for my plans) applications have no reason to care how the renderer works; so whoever writes the renderer is free to use whatever is best for the hardware and/or time limits involved (which may be rasterisation in some cases and ray tracing in others).
XanClic wrote:
Brendan wrote:Too abstract interfaces lead to code duplication; which can be solved in many ways (e.g. static libraries, shared libraries, services, etc) that don't include using a lower level abstraction for the video driver's API.
Oh, great! :) I just assumed you don't like libraries at all since I believe remembering you being very sceptical about libraries generally.
Yes; I prefer the "services" way of solving code duplication problems. The main point here is that it'd be silly to make the video driver API so low level that (e.g.) applications need to care which video card/s are present and/or how they do rendering, just because you don't feel like using existing solutions to code duplication problems.
XanClic wrote:Also, I inferred from your (previous) statement:
Brendan wrote:Also note that I couldn't care less how the video driver actually implements rendering (e.g. if they use rasterization, ray tracing, ray casting or something else). That's the video driver programmer's problem and has nothing to do with anything important (e.g. the graphics API that effects all applications, GUIs and video drivers).
that the task of providing the highly abstract interfaces fell to every video driver programmer seperately.
A video driver can be responsible for getting rendering done, and fulfil that responsibility by delegating some or all of the work. For example, I might have a "renderer service" (e.g. a normal process that does software rendering) where 10 different video cards (e.g. video cards that only support mode switching and don't support hardware acceleration) use the "renderer service". For another example, I might have a service that converts "lists of commands" into "ATI GPU commands"; or a "font engine" service, or a service that just does matrix maths, or a service that only loads graphics files and converts them into various texture formats (DXT1, DXT2, ...), or any other services that anyone feels like wanting. Of course all of these examples are only examples. The only thing I'm trying to say here is that being responsible for something isn't the same as actually doing it yourself.


Cheers,

Brendan