GPU programming

bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

GPU programming

Post by bzt »

Hi,

How many of you have tried to port GPU rendering to your OS? I mean really porting it, not just pushing some copy'n'pasted library written by a random guy to compile under your OS without errors.

Reading the specs, I must conclude that the guys at Khronos are a bunch of incompetent idiots. To back up that claim, I'm going to analyze their APIs in 3 aspects:
  • syntax: how you define a call, how you pass arguments etc.
  • schemantics: how easy it is to figure out what a function is for, what arguments it expects, and how the functions fit together and fit in the bigger picture
  • abstraction: how well it covers the underlying actual GPU architecture, and how many functions need to be called to cover specific layers or parts
Now, let's see what we got.

Original OpenGL
The syntax was good and simple, scalar values to functions. The schemantics were also perfect, everybody knew at first sight what

Code:

glBegin(GL_TRIANGLES);
glVertex3f(1.0, 1.0, 1.0);
is supposed to do, and that glVertex3f() expects 3 float arguments. It is unclear why some tasks required 3 or more function calls, always in the same order (like binding a texture for example), but that's not a big deal. The abstraction was terrible though, and as it was a static standard, there was no way to influence the pipeline or keep up with the video card manufacturers putting new features in the cards. Not to mention that it was bloated with lots of stuff not related to the GPU, like a linear algebra library (which is good to have, just really not part of the video card API).

Modern OpenGL
It took them more than a decade (!) but finally they realized OpenGL was a failure. So they came up with the "modern" OpenGL 3 standard, in which they threw out about 3/4 of the functions and replaced them with a few simple ones, in which luckily they haven't messed up the syntax. This is welcome; the problem lies in schemantics. For example, what lunatic could come up with the idea that it's feasible to integrate a complete compiler in each and every single video driver??? (Hint: it's not, and many video driver developers failed to implement GLSL properly, as they are low-level hardware developers, not compiler-theory experts.) They did come a little closer to a better abstraction, but ultimately they failed. This led to many API dialects, like OpenGL/ES and OpenGL/Desktop for example, each with its own versioning. Bad. Really bad, defeats the whole purpose of having a single standard for the GPU. Almost every other hardware abstraction API is capable of handling extensions in a standard way (think about cpuid, X11 Extensions, EFI LocateProtocol, etc.), but not OpenGL. Just for the record, you can use a separate GLEW library, but that's not part of the standard OpenGL API, and it is not implemented on all platforms, which again, defeats the whole purpose.

Vulkan
Okay, third time's the charm, they've tried to design an API again. This time the syntax is totally f*cked up. Why on earth do we need those structs? They just make the source code totally unreadable and all but unmaintainable. I mean, in EFI, where you dynamically query the interfaces using GUIDs, it makes perfect sense to use a struct to describe the interface (and even there EFI usually does not use struct arguments in interface methods, and when it does it's clear from the name what struct it expects. If EFI is considered overcomplicated, what does this tell you about Vulkan?). Vulkan has a FIXED interface anyway, so why? As a result, the default C API is totally unusable, and you have to use their C++ wrapper class if you want to make your code a little bit more readable. WTF? Why are the developers forced to use a specific OOP language to use the GPU??? Okay, let's move to schemantics. Good point that they have tried to implement the extensions in the basic API, but bad point that they haven't succeeded (but at least they've tried). They have replaced the compiler requirement with a simple bytecode interpreter, but I'd say that's the only advantage Vulkan has over modern OpenGL, because it is extremely hard to figure out what parameters a certain function needs, and since they are wrapped in structs, the compiler won't help you any more with the proper function invocation. Also it is extremely hard to learn which functions have to be called in which order for a specific task. Just a quick survey, how many of you can figure out what this function actually does, when you have to call it in the rendering cycle and what arguments it expects exactly?

Code:

VkResult vkCreateDescriptorUpdateTemplate(
    VkDevice                                    device,
    const VkDescriptorUpdateTemplateCreateInfo* pCreateInfo,
    const VkAllocationCallbacks*                pAllocator,
    VkDescriptorUpdateTemplate*                 pDescriptorUpdateTemplate);
See what I'm talking about? What kind of Descriptor is that? What is the Template used for? For the descriptor itself, or for the object it's describing? The function name and argument names tell nothing. Because, and now we arrive at the abstraction level, abstraction layers are literally nonexistent in Vulkan. It's the worst ever, as now it is tied to the OS and the windowing system, WTF? Clearly a sign of messed up abstraction layers. This just makes video card drivers more difficult to write, and makes Vulkan applications unportable (nope, requiring a bunch of preprocessor #if-s in every single application is not a good API). A good API provides an OS and GUI independent abstraction for framebuffers which then a GUI can use to display the window (but that's the GUI's job, not the video card driver's). On the low end layers though, Vulkan exposes the GPU hardware internals, which is just really bad. You have to learn A LOT (hey, not all developers are video card manufacturers!) just to figure out how the components fit together, and what arguments they are expecting, and THERE'S A COMPLETE LACK OF ERROR REPORTING (you have to validate your code in advance, which is good for compile time errors, but not for run-time errors). The way Vulkan was designed does not help developers at all, but it creates a huge number of opportunities to mess up. Saying Vulkan is faster because now the efficiency lies entirely on the shoulders of the developers who're using it is a bad joke at best.

What are your thoughts? Do you also think that people at Khronos can't design a proper API?

Cheers,
bzt
alexfru
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: GPU programming

Post by alexfru »

I escaped COM and DirectX, but neither seems particularly nice in C. You need to use the dedicated helper macros for things that C++ hides. And yes, there are some seemingly obscure things/structs too.
reapersms
Member
Posts: 48
Joined: Fri Oct 04, 2019 10:10 am

Re: GPU programming

Post by reapersms »

Introduction: I mostly lurk here and on gamedev.net. My day job is graphics and systems programming for game consoles, with a heavy focus on performance, and has been for ~18 years at this point.

Vulkan is a bit messy to learn. DX12 is worse. This is probably going to take a few posts.

First, sorting out some misconceptions and elaborating on some history:

OpenGL 1.0
OpenGL was not a "video card" API, it was a 3D rendering API, and the primary target was IRIX/X11. It was not necessarily hardware accelerated at the time, and integration with a system GUI was left to external bits. In particular, the API was set up such that it could be accelerated piecemeal, with slow paths and fast paths. It did have a queryable extension mechanism, and a process for promoting extensions from vendor specific, to multi-vendor, to official, to standard. What it did not have was a helpful or useful way to determine whether a particular feature was hardware accelerated, or emulated in software.

The standard was not static, just a bit slow moving, as it was controlled by a review board (the ARB) populated by representatives of various hardware, OS, and software vendors. A number of nice features and overhauls spent a long time in limbo, due to vetos from certain OS vendors with competing APIs, hardware vendors that didn't have a good plan for accelerating certain things, and software vendors that did not want to break any of their legacy code with newfangled things for those damned video gamers.

The abstraction wasn't terrible at the time, and the quirks of the API were influenced a fair bit by the demands of X11. In particular, it would work over a remote X link, which I believe is where Bind-to-modify came in. Extensions along the way started complicating things, mostly for ways to cut down on the API call traffic by allowing it to read application arrays directly.

OpenGL 2.0-4+
A long development process went into what eventually became OpenGL 2.0, but it was not the ideal one. There was a much better proposal, referred to at the time as Longs Peak IIRC, that was eventually stripped and watered down to what became 2.0. Adoption on Windows was hampered a good deal by some quirks of the environment and MS themselves. On the application side, the issue was that the system library only provided the 1.1 interface. The driver could provide the full one, but to get to it you needed to query for the various extensions that made up the later versions, and interrogate it to get the function pointers in question. The process was a tad arcane, and window system specific, hence the plethora of helper libraries to deal with the mess.
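The query-then-load dance described above can be sketched without a real GL context. On a real system the pieces would be glGetString(GL_EXTENSIONS) plus wglGetProcAddress or glXGetProcAddress; everything prefixed mock_ below, and the tiny name table behind it, is a hypothetical stand-in so the logic can be read and compiled on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver: a space-separated extension string
   and a name -> function lookup. Not a real GL implementation. */
typedef void (*generic_fn)(void);

static int g_calls;
static void mock_bind_buffer(void) { g_calls++; }

static const char *mock_extension_string(void) {
    return "GL_ARB_multitexture GL_ARB_vertex_buffer_object";
}

static generic_fn mock_get_proc_address(const char *name) {
    if (strcmp(name, "glBindBufferARB") == 0) return mock_bind_buffer;
    return NULL; /* unknown entry point */
}

/* Whole-token match: a plain strstr() would wrongly accept a name that is
   only a prefix of a longer extension name. */
int has_extension(const char *list, const char *name) {
    size_t n = strlen(name);
    assert(n > 0);
    for (const char *p = list; (p = strstr(p, name)) != NULL; p += n) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[n] == ' ') || (p[n] == '\0');
        if (starts && ends) return 1;
    }
    return 0;
}

/* Hands out the entry point only if the extension is advertised. */
generic_fn load_if_supported(const char *ext, const char *fn_name) {
    if (!has_extension(mock_extension_string(), ext)) return NULL;
    return mock_get_proc_address(fn_name);
}
```

An application would call load_if_supported() once per entry point at startup and keep the returned pointers around, which is roughly the bookkeeping the helper libraries automate.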

The GLSL situation was rather not great. It was designed a bit from on high to push towards a higher level abstraction, and get away from the vendor specific assembly interfaces. The goal was laudable, to let applications write one shader program, rather than having to detect and deal with the wide variety of hardware available. Where it really ran into issues was that it mandated a particular program structure that was a good deal more restrictive than what was allowed at the time, and retrofitting to it was a painful bookkeeping mess.

While that was going on, the hardware was changing drastically under the hood, which is where the abstraction started diverging from reality. OpenGL handled it a tad better than D3D did.

The OpenGL 3.0 Core vs Compatibility context adventure was an attempt to cut ties with parts of history that made writing an OpenGL implementation more difficult. In reality, no real vendor truly ditched the compatibility context, as AutoCAD wasn't going away any time soon. ES came about specifically to cut down to an acceleratable subset that would work with the mobile hardware of the day.

Through this period, the PC side was generally a war between 2-3 driver vendors. One provided a driver where everything generally Just Worked, whether the spec said it should or not. One provided one that was very particular about the spec as written, but that usually resulted in things not working as reliably. One really preferred to just make CPUs. One of them got a bit frustrated with the decades-long impression of unreliability, and constant digs at the performance of the driver, and came up with a new API that more directly matched where hardware was at the time, and where it was headed in the future.

Vulkan
That API was turned, almost directly, into Vulkan. Its primary purpose was performance, and providing a low overhead interface to reality. Ease of use was not as large a concern, due to:
  • The more abstracted, easier to use APIs were still available and provided for non performance-critical uses. They had to be, for existing software to continue running.
  • A long running trend towards using an existing engine to abstract away the platform and API details. A large portion of the Vulkan-targeted apps these days don't write a single line of Vulkan code, because they just use UE4.
  • A general desire to have a direct, no-nonsense interface to allow PC games some of the advantages of the dedicated consoles.
The window system integration got moved into an official part of the API, with the loader layers and such, to allow applications to do a better job of not caring whether they were on X or Win32. The process for doing that on older OpenGL was even more terrible, as the window system portions of the interface tended to not get updated, ever.

API usage validation has gotten more complicated than is reasonable to deal with via glGetError and friends. It is further complicated by the general performance hit from checking everything, as well as the lack of any graceful method of handling a large number of the errors that come up. These days, it is almost always best handled by a validation or debug layer, where a flag at initialization says whether to swap that in or not. That pushes all of the validation overhead into the library itself. Upsides:
  • Your application code does not need to be littered with checks for rare, esoteric cases you can't recover from anyways
  • You only pay the performance hit when specifically testing for correctness
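As a rough illustration of the "flag at initialization swaps the layer in" idea: the choice between a checked and an unchecked entry point is made once, and call sites never change. In Vulkan the layers are requested by name when the instance is created; the dispatch table, the draw function, and the error counter below are all hypothetical and only mimic the shape of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-entry "API": draw `count` vertices starting at `first`
   from a buffer that holds `capacity` vertices. */
typedef struct {
    int (*draw)(size_t first, size_t count, size_t capacity);
} dispatch_table;

static int g_errors_reported;

/* Fast path: trusts the caller completely, no checks at all. */
static int draw_unchecked(size_t first, size_t count, size_t capacity) {
    (void)first;
    (void)capacity;
    return (int)count; /* pretend we drew `count` vertices */
}

/* Validating wrapper: checks the arguments, reports, then forwards. */
static int draw_validated(size_t first, size_t count, size_t capacity) {
    if (first > capacity || count > capacity - first) {
        g_errors_reported++; /* a real layer would log a message here */
        return -1;
    }
    return draw_unchecked(first, count, capacity);
}

/* The decision is made once, at initialization time. */
dispatch_table create_device(int enable_validation) {
    dispatch_table t;
    t.draw = enable_validation ? draw_validated : draw_unchecked;
    assert(t.draw != NULL);
    return t;
}
```

A release build would call create_device(0) and pay nothing for the checks; a test run would call create_device(1) and get the out-of-range draw reported instead of silently reading past the buffer.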
It is a bit bureaucratic about things like memory allocation, and the documentation does take a lot of study to really understand what's going on. The first is because the PC environment is a good deal messier than the ideal, and the performance is tightly connected to that. The alignment required for various objects, caching policies, location (system local, system via PCIe, on-card, on-die) and such are extremely important, but are also very specific to the type of resource and the hardware in question. Vulkan and DX12 were built to be able to provide that information in a way that it can be used efficiently -- but not always easily.

The second issue, one of needing a lot of domain knowledge to use the API properly, is just the nature of the beast. Vulkan stretches rather far down in a layer diagram, and gives you the ability to dig almost as deep as you would like. Highly abstracted, easy(er) to use interfaces are still available. OpenGL 1.0 is still there (though it may not perform nicely these days), WebGL is still a thing (somehow), etc.

The abstraction it provides is very thin, and is built to give the application developer as much ability to eliminate useless overhead as possible. Hardware 3D rendering these days is as much about buffer management and plumbing as it is about pixels, and essentially finding a better way to memcpy() things back and forth.

Next post will go through why the interfaces have changed over to what they look like these days. I can also go into more depth on the history and architectural issues if wanted.
reapersms
Member
Posts: 48
Joined: Fri Oct 04, 2019 10:10 am

Re: GPU programming

Post by reapersms »

Why everything involves those godawful looking structures these days, and why it can't be like Simpler Times

OpenGL, DirectX 5 through 9
Things were nice and straightforward for these. You made a number of calls to set up state, and then a couple more to issue draw commands. Easy to understand. Really great before shaders showed up.

High performance, both on the GPU and CPU, revolved around eliminating as many state modifications as possible. The driver may detect redundant state calls itself, but then it may not.

Under the hood, there was rarely a 1:1 relationship between the settable state and a particular hardware register. Many states could be packed into one, some states would stretch across several, and the interactions between them all were not always valid. Additionally, the API would provide certain guarantees about dependencies and resource usage, for things like rendering to a buffer, and then reading that buffer as a texture in a subsequent draw call.

Generally, checking the consistency of the state, and producing the actual hardware register writes or command buffer packets for state changes would be deferred. That usually worked out as the driver keeping a number of flags for dirty state, and triggering sometimes heavyweight checks, intermediate state block creation, flushes, and stalls to validate them -- but they can't do the validation until they know the full state for the draw call. The easy way to deal with this was to have a check at every draw call for dirty flags.
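A minimal sketch of that dirty-flag pattern, assuming a made-up context with three boolean states packed into one hypothetical hardware register (real drivers track far more, and the packing here is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

enum { DIRTY_BLEND = 1u << 0, DIRTY_DEPTH = 1u << 1, DIRTY_STENCIL = 1u << 2 };

typedef struct {
    int blend_enable, depth_test, stencil_enable; /* API-visible state */
    uint32_t dirty;       /* which state groups changed since the last draw */
    uint32_t hw_reg;      /* hypothetical packed hardware register */
    int validations;      /* how many times we had to re-validate */
} context;

/* Setters only record the change; redundant sets are filtered out. */
static void set_state(context *c, int *field, int value, uint32_t bit) {
    if (*field == value) return;
    *field = value;
    c->dirty |= bit;
}
void set_blend(context *c, int on)   { set_state(c, &c->blend_enable, on, DIRTY_BLEND); }
void set_depth(context *c, int on)   { set_state(c, &c->depth_test, on, DIRTY_DEPTH); }
void set_stencil(context *c, int on) { set_state(c, &c->stencil_enable, on, DIRTY_STENCIL); }

/* Validation needs the full state, so it is deferred to draw time and
   runs at most once per batch of state changes. */
void draw(context *c) {
    if (c->dirty) {
        c->hw_reg = (uint32_t)(c->blend_enable != 0)
                  | (uint32_t)(c->depth_test != 0) << 1
                  | (uint32_t)(c->stencil_enable != 0) << 2;
        c->validations++;
        c->dirty = 0;
    }
    assert(c->dirty == 0);
    /* ...the draw packet itself would be emitted here... */
}
```

Note that every draw() still pays for the `if (c->dirty)` test even when nothing changed, which is the per-draw cost the later APIs try to get rid of.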

Several approaches at improving that situation were tried through the years, such as display lists, state blocks, command lists, etc. None quite solved the problems at hand, and in the DirectX case, the provided abstraction was too tied to the previous hardware.

OpenGL dealt with this a bit via vendor extensions, and a general tendency to have much lower CPU overhead on API calls (most of them were handled in user space, vs DirectX punting to kernel space for almost everything).

Shaders were a point of contention: DirectX allowed you to freely mix and match shaders between stages, while OpenGL's GLSL required you to link your vertex and pixel shaders together into monolithic program objects. This had a tendency towards some nice explosions, as instead of needing M vertex shaders and N pixel shaders, you could end up needing M*N program objects.
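To make the M*N point concrete, here is a small sketch of the kind of program cache an application ends up writing: one linked program per (vertex shader, pixel shader) pair, created on first use. The cache layout, the handle scheme, and link_program() are hypothetical stand-ins, not GL calls.

```c
#include <assert.h>

#define MAX_VS 8
#define MAX_FS 8

/* Hypothetical cache of "linked program" handles, one per (vs, fs) pair. */
typedef struct {
    int program[MAX_VS][MAX_FS]; /* 0 means "not linked yet" */
    int links_performed;
} program_cache;

/* Stand-in for compile + link; the real thing is the expensive part. */
static int link_program(program_cache *pc, int vs, int fs) {
    pc->links_performed++;
    return 1 + vs * MAX_FS + fs; /* any unique nonzero handle will do */
}

/* Returns the program for this pair, linking it only on first request. */
int get_program(program_cache *pc, int vs, int fs) {
    assert(vs >= 0 && vs < MAX_VS && fs >= 0 && fs < MAX_FS);
    if (pc->program[vs][fs] == 0)
        pc->program[vs][fs] = link_program(pc, vs, fs);
    return pc->program[vs][fs];
}
```

With 3 vertex shaders and 4 pixel shaders that are all combined at some point, the cache ends up holding 12 programs rather than 7 separate shader objects.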

The GLSL approach was closer to reality later on in the generation. DirectX would end up having to patch the shaders at the last second to connect to input buffers, and get the vertex stage properly talking to the fragment stage. GL later added tessellation stages somewhere in the 4.x range IIRC.

GLSL accepts shaders only as raw text, D3D accepts both text and bytecode compiled objects. Both are going to get recompiled by the driver under the hood into actual GPU instructions.

Shader parameters are generally modelled as a flat array of SIMD vector registers per stage. GLSL tracks them per program. Assignment of internal names to the actual slots in question can be either specified in the shader code, or queried at runtime via a reflection interface, depending on the design of the system. OpenGL gains some extensions later to manage them in larger buffer units, rather than requiring them to all be set via individual calls.

Resources (Textures mostly) are handled as flat arrays of slots per stage.

Memory management is hidden away by the API, and has all the hidden costs associated with that. Generally buffers are garbage collected, and explicit map/unmap or lock/unlock calls are needed to get a cpu accessible pointer. The driver may keep multiple copies of data around, to handle situations like a toggle between fullscreen and windowed, which tended to have a side effect of invalidating all of VRAM, forcing you to explicitly recreate or reupload your data. This started becoming an issue when things were still 32 bit, and the video cards started having a gig or two of ram on them.

DirectX 10 & 11
These were a well intentioned, but slightly misguided attempt at resolving the state issue and some of the API overhead issues.

Instead of having a hundred or so individual states set via things like

Code:

pD3DDevice->SetRenderState(D3DRS_STENCILENABLE, FALSE);
They group the states together into larger blocks. You create state objects from a DESC structure, and get back a refcounted pointer to an immutable object. The API claims to give you the same pointer if passed an identical DESC. In theory, this lets the driver do all of its validation and create intermediate representations up front, and then you set the state block as a single operation. In practice, this doesn't work out so nicely, as there are still a lot of interactions between the state blocks that need to be checked and validated before draw calls, and managing those state blocks is a real pain due to the immutability, and the unpredictability of what states you actually use.
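The "same pointer for an identical DESC" behaviour is essentially interning. A minimal sketch, assuming a made-up three-field description and a small fixed-size table (the struct names and fields are illustrative, not the D3D ones):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description struct; the fields are illustrative only. */
typedef struct {
    int depth_enable;
    int depth_write;
    int stencil_enable;
} depth_stencil_desc;

typedef struct {
    depth_stencil_desc desc; /* never modified after creation */
    int refcount;
} depth_stencil_state;

#define MAX_STATES 16
static depth_stencil_state g_states[MAX_STATES];
static size_t g_state_count;

/* Field-wise compare; memcmp could trip over padding bytes. */
static int same_desc(const depth_stencil_desc *a, const depth_stencil_desc *b) {
    return a->depth_enable == b->depth_enable &&
           a->depth_write == b->depth_write &&
           a->stencil_enable == b->stencil_enable;
}

/* Returns the existing object for an identical description, otherwise
   creates a new one. Returns NULL when the fixed-size table is full. */
depth_stencil_state *create_depth_stencil_state(const depth_stencil_desc *d) {
    assert(d != NULL);
    for (size_t i = 0; i < g_state_count; i++) {
        if (same_desc(&g_states[i].desc, d)) {
            g_states[i].refcount++;
            return &g_states[i];
        }
    }
    if (g_state_count == MAX_STATES) return NULL;
    depth_stencil_state *s = &g_states[g_state_count++];
    s->desc = *d;
    s->refcount = 1;
    return s;
}
```

Because the objects are immutable, changing a single field means building a new description and going through the lookup again, which is where the bookkeeping pain mentioned above comes from.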

The usual result was you end up having to do your own bookkeeping of everything, that the driver will still be doing itself under the hood, to avoid calling the driver unless you have to. You also end up carrying around a lot of pointers to refcounted objects, and have an excess of them due to the state blocks themselves being less granular than the hardware generally supported.

Shaders were still allowed to be separate, which meant it still had to patch them up. Tessellation was added around here as well. The API demands compiled bytecode objects, but the compiler ships as a redistributable extension library, allowing games to still generate and compile their shaders on the fly.

Shader parameters are placed in buffers, with the structure either queried by reflection or handled by each application as they so choose.

Resources are still flat arrays per stage.

Things are still handled by garbage collected objects, with explicit mapping to access. Various hints are available to try and avoid unnecessary synchronization between the GPU and CPU.

DirectX 12, Vulkan

Both of these dealt with the shader and state patching and validation problems by wrapping everything into one monolithic pipeline state object. IIRC they allow some cheap patching to adjust often changed parameters, but strongly encourage you to compile those pipeline objects offline.

The hardware moved away from flat slots per stage, to more general buffers of resource objects per shader a long time ago. DX12 and Vulkan expose that a bit more directly, and they can be embedded directly in shader parameter buffers, or indexed in the shader. This gives you a lot more flexibility with how you manage them, and can also allow you to get away with fewer heavyweight state changes between draw calls -- changing parameter buffer pointers is quick, changing texture slots less so. If you can stuff the texture resource object in the parameter buffer, as far as the API and device are concerned, there's no state change it needs to do a heavy flush for.

Those resource objects are what a Descriptor is in DX12/Vulkan (not to be confused with the endless *_STATE_DESC structs flying around DX10/11).

Where things get particularly tricky these days is that you have full control over the memory allocation of objects, and total responsibility for proper synchronization. This drastically complicates the application side for those used to the old ways, but it lets you remove excess overhead. For instance, instead of the driver checking every bound texture for a dependency related flush, because it doesn't know how things are related, the application can check only the set it knows could possibly interact. You can also make better use of memory, as you know exactly when a particular buffer may be used or reused. An example there would be aliasing several buffers over the same block of memory, as you know beforehand that their used lifetimes during a frame never interfere.
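The aliasing example can be sketched as a tiny offline planner: buffers whose lifetimes within a frame never overlap may share the same bytes. The struct, the pass numbering, and the greedy first-fit strategy below are assumptions made for illustration; a real allocator would also have to respect alignment and memory-type rules.

```c
#include <assert.h>
#include <stddef.h>

/* A transient buffer used only during passes [first_use, last_use]. */
typedef struct {
    size_t size;
    int first_use, last_use;
    size_t offset; /* filled in by plan_aliasing */
} transient_buffer;

static int lifetimes_overlap(const transient_buffer *a, const transient_buffer *b) {
    return a->first_use <= b->last_use && b->first_use <= a->last_use;
}

static int ranges_overlap(size_t a_off, size_t a_size, size_t b_off, size_t b_size) {
    return a_off < b_off + b_size && b_off < a_off + a_size;
}

/* Greedy first-fit: place each buffer at the lowest offset that does not
   collide with an already placed buffer that is alive at the same time.
   Returns the total heap size needed. Illustrative, not optimal. */
size_t plan_aliasing(transient_buffer *bufs, size_t count) {
    size_t heap_size = 0;
    assert(bufs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        size_t off = 0;
        int moved = 1;
        while (moved) {
            moved = 0;
            for (size_t j = 0; j < i; j++) {
                if (lifetimes_overlap(&bufs[i], &bufs[j]) &&
                    ranges_overlap(off, bufs[i].size, bufs[j].offset, bufs[j].size)) {
                    off = bufs[j].offset + bufs[j].size; /* bump past it, retry */
                    moved = 1;
                }
            }
        }
        bufs[i].offset = off;
        if (off + bufs[i].size > heap_size) heap_size = off + bufs[i].size;
    }
    return heap_size;
}
```

The driver cannot do this on its own because it does not know the lifetimes in advance; the application does, and with it comes the responsibility for inserting the right synchronization between the aliased uses.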

The naming and structure are a bit arcane and overspecified, but that is what happens with most things modern. The move towards structures mirrors DirectX, and likely stems from calling convention concerns, and making it easier to reuse common things as needed.

For that particular monstrosity...
  • device: context/this pointer
  • pCreateInfo: structure with all of the stuff it needs to create the actual object
  • pAllocator: allocation callbacks for memory management, as that's under the application developer's control
  • pDescriptorUpdateTemplate: output parameter
There is a general push to eliminate persistent API state as much as possible, which may be why the allocators are passed in every time.
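One way to see what the struct-heavy style buys: a create-info struct can be filled once and reused with small changes, and a tagged chain lets later extensions ride along without changing the function signature. Vulkan's sType/pNext members follow this general shape; the types, tags, and create_buffer() below are hypothetical and much simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure tags, loosely modelled on the sType/pNext idea. */
typedef enum { TYPE_BUFFER_CREATE_INFO = 1, TYPE_DEBUG_NAME_INFO = 2 } structure_type;

/* Common header placed first in every chainable struct. */
typedef struct base_header {
    structure_type type;
    const struct base_header *next;
} base_header;

typedef struct {
    base_header hdr;
    size_t size;
    unsigned usage;
} buffer_create_info;

/* An "extension" struct the core fields know nothing about. */
typedef struct {
    base_header hdr;
    const char *name;
} debug_name_info;

typedef struct { size_t size; unsigned usage; const char *name; } buffer;

/* Returns 0 on success, -1 on a malformed create-info. */
int create_buffer(const buffer_create_info *info, buffer *out) {
    if (!info || !out || info->hdr.type != TYPE_BUFFER_CREATE_INFO || info->size == 0)
        return -1;
    out->size = info->size;
    out->usage = info->usage;
    out->name = NULL;
    /* Walk the chain; struct types we do not recognise are skipped. */
    for (const base_header *h = info->hdr.next; h != NULL; h = h->next) {
        assert(h->type != 0); /* every chained struct must be tagged */
        if (h->type == TYPE_DEBUG_NAME_INFO)
            out->name = ((const debug_name_info *)h)->name;
    }
    return 0;
}
```

The cost is the one raised earlier in the thread: the compiler can no longer tell you that a required field was left out, so that checking moves to run time (or to a validation layer).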

As for what that object is actually for, resources are dealt with in arrays called Descriptor Sets. A particular combination of shaders will expect things to be in particular spots, such as a set of vertex buffers, parameter buffers, a number of textures, and samplers. Those may be dealt with by name, maybe not. The application will want to change those around often, and will usually be a configuration they've used before (such as drawing a player, then a weapon, then the background, every frame). The DescriptorUpdateTemplate encapsulates the mapping from the form the application provides, to the particular locations in a Descriptor Set, allowing the application to update the current descriptor set with the resources it wants, in one call, pulling directly from the form it likes most.

In short, a really fancy name for a scatter/gather array.
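The scatter/gather reading can be sketched in a few lines. The real template entries carry more fields (array element, count, descriptor type, stride) and real descriptors are opaque; the descriptor, template, and material types here are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an opaque descriptor. */
typedef struct { int resource_id; } descriptor;

typedef struct {
    size_t src_offset;  /* where the resource sits in the application's struct */
    size_t dst_binding; /* which slot of the descriptor set it goes to */
} template_entry;

typedef struct {
    const template_entry *entries;
    size_t entry_count;
} update_template;

/* The application's preferred layout, in whatever order it likes. */
typedef struct {
    descriptor albedo_texture;
    float unrelated_data;
    descriptor per_object_buffer;
    descriptor sampler;
} material;

/* One call: walk the template, gather from app memory, scatter into the set. */
void update_set_with_template(descriptor *set, size_t set_size,
                              const update_template *t, const void *app_data) {
    for (size_t i = 0; i < t->entry_count; i++) {
        const template_entry *e = &t->entries[i];
        assert(e->dst_binding < set_size);
        memcpy(&set[e->dst_binding],
               (const char *)app_data + e->src_offset, sizeof(descriptor));
    }
}

/* Built once, then reused for every material drawn with this layout. */
static const template_entry material_entries[] = {
    { offsetof(material, per_object_buffer), 0 },
    { offsetof(material, albedo_texture),    1 },
    { offsetof(material, sampler),           2 },
};
static const update_template material_template = { material_entries, 3 };
```

Drawing the player, then the weapon, then the background becomes three calls to update_set_with_template() with three different material pointers and the same template.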
zaval
Member
Posts: 656
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut

Re: GPU programming

Post by zaval »

bzt wrote: (the opening post, quoted in full, including its spelling "schemantics")
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to lost of all the target park boards (russians destroyed our town).
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: GPU programming

Post by bzt »

Hi,

@zaval: that's all you can add to the topic? Yes, I misspelled, I meant semantics. I'm not a native speaker you know. So what?
reapersms wrote:Introduction: I mostly lurk here and on gamedev.net. My day job is graphics and systems programming for game consoles, with a heavy focus on performance, and has been for ~18 years at this point.
Welcome, and thank you for your post! I really liked your historical comments, thanks!

TL;DR: thanks for the historical insight, and for confirming that I'm not imagining that the 3D API is not as good as it could be.
reapersms wrote:Vulkan is a bit messy to learn. DX12 is worse. This is probably going to take a few posts.
Yes, that was my point :-) It shouldn't have been messy, there are plenty of APIs which are not and could have served as an example for the Khronos board.
reapersms wrote:First, sorting out some misconceptions and elaborating on some history:
It seems that you misunderstood a few things, let me explain, if I may.

OpenGL
reapersms wrote:OpenGL was not a "video card" API, it was a 3D rendering API
True, but that 3D rendering was supposed to be executed on the video card, not on the CPU. (Yes, there were CPU-only GL implementations too, but that was merely a workaround, not a design goal. CPUs were never designed for efficient parallel matrix operations for example, they are and always were, general purpose.)
reapersms wrote:integration with a system GUI was left to external bits.
...
API were influenced a fair bit by the demands of X11
Since X11 is a GUI, these two sentences contradict each other. Also, I don't remember GL ever being dependent on Xlib (only the other way around, X11 incorporated GL). Even if the X11 protocol had tokens for GL commands, I don't think that had any influence on how the functions were named and what arguments they needed, or what properties the GL state machine had.
reapersms wrote:The standard was not static, just a bit slow moving
What I meant is that it did not have something like EFI's LocateProtocol. The GL standard (and therefore the API) lacked something like glGetAvailableExtensions() and glDynamicallyLinkExtension() functions. You had to install a new version of the library to get more features. You couldn't get those at run-time without re-compilation; static in this sense.
reapersms wrote:review board (the ARB) populated by representatives of various hardware, OS, and software vendors. A number of nice features and overhauls spent a long time in limbo, due to vetos from certain OS vendors with competing APIs, hardware vendors that didn't have a good plan for accelerating certain things, and software vendors that did not want to break any of their legacy code with newfangled things for those damned video gamers.
Ergo they were a bunch of people incompetent to properly design an API. You said it too :-)
reapersms wrote:The abstraction wasn't terrible at the time
Not at the time, no, but it wasn't future proof, and manufacturers couldn't extend it with their latest features for sure. You had to wait until the board voted the feature in (which, as you said, they did not do on several occasions). For example the X11 protocol is designed in a way that it can be extended without losing compatibility or requiring a new version of Xlib.so to be installed. Therefore it can cover the features of new hardware without changing the interface. That I'd say is a good, non-static abstraction.
reapersms wrote:The GLSL situation was rather not great. It was designed a bit from on high to push towards a higher level abstraction, and get away from the vendor specific assembly interfaces. The goal was laudable, to let applications write one shader program
For that it's enough to standardize a bytecode. The hardware manufacturers can then convert the standard bytecode into their vendor specific instructions. Just like SPIR-V in Vulkan. No need for a complete compiler. I'd like to point out that IBM had experience with JIT bytecode compilation for over half a century (!) and Java already had a long history when OpenGL 3 came out... Again, the examples were there, but Khronos members simply did not pay attention.
reapersms wrote:While that was going on, the hardware was changing drastically under the hood, which is where the abstraction started diverging from reality. OpenGL handled it a tad better than D3D did.
Yes. However the basic concepts remained, a vertex contains 3 coordinates just as 30 years ago. (I know VBOs were introduced as a big thing, huge innovation, revolution whatsoever, but in reality programs stored vertices in CPU memory in struct arrays anyway. The only difference is, a VBO is allocated in the GPU's memory.)
reapersms wrote:The OpenGL 3.0 Core vs Compatibility context adventure was an attempt to cut ties with parts of history that made writing an OpenGL implementation more difficult.
Again, you are perfectly right, but let's not forget M$ was on Khronos, and if anybody, they know best how to keep compatibility with ancient APIs while introducing innovation at the same time. The required knowledge was there, it should not have been such a big deal to cut ties with history, imho.
reapersms wrote:Through this period, the PC side was generally a war between 2-3 driver vendors. One provided a driver where everything generally Just Worked, whether the spec said the should or not. One provided one that was very particular about the spec as written, but that usually resulted in things not working as reliably.
...
came up with a new API
...because the spec was poorly designed and the API was unexpandable. Couldn't agree more, that was my original point :-)

Vulkan
reapersms wrote:That API was turned, almost directly, into Vulkan. Its primary purpose was performance, and providing a low overhead interface to reality.
That's the thing. It does not have better performance.

An application using OpenGL shaders and another one using Vulkan have exactly the same (logical) steps to make before their triangle gets displayed on the screen. What really happened here is that all the so-called "overhead" has now moved from driver space into application space. Which means IF the application developer knows what he's doing, and is a better programmer than the driver developers, then he can write software which performs better. On the other hand, all the other application programmers (who are less talented/experienced than the driver developers) are going to write much, much worse, buggier and slower software.
And they are the majority.
There's no way they would know, what you meant by "stuff" when you said "pCreateInfo: structure with all of the stuff it needs to create the actual object", and they are definitely the ones who are too lazy to read the documentation. They will just copy'n'paste a struct from some forums without knowing what it actually means.
reapersms wrote:The window system integration got moved into an official part of the API, with the loader layers and such, to allow applications to try a better job at not caring whether they were on X or Win32.
But window system integration had exactly the opposite effect. It should have been the windowing system that integrates Vulkan, and not the other way around. This is called poorly designed abstraction.
reapersms wrote:The process for doing that on older OpenGL was even more terrible, as the window system portions of the interface tended to not get updated, ever.
I strongly disagree. I'm talking from experience: on Linux, the DRI was good; you asked for a buffer from the driver, and passed that same buffer to the windowing system as your window's buffer (I'm not saying that the API through which DRI is implemented is good, it certainly had its issues, which were fixed independently of GL/Vulkan in DRI3; just that the basic concept was good, and it was and is working). I actually never had problems with a GL window not getting updated, well, ever. Although this is not a big deal; it's not as if multiplatform programs weren't full of "#ifdef __OS__" preprocessor directives already. It's just one more.
reapersms wrote:API usage validation has gotten more complicated than is reasonable to deal with via glGetError and friends.
Exactly. Also, API usage validation cannot substitute for glGetError, as the latter can report run-time errors too, while a validator obviously can't.
(Imagine that there's everything fine with the API calls, the program works on 9 machines without problems, but on the 10th machine the video card is faulty or the driver is buggy. No way a validation layer can handle this - unfortunately not far fetched - scenario.) => bad design
reapersms wrote:the PC environment is a good deal messier than the ideal
That's true, however I have never heard anybody complain that using, for example, the POSIX API on a PC is different from using it on IRIX. Because that's the purpose of a standardized API: to hide the hardware and lower level mess.
reapersms wrote:Next post will go through why the interfaces have changed over to what they look like these days. I can also go into more depth on the history and architectural issues if wanted.
I really don't want to break down your enthusiasm, and I really did enjoy your historical comments. But the truth is, most people don't really care why a particular interface is a big pile of sh*t, what they care about is that IT IS a big pile of sh*t. No offense of any kind meant.

Speaking only for myself, I'm interested in history, so please. I only hope others are interested too. I particularly liked what you wrote about Vulkan and memory allocation, it was interesting.

Cheers,
bzt
Last edited by bzt on Fri Oct 04, 2019 6:22 pm, edited 1 time in total.
User avatar
zaval
Member
Member
Posts: 656
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut
Contact:

Re: GPU programming

Post by zaval »

@zaval: that's all you can add to the topic? Yes, I misspelled, I meant semantics. I'm not a native speaker you know.
but it was fun nevertheless.
as for adding something to this thread. you know, it's not the first time you start moaning about something you dislike, using strong words and at the same time either not paying much attention to the subject or, as in this example, not having the needed level of expertise to take such a stand. what else can one add to this? of course, thanks to reapersms, the thread has become something interesting, but it's totally his achievement. you just failed to get that complex thing and, instead of continuing to study it, decided to relax by bashing it. it's solely your joy. If you don't get it: I think your criticism is of zero value, you just don't know Vulkan well enough to scream as you do. I don't know it either, but I don't moan either. Mostly, I believe in standards, - I believe they are produced by experienced people, way more experienced than you or me, so I tend to have a good attitude towards standards (except the web related ones, which I hate :D). and graphics for me is chinese grammar, I didn't even try to hide that. :mrgreen:

Yes, first I wanted to add that you forgot to mention "M$" (as the root of all troubles), which is impossible to not mention for you when it's time for your next portion of rant; but then I decided to not do so, and, as we see, you fixed that in the next post. -_-

Btw, the wikipikia article on vulkan is almost unbearable to read due to the enormous amount of impudent, sugary blurb - almost every line mentions "extreme low overhead", "highly efficient", "ultra superior", etc. they piss boiling water in their excitement over how efficient this Vulkan is. we all know why they try so hard to not look like an encyclopaedia here - because of DX12, but you, how dared you not like what "is gonna smash out" DX12 from "M$"? :lol:

PS. and yet, this "automatic courtesy":
Cheers
I hate this! :D

Re: GPU programming

Post by bzt »

zaval wrote:Yes, first I wanted to add that you forgot to mention "M$" (as the root of all troubles), which is impossible to not mention for you
What makes you think that you know what is impossible for me to not mention? You are just imagining that I think M$ is the root of all trouble; I've never said that, and I don't think that. I shorten the name because it's easier to write, everybody recognizes it, and I use the dollar sign because Microsoft is indeed one of the biggest corporations, making huge amounts of money. If there were an "S" in Google or in Facebook, I would use "$" there too :-)
zaval wrote:you start moaning about something you dislike
This is something that you seem unable to comprehend. It is never about what I like or dislike. It's about what's true, and what's good for the industry and the developers in it, and for the end users of course, because let's not forget, software is written for them at the end of the day.
zaval wrote:PS. and yet, this "automatic courtesy":
Cheers
I hate this! :D
When I lived in the UK, everybody was using this, so it stuck with me. Get over it!

Cheers,
bzt
User avatar
iansjack
Member
Member
Posts: 4688
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: GPU programming

Post by iansjack »

bzt wrote:You are just imagining that I think M$ is the root of all trouble; I've never said that, and I don't think that. I shorten the name because it's easier to write, everybody recognizes it, and I use the dollar sign because Microsoft is indeed one of the biggest corporations, making huge amounts of money.
That's just childish, I'm afraid. You can't be so naive as to not recognize how pejorative that is. It's reminiscent of the Apple/PC flame wars. If you've got time to add a meaningless "cheers" to every post (which just seems to be copying what Brendan used to do) then you surely have time to spell out company names. I'm afraid that people who use the silly "M$" abbreviation look just that - silly.
reapersms
Member
Member
Posts: 48
Joined: Fri Oct 04, 2019 10:10 am

Re: GPU programming

Post by reapersms »

bzt wrote: OpenGL
reapersms wrote:OpenGL was not a "video card" API, it was a 3D rendering API
True, but that 3D rendering was supposed to be executed on the video card, not on the CPU. (Yes, there were CPU-only GL implementations too, but that was merely a workaround, not a design goal. CPUs were never designed for efficient parallel matrix operations for example, they are and always were, general purpose.)
After some google digging, it appears the IRIS hardware did provide geometry operations, though there are hints that they weren't always there (explicit callouts in documentation that later era ones implemented the 'full' IRIS GL pipeline in hardware)

OpenGL was a cleaned up and generalized IRIS GL, and keep in mind that transition happened in '92.

The matrix library bits were commonly partially hardware, partially software. One particular family of consoles had essentially the OpenGL 1.2 pipeline implemented in hardware, as far as having equivalents of the GL_PROJECTION, GL_NORMAL, and GL_MODELVIEW matrices, but the matrix stack operations on them were all done in software.
bzt wrote:
reapersms wrote:integration with a system GUI was left to external bits.
...
API were influenced a fair bit by the demands of X11
Since X11 is a GUI, these two sentences contradict each other. Also, I don't remember GL ever being dependent on Xlib (only the other way around, X11 incorporated GL). Even if the X11 protocol had tokens for GL commands, I don't think that had any influence on how the functions were named and what arguments they needed, or what properties the GL state machine had.
After some thought, it would be more correct to say that OpenGL itself did not really care how you got the context, it just defined what you could do with the context. The particulars of how you acquire an OpenGL context are up to the external bits, in this case GLX, and (later) AGL and WGL. GLX being the X11 extension to provide an OpenGL context, it absolutely defined X11 tokens for passing GL state around.
bzt wrote:
reapersms wrote:The standard was not static, just a bit slow moving
What I meant is that it did not have something like EFI's LocateProtocol. The GL standard (and therefore the API) lacked something like glGetAvailableExtensions() and glDynamicallyLinkExtension() functions. You had to install a new version of the library to get more features. You couldn't get those at run-time without re-compilation; static in this sense.
It absolutely did. glGetString(GL_EXTENSIONS) existed from 1.0. Acquiring the function pointers for any added entry points was provided by GLX, WGL, or AGL. You did have to know which extensions you wanted to support when you wrote your program, but that is not a particularly onerous or unusual requirement. I'm not sure what your complaint about having to install a new version of the library to get more features is about; if you're dynamically linking anyway, you have to do that for anything else. The app links against libGL.so or opengl32.dll, and an update of X or the graphics driver substitutes its own library in, or the rough equivalent (Windows ICD vs MCD stuff).

Generally that sort of thing would be largely outside the scope of the OpenGL part of the specification anyways, as the process and mechanism of dynamic linking is inherently system specific.
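
To make the extension query a bit more concrete, here is a minimal sketch in C of checking the space-separated list that glGetString(GL_EXTENSIONS) returns. The helper name has_extension is made up for illustration, and the list is passed in as a plain string so the sketch needs no GL context; a real program would hand it the string it got back from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of OpenGL): returns true if `name` appears
 * as a whole space-separated token in `ext_list`.
 * A bare strstr() is not enough, because one extension name can be a
 * prefix of another, e.g. "GL_EXT_texture" vs "GL_EXT_texture3D". */
bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL)
        return false;               /* an empty or multi-word query never matches */

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len;                   /* skip this partial match and keep looking */
    }
    return false;
}
```

An application would call this once per extension it knows how to use, and only then fetch the matching entry points through the GLX, WGL or AGL lookup function.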
bzt wrote:
reapersms wrote:(ARB realities)
Ergo they were a bunch of people incompetent to properly design an API. You said it too :-)
They were a bunch of people with occasionally competing interests, that managed to come up with an API that still largely works nearly 30 years later. I'd call that human, not incompetent =P
bzt wrote:
reapersms wrote:The abstraction wasn't terrible at the time
Not at the time, no, but it wasn't future proof, and manufacturers couldn't extend it with their latest features for sure. You had to wait until the board voted the feature in (which, as you said, they did not do on several occasions). For example the X11 protocol is designed in a way that it can be extended without losing compatibility or requiring a new version of Xlib.so to be installed. Therefore it can cover the features of new hardware without changing the interface. That I'd say is a good, non-static abstraction.
About the only vaguely correct notion in here is that X extensions went about things with additional DLLs for some of them, like XRender. The fact that OpenGL on X doesn't usually work like that is more an implementation detail.

Vendors could expose their shiny new features via extensions from 1.0 on, with no input from the ARB at all. They still do to this day. This isn't great for the application developer, as you end up with a bunch of conditional code blocks depending on whether NV_sliced_bread or AMD_chrome_plating existed in the extension string or not (yes, returning them all in one string was a tad shortsighted, but that got fixed over a decade ago, I think).

If some of those extensions are similar to each other, or seem to be good ideas, a vendor neutral EXT version would get sorted out, usually by a cross vendor group of 2-3 people. Where the ARB comes in is in deciding whether an EXT extension is good enough (and politically feasible enough) to be a core feature, and if it is, it gets renamed ARB_whatever for querying in version 1.now, and is provided as a first class core function in version 1.(now+1). It is still accessible via the extension and glXGetProcAddress, so applications that linked against 1.0 still work if the system updates to a 1.1 lib.
bzt wrote:
reapersms wrote:The GLSL situation was rather not great. It was designed a bit from on high to push towards a higher level abstraction, and get away from the vendor specific assembly interfaces. The goal was laudable, to let applications write one shader program
For that it's enough to standardize a bytecode. The hardware manufacturers can then convert the standard bytecode into their vendor specific instructions. Just like SPIR-V in Vulkan. No need for a complete compiler. I'd like to point out that IBM had experience with JIT bytecode compilation for over half a century (!) and Java already had a long history when OpenGL 3 came out... Again, the examples were there, but Khronos members simply did not pay attention.
Yes it would have been nice if they'd gone with a common bytecode or IL at the time. The extensions that the ARB shader system grew from were assembly bytecode level, but they decided to put the language itself straight in.

Given what the hardware at the time did, I am a bit glad SPIR-V took until now. The bytecode at the time would have almost certainly been based around the concept of 4-wide SIMD, as vectorizing scalar code was not particularly reliable in any compiler I recall from the period. HLSL did go with a 4-wide bytecode, and many of the optimizations done to fit that make things more difficult these days -- either by confusing the output for shader debuggers, or being flat-out suboptimal now.
bzt wrote:
reapersms wrote:While that was going on, the hardware was changing drastically under the hood, which is where the abstraction started diverging from reality. OpenGL handled it a tad better than D3D did.
Yes. However the basic concepts remained, a vertex contains 3 coordinates just as 30 years ago. (I know VBOs were introduced as a big thing, huge innovation, revolution whatsoever, but in reality programs stored vertices in CPU memory in struct arrays anyway. The only difference is, a VBO is allocated in the GPU's memory.)
Everything related to vertex buffers has been a matter of eliminating redundant copies and memory traffic. 1.0 didn't have vertex arrays in core, you pushed everything through repeated calls to glVertex*(), glColor*(), glNormal*(), etc. The EXT_vertex_array extension let you give it a base pointer, stride, and format specification. You would then either make a call to glArrayElement() per vertex, or use one of the bulk calls like glDrawArrays() or glDrawElements(). It very likely was still looping over the array moving things around itself under the hood, but at least it cut out the function call and parameter traffic. In theory it could have done some things to optimize the path a bit, but practically it tended to not have enough information, and I think DrawArrays was optional. It was promoted to standard in 1.1.
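
As an illustration of the "base pointer, stride, and format" idea, a minimal sketch in plain C follows. It makes no GL calls; struct vertex, attrib_at and sum_component are hypothetical names chosen for the example, and the layout is just one plausible interleaved format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex layout; not taken from any GL header. */
struct vertex {
    float pos[3];
    float color[4];
};

/* Locate one attribute of the index-th vertex, given only a base pointer
 * to that attribute in vertex 0 and the byte distance between vertices.
 * This is the address arithmetic a vertex array description enables. */
const float *attrib_at(const void *base, size_t stride, size_t index)
{
    assert(base != NULL && stride >= sizeof(float));
    return (const float *)((const unsigned char *)base + index * stride);
}

/* Walk the whole array once, the way a bulk draw call can, instead of
 * the application making one function call per vertex. */
float sum_component(const void *base, size_t stride, size_t count, size_t comp)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; i++)
        total += attrib_at(base, stride, i)[comp];
    return total;
}
```

With an array of struct vertex, the position stream would be described by a pointer to the first vertex's pos field plus a stride of sizeof(struct vertex), and the colour stream by a pointer to its color field with the same stride.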

Buffer objects came (much) later. GPU memory buffers were certainly a big part, but another part was the interface change to explicit map and unmap calls, letting the driver know when you could have modified the memory (and thus it would need to reupload, or reprocess if the format wasn't natively supported)
bzt wrote: Again, you are perfectly right, but let's not forget M$ was on Khronos, and if anybody, they know best how to keep compatibility with ancient APIs while introducing innovation at the same time. The required knowledge was there, it should not have been such a big deal to cut ties with history, imho.
... API was unexpandable. ...
As I mentioned above, the API was expandable. What it was not particularly great at was removing features. The Compatibility profile was the way to support old software, and the Core profile was to move forward. Also, MS was pretty terrible about that. Capability bits were a horrible thing in practice, and they made little or no effort at forward or backward compatibility later.

Core was a bit overzealous about removing things, and the Compatibility profile was still used by a fair amount of new software.

The parts the hardware/driver vendors really wanted to eliminate were omitted from the OpenGL ES spec, and the explosion of mobile after that has helped a good deal at weaning people off of immediate mode, if only by encouraging utility libraries to provide it layered on top of arrays.
bzt wrote: Vulkan
reapersms wrote:That API was turned, almost directly, into Vulkan. Its primary purpose was performance, and providing a low overhead interface to reality.
That's the thing. It does not have better performance.

An application using OpenGL shaders and another one using Vulkan have exactly the same (logical) steps to make before their triangle gets displayed on the screen. What really happened here is that all the so-called "overhead" has now moved from driver space into application space. Which means IF the application developer knows what he's doing, and is a better programmer than the driver developers, then he can write software which performs better. On the other hand, all the other application programmers (who are less talented/experienced than the driver developers) are going to write much, much worse, buggier and slower software.
And they are the majority.

There's no way they would know, what you meant by "stuff" when you said "pCreateInfo: structure with all of the stuff it needs to create the actual object", and they are definitely the ones who are too lazy to read the documentation. They will just copy'n'paste a struct from some forums without knowing what it actually means.
And Vulkan is not aimed at them. It is not meant to be the be-all, end-all API for providing 3D rendering and GPU compute. OpenGL is still there, and still getting updated. It may be the case that the OpenGL implementation happens to be a straight translation layer to Vulkan, but the application developer does not care about that detail.

Inexperienced developers are going to write shitty code, built off of terrible tutorials, and there's nothing an API can really do to avoid that. Explicitly designing an API around the idea that you can beat that is a dead end.

It is not a matter of the application developer being better than the driver developers, it is a matter of which one has more accurate information about the workload at hand. The driver developer has to support all possible workloads, and track state accordingly. They may even be kind and allow programs that don't match the spec to run "correctly", but that approach does not scale terribly well.

A concrete example would be something like the number of simultaneous texture references. One API supports 128 texture slots per shader stage. While the vast majority of applications will never do something that absurd, the driver needs to support it somehow. Maybe it gets fancy, and starts off only tracking a more reasonable number like 32, and reallocating structures under the hood or changing to a different code path when it detects the application really using that many. Maybe it just has a single code path and throws memory at the issue. Vulkan lets the application decide exactly how many slots it cares to track, and if it really does need a lot, it can. If it only needs a lot for a few specific cases, but not the other 95% of the shaders and draw calls, it can do that as well, and it doesn't have to guess.

More than that, it is a fundamentally different architecture. Multithreading OpenGL is a complicated mess, as contexts are explicitly single threaded. Sharing resources between contexts is complicated at best, nightmarish at worst.

It has been possible to come quite close to the efficiency available via Vulkan, through careful use of some OpenGL extensions, and rearranging the way your data is structured to make the driver happy. It tends to be a lot easier said than done, but a brief on the details is here
bzt wrote: But window system integration had exactly the opposite effect. It should have been the windowing system that integrates Vulkan, and not the other way around. This is called poorly designed abstraction.
In retrospect, it's more accurate to say that the window system integration is not so much in the API itself; rather, Khronos provides an additional helper library to present a more uniform approach to acquiring a context across platforms. OpenGL ES did the same via the EGL API.
The value there is getting some consistency, vs the application developer picking one of SDL, Allegro, GLFW, glew, SFML, glut, or any of another dozen multimedia framework libraries
bzt wrote: I actually never had problems with a GL window not getting updated, well, ever. Although this is not a big deal; it's not as if multiplatform programs weren't full of "#ifdef __OS__" preprocessor directives already. It's just one more.
I didn't mean the window itself didn't get updated, I meant the OS developer just decided to quit updating their implementations. MS did that for a long, long time, leaving the system standard GL stuck at 1.1. The driver could provide better, but the process to get that went something like
  • Create a window on the screen you want to display to [Win32]
  • Select a pixelformat, choosing from the options available as of 1.1, creating the WGL context (you're still talking to the stock WGL) [WGL]
  • Create a GL context (this gets you talking to the actual driver WGL) [WGL]
  • Query the context for extensions and version [GL]
  • Get pointers to the WGL extension functions, and use them to find out whether it supports a better pixelformat than you put up with [WGL]
  • Destroy everything, and create a new window [GL, WGL, Win32]
  • Create your real window [Win32]
  • Using the extracted WGL extension functions, select the real pixelformat you want to use
  • Create a GL context
  • GetProcAddress everything, including everything you had queried before, because the function pointers returned are context specific.
That situation improved a bit around Windows 7 I think.
Apple has been just as poorly behaved, and ceased updating their system GL at 4.1 in favor of forcing everyone to use Metal.
bzt wrote:
reapersms wrote:API usage validation has gotten more complicated than is reasonable to deal with via glGetError and friends.
Exactly. Also, API usage validation cannot substitute for glGetError, as the latter can report run-time errors too, while a validator obviously can't.
(Imagine that there's everything fine with the API calls, the program works on 9 machines without problems, but on the 10th machine the video card is faulty or the driver is buggy. No way a validation layer can handle this - unfortunately not far fetched - scenario.) => bad design
I believe you misunderstand the validation layer. It is not (just) a compile time validation, it is extensive runtime checking of all arguments and state, at or beyond the ability of glGetError. When you use it, all of your vkWhatever calls go to it, and it then forwards things on to the actual driver after checking. Error reporting tends to be via callbacks, logging, or debug prints depending on choice. D3D offers similar functionality via the debug and validated device creation flags. The checks these layers do can be quite thorough, almost annoyingly so at times (yes yes, I know the value of that matrix element is really small, and will probably be zero. Don't tell me about every single one of them please).

It is also closely related to that most useful of GPU debugging tools, the API frame capture.
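
The "calls go to the layer, which checks and then forwards to the driver" arrangement can be sketched in a few lines of C. Everything below is hypothetical: the names only mimic the shape of a layered API and none of them are real Vulkan entry points, and the stub driver exists purely so the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signatures, loosely shaped like a create call and a
 * debug-report callback. */
typedef int  (*create_buffer_fn)(uint64_t size, uint32_t *out_handle);
typedef void (*report_fn)(const char *message, void *user);

struct layer {
    create_buffer_fn next;    /* the "driver" entry point being wrapped */
    report_fn        report;  /* application-supplied callback, may be NULL */
    void            *user;    /* passed back to the callback untouched */
};

/* Runtime check performed on every call, before the driver sees it. */
int layer_create_buffer(struct layer *l, uint64_t size, uint32_t *out_handle)
{
    assert(l != NULL && l->next != NULL);
    if (size == 0 || out_handle == NULL) {
        if (l->report != NULL)
            l->report("create_buffer: size must be non-zero and out_handle non-NULL",
                      l->user);
        return -1;            /* this sketch rejects the call outright */
    }
    return l->next(size, out_handle);
}

/* Stand-in driver: hands out increasing handles and never validates. */
int fake_driver_create_buffer(uint64_t size, uint32_t *out_handle)
{
    static uint32_t next_handle = 1;
    (void)size;
    *out_handle = next_handle++;
    return 0;
}

/* Example callback that just counts how many problems were reported. */
void count_reports(const char *message, void *user)
{
    (void)message;
    ++*(int *)user;
}
```

Rejecting the bad call is only the simplest choice for a sketch; a real layer may well report the problem and still pass the call down. Because the application only ever holds the layer's entry point, leaving the layer out in release builds removes the checking cost without touching application code.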
bzt wrote:
reapersms wrote:the PC environment is a good deal messier than the ideal
That's true, however I have never heard anybody complain that using, for example, the POSIX API on a PC is different from using it on IRIX. Because that's the purpose of a standardized API: to hide the hardware and lower level mess.
My meaning here was that while there are a number of APIs that are as fast or faster than Vulkan, and easier to work with, they achieve that by being specific to particular environments, i.e. consoles.

On PC D3D 9..11 you need to ask the API to kindly create and manage all of your objects. You get back opaque pointers you get to pass around to things, and if you want any information about them you need to make more API calls or track it yourself in an object you wrap around that pointer. You have to pretend the various memory blocks under the hood don't really exist and you just have the object.

On PC Vulkan and D3D 12, you get a bit more control. You ask it nicely for large memory blocks, and can parcel those blocks out as you like. You may still have to go through API calls to convert offsets into real pointers, and probably still have to go through some API calls to query properties of things. There's still a lot of OS wrappings around things to deal with PCIe and such. Your GPU pointers probably aren't exactly the same as your CPU ones, and may move underneath you, hence the API calls.

Consoles are like that, but usually you have a flat unified memory model, so you can ditch most of the address translation bits. Resources are exposed as small CPU-visible headers referring to a GPU address, and you can modify them as you like. All of the mechanisms used to provide the shader abstraction are laid bare, and your offline shader compiler spits out actual GPU code instead of an IR that gets turned into real GPU code later.
bzt wrote: I really don't want to break down your enthusiasm, and I really did enjoy your historical comments. But the truth is, most people don't really care why a particular interface is a big pile of sh*t, what they care about is that IT IS a big pile of sh*t. No offense of any kind meant.
Recognizing what's bad about things can be useful, and certainly cathartic. There are many times where a coworker will ask 'But reapersms, why did they build something this way?' or 'Why did anyone ever build something this terrible and not just <insert simple solution>?'.

Many of those will indeed have an answer along the lines of 'Those guys were &*%@#(heads that hate us and just want to make development difficult', but that is usually for things where it is quite obviously something being driven by marketing or some non-technical business initiative. Other times it is clearly some research project that should have stayed in the oven longer.

Many many more of those, however, will be things like:
  • The hardware accepted these values directly at the time, and they needed no conversion/rearranging/etc
  • Throwing memory at the problem was faster at the time, as the CPU clock was only 2x the memory clock
  • It was built for a system that only did single textured alpha blended polygons, but could do them at 4X the max pixel rate of our current platforms
  • It was built for a system that had unusually fast memory and a middling CPU, so throwing more memory at it helped
  • It was built for a system that had 2/3 the memory of its peers, so Great Efforts were put towards reducing memory footprint above all else
  • Twisting things this way results in memory access patterns that are about 20x better than the straightforward one
  • Yes this is terrible, but it doesn't become terrible until you throw two orders of magnitude more objects at it than anyone could conceive of at the time
  • Someone created a flexible, general purpose system to make creators' lives easier, but left too much flexibility in, so the rendering can't take advantage of things
  • Someone created a highly specific system to blast through at the speed of light, but if you need more than N things it becomes a nightmare
  • This feature was designed to solve this one particular problem in an elegant fashion, which it did. Unfortunately the special cases it added mean that it's 5-20x faster to just brute force it in this other way.
If one starts from a position of "This API is terrible because everyone who designed it was incompetent" then it is quite easy to start designing your Newer, Better, Faster API... and then find out the hard way why certain things were done that way.

It is usually better to give them the benefit of the doubt, but still investigate. Worst case, you find out yes, that bit was built by someone who shouldn't have been anywhere near it. Often, however, you find that they were dodging some nasty, but quite well hidden land mines. Alternatively, you find that it ends up being a decision between different tradeoffs.

Also, when dealing with existing standards, it can be possible to explore how to improve things while still running on the existing ones, like the AZDO presentation above. Those have the advantage of working on other people's machines, without having to wait for your shiny new standard to get market share.
bzt wrote: Speaking only for myself, I'm interested in history, so please. I only hope others are interested too. I particularly liked what you wrote about Vulkan and memory allocation, it was interesting.
A problem with history is that there's a lot of it.
There's plenty I can provide more details on, though some of it is more off topic than the rest. Some options, if anyone has a preference:
  • Buffer management, command lists, and synchronization woes
  • Window system concerns
  • What the hardware register interfaces looked like across different eras
  • Early PC accelerators and the OpenGL/D3D API fights, roughly '96-'02
  • The fixed function -> shader transition, '02-'06
  • Attempts at moving beyond simple vertex and fragment programs, and why most of them end up questionably esoteric at best
  • Compute and GPGPU
I'll try to break some things up in the future so they aren't all multi-page walls of text; quote-response is a bit messy about that.

Re: GPU programming

Post by bzt »

Hi,

@reapersms: thank you for your constructive and very informative post! I don't have time to read it through now, but I will. Good to see there's someone to have a decent conversation with! I like that you are focused on the topic, and that you are using examples and references!
iansjack wrote:That's just childish, I'm afraid. You can't be so naive as to not recognize how pejorative that is.
I know zaval is childish. I've used sarcasm. And yes I know it has a pejorative tone to it, and that's exactly what I wanted. I would have used $ for G and FB if I could, because I have the same feelings towards them. It refers to their greedy company nature, not to their technical skills or knowledge. As I've already said.
iansjack wrote:If you've got time to add a meaningless "cheers" to every post (which just seems to be copying what Brendan used to do)
Is that what you think? I couldn't really care less about what Brendan did or did not use.

Cheers,
bzt
iansjack
Member
Posts: 4688
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: GPU programming

Post by iansjack »

Well, I guess if you want to appear to adults here as a silly kid that's your business.
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: GPU programming

Post by klange »

I've just about had enough of this "conversation". I'm locking this thread.