Redesigning video system

Ameise · Post by **Ameise** » Thu Mar 15, 2012 11:55 am

berkus wrote:
Ameise wrote: Embedded would be easier (far easier) than desktops; most use either a PowerVR chip
Yep, with no available documentation. FAR easier.

No documentation, but hardly less than most nVidia or ATI chips have; at least PVR is consistent. One barely documented chip vs. 100.

rdos wrote: I proved it when I had to move the glyph cache from C to assembly in order to get decent performance.

All that rather apocryphal story proves is that your ASM was faster than your C. That doesn't prove anything about C. I assure you that I could make an asm program perform slower than its C analog.

Brendan · Post by **Brendan** » Thu Mar 15, 2012 12:47 pm

Hi,

rdos wrote: It doesn't matter that the font library is coded in C, as it is only called once for each glyph (character & size), and after that a cached version is used.

Does that actually work correctly?

For example, there should be space between the letters "WT" but the letters "LT" should overlap a little (or at least have a lot smaller gap between them); and then there's a whole layer of ligatures that'd get messed up for some languages (e.g. Arabic).
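A per-glyph cache can store each glyph's bitmap and advance, but the spacing between two letters depends on the pair, so a layout pass needs a pair lookup on top of the cache. The sketch below is a minimal illustration, not RDOS or FreeType code: the kerning table, its values, and the fixed `ADVANCE` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical kerning table: a character pair and the adjustment
   (in pixels) applied to the pen position between them. A real font
   library would read these values from the font file. */
struct kern_pair {
    char left;
    char right;
    int adjust;
};

static const struct kern_pair kern_table[] = {
    { 'L', 'T', -3 },   /* tuck the T under the arm of the L */
    { 'A', 'V', -2 },
};

/* Hypothetical fixed advance width, standing in for the cached
   per-glyph advance. */
#define ADVANCE 10

/* Linear search of the pair table; 0 means "no kerning for this pair". */
static int kerning(char left, char right)
{
    for (size_t i = 0; i < sizeof kern_table / sizeof kern_table[0]; i++)
        if (kern_table[i].left == left && kern_table[i].right == right)
            return kern_table[i].adjust;
    return 0;
}

/* Total string width: cached advance per glyph plus a correction
   for every adjacent pair. */
int string_width(const char *s)
{
    int width = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        width += ADVANCE;
        if (s[i + 1] != '\0')
            width += kerning(s[i], s[i + 1]);
    }
    return width;
}
```

With these made-up values "WT" measures 20 pixels while "LT" measures 17. FreeType exposes pair kerning through `FT_Get_Kerning`, and ligatures or Arabic shaping need a separate shaping step that no per-glyph cache provides.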

Cheers,

Brendan

Owen · Post by **Owen** » Thu Mar 15, 2012 3:15 pm

berkus wrote:
Ameise wrote: Embedded would be easier (far easier) than desktops; most use either a PowerVR chip
Yep, with no available documentation. FAR easier.

Let's be fair: has anyone ever approached one of the SoC GPU vendors and asked to see their documentation under NDA? I'm pretty sure that, especially for someone like rdos (who apparently has production deployments out in the field), it wouldn't be *that* hard if you had a reasonable OS and presented yourself correctly.

rdos · Post by **rdos** » Thu Mar 15, 2012 4:18 pm

Ameise wrote: All that rather apocryphal story proves is that your ASM was faster than your C. That doesn't prove anything about C. I assure you that I could make an asm program perform slower than its C analog.

To be fair, it was the C compiler that couldn't do the four-level lookup without loading the same selector four times, something the assembler version didn't need, as the code knew it was the same selector. There were a couple of other issues where the C compiler produced slower code as well, but its inability to mix flat and segmented pointers was the main reason it lost big time.
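The post doesn't describe the cache layout, so the following is only a sketch under assumed names: a hypothetical four-level glyph cache (font, size, high byte and low byte of a 16-bit code point) in ordinary flat-model C. Here each level is a plain near-pointer dereference; the complaint above is that with 48-bit far pointers each level may also cost a segment register load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical glyph record; the bitmap is owned elsewhere. */
struct glyph {
    int advance;            /* horizontal advance in pixels */
    const uint8_t *bitmap;  /* rendered glyph data */
};

/* Four hypothetical levels: font -> size -> high byte -> low byte. */
struct glyph_page { struct glyph *glyphs[256]; };      /* level 4 */
struct size_entry { struct glyph_page *pages[256]; };  /* level 3 */
struct font_entry { struct size_entry *sizes[64]; };   /* level 2 */
struct glyph_cache { struct font_entry *fonts[16]; };  /* level 1 */

/* Returns the cached glyph, or NULL if any level is missing, in which
   case the caller would render it through the font library. */
struct glyph *cache_lookup(const struct glyph_cache *c, unsigned font,
                           unsigned size, uint16_t code)
{
    if (font >= 16 || size >= 64)
        return NULL;
    const struct font_entry *f = c->fonts[font];
    if (!f)
        return NULL;
    const struct size_entry *s = f->sizes[size];
    if (!s)
        return NULL;
    const struct glyph_page *p = s->pages[code >> 8];
    if (!p)
        return NULL;
    return p->glyphs[code & 0xFF];
}
```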

Additionally, the GetStringMetrics function, which returns two values in its interface, could not even be implemented in C, as C cannot return more than one value. To avoid two segment register loads in that function, I used a trick so that C could return both values in a register pair, which the assembly stub then decodes.
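The post doesn't show the actual trick or the real GetStringMetrics signature. One common way to get two 32-bit results out of a C function in registers is to pack them into a 64-bit integer, which 32-bit x86 compilers commonly return in the EDX:EAX pair. The names below (`pack_metrics`, width/height) are hypothetical.

```c
#include <stdint.h>

/* Pack two 32-bit results into one 64-bit return value. On 32-bit x86
   the low half typically comes back in EAX and the high half in EDX. */
static inline uint64_t pack_metrics(uint32_t width, uint32_t height)
{
    return ((uint64_t)height << 32) | width;
}

/* The caller (an assembly stub, in the scheme described above) splits
   the pair again: low half = width, high half = height. */
static inline uint32_t metrics_width(uint64_t packed)
{
    return (uint32_t)packed;
}

static inline uint32_t metrics_height(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}
```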

Owen · Post by **Owen** » Thu Mar 15, 2012 4:27 pm

I wish to add another comment on this point:

rdos wrote: Rendering is an interesting aspect of graphics, and IMO, this is better done in main memory with one or more CPU cores. That is a scalable solution, and one that is not dependent on who supplies the video card / accelerator.

Right, because GPUs are not at all scaling in performance significantly better than CPUs... and because my 5-year-old GPU doesn't at all still beat the pants off my brand new CPU for graphics, whether the CPU's cores or its IGP do the rendering, in spite of the fact that my CPU cost nearly twice as much and is built on a process whose features take up roughly 1/8th the area of my GPU's (32nm vs 90nm: process sizes are linear measures, i.e. widths, while feature areas scale quadratically).
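Because a process node figure is a linear dimension, the ratio of the areas a given feature occupies on the two processes is its square:

```latex
\left(\frac{32\,\mathrm{nm}}{90\,\mathrm{nm}}\right)^{2} = \frac{1024}{8100} \approx 0.126 \approx \frac{1}{7.9}
```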

And because you can't at all scale GPUs in the same way as you can CPUs (i.e. upgrade them, or scale out onto multiple cards, multi-chip cards, etc) - and scale them out more efficiently than you can CPUs at that (90% performance improvements from scaling out aren't at all that uncommon, though it is, of course, workload dependent). In fact, you can scale out GPUs more economically - because a motherboard with 2xPCI-E slots is far cheaper than one with 2xCPU slots (plus the price premium on the CPU).

rdos wrote:
Ameise wrote: All that rather apocryphal story proves is that your ASM was faster than your C. That doesn't prove anything about C. I assure you that I could make an asm program perform slower than its C analog.
To be fair, it was the C compiler that couldn't do the four-level lookup without loading the same selector four times, something the assembler version didn't need, as the code knew it was the same selector. There were a couple of other issues where the C compiler produced slower code as well, but its inability to mix flat and segmented pointers was the main reason it lost big time.

Additionally, the GetStringMetrics function, which returns two values in its interface, could not even be implemented in C, as C cannot return more than one value. To avoid two segment register loads in that function, I used a trick so that C could return both values in a register pair, which the assembly stub then decodes.

So, in other words, you're using a poorly optimizing C compiler with an all-but-obsolete memory model. And the idea that you can't return more than one value from a C function is laughable: you just need to pack the values into a structure, at which point how the structure is returned depends on the compiler's ABI (but again, this seems hobbled by an obsolete memory model).

rdos · Post by **rdos** » Thu Mar 15, 2012 4:31 pm

Owen wrote: Let's be fair: has anyone ever approached one of the SoC GPU vendors and asked to see their documentation under NDA? I'm pretty sure that, especially for someone like rdos (who apparently has production deployments out in the field), it wouldn't be *that* hard if you had a reasonable OS and presented yourself correctly.

I've not yet succeeded with anything that requires an NDA. I tried with the network chip in my portable computer (Marvell), but even though I could argue that they would be able to sell a couple of hundred chips, they never even sent the NDA to me. OTOH, a guy at a company that sells touch controllers (PenMount) has sent me several specifications for different chips they have. Another method that works: if your company wants to sell computers for a commercial project and you need a few hundred or more, you can either put pressure on the hardware supplier to get the specifications, or find other products that have specifications available. I got a GPIO specification and a speaker connection schematic that way.

rdos · Post by **rdos** » Thu Mar 15, 2012 4:51 pm

Owen wrote: So, in other words, you're using a poorly optimizing C compiler with an all-but-obsolete memory model. And the idea that you can't return more than one value from a C function is laughable: you just need to pack the values into a structure, at which point how the structure is returned depends on the compiler's ABI (but again, this seems hobbled by an obsolete memory model).

You can of course use pointers to return additional values, but that is where the additional segment register loads come in. If I let the C routine return the values through two different pointers, it would need to load both 48-bit pointers (and thus two selectors), which would be hopelessly slow compared to letting an assembler routine return the results in two registers. Additionally, it is not permissible to use pointers in the syscall interface to return values; all values must be returned in registers. The only time pointers are allowed is when passing addresses of data buffers, and these pointers must always be loaded into segment registers so the kernel knows the memory is accessible to userspace, and that the pointer is not a way to access kernel memory from userspace. In fact, device drivers do not do any pointer validation at all.
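As a rough model of the protection described above (not RDOS code; the struct and function names are hypothetical), this sketch mimics the limit check the CPU performs on each access through a byte-granular, expand-up data segment, plus the privilege check when the kernel loads a caller's selector with RPL 3: that load faults unless the descriptor's DPL is 3. Real descriptors carry more fields (type, granularity, present bit) that this ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified protected-mode data segment descriptor. */
struct segment {
    uint32_t base;   /* linear address of offset 0 */
    uint32_t limit;  /* highest valid offset (byte granularity) */
    unsigned dpl;    /* descriptor privilege level: 0 = kernel, 3 = user */
};

/* Every byte of a `size`-byte access at `offset` must satisfy
   offset <= limit, otherwise the CPU raises a fault. Zero-length
   accesses are rejected here for simplicity. */
bool segment_access_ok(const struct segment *seg, uint32_t offset,
                       uint32_t size)
{
    if (size == 0 || offset > seg->limit)
        return false;
    /* Compare without computing offset + size - 1, which could wrap. */
    return size - 1 <= seg->limit - offset;
}

/* Models the kernel loading a user-supplied selector with RPL 3:
   the load only succeeds if the descriptor describes user memory. */
bool user_buffer_ok(const struct segment *seg, uint32_t offset,
                    uint32_t size)
{
    return seg->dpl == 3 && segment_access_ok(seg, offset, size);
}
```

A 100-byte user buffer (limit 99) accepts a 4-byte access at offset 96 but rejects one at offset 97, and a kernel descriptor (DPL 0) is rejected outright even though its limit is large.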

I only use C in the kernel because I want to use free code, in this case FreeType. This C code needs to work with 48-bit pointers, as this is the memory model every device driver in RDOS must comply with. Additionally, every device driver must allocate all data structures using selectors in debug mode, but is allowed to use a flat selector in production releases if it can be estimated that it would use too many selectors in a typical configuration. These rules are the primary reason why RDOS is, and has been, a stable OS for a long time.

Owen · Post by **Owen** » Thu Mar 15, 2012 6:02 pm

...erm...

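```c
/* Two values returned together, by value; whether the struct comes
   back in registers or via a hidden pointer depends on the ABI. */
struct MyStruct {
   int valueA;
   float valueB;
};

struct MyStruct myMethod(void)
{
   struct MyStruct result = { 42, 1.5f };  /* example values */
   return result;
}
```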

is perfectly valid C...

Solar · Post by **Solar** » Fri Mar 16, 2012 1:28 am

Don't argue the point. He will defend his segmented memory model no matter what, and claim that it's the fault of the language designers and compiler manufacturers and the CPU manufacturers for not supporting it properly.

@rdos:

Don't blame C, or the compiler, or the CPU. Just admit that the memory model you have chosen forces you to do things that way. Stop pointing at others for the blame.

Yoda · Post by **Yoda** » Fri Mar 16, 2012 5:36 am

Ameise wrote: All that rather apocryphal story proves is that your ASM was faster than your C. That doesn't prove anything about C. I assure you that I could make an asm program perform slower than its C analog.

Yet, in general, manually optimizing the assembly output produced from C may give a good increase in speed.

rdos · Post by **rdos** » Fri Mar 16, 2012 8:37 am

I don't need to defend anything. Claims that segmented memory models are inferior, or that I think hardware and compiler manufacturers need to optimize their products to my wishes, have no substance.

I made the choice myself and I don't regret it. I won't switch to a flat memory model, and I currently have all the tools I need to develop as much as I want in C in the kernel. In fact, I've put considerable time and effort into adapting OpenWatcom to my needs, and since it works well, I'm not likely to do it again for a new memory model or compiler. So FOR ME, C is just convenient, not faster, because no C compiler can efficiently handle segmentation.

What most flat memory model advocates forget all the time is that their design choices have drawbacks as well. I posted the most important one earlier in the thread: there is no debug switch available to tightly check invalid pointers in real time. Such tests can mean the difference between a stable OS and an unstable one. Not to mention all the time required to find overwrites in flat memory models; I still have a few in our terminal application (which uses a flat memory model) that I cannot locate.

qw · Post by **qw** » Fri Mar 16, 2012 8:56 am

The thing is: the segmented memory model is out of fashion, and compiler writers have stopped supporting it. Of course you cannot force a compiler writer to add a feature he doesn't want to add, but on the other hand, which memory model to use is a design decision to be made by the developer (rdos), not the compiler writer. So I can see where his argument is coming from.

rdos · Post by **rdos** » Fri Mar 16, 2012 9:10 am

Hobbes wrote: The thing is: the segmented memory model is out of fashion, and compiler writers have stopped supporting it. Of course you cannot force a compiler writer to add a feature he doesn't want to add, but on the other hand, which memory model to use is a design decision to be made by the developer (rdos), not the compiler writer. So I can see where his argument is coming from.

I don't need compiler developers to support my design choices. I have write access to OpenWatcom, and I even have the whole environment in the RDOS source tree as well, so I don't need anybody to do anything. The current situation is more like RDOS being self-hosted with regard to the compiler: any compiler change that RDOS requires can be made regardless of what compiler writers do. I also anticipate that x86 CPU vendors will not discontinue 32-bit mode any time soon. Therefore, I don't need to worry about anything in this regard. What I need to do is write device drivers for new hardware.

The thing actually is that development of segmented compilers already stopped at 16 bits, and nobody seemed to grasp that the problems of 16-bit segmented memory models (like huge pointers) simply don't exist in 32-bit models. Fortunately, OS/2 originally did support a 32-bit segmented memory model, and Watcom's compiler had support for it. That is why the compiler already had the basics, even if some recent additions for 32-bit code assumed that segmentation does not exist, which I needed to update.
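To illustrate the 16-bit problem mentioned above, here is a sketch of real-mode huge pointer arithmetic (16-bit protected mode instead steps the selector by a fixed increment, which this does not model). The struct and function names are illustrative only.

```c
#include <stdint.h>

/* Real-mode far pointer: linear address = segment * 16 + offset. */
struct far_ptr {
    uint16_t seg;
    uint16_t off;
};

static uint32_t linear(struct far_ptr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* A 16-bit offset wraps at 64 KiB, so arithmetic on larger objects must
   carry into the segment ("huge" pointers). This adds `delta` bytes and
   renormalizes so the offset stays below 16. Assumes the result stays
   below 1 MiB. */
static struct far_ptr huge_add(struct far_ptr p, uint32_t delta)
{
    uint32_t lin = linear(p) + delta;
    struct far_ptr r = { (uint16_t)(lin >> 4), (uint16_t)(lin & 0xF) };
    return r;
}

/* With 32-bit offsets the same step is an ordinary add: one segment can
   span up to 4 GiB, so nothing has to carry into the selector. */
static uint32_t offset32_add(uint32_t off, uint32_t delta)
{
    return off + delta;
}
```

Adding 0x20 to 1000:FFF0 as a plain 16-bit offset wraps to 1000:0010, while the normalized result is 2001:0000; the 32-bit offset simply becomes 0x10010.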

Owen · Post by **Owen** » Fri Mar 16, 2012 10:42 am

rdos wrote: What most flat memory model advocates forget all the time is that their design choices have drawbacks as well. I posted the most important one earlier in the thread: there is no debug switch available to tightly check invalid pointers in real time. Such tests can mean the difference between a stable OS and an unstable one. Not to mention all the time required to find overwrites in flat memory models; I still have a few in our terminal application (which uses a flat memory model) that I cannot locate.

We do. It's called valgrind. It checks more than segmentation can (e.g. uninitialized reads).

rdos · Post by **rdos** » Fri Mar 16, 2012 10:52 am

Owen wrote: We do. It's called valgrind. It checks more than segmentation can check (e.g. uninitialized reads)

Not a chance. I've tried all that, and it doesn't work. The memory corruption error in my application is still there. What I eventually want to do is compile our application for a 32-bit compact memory model and eliminate all these issues once and for all. All I need is a new executable format, which takes some effort to create.

The reason why valgrind will not solve it is that it cannot guard against a non-existent memory protection system. (Page protection doesn't solve my issue above, simply because the error disappears when memory is allocated page-based instead of byte-based.) If any instruction can potentially overwrite anything else, there is no software that will solve it. Only hardware can solve that, and the hardware support is called segmentation.
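The point about page protection can be made concrete with a little arithmetic. This sketch assumes an allocation that starts on a page boundary and is rounded up to whole 4 KiB pages, compared with a segment whose limit is exactly the allocation size minus one; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* typical x86 page size */

/* Page-granular protection: the block is rounded up to whole pages,
   so only accesses beyond that rounded end can fault. */
bool page_protection_traps(uint32_t size, uint32_t offset)
{
    uint32_t mapped = (size + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
    return offset >= mapped;
}

/* Byte-granular segment with limit = size - 1: the first byte past
   the allocation (offset > limit, i.e. offset >= size) faults. */
bool segment_limit_traps(uint32_t size, uint32_t offset)
{
    return offset >= size;
}
```

For a 10-byte block, a stray write at offset 10 lands in the same page and goes unnoticed under page protection, but faults immediately under the segment limit. Guard-page allocators reduce this gap by placing a block against the end of a page, which is a different layout from the one modeled here.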

OSDev.org
