Redesigning video system
Re: Redesigning video system
Off topic: @rdos, you seem to have the same views as "Dex", and you seem to be getting hit with the same stick.
I find your ideas a great eye opener.
PS: Did you code this OS ? http://www.rdos.net/rdos/
Re: Redesigning video system
guyfawkes wrote:PS: Did you code this OS ? http://www.rdos.net/rdos/
Yes. The web page is quite dated, but I will update it when the next release of Open Watcom comes out, probably in March 2012. By then I should have found all the bugs in the tool-chain that affect RDOS.
There are some important updates, most notably stable SMP, AHCI, and the ability to use C/C++ in the kernel.
Re: Redesigning video system
Rendering is an interesting aspect of graphics, and IMO it is better done in main memory with one or more CPU cores. That is a scalable solution, and one that is not dependent on who supplies the video card / accelerator. We will see continuing growth in the number of cores per processor, and using more cores to render is much more scalable than letting a video accelerator do most of it. Not to mention that it would work on any video card.
In my design, any type of rendering complexity can be constructed. The end result of each stage would be a bitmap with the graphics and a mask. This can be implemented on top of the sprite functionality, which I use in the planet demo. IOW, doing a rotating 3D demo is very much doable without a complex video accelerator approach. And if it cannot be rendered in realtime, it is always possible to pre-calculate it and blit it in realtime.
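As a rough illustration of the bitmap-plus-mask stage described here, a masked blit can be as simple as the sketch below. The sprite_t layout and the blit_masked name are assumptions for the example (clipping omitted); this is not RDOS's actual sprite API.
[code]
#include <stdint.h>

/* One rendering stage produces a bitmap plus a mask; compositing it onto
   a target buffer is then a simple masked copy. */
typedef struct {
    int      width, height;
    uint32_t *pixels;   /* rendered graphics for this stage   */
    uint8_t  *mask;     /* non-zero = pixel belongs to sprite */
} sprite_t;

/* Copy only the masked pixels of the sprite onto the target buffer
   (dst_pitch is in pixels; no clipping for brevity). */
static void blit_masked(uint32_t *dst, int dst_pitch,
                        const sprite_t *s, int x, int y)
{
    for (int row = 0; row < s->height; row++) {
        uint32_t       *d = dst + (y + row) * dst_pitch + x;
        const uint32_t *p = s->pixels + row * s->width;
        const uint8_t  *m = s->mask   + row * s->width;

        for (int col = 0; col < s->width; col++)
            if (m[col])
                d[col] = p[col];
    }
}
[/code]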
Re: Redesigning video system
berkus wrote:So you claim that the CPU is becoming more and more like a graphics accelerator (e.g. 400+ fairly dumb cores running in parallel). It may be the trend (once they also drop SMP), but why not use it now?
1. Graphics accelerators are not 400 times faster than an ordinary CPU.
2. Doing it with a graphics accelerator is highly unportable, vendor specific, and out of the question for one-man projects.
It is fine if you are building hardware specifically for highly demanding graphics applications, but then you hand-pick a graphics accelerator and write a large software package to make the best of it.
Re: Redesigning video system
rdos wrote:Rendering is an interesting aspect of graphics, and IMO it is better done in main memory with one or more CPU cores. That is a scalable solution, and one that is not dependent on who supplies the video card / accelerator. We will see continuing growth in the number of cores per processor, and using more cores to render is much more scalable than letting a video accelerator do most of it. Not to mention that it would work on any video card.
So, the fact that GPUs have been far outpacing CPUs in compute capability for the last five or so years is completely fictional, then?
Re: Redesigning video system
I've just done the redesign. The LFB is now only used in output mode, all writes are 4 bytes and aligned on 4-byte boundaries, and paging is set up for write-through. The graphics primitives operate on a memory copy of the LFB and call an update procedure whenever something has changed and the process has input focus.
This makes a huge difference on modern video cards. My AMD P6 with ATI Radeon graphics now runs my test application incredibly fast. It's impossible to see exactly what it does, as the 900 x 1600 screen changes so fast.
I repeat the claim: There is no need for an accelerator with this board for any normal application. As long as the LFB is not read, and accesses are aligned, it is very fast.
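A minimal sketch of the update step described above, assuming 32-bit pixels and illustrative names (lfb, shadow, lfb_update_rect are made up for the example): all drawing happens in a main-memory shadow buffer, and only whole, aligned 32-bit stores ever touch the write-through mapped LFB, which is never read.
[code]
#include <stdint.h>

extern volatile uint32_t *lfb;     /* mapped linear frame buffer         */
extern uint32_t          *shadow;  /* main-memory copy, same pixel layout */
extern int                pitch;   /* pixels per scan line               */

/* Copy one dirty rectangle from the shadow buffer to the frame buffer. */
void lfb_update_rect(int x, int y, int w, int h)
{
    for (int row = y; row < y + h; row++) {
        const uint32_t    *src = shadow + row * pitch + x;
        volatile uint32_t *dst = lfb    + row * pitch + x;

        /* Aligned 32-bit stores only; the LFB is never read. */
        for (int col = 0; col < w; col++)
            dst[col] = src[col];
    }
}
[/code]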
Re: Redesigning video system
Your original claim was that rendering is slow on modern systems in software mode because of C (I can quote you on this, if you'd like).
That's just utter nonsense. Rendering is slow because we expect more from renderers, and CPUs cannot manage that sort of task while at the same time having any time left over for anything else, including processing data to render.
GPUs, on the other hand, use a large number of dumb cores to execute rendering tasks in parallel; not surprisingly, rendering is a task that is very easily made into a parallel task.
C has nothing to do with it; I challenge you to write Mass Effect's renderer in asm (with no API interaction) and have it work anywhere near acceptably.
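For what it's worth, the "easily made parallel" part really is as simple as handing disjoint scanline ranges to worker threads. Here is a minimal sketch using pthreads; the render_scanline callback, the slice_t type and the 64-thread cap are assumptions for the example.
[code]
#include <pthread.h>

/* Scanlines are independent, so they can simply be divided among workers. */
typedef struct {
    void (*render_scanline)(int y, void *frame);
    void  *frame;
    int    y0, y1;                 /* half-open row range for this worker */
} slice_t;

static void *render_slice(void *arg)
{
    slice_t *s = arg;
    for (int y = s->y0; y < s->y1; y++)
        s->render_scanline(y, s->frame);
    return NULL;
}

/* Split `height` rows across `nthreads` workers (nthreads <= 64) and wait. */
static void render_parallel(void (*render_scanline)(int, void *),
                            void *frame, int height, int nthreads)
{
    pthread_t tid[64];
    slice_t   slice[64];

    for (int i = 0; i < nthreads; i++) {
        slice[i] = (slice_t){ render_scanline, frame,
                              height * i / nthreads,
                              height * (i + 1) / nthreads };
        pthread_create(&tid[i], NULL, render_slice, &slice[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
}
[/code]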
Re: Redesigning video system
I decided against real-time rendering of fonts. It is just too slow. I let FreeType (which is written in C) do the rendering once, and then I save a 256-level bitmap for each used character & size, which I then can "render" (in asm) to retain the performance I had with bitmap fonts, but with the added advantage of anti-aliasing and any font height from a single font file.
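A minimal sketch of that caching step, assuming plain FreeType and an illustrative glyph_t structure (the names are not RDOS's actual ones): render the glyph once with FT_Load_Char and keep only the 8-bit coverage bitmap, which a fast blitter can then reuse.
[code]
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <ft2build.h>
#include FT_FREETYPE_H

typedef struct {
    int      width, height, advance;
    uint8_t *coverage;              /* 256-level anti-aliasing mask */
} glyph_t;

/* Render one glyph with FreeType and keep only its coverage bitmap.
   Assumes the default 8-bit grayscale rendering and a positive pitch. */
static glyph_t *cache_glyph(FT_Face face, unsigned long ch)
{
    if (FT_Load_Char(face, ch, FT_LOAD_RENDER))
        return NULL;

    FT_Bitmap *bm = &face->glyph->bitmap;
    glyph_t *g = malloc(sizeof *g);
    g->width    = (int)bm->width;
    g->height   = (int)bm->rows;
    g->advance  = (int)(face->glyph->advance.x >> 6);  /* 26.6 fixed point */
    g->coverage = malloc((size_t)bm->rows * bm->width);
    for (int row = 0; row < (int)bm->rows; row++)
        memcpy(g->coverage + row * bm->width,
               bm->buffer + row * bm->pitch, bm->width);
    return g;
}
[/code]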
Additionally, my vector primitives are sufficiently fast for typical resolutions and processors, so I don't need to accelerate them with hardware. The only thing is that I need a "shadow" buffer for the video card, which is used for the combining so that video memory never has to be read (reading video memory is incredibly slow on modern hardware). This buffer also serves as the "virtual display" for processes that don't have input focus (I'm not using a window system; instead I run apps in different consoles, potentially mixing different resolutions and text mode).
Re: Redesigning video system
Which proves my point. So long as you do nothing that would be considered "advanced", such as compositing, of course a CPU is sufficient, though less than ideal. GPUs are designed to perform those advanced tasks. This isn't a fault of C (I can write a fairly fast software renderer in C if I need to), but a 'fault' in that CPUs are not designed to handle tasks such as advanced rendering. GPUs are.
Re: Redesigning video system
There are more issues besides speed when designing the graphics subsystem in an OS. By choosing to rely on video accelerators, you have also decided to write a new driver for every new accelerator that appears, which is a major undertaking. Alternatively, you might support only one accelerator (like the one your computer happens to have). That is fine if your target is games or high-end graphics. If your target is embedded devices or standard applications, it is a questionable choice unless you can get hardware designers to develop drivers for you (and then you need to be as successful as Windows or Linux).
I decided to implement a basic GUI with a low-level interface and no windowing interface, modelled after PicoGUI. On top of the low-level interface I then built controls and various "widgets", so I can now create quite advanced interactive applications that run full screen. I have no time to develop accelerator drivers for a lot of video cards, so I use the LFB instead. I also dropped VGA, color palettes and all bit-plane modes, as it is hopeless to design anything efficient with those.
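A speculative sketch of that console-per-application arrangement, with made-up names (console_t, focused_console, console_flush are illustrative, not RDOS's actual structures): each console owns its own off-screen buffer and mode, and only the console with input focus is flushed to the real frame buffer.
[code]
#include <stdint.h>

typedef struct console {
    int       width, height, bpp;   /* this console's video mode          */
    int       text_mode;            /* non-zero for a text-mode console   */
    uint32_t *buffer;               /* the "virtual display" when the
                                       console does not have focus        */
    struct console *next;
} console_t;

extern console_t *focused_console;
extern void lfb_update_rect(int x, int y, int w, int h); /* earlier sketch */

/* Widgets always draw into their console's own buffer; only the focused
   console propagates the dirty rectangle to the physical screen. */
void console_flush(console_t *c, int x, int y, int w, int h)
{
    if (c == focused_console)
        lfb_update_rect(x, y, w, h);
}
[/code]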
Re: Redesigning video system
Embedded would be easier (far easier) than desktops; most devices use a PowerVR chip, and a few use Tegra. They are rather consistent across the board.
You still haven't proven your earlier declaration, that the C programming language is the cause of rendering being too slow to do on CPUs today.
Re: Redesigning video system
Apart from VGA, all video documentation that does exist stinks anyway.
Re: Redesigning video system
Tell me about it. I'm working on an Intel Extreme driver. The documentation is, I guess, pretty complete in the sense that all registers and values are explained, but I cannot get my head around how to work out the timings for the resolution I want, and exactly what steps I need in order to do something as "easy" as switching mode. I'm browsing both the Haiku and Linux versions of the driver, but they get this info from I2C EDID, which I don't have support for; I just want to do something like 1024x768 manually. It is also painstaking to have to reboot my computer every time I want to try it out (since my OS is far from self-hosting), but that is not the fault of the documentation, of course. Has anyone had any luck with this?
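Not an authoritative answer, but if you skip EDID you can hard-code the standard VESA DMT timings for the mode you want and feed those values to the pipe timing registers described in the Intel docs. The sketch below uses a Linux/X11-style modeline layout; the struct name is made up, and the actual register programming sequence (PLL, pipe, plane enable) is chipset-specific and not shown.
[code]
/* Generic mode-timing description (modeline-style); names are illustrative. */
struct mode_timing {
    int clock_khz;                        /* pixel clock in kHz */
    int hdisplay, hsync_start, hsync_end, htotal;
    int vdisplay, vsync_start, vsync_end, vtotal;
};

/* Standard VESA DMT timings for 1024x768 @ 60 Hz (negative h/v sync). */
static const struct mode_timing mode_1024x768_60 = {
    65000,
    1024, 1048, 1184, 1344,
     768,  771,  777,  806,
};
[/code]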
Fudge - Simplicity, clarity and speed.
http://github.com/Jezze/fudge/
Re: Redesigning video system
Ameise wrote:You still haven't proven your earlier declaration, that the C programming language is the cause of rendering being too slow to do on CPUs today.
I proved it when I had to move the glyph cache from C to assembly in order to get decent performance. It doesn't matter that the font library is coded in C, as it is only called once for each glyph (character & size), and after that a cached version is used. After initial caching, the system performs very well, just as well as when the glyph bitmaps were preconstructed in a fixed-size font.
Re: Redesigning video system
[joke]Is the cache just plain data, or can you actually write a cache in assembly in your OS?[/joke]
Anyway, I think it makes virtually no difference to performance whether the code that manages the glyph cache is written in C or assembly.
I do something similar in my game for a glyph texture cache (I don't use D3DText since it has its own issues, but that's a different story); it can be just a hash/map with simple usage counters/timestamps.
However, for rendering glyphs from the cache, the blitting part should be written using the best CPU instructions available (SSE, etc.), which can be done either in asm or with the C compiler's equivalent intrinsics.
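To make the intrinsics point concrete, here is a hedged sketch of such a blit: it blends an 8-bit glyph coverage bitmap, in one solid colour, onto 32-bit pixels, four pixels per iteration with SSSE3/SSE2 intrinsics. The function and parameter names are made up for the example, pitches are in pixels/bytes as commented, and the shift by 8 is the usual cheap approximation of dividing by 255.
[code]
#include <stdint.h>
#include <string.h>
#include <tmmintrin.h>   /* SSSE3 (includes SSE2) */

/* dst = (color*a + dst*(255-a)) / 256 per channel, where a is the glyph
   coverage.  dst_pitch is in 32-bit pixels, cov_pitch in bytes. */
static void glyph_blit_sse(uint32_t *dst, int dst_pitch,
                           const uint8_t *cov, int cov_pitch,
                           int w, int h, uint32_t color)
{
    const __m128i zero   = _mm_setzero_si128();
    const __m128i c16    = _mm_unpacklo_epi8(_mm_set1_epi32((int)color), zero);
    const __m128i full   = _mm_set1_epi16(255);
    /* Broadcast each coverage byte to the four channels of its pixel. */
    const __m128i spread = _mm_set_epi8(3,3,3,3, 2,2,2,2, 1,1,1,1, 0,0,0,0);

    for (int y = 0; y < h; y++) {
        uint32_t      *d = dst + y * dst_pitch;
        const uint8_t *a = cov + y * cov_pitch;
        int x = 0;

        for (; x + 4 <= w; x += 4) {
            int a4;
            memcpy(&a4, a + x, 4);            /* 4 coverage bytes */
            __m128i a8  = _mm_shuffle_epi8(_mm_cvtsi32_si128(a4), spread);
            __m128i px  = _mm_loadu_si128((const __m128i *)(d + x));

            __m128i alo = _mm_unpacklo_epi8(a8, zero);
            __m128i ahi = _mm_unpackhi_epi8(a8, zero);
            __m128i dlo = _mm_unpacklo_epi8(px, zero);
            __m128i dhi = _mm_unpackhi_epi8(px, zero);

            dlo = _mm_srli_epi16(_mm_add_epi16(
                      _mm_mullo_epi16(c16, alo),
                      _mm_mullo_epi16(dlo, _mm_sub_epi16(full, alo))), 8);
            dhi = _mm_srli_epi16(_mm_add_epi16(
                      _mm_mullo_epi16(c16, ahi),
                      _mm_mullo_epi16(dhi, _mm_sub_epi16(full, ahi))), 8);

            _mm_storeu_si128((__m128i *)(d + x), _mm_packus_epi16(dlo, dhi));
        }

        /* Scalar tail for widths that are not a multiple of four. */
        for (; x < w; x++) {
            uint32_t s = d[x], v = 0;
            for (int sh = 0; sh < 32; sh += 8) {
                uint32_t sc = (s >> sh) & 0xff, cc = (color >> sh) & 0xff;
                v |= (((cc * a[x] + sc * (255 - a[x])) >> 8) & 0xff) << sh;
            }
            d[x] = v;
        }
    }
}
[/code]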