Kernel Development

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to check whether your question is answered in the wiki first! When in doubt, post here.
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Kernel Development

Post by Candy »

Tim Robinson wrote: Nobody's thought of any advantages :).

Assembly code can be faster than C if you are an expert.

I have never seen anyone who can write better assembly code than a C compiler, except maybe for 5 lines at a time. Certainly a whole OS written in asm can expect to be worse-optimised than one written in C or C++.
Well, *coughs*, I've been able to optimize a code fragment, using special techniques, to more than 5x the gcc -O3 speed... does that count? I did use SSE, and the compiler didn't, does that matter? ;) (The run time went from 921 seconds with the gcc code to 172.2 seconds with my own, alpha blending 10000 frames at 1024*768 with a constant factor on a Duron 1200.)
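For reference, a plain-C loop of the kind being compared here might look like the sketch below. It is not Candy's actual code: it assumes 8-bit channels stored in a flat byte buffer and a constant alpha in 0..255, and the name `blend_const_scalar` is hypothetical. Written this way, the loop handles one byte per iteration unless the compiler manages to vectorize it by itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend src over dst with a constant alpha (0..255), one channel byte at a time:
 * dst = round((src * alpha + dst * (255 - alpha)) / 255)
 */
static void blend_const_scalar(uint8_t *dst, const uint8_t *src, size_t n, uint8_t alpha)
{
    unsigned a = alpha;
    unsigned inv = 255u - alpha;
    for (size_t i = 0; i < n; i++) {
        unsigned v = src[i] * a + dst[i] * inv; /* at most 255 * 255 = 65025 */
        dst[i] = (uint8_t)((v + 127) / 255);    /* round to nearest */
    }
}
```

With alpha = 255 the result is exactly `src`, and with alpha = 0 `dst` is left unchanged.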

As for the topic again, I'd advise you to use both. Each has its advantages (and disadvantages), and the only way you are going to get the best results is by using both. That means C is useful for large programs, hard constructs and regular program flow. ASM is useful for everything, but it's hard, tedious to make fast, and needs lots of debugging. So, simple solution, write it in C as much as you can, convert the parts that actually cost a lot of time to assembly, and have a near-assembly speed result with a lot less development worries and annoyances.

That said, stretch the concept to other languages as well. You don't need C for a word processor do you?
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Kernel Development

Post by Pype.Clicker »

Candy wrote:
I did use SSE, and the compiler didn't, does that matter? ;) (speed increase was from 921 seconds using gcc code to 172.2 seconds with my own, alpha blending 10000 frames at 1024*768 with a constant factor @ duron 1200).
Well, for vectorized operations like matrix multiplications, alpha blending, fast Fourier transforms and the like, using the processor's latest SIMD extension (which is usually available in assembler quite quickly, but hard for a C compiler to use because it does not fit the language naturally) and doing it in assembly often pays off...
But I would not recommend putting such code in a kernel (at least not unconditionally), because you cannot expect every CPU to have those extensions :-/

Those are most likely to be libraries that you dynamically bind to the graphics server (or to the kernel if you're doing an in-kernel GUI server) once you've checked that the CPU supports them.
And I would recommend that you offer a C-like interface to those functions, rather than writing the whole GUI subsystem in ASM.
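A minimal sketch of that idea in C: callers see one plain-C entry point, and the implementation behind it is picked once at start-up after checking the CPU. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute exist; the operation (a saturating byte add) and all function names are hypothetical. `__builtin_cpu_supports` relies on the compiler's runtime library, so inside a kernel you would query CPUID yourself, and you would also need to have enabled the extended register state before running AVX code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation: dst[i] = min(dst[i] + src[i], 255). */
typedef void (*add_sat_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable fallback that runs on any CPU. */
static void add_sat_scalar(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = dst[i] + src[i];
        dst[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* Compiled for AVX2 only; must never run unless the CPU reports AVX2. */
__attribute__((target("avx2")))
static void add_sat_avx2(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i d = _mm256_loadu_si256((const __m256i *)(dst + i));
        __m256i s = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_adds_epu8(d, s));
    }
    add_sat_scalar(dst + i, src + i, n - i); /* leftover tail bytes */
}
#endif

/* Safe default until add_sat_init() has checked the CPU. */
static add_sat_fn add_sat_impl = add_sat_scalar;

/* Call once at start-up, before the first add_sat() call. */
void add_sat_init(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        add_sat_impl = add_sat_avx2;
#endif
}

/* The only entry point the rest of the system calls. */
void add_sat(uint8_t *dst, const uint8_t *src, size_t n)
{
    add_sat_impl(dst, src, n);
}
```

Because callers only ever see `add_sat()`, a faster variant can be added later without touching any of them.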
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:Kernel Development

Post by Solar »

Candy wrote: I did use SSE, and the compiler didn't, does that matter? ;)
Basically, as soon as SIMD units are concerned (and the data in question actually encourages SIMD), there's hardly a way not to be better than the C compiler - because there's no way to express what you want using plain C. You might use libraries (which in turn are most likely ASM), or you might use compiler extensions, but "C" doesn't have SIMD.
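To illustrate the "compiler extensions" route, here is a sketch of a constant-alpha blend using SSE2 intrinsics from `<emmintrin.h>`, which GCC, Clang and MSVC provide but ISO C does not define. It assumes an x86 target with SSE2 enabled (always the case on x86-64) and 8-bit channels in a byte buffer; `blend_const_sse2` and `div255` are hypothetical names. It handles 16 bytes per iteration by widening them to 16-bit lanes, then finishes any leftover bytes with scalar code that uses the same rounding, so both paths give identical results.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics: a compiler extension, not ISO C */

/* Rounded x / 255 for 0 <= x <= 255 * 255, using only adds and shifts. */
static inline unsigned div255(unsigned x)
{
    unsigned t = x + 128;
    return (t + (t >> 8)) >> 8;
}

static void blend_const_sse2(uint8_t *dst, const uint8_t *src, size_t n, uint8_t alpha)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i a = _mm_set1_epi16((short)alpha);
    const __m128i inv = _mm_set1_epi16((short)(255 - alpha));
    const __m128i r128 = _mm_set1_epi16(128);
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        /* Widen 16 bytes into two vectors of eight 16-bit lanes each. */
        __m128i s_lo = _mm_unpacklo_epi8(s, zero), s_hi = _mm_unpackhi_epi8(s, zero);
        __m128i d_lo = _mm_unpacklo_epi8(d, zero), d_hi = _mm_unpackhi_epi8(d, zero);
        /* x = s * alpha + d * (255 - alpha) never exceeds 65025, so it fits in 16 bits. */
        __m128i x_lo = _mm_add_epi16(_mm_mullo_epi16(s_lo, a), _mm_mullo_epi16(d_lo, inv));
        __m128i x_hi = _mm_add_epi16(_mm_mullo_epi16(s_hi, a), _mm_mullo_epi16(d_hi, inv));
        /* Same div255 trick as the scalar helper, eight lanes at a time. */
        x_lo = _mm_add_epi16(x_lo, r128);
        x_hi = _mm_add_epi16(x_hi, r128);
        x_lo = _mm_srli_epi16(_mm_add_epi16(x_lo, _mm_srli_epi16(x_lo, 8)), 8);
        x_hi = _mm_srli_epi16(_mm_add_epi16(x_hi, _mm_srli_epi16(x_hi, 8)), 8);
        /* Every lane is now <= 255, so packing back to bytes loses nothing. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(x_lo, x_hi));
    }
    for (; i < n; i++) /* scalar tail with identical rounding */
        dst[i] = (uint8_t)div255(src[i] * (unsigned)alpha + dst[i] * (255u - alpha));
}
```

None of this is expressible in plain C; the intrinsics are exactly the kind of extension Solar means.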
So, simple solution, write it in C as much as you can, convert the parts that actually cost a lot of time to assembly, and have a near-assembly speed result with a lot less development worries and annoyances.
To quote D. Knuth (again): "premature optimization is the root of all evil". Candy has good advice here: write your code in C (or whatever), profile it to see where it really spends its time, optimize the C code, profile it again, and if your code is still sub-par after sane C optimization, then find the time-eating subroutine and code it in ASM.

Some call it the 80-20 rule, some the 90-10 rule, but it's true that most of the clock cycles are spent in just a few central functions. If your test data is good (meaning it represents real-life conditions), the above approach will give you the best return on investment, and basically that's what CS is all about, isn't it? ;-)
Every good solution is obvious once you've found it.
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Kernel Development

Post by Pype.Clicker »

btw, some of the most rewarding optimizations come from using the right algorithm/technique rather than playing with instruction encodings. Look at the problem of finding a key within a set (looking up a file in a directory, whatever)... of course you can use an asm function that does the comparison more efficiently than C, but you can also maintain your directory entries in such a way that you can apply a hash, a binary (dichotomic) search or a radix tree to find the file quicker...
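As a sketch of the algorithmic route: if directory entries are kept sorted by name, a binary (dichotomic) search finds one in O(log n) comparisons instead of scanning all n of them. The `struct dir_entry` layout and the `dir_lookup` name are hypothetical, and the array is assumed to be sorted already (for example by inserting entries in order).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory directory entry; a real file system would store more. */
struct dir_entry {
    const char *name;
    unsigned inode;
};

/* Binary search over entries sorted by strcmp() order.
 * Returns the matching entry, or NULL if the name is not present. */
static const struct dir_entry *dir_lookup(const struct dir_entry *ents, size_t n,
                                          const char *name)
{
    size_t lo = 0, hi = n; /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, ents[mid].name);
        if (c == 0)
            return &ents[mid];
        if (c < 0)
            hi = mid;     /* name sorts before ents[mid] */
        else
            lo = mid + 1; /* name sorts after ents[mid] */
    }
    return NULL;
}
```

With 1024 entries this needs at most about 10 name comparisons, a gain no amount of hand-tuned comparison code in a linear scan can match.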

And such complex algorithms are hard to write/maintain in ASM... remember your /dev/brain has a limited concentration quota, so except for ultra-organized people, writing very complex things in ASM tends to be impractical (if not impossible ;)