Finding (free) PIDs

Post by inx »

Thank you all for the useful advice. I've implemented a few of the suggestions, and everything's happily working. :)

After I got all of that done, my previously-working multitasking code died horribly. It would load the tasks, switch to user mode, and run just fine until the first timer interrupt, at which time it would promptly triple fault with a page fault from inside my task switching code.

I lost some sleep poking and prodding at it from every angle I could think of, even though everything *looked* correct to me. I took a break about an hour ago, then came back and decided to just give up for now, disable task switching, and start implementing my syscall interface. Lo and behold, as soon as I started scrolling down through my isr.S file, I found it...

Yesterday, when I was adding the first ISR stub for syscalls, I apparently accidentally hit x in vi. I had deleted a '$', and so exceptions worked, software interrupts worked, but IRQs fubar'ed the stack without giving me any compile warnings. :lol: Just thought you all might find it humorous.
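
For anyone who hasn't been bitten by that one: in AT&T syntax the '$' marks an immediate, so deleting it silently turns a constant into an absolute memory operand, and the assembler accepts both forms. A minimal illustration (my sketch, not inx's actual stub; 32-bit code):

Code:

/* Hypothetical demo, not from inx's isr.S: the same push with and
 * without the '$'. Both assemble without complaint, which is why
 * there was no warning. */
void demo(void)
{
	__asm__("pushl $0x20; addl $4, %esp"); /* pushes the constant 0x20 */
	__asm__("pushl 0x20; addl $4, %esp");  /* pushes the word stored AT address 0x20 */
}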

Post by Wave »

Thanks for the explanation, bewing.
Conway's Law: If you have four groups working on a compiler, you'll get a 4-pass compiler.
Melvin Conway

Post by skyking »

bewing wrote:
Wave wrote: Can you explain to me why that is? I always thought counting up would be faster because add is faster than sub, but the x86 has the loop instruction, which decrements the counter automatically. It is, however, slower than a sub/cmp/jmp sequence anyway.
When it comes to arithmetic operations, all CPUs have the same basic set of flags to test: Carry, Zero, Sign, Overflow. Immediately after an arithmetic opcode you can sometimes test them and get a useful result.
Not true. MIPS, for example, does not do it that way: it has no flags register at all, so you branch on the result of register comparisons instead.
bewing wrote:
skyking wrote: you don't save that clock cycle on a modern compiler
It is not always wise to count on the compiler to save your @$$ from crappy coding. And in any case, programmers often DO use the index inside the loop, and then the compiler won't touch the code. But you can still code the loop to count up or down.
It is also not wise to look at "how fast some current CPU performs this opcode vs. that one" -- all opcodes are tending toward the limit of one instruction per clock cycle. It is wisest to code for that.
I already assumed that we're running at one instruction per clock cycle. However, if you really care about saving one clock cycle per loop, you should start by not using a crappy compiler, and normally you'd be able to squeeze out a few more clock cycles by using assembly.

Btw, I tested your example from C source:

Code:

void loop(int* p, int n, int x)
{
	int j, k;

	for( j=0; j<n; j++ )
	{
		p[5*j] = x;
	}
}
versus:

Code:

void loop(int* p, int n, int x)
{
	int j = n; /* assumes n >= 1 */
	p = p + 5*(n-1);

	do
	{
		*p = x;
		p -= 5;
	} while( --j > 0 );
}
And ran it with n = 25*1024*1024, getting 400 ms vs. 391 ms. Your example saved 2% of the execution time. Congratulations, but didn't you promise a lot more?
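
(skyking didn't post his timing rig, so here is a minimal sketch of one that could produce such numbers. The clock_gettime call and the 5*n buffer size are my assumptions; older glibc needs -lrt for it.)

Code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void loop(int* p, int n, int x); /* either variant from above */

int main(void)
{
	int n = 25*1024*1024;
	/* the loops stride 5 ints per iteration, so the buffer needs 5*n ints */
	int* p = malloc((size_t)5 * n * sizeof *p);
	if (p == NULL)
		return 1;

	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	loop(p, n, 42);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	long ms = (long)(t1.tv_sec - t0.tv_sec) * 1000
	        + (t1.tv_nsec - t0.tv_nsec) / 1000000;
	printf("%ld ms\n", ms);
	free(p);
	return 0;
}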

Post by JamesM »

skyking wrote: Your example saved 2% of the execution time.
If I saved 2% of our company's product's execution time, I would get an absolutely massive bonus. No joke.

Post by bewing »

*sigh* The particular example I happened to code happened to use a memory store operation inside the loop. As you have now discovered, memory store operations (on average) do not happen in one clock cycle.
So the loops were not really five clock cycles vs. four. In fact, you seem to have determined that on average it was 50 vs. 49, so this particular memory store operation seems to take about 45 times as long as a register operation on average.
The savings depend on the tightness of the loop, as I said. If the loop that I had coded had used only register operations, and no memory operations at all, then the saving would have more closely approached 20%.

And Wave -- if you are truly convinced that addition is (and always will be) faster than subtraction, you can count UP from -N to 0, of course. You'd have to put a lot of explanations in your code, as to why you were doing such a crazy thing, of course. :wink:
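
(For illustration, a minimal sketch of that -N-to-0 idiom; biasing the pointer to one-past-the-end is my own way of writing it, not necessarily bewing's.)

Code:

/* Count UP from -n to 0: the loop still ends on a test against zero,
 * but the index update is an add. end points one element past the
 * last one written; the negative index reaches back from it. */
void loop_neg(int* p, int n, int x)
{
	int* end = p + 5*n;
	int j;

	for ( j = -n; j != 0; ++j )
		end[5*j] = x; /* j = -n ... -1 covers p[0] ... p[5*(n-1)] */
}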

Post by piranha »

So.....if you save the time, then why keep arguing over this?

Go with the faster way. Simple. And, again I must ask (as I'm a little confused), who cares? 2% time saved. Woopie. So you have a program that takes 36 hours to complete, and instead it takes (rounded) 35.3 hours. Does it matter? I'd be long gone before it even got to 10 min. If it took long enough for me to notice a difference, I wouldn't run the program.

-JL
SeaOS: Adding VT-x, networking, and ARM support
dbittman on IRC, @danielbittman on twitter
https://dbittman.github.io

Post by bewing »

piranha wrote: So.....if you save the time, then why keep arguing over this?
I'm not arguing; just showing the young'uns one tiny simple optimization.
piranha wrote: And, again I must ask (as I'm a little confused), who cares? 2% time saved. Woopie.
Because: This was just one easy small optimization. If I have 24 more of these tricks (and I do), then my code runs 50% faster. Does that begin to interest you?

Post by Solar »

skyking wrote: Btw, I tested your example from C source:
I'll swallow any criticism of your method (which I consider inappropriate) and instead show mine.

Code:

#include <stdlib.h>
#include <assert.h>

void loop_init( int * p, int n, int x )
{
    size_t i;
    for ( i = 0; i < n; ++i )
    {
        *p = x;
        p += 5;
    }
}

void loop_up( int * p, int n, int x )
{
    size_t i;
    for ( i = 0; i < n; ++i )
    {
        *p = x;
        p += 5;
    }
}

void loop_down( int * p, int n, int x )
{
    size_t i;
    for ( i = n; i > 0; --i )
    {
        *p = x;
        p += 5;
    }
}

int main()
{
    size_t limit = 25*1024*1024;
    int * p = (int *)malloc( 5 * limit * sizeof( int ) ); /* the loops stride 5 ints per iteration */
    assert( p != NULL );
    int i;
    loop_init( p, limit, 42 );
    for ( i = 0; i < 1000; ++i )
        loop_up( p, limit, 42 );
    for ( i = 0; i < 1000; ++i )
        loop_down( p, limit, 42 );
    return 0;
}
Using GCC 4.1.2 on Linux 2.6.23:

Code:

$ gcc -pg test.c
$ ./a.out
$ gprof a.out | head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 50.28     81.94   81.94      1000    81.94    81.94  loop_up
 50.10    163.58   81.65      1000    81.65    81.65  loop_down
...
More runs showed that:
  • Optimization settings (-O0, -O2) had no measurable influence on results,
  • loop_down() was faster than loop_up() by anywhere from 0.12% to 0.85%.
That's an efficiency increase well below 1%.

The disadvantages of counting down are the effort you would have to put into persuading your co-developers (on a pretty slim business case to start with), and - I am not making this up! - the two minutes I spent debugging loop_down() while loop_up() worked right from the start (loop_down() means --i, not ++i, dumbass 8) ).

In an environment like PDCLib, where efficiency really counts, I would actually be counting down - but not because of the odd cycle it saves, but because I could count down n directly instead of creating a local variable on the stack... so tastes differ. I don't think either of the two "tricks" makes any difference in the real world.
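
(A minimal sketch of that variant, my own rendering rather than actual PDCLib code: count the parameter n itself down to zero, so no local index is needed.)

Code:

/* Hypothetical rendering of Solar's point: reuse the parameter n as
 * the counter instead of creating a local index variable. */
void loop_param( int * p, int n, int x )
{
    while ( n-- > 0 )
    {
        *p = x;
        p += 5;
    }
}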
Every good solution is obvious once you've found it.

Post by skyking »

bewing wrote: *sigh* The particular example I happened to code happened to use a memory store operation inside the loop. As you have now discovered, memory store operations (on average) do not happen in one clock cycle.
So the loops were not really five clock cycles vs. four. In fact, you seem to have determined that on average it was 50 vs. 49, so this particular memory store operation seems to take about 45 times as long as a register operation on average.
The savings depend on the tightness of the loop, as I said. If the loop that I had coded had used only register operations, and no memory operations at all, then the saving would have more closely approached 20%.
But what meaningful work can you do in a loop that uses only register variables, uses the loop counter, and cannot be replaced by a constant-time computation? Finding such a constant-time replacement is far more valuable than chopping 1-2% off a loop.
bewing wrote: And Wave -- if you are truly convinced that addition is (and always will be) faster than subtraction, you can count UP from -N to 0, of course. You'd have to put a lot of explanations in your code, as to why you were doing such a crazy thing, of course. :wink:
Then you'd need that explanation if you count from N down to 0 as well.

The real problem with the optimization you suggested is that you did not measure the performance. Rule number one should be to measure before you try to optimize, and again afterwards. I've seen a lot of "optimizations" that increase the execution time.