BogoMips

GLneo
Member
Member
Posts: 237
Joined: Wed Dec 20, 2006 7:56 pm

BogoMips

Post by GLneo »

hi all, I think I know the answers to these questions, but I would just like to hear it from someone who would know better: what are BogoMips, how do I calculate them, and how can I use them?

thx! 8)
Brynet-Inc
Member
Member
Posts: 2426
Joined: Tue Oct 17, 2006 9:29 pm
Libera.chat IRC: brynet
Location: Canada
Contact:

Post by Brynet-Inc »

It was apparently introduced by Linus Torvalds.. :?

There is a page about it on Wikipedia; the summary reads as follows:
Wiki wrote: BogoMips (from "bogus" and MIPS) is an unscientific measurement of CPU speed made by the Linux kernel when it boots, to calibrate an internal busy-loop. An oft-quoted definition of the term is "the number of million times per second a processor can do absolutely nothing".

BogoMips can be used to see whether it is in the proper range for the particular processor, its clock frequency, and the potentially present CPU cache. It is not usable for performance comparison between different CPUs.
More here: http://en.wikipedia.org/wiki/BogoMips

Hope this helps... :)

EDIT: Some source code for researching it is available here... http://sweaglesw.com/~djwong/programs/bogomips/

EDIT2: It doesn't seem all that interesting.. :?
Twitter: @canadianbryan. Award by smcerm, I stole it. Original was larger.

Post by GLneo »

thanks, the links helped, i made this:

Code:

#define LPS_PREC 8

unsigned long loops_per_jiffy = (1<<12);
extern unsigned int HZ;  // should be 100
extern volatile unsigned long long total_ticks; // total tick count
extern void delay_loops(int loops);

void delay_loops(int loops)
{
    long i;
    for (i = loops; i >= 0 ; i--);
}

int calibrate_delay(void)  
{
    unsigned long ticks, loopbit;
    int lps_precision = LPS_PREC;

    loops_per_jiffy = (1<<12);

    while (loops_per_jiffy <<= 1) 
    {
        ticks = total_ticks;
        while (ticks == total_ticks);
        ticks = total_ticks;
        delay_loops(loops_per_jiffy);
        ticks = total_ticks - ticks;
        if (ticks)
            break;
    }
    loops_per_jiffy >>= 1;
    loopbit = loops_per_jiffy;
    while ( lps_precision-- && (loopbit >>= 1) ) 
    {
        loops_per_jiffy |= loopbit;
        ticks = total_ticks;
        while (ticks == total_ticks);
        ticks = total_ticks;
        delay_loops(loops_per_jiffy);
        if (total_ticks != ticks)
            loops_per_jiffy &= ~loopbit;
    }
    return loops_per_jiffy/(500000/HZ);
}
This code always returns zero (loops_per_jiffy = 4096 at the end), but I can't tell what's wrong. Is there an easier way?

thx!
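A note on the return value: with loops_per_jiffy = 4096 and HZ = 100, the last line computes 4096 / (500000 / 100) = 4096 / 5000, which is 0 in integer arithmetic, so a result of zero follows directly from the reported loops_per_jiffy. Below is a minimal sketch of how the whole and fractional parts can be separated with integer math. The helper names are made up for illustration, and it assumes HZ is non-zero and divides 5000 evenly.

```c
/* Whole part of the BogoMips value: loops_per_jiffy * HZ / 500000,
 * written as a division by (500000 / HZ) as in the posted routine.
 * Assumption: hz is non-zero and divides 500000 evenly. */
static unsigned long bogomips_whole(unsigned long lpj, unsigned long hz)
{
    return lpj / (500000 / hz);
}

/* Two decimal places of the same value, still using integers only.
 * Assumption: hz is non-zero and divides 5000 evenly. */
static unsigned long bogomips_hundredths(unsigned long lpj, unsigned long hz)
{
    return (lpj / (5000 / hz)) % 100;
}
```

Printing the two parts with a format such as "%lu.%02lu" gives 0.81 for loops_per_jiffy = 4096 at HZ = 100, which shows that the whole part alone hides small values.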

Post by Brynet-Inc »

Well I see you changed all "clock()" function calls to some apparently unset variable named "total_ticks"..

Maybe you should read the SUS:
SUS wrote:The clock() function shall return the implementation's best approximation to the processor time used by the process since the beginning of an implementation-defined era related only to the process invocation.
http://www.opengroup.org/onlinepubs/009 ... clock.html

:roll:

Changing it back to "clock()" at least to some extent prints some numbers out..

I also noticed HZ isn't assigned any value.. Should set it to 100 I guess..

Post by GLneo »

well HZ is 100, it is an extern variable, it is set elsewhere. As for total_ticks, it is the number of clock ticks since system start. I figured since it is volatile it should just change and you won't have to keep calling clock().

Post by Brynet-Inc »

GLneo wrote: well HZ is 100, it is an extern variable, it is set elsewhere. As for total_ticks, it is the number of clock ticks since system start. I figured since it is volatile it should just change and you won't have to keep calling clock().
Well, "clock()" doesn't return the time "since" the system started, I advise you to read the function description again :roll:

You shouldn't just "guess" what a function does when porting code. You should use some common sense, man, and look it up.

I'm off to bed! :lol:
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Post by Brendan »

Hi,

When Linus first decided to use "Bogomips" (back on the 80386), CPUs ran at a fixed frequency and a calibrated delay loop actually worked.

Modern CPUs don't run at a fixed frequency (due to power management, SMIs and hyper-threading) and so a calibrated delay loop doesn't work anymore.

The worst case is if the CPU is hot (running slow due to thermal throttling) when the delay loop is calibrated. In this case you might decide that a count of 10000 is equivalent to 100 nanoseconds, and then when the CPU cools down (running at normal speed again) you'd use 10000 when you need a 100 nanosecond delay and actually get a delay that is much shorter (like 50 nanoseconds).
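The arithmetic behind that example can be sketched as follows, assuming the time per loop iteration is inversely proportional to the CPU clock. The function name and the frequencies used are hypothetical and only serve to reproduce the 100 ns versus 50 ns case.

```c
#include <stdint.h>

/* Delay actually obtained from a loop count that was calibrated at
 * calib_khz but is executed while the CPU runs at current_khz.
 * Assumption: loop time scales inversely with clock frequency, and
 * current_khz is non-zero. */
static uint64_t actual_delay_ns(uint64_t intended_ns,
                                uint64_t calib_khz,
                                uint64_t current_khz)
{
    return intended_ns * calib_khz / current_khz;
}
```

For instance, a loop calibrated while throttled to 1 GHz and later run at 2 GHz turns an intended 100 ns into 50 ns, which is too short for the device being waited on.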

To solve the problem, Linux programmers didn't replace their broken delay loop with something that worked. Instead, each programmer kept using it and increased the counts for their loops when they discovered it didn't work. This means that now (for example) if a device needs a 5 ms delay, a device driver programmer will probably use a 15 ms delay "just in case", and most of the time it'll waste 10 ms for nothing.

If you must use a broken idea, at least put a dummy read from an I/O port in there like this:

Code:

void delay_loops(int loops)
{
    long i;
    for (i = loops; i >= 0 ; i--) {
        in_byte(dummy_IO_port);
    }
}
This makes the loop more dependent on (fixed) bus timing and less dependent on (variable) CPU timing. It'll still be broken, but it'll be much less broken.

A better approach is to use a timer for timing, but Linux programmers don't think like that...


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Post by Solar »

Brendan wrote:A better approach is to use a timer for timing, but Linux programmers don't think like that...
I don't like the Linux kernel maintainers either, but this is unjust. The BogoMips delay loop is intended to be used for extremely short timespans only, where "the time is too short and/or needs to be too exact for a non-busy-loop method of waiting" (quoted from the BogoMips FAQ).
Every good solution is obvious once you've found it.
os64dev
Member
Member
Posts: 553
Joined: Sat Jan 27, 2007 3:21 pm
Location: Best, Netherlands

Post by os64dev »

Solar wrote: The BogoMips delay loop is intended to be used for extremely short timespans only, where "the time is too short and/or needs to be too exact for a non-busy-loop method of waiting" (quoted from the BogoMips FAQ).
Strangely enough, given the arguments above about power management, SMIs and thermal throttling, and the fact that the use of RDTSC is discouraged for these same reasons, it seems rather contradictory or odd to use the phrase "the time is too short and/or needs to be too exact for a non-busy-loop method of waiting". BogoMips is not accurate. The suggestion made by Brendan about using I/O I find more acceptable. I have done some measurements with the outportb(0x80,0x80) suggested by Linux and found that on an AMD X2 it takes about 0.7 microseconds and on an Intel about 1 microsecond. That is rather accurate, so what processes need a higher resolution?
Author of COBOS

Post by Solar »

os64dev wrote: Strangely enough, given the arguments above about power management, SMIs and thermal throttling, and the fact that the use of RDTSC is discouraged for these same reasons, it seems rather contradictory or odd to use the phrase...
That phrase was the reason why BogoMips were originally invented, and also the reason why "using a timer" isn't really the alternative.
os64dev wrote: I have done some measurements...
...and thus "calibrated your delay loop", only with a port operation instead of BogoMips.

Post by Brendan »

Hi,
Solar wrote:
os64dev wrote: Strangely enough, given the arguments above about power management, SMIs and thermal throttling, and the fact that the use of RDTSC is discouraged for these same reasons, it seems rather contradictory or odd to use the phrase...
That phrase was the reason why BogoMips were originally invented, and also the reason why "using a timer" isn't really the alternative.
As I said, when Bogomips was first invented CPUs ran at a fixed frequency, and Bogomips was a perfectly valid and correct method at that time.

Things have changed since.

By "using a timer" I didn't necessarily mean waiting for an IRQ. For example, by repeatedly reading the count from one of the PIT timers (e.g. the one connected to the speaker if it's not used for much else) you can get a resolution of about 838 ns.
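The 838 ns figure follows from the PIT input clock of about 1.193182 MHz: one tick lasts 1 / 1193182 s, which is roughly 838 ns. Below is a rough sketch of the arithmetic a polling delay would need, assuming the counter is free-running, counts down by one per input clock, and wraps over the full 16-bit range (reload value 0). The names are hypothetical and the actual port reads are left out.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182ULL  /* nominal PIT input clock in Hz */

/* Number of PIT ticks needed to cover at least ns nanoseconds
 * (rounded up, so the delay is never shorter than requested). */
static uint64_t pit_ticks_for_ns(uint64_t ns)
{
    return (ns * PIT_INPUT_HZ + 999999999ULL) / 1000000000ULL;
}

/* Ticks elapsed between two reads of a 16-bit down-counter.
 * The cast handles the wrap from 0 back to 0xFFFF; the result is
 * only valid if fewer than 65536 ticks passed between the reads. */
static uint16_t pit_elapsed(uint16_t start, uint16_t now)
{
    return (uint16_t)(start - now);
}
```

A delay routine would latch and read the count once, then keep reading and accumulating pit_elapsed() between consecutive reads until the total reaches pit_ticks_for_ns() for the requested time.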

For newer CPUs there's RDTSC, which works on some CPUs but not others (Intel's SpeedStep is the main thing that breaks it). It wouldn't be hard to detect if it's usable or not, and use it when it is usable.

Then there's the local APIC timer count (not the local APIC timer IRQ) and possibly HPET. It isn't hard to detect if these are present.

There's also no reason you can't combine all of this...

Code:

    if( CPU_has_fixed_frequency_RDTSC() ) {
        use_calibrated_RDTSC();
    } else if( CPU_has_localAPIC() ) {
        use_calibrated_LAPIC_count();
    } else if( HPET_is_supported() ) {
        use_HPET();
    } else if( CPU_is_fixed_cycles_per_second() ) {
        use_calibrated_LOOP();
    } else {
        use_PIT_channel2_count();
    }
The idea is to use the most accurate method that works reliably on any system...
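The selection chain above can be sketched as a plain function over a set of capability flags. Everything here is an assumption made for illustration: the struct, the enum and the function name are hypothetical, and detecting each capability is left out.

```c
#include <stdbool.h>

/* Possible delay sources, in the preference order used above. */
enum delay_source {
    DELAY_TSC,
    DELAY_LAPIC_COUNT,
    DELAY_HPET,
    DELAY_CALIBRATED_LOOP,
    DELAY_PIT_CHANNEL2
};

/* Hypothetical result of hardware detection done elsewhere. */
struct timing_caps {
    bool fixed_frequency_tsc;     /* RDTSC ticks at a constant rate */
    bool local_apic;              /* local APIC timer count usable */
    bool hpet;                    /* HPET present */
    bool fixed_cycles_per_second; /* CPU clock never changes */
};

/* Pick the first usable source, falling back to PIT channel 2. */
static enum delay_source choose_delay_source(const struct timing_caps *caps)
{
    if (caps->fixed_frequency_tsc)
        return DELAY_TSC;
    if (caps->local_apic)
        return DELAY_LAPIC_COUNT;
    if (caps->hpet)
        return DELAY_HPET;
    if (caps->fixed_cycles_per_second)
        return DELAY_CALIBRATED_LOOP;
    return DELAY_PIT_CHANNEL2;
}
```

Keeping the decision in one function means it can be made once at boot and the result stored, so the delay path itself does not have to re-check the hardware.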

Of course for SMP systems using HPET and the PIT timer count would cause problems with 2 or more CPUs trying to read the timer count at the same time. Fortunately SMP systems have local APICs, so this never becomes a problem.

BTW the main problem for SMP is if the thread doing the delay is preempted and restarted on a different CPU. There are ways around this (including AMD's new RDTSCP instruction).


Cheers,

Brendan
mystran
Member
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

os64dev wrote: I have done some measurements with the outportb(0x80,0x80) suggested by Linux and found that on an AMD X2 it takes about 0.7 microseconds and on an Intel about 1 microsecond. That is rather accurate, so what processes need a higher resolution?
The only nasty part of 0x80 is that, precisely because it is a nice dummy port, Bochs reports it as an invalid DMA port when you enable DMA debugging. That floods the DMA debug messages and makes it impossible to find the really relevant lines.

So if you need to avoid that, you could try port 0xED. It's supposedly almost as safe, is Bochs friendly, and ... mm.. well.. yeah. It's a good alternative.

For details about that stuff, you could look in http://oinkzwurgl.org/downloads/attic/p ... index.html
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.