BogoMips
Posted: Tue Apr 24, 2007 2:49 pm
by GLneo
hi all, I think I know the answers to these questions, but I would just like to hear it from someone who would know better: what are BogoMips, how do I calculate them, and how can I use them?
thx!
Posted: Tue Apr 24, 2007 2:51 pm
by Brynet-Inc
It was apparently introduced by Linus Torvalds..
There is a page about it on Wikipedia; the summary reads as follows:
Wiki wrote:BogoMips (from "bogus" and MIPS) is an unscientific measurement of CPU speed made by the Linux kernel when it boots, to calibrate an internal busy-loop. An oft-quoted definition of the term is "the number of million times per second a processor can do absolutely nothing".
BogoMips can be used to see whether it is in the proper range for the particular processor, its clock frequency, and the potentially present CPU cache. It is not usable for performance comparison between different CPUs.
More here:
http://en.wikipedia.org/wiki/BogoMips
Hope this helps...
EDIT: Some source code for researching it is available here...
http://sweaglesw.com/~djwong/programs/bogomips/
EDIT2: It doesn't seem all that interesting..
Posted: Tue Apr 24, 2007 4:38 pm
by GLneo
thanks, the links helped. I made this:
Code: Select all
#define LPS_PREC 8

unsigned long loops_per_jiffy = (1 << 12);
extern unsigned int HZ;                         // should be 100
extern volatile unsigned long long total_ticks; // total tick count
extern void delay_loops(int loops);

void delay_loops(int loops)
{
    long i;
    for (i = loops; i >= 0; i--);
}

int calibrate_delay(void)
{
    unsigned long ticks, loopbit;
    int lps_precision = LPS_PREC;

    loops_per_jiffy = (1 << 12);
    while (loops_per_jiffy <<= 1)
    {
        ticks = total_ticks;
        while (ticks == total_ticks);
        ticks = total_ticks;
        delay_loops(loops_per_jiffy);
        ticks = total_ticks - ticks;
        if (ticks)
            break;
    }

    loops_per_jiffy >>= 1;
    loopbit = loops_per_jiffy;
    while (lps_precision-- && (loopbit >>= 1))
    {
        loops_per_jiffy |= loopbit;
        ticks = total_ticks;
        while (ticks == total_ticks);
        ticks = total_ticks;
        delay_loops(loops_per_jiffy);
        if (total_ticks != ticks)
            loops_per_jiffy &= ~loopbit;
    }

    return loops_per_jiffy / (500000 / HZ);
}
this code always returns zero (loops_per_jiffy is 4096 at the end), but I can't tell what's wrong. Is there an easier way?
thx!
Posted: Tue Apr 24, 2007 5:02 pm
by Brynet-Inc
Well, I see you changed all the "clock()" calls to some apparently unset variable named "total_ticks"..
Maybe you should read the SUS:
SUS wrote:The clock() function shall return the implementation's best approximation to the processor time used by the process since the beginning of an implementation-defined era related only to the process invocation.
http://www.opengroup.org/onlinepubs/009 ... clock.html
Changing it back to "clock()" at least prints some numbers out..
I also noticed HZ isn't assigned any value.. You should set it to 100, I guess..
Posted: Tue Apr 24, 2007 8:51 pm
by GLneo
well, HZ is 100; it's an extern variable set elsewhere. As for total_ticks, it's the number of clock ticks since system start. I figured since it's volatile it would just change, and you won't have to keep calling clock().
Posted: Tue Apr 24, 2007 9:19 pm
by Brynet-Inc
GLneo wrote:well, HZ is 100; it's an extern variable set elsewhere. As for total_ticks, it's the number of clock ticks since system start. I figured since it's volatile it would just change, and you won't have to keep calling clock().
Well, "clock()" doesn't return the time "since" the system started. I advise you to read the function description again.
You shouldn't just guess what a function does when porting code; use some common sense, man, and look it up.
I'm off to bed!
Posted: Wed Apr 25, 2007 2:49 am
by Brendan
Hi,
When Linus first decided to use "Bogomips" (back on the 80386), CPUs ran at a fixed frequency and a calibrated delay loop actually worked.
Modern CPUs don't run at a fixed frequency (due to power management, SMIs and hyper-threading) and so a calibrated delay loop doesn't work anymore.
Worst case is if the CPU is hot (running slowly due to thermal throttling) when the delay loop is calibrated. In this case you might decide that a count of 10000 is equivalent to 100 nanoseconds, and then when the CPU cools down (running at normal speed again) you'd use 10000 when you need a 100 nanosecond delay and actually get a delay that is much shorter (like 50 nanoseconds).
To solve the problem, Linux programmers didn't replace their broken delay loop with something that worked. Instead, each programmer kept using it and increased the counts for their loops when they discovered it didn't work. This means that now (for example) if a device needs a 5 ms delay, a device driver programmer will probably use a 15 ms delay "just in case", and most of the time it'll waste 10 ms for nothing.
If you must use a broken idea, at least put a dummy read from an I/O port in there like this:
Code: Select all
void delay_loops(int loops)
{
    long i;
    for (i = loops; i >= 0; i--) {
        in_byte(dummy_IO_port);
    }
}
This makes the loop more dependent on (fixed) bus timing and less dependent on (variable) CPU timing. It'll still be broken, but it'll be much less broken.
A better approach is to use a timer for timing, but Linux programmers don't think like that...
Cheers,
Brendan
Posted: Wed Apr 25, 2007 6:02 am
by Solar
Brendan wrote:A better approach is to use a timer for timing, but Linux programmers don't think like that...
I don't like the Linux kernel maintainers either, but this is unjust. The BogoMips delay loop is intended to be used for extremely short timespans only, where "the time is too short and/or needs to be too exact for a non-busy-loop method of waiting" (quoted from the BogoMips FAQ).
Posted: Wed Apr 25, 2007 7:06 am
by os64dev
Solar wrote:The BogoMips delay loop is intended to be used for extremely short timespans only, where "the time is too short and/or needs to be too exact for a non-busy-loop method of waiting" (quoted from the BogoMips FAQ).
Strangely enough, given the arguments above about power management, SMIs and thermal throttling (and the fact that the use of RDTSC is discouraged for the same reasons), it seems rather contradictory to use the phrase "the time is too short and/or needs to be too exact for a non-busy-loop method of waiting". BogoMips is not accurate. The suggestion Brendan made about using I/O I find more acceptable. I have done some measurements with the outportb(0x80, 0x80) suggested by Linux and found that on an AMD X2 it takes about 0.7 microseconds and on an Intel about 1 microsecond. That is rather accurate; what processes need a higher resolution?
Posted: Wed Apr 25, 2007 7:31 am
by Solar
os64dev wrote:Strangely enough, given the arguments above about power management, SMIs and thermal throttling (and the fact that the use of RDTSC is discouraged for the same reasons), it seems rather contradictory to use the phrase...
That phrase was the reason why BogoMips were originally invented, and also the reason why "using a timer" isn't really the alternative.
I have done some measurements...
...and thus "calibrated your delay loop", only with a port operation instead of BogoMips.
Posted: Wed Apr 25, 2007 8:11 am
by Brendan
Hi,
Solar wrote:os64dev wrote:Strangely enough, given the arguments above about power management, SMIs and thermal throttling (and the fact that the use of RDTSC is discouraged for the same reasons), it seems rather contradictory to use the phrase...
That phrase was the reason why BogoMips were originally invented, and also the reason why "using a timer" isn't really the alternative.
As I said, when Bogomips was first invented CPUs ran at a fixed frequency, and Bogomips was a perfectly valid and correct method at that time.
Things have changed since.
By "using a timer" I didn't necessarily mean waiting for an IRQ. For example, by repeatedly reading the count from one of the PIT timers (e.g. the one connected to the speaker if it's not used for much else) you can get a resolution of about 838 ns.
For newer CPUs there's RDTSC, which works on some CPUs but not others (Intel's SpeedStep is the main thing that breaks it). It wouldn't be hard to detect if it's usable or not, and use it when it is usable.
Then there's the local APIC timer count (not the local APIC timer IRQ) and possibly HPET. It isn't hard to detect if these are present.
There's also no reason you can't combine all of this...
Code: Select all
if( CPU_has_fixed_frequency_RDTSC() ) {
    use_calibrated_RDTSC();
} else if( CPU_has_localAPIC() ) {
    use_calibrated_LAPIC_count();
} else if( HPET_is_supported() ) {
    use_HPET();
} else if( CPU_is_fixed_cycles_per_second() ) {
    use_calibrated_LOOP();
} else {
    use_PIT_channel2_count();
}
The idea is to use the most accurate method that works reliably on any system...
Of course for SMP systems, using HPET or the PIT timer count would cause problems with 2 or more CPUs trying to read the timer count at the same time. Fortunately, SMP systems have local APICs, so this never becomes a problem.
BTW the main problem for SMP is if the thread doing the delay is preempted and restarted on a different CPU. There are ways around this (including AMD's new RDTSCP instruction).
Cheers,
Brendan
Posted: Wed Apr 25, 2007 11:48 am
by mystran
os64dev wrote: I have done some measurements with the outportb(0x80, 0x80) suggested by Linux and found that on an AMD X2 it takes about 0.7 microseconds and on an Intel about 1 microsecond. That is rather accurate; what processes need a higher resolution?
The only nasty part of 0x80 is that it happens to be a nice dummy port, which Bochs happens to report as an invalid DMA port (that's why it's a nice dummy port) when you enable DMA debugging in Bochs. That makes the DMA debug messages a total flood, and it becomes impossible to find the really relevant lines..
So if you need to avoid that, you could try port 0xED. It's supposedly almost as safe, is Bochs-friendly, and... mm... well... yeah. It's a good alternative.
For details about that stuff, you could look in
http://oinkzwurgl.org/downloads/attic/p ... index.html