rdos wrote:Results for my 6-core AMD Phenom: (at 2.8 GHz)
near: 44.7 million calls per second
gate: 12.0 million calls per second
And for my 2-core Intel Atom (at 3GHz)
near: 24.4 million calls per second
gate: 2.4 million calls per second
That means that the Phenom is, by a large margin, the best processor to use with RDOS. This is also evident in various test programs.
It would be interesting to test on the Intel core duo as well, but I wonder if it booted?
You tested two microarchitectures with one very specific (and, some might say, useless) benchmark, and you declare that a Phenom is the best CPU to use with RDOS? Rather bold. I'd be happy to crush your benchmarks if you're willing to provide an image of your OS for me.
Normal calls without segment/privilege changes should only take a couple of clock cycles. This makes it very difficult to measure from userspace, with any degree of accuracy, how long they actually take.
After some tweaking of the parameters, this is the best benchmark I could come up with:
#include <time.h>
#include <iostream>
using namespace std;

// MSVC-only (32-bit): uses the _asm inline assembler.
int main()
{
    unsigned long long total = 0;
    unsigned long long low = 0xFFFFFFFFFFFFFFFFULL; // start at the maximum value
    unsigned long long high = 0;
    for (int i = 0; i < 1000; i++)
    {
        clock_t start = clock();
        _asm
        {
            push ecx
            mov ecx, 11000000   // number of call/ret pairs per sample
        callfunc:
            call function
            sub ecx, 1
            jnz callfunc
            jmp done
        function:
            ret                 // empty function: returns immediately
        done:
            pop ecx
        }
        clock_t end = clock();
        unsigned long long elapsed = end - start;
        total += elapsed;
        // Track both extremes independently, so the first sample can set both.
        if (low > elapsed)
            low = elapsed;
        if (high < elapsed)
            high = elapsed;
    }
    cout << "Total: " << total << endl;
    cout << "Average: " << (double)total / 1000.0 << endl;
    cout << "Low: " << low << endl;
    cout << "High: " << high << endl;
    return 0;
}
Running this gave the result:
Total: 16510
Average: 16.51
Low: 2
High: 31
Those times are in milliseconds, BTW (CLOCKS_PER_SEC = 1000). The results are pretty inconsistent, but using the low of 2 ms for 11 million function calls yields 5.5 billion function calls per second. That doesn't seem possible on my 4 GHz CPU, but I assume that's because of clock() rounding. Even if 3 ms was the real best case for 11 million function calls, that gives us 3.67 billion function calls per second, which is a little more reasonable, but it's still barely over 1 clock cycle per call/ret pair. The average across the whole test was 666 million function calls per second, which is 6 cycles per call/ret pair.
Given this data, I think it's reasonable to say a cached call/ret pair takes ~4 clock cycles on a modern CPU (this test was run on a Nehalem CPU).