is my micro-kernel too slow?

lemonyii · Post by **lemonyii** » Mon Aug 30, 2010 4:07 am

hi,
i added my os to "what's your os like" today: http://forum.osdev.org/viewtopic.php?f= ... &start=855 (at the bottom, with the colorful big text "Trive" on the screenshot).
but one question still confuses me: how fast does a mkernel have to be to be acceptable?

I tested it on my real machine with an AMD Athlon II X2 240 today, with no other processes and no interrupts, only the scheduler, the messager, and two threads.
In message test, it can send&rec 2,000,000 messages per second.
In mem alloc/free test, it can alloc&free 400,000 pages of 4k per second(2M page may be 100 times faster).

I'm wondering if it is too slow (i feel it is). i use regs to pass the messages like L4, and i think the only benefit is avoiding interprocess copy, with the huge cost of saving and restoring regs in send() and rec(), which have to be written in assembly.
How fast is your micro-kernel? And if yours is much faster, would you please describe your algorithm?
Thank you!

lemonyii · Post by **lemonyii** » Mon Aug 30, 2010 8:21 pm

maybe

thank you!

gerryg400 · Post by **gerryg400** » Mon Aug 30, 2010 8:54 pm

lemonyii wrote:In message test, it can send&rec 2,000,000 messages per second.

I would like to compare with my kernel but I have some questions.

1. What size are the messages ?
2. Are both threads in the same address space ?
3. Are you sending ring0 to ring0 or userspace to userspace ?
4. Is the messaging synchronous ?
5. Do you include both the sending function and receiving function ? Or is the message queued ? Or something else ?

Owen · Post by **Owen** » Tue Aug 31, 2010 4:28 am

6. What are the specifications of the machine you are testing this on?

gerryg400 · Post by **gerryg400** » Tue Aug 31, 2010 9:58 pm

lemonyii wrote:In mem alloc/free test, it can alloc&free 400,000 pages of 4k per second(2M page may be 100 times faster).

With regards to this test, I also have some questions.

1. Are the pages pre-mapped or does your alloc allocate and map physical memory as well.
2. Do you touch the memory that is allocated or free immediately ?
3. When you free, do you unmap the physical mem ? INVLPG etc. ?
4. Do you alloc the 400,000 pages then free the 400,000 pages OR alloc 1, free 1, 400,000 times ?

Brendan · Post by **Brendan** » Wed Sep 01, 2010 4:02 am

Hi,

gerryg400 wrote:1. What size are the messages ?

"i use regs to pass the messages like L4" would imply that the messages are relatively small - maybe 32 bytes for a 32-bit OS (or 128 bytes for a 64-bit OS that isn't designed to run 32-bit processes).

gerryg400 wrote:2. Are both threads in the same address space ?

On a 166 MHz Pentium, L4 papers claim 21 cycles for sending a message from a small address space to a small address space, and 190 to 1828 cycles for sending a message from a large address space to a large address space. An AMD Athlon II X2 240 is a dual-core CPU running at 2.8 GHz. If only one core is being used, 2000000 messages per second works out to 1400 cycles per message. From this I assume both threads are in different address spaces.
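
The cycle count above can be reproduced with a small back-of-envelope sketch. The function name is made up for illustration, and it assumes the core runs at its nominal 2.8 GHz for the whole test with only one core in use:

```python
# Back-of-envelope: average CPU cycles available per operation on one core.
# Assumption: the core runs at its nominal clock rate for the whole test.

def cycles_per_operation(clock_hz, ops_per_second):
    """Average CPU cycles spent per operation at the given clock rate."""
    return clock_hz / ops_per_second

CLOCK_HZ = 2_800_000_000          # 2.8 GHz, as stated in the post
MESSAGES_PER_SECOND = 2_000_000   # figure reported by lemonyii

print(cycles_per_operation(CLOCK_HZ, MESSAGES_PER_SECOND))  # 1400.0
```

This is only an average; it says nothing about where inside send/receive those cycles go.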

gerryg400 wrote:4. Is the messaging synchronous ?

"and i think the only benefit is avoiding interprocess copy" implies that the message data isn't stored anywhere. For asynchronous messaging you have to store the message data somewhere (until the receiver is ready to receive it), so I'd say it's synchronous. Note: the "avoiding copy" part also makes me assume the tests are only using one core of the dual-core CPU.

gerryg400 wrote:5. Do you include both the sending function and receiving function ? Or is the message queued ? Or something else ?

If my assumptions are correct, then sending a message from one task to another involves loading the data into registers and doing a task switch. In this case it'd be very difficult (impossible?) to test sending alone or to test receiving alone, and therefore it'd be measuring both (e.g. 1400 cycles for send and receive).

Owen wrote:6. What are the specifications of the machine you are testing this on?

I'm not sure that matters too much. An AMD Athlon II X2 240 is a dual-core CPU running at 2.8 GHz. For a ping-pong test everything would remain in the CPU's caches (RAM speeds would have little or no impact) and no other hardware would be used.

gerryg400 wrote:1. Are the pages pre-mapped or does your alloc allocate and map physical memory as well.

"it can alloc&free 400,000 pages of 4k per second" - from this I assume it allocates a page, then frees the allocated page.

For a 2.8 GHz CPU this works out to 7000 cycles for both allocate and free; which is exactly 5 times as much as sending/receiving a message. From this I expect that the memory manager is running in a separate (large) address space; and the task sends an "allocate a page" message to the memory manager, the memory manager sends a reply, the task sends a "free a page" message to the memory manager and the memory manager sends a reply. The messaging would account for 5600 of the 7000 cycles. I'd assume the remaining cycles would be used for finding/freeing the physical page (inside the memory manager), and adding and removing the page to/from the page tables and invalidating the TLB (inside the kernel?).
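
As a rough sketch of that cycle budget: the four-message model (request, reply, request, reply) is the guess made in this paragraph, not something measured, and the variable names are only illustrative.

```python
# Cycle budget for one "allocate a page, then free it" round trip.
# Assumption: each alloc and each free costs one request message plus one
# reply message, and every message costs the same as in the ping-pong test.

CLOCK_HZ = 2_800_000_000          # 2.8 GHz, one core
MESSAGES_PER_SECOND = 2_000_000   # reported message throughput
ALLOC_FREE_PER_SECOND = 400_000   # reported alloc+free throughput

cycles_per_message = CLOCK_HZ // MESSAGES_PER_SECOND         # 1400
cycles_per_alloc_free = CLOCK_HZ // ALLOC_FREE_PER_SECOND    # 7000

MESSAGES_PER_ALLOC_FREE = 4  # assumed: alloc request/reply + free request/reply
messaging_cycles = MESSAGES_PER_ALLOC_FREE * cycles_per_message   # 5600
remaining_cycles = cycles_per_alloc_free - messaging_cycles       # 1400

print(cycles_per_alloc_free, messaging_cycles, remaining_cycles)  # 7000 5600 1400
```

If the allocator is reached some other way (for example directly through a kernel entry rather than by messages), the split between "messaging" and "real work" would look quite different.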

gerryg400 wrote:4. Do you alloc the 400,000 pages then free the 400,000 pages OR alloc 1, free 1, 400,000 times ?

I'd run the test for several seconds (maybe a minute) and then calculate the "pages per second" by dividing the count by the time taken (mostly to get a more accurate result). If this is 32-bit code, and if all pages are allocated (and then all pages are freed), it'd consume about 1.5 GiB of RAM (and address space) per second; the test would run out of RAM too quickly (and if it's a 32-bit OS with maybe 3 GiB of address space to play with, it'd run out of space before 2 seconds have passed).
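
A quick sketch of that estimate, assuming 4 KiB pages, the reported 400,000 pages per second, and a hypothetical 3 GiB of usable address space:

```python
# How fast would "allocate everything first, free later" eat memory?
# Assumptions: 4 KiB pages, 400,000 allocations per second, 3 GiB usable space.

PAGE_SIZE = 4096             # bytes per page
PAGES_PER_SECOND = 400_000   # reported allocation rate
GIB = 1 << 30                # bytes in one GiB

bytes_per_second = PAGE_SIZE * PAGES_PER_SECOND
seconds_to_fill_3gib = (3 * GIB) / bytes_per_second

print(bytes_per_second)                 # 1638400000 (about 1.6 GB, or 1.53 GiB)
print(round(seconds_to_fill_3gib, 3))   # 1.966
```

So a test that never frees would exhaust a 3 GiB space in just under 2 seconds at that rate.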

From the time it takes I'm sure it's allocating and freeing 1 page at a time (e.g. it's not asking the memory manager to allocate or free multiple pages with one message). Based on both of these things, I'd assume the test is probably "allocate 1 page then free that page" done in a loop; partly because this is the easiest way to write the test (and easier than "allocate a block of pages one at a time, then free the block of pages one at a time" in a loop).
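
The "run for a while, then divide the count by the time taken" approach could be sketched like this. It is a host-side illustration only: `measure_rate` and `alloc_then_free` are hypothetical names, the `bytearray` is just a stand-in for a real page allocation, and a real kernel test would be written in C or assembly against the kernel's own allocator and timer.

```python
import time

def measure_rate(operation, duration_s, clock=time.perf_counter):
    """Call operation() repeatedly for about duration_s seconds.

    Returns the number of completed calls divided by the time actually
    elapsed, i.e. operations per second.
    """
    start = clock()
    count = 0
    while True:
        operation()
        count += 1
        elapsed = clock() - start
        if elapsed >= duration_s:
            return count / elapsed

def alloc_then_free():
    # Hypothetical stand-in for "allocate 1 page then free that page".
    page = bytearray(4096)
    del page

# Short run for demonstration; a real measurement would run much longer.
print(measure_rate(alloc_then_free, 0.1))
```

Reading the clock on every iteration adds overhead of its own; checking it only every N iterations would disturb the measurement less.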

Cheers,

Brendan

lemonyii · Post by **lemonyii** » Tue Sep 14, 2010 4:41 am

ooohhhh!!!!
sorry for my absence for weeks.
ONLY one thing Brendan said is wrong:

Brendan wrote:"it can alloc&free 400,000 pages of 4k per second" - from this I assume it allocates a page, then frees the allocated page.

in fact i allocate 2 pages, and free one of them, so the allocating speed will be an average number.
and i tested only 1.6G of memory, since i have only 2G of phys mem.
another thing i want to point out is, my page allocation is done in the microkernel (the test task calls it through LEVEL0), so it gets the phys memory.
my microkernel has only one entry: 256 ints, and only one exit: the scheduler. So both tests enter the mkernel, but the msg test has more things to do: check status, copy msgs, and cut down/put back the threads.

good news: i've been able to draw Unicode text (i tested Chinese, and the font needs to be more beautiful) and 24-bit bmp files in graphic mode now.
bad news: my malloc() in user space failed. and i'm still working on it.
question: c++ for kernel? in my new post.

thanks!
lemonyii

OSDev.org