Hi,
I added my OS to "what's your os like" today: http://forum.osdev.org/viewtopic.php?f= ... &start=855 (at the bottom, with the big colorful text "Trive" in the screenshot).
But one question still confuses me: how fast does a microkernel have to be to be acceptable?
I tested it today on my real machine with an AMD Athlon II X2 240, with no other processes and no interrupts: only the scheduler, the messenger, and two threads.
In the message test, it can send and receive 2,000,000 messages per second.
In the memory alloc/free test, it can allocate and free 400,000 4 KiB pages per second (2 MiB pages may be 100 times faster).
I'm wondering whether that is too slow (it feels that way to me). I use registers to pass the messages, like L4, and I think the only benefit is avoiding an interprocess copy, at the huge cost of saving and restoring registers in send() and rec(), which have to be written in assembly.
How fast is your microkernel? And if yours is much faster, would you please describe your algorithm?
Thank you!
is my micro-kernel too slow?
Enjoy my life!------A fish with a tattooed retina
Re: is my micro-kernel too slow?
lemonyii wrote: In message test, it can send&rec 2,000,000 messages per second.
I would like to compare with my kernel but I have some questions.
1. What size are the messages?
2. Are both threads in the same address space?
3. Are you sending ring0 to ring0 or userspace to userspace?
4. Is the messaging synchronous?
5. Do you include both the sending function and receiving function? Or is the message queued? Or something else?
If a trainstation is where trains stop, what is a workstation ?
- Owen
Re: is my micro-kernel too slow?
6. What are the specifications of the machine you are testing this on?
Re: is my micro-kernel too slow?
lemonyii wrote: In mem alloc/free test, it can alloc&free 400,000 pages of 4k per second(2M page may be 100 times faster).
With regards to this test, I also have some questions.
1. Are the pages pre-mapped, or does your alloc allocate and map physical memory as well?
2. Do you touch the memory that is allocated, or free it immediately?
3. When you free, do you unmap the physical memory? INVLPG etc.?
4. Do you alloc the 400,000 pages then free the 400,000 pages, OR alloc 1, free 1, 400,000 times?
Re: is my micro-kernel too slow?
Hi,
gerryg400 wrote: 1. What size are the messages?
"i use regs to pass the messages like L4" would imply that the messages are relatively small - maybe 32 bytes for a 32-bit OS (or 128 bytes for a 64-bit OS that isn't designed to run 32-bit processes).
gerryg400 wrote: 2. Are both threads in the same address space?
On a 166 MHz Pentium, L4 papers claim 21 cycles for sending a message from a small address space to a small address space, and 190 to 1828 cycles for sending a message from a large address space to a large address space. An AMD Athlon II X2 240 is a dual-core CPU running at 2.8 GHz. If only one core is being used, 2,000,000 messages per second works out to 1400 cycles per message. From this I assume both threads are in different address spaces.
gerryg400 wrote: 4. Is the messaging synchronous?
"and i think the only benifit is avoiding interprocess copy" implies that the message data isn't stored anywhere. For asynchronous you have to store the message data somewhere (until the receiver is ready to receive it), so I'd say it's synchronous. Note: the "avoiding copy" part also makes me assume the tests are only using one core of the dual-core CPU.
gerryg400 wrote: 5. Do you include both the sending function and receiving function? Or is the message queued? Or something else?
If my assumptions are correct, then sending a message from one task to another involves loading the data into registers and doing a task switch. In this case it'd be very difficult (impossible?) to test sending alone or to test receiving alone, and therefore it'd be measuring both (e.g. 1400 cycles for send and receive).
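To make the "load the data into registers and switch tasks" idea concrete, here is a minimal model in plain C, not kernel code. It assumes a 32-bit OS with 8 message registers (32 bytes) and a kernel that saves each thread's registers into a frame on entry. The names `struct thread`, `ipc_send`, `ipc_receive` and `MSG_REGS` are hypothetical and are not taken from lemonyii's kernel or from L4.

```c
#include <stdint.h>
#include <string.h>

#define MSG_REGS 8  /* assumption: 8 x 32-bit registers = 32-byte message */

enum thread_state { THREAD_RUNNING, THREAD_BLOCKED_IN_RECEIVE };

struct thread {
    enum thread_state state;
    uint32_t saved_regs[MSG_REGS];  /* register frame saved on kernel entry */
};

/* The receiver declares that it is waiting for a message. */
void ipc_receive(struct thread *self)
{
    self->state = THREAD_BLOCKED_IN_RECEIVE;
}

/*
 * Synchronous send: the message exists only in the sender's saved registers,
 * so it can be delivered only if the receiver is already waiting. Returns 0
 * on delivery and -1 otherwise (a real kernel would block the sender here
 * instead of failing).
 */
int ipc_send(const struct thread *sender, struct thread *receiver)
{
    if (receiver->state != THREAD_BLOCKED_IN_RECEIVE)
        return -1;

    /* "Delivery" is a copy between register frames; when the receiver is
       resumed, the message is already in its registers. */
    memcpy(receiver->saved_regs, sender->saved_regs,
           sizeof receiver->saved_regs);
    receiver->state = THREAD_RUNNING;
    return 0;
}
```

Because nothing is queued, a send and the matching receive always complete together, which is why a ping-pong test of this kind ends up measuring both at once.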
Owen wrote: 6. What are the specifications of the machine you are testing this on?
I'm not sure that matters too much. An AMD Athlon II X2 240 is a dual-core CPU running at 2.8 GHz. For a ping-pong test everything would remain in the CPU's caches (RAM speeds would have little or no impact) and no other hardware would be used.
gerryg400 wrote: 1. Are the pages pre-mapped or does your alloc allocate and map physical memory as well.
"it can alloc&free 400,000 pages of 4k per second" - from this I assume it allocates a page, then frees the allocated page.
For a 2.8 GHz CPU this works out to 7000 cycles for both allocate and free; which is exactly 5 times as much as sending/receiving a message. From this I expect that the memory manager is running in a separate (large) address space; and the task sends an "allocate a page" message to the memory manager, the memory manager sends a reply, the task sends a "free a page" message to the memory manager and the memory manager sends a reply. The messaging would account for 5600 of the 7000 cycles. I'd assume the remaining cycles would be used for finding/freeing the physical page (inside the memory manager), and adding and removing the page to/from the page tables and invalidating the TLB (inside the kernel?).
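The cycle budget in this estimate can be reproduced with a few lines of C. This is only a sketch of the arithmetic; the four-messages-per-pair figure is the assumption made in the post (request and reply for allocate, request and reply for free), and `struct alloc_budget` and `estimate_alloc_budget` are names invented for the example.

```c
#include <stdint.h>

struct alloc_budget {
    uint64_t total_cycles;      /* cycles per allocate+free pair */
    uint64_t messaging_cycles;  /* cycles attributed to IPC */
    uint64_t remaining_cycles;  /* left for allocator, page tables, TLB */
};

/* Splits the per-pair cycle count into an IPC share and a remainder.
   Returns an all-zero budget if either throughput is 0. */
struct alloc_budget estimate_alloc_budget(uint64_t clock_hz,
                                          uint64_t msgs_per_sec,
                                          uint64_t pairs_per_sec,
                                          uint64_t msgs_per_pair)
{
    struct alloc_budget b = {0, 0, 0};

    if (msgs_per_sec == 0 || pairs_per_sec == 0)
        return b;

    b.total_cycles = clock_hz / pairs_per_sec;
    b.messaging_cycles = msgs_per_pair * (clock_hz / msgs_per_sec);
    if (b.total_cycles > b.messaging_cycles)
        b.remaining_cycles = b.total_cycles - b.messaging_cycles;
    return b;
}
```

With a 2.8 GHz clock, 2,000,000 messages per second, 400,000 allocate/free pairs per second and 4 messages per pair, this gives 7000 total cycles, 5600 for messaging and 1400 left over.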
gerryg400 wrote: 4. Do you alloc the 400,000 pages then free the 400,000 pages OR alloc 1, free 1, 400,000 times?
I'd run the test for several seconds (maybe a minute) and then calculate the "pages per second" by dividing the count by the time taken (mostly to get a more accurate result). If this is 32-bit code, and if all pages are allocated (and then all pages are freed) it'd consume about 1.5 GiB of RAM (and address space) per second; and the test would run out of RAM too quickly (and if it's a 32-bit OS with maybe 3 GiB of address space to play with it'd run out of space before 2 seconds have passed).
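A sketch of the "run for a while, then divide the count by the time taken" measurement, written as hosted C using the standard `clock()` function (which reports CPU time, a reasonable stand-in for a single busy thread). Inside a kernel you would read a cycle counter or timer instead. `measure_ops_per_second`, `bench_op` and `count_call` are hypothetical names, and `count_call` is only a placeholder workload.

```c
#include <time.h>

typedef void (*bench_op)(void *ctx);

/* Placeholder workload: counts how often it was called. */
void count_call(void *ctx)
{
    ++*(unsigned long *)ctx;
}

/* Calls op(ctx) repeatedly until at least min_seconds of CPU time have
   passed, then returns operations per second. Returns 0.0 if no time
   elapsed (e.g. min_seconds <= 0 on a coarse clock). */
double measure_ops_per_second(bench_op op, void *ctx, double min_seconds)
{
    unsigned long long count = 0;
    double elapsed = 0.0;
    clock_t start = clock();

    do {
        /* Run in batches so the cost of clock() is amortized. */
        for (int i = 0; i < 1000; i++)
            op(ctx);
        count += 1000;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);

    if (elapsed <= 0.0)
        return 0.0;
    return (double)count / elapsed;
}
```

Passing the real "allocate a page, free a page" step as `op` and something like 10.0 or 60.0 as `min_seconds` would give the averaged rate described above.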
From the time it takes I'm sure it's allocating and freeing 1 page at a time (e.g. it's not asking the memory manager to allocate or free multiple pages with one message). Based on both of these things, I'd assume the test is probably "allocate 1 page then free that page" done in a loop; partly because this is the easiest way to write the test (and easier than "allocate a block of pages one at a time, then free the block of pages one at a time" in a loop).
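The two test shapes can be compared on a toy allocator. The sketch below assumes a stack-based free list over a small pool of page frames; the pool size, the allocator and all function names are invented for illustration and say nothing about how lemonyii's kernel allocates pages.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_FRAMES 1024u          /* hypothetical pool size */
#define NO_FRAME   UINT32_MAX     /* returned when the pool is empty */

static uint32_t free_stack[NUM_FRAMES];
static size_t free_top;

void frames_init(void)
{
    free_top = 0;
    for (uint32_t f = 0; f < NUM_FRAMES; f++)
        free_stack[free_top++] = f;
}

size_t frames_available(void) { return free_top; }

uint32_t frame_alloc(void)
{
    return free_top ? free_stack[--free_top] : NO_FRAME;
}

void frame_free(uint32_t frame)
{
    if (free_top < NUM_FRAMES)
        free_stack[free_top++] = frame;
}

/* Pattern A: allocate 1 page, free it, repeat. Never holds more than one
   frame, so it can run for any number of iterations. */
size_t run_alloc_free_pairs(size_t iterations)
{
    size_t done = 0;
    for (size_t i = 0; i < iterations; i++) {
        uint32_t f = frame_alloc();
        if (f == NO_FRAME)
            break;
        frame_free(f);
        done++;
    }
    return done;
}

/* Pattern B: allocate a block one page at a time, then free the block one
   page at a time. Returns how many frames it actually obtained, which is
   capped by the size of the pool. */
size_t run_alloc_block_then_free(size_t batch)
{
    static uint32_t held[NUM_FRAMES];
    size_t n = 0;

    while (n < batch && n < NUM_FRAMES) {
        uint32_t f = frame_alloc();
        if (f == NO_FRAME)
            break;
        held[n++] = f;
    }
    size_t obtained = n;
    while (n > 0)
        frame_free(held[--n]);
    return obtained;
}
```

Pattern A completes 400,000 iterations on a 1024-frame pool, while pattern B stops at 1024 frames no matter how large a batch is requested, which is the "run out of RAM too quickly" effect in miniature.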
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: is my micro-kernel too slow?
ooohhhh!!!!
Sorry for being absent for weeks.
Only one thing Brendan said is wrong:
Brendan wrote: "it can alloc&free 400,000 pages of 4k per second" - from this I assume it allocates a page, then frees the allocated page.
In fact I allocate 2 pages and free one of them, so the allocation speed is an average.
And I tested only 1.6G of memory, because I have only 2G of physical memory.
Another thing I want to point out is that my page allocation is done in the microkernel (the test task calls it through LEVEL0), so it gets the physical memory.
My microkernel has only one entry, the 256 interrupts, and only one exit, the scheduler. So both tests enter the microkernel, but the message test has more to do: check status, copy messages, and remove and put back the threads.
Good news: I can now draw Unicode text (I tested Chinese, and the font needs to look nicer) and 24-bit BMP files in graphics mode.
Bad news: my malloc() in user space failed, and I'm still working on it.
Question: C++ for the kernel? See my new post.
Thanks!
lemonyii