How many threads can YOUR kernel handle?
Hey everyone,
I've been stress testing my kernel lately, and came up with an idea. We don't seem to have any competition/comparison between our projects, so let's start!
I want to see whose kernel can handle the most running threads. Post screenshots, code, videos, or whatever. No cheating.
I got up to 6,000 this morning before I ran out of memory (32MB). I'm going to crank that up to 512MB later, and then I'll post some screenshots.
Have fun,
Sean
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M/MU d- s:- a--- C++++ UL P L++ E--- W+++ N+ w++ M- V+ PS+ Y+ PE- PGP t-- 5- X R- tv b DI-- D+ G e h! r++ y+
------END GEEK CODE BLOCK------
234,856,999,746,857,857,135,812,430,860,096,483,473,494,987,654,464,659
,986,999,999,999,999,999,999,999,999,099,766,55,333,449,999,000,666,555
,666,444,777:lol:
Working On:Bootloader, RWFS Image Program
Leviathan: http://leviathanv.googlecode.com
Kernel:Working on Design Doc
Re: How many threads can YOUR kernel handle?
Well, the reason people don't have competitions is that the goals tend to be different. For example, if I'm writing a kernel meant for desktop use, I have ring 3/ring 0 separation and my tasks store extra information about themselves (like a bitmap of the I/O ports associated with them). Someone else who has a bootloader and a kernel that uses the bare minimum needed to multitask can probably fit more processes in memory than I can, but that doesn't make them useful.
Also, I was going to do some testing with my multitasking code. What priority are these tasks supposed to run at? I know that in Bochs it takes a long time to spawn a lot of threads once they start adding up, unless I pause every thread except the one creating the new threads. (My threads start as soon as they are spawned, so each new task is slower to create than the last: the scheduler has to cycle through all existing tasks before getting back to the task that is doing the spawning.)
Also, what kind of task are we running? Mine simply prints its ID and then loops forever. I added a Sleep() to the loop so it only wastes a few instructions, to help speed up generating the tasks, but it was still too slow. So I set the priority to sleeping so the scheduler skips the new tasks, and I wake them up once they have all been generated to make sure they all fire properly. I hit my limit just under 4k tasks @ 32 MB RAM.
What kind of tasks are you running? Do they each have their own memory space, or are they just threads inside the kernel (i.e., sharing the kernel's memory)? Does each thread have its stack in kernel space or in user space? Does the stack take up a full page, or is it only a minimal stack for testing? There are way too many variables to compare results directly, and I think that is why there aren't many competitions/challenges/comparisons between hobby OSes. It would help if you set the rules, so we could spawn our tasks properly, with the correct priority, etc.
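The slowdown described above (every new thread starts running at once, so the spawner waits longer on each round) can be put into rough numbers. This is only a sketch under stated assumptions: a plain round-robin scheduler, one spawner that creates a single thread per turn, and one time slice per awake thread per round. The function name `slices_to_spawn` is made up for the illustration and is not taken from anyone's kernel.

```python
def slices_to_spawn(total_threads, start_asleep):
    """Count time slices used until `total_threads` threads exist.

    Hypothetical model, not a real scheduler: round-robin, one spawner
    thread that creates one thread per turn, and every awake thread
    consuming exactly one slice per round.
    """
    created = 0
    slices = 0
    while created < total_threads:
        slices += 1   # the spawner's own turn
        created += 1  # it creates one thread during that turn
        if not start_asleep and created < total_threads:
            # Every thread created so far runs once before the
            # spawner gets the CPU again.
            slices += created
    return slices


if __name__ == "__main__":
    for n in (100, 1000, 4000):
        awake = slices_to_spawn(n, start_asleep=False)
        asleep = slices_to_spawn(n, start_asleep=True)
        print(f"{n} threads: {awake} slices if awake, {asleep} if asleep")
```

In this model the cost grows roughly with the square of the thread count when new threads run immediately, and only linearly when they are created asleep and woken at the end.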
pcmattman wrote: "My current kernel chokes at around 150 threads. Lately I've been having a lot of trouble with memory in my OS and have been debugging it extensively. If you reduced the thread stack size you could probably get away with many more threads."
Yeah, each one of my tasks gets a 4k stack, 4k for the page directory, 4k for a page table, an entry in my process list, and some other misc. stuff. I plan on reworking a LOT of my memory management and thread/process management code in the near future, so hopefully I can bring up the number of tasks a bit. I also want to rework my scheduling code so that a higher-priority thread is run more often, not just allotted a bigger time slice. That should help fix my problem of generating tasks slowly (or I guess I could run it on a real machine, or in a faster emulator than Bochs).
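The idea of running a higher-priority thread more often, instead of only giving it a longer slice, can be sketched with a priority-ordered ready queue. This is a minimal illustration, not the scheduler of any kernel in this thread; the class `PriorityScheduler` and its method names are invented for the example.

```python
import heapq
import itertools


class PriorityScheduler:
    """Ready queue that always hands out the highest-priority thread."""

    def __init__(self):
        self._ready = []                  # heap of (-priority, order, name)
        self._order = itertools.count()   # tie-breaker: FIFO within a priority

    def make_ready(self, name, priority):
        # heapq is a min-heap, so the priority is negated to pop the
        # largest priority first.
        heapq.heappush(self._ready, (-priority, next(self._order), name))

    def pick_next(self):
        """Return (name, priority) of the next thread, or None if idle."""
        if not self._ready:
            return None
        neg_priority, _, name = heapq.heappop(self._ready)
        return name, -neg_priority


if __name__ == "__main__":
    sched = PriorityScheduler()
    sched.make_ready("worker-a", 1)
    sched.make_ready("spawner", 10)
    sched.make_ready("worker-b", 1)
    print(sched.pick_next())  # the spawner is chosen ahead of both workers
```

With strict priorities like this, a spawner that re-queues itself at a high priority keeps being picked until it blocks, so lower-priority threads can starve; whether that is acceptable depends on the kernel's goals.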
In my test all the threads were running under the same process, which had one address space, so I didn't need a new page directory, etc. for each thread, just a 4k kernel stack.
I also need to change my code so that a higher-priority thread gets executed first. I do the same as you and just give it a bigger timeslice.
I managed to get it to spawn tasks fairly quickly despite this. In the thread doing the spawning, I disable preemption for that thread, create 500 new threads, and call schedule() to let them run; the next time the spawning thread runs it creates 500 more, and so on.
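The batching approach described above can be imitated with a toy cooperative scheduler built on Python generators. This is a sketch under assumptions, not the actual kernel code: `run`, `worker` and `spawner` are hypothetical names, a `yield` stands in for the call to schedule(), and the absence of a `yield` inside the creation loop stands in for preemption being disabled.

```python
from collections import deque


def run(total, batch):
    """Return the number of context switches needed to create `total` threads."""
    ready = deque()
    state = {"created": 0, "switches": 0}

    def worker():
        while True:
            yield  # do nothing, give the CPU back

    def spawner():
        while state["created"] < total:
            n = min(batch, total - state["created"])
            for _ in range(n):
                # No yield in this loop: the spawner cannot be
                # switched out in the middle of a batch.
                ready.append(worker())
                state["created"] += 1
            yield  # stands in for schedule(): let the new threads run

    ready.append(spawner())
    while state["created"] < total:
        task = ready.popleft()      # round-robin: take the head of the queue
        state["switches"] += 1
        next(task)                  # run the task until its next yield
        ready.append(task)          # and put it at the back
    return state["switches"]


if __name__ == "__main__":
    print("one per turn:", run(2000, 1))
    print("500 per turn:", run(2000, 500))
```

The batch size trades spawn speed against how long the other threads are kept off the CPU; 500 is simply the figure from the post.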
Yeah, like I said, it makes comparisons easier when everyone is doing the same thing, which in OS dev isn't that common (unless you find a bunch of people making a Unix clone). I did two things to speed up my task creation. First, I put a call in the infinite loop that relinquishes the rest of the task's time, so a task switch doesn't wait out the full slice each task was allocated. Second, I started each task as sleeping, so they are skipped when their time slice comes up, and then I wake them all up at the end.
Yes, removing 8k from each process would probably make it run out of memory less quickly. I realize that I am creating a page directory and a stack, but I was not creating a new page table, because the task was just a function from within the kernel (whose mappings are copied from the original page directory). So each of my tasks takes 8k; if I didn't generate the page directory I would save half the memory used per task, so the count should just about double. In a perfect world, 8k per task in 32 MB gives about 4k tasks.
Bochs doesn't actually give me the entire 32 MB, though (well, I only use memory above 1 MB; everything below is reserved for DMA right now. That may change later if we run out of regular memory, but for now I don't use it). I am also loading my kernel plus ATA, FDC, keyboard, mouse, etc. drivers, which take some space too, as do my memory manager, task manager, etc. I was able to hit just under (like 3 less than) 4,000 tasks, so not too horrible. If I change them over to threads that only need a new stack, I could even use a smaller stack, since most threads won't need more than 1k (although it's much less safe this way).
The maximum number of threads should be pretty much guessable (within a few hundred) from what each task uses, so it isn't really much of a challenge per se. I could give each task a 16-byte stack and generate a whole bunch, but that isn't a true test, since most tasks will use more than 64 bytes (my loop-forever-and-do-nothing task doesn't; it only needs to store the registers plus a few words for the function calls).
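The "guessable" estimate can be written down directly. This is a back-of-the-envelope sketch using the figures from the posts (32 MB of RAM, nothing below 1 MB used, 8 KB or 4 KB per task); it deliberately ignores the kernel, drivers and bookkeeping structures, and the helper name `max_tasks` is invented for the example.

```python
KIB = 1024
MIB = 1024 * KIB


def max_tasks(total_ram, reserved, per_task):
    """Upper bound on tasks that fit in the RAM left after `reserved` bytes."""
    return max(0, (total_ram - reserved) // per_task)


# Stack plus page directory: 8 KB per task.
with_page_directory = max_tasks(32 * MIB, 1 * MIB, 8 * KIB)
# Threads sharing one address space: only a 4 KB stack each.
stack_only = max_tasks(32 * MIB, 1 * MIB, 4 * KIB)

if __name__ == "__main__":
    print("8 KB per task:", with_page_directory)
    print("4 KB per task:", stack_only)
```

The 8 KB case lands in the same ballpark as the "just under 4,000" reported above, and halving the per-task cost doubles the bound, as the post predicts.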
Ok, here's mine. I reached 100,000 in 512MB reliably, uncovering a few bugs on the way. I'm using pure kernel threads, so no address space, just a 4K stack for each. Each thread runs a while loop, printing a character to the ticker and then hlt-ing.
As for the competition, I didn't mean it as a serious thing, just a boyish "how big is your ..." style comparison. As long as you post the circumstances of your stress test, I'll be happy.
Have fun.
Attachment: dyknl-stressthreads.png (15.93 KiB)
@AlexExtreme
In your first post, you said:
"I just ran a stress test on my new scheduler/thread code. With 32MB RAM, it gave up at around 15,500 threads with out of memory. I upped the RAM to 512MB and it appears to have frozen at 30,401 threads:"
Then you later said this:
"In my test all the threads were running under the same process which had one address space, so I didn't need a new page directory, etc. for each thread, just a 4k kernel stack."
I am a little confused because the math doesn't work out:
15,500 threads * 4096-byte stacks = 63,488,000 bytes = about 60.5 MB
How is it at all possible that this was running on 32 MB of RAM? I'm not even accounting for the RAM your kernel uses or the 4K used by the page directory.
Did you mean 64 MB of RAM? Are you swapping to disk?
proxy
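The arithmetic in this objection is easy to check mechanically. The sketch below only restates the numbers quoted in the post (15,500 threads, 4096-byte stacks, 32 MB of RAM); the helper names are made up for the illustration, and MB is taken to mean 1,048,576 bytes.

```python
MIB = 1024 * 1024


def stack_memory_mib(threads, stack_bytes=4096):
    """Memory needed for the stacks alone, in MiB."""
    return threads * stack_bytes / MIB


def max_bytes_per_thread(total_ram, threads):
    """Largest per-thread cost that still fits `threads` into `total_ram`."""
    return total_ram // threads


if __name__ == "__main__":
    print(f"15,500 x 4 KiB stacks: {stack_memory_mib(15_500):.1f} MiB")
    print("budget per thread in 32 MiB:",
          max_bytes_per_thread(32 * MIB, 15_500), "bytes")
```

So for 15,500 threads to fit in 32 MB, each thread could cost at most a little over 2 KB in total, which is why the reported 4k stacks and the reported RAM size cannot both be right.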
Thread / Semaphore power down!
Hello all,
I'm reviving this thread because I have a problem I don't understand: my scheduler loses power at a certain point.
It all began gently enough. (Sorry for my bad English.)
I built a little test application named "zBalls" on my OS, named "Zumba" (started in 2005, in my spare time).
Some balls bounce off the screen borders and off each other.
Each zball is a thread that simply calculates the ball's next coordinates and goes to sleep for 80 ms.
There are 2 other threads:
- One, "The creator", creates a zball and goes to sleep for 500 ms
- The second, "The screen", refreshes the screen information.
All of these threads are synchronized with 2 semaphores:
- A semaphore for "Screen"
- A semaphore for the zball list in memory
It looks like this:
(All the balls have independent trajectories; they bounce off the screen borders and off other balls)
Here is an overview of the architecture:
- The scheduler is round-robin
- A timer list is used to put threads to sleep
- Waitq Semaphore Screen: synchronizes access to the screen
- Waitq Liste: synchronizes access to the data list (the list of zballs)
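For readers unfamiliar with the "waitq" idea in the overview above, here is a minimal sketch of a counting semaphore with a FIFO wait queue. It is an assumption about the general technique, not Zumba's actual code: the class `WaitQueueSemaphore` and its methods are hypothetical, and threads are represented by plain names rather than real kernel structures.

```python
from collections import deque


class WaitQueueSemaphore:
    """Counting semaphore whose blocked threads wait in FIFO order."""

    def __init__(self, count=1):
        self.count = count      # free units of the resource
        self.waiters = deque()  # threads blocked on this semaphore

    def acquire(self, thread):
        """Return True if `thread` got the resource.

        Otherwise the thread is queued and False is returned; a real
        kernel would remove it from the run queue at this point.
        """
        if self.count > 0:
            self.count -= 1
            return True
        self.waiters.append(thread)
        return False

    def release(self):
        """Release the resource; return the thread to wake, if any."""
        if self.waiters:
            # Hand the resource straight to the oldest waiter.
            return self.waiters.popleft()
        self.count += 1
        return None


if __name__ == "__main__":
    screen = WaitQueueSemaphore(1)
    print(screen.acquire("screen-thread"))  # gets the semaphore
    print(screen.acquire("zball-1"))        # has to wait
    print(screen.release())                 # zball-1 is woken
```

The length of `waiters` corresponds to the "threads waiting to access" figures reported in the post.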
My problem:
1/ At the beginning everything works correctly
- There are 13 zball threads + 2 threads (creator + screen) + 1 (my idle thread)
- 2 are active in the scheduler
- 14 timers exist, so 14 threads are sleeping
- The semaphores (waitq) are not contended
2/ At 62 zballs, all is OK
- More threads sleeping (great!)
3/ At 126 zballs in QEMU: sudden slowdown!!!
The apparent fluidity drops off very quickly: at 125 zballs all is OK, at 126 everything becomes very, very slow
- 116 threads are waiting to access the list
4/ Another try, but this time it is the screen
Like before, the power drops and everything becomes very slow...
- 104 threads are waiting to access the screen...
5/ If I wait, it slows down more and more
The zballs no longer move every 80 ms; they now move every 1 to 3 seconds...
6/ Bochs slows down progressively
Look, I can run 370 threads and the power drops progressively. Look at the demand on the shared resources: 163 waiting for the screen, 197 for the list
7/ On real hardware I can go up to 256 threads very well, and at the 257th everything slows down immediately!!!
I checked the RAM allocation (see the top-right corner)
So I think I have won the title of "the OS that can't run a small handful of threads"... I can only run 257 at the same time.
I don't think it is common to have 250 threads that all want to access the same resource, but perhaps it can happen. What do you think?
* I just want to know whether this is "normal" behavior for a stressed scheduler.
* Can you give me your impressions or ideas?
* Have you tried this kind of processing? What was the result? Does your OS fall over immediately when you pass some limit? What is that limit? How many threads can you run against a shared resource?
Thanks a lot for reading
(A thousand apologies for my bad English)
Have a nice coding day
Molux
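One way to reason about whether a slowdown like this is "normal" is a rough contention model. Everything here is an assumption made for illustration, not a diagnosis of Zumba: each zball is assumed to hold the list semaphore while it scans every other ball for collisions, the per-ball cost `cost_per_ball_ms` is an invented number, and the function name is hypothetical.

```python
def effective_period_ms(n_threads, sleep_ms=80.0, cost_per_ball_ms=0.01):
    """Estimate how often each ball actually gets to move.

    hold_ms   : time one thread keeps the list locked (one pass over
                all balls, so it grows with the number of balls)
    demand_ms : lock time needed for every thread to take one turn
    """
    hold_ms = cost_per_ball_ms * n_threads
    demand_ms = n_threads * hold_ms
    # Uncontended, a ball moves once per sleep plus its own hold time.
    # Once the lock is saturated, it can move only once per full round
    # of lock holders.
    return max(sleep_ms + hold_ms, demand_ms)


if __name__ == "__main__":
    for n in (13, 62, 126, 256, 370):
        print(f"{n:4d} threads: one move every {effective_period_ms(n):7.1f} ms")
```

In this model the update period stays near 80 ms until the total lock demand exceeds the sleep interval and then grows with the square of the thread count. A model like this predicts a knee followed by steady growth, so a cliff at one exact thread count may instead point at something else, which only measurement on the real system can confirm.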