I thought I would start another thread so as not to clutter up someone else's post.
LOL, good times.
Brendan wrote:
Hi,
tsdnz wrote: Hi, I am in 64-bit long mode, QWORDs/UINT64 only for me, but I hope this helps you out.
A) That's hideously bad/unmaintainable
B) For IRQs the "HandlerAddress" should be the same for all IRQs; and it's better to have a common assembly stub that calls a generic (C, C++) IRQ handler that handles things like how many completely different/unrelated device drivers happen to be sharing that IRQ
C) For all other types of interrupts it's better to have a specific assembly stub for each thing and not have a generic assembly stub (unless you're writing a tutorial and don't want to complicate it by doing things right)
D) Interrupts never have anything to do with task switching in the first place (unless the kernel/scheduler is a massive design failure).
Cheers,
Brendan
A) That's hideously bad/unmaintainable
I wanted to eliminate the lookup and the extra call that a generic handler requires.
I was not sure how to do this using asm macros, so I wrote it out the way I wanted it.
I only have two files in my kernel: the main file and a Generic file.
Yup, not the design you guys would like, but in the Visual Studio C++ IDE on Windows it works a treat.
B) For IRQs the "HandlerAddress" should be the same for all IRQs; and it's better to have a common assembly stub that calls a generic (C, C++) IRQ handler that handles things like how many completely different/unrelated device drivers happen to be sharing that IRQ
I was after speed. I had the design you are describing, but I found that I was losing a few cycles with it.
To find out which CPU the interrupt arrived on, I had to read the Local APIC ID register at (LocalAPICAddress + 0x20) and shift the value right by 24.
In my design each interrupt is used by only one driver; my OS is not a general-purpose OS, it is built for a specific task.
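For reference, in xAPIC mode the Local APIC ID lives in bits 31:24 of the 32-bit register at offset 0x20 from the APIC base, so it is the register value (not the address) that gets shifted. A minimal sketch, assuming `lapic_base` is wherever the kernel mapped the Local APIC (uncached); in x2APIC mode the full 32-bit ID would instead be read from MSR 0x802.

```c
#include <stdint.h>

/* Offset of the xAPIC Local APIC ID register from the APIC MMIO base. */
#define LAPIC_ID_OFFSET 0x20u

/*
 * Return the Local APIC ID of the CPU executing this code (xAPIC mode).
 * Assumption: lapic_base is the virtual address the kernel mapped the
 * Local APIC registers to. The ID is in bits 31:24, so the 32-bit value
 * read from the register is shifted right by 24.
 */
static inline uint32_t lapic_current_id(uintptr_t lapic_base)
{
    volatile uint32_t *id_reg = (volatile uint32_t *)(lapic_base + LAPIC_ID_OFFSET);
    return *id_reg >> 24;
}
```

The `volatile` pointer forces a real load each time, which matters because the same code runs on every core and each core sees its own APIC at that address.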
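For comparison, the C side of the common-stub design Brendan describes in B) could look roughly like this: one generic dispatcher that asks every driver registered on a line whether its device raised the interrupt. This is a sketch only; `irq_register`, `irq_dispatch`, `MAX_IRQS` and `MAX_SHARERS` are hypothetical names, the table sizes are arbitrary assumptions, and sending the EOI is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_IRQS    24 /* assumption: e.g. one IO APIC with 24 inputs */
#define MAX_SHARERS 4  /* assumption: max drivers sharing one IRQ line */

/* A driver's handler returns true if its own device raised the interrupt. */
typedef bool (*irq_handler_fn)(void *ctx);

struct irq_sharer {
    irq_handler_fn fn;
    void *ctx; /* driver-private data passed back to the handler */
};

static struct irq_sharer irq_table[MAX_IRQS][MAX_SHARERS];
static size_t irq_count[MAX_IRQS];

/* Hypothetical registration call a driver makes during initialisation. */
static bool irq_register(unsigned irq, irq_handler_fn fn, void *ctx)
{
    if (irq >= MAX_IRQS || irq_count[irq] == MAX_SHARERS)
        return false;
    irq_table[irq][irq_count[irq]++] = (struct irq_sharer){ fn, ctx };
    return true;
}

/*
 * Generic handler the single assembly stub would call with the IRQ number.
 * Every driver on the line is asked, because on a shared line more than
 * one device may be asserting it. Returns how many drivers claimed it.
 */
static size_t irq_dispatch(unsigned irq)
{
    size_t claimed = 0;
    if (irq >= MAX_IRQS)
        return 0;
    for (size_t i = 0; i < irq_count[irq]; i++)
        if (irq_table[irq][i].fn(irq_table[irq][i].ctx))
            claimed++;
    /* EOI to the interrupt controller would be sent here (omitted). */
    return claimed;
}
```

The cost tsdnz measured is visible here: one table lookup plus an indirect call per registered driver, which buys the ability to share lines between unrelated drivers.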
D) Interrupts never have anything to do with task switching in the first place (unless the kernel/scheduler is a massive design failure).
Very interesting. How do you guys time-slice a running program?
For instance, a program running in an infinite loop.
On my 48-core server I am time-slicing 196,608 times a second in total (48 cores × 4096 slices per second each).
Each core helps the scheduler.
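For anyone checking the numbers: 4096 slices per second per core gives 48 × 4096 = 196,608 timer interrupts per second in total, a slice of about 244 µs, and about 463,867 cycles per slice at 1.9 GHz. The sketch below does that arithmetic and also shows how such a rate might be programmed into a periodic Local APIC timer. That last part is an assumption about the mechanism, not necessarily how this OS does it: `timer_hz` is the APIC timer input clock, which is not architecturally fixed and would have to be calibrated (e.g. against the PIT or HPET), and the function names are illustrative.

```c
#include <stdint.h>

/* Total timer interrupts per second across all cores. */
static uint64_t total_slices_per_sec(uint32_t cores, uint32_t slices_per_core)
{
    return (uint64_t)cores * slices_per_core;
}

/* Length of one time slice in nanoseconds. */
static double slice_ns(uint32_t slices_per_core)
{
    return 1e9 / slices_per_core;
}

/* CPU cycles a task gets in one full slice. */
static double cycles_per_slice(double cpu_hz, uint32_t slices_per_core)
{
    return cpu_hz / slices_per_core;
}

/*
 * Initial count for a periodic Local APIC timer.
 * Assumption: timer_hz was calibrated beforehand; divide is the value
 * chosen in the Divide Configuration Register (1, 2, 4, ..., 128).
 */
static uint32_t lapic_initial_count(uint64_t timer_hz, uint32_t divide,
                                    uint32_t slices_per_core)
{
    return (uint32_t)(timer_hz / ((uint64_t)divide * slices_per_core));
}
```

With a hypothetical 100 MHz timer clock and divide-by-16, the initial count would be 100,000,000 / (16 × 4096), truncated to 1525.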
Brendan wrote:
Hi,
ashishkumar4 wrote: and the switch task function:
Never, under any circumstances, do anything in inline assembly that touches or modifies the stack, or relies on any specific stack layout. The stack belongs to the compiler and it will do whatever it likes with its stack; it is not yours to mess with, you gave up the right to touch the stack when you chose to use a compiler.
You must use external assembly and not inline assembly for the (tiny) piece of code that does the final task switch.
Cheers,
Brendan
Never, under any circumstances, do anything in inline assembly that touches or modifies the stack
Although I agree, I do break the rules.
In my OS, changing the stack is always the last thing executed before control is passed back to user space.
You must use external assembly and not inline assembly for the (tiny) piece of code that does the final task switch.
Again, very interesting.
I use inline assembly, not external, and it works very nicely for me.
For example, if the scheduler interrupt wants to switch tasks, I restore the floating-point state, set up the page tables, etc., and then do this:
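/* Restore the task's registers from uk->gpr and return to it with iretq. */
asm volatile (
    "movq %0, %%rsp;"   /* RSP = &uk->gpr; r15 must be at the lowest address */
    "popq %%r15;"
    "popq %%r14;"
    "popq %%r13;"
    "popq %%r12;"
    "popq %%r11;"
    "popq %%r10;"
    "popq %%r9;"
    "popq %%r8;"
    "popq %%rbp;"
    "popq %%rdi;"
    "popq %%rsi;"
    "popq %%rdx;"
    "popq %%rcx;"
    "popq %%rbx;"
    "popq %%rax;"
    "iretq;"            /* pops RIP, CS, RFLAGS, RSP, SS, stored right after rax */
    :
    : "r"((QWORD)&uk->gpr)
    : "memory"          /* all stores to uk->gpr must be complete before this runs */
);
__builtin_unreachable(); /* iretq never returns here */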
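The asm above only works if `uk->gpr` is laid out exactly in pop order (r15 at the lowest address, rax last), immediately followed by the frame that a 64-bit `iretq` consumes: RIP, CS, RFLAGS, RSP, SS. A minimal sketch of such a layout, assuming a hypothetical `struct task_frame` and `frame_init` helper (not the author's actual definitions), with compile-time checks so a reordered field cannot silently break the switch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t QWORD; /* assumption: matches the QWORD type used in the post */

/* Hypothetical layout of uk->gpr, lowest address first. */
struct task_frame {
    QWORD r15, r14, r13, r12, r11, r10, r9, r8; /* popped first */
    QWORD rbp, rdi, rsi, rdx, rcx, rbx, rax;     /* rax popped last */
    QWORD rip, cs, rflags, rsp, ss;              /* consumed by iretq */
};

_Static_assert(offsetof(struct task_frame, r15) == 0, "r15 is popped first");
_Static_assert(offsetof(struct task_frame, rax) == 14 * sizeof(QWORD), "rax is popped last");
_Static_assert(offsetof(struct task_frame, rip) == 15 * sizeof(QWORD), "iretq frame follows rax");
_Static_assert(sizeof(struct task_frame) == 20 * sizeof(QWORD), "no padding expected");

/* RFLAGS for a fresh task: IF (bit 9) set, plus bit 1, which is always 1. */
#define RFLAGS_INITIAL 0x202u

/*
 * Prepare a frame so the switch code "returns" into a new task at entry.
 * The CS/SS selectors depend on your GDT, so they are passed in.
 */
static void frame_init(struct task_frame *f, QWORD entry, QWORD stack_top,
                       QWORD cs, QWORD ss)
{
    memset(f, 0, sizeof *f); /* general-purpose registers start at zero */
    f->rip = entry;
    f->cs = cs;
    f->rflags = RFLAGS_INITIAL;
    f->rsp = stack_top;
    f->ss = ss;
}
```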
I am compiling a list of performance tests comparing my OS with Windows; it would be great to see how your OSes compare.
I only started this today. Here are three.
How about I start a new topic where we test our servers' speeds?
I would very much like to gauge my OS against others and get some feedback.
Both Windows and my OS run on the same server: 48 cores, 1.9 GHz, 128 GB RAM.
1) Inside an infinite loop I increment a volatile QWORD and display it every 1/2 second.
Running on a single core.
My OS shows 474 million per half second = 948 million per second. 1.9 GHz / 948 million ≈ 2 cycles per increment.
Windows shows 160 million per half second = 320 million per second. 1.9 GHz / 320 million ≈ 5.94 cycles per increment.
2) Inside an infinite loop I increment a volatile QWORD and display it every 1/2 second.
Running 48 tasks, one on each core. Separate QWORD for each task.
My OS shows 474 million per half second = 948 million per second. 1.9 GHz / 948 million ≈ 2 cycles per increment.
My OS shows a total count of 22.7 billion (the sum of the QWORDs of all tasks on all CPUs).
Windows shows 160 million per half second = 320 million per second. 1.9 GHz / 320 million ≈ 5.94 cycles per increment.
Windows shows a total count of 7.6 billion (the sum of the QWORDs of all tasks on all CPUs).
3) Inside an infinite loop I increment a volatile QWORD and display it every 1/2 second.
Running 8192 tasks per core × 48 cores = 393,216 tasks. Separate QWORD for each task.
My OS shows a total count of 21.0 billion, a slight drop (the sum of the QWORDs of all tasks on all CPUs).
And each task is allocated a percentage of the time.
Windows has trouble updating the screen with 96 threads, so I cannot get accurate readings, and it cannot handle much more without completely stopping.
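The derived figures in these tests can be checked with a few lines of arithmetic. This sketch assumes, as the match between 48 × 474 million and 22.7 billion suggests, that the totals are read over the same half-second interval as the per-core counts; the function names are illustrative only.

```c
#include <stdint.h>

/* The counters are displayed every half second, so double them for a rate. */
static double per_second(double count_per_half_second)
{
    return count_per_half_second * 2.0;
}

/* Average CPU cycles spent per increment of the volatile counter. */
static double cycles_per_increment(double cpu_hz, double increments_per_second)
{
    return cpu_hz / increments_per_second;
}

/* Fraction of perfect linear scaling reached by a multi-core total. */
static double scaling_efficiency(double total, double per_core, uint32_t cores)
{
    return total / (per_core * cores);
}
```

With the numbers above: 2 × 474 million = 948 million increments per second, about 2.0 cycles each at 1.9 GHz (about 5.94 for Windows at 320 million per second). Test 2's 22.7 billion is about 99.8% of 48 × 474 million = 22.75 billion, and test 3's 21.0 billion is about 92.3% of it.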
As always, thanks to everyone for their feedback, it is greatly appreciated.
Alistair.