Struggling with TSS

bloodline · Post by **bloodline** » Wed Sep 23, 2020 6:37 am

So I have set up my GDT and IDT, and I've been testing the interrupts: traps and IRQs (at least the timer and keyboard) all work fine. I have set up ISR48 as my "Kernel Call" software interrupt, so I can jump back into supervisor mode from user mode.

At the moment I have the kernel in an infinite asm("hlt") loop, and a command interpreter that runs in the keyboard interrupt (processing the command buffer when the return key is pressed, and generating output sent to the terminal), but obviously I want to run the command processor in its own context, not in the keyboard interrupt. Which I suppose is a long-winded way of saying it's time to implement mode switching.

I have followed the OS-Dev Wiki as best I can, but when I execute ltr (setting to the 6th entry in my GDT), the machine crashes, I'm assuming it's a stack issue (I'm guessing this is what is known as a triple fault), because when I've done something wrong (or deliberately performed a Div/0 test), my ISR cpu traps are called and a nice error message displayed.

In the GDT TSS set up, I have set the esp0 field to the top of a 16byte aligned clear section of memory... Is this incorrect?

-Edit- I'm making the assumption here that when I have a valid TSS and an interrupt occurs, the user stack is saved, the Supervisor stack is loaded, the interrupt is executed, then the user stack is loaded and interrupt returns... as it would on the 68000?

Then to implement task switching I simply need to swap the user stack for another task's stack, so when the interrupt returns it will now be executing the next task.

I guess I could do this relatively easily in my interrupt handling code... but that seems rather inelegant...

nullplan · Post by **nullplan** » Wed Sep 23, 2020 8:31 am

bloodline wrote:I have followed the OS-Dev Wiki as best I can, but when I execute ltr (setting to the 6th entry in my GDT), the machine crashes, I'm assuming it's a stack issue (I'm guessing this is what is known as a triple fault), because when I've done something wrong (or deliberately performed a Div/0 test), my ISR cpu traps are called and a nice error message displayed.

So, first thing: LTR should not crash the machine. At most it should generate a #GPF, which you can then display. If it is crashing, figure out why, usually by turning up the debug level of your emulator. If it is triple-faulting, then you already have at least two faults in the code: you don't handle #GPF correctly, and you don't handle #DF correctly. Until that is fixed, you won't be able to do much with the machine.

bloodline wrote:In the GDT TSS set up, I have set the esp0 field to the top of a 16byte aligned clear section of memory... Is this incorrect?

Well, I do hope you have more than 16 bytes of kernel stack. But yes, the important thing is that ESP0 points to the top of stack.

bloodline wrote:-Edit- I'm making the assumption here that when I have a valid TSS and an interrupt occurs, the user stack is saved, the Supervisor stack is loaded, the interrupt is executed, then the user stack is loaded and interrupt returns... as it would on the 68000?

Don't make assumptions, read the bloody manual. When an interrupt occurs, an interrupt frame is pushed to the stack. If a change of privilege level took place, the frame will contain the previous stack pointer. An IRET instruction will pop that information exactly the way the CPU pushed it to the stack. Beware: Most exceptions push an error code you have to remove before you can IRET.
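To make the frame layout concrete, here is a minimal C sketch, assuming 32-bit protected mode and an interrupt that pushes no error code. The struct and function names are made up for illustration; the field order is the order in which IRET pops the values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names. Layout as seen from the handler's ESP upwards
 * (lowest address first), for an interrupt WITHOUT an error code. */
struct iret_frame_same_ring {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* When the interrupt arrived from a less privileged ring, the CPU also
 * pushes the interrupted stack pointer, and IRET pops it again. */
struct iret_frame_from_user {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t user_esp;
    uint32_t user_ss;
};

static_assert(sizeof(struct iret_frame_same_ring) == 12, "3 dwords");
static_assert(sizeof(struct iret_frame_from_user) == 20, "5 dwords");

/* The low two bits of the saved CS hold the privilege level that was
 * interrupted, so a handler can tell which of the two frames it has. */
static inline int frame_came_from_user(uint32_t saved_cs) {
    return (saved_cs & 3u) == 3u;
}
```

For exceptions that push an error code, one extra dword sits below `eip` and has to be discarded (for example with `add $4, %esp`) before IRET.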

bloodline wrote:Then to implement task switching I simply need to swap the user stack for another task's stack, so when the interrupt returns it will now be executing the next task.

Possible, but likely a bad idea. The simplest implementation requires one kernel stack per task. In that case, switching tasks is as simple as saving your current context (callee saved registers and ESP) and loading the next one. All tasks are interrupted in kernel mode. If a task was in user mode when the interrupt hit, it will still be in kernel mode due to the interrupt.

Only changing the user stack, and only using one kernel stack per CPU is also possible, but requires you to enumerate all reasons for a task to be stopped, so you can restore its state when the reason to stop has passed. Possible, but more work.

bloodline wrote:I guess I could do this relatively easily in my interrupt handling code... but that seems rather inelegant...

Separate task management and interrupts. I have task flags. Whenever the kernel returns to user mode (all kernel entry and exit must go through the same routines, anyway), no matter if from a syscall or an interrupt, I test whether the timeout flag is set or not. If it is set, the code calls the scheduler before returning to user mode. That way, the timeout flag can be set for all sorts of reasons, one of which being the timer interrupt, but the interrupt is completely handled before the scheduler is called. Otherwise you end up having to send EOI in the middle of your scheduler, and that would really be inelegant.
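A rough sketch of that flag-based arrangement in C. All names are hypothetical and the scheduler is only a stand-in that records that it ran; the point is that the timer handler does nothing but set a flag, and the common exit path decides whether to reschedule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state. */
struct task {
    volatile bool need_resched;  /* the "timeout flag" described above */
    int times_scheduled;         /* stand-in so the sketch is observable */
};

/* Stand-in for a real scheduler: clears the flag and counts the call. */
static void schedule(struct task *current) {
    current->need_resched = false;
    current->times_scheduled++;
}

/* Timer IRQ body: only sets the flag. The EOI is sent as part of normal
 * interrupt handling, long before the scheduler gets involved. */
static void timer_irq(struct task *current) {
    assert(current);
    current->need_resched = true;
}

/* Common exit path, run just before any return to user mode, whether
 * the kernel was entered by a syscall or by an interrupt. */
static void return_to_user(struct task *current) {
    assert(current);
    if (current->need_resched)
        schedule(current);
}
```

Anything else that wants a reschedule (a wake-up, a blocking syscall) can set the same flag without knowing about the timer.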

sj95126 · Post by **sj95126** » Wed Sep 23, 2020 9:49 am

I would suggest making sure all of your exception handlers are solid before moving on to more complex features like tasks (or paging, or ...). ltr can trigger six different exceptions, although strangely enough #TS isn't one of them (but even that one can have more than 30 different causes).

If you're not getting the benefit of debugging from exception reporting, including an error code, you're going to be lost.

bloodline · Post by **bloodline** » Wed Sep 23, 2020 2:53 pm

nullplan wrote:
bloodline wrote:I have followed the OS-Dev Wiki as best I can, but when I execute ltr (setting to the 6th entry in my GDT), the machine crashes, I'm assuming it's a stack issue (I'm guessing this is what is known as a triple fault), because when I've done something wrong (or deliberately performed a Div/0 test), my ISR cpu traps are called and a nice error message displayed.
So, first thing: LTR should not crash the machine. At most it should generate a #GPF, which you can then display. If it is crashing, figure out why, usually by turning up the debug level of your emulator. If it is triple-faulting, then you already have at least two faults in the code: you don't handle #GPF correctly, and you don't handle #DF correctly. Until that is fixed, you won't be able to do much with the machine.

I think you are right, ltr didn't crash it, it was the first interrupt which crashed after the TSS was set. I'm going to investigate further.

bloodline wrote:In the GDT TSS set up, I have set the esp0 field to the top of a 16byte aligned clear section of memory... Is this incorrect?
Well, I do hope you have more than 16 bytes of kernel stack. But yes, the important thing is that ESP0 points to the top of stack.

32 KB of stack, 16-byte aligned... I should have been clearer...

bloodline wrote:-Edit- I'm making the assumption here that when I have a valid TSS and an interrupt occurs, the user stack is saved, the Supervisor stack is loaded, the interrupt is executed, then the user stack is loaded and interrupt returns... as it would on the 68000?

Don't make assumptions, read the bloody manual. When an interrupt occurs, an interrupt frame is pushed to the stack. If a change of privilege level took place, the frame will contain the previous stack pointer. An IRET instruction will pop that information exactly the way the CPU pushed it to the stack. Beware: Most exceptions push an error code you have to remove before you can IRET.
bloodline wrote:Then to implement task switching I simply need to swap the user stack for another task's stack, so when the interrupt returns it will now be executing the next task.
Possible, but likely a bad idea. The simplest implementation requires one kernel stack per task. In that case, switching tasks is as simple as saving your current context (callee saved registers and ESP) and loading the next one. All tasks are interrupted in kernel mode. If a task was in user mode when the interrupt hit, it will still be in kernel mode due to the interrupt.

Good advice, I've avoided the x86 for so many years, but I'm very experienced with 68k and ARM (v7 32bit... not looked at AArch64 yet), so I am getting a bit carried away.

Only changing the user stack, and only using one kernel stack per CPU is also possible, but requires you to enumerate all reasons for a task to be stopped, so you can restore its state when the reason to stop has passed. Possible, but more work.

bloodline wrote:I guess I could do this relatively easily in my interrupt handling code... but that seems rather inelegant...
Separate task management and interrupts. I have task flags. Whenever the kernel returns to user mode (all kernel entry and exit must go through the same routines, anyway), no matter if from a syscall or an interrupt, I test whether the timeout flag is set or not. If it is set, the code calls the scheduler before returning to user mode. That way, the timeout flag can be set for all sorts of reasons, one of which being the timer interrupt, but the interrupt is completely handled before the scheduler is called. Otherwise you end up having to send EOI in the middle of your scheduler, and that would really be inelegant.

I've given up with the TSS for now... I've implemented task switching entirely in Ring0, just swapping task stacks in the PIT handler.

As long as the tasks start with a valid interrupt stack frame (built manually during the task setup), it works well enough for now... I'll try and get my head around the bigger problems later... First I really want to get some proper gfx working, I'm a bit fed up with the text console.
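A minimal sketch of building such an initial frame by hand, assuming a ring 0 task that is first resumed by a POPA followed by IRET. The helper name and the constants are hypothetical; `entry` stands for the 32-bit address of the task's entry function on the real target.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_CS   0x08u   /* assumed: GDT entry 1, ring 0 code */
#define EFLAGS_IF   0x200u  /* interrupt enable flag */
#define EFLAGS_RSVD 0x002u  /* bit 1 of EFLAGS always reads as 1 */

/* Writes an initial frame just below stack_top and returns the value to
 * store as the new task's saved stack pointer. */
static uint32_t *build_initial_frame(uint32_t *stack_top, uint32_t entry) {
    uint32_t *sp = stack_top;
    assert(stack_top != 0);

    *--sp = EFLAGS_IF | EFLAGS_RSVD;  /* EFLAGS, popped last by IRET */
    *--sp = KERNEL_CS;                /* CS */
    *--sp = entry;                    /* EIP, popped first by IRET */

    /* Eight zeroed dwords for POPA: EDI, ESI, EBP, (ESP, ignored),
     * EBX, EDX, ECX, EAX from the lowest address upwards. */
    for (int i = 0; i < 8; i++)
        *--sp = 0;

    return sp;
}
```

Because the task stays in ring 0, no SS:ESP pair is needed at the top of the frame; IRET only pops those when returning to a less privileged ring.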

I appreciate your help guys, with your help I am getting my head around this...

Here are some snippets of my approach if anyone is interested:


.global usp
usp:            //Temp Stack pointer, for use when swapping stacks
.skip 8
.global ssp
ssp:            //Temp Stack pointer, for use when swapping stacks
.skip 8


.global irq0
irq0:
    cli

    pusha
    mov $usp,  %eax
    mov %esp, (%eax)  //Save Pointer to user stack
    mov $ssp,  %eax
    mov (%eax), %esp  //load Supervisor stack
                 
    push $32
    jmp irq_jump


irq_jump:
    pusha

    cld
    call irq_handler   // implemented in C

    add $4, %esp

    popa

    mov $ssp, %eax
    mov %esp, (%eax)    //save supervisor stack
    mov $usp, %eax
    mov (%eax), %esp   //load user stack
    popa                //restore CPU state

    sti
    iret

Octocontrabass · Post by **Octocontrabass** » Wed Sep 23, 2020 4:36 pm

bloodline wrote:Here are some snippets of my approach if anyone is interested:

IRQ handlers should never start with CLI. If you don't want your IRQ handler to be interrupted by another higher-priority IRQ, use an appropriate gate type in your IDT.
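To illustrate the gate-type remark, here is a small C sketch of a 32-bit IDT entry, with hypothetical helper names. An interrupt gate (type 0xE) makes the CPU clear IF on entry, which is what a leading CLI was trying to achieve; a trap gate (type 0xF) leaves IF unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit protected-mode IDT entry. The packed attribute is a GCC/Clang
 * extension; the layout happens to have no padding anyway. */
struct idt_entry {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_high;  /* handler address bits 16..31 */
} __attribute__((packed));

static_assert(sizeof(struct idt_entry) == 8, "IDT entries are 8 bytes");

enum {
    GATE_INTERRUPT32 = 0x8E,  /* present, DPL 0, type 0xE: IF cleared on entry */
    GATE_TRAP32      = 0x8F   /* present, DPL 0, type 0xF: IF left unchanged */
};

/* Hypothetical helper: fills in one gate descriptor. */
static struct idt_entry make_gate(uint32_t handler, uint16_t selector,
                                  uint8_t type_attr) {
    struct idt_entry e;
    e.offset_low  = (uint16_t)(handler & 0xFFFFu);
    e.selector    = selector;
    e.zero        = 0;
    e.type_attr   = type_attr;
    e.offset_high = (uint16_t)(handler >> 16);
    return e;
}
```

With IRQ vectors installed as `GATE_INTERRUPT32`, the handler starts with interrupts already disabled, and IRET restores the interrupted code's IF from the saved EFLAGS.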

STI right before IRET has no effect: there is a one-instruction delay before it enables interrupts, and during that delay IRET loads the interrupt flag from the stack.

Interrupts from user mode automatically switch to the supervisor stack, there is no need to do it yourself. (Interrupts in supervisor mode use the current supervisor stack.)

Since there's no need to switch stacks on every interrupt, you might want to get rid of a PUSHA/POPA pair.

It looks like you might be disobeying the System V ABI when you call your C function. There's no problem in the code you've posted, but it might cause problems elsewhere.

bloodline · Post by **bloodline** » Wed Sep 23, 2020 4:54 pm

Octocontrabass wrote:
bloodline wrote:Here are some snippets of my approach if anyone is interested:
IRQ handlers should never start with CLI. If you don't want your IRQ handler to be interrupted by another higher-priority IRQ, use an appropriate gate type in your IDT.

STI right before IRET has no effect: there is a one-instruction delay before it enables interrupts, and during that delay IRET loads the interrupt flag from the stack.

OK, understood. I wasn't sure how to handle this (as you can see in my code, it assumes an interrupt only occurs while a user context is running, so I need to be sure another interrupt doesn't nest). It probably wouldn't cause problems, but you know... I need to get this TSS thing sorted.

-edit- Actually I think, with my manual stack switching, nested interrupts would cause serious issues, as once a supervisor stack pointer had been pushed to the user pointer temp variable, there would be no way back until another task switch... but the CPU would continue to run, that's a General Protection Fault right there... I REALLY need to get this TSS thing sorted. Unless there is a way to check if we are already in supervisor mode, then I could have my IRQs entry code just continue to use the supervisor stack.

Interrupts from user mode automatically switch to the supervisor stack, there is no need to do it yourself. (Interrupts in supervisor mode use the current supervisor stack.)

Since there's no need to switch stacks on every interrupt, you might want to get rid of a PUSHA/POPA pair.

I don't (yet) have the TSS working, so I have to swap stacks manually at the moment... I couldn't face spending days trying to get it working when I want to be making other, more interesting parts work... I'm finding the x86 conceptually difficult to get my head around (I've only been looking at this for a week or so...), so bear with me while I catch up!

It looks like you might be disobeying the System V ABI when you call your C function. There's no problem in the code you've posted, but it might cause problems elsewhere.

I tried to follow the OSdev wiki here... https://wiki.osdev.org/Interrupt_Service_Routines
The pushad instruction wasn't accepted by the GNU assembler, no idea why.

Octocontrabass · Post by **Octocontrabass** » Wed Sep 23, 2020 6:05 pm

bloodline wrote:Unless there is a way to check if we are already in supervisor mode, then I could have my IRQs entry code just continue to use the supervisor stack.

You don't need to check. The CPU guarantees that your interrupt handler will run in supervisor mode.

bloodline wrote:I don't (yet) have the TSS working, so I have to swap stacks manually at the moment...

The TSS is required for switching from user to supervisor mode. Without it, you have to stay in supervisor mode: if you switch to user mode, you have no way back.

bloodline wrote:I'm finding the x86 conceptually difficult to get my head around (I've only been looking at this for a week or so...), so bear with me while I catch up!

Good luck!

bloodline wrote:The pushad instruction wasn't accepted by the GNU assembler, no idea why.

The examples on that page primarily use Intel or NASM syntax. In AT&T syntax, the mnemonics are "pushal" and "popal".

bloodline · Post by **bloodline** » Thu Sep 24, 2020 7:52 am

Octocontrabass wrote:
bloodline wrote:Unless there is a way to check if we are already in supervisor mode, then I could have my IRQs entry code just continue to use the supervisor stack.
You don't need to check. The CPU guarantees that your interrupt handler will run in supervisor mode.

It would if I had a working TSS...

bloodline wrote:I don't (yet) have the TSS working, so I have to swap stacks manually at the moment...
The TSS is required for switching from user to supervisor mode. Without it, you have to stay in supervisor mode: if you switch to user mode, you have no way back.

Which is what I'm doing for now, running all my tasks/threads in supervisor mode. That will be sufficient until I get the TSS working, then I'll rewrite the interrupt handling code.

bloodline wrote:I'm finding the x86 conceptually difficult to get my head around (I've only been looking at this for a week or so...), so bear with me while I catch up!
Good luck!

Thank you for your help so far (in other threads), I'm already much further ahead than I would have been if it weren't for your (and others') help from this site. Especially your insight into inline asm (which is why I've stuck to separate asm for this bit).

bloodline wrote:The pushad instruction wasn't accepted by the GNU assembler, no idea why.
The examples on that page primarily use Intel or NASM syntax. In AT&T syntax, the mnemonics are "pushal" and "popal".

I'm reasonably comfortable with the AT&T syntax... It's a struggle learning the x86 ISA and having to learn two separate syntaxes... There don't seem to be many good resources for people like me who only plan to write AT&T, but will have to read a lot of Intel syntax example code.

PeterX · Post by **PeterX** » Thu Sep 24, 2020 9:30 am

bloodline wrote:I'm reasonably comfortable with the AT&T syntax... It's a struggle learning the x86 ISA and having to learn two separate syntaxes... There don't seem to be many good resources for people like me who only plan to write AT&T, but will have to read a lot of Intel syntax example code.

I can understand that this is hard. I would recommend giving your brain time to adapt; it gets easier later. I once could only read Intel syntax (coming from DOS MASM) but I can now read both syntaxes. It took me some time. (Similar with Lisp and Forth, by the way.)

Greetings
Peter

Octocontrabass · Post by **Octocontrabass** » Thu Sep 24, 2020 5:56 pm

bloodline wrote:
Octocontrabass wrote:You don't need to check. The CPU guarantees that your interrupt handler will run in supervisor mode.
It would if I had a working TSS...

No, even without a working TSS, the CPU guarantees your interrupt handler will run in supervisor mode. It does this by crashing if you try to switch from user to supervisor mode without a TSS.

bloodline wrote:Which is what I'm doing for now, running all my tasks/threads in supervisor mode. That will be sufficient until I get the TSS working, then I'll rewrite the interrupt handling code.

But you still don't need to switch stacks on every interrupt. You should only switch supervisor stacks when you switch tasks, and switching tasks should be completely independent from interrupts.

bloodline · Post by **bloodline** » Tue Sep 29, 2020 8:27 am

Ok, I decided to give the TSS a break for a bit.

So I spent a bit of time working on my multitasking code (I also added in my memory management code, so I can allocate/free memory from the extended memory area... I have decided to leave everything below 1 MB alone: GRUB won't load kernel code there, and it might be useful to leave it untouched if I want to switch to real mode? I don't know). I now have it working really well, with a simple priority-based round-robin scheduler, and tasks can "wait()" for a signal, removing themselves from the ready list, only to be scheduled back in when they receive the correct signal (this took a bit of time as, initially, I forgot that I need to be in the supervisor context to save the task state, so I set up a CPU trap specifically to handle the wait function). I'm really pleased with this as it allowed me to move the command interpreter out of the keyboard interrupt and into its own task. The keyboard interrupt now just fills a circular buffer, and signal()s to the console task that the keyboard buffer has changed and it needs to run.
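For readers who want to picture the keyboard path, here is a minimal single-producer, single-consumer circular buffer in C. This is a sketch under assumptions, not the code from this kernel: the names and the buffer size are invented, and the signal() call to the console task is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KBD_BUF_SIZE 64u  /* assumed size; a power of two so masking works */
static_assert((KBD_BUF_SIZE & (KBD_BUF_SIZE - 1u)) == 0,
              "size must be a power of two");

struct kbd_ring {
    volatile uint32_t head;  /* advanced only by the keyboard IRQ */
    volatile uint32_t tail;  /* advanced only by the console task */
    uint8_t data[KBD_BUF_SIZE];
};

/* Called from the keyboard IRQ. Returns false, dropping the key, when
 * the buffer is full. */
static bool kbd_put(struct kbd_ring *r, uint8_t c) {
    if (r->head - r->tail == KBD_BUF_SIZE)
        return false;
    r->data[r->head & (KBD_BUF_SIZE - 1u)] = c;
    r->head++;
    return true;
}

/* Called from the console task. Returns false when nothing is pending. */
static bool kbd_get(struct kbd_ring *r, uint8_t *out) {
    if (r->head == r->tail)
        return false;
    *out = r->data[r->tail & (KBD_BUF_SIZE - 1u)];
    r->tail++;
    return true;
}
```

Since each index has exactly one writer, the IRQ handler and the task do not need a lock on a single CPU; that would have to be revisited for SMP.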

But everything is still running in ring 0, which means that I'm having to manually save the task state during a context switch, and that will probably cause other problems further down the line... I still have an eye to long mode, and SMP... eventually.

To be clear, my model of multitasking here is probably closer to what many would consider threads, since I have no interest in implementing hardware based memory protection/paging/virtual memory at this time (my memory manager has strategies to limit fragmentation). Supervisor mode is for interrupts, and user mode is for a collection of preemptively scheduled threads, all living in a single address space, but each with its own stack. I might add a "kernel task" which runs in supervisor mode, but such a thing doesn't seem necessary at the moment.

This breathing space allowed me to think more about how I'm setting up the TSS.

I decided to start from scratch:

I have set up a GDT with 6 entries:

entry: 0, base: 0 , limit: 0, access: 0 , granularity: 0
entry: 1, base: 0 , limit: 0xFFFFFFFF, access: 0x9A, granularity: 0xCF
entry: 2, base: 0 , limit: 0xFFFFFFFF, access: 0x92, granularity: 0xCF
entry: 3, base: 0 , limit: 0xFFFFFFFF, access: 0xFA, granularity: 0xCF
entry: 4, base: 0 , limit: 0xFFFFFFFF, access: 0xF2, granularity: 0xCF
entry: 5, base: &tss, limit: sizeof(tss) , access: 0x82, granularity: 0x40

(This table loads into the CPU without error)

where:

tss.ss0 = 0x10
tss.esp0 = kernel_stack_top
tss.cs = 0x1B //my user code segment + 3 (since I want to execute in ring 3)
tss.ss = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.ds = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.es = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.fs = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.gs = 0x23 //my user data segment + 3 (since I want to execute in ring 3)

Now when I execute:

movw $0x2B, %ax
ltr %ax

qemu becomes stuck in a reset loop... even though I'm not switching modes, and interrupts are disabled. The TSS should be doing nothing at this point.

I simply don't understand what I'm doing wrong. I have read the documentation and all of this seems correct. The TSS seems to be a bizarre concept. I surely can't be the only person to have struggled with this? If I ever do figure out what is going wrong here, I intend to write a detailed explanation for others to follow.

sj95126 · Post by **sj95126** » Tue Sep 29, 2020 9:12 am

bloodline wrote:Now when I execute:

movw $0x2B, %ax
ltr %ax

qemu becomes stuck in a reset loop... even though I'm not switching modes, and interrupts are disabled. The TSS should be doing nothing at this point.

If you're getting a machine reset, it's almost certainly because your exception handlers aren't working properly and leading to a triple fault. ltr can trigger a number of different exceptions, and they all need to be caught. You're right that it doesn't "use" the TSS at that point, but it does check some of the fields (type and busy).

This is an area I really wish the machine simulators were more helpful. Yes, you can turn on features to show every interrupt and exception, but they dump a lot of detail that's easy to miss. What I want is simple output at the end of the crash that tells you that it was a #TS that led to a #PF that led to a #GP. I'll dig through the detailed output to figure the specifics if I need to.

bloodline · Post by **bloodline** » Tue Sep 29, 2020 9:24 am

sj95126 wrote:
bloodline wrote:Now when I execute:

movw $0x2B, %ax
ltr %ax

qemu becomes stuck in a reset loop... even though I'm not switching modes, and interrupts are disabled. The TSS should be doing nothing at this point.
If you're getting a machine reset, it's almost certainly because your exception handlers aren't working properly and leading to a triple fault. ltr can trigger a number of different exceptions, and they all need to be caught. You're right that it doesn't "use" the TSS at that point, but it does check some of the fields (type and busy).

This was VERY good advice. I have just moved my interrupt table setup code before the GDT setup (no idea why I didn't think of that right at the beginning), and now I have an error, a proper error!! A General Protection Fault!

This is an area I really wish the machine simulators were more helpful. Yes, you can turn on features to show every interrupt and exception, but they dump a lot of detail that's easy to miss. What I want is simple output at the end of the crash that tells you that it was a #TS that led to a #PF that led to a #GP. I'll dig through the detailed output to figure the specifics if I need to.

So it seems some of the fields in my TSS are incorrect...? OK, time to explore...

I have been using http://copy.sh/v86/debug.html as this gives me more detailed machine state output, but it struggles with multiboot kernels, so you have to have a valid GRUB-based HD image, which is a lot more steps when trying to rapidly debug something.

sj95126 · Post by **sj95126** » Tue Sep 29, 2020 9:29 am

bloodline wrote:So it seems some of the fields in my TSS are incorrect...? OK, time to explore...

My example used #TS but I didn't mean to suggest that was specifically the issue. (In fact, ltr doesn't trigger #TS). If ltr is resulting in #GP, it doesn't have anything to do with the contents of the TSS itself.

Octocontrabass · Post by **Octocontrabass** » Tue Sep 29, 2020 11:53 am

bloodline wrote:But everything is still running in ring 0, which means that I'm having to manually save the task state during a context switch, and that will probably cause other problems further down the line...

No, manually saving the task state is normal. No one uses the TSS for actual task switching outside of a few rare cases where an interrupt might happen in ring 0 without a valid stack (e.g. #DF). The rest of the time, it just holds the ring 0 SS and ESP.

bloodline wrote:entry: 5, base: &tss, limit: sizeof(tss) , access: 0x82, granularity: 0x40

Your access byte indicates an LDT descriptor. For a 32-bit TSS descriptor, use 0x89.

Your granularity byte has an undefined bit set. Just use 0.
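As a sketch of how those two bytes fit into the descriptor, assuming the usual 8-byte GDT entry layout (the struct and helper names are hypothetical). In the access byte, 0x80 is the present bit and the low nibble is the system type: 0x2 means LDT, 0x9 means available 32-bit TSS. The limit is the offset of the last valid byte, so the caller below passes the size minus one.

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte GDT entry. The packed attribute is a GCC/Clang extension. */
struct gdt_entry {
    uint16_t limit_low;  /* limit bits 0..15 */
    uint16_t base_low;   /* base bits 0..15 */
    uint8_t  base_mid;   /* base bits 16..23 */
    uint8_t  access;     /* present, DPL, type */
    uint8_t  gran;       /* flags (high nibble), limit bits 16..19 (low) */
    uint8_t  base_high;  /* base bits 24..31 */
} __attribute__((packed));

static_assert(sizeof(struct gdt_entry) == 8, "GDT entries are 8 bytes");

#define ACCESS_TSS32_AVAILABLE 0x89u  /* present, DPL 0, system type 9 */

/* Hypothetical helper: byte granularity, no flag bits set. */
static struct gdt_entry make_tss_descriptor(uint32_t base, uint32_t limit) {
    struct gdt_entry e;
    e.limit_low = (uint16_t)(limit & 0xFFFFu);
    e.base_low  = (uint16_t)(base & 0xFFFFu);
    e.base_mid  = (uint8_t)((base >> 16) & 0xFFu);
    e.access    = ACCESS_TSS32_AVAILABLE;
    e.gran      = (uint8_t)((limit >> 16) & 0x0Fu);
    e.base_high = (uint8_t)(base >> 24);
    return e;
}
```

For the minimal 104-byte 32-bit TSS this would be called as `make_tss_descriptor(tss_address, 103)`, where `tss_address` is whatever linear address the TSS lives at.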

bloodline wrote:tss.cs = 0x1B //my user code segment + 3 (since I want to execute in ring 3)
tss.ss = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.ds = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.es = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.fs = 0x23 //my user data segment + 3 (since I want to execute in ring 3)
tss.gs = 0x23 //my user data segment + 3 (since I want to execute in ring 3)

You don't need to set any of these fields unless you're using hardware task switching. Hardware task switching is strongly discouraged outside of the odd cases like #DF.

You do need to set the IOPB field.
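A minimal sketch of a software-task-switching TSS setup in C, assuming the standard 104-byte 32-bit TSS layout. The struct, field, and function names are hypothetical. Setting the I/O map base to the size of the TSS (at or beyond the segment limit) is the common way to say "no I/O permission bitmap".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS. The packed attribute is a GCC/Clang extension. */
struct tss32 {
    uint32_t prev_tss;
    uint32_t esp0, ss0;   /* ring 0 stack, loaded on entry from ring 3 */
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt;
    uint16_t trap;
    uint16_t iomap_base;  /* the IOPB field */
} __attribute__((packed));

static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
static_assert(offsetof(struct tss32, iomap_base) == 102, "IOPB at 0x66");

#define KERNEL_DS 0x10u  /* assumed: GDT entry 2, ring 0 data */

/* Only SS0, ESP0 and the I/O map base matter when the TSS is used
 * purely for the ring 3 to ring 0 stack switch. */
static void tss_init(struct tss32 *tss, uint32_t kernel_stack_top) {
    memset(tss, 0, sizeof *tss);
    tss->ss0        = KERNEL_DS;
    tss->esp0       = kernel_stack_top;
    tss->iomap_base = (uint16_t)sizeof *tss;
}
```

With one kernel stack per task, `esp0` would be rewritten on every task switch so that the next entry from user mode lands on the new task's kernel stack.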

bloodline wrote: movw $0x2B, %ax
ltr %ax

Any particular reason you're setting the RPL to 3?
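For reference, a segment selector splits into an index (bits 3 and up), a table indicator (bit 2), and the RPL (bits 0 and 1). The helper names in this small C sketch are made up; it only shows that 0x2B is GDT entry 5 with RPL 3, and that the same entry with RPL 0 is 0x28.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for taking a selector apart. */
static inline unsigned sel_index(uint16_t sel) { return sel >> 3; }
static inline unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static inline unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

/* Builds a GDT selector (table indicator 0) from an index and an RPL. */
static inline uint16_t make_selector(unsigned index, unsigned rpl) {
    return (uint16_t)((index << 3) | (rpl & 3u));
}

static_assert((0x2B >> 3) == 5 && (0x2B & 3) == 3,
              "0x2B is GDT entry 5 with RPL 3");
```

The user-mode selectors 0x1B and 0x23 used earlier in the thread follow the same pattern: entries 3 and 4 with RPL 3.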
