Address space switch in interrupt handlers

Freanan · Post by **Freanan** » Fri Aug 19, 2005 4:16 am

I am just wondering about interrupts in a multitasking environment:

When I have different processes, each running in its own address space, with the kernel (or the driver handling the interrupt) in another address space, how does the kernel/driver find its data when an interrupt fires while we are still inside the user's address space?
I suppose cr3 has to be reloaded with the address of the page directory belonging to the kernel/driver.
But how will it know the right virtual address of its own page directory when it is still in the user's address space?

Is this the reason why the kernel is often mapped into the address spaces of the users?
And what about user-level drivers, which will also need to find their own cr3 values?

0Scoder · Post by **0Scoder** » Fri Aug 19, 2005 5:44 am

If you don't have your kernel mapped into each process's address space (and I presume you don't), you won't be able to perform a context switch (switching address spaces) with a normal interrupt gate.

However, there are actually three types of gate that can be used in the IDT: an interrupt gate, a trap gate and a task gate. The task gate is the one you might like to look into. Instead of taking an address in the current address space to jump to, it takes a TSS selector as its parameter (task state segment: a type of segment that describes a task, i.e. its address space, EIP and register values). If you want a better description of how these work, try volume 3 of the Intel manuals (the System Programming Guide).

This method is great if you are using TSSs for task switching, but if you aren't it might not be as helpful. The only other thing I can think of, apart from mapping the whole kernel in, is perhaps mapping in just the bits needed to switch address space.
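
To make the difference between the gate types concrete, here is a minimal sketch of how a 32-bit IDT entry could be filled in for an interrupt gate versus a task gate. The type name `idt_gate` and the helper names are made up for illustration; only the field layout and the type values (0x5, 0xE, 0xF) come from the x86 descriptor format.

```c
#include <stdint.h>

/* One 8-byte entry of a 32-bit protected-mode IDT. */
typedef struct {
    uint16_t offset_low;   /* handler address bits 0..15 (unused by task gates)  */
    uint16_t selector;     /* code segment selector, or TSS selector (task gate) */
    uint8_t  reserved;     /* must be zero                                       */
    uint8_t  type_attr;    /* present bit | DPL | gate type                      */
    uint16_t offset_high;  /* handler address bits 16..31 (unused by task gates) */
} idt_gate;

enum {
    GATE_TASK      = 0x05, /* task gate: the CPU switches to the TSS named in selector */
    GATE_INTERRUPT = 0x0E, /* 32-bit interrupt gate: jumps within the current CR3      */
    GATE_TRAP      = 0x0F, /* 32-bit trap gate: like interrupt gate, IF left unchanged */
    GATE_PRESENT   = 0x80
};

/* Interrupt gate: the handler address must be valid in whatever
 * address space is active when the interrupt arrives. */
idt_gate make_interrupt_gate(uint32_t handler, uint16_t code_sel, unsigned dpl)
{
    idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = code_sel;
    g.reserved    = 0;
    g.type_attr   = (uint8_t)(GATE_PRESENT | ((dpl & 3u) << 5) | GATE_INTERRUPT);
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Task gate: no handler address at all; the TSS supplies CR3, EIP and registers. */
idt_gate make_task_gate(uint16_t tss_sel, unsigned dpl)
{
    idt_gate g;
    g.offset_low  = 0;
    g.selector    = tss_sel;
    g.reserved    = 0;
    g.type_attr   = (uint8_t)(GATE_PRESENT | ((dpl & 3u) << 5) | GATE_TASK);
    g.offset_high = 0;
    return g;
}
```

The selector values passed in (for example 0x08 for a kernel code segment) depend entirely on how your own GDT is laid out.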

Hope this helps!

AR · Post by AR » Fri Aug 19, 2005 5:58 am

If you're using user-space drivers then you will need to effectively perform a task switch (reload CR3 with the page directory of the driver) and resume the task with the details of the interrupt (probably by just sending it a message).

User-space programs do not usually need to know where the page directory is, or have access to it. The program is effectively inside a virtual machine, and providing the page directory breaks the "shouldn't know that it is actually in a VM" rule. Drivers do not need to manipulate the directory itself, except perhaps a CPU driver (I've seen these in Windows but I haven't got a clue what they're for).

The kernel is always mapped into the address space because when an interrupt occurs, the CPU switches to the ring and location given by the IDT entry. If the kernel isn't present it will simply triple fault, since it can't access the handler (or the GDT or IDT to begin with). It is also faster that way, since it doesn't require a TLB flush just to perform a minor operation in the kernel.
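
One common way to keep the kernel present in every address space is to copy the kernel's page-directory entries into each new process's page directory. The sketch below assumes 32-bit non-PAE paging (1024 entries of 4 MB each) and a kernel living from 0xC0000000 upward; both the split and the function name `init_process_page_dir` are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define PDE_COUNT        1024u
#define KERNEL_BASE      0xC0000000u          /* assumed user/kernel split   */
#define KERNEL_PDE_FIRST (KERNEL_BASE >> 22)  /* first kernel entry: 768     */

/* Prepare a fresh page directory for a process: the user part starts
 * empty, the kernel part is shared with every other address space
 * because the entries point at the same page tables. */
void init_process_page_dir(uint32_t *new_dir, const uint32_t *kernel_dir)
{
    /* Entries 0..767: nothing mapped yet for the user. */
    memset(new_dir, 0, KERNEL_PDE_FIRST * sizeof(uint32_t));

    /* Entries 768..1023: identical in all address spaces, so an ISR
     * address taken from the IDT is valid whichever CR3 is loaded. */
    memcpy(new_dir + KERNEL_PDE_FIRST,
           kernel_dir + KERNEL_PDE_FIRST,
           (PDE_COUNT - KERNEL_PDE_FIRST) * sizeof(uint32_t));
}
```

Because the page tables behind those entries are shared, later changes inside an existing kernel page table show up in all processes; only adding a brand-new kernel page-directory entry would need to be propagated to every directory.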

Freanan · Post by **Freanan** » Fri Aug 19, 2005 6:42 am

Okay, so I would have to map the kernel into all user address spaces in order to work with ISRs inside the kernel.

And when I have user-level drivers I could, for example, first call a standard routine inside the kernel and then task-switch to the user-level process and call the user-level handler that it registered with the kernel?

Brendan · Post by **Brendan** » Fri Aug 19, 2005 9:03 am

Hi,

Freanan wrote: And when I have user-level drivers I could, for example, first call a standard routine inside the kernel and then task-switch to the user-level process and call the user-level handler that it registered with the kernel?

That would work, but what if the device interrupts its own device driver? In this case you'd have to have code to handle re-entrancy problems within every device driver.

To make it easier to write device drivers you could make the kernel check if the device driver is already running and put the IRQ in a queue of some sort, so that the device driver doesn't receive the IRQ until it's ready to receive it. That way the device drivers won't need to care about re-entrancy problems.
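
As a rough illustration of that idea, the kernel could keep a small per-driver state with a "busy" flag and a ring buffer of pending IRQs. Everything named here (`driver_irq_state`, `irq_arrived`, `irq_done`, the queue size) is hypothetical, and a real kernel would call these with interrupts disabled or under a lock.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_QUEUE_SIZE 16u   /* must be a power of two for the index wrap below */

typedef struct {
    uint8_t  pending[IRQ_QUEUE_SIZE]; /* IRQ numbers waiting for the driver   */
    unsigned head;                    /* next slot to read                    */
    unsigned tail;                    /* next slot to write                   */
    bool     busy;                    /* driver is still handling an IRQ      */
} driver_irq_state;

typedef enum { IRQ_DELIVER_NOW, IRQ_QUEUED, IRQ_DROPPED } irq_result;

/* Called from the kernel's ISR stub when an IRQ for this driver fires. */
irq_result irq_arrived(driver_irq_state *d, uint8_t irq)
{
    if (!d->busy) {
        d->busy = true;               /* driver is idle: hand it over directly */
        return IRQ_DELIVER_NOW;
    }
    if (d->tail - d->head == IRQ_QUEUE_SIZE)
        return IRQ_DROPPED;           /* queue full */
    d->pending[d->tail++ % IRQ_QUEUE_SIZE] = irq;
    return IRQ_QUEUED;
}

/* Called when the driver reports it has finished the current IRQ.
 * Returns the next queued IRQ number, or -1 if the driver is now idle. */
int irq_done(driver_irq_state *d)
{
    if (d->head == d->tail) {
        d->busy = false;
        return -1;
    }
    return d->pending[d->head++ % IRQ_QUEUE_SIZE];
}
```

With this shape the driver never sees a second IRQ while it is still working on the first, so it does not have to be re-entrant.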

Most OSs have some sort of "Inter-process Communication" mechanism (or IPC) that does a very similar thing. For example, the kernel could just send a message to the device driver (Windows style IPC), or send some data through a pipe (Unix style IPC).

Cheers,

Brendan

Kemp · Post by **Kemp** » Fri Aug 19, 2005 10:53 am

This isn't quite relevant to the topic, but I figured it'd be neater to ask here than to start a whole new thread just for a quick question.

When you're talking about user-level drivers, I assume you mean they run in ring 3. What are the advantages of this over having them run in ring 0 with the kernel?

AR · Post by AR » Fri Aug 19, 2005 11:07 am

The driver can run like a normal program in ring 3 (no need to feel your way around the kernel). The main reason, though, is that if a driver crashes you simply terminate the driver and restart it instead of BSODing. A malicious driver also cannot take control, as ring 3 tasks can't change the GDT or IDT.

The disadvantage is that it hurts performance rather badly, since the address space switch flushes the TLB every time an interrupt occurs: every time you press a key it interrupts the program, switches to the driver, then switches back (or to the GUI server to redirect the input). This can be mostly avoided by using small address spaces so that all the drivers can be present but inaccessible to everyone but the kernel. This, however, requires segmentation, which isn't portable.

Kemp · Post by **Kemp** » Fri Aug 19, 2005 11:25 am

Yeah, that was what I was thinking. My plan currently will be to essentially have two-part drivers. There will be the actual driver (ring 0) that will be responsible for actually making the hardware work and will not be required to be re-entrant except in certain special cases. Then there will be a system service (ring 3) that will receive IPC messages from applications that want to use the hardware and pass them on to the driver. This means that the requests will naturally form a queue with no extra processing needed to achieve it (or the service can pull them off the queue as they arrive to sort them differently). This also means that if required, the kernel can put the service on hold, issue a device reset command via the actual driver and then take over control by directly sending requests to the driver (or forcefully clear the service's message queue and set itself as being the only one allowed to send requests to it). Of course, this will probably be changed as I work on other things leading up to being able to support this system.

Brendan · Post by **Brendan** » Fri Aug 19, 2005 12:35 pm

Hi,

Another advantage (apart from security/protection) is that it can simplify memory management and the rest of the kernel, because everything is either a CPL=3 process or the kernel (no need to treat device drivers differently).

Giving each device driver its own address space also means that you don't need to change all address spaces when the device driver allocates or frees memory (which involves IPIs on multi-CPU computers and can be relatively slow in any case).

The performance impact of TLB flushes depends on what the IRQ was and a lot of other things. In general, if the IRQ means that data was received then the TLB's contents are often flushed immediately after the IRQ anyway, as part of delivering the data to where it needs to go.

For example, if a keyboard IRQ is received, then you'd want to switch from the interrupted task to the task that is waiting for the keypress. The same happens for mouse data and received network data. For floppy drives, hard drives, CD, etc it's likely that you'd want to switch from the interrupted task to the file system's task, regardless of whether data was read or written.

There are of course IRQs where the TLB flushing is a problem - storage device operations that don't involve reading or writing data (seek, recalibrate), the sound card (except for recording), "output buffer empty" or "data sent" IRQs for network or serial port, etc.

Of course all of this assumes that device drivers are running at CPL=3 like a normal process, which isn't actually required. You could map all device drivers into all address spaces and still run them at CPL=3 (with IRQs delivered by IPC). In this case no TLB flushing would be required. The problem here is that you lose most of the advantages. You could use segmentation to prevent applications from messing with the device drivers (and to protect one device driver from other device drivers). If you don't use different segments for each device driver then you'd probably need position-independent code instead (just like device drivers running at CPL=0).

Cheers,

Brendan

Freanan · Post by **Freanan** » Mon Aug 22, 2005 1:34 am

What is the minimum data that has to be mapped into each process's address space?

Is there anything apart from
ISRs
Kernel page directory
Kernel page tables
?

(I am really speaking about minimal requirements. It might be useful to have the task structs and the GDT (with a single TSS for software multitasking) mapped too, so that the scheduler would not necessarily have to do an address space switch into the kernel to do its work, but it is not required, or at least I think so.)

distantvoices · Post by **distantvoices** » Mon Aug 22, 2005 2:05 am

For heavens sake!

Please, lad, I don't know what you think a scheduler is, but I wouldna consider something which merely juggles tasks around cpu's something *highly* sophisticated and complicated!

The scheduling/task picking mechanism is something straight forward. One of the few things one can consider straight forward, because once it runs - it runs.

I for one put the scheduling/task picking stuff into the kernel. You juggle TCBs (Task Control Blocks), not threads, not processes. Just Task Control Blocks. These bear crucial info about a task: eip, kernel stack, user stack, page dir, address space info (which TCBs belonging to one process share), priority, time slice, nice value, message box.

You can of course put the scheduler/task picking stuff into its own process/task, but keep in mind that it needs to carry out round robin scheduling and task picking in a smooth and elegant way, preferably without reloading cr3 just to access the scheduler or task picker.

Then, keep in mind that task switches may occur for any reason: messages sent/received, timers expired, sleepers woken up, I/O requested (with or without messaging involved), other IPC occurrences ... and at the very least, round robin time slice expiration. (That's Tim Robinson's wisdom.)

I have the slight feeling that my typing abilities deteriorate more and more. Need to be more careful. *gg*

Stay safe
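
A TCB along those lines, plus a plain round-robin picker, might look roughly like the sketch below. The field selection follows the list in the post; the struct name `tcb`, the circular `next` link and the two helper functions are assumptions for illustration, not anyone's actual kernel code.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TASK_READY, TASK_BLOCKED } task_state;

typedef struct tcb {
    uint32_t    eip;           /* where to resume                          */
    uint32_t    kernel_stack;  /* stack used while in ring 0               */
    uint32_t    user_stack;    /* stack used while in ring 3               */
    uint32_t    page_dir;      /* physical address to load into CR3        */
    int         priority;
    unsigned    time_slice;    /* remaining ticks                          */
    task_state  state;
    struct tcb *next;          /* circular list of all tasks               */
} tcb;

/* Round robin: walk the ring starting after the current task and return
 * the first ready one. Falls back to the current task if it is the only
 * ready one; returns NULL if nothing at all is ready. */
tcb *pick_next(tcb *current)
{
    for (tcb *t = current->next; t != current; t = t->next) {
        if (t->state == TASK_READY)
            return t;
    }
    return current->state == TASK_READY ? current : NULL;
}

/* Threads of one process share a page directory, so switching between
 * them needs no CR3 reload (and therefore no TLB flush). */
int needs_cr3_reload(const tcb *from, const tcb *to)
{
    return from->page_dir != to->page_dir;
}
```

Since the picker only touches TCBs, it can live in the kernel mapping that every address space shares and run without any address space switch of its own.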

Freanan · Post by **Freanan** » Mon Aug 22, 2005 2:24 am

That's all rather clear to me.
I was just wondering what the minimum is that needs to be present in each address space.

But it seems that, no matter how I turn it, mapping the whole kernel into the user spaces would be best.
I just dislike the idea of having my kernel initialisation code in each address space, because it is used only once during bootup to install all the other stuff...
So it would probably be preferable to map all data belonging to the kernel as well as all ISRs (and other functions that might be accessed, for example, via call gates), leaving out all the initialisation stuff.

I am a bit confused about all this.
Actually I am still busy coding memory management, but it seems I should already think about multitasking when setting up memory management (especially the layout of memory, i.e. where to put what so that the needed things will be easily accessible once there are processes).

distantvoices · Post by **distantvoices** » Mon Aug 22, 2005 2:29 am

I think, Linux frees some of the initialization stuff, but that's to be confirmed.

Don't be confused. Take a sheet of paper and draw the thing as you imagine it. It gets clear then.

Freanan · Post by **Freanan** » Mon Aug 22, 2005 3:43 am

I would like to disentangle all globally needed kernel data as well as the globally needed code parts from the rest of the kernel (i.e. init stuff) and put them (4k-aligned) "behind" the init stuff.
This would enable me to free the init part and map all the global stuff to some higher virtual memory, so that userspace can start at 0x0 with kernel data and heap following it.

Setting data apart is no problem, as this should already be done by the linker script, but I am quite unsure how to do the same with parts of the code.
Also, when mapping things to somewhere else I would have to add offsets to all pointers or something like that.
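
On the pointer question: if the kernel is linked at its final virtual address, ordinary pointers need no adjustment, and only physical addresses (page-table entries, CR3, addresses handed over by the boot loader) have to be converted. The sketch below assumes a fixed offset of 0xC0000000 between where the kernel is loaded and where it is mapped, and shows how the whole 4 KB pages inside an init region could be counted before freeing them. All names and the offset are illustrative assumptions.

```c
#include <stdint.h>

#define KERNEL_VIRTUAL_BASE 0xC0000000u  /* assumed virtual - physical offset */
#define PAGE_SIZE           4096u

/* Convert between the kernel's virtual addresses and physical addresses.
 * Only valid for memory covered by the fixed-offset kernel mapping. */
uint32_t virt_to_phys(uint32_t vaddr) { return vaddr - KERNEL_VIRTUAL_BASE; }
uint32_t phys_to_virt(uint32_t paddr) { return paddr + KERNEL_VIRTUAL_BASE; }

uint32_t page_align_up(uint32_t addr)
{
    return (addr + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
}

uint32_t page_align_down(uint32_t addr)
{
    return addr & ~(PAGE_SIZE - 1u);
}

/* Number of whole pages lying completely inside [start, end), i.e. the
 * pages of an init region that could be handed back to the allocator
 * without touching neighbouring code or data. */
uint32_t freeable_pages(uint32_t start, uint32_t end)
{
    uint32_t first = page_align_up(start);
    uint32_t last  = page_align_down(end);
    return last > first ? (last - first) / PAGE_SIZE : 0u;
}
```

Keeping the init code and data in their own page-aligned sections in the linker script would make `start` and `end` page boundaries already, so nothing is lost to rounding.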

Freanan · Post by **Freanan** » Tue Aug 23, 2005 11:08 am

Okay, I somehow "succumbed" to the idea of mapping the whole kernel and have been trying to change my present kernel into a higher-half one using the FAQ's higher-half barebones code...
Still, my kernel seems to hang or not to be loaded at all (Bochs hangs displaying parts of the kernel's ELF header).
I also tried several variations of the linker script (with the output format specified as ELF, for example, and with the ADDR(.segname) expressions from the barebones instead of the strange address arithmetic in the sections), but none of them worked.

Here is my code. Maybe you can see something obvious, like a messed-up linker script or wrong pointers inside my start.asm, and help me get the higher half to run.

linker.ld

Code: Select all

ENTRY(start)
phys = 0xC0100000;
SECTIONS
{
  .text phys : AT(phys - 0xC0000000) {
    code = .;
    *(.text)
    *(.rodata*)
    *(.rodata.32*)
    *(.eh_frame*)
    . = ALIGN(4096);
  }
  .data : AT(phys + (data - code - 0xC0000000))
  {
    data = .;
    *(.data)
    . = ALIGN(4096);
  }
  .bss : AT(phys + (bss - code - 0xC0000000))
  {
    bss = .;
    *(.bss)
    . = ALIGN(4096);
  }
  end = .;
}

parts of start.asm:

Code: Select all

[BITS 32]
global start
start:
    jmp stublet


; This part MUST be 4byte aligned, so we solve that issue using 'ALIGN 4'
ALIGN 4
mboot:
    ; Multiboot macros to make a few lines later more readable
    MULTIBOOT_PAGE_ALIGN   equ 1<<0
    MULTIBOOT_MEMORY_INFO   equ 1<<1
    MULTIBOOT_AOUT_KLUDGE   equ 1<<16
    MULTIBOOT_HEADER_MAGIC   equ 0x1BADB002
    MULTIBOOT_HEADER_FLAGS   equ MULTIBOOT_PAGE_ALIGN | MULTIBOOT_MEMORY_INFO | MULTIBOOT_AOUT_KLUDGE
    MULTIBOOT_CHECKSUM   equ -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)
    EXTERN code, bss, end
    
    ; This is the GRUB Multiboot header. A boot signature
    dd MULTIBOOT_HEADER_MAGIC
    dd MULTIBOOT_HEADER_FLAGS
    dd MULTIBOOT_CHECKSUM

    ;Data for paging and the higher half remapping
    KERNEL_VIRTUAL_BASE equ 0xc0000000                     ;3GB
    KERNEL_PAGE_NUMBER  equ (KERNEL_VIRTUAL_BASE >> 22)    ;Index of page dir entry for the higher half


; Entry stub: load the page directory, enable 4MB pages and paging,
; jump to the higher-half address, remove the identity mapping and
; finally call main.
stublet:
    
    ;install pagedir
    mov ecx,(page_dir - KERNEL_VIRTUAL_BASE)
    mov cr3,ecx
    
    ;set pse bit
    mov ecx,cr4
    or ecx,0x00000010
    mov cr4,ecx

    ;enable paging
    mov ecx,cr0
    or ecx,0x80000000
    mov cr0,ecx
    
    ;start fetching instructions in kernel space
    lea ecx,[higherhalf]
    jmp ecx
    higherhalf:
    
    ;clear the identity-mapping
    mov dword [page_dir],0
    invlpg [0]
    
    ; This points the stack to our new stack area
    mov esp, _sys_stack    
    
    ;do the jump to main and pass multiboot info as argument
    extern main
    push ebx
    call main
    jmp $

(...)
SECTION .bss
    resb 8192               ; This reserves 8KBytes of memory here
_sys_stack:

global page_table
global stack_table
ALIGN 4096
page_table:
    resd 1024
stack_table:
    resd 1024


;constant data...
SECTION .data
ALIGN 4096
global page_dir
;hardcoded pagedir with two entrys for two 4mb pages
page_dir:
    dd 0x00000083
    times (KERNEL_PAGE_NUMBER-1) dd 0
    dd 0x00000083
    times (1024-KERNEL_PAGE_NUMBER-1) dd 0
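
When checking a setup like this, two pieces of arithmetic are easy to verify on paper or on a host machine: the Multiboot header fields and the page-directory index for the higher half. The sketch below is only such a host-side check, with made-up function names. One fact from the Multiboot 1 specification is relevant here: when bit 16 of the flags (the "a.out kludge") is set, the loader expects five more physical address fields (header_addr, load_addr, load_end_addr, bss_end_addr, entry_addr) right after the checksum, and uses them instead of the ELF headers. Whether that explains the hang in this particular case is not something the posted excerpt can confirm.

```c
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_AOUT_KLUDGE  (1u << 16)

/* The checksum is chosen so that magic + flags + checksum == 0 (mod 2^32). */
uint32_t multiboot_checksum(uint32_t flags)
{
    return 0u - (MULTIBOOT_HEADER_MAGIC + flags);
}

/* Returns 1 if the first three header dwords are consistent. */
int multiboot_header_valid(uint32_t magic, uint32_t flags, uint32_t checksum)
{
    return magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(magic + flags + checksum) == 0u;
}

/* Returns 1 if the loader will expect the five address fields after
 * the checksum (header_addr .. entry_addr, all physical addresses). */
int multiboot_needs_address_fields(uint32_t flags)
{
    return (flags & MULTIBOOT_AOUT_KLUDGE) != 0u;
}

/* Index of the page-directory entry covering a virtual address
 * (non-PAE paging: each entry spans 4 MB, hence the shift by 22). */
uint32_t page_dir_index(uint32_t vaddr)
{
    return vaddr >> 22;
}
```

For the values in the listing, the flags come out as 0x00010003, and both 0xC0000000 and 0xC0100000 fall into page-directory entry 768, which matches the position of the second `dd 0x00000083` in `page_dir`.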

OSDev.org
