Clarify how x86 interrupts work
I am looking at interrupts as a syscall mechanism and I think I understand how it works, but one thing stumped me for a bit and I want to clarify it:
I am in protected mode, ring 3. I do a software interrupt (int 0x80 for example). My IDT has an interrupt gate installed in entry 0x80. The CPU invokes my handler.
The point that gave me difficulty is this: when the code in my handler begins to execute, SS and ESP are taken from the values of SS0 and ESP0 in the TSS currently pointed to by the Task Register, correct? So while a hardware task switch has not occurred, by pointing the Task Register to a TSS pointing to the kernel stack and having the selector field in the IDT entry pointing to the kernel code segment we effectively switch from ring 3 using the user stack to ring 0 (using the kernel stack). Is this right or am I barking up the wrong tree?
This seems to be how other OSs do it but the Intel manuals make it seem like this isn't the way the processor is designed to work.
Thanks!
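The stack switch asked about above can be modelled in C. This is a rough sketch, not real kernel code: `tss32_head`, `cpu_state` and `enter_ring0` are hypothetical names, the kernel stack is modelled as a plain array of 32-bit words, and only the case "int n through an interrupt gate, from CPL 3 to ring 0, no error code" is covered.

```c
#include <assert.h>
#include <stdint.h>

/* First fields of a 32-bit TSS; only ESP0/SS0 matter for this transition. */
struct tss32_head {
    uint16_t prev_task, reserved0;
    uint32_t esp0;            /* stack pointer loaded on entry to ring 0 */
    uint16_t ss0, reserved1;  /* stack segment loaded on entry to ring 0 */
};
static_assert(sizeof(struct tss32_head) == 12, "ESP0 at offset 4, SS0 at offset 8");

/* User-mode state at the moment of the software interrupt (hypothetical model). */
struct cpu_state { uint32_t eip, cs, eflags, esp, ss; };

/* Push one 32-bit value on a stack that grows downwards. */
uint32_t *push32(uint32_t *sp, uint32_t value) {
    *--sp = value;
    return sp;
}

/* Model of the CPU's actions: start at the TSS.ESP0 stack top (kstack_top),
 * then push the user SS, ESP, EFLAGS, CS and EIP in that order. */
uint32_t *enter_ring0(uint32_t *kstack_top, const struct cpu_state *user) {
    uint32_t *sp = kstack_top;
    sp = push32(sp, user->ss);
    sp = push32(sp, user->esp);
    sp = push32(sp, user->eflags);
    sp = push32(sp, user->cs);
    sp = push32(sp, user->eip);
    return sp;  /* the handler starts running with ESP pointing here */
}
```

In this model the handler finds EIP, CS, EFLAGS, ESP and SS (in that order, starting at the top of the kernel stack), which is the frame IRET later pops to return to ring 3.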
Re: Clarify how x86 interrupts work
Hi,
StudlyCaps wrote: I am looking at interrupts as a syscall mechanism and I think I understand how it works, but one thing stumped me for a bit and I want to clarify it: I am in protected mode, ring 3. I do a software interrupt (int 0x80 for example). My IDT has an interrupt gate installed in entry 0x80. The CPU invokes my handler. The point that gave me difficulty is this: when the code in my handler begins to execute, SS and ESP are taken from the values of SS0 and ESP0 in the TSS currently pointed to by the Task Register, correct? So while a hardware task switch has not occurred, by pointing the Task Register to a TSS pointing to the kernel stack and having the selector field in the IDT entry pointing to the kernel code segment we effectively switch from ring 3 using the user stack to ring 0 (using the kernel stack). Is this right or am I barking up the wrong tree?
That is right.
StudlyCaps wrote: This seems to be how other OSs do it but the Intel manuals make it seem like this isn't the way the processor is designed to work.
The Intel manuals only really describe how the CPU reacts to various things, and not how an OS could/should use them. As for how an OS could/should do things: typically you'd use a register (e.g. EAX) to select a function, and inside the kernel you'd have a table of function pointers (e.g. "call [myTable + eax*4]" to call whichever kernel API function was requested). This makes it relatively easy to support multiple different ways of calling the kernel API, because each of them ends up doing the same "call [myTable + eax*4]".
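As an illustration of the table idea, here is a minimal C sketch. The function names, the single-argument ABI and the error value are all assumptions made for the example; the bounds check is an addition that the "call [myTable + eax*4]" one-liner leaves implicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*syscall_fn)(uint32_t arg);

/* Two hypothetical kernel API functions. */
uint32_t sys_get_answer(uint32_t arg) { (void)arg; return 42; }
uint32_t sys_double(uint32_t arg) { return arg * 2; }

static const syscall_fn my_table[] = { sys_get_answer, sys_double };
#define SYSCALL_COUNT (sizeof my_table / sizeof my_table[0])
#define ERR_BAD_FUNCTION 0xFFFFFFFFu  /* assumed error code */

/* Common back end: the C equivalent of "call [myTable + eax*4]". */
uint32_t dispatch(uint32_t eax, uint32_t arg) {
    if (eax >= SYSCALL_COUNT)
        return ERR_BAD_FUNCTION;      /* never index past the table */
    assert(my_table[eax] != NULL);    /* a hole in the table would be a kernel bug */
    return my_table[eax](arg);
}

/* Each entry path (software interrupt, call gate, SYSENTER, ...) would
 * save state in its own way and then end up in the same place. */
uint32_t entry_from_int80(uint32_t eax, uint32_t arg) { return dispatch(eax, arg); }
uint32_t entry_from_sysenter(uint32_t eax, uint32_t arg) { return dispatch(eax, arg); }
```

The kernel functions themselves never learn which entry path was used, which is the property that makes supporting several paths cheap.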
In general, the possibilities (from slowest to fastest) are:
- An exception (e.g. "int3" to trigger breakpoint exception, "ud2" to trigger undefined opcode exception, etc). Relatively abnormal option, that can be messy and slow; but can also cost 1 byte for caller to use (potentially unbeatable for code size).
- Software interrupt. Slowest "normal" option (due to touching both IDT and GDT, and all the protection checks involved). Costs 2 bytes for the caller to use.
- Call gate. Slightly faster than a software interrupt; but costs 7 bytes for the caller to use (worst for code size).
- SYSENTER and SYSCALL. Fastest options and also 2 bytes for caller to use (so good for code size too). Not supported on some CPUs (including modern CPUs).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Clarify how x86 interrupts work
Brendan, have you thought about using a page fault?
Please note that a "mov eax, [mem]" instruction is shorter than e.g. a "mov ecx, [mem]" instruction. The real trick is to put as much information as possible into the address value that will trap. A total of five bytes and you select the function and make the system call.
Code:
mov eax, [10] ; make a system call 10
mov eax, [11] ; make a system call 11
Re: Clarify how x86 interrupts work
...continued.
Opcode range for this purpose could be from 0xA0 to 0xA3. If the range from 0xFF000000 to 0xFFFFFFFF is not part of user space, it would be possible to do:
Code:
db 0xA0, func, param1, param2, 0xFF ; SyscallType0(uint8_t func, uint8_t param1, uint8_t param2)
db 0xA1, func, param1, param2, 0xFF ; SyscallType1(uint8_t func, uint8_t param1, uint8_t param2)
db 0xA2 ; SyscallType2(void *pointer)
dd (pointer >> 8) | 0xFF000000
db 0xA3 ; SyscallType3(void *pointer)
dd (pointer >> 8) | 0xFF000000
Nice "atomic" syscall instructions? For pointers, validity should be checked before making the system call, but that is not strictly necessary (user space programs may do anything they like anyway). Now "pointer = 0xFF123456" would mean "pointer = 0x12345600", but that could be a documented feature?
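On the kernel side, unpacking the faulting address for this encoding could look roughly like the following C sketch. The helper names are made up for illustration, and it assumes the faulting address (CR2 on x86) has already been read and is passed in as `fault_addr`.

```c
#include <assert.h>
#include <stdint.h>

struct small_call { uint8_t func, param1, param2; };

/* Types 0 and 1: the address operand is the bytes func, param1, param2, 0xFF
 * in little-endian order, so func ends up in the lowest byte. */
struct small_call decode_small(uint32_t fault_addr) {
    assert((fault_addr >> 24) == 0xFF);  /* caller checked the trap range */
    struct small_call c = {
        (uint8_t)(fault_addr & 0xFF),
        (uint8_t)((fault_addr >> 8) & 0xFF),
        (uint8_t)((fault_addr >> 16) & 0xFF)
    };
    return c;
}

/* Types 2 and 3: user space stores (pointer >> 8) | 0xFF000000 ... */
uint32_t encode_pointer(uint32_t pointer) {
    return (pointer >> 8) | 0xFF000000u;
}

/* ... and the kernel shifts it back; the low 8 bits of the pointer are lost. */
uint32_t decode_pointer(uint32_t fault_addr) {
    return fault_addr << 8;
}
```

The round trip only preserves pointers aligned to 256 bytes, which matches the "0xFF123456 means 0x12345600" behaviour described above.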
Re: Clarify how x86 interrupts work
Brendan, thanks a lot for the detailed reply. I feel much more confident now.
Re: Clarify how x86 interrupts work
Antti wrote: Brendan, have you thought about using a page fault? Please note that a "mov eax, [mem]" instruction is shorter than e.g. a "mov ecx, [mem]" instruction. The real trick is to put as much information as possible into the address value that will trap. A total of five bytes and you select the function and make the system call. [...]
Too much magic. A "call", "int", or other similar instruction looks like a function call to someone reading or writing the code. "mov eax, [10]" just looks like an invalid move, which is presumably a bug rather than a system call.
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.
Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
Re: Clarify how x86 interrupts work
onlyonemac wrote: Too much magic. A "call", "int", or other similar instruction looks like a function call to someone reading or writing the code. "mov eax, [10]" just looks like an invalid move, which is presumably a bug rather than a system call.
While I don't have a strong opinion on the use of paging as a form of syscall, for the "magic":
- Anyone using assembly deserves it
- Anyone not commenting their code deserves it
Just in case you get the wrong impression, I'm not against assembly, but unless there's a good reason to use it, don't.
For the "normal" case, the above "page fault magic syscall" would be hidden in a C level syscall function so it shouldn't bother anyone. Just not sure if I see a benefit compared to using a SYSCALL. Antti, was there some benefit other than doing it in an unconventional way?
Brendan, were there some specific "modern" CPUs without SYSCALL/SYSENTER that you were referring to?
Re: Clarify how x86 interrupts work
At the least, you would have to benchmark that heavily. I suspect that page faults are much slower than software interrupts. The CPU will try to issue a page walk on each such mov, as non-present pages are not cached in the TLB. The CPU will also speculatively assume that the mov always succeeds; violating that assumption might incur additional overhead.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: Clarify how x86 interrupts work
onlyonemac wrote: A "call", "int", or other similar instruction looks like a function call to someone reading or writing the code.
This depends on your assembler and the mnemonics it uses. A simple macro could hide the actual byte sequence.
LtG wrote: Antti, was there some benefit other than doing it in an unconventional way?
I did not know when I wrote the post, but I do know something now after thinking about it. That is partially the point of bringing up some "crazy" ideas. If we optimally pack the information into those 5 bytes, our "atomic" system calls could have some interesting features. Having an instruction that is one byte shorter than, e.g., "mov eax, 123" & "int3" is an advantage in itself. Perhaps that is not enough, so let's innovate.
Brendan has mentioned his "batch processing" of system calls. What if we think about the idea of not returning to ring 3 immediately? If other system calls immediately follow, why don't we just interpret them? Writing a simple interpreter is easy if every system call starts with a byte 0xA0, 0xA1, 0xA2, or 0xA3 and has a constant length.
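To make the interpreter idea a little more concrete, here is a small C sketch that only scans a byte buffer and counts how many fixed-length encoded calls follow each other. The function name and the 0xFF top-address-byte check are assumptions taken from the encoding proposed earlier in the thread; a real kernel would also have to read user memory safely and actually dispatch each call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENCODED_LENGTH 5  /* one opcode byte plus a 32-bit address operand */

/* Count consecutive encoded system calls starting at code[0]. */
size_t count_encoded_calls(const uint8_t *code, size_t length) {
    assert(code != NULL || length == 0);
    size_t calls = 0;
    for (size_t pos = 0; pos + ENCODED_LENGTH <= length; pos += ENCODED_LENGTH) {
        uint8_t opcode = code[pos];
        uint8_t top_address_byte = code[pos + 4];
        /* Stop at the first instruction that is not an encoded call. */
        if (opcode < 0xA0 || opcode > 0xA3 || top_address_byte != 0xFF)
            break;
        calls++;  /* a real kernel would dispatch the call here */
    }
    return calls;
}
```

The constant length is what keeps the loop this simple: no instruction decoder is needed to find where the next candidate starts.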
Re: Clarify how x86 interrupts work
Hi,
Antti wrote: Brendan, have you thought about using a page fault?
That's clever, and for code size it would beat something like "int3" (it uses one less byte). However, for performance it'd be hideous, partly because most CPUs don't create TLB entries for "not present" pages (so you'd start with TLB miss costs), and partly because you'd need multiple checks to determine if it's a system call or a bug (e.g. check if CR2 is in a certain range, check if the access was a read, check if CPU was running at CPL=3, check if EIP is sane, check if EIP points to a special instruction). The other problems are that the page fault handler typically ends up being relatively complex/messy (to handle various virtual memory management tricks) and I'd prefer not to add more complexity to that; and that there are potentially valid reasons for wanting that page to be valid (e.g. maybe some kind of virtual machine where you want to do "host virtual address = guest physical address" for performance).
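The checks listed above might look roughly like this in C. It is a sketch under assumptions: the address-space constants are invented for the example, the opcode byte at EIP is assumed to have been fetched safely by the caller, and only the read-style "mov eax, [n]" variants (opcodes 0xA0 and 0xA1) are accepted. The error-code bits are the standard x86 page-fault error code bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PF_PRESENT 0x1u  /* fault on a present page (protection violation) */
#define PF_WRITE   0x2u  /* the access was a write */
#define PF_USER    0x4u  /* the access came from CPL=3 */

/* Assumed layout, for illustration only. */
#define USER_SPACE_END     0xC0000000u
#define SYSCALL_AREA_START 0xFF000000u
static_assert(USER_SPACE_END <= SYSCALL_AREA_START,
              "the trap area must lie outside user space");

/* Decide whether a page fault is a "magic" system call or a genuine fault. */
bool fault_is_syscall(uint32_t cr2, uint32_t error_code,
                      uint32_t eip, uint8_t opcode_at_eip) {
    if (cr2 < SYSCALL_AREA_START) return false;    /* CR2 in the trap range? */
    if (error_code & PF_WRITE) return false;       /* was it a read? */
    if (!(error_code & PF_USER)) return false;     /* was the CPU at CPL=3? */
    if (eip >= USER_SPACE_END) return false;       /* is EIP sane? */
    if (opcode_at_eip != 0xA0 && opcode_at_eip != 0xA1)
        return false;                              /* special instruction? */
    return true;
}
```

Every one of these tests sits on the page fault path, so ordinary faults pay for them too, which is part of the cost being described.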
onlyonemac wrote: Too much magic. A "call", "int", or other similar instruction looks like a function call to someone reading or writing the code. "mov eax, [10]" just looks like an invalid move, which is presumably a bug rather than a system call.
You're right - if you're not deliberately trying to make it harder to understand a disassembly (and if "doesn't look like a function call" isn't a tiny advantage); then something like "call 10" would look more like a function call than a software interrupt does, and you could even trick compilers into thinking it's a normal "extern" function (if the rest of the ABI matches). However, this would still have all of the same (performance and complexity) problems.
Cheers,
Brendan
Re: Clarify how x86 interrupts work
Antti wrote: That is partially the point of bringing up some "crazy" ideas.
I like outside the box ideas, though not all have immediate use =)
Antti wrote: If we optimally pack the information into those 5 bytes, our "atomic" system calls could have some interesting features. Having an instruction that is one byte shorter than, e.g., "mov eax, 123" & "int3" is an advantage in itself. Perhaps that is not enough, so let's innovate.
Is it really any more "atomic" than your "normal":
Code:
mov al, 0x15; Select syscall function
syscall
Antti wrote: Brendan has mentioned his "batch processing" of system calls. What if we think about the idea of not returning to ring 3 immediately? If other system calls immediately follow, why don't we just interpret them? Writing a simple interpreter is easy if every system call starts with a byte 0xA0, 0xA1, 0xA2, or 0xA3 and has a constant length.
I thought Brendan's "batch processing" of syscalls is message based, in which case you could just provide the kernel/syscall handler with a message list, and thus you don't need any "interpreter" and again it's faster and easier.
Just trying to figure out if there are any use cases for using VAS (virtual address space) for syscalls, possibly saving one byte with significant runtime overhead just doesn't seem like it's worth it..
As for the TLB concerns the others raised, can't those be avoided by marking the page(s) present (so cached in TLB) and also marking it ring0 or something else to cause a #PF?
Re: Clarify how x86 interrupts work
Hi,
Antti wrote: Brendan has mentioned his "batch processing" of system calls. What if we think about the idea of not returning to ring 3 immediately? If other system calls immediately follow, why don't we just interpret them? Writing a simple interpreter is easy if every system call starts with a byte 0xA0, 0xA1, 0xA2, or 0xA3 and has a constant length.
LtG wrote: I thought Brendan's "batch processing" of syscalls is message based, in which case you could just provide the kernel/syscall handler with a message list, and thus you don't need any "interpreter" and again it's faster and easier.
Above I mentioned the idea of supporting multiple different "kernel system call" methods (software interrupt, call gate, SYSCALL/SYSENTER, etc) where all of them end up doing something like "call [functionTable + eax*4]" and where the kernel's functions themselves don't care which method was used by the calling thread.
For "batch system calls" the original idea was for a thread to create a list of entries (with one entry per kernel function) and call a "do this list of function calls" kernel API function; where the kernel does something like "for each entry in the list { load input values from entry into registers; call [functionTable + eax*4]; store output values from registers into list; }". In this way, the kernel API functions themselves don't care if the calling thread used batch system calls (in the same way that they don't care if the calling thread used software interrupt or call gate or ....).
Essentially, it didn't use messages and only used "lists of entries in memory somewhere". However, dealing with "pointers to things in user-space" is messy (partly because the kernel has to do sanity checks, and other threads in the process that are running on other CPUs can modify the data after the kernel has done sanity checks) and this gets a little more messy for batch system calls because one system call might do something that modifies the list (e.g. free the page that was used to store the list itself). For my micro-kernels I have special "message buffer" areas that are a lot less messy (they are "per thread" where one thread can't access another thread's message buffer area; and can never be part of a memory mapped file or shared memory area or ...). For this reason, even though the "batch system call" didn't really have anything to do with messaging, I probably did say something about using my "message buffer" area (and probably confused everyone).
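The "for each entry in the list" loop described above could be sketched in C as follows. The entry layout (one function number and one argument per entry), the function names and the error value are assumptions for the example, not the actual ABI of any kernel discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One batch entry; hypothetical layout with a single argument register. */
struct batch_entry {
    uint32_t eax;  /* in: function number, out: return value */
    uint32_t ebx;  /* in: argument */
};

typedef uint32_t (*kernel_fn)(uint32_t ebx);

/* Two stand-in kernel API functions. */
uint32_t k_add_one(uint32_t ebx) { return ebx + 1; }
uint32_t k_square(uint32_t ebx) { return ebx * ebx; }

static const kernel_fn function_table[] = { k_add_one, k_square };
#define FUNCTION_COUNT (sizeof function_table / sizeof function_table[0])
#define ERR_BAD_FUNCTION 0xFFFFFFFFu  /* assumed error code */

/* "Do this list of function calls": run every entry through the same table
 * that the other entry paths use, storing results back into the list. */
void do_batch(struct batch_entry *list, size_t count) {
    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Copy inputs to locals first, so each value is read exactly once
         * and a later change to the entry cannot bypass the range check. */
        uint32_t eax = list[i].eax;
        uint32_t ebx = list[i].ebx;
        list[i].eax = (eax < FUNCTION_COUNT) ? function_table[eax](ebx)
                                             : ERR_BAD_FUNCTION;
    }
}
```

Copying each entry into locals before checking it is one small defence against the "another thread modifies the data after the sanity checks" problem; it does nothing for the case where a call frees the page holding the list, which is why a per-thread buffer area is attractive.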
LtG wrote: As for the TLB concerns the others raised, can't those be avoided by marking the page(s) present (so cached in TLB) and also marking it ring0 or something else to cause a #PF?
Yes - you could make the area "present, supervisor only" to avoid the TLB miss. For this case, if the kernel is buggy (e.g. dereferences a null pointer) you won't get a page fault.
Cheers,
Brendan
Re: Clarify how x86 interrupts work
Thank you for the discussion. The idea was not very good but it helped me to think about the details that may reveal something worthwhile. For example, what could be done with an Alignment Check (#AC) exception? It is always ring 3 only so kernel space could not trigger it.
If an #AC comes before a page fault, this could be usable. The payload for
Code:
mov eax, [packedData | 1]
is excellent.
Re: Clarify how x86 interrupts work
Antti wrote: For example, what could be done with an Alignment Check (#AC) exception? It is always ring 3 only so kernel space could not trigger it.
I doubt you'll find anything with better performance than SYSCALL, but out of curiosity, is there some _real_ use for the #AC? For what purpose is it intended?
I guess you could enable it before profiling some program to see if there's mis-aligned access and then fix the code for better performance but doing so should be relatively simple for a profiler to do by analyzing the code itself.
Antti, have you checked the cost of exceptions? I'm not sure how high that is on modern CPUs, once you have that figured out it should help limit on what you might want to use the exceptions for.
Alternatively you could approach it from the other end, what needs are there, beyond SYSCALL..? For me everything is relatively simple since I'm planning a micro-kernel and purely messages (for now at least) so SYSCALL is the only one I need and AFAIK it has the best performance.
Though curious if you can find something useful with #PF, #AC and friends.
Re: Clarify how x86 interrupts work
LtG wrote: [...] For the "normal" case, the above "page fault magic syscall" would be hidden in a C level syscall function so it shouldn't bother anyone. [...]
What about if you're looking at a disassembly in a debugger, and you see an invalid "mov" and think "that must be the problem"? Unless you know that it's a syscall, and keep this in mind whenever you're reading disassemblies, you're asking for confusion.