Hi,
Craze Frog wrote: Every address from user-space will be a linear address. This includes any pointers used for kernel API function arguments, the user space ESP and exception handlers (e.g. CR2 in the page fault handler).
I don't think I will need to pass strings or structures larger than what can fit in the registers to the kernel. System calls like "open" will be handled by a userspace server. The kernel only manages address spaces, threads and hardware access permissions. So I don't need to receive pointers or stack pointers.
Some ideas...
For my OS, each process has a name and each thread has a name, and software can ask the kernel for details of all running processes/threads (e.g. a list including name, used CPU time, memory usage, etc). In this case the thread tells the kernel the linear address and size of a buffer and the kernel fills it with information. This also means you need to supply the name of a thread when it's spawned, and there are kernel functions to change a process' name or a thread's name. Spawning a new process is a little different (you need to supply a string of command line arguments instead).
When a thread crashes, my OS searches linear memory (the process' code) looking for the first debugging marker before EIP, so that the "blue screen of death" can say which file the bug is in. One version even disassembled the faulty instruction if it could, and displayed the top 32 dwords on the stack (at the linear address from the thread's ESP).
For me, the general protection fault handler looks at EIP to see which instruction (if any) caused the problem, and if it's an I/O port instruction it does permission checks and emulates the instruction. If you support virtual 8086 mode then you'll probably need to do some instruction emulation too. The same applies to the invalid opcode handler (emulate the instruction and pretend the CPU supports it instead of crashing). Instructions can be multiple bytes split across different pages, so fetching them from linear memory isn't much different to fetching strings.
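To illustrate the "split across different pages" problem, here is a minimal sketch of copying a few bytes from a linear address one page at a time. It assumes 4 KiB pages; `copy_from_linear`, `translate_fn` and `demo_translate` are names made up for this example, and the two static arrays only stand in for two page frames that are adjacent in linear memory but not in physical memory.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical callback: given a page-aligned linear address, return a pointer
 * the kernel can dereference for that page, or NULL if nothing is mapped there. */
typedef uint8_t *(*translate_fn)(uint32_t linear_page);

/* Copy 'len' bytes starting at 'linear', translating again at every page boundary.
 * Returns 0 on success, -1 if any page in the range is not mapped. */
static int copy_from_linear(translate_fn translate, uint32_t linear, void *dest, size_t len)
{
    uint8_t *out = dest;

    while (len > 0) {
        uint32_t offset = linear & (PAGE_SIZE - 1);   /* offset within this page */
        size_t chunk = PAGE_SIZE - offset;            /* bytes left in this page */
        uint8_t *page;

        if (chunk > len) {
            chunk = len;
        }
        page = translate(linear & ~(PAGE_SIZE - 1));
        if (page == NULL) {
            return -1;
        }
        memcpy(out, page + offset, chunk);
        out += chunk;
        linear += (uint32_t)chunk;
        len -= chunk;
    }
    return 0;
}

/* Demo mapping (an assumption for the example): linear page 0x1000 -> page_a,
 * linear page 0x2000 -> page_b, everything else unmapped. */
static uint8_t page_a[PAGE_SIZE];
static uint8_t page_b[PAGE_SIZE];

static uint8_t *demo_translate(uint32_t linear_page)
{
    if (linear_page == 0x1000) return page_a;
    if (linear_page == 0x2000) return page_b;
    return NULL;
}
```

A fault handler could use something like this to pull the faulting instruction's bytes into a small local buffer before decoding them, and the same loop would work for strings and structures passed by pointer.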
For the next version of my OS, I'm planning a kernel API extension where a thread can ask the kernel to do a list of kernel API functions. The idea is that the CPU can go from CPL=3 to CPL=0, do any number of kernel functions, and then return from CPL=0 to CPL=3 (it reduces the overhead of many CPL=3 -> CPL=0 -> CPL=3 switches). To use it, the thread tells the kernel the address of the list and the kernel steps through each entry in the list.
Note: if you're trashing TLB entries every time the kernel is called, then you'll want to consider using something similar!
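As a rough sketch of what "stepping through each entry in the list" could look like, the following uses an invented entry layout (function number, four arguments, returned status) and two stand-in kernel functions. None of these names or status codes come from a real kernel API; a real kernel would also have to validate or copy the list, since it lives in user-space linear memory.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical status codes for this example only. */
enum { STATUS_OK = 0, STATUS_BAD_FUNCTION = 1, STATUS_BAD_ARGUMENT = 2 };

/* Hypothetical list entry: which function to call, its arguments, and
 * a slot where the kernel stores the returned status. */
typedef struct {
    uint32_t function;
    uint32_t args[4];
    uint32_t status;
} batch_entry_t;

typedef uint32_t (*kernel_fn_t)(const uint32_t args[4]);

/* Stand-in kernel functions so the sketch can be run as a normal program. */
static uint32_t fn_nop(const uint32_t args[4])
{
    (void)args;
    return STATUS_OK;
}

static uint32_t fn_check_even(const uint32_t args[4])
{
    return (args[0] % 2 == 0) ? STATUS_OK : STATUS_BAD_ARGUMENT;
}

static const kernel_fn_t function_table[] = { fn_nop, fn_check_even };
#define FUNCTION_COUNT (sizeof(function_table) / sizeof(function_table[0]))

/* Runs every entry in one "kernel entry"; returns how many entries were dispatched. */
static uint32_t run_batch(batch_entry_t *list, size_t count)
{
    uint32_t dispatched = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t f = list[i].function;

        if (f >= FUNCTION_COUNT) {
            list[i].status = STATUS_BAD_FUNCTION;   /* unknown function number */
            continue;
        }
        list[i].status = function_table[f](list[i].args);
        dispatched++;
    }
    return dispatched;
}
```

Storing a status per entry lets the thread see which calls failed without making another trip into the kernel.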
I also strongly recommend that applications never use the CPUID instruction and instead ask the kernel for the information, so that the OS can correct the buggy crud that CPUID returns (if CPUID is supported) and give applications reliable and consistent information. This means the thread tells the kernel the linear address of a buffer and the kernel writes about 150 bytes of data into it.
I'm supporting up to 255 CPUs. This means that CPU affinity masks need to be 256 bits wide, and the scheduler functions used by threads to get and set CPU affinity need to use structures in linear memory (it's too large for four 32-bit registers). Note: I use EAX for the kernel API function number and returned status, and ECX and EDX are trashed by SYSENTER/SYSEXIT, which leaves EBX, ESI, EDI and EBP for function arguments and returned data.
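For a sense of why the mask has to live in memory, here is a minimal sketch of a 256-bit affinity mask as a structure of eight dwords (32 bytes, twice what four 32-bit registers can hold). The type and function names are invented for this example, as is the convention that bit n means "may run on CPU n".

```c
#include <stdint.h>
#include <stdbool.h>

#define AFFINITY_BITS  256
#define AFFINITY_WORDS (AFFINITY_BITS / 32)

/* Hypothetical layout: bit n set means the thread may run on CPU n. */
typedef struct {
    uint32_t bits[AFFINITY_WORDS];
} cpu_affinity_t;

/* Allow the thread to run on 'cpu'. Returns false if the CPU number is out of range. */
static bool affinity_set(cpu_affinity_t *mask, unsigned cpu)
{
    if (cpu >= AFFINITY_BITS) {
        return false;
    }
    mask->bits[cpu / 32] |= (uint32_t)1 << (cpu % 32);
    return true;
}

/* Returns true if the thread is allowed to run on 'cpu'. */
static bool affinity_test(const cpu_affinity_t *mask, unsigned cpu)
{
    if (cpu >= AFFINITY_BITS) {
        return false;
    }
    return ((mask->bits[cpu / 32] >> (cpu % 32)) & 1u) != 0;
}
```

The thread would fill in a structure like this and pass its linear address (in EBX, say) to the "set affinity" function.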
For any/all of these things you'd be converting between linear addresses and physical addresses. You might not do some of these things, but you might do other things I didn't think of...
[...] then you can find the page table entry from a linear address with:
I don't exactly understand what your code is doing there. This is what I'm thinking (gotta be something wrong with it):
Let's say I already have the physical address of the page directory (it needs to be stored somewhere, and since the directory should never be accessed from userspace, it could just as well be stored as the physical address), then I'd just do something like
Code:
ppageTable = ppageDirectory[linear_address/0x1000/1024];
(is it necessary to zero the last bits with a bitmask?)
Using arrays (instead of pointers), it'd look more like:
Code:
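pageDirectoryEntry = ppageDirectory[linear_address/0x1000/1024];     // Top 10 bits of the linear address select the page directory entry
ppageTable = pageDirectoryEntry & 0xFFFFF000;                        // Remove the flag bits (present, read/write, accessed, etc)
ppageTableEntry = &ppageTable[(linear_address/0x1000) & 0x000003FF]; // Middle 10 bits select the page table entry (an array index, not a byte offset)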
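Putting those three steps together, a minimal sketch of a complete lookup might look like the following. It assumes plain 32-bit paging with 4 KiB pages (no PAE or PSE), and it fakes physical memory with a small array so it can be compiled and run as an ordinary program. `phys_mem`, `phys_to_ptr` and `linear_to_physical` are names invented for the example; in a real kernel, `phys_to_ptr` would be whatever mechanism lets the kernel touch a physical page.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_PRESENT 0x001u        /* bit 0 of a page directory/table entry */
#define FRAME_MASK   0xFFFFF000u   /* top 20 bits: physical frame address */

/* Fake "physical memory" for the example: 16 pages of 4 KiB, stored as dwords.
 * No bounds checking is done, so entries must point inside these 64 KiB. */
static uint32_t phys_mem[16 * 1024];

/* Hypothetical helper: turn a physical address into a pointer we can dereference. */
static uint32_t *phys_to_ptr(uint32_t phys)
{
    return &phys_mem[phys / 4];
}

/* Walks the page directory and page table for 'linear'.
 * Returns 0 and stores the physical address in *phys_out, or -1 if not present. */
static int linear_to_physical(uint32_t page_dir_phys, uint32_t linear, uint32_t *phys_out)
{
    uint32_t *page_dir = phys_to_ptr(page_dir_phys);
    uint32_t pde = page_dir[linear >> 22];              /* top 10 bits */
    uint32_t *page_table;
    uint32_t pte;

    if (!(pde & PAGE_PRESENT)) {
        return -1;
    }
    page_table = phys_to_ptr(pde & FRAME_MASK);         /* strip the flag bits */
    pte = page_table[(linear >> 12) & 0x3FF];           /* middle 10 bits */
    if (!(pte & PAGE_PRESENT)) {
        return -1;
    }
    *phys_out = (pte & FRAME_MASK) | (linear & 0xFFF);  /* frame + offset in page */
    return 0;
}
```

The bitmask matters at both levels: without it the flag bits in the low 12 bits of each entry would end up in the address.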
Craze Frog wrote: But still, will it cause a performance problem?
The advantage is that each process can use almost 4 GB of address space, as you wouldn't need to use (for example) half of each address space for the kernel. This sounds good, but isn't that useful in practice - most processes that need more than 2 GB probably need more than 4 GB anyway.
Note: I say "almost" here because you need to have the GDT, a TSS, the IDT, interrupt handling stubs and kernel API entry and exit points mapped into all address spaces.
The main disadvantage (for performance) is that you trash the TLB every time the kernel does anything. The TLB misses alone could add up to 1000 cycles of overhead when a kernel function is called or an IRQ occurs (unless you happen to switch address spaces, in which case the TLB needs to be flushed anyway).
There are other disadvantages though. If you ever support PAE (or PSE36) then the kernel won't be able to access any RAM above 4 GB; it'll be harder to port the OS to long mode; you won't be able to do some NUMA optimization tricks; you couldn't use "allocate on demand" to simplify the kernel's memory allocation; you won't get page faults when the kernel has bugs (harder to debug the kernel)...
Cheers,
Brendan