On sharing memory between the kernel and userland processes.
Posted: Wed May 10, 2023 10:00 am
I am *trying* to write an exokernel as a learning project. It has been great so far: I've been learning a lot (and restarting a lot, too, but I'm pretty sure that is part of the process). For now, I'm focusing on the x86_64 architecture only.
Being an exokernel, my OS (and by that, I mean "I") wants to let applications choose the mapping between physical memory and virtual memory.
Originally, I had two system calls:
Code:
// Populates `dst` with at most `len` segments of free physical memory.
size_t get_memory(free_segment_t *dst, size_t len);
// Maps physical page at `phys` to virtual page at `virt`. Flags are not important for the point of this post.
size_t map_memory(uintptr_t phys, uintptr_t virt, int flags);
However, having to query all available physical memory seems expensive, and it gets even worse when considering the race between `get_memory` and `map_memory`. Indeed, if the memory we want to map has been acquired by another process between the call to `get_memory` and the call to `map_memory`, we'd have to query all available memory again to find another free segment. That's pretty bad.
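To make the race between `get_memory` and `map_memory` concrete, here is a minimal sketch of the retry loop a process would end up with. Everything beyond the two prototypes is an assumption made for illustration: the fields of `free_segment_t`, the convention that `map_memory` returns 0 on success, the helper `acquire_page`, and the mock kernel state (`mock_free`, `mock_steals`), which exists only so the sketch can run outside a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed layout: the post does not show the definition of free_segment_t.
typedef struct {
    uintptr_t base;   // physical address of the first free page
    size_t    pages;  // number of contiguous free pages
} free_segment_t;

// --- Mock kernel state, only so the sketch runs in userspace ---
static uintptr_t mock_free[] = { 0x100000, 0x200000, 0x300000 };
static size_t    mock_free_len = 3;
static int       mock_steals = 2;  // how many times a rival process wins the race

// Removes entry `i` from the mock free list.
static void mock_remove(size_t i) {
    for (size_t j = i + 1; j < mock_free_len; j++) {
        mock_free[j - 1] = mock_free[j];
    }
    mock_free_len--;
}

size_t get_memory(free_segment_t *dst, size_t len) {
    size_t n = mock_free_len < len ? mock_free_len : len;
    for (size_t i = 0; i < n; i++) {
        dst[i].base = mock_free[i];
        dst[i].pages = 1;
    }
    // Simulate another process acquiring the first segment right after the query.
    if (mock_steals > 0 && mock_free_len > 0) {
        mock_steals--;
        mock_remove(0);
    }
    return n;
}

// Assumed convention: 0 on success, nonzero if `phys` is no longer free.
size_t map_memory(uintptr_t phys, uintptr_t virt, int flags) {
    (void)virt;
    (void)flags;
    for (size_t i = 0; i < mock_free_len; i++) {
        if (mock_free[i] == phys) {
            mock_remove(i);
            return 0;
        }
    }
    return 1;
}

// Hypothetical userspace helper: keeps querying until a mapping succeeds.
// Returns the physical address that was mapped, or 0 if no memory is left.
uintptr_t acquire_page(uintptr_t virt, unsigned *syscalls) {
    assert(syscalls != NULL);
    free_segment_t seg;
    for (;;) {
        ++*syscalls;
        if (get_memory(&seg, 1) == 0) {
            return 0;  // nothing free at all
        }
        ++*syscalls;
        if (map_memory(seg.base, virt, 0) == 0) {
            return seg.base;
        }
        // Lost the race: the segment was taken in between, so query again.
    }
}
```

Each lost race costs a full extra `get_memory` call on top of the failed `map_memory`, which is the overhead the paragraph above is worried about.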
So, I thought, maybe it would be better if the kernel just shared the whole list of free segments with all userspace processes (by mapping it into their address space at a well-known address in the higher half, along with the rest of the kernel) with read-only permissions. This way, they wouldn't have to pay the cost of an extra system call when acquiring memory. They'd just have to read the list directly, and then perform the `map_memory` system call. Note that there's still a race, but it's less of a problem.
Unless...
Unless the kernel starts writing to the list. The list is mapped with read-only permissions, so processes can't really take a lock to look at the list. And even if they could, it seems like a pretty bad idea to allow untrusted processes to acquire global locks such as this one.
So here comes my question:
Is there a way to synchronize kernel writes with userspace processes without the kernel needing to wait for them to finish a read?
If there is an obvious answer, I'd be glad to learn about it. Otherwise, I've thought about it for a bit, and came up with something, which I'm not sure is really sound.
1. When the kernel starts writing to one of the elements of the list, atomically increment an epoch counter (maybe even with `compare_exchange`, to handle concurrent access from other CPU cores).
2. When it is done, increment it again.
With that, in order to read an element from the list, userspace processes must:
1. Read the epoch counter once, atomically.
2. Read the value.
3. Read the epoch counter again and verify that it has not changed (there's still a race if it has actually wrapped around, but that seems unlikely enough).
4. If the epoch counter has changed, retry with the new epoch. Otherwise, do whatever with the value.
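For reference, the writer and reader steps above can be sketched with C11 atomics as below. This pattern is commonly known as a sequence lock (seqlock). The sketch is not a definitive implementation and makes several assumptions: the type and function names (`shared_segment_t`, `shared_entry_t`, `kernel_write`, `user_read`) and the segment fields are hypothetical, kernel writers are assumed to be serialized already (for example by a kernel-internal lock), and the data fields are declared atomic so that a read overlapping a write is not a data race in the C11 sense. It also adds one check that the steps leave implicit: the reader rejects an odd epoch, because a read that starts and ends inside the same write would otherwise see an unchanged counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

// One entry of the shared free list. Field names are assumptions.
typedef struct {
    _Atomic uintptr_t base;   // physical base address of the segment
    _Atomic size_t    pages;  // length of the segment in pages
} shared_segment_t;

typedef struct {
    atomic_uint_fast64_t epoch;  // even: stable, odd: write in progress
    shared_segment_t     seg;
} shared_entry_t;

// Kernel side. Assumes writers are serialized, so no compare-exchange is
// needed to bump the epoch.
void kernel_write(shared_entry_t *e, uintptr_t base, size_t pages) {
    uint_fast64_t start = atomic_load_explicit(&e->epoch, memory_order_relaxed);
    assert((start & 1) == 0);  // no other write may be in flight
    atomic_store_explicit(&e->epoch, start + 1, memory_order_relaxed);
    // Order the epoch bump before the data stores that follow.
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&e->seg.base, base, memory_order_relaxed);
    atomic_store_explicit(&e->seg.pages, pages, memory_order_relaxed);
    // Publish the data: the epoch becomes even again.
    atomic_store_explicit(&e->epoch, start + 2, memory_order_release);
}

// User side. Only performs loads, so a read-only mapping is sufficient.
void user_read(shared_entry_t *e, uintptr_t *base, size_t *pages) {
    uint_fast64_t before, after;
    do {
        before = atomic_load_explicit(&e->epoch, memory_order_acquire);
        *base  = atomic_load_explicit(&e->seg.base, memory_order_relaxed);
        *pages = atomic_load_explicit(&e->seg.pages, memory_order_relaxed);
        // Order the data loads before the second epoch load.
        atomic_thread_fence(memory_order_acquire);
        after  = atomic_load_explicit(&e->epoch, memory_order_relaxed);
        // Retry if a write was in progress or completed during the read.
    } while ((before & 1) != 0 || before != after);
}
```

With a 64-bit epoch, the wrap-around case mentioned in step 3 is not a practical concern. Note that the reader spins for as long as the epoch stays odd, so the scheme relies on the kernel finishing each write promptly.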
Would that even be sound? And how do atomic operations interact with multitasking?
PS: This is my first post here. I've been reading a lot since I started trying to write a kernel, and I must say, I wouldn't have made it this far without you all. And I'm still at the very beginning...