Hello,
I am new to OS kernel. I am wondering if there any applications of some nice programming paradigms like Dynamic programming, approximation, randomized, flow algorithms etc., in OS kernel such as Linux or any other kernel. I looked at some parts of kernel (very few) the most complex I found were RCU data structures and its whole procastination freeing system and combining trees. I didn't look at ext4 filesystem and many other subsystems. So, I was wondering experts in this wonderful forum might have some thoughts on it. I am skeptical about kernel having many complex algorithms due its low latency demands typically in in microseconds and due to less input size in most cases unlike once we get in Competitive programming.
So it would be wonderful if you share some nice programming paradigms that you came across in OS kernel!!
Thanks in advance
NIce Programming paradigms in OS Kernel [SOFT QUESTION]
Re: NIce Programming paradigms in OS Kernel [SOFT QUESTION]
A non-exhaustive list of some things that come to mind:
- For asynchronous interfaces: Allocate memory early. If a syscall performs an asynchronous operation, you don't want it to asynchronously fail with OOM. Either fail right away with OOM or complete asynchronously with success (or I/O failure). In other words, perform allocation in syscalls but not in I/O routines.
- Intrusive data structures. This is often a trade off of performance in favor of robustness. If your user-space program fails to allocate a vector, you just buy more memory. In the kernel, you cannot just crash the system, so you better make sure you do not need additional memory in critical kernel paths.
- O(1) worst-case data structures. You will find many more linked lists and similar data structures in kernels than in user-space applications. This is a trade off of average case performance in favor of worst-case performance. If your user-space application needs 50 us to rehash a hash table, who cares; you know that hash tables are fast on average. If your IRQ handler does that, the whole system hangs for 50 us (e.g. cannot accept higher priority IRQs from real-time devices).
- Resource limits everywhere. You want to limit every resource that is accessible from user-space, no matter how small it is. If your kernel allows an unlimited number of file descriptors, threads, in-flight I/O operations, IPC messages, bytes buffered in pipes, sockets, memory mappings, in-flight TLB shootdown messages or whatever else, that is a potential DoS vector.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: NIce Programming paradigms in OS Kernel [SOFT QUESTION]
Hi Korona, those are some nice observations.
Constant time worst case data structures seems like an absolute must in real-time operating systems, but are they worth sacrificing a better average time performance in a general purpose operating system. Since, even if we provide a constant time DS which answers a query in few micro-seconds we still need to cross the wrath of non-preemptive kernels which may more often lead to high latencies. So do you think its worth sacrificing better average time performance for O(1) worst-case performace?O(1) worst-case data structures. You will find many more linked lists and similar data structures in kernels than in user-space applications. This is a trade off of average case performance in favor of worst-case performance. If your user-space application needs 50 us to rehash a hash table, who cares; you know that hash tables are fast on average. If your IRQ handler does that, the whole system hangs for 50 us (e.g. cannot accept higher priority IRQs from real-time devices).
Re: NIce Programming paradigms in OS Kernel [SOFT QUESTION]
Actually most of the effort is aimed towards operating from cache and hitting the TLB. Access locality is more important than O(1), but it is difficult to have O(n) and access locality at the same time.
Also, the OS code is frequently on a critical path, or at least could be in a given workload. This means that performance variations are more difficult to hide and could become perceivable. Also, irregular performance in the kernel could cause sporadic unfairness. High priority code could suffer bigger delay then low priority code. Of course, some systems work on batch jobs of aggregating and filtering data 24/7 and this wont matter to them as much, especially if they are made to use deep I/O queues.
Finally, there are reasons for using complicated structures, which are not related to raw performance. RCU and similar non-blocking synchronization are well behaved in the presence of virtualization. Depending on whether paravirtualization is used, how the virtual cpus are scheduled, etc, it is possible for one vcpu to continue executing while another one is preempted by the hypervisor. In which case blocking synchronization could cause the remaining active cpu to spin for the entirety of its slice or simply surrender it. Non-blocking synchronization doesn't have this problem.
The biggest issue however is that blocking synchronization is deadlock prone. Non-blocking synchronization is comparatively expensive, but it is deadlock free. Not all structures can sustain fine-grained blocking locks, so you have to either choose slower non-blocking locks or coarser grained locking. For example, how would you safely remove element from a list using fine grained blocking locks. Consider two threads removing neighboring nodes. To be deadlock free, you need to guarantee consistent locking order. Assuming that they begin with their own node locked (just for reading or for writing is irrelevant here), there is no way to make the locking order consistent. Which necessitates some other approach to be used. It needn't be non-blocking synchronization, but you need to think of a way to transform the structure - locally dedicated data partitions, randomization, transactions, etc. Problems arise with other complicated data structures, so you have to lock the entire structure, or make a kind of key-value lock, or use non-blocking synchronization, etc.
Also, the OS code is frequently on a critical path, or at least could be in a given workload. This means that performance variations are more difficult to hide and could become perceivable. Also, irregular performance in the kernel could cause sporadic unfairness. High priority code could suffer bigger delay then low priority code. Of course, some systems work on batch jobs of aggregating and filtering data 24/7 and this wont matter to them as much, especially if they are made to use deep I/O queues.
Finally, there are reasons for using complicated structures, which are not related to raw performance. RCU and similar non-blocking synchronization are well behaved in the presence of virtualization. Depending on whether paravirtualization is used, how the virtual cpus are scheduled, etc, it is possible for one vcpu to continue executing while another one is preempted by the hypervisor. In which case blocking synchronization could cause the remaining active cpu to spin for the entirety of its slice or simply surrender it. Non-blocking synchronization doesn't have this problem.
The biggest issue however is that blocking synchronization is deadlock prone. Non-blocking synchronization is comparatively expensive, but it is deadlock free. Not all structures can sustain fine-grained blocking locks, so you have to either choose slower non-blocking locks or coarser grained locking. For example, how would you safely remove element from a list using fine grained blocking locks. Consider two threads removing neighboring nodes. To be deadlock free, you need to guarantee consistent locking order. Assuming that they begin with their own node locked (just for reading or for writing is irrelevant here), there is no way to make the locking order consistent. Which necessitates some other approach to be used. It needn't be non-blocking synchronization, but you need to think of a way to transform the structure - locally dedicated data partitions, randomization, transactions, etc. Problems arise with other complicated data structures, so you have to lock the entire structure, or make a kind of key-value lock, or use non-blocking synchronization, etc.