Handling system calls without blocking the kernel [AVR]
-
- Member
- Posts: 50
- Joined: Sat Mar 21, 2009 9:42 pm
Handling system calls without blocking the kernel [AVR]
Hello,
I am currently developing an operating system that runs on AVR microcontrollers. The target platform is VERY low spec -- I am trying to keep the minimum hardware requirement of an Atmega328 with 32KB of ROM and 2 KB of RAM. From reading the forum, I gather that most people on here are x86 developers. But hopefully my issue is generic enough that the architecture isn't an issue.
Thus far, I have FAT16 file system support, an ELF loader, memory allocator and a scheduler all while using less than 1 KB of kernel space RAM.
I've hit a roadblock though: system calls.
I have done a lot of searching both here and on the general web as well as reading source code of Linux and other open source operating systems but yielding no solution.
In short: When calling a blocking system call, how does one avoid blocking the kernel?
The way I am handling system calls now is that the process calls a vector at a known address and the vector adds the system call to a queue and then goes into a while() loop until the call is completed. At the same time, the kernel task looks for any outstanding system calls in the queue. It loops through the queue and does them in sequence if there are any, marks them complete and puts the result into the queue buffer. The system call vector then sees the call is completed and returns to the process with the result.
The major problem is if a blocking call like read() is performed, it blocks the kernel task so no other system calls can be performed until the system call is completed.
I'm not calling the kernel functions directly from the vector because some of the kernel functions are relatively memory hungry and I was hesitant to run them in the calling processes' stack.
Any ideas?
Thank you
I am currently developing an operating system that runs on AVR microcontrollers. The target platform is VERY low spec -- I am trying to keep the minimum hardware requirement of an Atmega328 with 32KB of ROM and 2 KB of RAM. From reading the forum, I gather that most people on here are x86 developers. But hopefully my issue is generic enough that the architecture isn't an issue.
Thus far, I have FAT16 file system support, an ELF loader, memory allocator and a scheduler all while using less than 1 KB of kernel space RAM.
I've hit a roadblock though: system calls.
I have done a lot of searching both here and on the general web as well as reading source code of Linux and other open source operating systems but yielding no solution.
In short: When calling a blocking system call, how does one avoid blocking the kernel?
The way I am handling system calls now is that the process calls a vector at a known address and the vector adds the system call to a queue and then goes into a while() loop until the call is completed. At the same time, the kernel task looks for any outstanding system calls in the queue. It loops through the queue and does them in sequence if there are any, marks them complete and puts the result into the queue buffer. The system call vector then sees the call is completed and returns to the process with the result.
The major problem is if a blocking call like read() is performed, it blocks the kernel task so no other system calls can be performed until the system call is completed.
I'm not calling the kernel functions directly from the vector because some of the kernel functions are relatively memory hungry and I was hesitant to run them in the calling processes' stack.
Any ideas?
Thank you
Re: Handling system calls without blocking the kernel [AVR]
Hi,
Now assume that your driver maintains some kind of list of pending requests; and when the device's IRQ occurs the driver:
Now assume that individual file systems (e.g. FAT, ext2, ISO9660, whatever) and the virtual file system layer all use similar techniques - e.g. they all work on "requests" and "replies", and they all keep track of "pending requests" and "in-progress requests". For example, from the VFS's perspective:
Now imagine that there's 54321 tasks that are all calling "read()" at various times (and blocking and unblocking); but the VFS is able to have many requests in various states at the same time, and the FAT file system is able to have many requests in various states at the same time, and storage device drivers are able to have many requests in various states at the same time.
Next; imagine that there's multiple file systems (FAT, ext2, ISO9660, etc) and multiple disk controller drivers and/or storage devices. At any point in time VFS might have 1234 requests that it's keeping track of; where FAT might have 234 of the requests, ext2 might have 600 of the requests and ISO9660 might have 400 of the requests; and where SATA disk controller might have 834 of the requests and USB CD-ROM driver might have 400 of the requests.
Also imagine that the VFS layer does file data caching. Whenever kernel/task sends a request to VFS asking to read some data from a file, VFS checks its cache and if the data is in the cache it sends a reply back immediately; and if the data isn't in the cache it sends a request to a file system, and when the file system replies the VFS grabs the data from the FS's reply and puts it in the cache (before VFS sends its own reply back to kernel/task that made the original request). Whenever kernel/task sends a request to VFS asking to write some data to a file, VFS stores the new data in its cache then sends a request to a file system.
Also imagine that all of these requests have an "IO priority"; and that each layer can take this into account when figuring out which request to do next; so that important things happen sooner/faster (and less important things happen later/slower). Then imagine that (for swap space) the kernel may send requests directly to the storage device drivers and that these requests are "very high priority" (so that swap space reads/writes happen very quickly even when there's 1234 lower/normal priority requests waiting to be done).
Note that for a micro-kernel (e.g. where all of these pieces are separate processes, and sending requests and replies is done using IPC/message passing) it should be relatively easy to see how all this works. For a monolithic kernel it's basically the same; except that instead of using IPC you use direct function calls and function pointers. For example, to send a request to a file system the VFS might do "fs_control_data->handle_read_request_function( info );" and to send a reply back to VFS the file system might do "VFS_handle_read_reply ( info );". When direct function calls are being used like this you can cheat a little by passing the same "request info structure" from VFS to FS to storage driver (as requests are made at each level) and then from storage driver to FS to VFS (as replies come back from each level), so VFS and file systems don't necessarily need to keep track of the requests while they're "in progress at a lower layer".
Cheers,
Brendan
Assume you have some kind of device where the driver issues a request/command to the device (e.g. to read or write some sectors) and then a little later the device causes an IRQ to inform the device driver that its request has completed.smeezekitty wrote:In short: When calling a blocking system call, how does one avoid blocking the kernel?
Now assume that your driver maintains some kind of list of pending requests; and when the device's IRQ occurs the driver:
- Gathers status from the "just completed" request, and informs something else (e.g. file system code)
- Finds the next "pending request" and asks the device to start doing that request
Now assume that individual file systems (e.g. FAT, ext2, ISO9660, whatever) and the virtual file system layer all use similar techniques - e.g. they all work on "requests" and "replies", and they all keep track of "pending requests" and "in-progress requests". For example, from the VFS's perspective:
- something sends a request to VFS asking it to fetch 20 KiB from a file, VFS remembers the details
- VFS sends a request to FAT file system code asking it to fetch 20 KiB from a file
- Sooner or later VFS receives reply from FAT file system, figures out what it's about (the details it remembered earlier) and sends reply back to whatever send the initial request
- VFS sends a request to FAT file system fetch 20 KiB from a file, FAT file system remembers the details
- FAT file system sends a request to storage device driver asking it to fetch 20 KiB from a hard disk
- Sooner or later FAT file system receives reply from storage device driver, figures out what it's about (the details it remembered earlier) and sends reply back to VFS
- FAT file system sends a request to storage device driver to fetch 20 KiB from a hard disk, storage device driver remembers the details
- Sooner or later storage device driver sends a command to the disk controller asking it to fetch 20 KiB from a hard disk
- Sooner or later storage device driver receives an IRQ from disk controller, figures out what it's about (the details it remembered earlier) and sends reply back to FAT file system
Now imagine that there's 54321 tasks that are all calling "read()" at various times (and blocking and unblocking); but the VFS is able to have many requests in various states at the same time, and the FAT file system is able to have many requests in various states at the same time, and storage device drivers are able to have many requests in various states at the same time.
Next; imagine that there's multiple file systems (FAT, ext2, ISO9660, etc) and multiple disk controller drivers and/or storage devices. At any point in time VFS might have 1234 requests that it's keeping track of; where FAT might have 234 of the requests, ext2 might have 600 of the requests and ISO9660 might have 400 of the requests; and where SATA disk controller might have 834 of the requests and USB CD-ROM driver might have 400 of the requests.
Also imagine that the VFS layer does file data caching. Whenever kernel/task sends a request to VFS asking to read some data from a file, VFS checks its cache and if the data is in the cache it sends a reply back immediately; and if the data isn't in the cache it sends a request to a file system, and when the file system replies the VFS grabs the data from the FS's reply and puts it in the cache (before VFS sends its own reply back to kernel/task that made the original request). Whenever kernel/task sends a request to VFS asking to write some data to a file, VFS stores the new data in its cache then sends a request to a file system.
Also imagine that all of these requests have an "IO priority"; and that each layer can take this into account when figuring out which request to do next; so that important things happen sooner/faster (and less important things happen later/slower). Then imagine that (for swap space) the kernel may send requests directly to the storage device drivers and that these requests are "very high priority" (so that swap space reads/writes happen very quickly even when there's 1234 lower/normal priority requests waiting to be done).
Note that for a micro-kernel (e.g. where all of these pieces are separate processes, and sending requests and replies is done using IPC/message passing) it should be relatively easy to see how all this works. For a monolithic kernel it's basically the same; except that instead of using IPC you use direct function calls and function pointers. For example, to send a request to a file system the VFS might do "fs_control_data->handle_read_request_function( info );" and to send a reply back to VFS the file system might do "VFS_handle_read_reply ( info );". When direct function calls are being used like this you can cheat a little by passing the same "request info structure" from VFS to FS to storage driver (as requests are made at each level) and then from storage driver to FS to VFS (as replies come back from each level), so VFS and file systems don't necessarily need to keep track of the requests while they're "in progress at a lower layer".
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Handling system calls without blocking the kernel [AVR]
A routine in a program or a system call could be very big and could take a long time.
In your system library, you could implement routines as a set of smaller functions that call each other in some sequence. Then you will be able to return as soon as you can, system calls will be executed in a chunked way, so as long as you don't need to get locked to keep state consistent, you will have the chance in your system code not only to keep calling the sequence of functions for an API call, but also to attend other things, switch to other threads or processes.
In your system library, you could implement routines as a set of smaller functions that call each other in some sequence. Then you will be able to return as soon as you can, system calls will be executed in a chunked way, so as long as you don't need to get locked to keep state consistent, you will have the chance in your system code not only to keep calling the sequence of functions for an API call, but also to attend other things, switch to other threads or processes.
YouTube:
http://youtube.com/@AltComp126
My x86 OS/software:
https://sourceforge.net/projects/api-simple-completa/
Donate to get more food/programming resources/computers:
https://www.paypal.com/donate/?hosted_b ... QS2YTW3V64
http://youtube.com/@AltComp126
My x86 OS/software:
https://sourceforge.net/projects/api-simple-completa/
Donate to get more food/programming resources/computers:
https://www.paypal.com/donate/?hosted_b ... QS2YTW3V64
-
- Member
- Posts: 50
- Joined: Sat Mar 21, 2009 9:42 pm
Re: Handling system calls without blocking the kernel [AVR]
Thank you for the information, Brendan. Unfortunately, implementing a full VFS system is just not practical for a system with such little RAM.
Minimalism is key for embedded systems.
I guess I am curious how the callback would be implemented. Would the lower layers call a function in the upper layer or is there some other method?
One of the biggest culprits of blocking is serial IO. I think I'll need to break down and implement buffered interrupt driven I/O. I'm trying to minimize buffering as much as possible but at some point it is counter productive. I want to get this blocking issue out of the way because I'm planning on adding an IP stack later.
Another idea I am thinking about trying is adding non-blocking versions of the system functions and just keeping the system call in the queue if it would need to block to satisfy the request. I just worry about the queue buffer filling up with outstanding requests.
Minimalism is key for embedded systems.
I guess I am curious how the callback would be implemented. Would the lower layers call a function in the upper layer or is there some other method?
Well the system routines are already broken down into subroutines. I suppose I could poll to service other system calls in the middle of some operations but it doesn't seem like the most elegant solution.In your system library, you could implement routines as a set of smaller functions that call each other in some sequence.
One of the biggest culprits of blocking is serial IO. I think I'll need to break down and implement buffered interrupt driven I/O. I'm trying to minimize buffering as much as possible but at some point it is counter productive. I want to get this blocking issue out of the way because I'm planning on adding an IP stack later.
Another idea I am thinking about trying is adding non-blocking versions of the system functions and just keeping the system call in the queue if it would need to block to satisfy the request. I just worry about the queue buffer filling up with outstanding requests.
Re: Handling system calls without blocking the kernel [AVR]
Hi,
Note that "embedded" ranges from tiny little micro-controllers in things like microwave ovens and washing machines (where no OS or file system makes sense), all the way up to smartphones (where it's not too different to a full blown desktop OS).
Cheers,
Brendan
The full description I provided can be stripped back (by ignoring/discarding parts) as much or as little as you like (mostly depending on how "minimal" your final product is), all the way up to the point where the question doesn't make sense in the first place (e.g. the final product is so minimal that no device is capable of "DMA transfer with IRQ on completion").smeezekitty wrote:Thank you for the information, Brendan. Unfortunately, implementing a full VFS system is just not practical for a system with such little RAM.
Minimalism is key for embedded systems.
Note that "embedded" ranges from tiny little micro-controllers in things like microwave ovens and washing machines (where no OS or file system makes sense), all the way up to smartphones (where it's not too different to a full blown desktop OS).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Handling system calls without blocking the kernel [AVR]
Polling is not a key part of what I'm telling you.smeezekitty wrote:Well the system routines are already broken down into subroutines. I suppose I could poll to service other system calls in the middle of some operations but it doesn't seem like the most elegant solution.
What I mean is that if, instead of having a monolithic API function, you can separate in individual functions for very well defined parts (MASTER FUNCTION --> STEP 0, STEP 1, STEP 2).
With that separation of steps, you can multithread. You can make individual steps look like extensions to the instruction set if you make them based on minimal optimized logic.
You can ensure that each individual step is either blocking or non-blocking, just like with atomic (lockable) and non-atomic (non-locking but optionally lockable) instructions.
Then, you can return from each step as soon as it's completed. You can attend other tasks in a multithreaded way, with execution states separated for each of the tasks you want to perform. The APIs would just make use of the state you pass to them.
But you don't need polling. You could make it in an interrupt-driven way or based in a main loop.
You can keep a pointer for the next step that you need to run that a master function can use.
There would be a master function for each API function, and that master function would call its STEP functions as it needed. There could be many common STEP functions, like generic instructions for generic tasks, but instructions implemented by you through functions.
The master function would be sort of a fetcher for the instructions of its own logic task.
YouTube:
http://youtube.com/@AltComp126
My x86 OS/software:
https://sourceforge.net/projects/api-simple-completa/
Donate to get more food/programming resources/computers:
https://www.paypal.com/donate/?hosted_b ... QS2YTW3V64
http://youtube.com/@AltComp126
My x86 OS/software:
https://sourceforge.net/projects/api-simple-completa/
Donate to get more food/programming resources/computers:
https://www.paypal.com/donate/?hosted_b ... QS2YTW3V64
Re: Handling system calls without blocking the kernel [AVR]
The kernel task need not wait for the read() to complete. It can record the fact that a process is waiting to read a device, and move on with other duties, until such a time arrives when the device does have data to send (signalled by an interrupt) to the process, at which point the kernel satisfies the request. Similar approach applies to other calls, as well as to other situations which demand a decoupling of requests and responses. A framework, which allows tasks to wait for events and be alerted when they arrive, is needed.smeezekitty wrote: The major problem is if a blocking call like read() is performed, it blocks the kernel task so no other system calls can be performed until the system call is completed.
If that occurred during a typical run of the system, wouldn't that prove that the software is improperly designed, or that the hardware itself lacks enough resources?smeezekitty wrote: I just worry about the queue buffer filling up with outstanding requests.
Re: Handling system calls without blocking the kernel [AVR]
Most OSes (e.g. Linux) for more powerful systems maintain one kernel stack per task and suspend this thread during the read(). They usually use IRQs to wait for read()s to complete. It seems that doing that is not an option in your situation.
Some questions you should work out are:
Some questions you should work out are:
- Can you afford multiple threads? Do you have enough RAM for their stacks? If the answer is no, a synchronous read() system call does not make sense.
- Do you need to support more than one concurrent I/O at all? Maybe your use case only requires one operation at a time, which simplifies the design and saves valuable ROM space.
- Does a kernel-userspace separation make sense in your resource constrained environment? Do you need system calls at all? Maybe it makes more sense to integrate the drivers into your application code as libraries.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
-
- Member
- Posts: 50
- Joined: Sat Mar 21, 2009 9:42 pm
Re: Handling system calls without blocking the kernel [AVR]
Okay. That makes a lot more sense. I'll have to give it some thought.With that separation of steps, you can multithread. You can make individual steps look like extensions to the instruction set if you make them based on minimal optimized logic.
You can ensure that each individual step is either blocking or non-blocking, just like with atomic (lockable) and non-atomic (non-locking but optionally lockable) instructions.
Then, you can return from each step as soon as it's completed. You can attend other tasks in a multithreaded way, with execution states separated for each of the tasks you want to perform. The APIs would just make use of the state you pass to them.
But you don't need polling. You could make it in an interrupt-driven way or based in a main loop.
You can keep a pointer for the next step that you need to run that a master function can use.
Well as of now I don't have a way to call back the tasks but they are stopped while waiting for the call to be serviced. It's kind of a head scratcher to me how to handle callbacks.
The kernel task need not wait for the read() to complete. It can record the fact that a process is waiting to read a device, and move on with other duties, until such a time arrives when the device does have data to send (signalled by an interrupt) to the process, at which point the kernel satisfies the request. Similar approach applies to other calls, as well as to other situations which demand a decoupling of requests and responses. A framework, which allows tasks to wait for events and be alerted when they arrive, is needed.
Well, yes and no. I have enough RAM to do multithreading. But not enough for an additional stack per processes. At that point I would just risk running the kernel routine in the process stack.Can you afford multiple threads? Do you have enough RAM for their stacks? If the answer is no, a synchronous read() system call does not make sense.
"Sort of". I probably don't need to be able to handle more than one request to the filesystem simultaneously (it's an SD card so it responds pretty fast). But I don't want to block all I/O waiting for a stream read/write (say serial or network)Do you need to support more than one concurrent I/O at all? Maybe your use case only requires one operation at a time, which simplifies the design and saves valuable ROM space
Yes. Because if the kernel has the I/O code that it will need anyway to do things like load ELF files anyway, it would use too much flash/RAM to duplicate all that code in each process that is running. Besides there would be the risk of having different processes sending conflicting instructions to an I/O deviceDoes a kernel-userspace separation make sense in your resource constrained environment? Do you need system calls at all? Maybe it makes more sense to integrate the drivers into your application code as libraries.
While I'm at it, what is the best method to pass argc/argv to main() of a starting process. Do I just stick it in the stack?
Re: Handling system calls without blocking the kernel [AVR]
For an actual callback, you can standardize the structure of the callback, call an API function including a function pointer to your callback function, then let the API function do whatever it does and call the callback function you passed with a pointer. If the pointer is 0 or NULL, then no callback should be called.
For passing parameters, you could pass a pointer to a system structure that would contain all of the data the program could need at startup time. You could pass that pointer in a register, the stack or a known system variable (more difficult), or resolve the pointer into a header field of your executable intended to contain that system parameter structure.
You could also massively export kernel functions with an index or some constant, and then have the system dynamically link those functions in another list of function pointers provided by the application. It would only need to copy the address of the function to the corresponding program import.
For passing parameters, you could pass a pointer to a system structure that would contain all of the data the program could need at startup time. You could pass that pointer in a register, the stack or a known system variable (more difficult), or resolve the pointer into a header field of your executable intended to contain that system parameter structure.
You could also massively export kernel functions with an index or some constant, and then have the system dynamically link those functions in another list of function pointers provided by the application. It would only need to copy the address of the function to the corresponding program import.
YouTube:
http://youtube.com/@AltComp126
My x86 OS/software:
https://sourceforge.net/projects/api-simple-completa/
Donate to get more food/programming resources/computers:
https://www.paypal.com/donate/?hosted_b ... QS2YTW3V64
http://youtube.com/@AltComp126
My x86 OS/software:
https://sourceforge.net/projects/api-simple-completa/
Donate to get more food/programming resources/computers:
https://www.paypal.com/donate/?hosted_b ... QS2YTW3V64
-
- Member
- Posts: 50
- Joined: Sat Mar 21, 2009 9:42 pm
Re: Handling system calls without blocking the kernel [AVR]
I solved this problem by checking if the action can be performed without blocking and if it cannot, leaving the call in the queue and checking it again next pass. The system calls are somewhat slower than I'd like but for now it works well enough.
I also implemented arguments as well as exit statuses. That was fun
I also implemented arguments as well as exit statuses. That was fun