read() system call: how to transfer data to userspace.

Posted: Fri Jan 22, 2021 1:18 pm
by Alpha
Hey everyone,

I was wondering how an OS handles read (or write) calls, especially for microkernels. At some point a user process wants to read from a file or pipe or whatever.
It does so by calling the read syscall while specifying a buffer for the result, and then it blocks until the work is done. The kernel then signals the filesystem to perform the read operation on a disk or on whatever the process wants to read from. While waiting for the data, another user process could be running, so the virtual address space could be different. So how does the OS transfer the data from the filesystem back to the calling process?

I could think of copying the data first to a buffer in kernel space, then switching context and copying it to the calling process. However, that seems inefficient, especially when dealing with large amounts of data. Another way could be to map the pages of the user space buffer into the filesystem's address space, but that would raise issues with protection and isolation of tasks, especially for microkernels.
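
To make the first option concrete, here is a rough sketch in C of the "copy through a kernel buffer" idea. It is only a toy model under my own assumptions: each process is a flat byte array, and `struct process`, `copy_to_user` and `complete_read` are names invented for illustration, not taken from any real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model: each process owns one flat "user memory" array.
 * A real kernel would validate the range against page tables. */
#define USER_MEM_SIZE 4096

struct process {
    uint8_t user_mem[USER_MEM_SIZE];
};

/* Copy len bytes from a kernel buffer to offset uaddr inside the
 * target process. Returns 0 on success, -1 if the range is invalid. */
int copy_to_user(struct process *p, size_t uaddr, const void *kbuf, size_t len)
{
    assert(p != NULL);
    if (uaddr > USER_MEM_SIZE || len > USER_MEM_SIZE - uaddr)
        return -1;
    memcpy(p->user_mem + uaddr, kbuf, len);
    return 0;
}

/* Completion path of a blocked read(): the data arrived in kbuf while
 * some other process may have been running, so the caller is named
 * explicitly instead of relying on the current address space. */
long complete_read(struct process *caller, size_t ubuf,
                   const uint8_t *kbuf, size_t len)
{
    if (copy_to_user(caller, ubuf, kbuf, len) != 0)
        return -1;          /* bad user buffer */
    return (long)len;       /* number of bytes delivered */
}
```

The point of the sketch is that the copy targets the caller's memory explicitly, so it does not matter which process happened to be running when the data arrived. The cost is one extra copy per read.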

Could someone tell me more about how this is commonly done? Or maybe tell me where I can find more information about this?

Thanks a lot.

Re: read() system call: how to transfer data to userspace.

Posted: Fri Jan 22, 2021 1:42 pm
by AndrewAPrice
My microkernel's messaging system allows 'gifting' another process memory pages.

There will be some copying in the driver into the message, but then it's up to the receiving process if it wants to read the data in situ or copy it out.
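
A minimal sketch of the general "gift a page" idea in C, not the actual code from my kernel. It assumes a toy address space that is just an array of frame numbers; `struct address_space`, `as_init` and `gift_page` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8
#define NO_FRAME (-1)

/* Toy address space: frame[i] is the physical frame mapped at
 * virtual page i, or NO_FRAME if nothing is mapped there. */
struct address_space {
    int frame[SLOTS];
};

void as_init(struct address_space *as)
{
    for (size_t i = 0; i < SLOTS; i++)
        as->frame[i] = NO_FRAME;
}

/* Move the page mapped at sender slot src into the first free slot
 * of the receiver. The sender loses its mapping, so the page is never
 * shared. Returns the receiver slot, or -1 on failure. */
int gift_page(struct address_space *sender, size_t src,
              struct address_space *receiver)
{
    assert(sender != NULL && receiver != NULL);
    if (src >= SLOTS || sender->frame[src] == NO_FRAME)
        return -1;
    for (size_t dst = 0; dst < SLOTS; dst++) {
        if (receiver->frame[dst] == NO_FRAME) {
            receiver->frame[dst] = sender->frame[src];
            sender->frame[src] = NO_FRAME;
            return (int)dst;
        }
    }
    return -1;  /* receiver has no free slot */
}
```

Because ownership moves rather than being shared, the sender cannot touch the data after the gift, which keeps isolation intact without an extra copy on the receiving side.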

Re: read() system call: how to transfer data to userspace.

Posted: Fri Jan 22, 2021 1:55 pm
by nexos
On a microkernel, read() is implemented in a userspace VFS server. Your kernel's IPC system should allow for sending data with a message. Also, using async I/O would be a nice touch. So, here is how it would go:
1. A thread sends the VFS a 'read' message. On async I/O, this doesn't wait for the data to be read.
2. The VFS calls the filesystem driver (whether it resides in the process or in another server) to read.
3. The filesystem driver reads data from disk, sending a message to the disk driver.
4. The VFS sends the thread that initiated the request a message, passing it the data read.
5. The thread continues on, having read the file data.
That is how I would handle file I/O on a microkernel.
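
The flow above can be sketched roughly like this in C. This is only an illustration under loud assumptions: plain function calls stand in for IPC sends, the "disk" is an in-memory array, everything is synchronous, and the message layout and all names (`struct message`, `vfs_handle`, `fs_read`, `disk_read`) are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAYLOAD 64

enum msg_type { MSG_READ, MSG_READ_REPLY };

struct message {
    enum msg_type type;
    uint32_t client;              /* thread the reply is addressed to */
    uint32_t offset;              /* file offset */
    uint32_t length;              /* bytes requested / bytes returned */
    int32_t  status;              /* bytes read, 0 at EOF, -1 on error */
    uint8_t  payload[MAX_PAYLOAD];
};

/* "Disk driver server": an in-memory disk. */
static const unsigned char disk[64] =
    "hello from the disk driver, via the filesystem server";

static int disk_read(uint32_t offset, uint8_t *out, uint32_t len)
{
    if (offset > sizeof disk || len > sizeof disk - offset)
        return -1;
    memcpy(out, disk + offset, len);
    return (int)len;
}

/* "Filesystem driver": one file occupying disk bytes 6..52. */
#define FILE_START 6u
#define FILE_SIZE  47u

static int fs_read(uint32_t offset, uint8_t *out, uint32_t len)
{
    if (offset >= FILE_SIZE)
        return 0;                       /* end of file */
    if (len > FILE_SIZE - offset)
        len = FILE_SIZE - offset;       /* short read near EOF */
    return disk_read(FILE_START + offset, out, len);
}

/* "VFS server": turns a read request into a reply carrying the data. */
void vfs_handle(const struct message *req, struct message *reply)
{
    assert(req != NULL && reply != NULL);
    memset(reply, 0, sizeof *reply);
    reply->type = MSG_READ_REPLY;
    reply->client = req->client;
    reply->offset = req->offset;
    if (req->type != MSG_READ || req->length > MAX_PAYLOAD) {
        reply->status = -1;
        return;
    }
    reply->status = fs_read(req->offset, reply->payload, req->length);
    reply->length = reply->status > 0 ? (uint32_t)reply->status : 0;
}
```

Here the data travels inline in the reply message, which is fine for small reads. For large reads the payload would more likely be passed as shared or gifted pages rather than copied into the message.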

Re: read() system call: how to transfer data to userspace.

Posted: Fri Jan 22, 2021 3:29 pm
by xeyes
Alpha wrote: how this is commonly done? Or maybe tell me where I can find more information about this?

Thanks a lot.
Look at the code or documentation of Linux or HURD and you'd know "how this is commonly done".

But my 2 cents are:
1. what's the fun in copying not only an interface but also its implementation?
2. their implementation could be complex enough, or have complex enough dependencies on the other parts of their kernel, to make copying them impractical time-wise or "value of time-wise"

My current thoughts on this problem:

The ideal and efficient way seems to be passing everything by reference, and if the user space app is cooperative enough, this should be doable on modern HW, provided your allocation system supports it.

Step 1:
Modern disk controllers, network cards and other high speed peripherals probably have DMA engines that support scatter-gather, or else you can use an IOMMU, so the "need a big physically contiguous range" requirement can be bypassed and the buffer can be made up of scattered physical pages.
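
As a rough illustration of Step 1, here is a sketch in C that turns a set of scattered page frames into a scatter-gather list. The entry layout is made up (real controllers define their own descriptor formats), and `struct sg_entry` and `build_sg_list` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Generic scatter-gather entry: one physically contiguous run. */
struct sg_entry {
    uint64_t phys_addr;
    uint32_t length;
};

/* Build a scatter-gather list covering len bytes from frames[]
 * (page frame numbers, in buffer order). Physically adjacent frames
 * are merged into one entry. Returns the number of entries written,
 * or -1 if there are too few frames or too little room in out[].
 * Simplification: a merged run is assumed to fit in 32 bits. */
int build_sg_list(const uint64_t *frames, size_t nframes, size_t len,
                  struct sg_entry *out, size_t max_entries)
{
    assert(frames != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < nframes && len > 0; i++) {
        uint32_t chunk = len < PAGE_SIZE ? (uint32_t)len : PAGE_SIZE;
        uint64_t addr = frames[i] * PAGE_SIZE;
        if (n > 0 && out[n - 1].phys_addr + out[n - 1].length == addr) {
            out[n - 1].length += chunk;     /* extend previous run */
        } else {
            if (n == max_entries)
                return -1;
            out[n].phys_addr = addr;
            out[n].length = chunk;
            n++;
        }
        len -= chunk;
    }
    return len == 0 ? (int)n : -1;
}
```

For example, frames 10, 11, 40 and 7 with a request of three pages plus 100 bytes give three entries: one two-page run starting at frame 10, one page at frame 40, and 100 bytes at frame 7.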

Step 2:
Once the physical pages are filled by the DMA, the ISR can (itself, or by scheduling a DPC) map them into the user space app's VA range in the same layout the IOMMU or the DMA scatter list used, and then return to the app or, in the case of an async syscall, signal it. At that point the app has access to the requested data without any CPU copying.
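
A minimal sketch of Step 2 in C, assuming a toy single-level page table. Real page tables are multi-level and need TLB maintenance, which is left out; `struct page_table`, `map_frames` and `translate` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define NUM_VPAGES  64
#define PTE_PRESENT 1u

/* Toy page table: one entry per virtual page, holding the physical
 * page address ORed with flag bits. */
struct page_table {
    uint64_t pte[NUM_VPAGES];
};

/* Map n scattered frames at consecutive virtual pages starting at va.
 * Refuses unaligned addresses and ranges that are already mapped.
 * Returns 0 on success, -1 otherwise. */
int map_frames(struct page_table *pt, uint64_t va,
               const uint64_t *frames, size_t n)
{
    assert(pt != NULL);
    if (va % PAGE_SIZE != 0)
        return -1;
    uint64_t vpn = va / PAGE_SIZE;
    if (vpn > NUM_VPAGES || n > NUM_VPAGES - vpn)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (pt->pte[vpn + i] & PTE_PRESENT)
            return -1;                      /* do not clobber mappings */
    for (size_t i = 0; i < n; i++)
        pt->pte[vpn + i] = (frames[i] * PAGE_SIZE) | PTE_PRESENT;
    return 0;
}

/* Translate a virtual address; returns UINT64_MAX on a "page fault". */
uint64_t translate(const struct page_table *pt, uint64_t va)
{
    uint64_t vpn = va / PAGE_SIZE;
    if (vpn >= NUM_VPAGES || !(pt->pte[vpn] & PTE_PRESENT))
        return UINT64_MAX;
    return (pt->pte[vpn] & ~(uint64_t)(PAGE_SIZE - 1)) + va % PAGE_SIZE;
}
```

After mapping frames 10, 40 and 7 at VA 0x8000, the app sees one contiguous 12 KiB buffer even though the underlying frames are scattered.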

Some caveats:
1. A fairly advanced kernel (one able to interrupt and preempt ISRs and syscalls, and supporting things like DPCs) is probably needed to sustain and show off this kind of efficiency.

2. Maybe this is just an issue with my design: I added a basic cache in the data path, so anything read from the disk has to be put into the cache, and any user of the cache has to memcpy the data out of it. Maybe this can be bypassed at least in certain cases, but in general it seems that a cache might be in the way if you want to pass everything by reference. The filesystem implementation could also stand in the way.

3. The user space app may not be cooperative: it can request "read X KB into a buffer based at an un-page-aligned address please".
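
To illustrate caveat 3, here is a small sketch in C that splits a request into the part that could in principle be handled by remapping whole pages and the unaligned edges that would still need a copy. It only does the arithmetic; `struct split` and `split_request` are hypothetical names, and whether remapping the middle is actually safe depends on the rest of the design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* How a read of len bytes into addr breaks down. */
struct split {
    size_t head;    /* bytes before the first page boundary (copy) */
    size_t middle;  /* whole pages (candidates for remapping) */
    size_t tail;    /* bytes after the last whole page (copy) */
};

struct split split_request(uintptr_t addr, size_t len)
{
    struct split s = {0, 0, 0};
    size_t misalign = addr % PAGE_SIZE;
    size_t head = misalign ? PAGE_SIZE - misalign : 0;
    if (head > len)
        head = len;                     /* request ends inside first page */
    size_t rest = len - head;
    s.head = head;
    s.tail = rest % PAGE_SIZE;
    s.middle = rest - s.tail;
    assert(s.head + s.middle + s.tail == len);
    return s;
}
```

For a 10000-byte read into address 0x1100, this gives a 3840-byte head, one 4096-byte page in the middle, and a 2064-byte tail, so more than half of the request would still go through a copy.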

Good luck coming up with something, or if you'd rather copy an implementation, that's not a bad starting point either. Be aware that you'll not be able to easily 'unlearn' it once you know how it works, though.