Modern storage is plenty fast. It is the APIs that are bad.

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Korona »

That's true. Unfortunately, the POSIX AIO API is not implemented efficiently on common platforms, and that is partly due to the way it is specified: POSIX AIO notifies users about completed I/O operations via signals (SIGEV_SIGNAL), which leads to very awkward signal-handling code (the same can be said for all asynchronous signal handling). Technically, POSIX AIO also allows notifications to run on a thread (SIGEV_THREAD), but then you have to deal with a libc-managed thread pool that you have little control over. In addition, it integrates poorly with epoll (as the latter is file-descriptor-based and does not monitor individual I/O operations).
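A minimal sketch of the signal-based path, to make the awkwardness concrete: one aio_read() whose completion is reported with SIGUSR1 via SIGEV_SIGNAL. The signal choice and the helper names read_with_aio_signal and aio_demo are illustrative assumptions, not part of any API; it targets Linux/glibc (glibc before 2.34 may need -lrt). Note that the handler may only do async-signal-safe work, and the signal by itself does not tell you which request finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Only async-signal-safe work is allowed in a handler, so it merely
   records that *some* completion signal arrived. */
static volatile sig_atomic_t aio_signalled = 0;

static void on_aio_signal(int sig, siginfo_t *info, void *uctx) {
    (void)sig; (void)info; (void)uctx;
    aio_signalled = 1;
}

/* Reads up to len bytes from offset 0 of fd with aio_read(), asking for
   SIGUSR1 on completion. Returns the byte count, or -1 on error. */
ssize_t read_with_aio_signal(int fd, void *buf, size_t len) {
    assert(buf != NULL && len > 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_aio_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Block SIGUSR1 before submitting, so a fast completion cannot slip in
       between testing the flag and going to sleep. */
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    cb.aio_sigevent.sigev_signo = SIGUSR1;

    aio_signalled = 0;
    if (aio_read(&cb) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* sigsuspend() atomically unblocks SIGUSR1 and sleeps until a handler runs. */
    sigset_t wait_mask = old;
    sigdelset(&wait_mask, SIGUSR1);
    while (!aio_signalled)
        sigsuspend(&wait_mask);
    pthread_sigmask(SIG_SETMASK, &old, NULL);

    /* The flag does not say *which* request finished; with many requests in
       flight you would have to check aio_error() on each one of them. */
    while (aio_error(&cb) == EINPROGRESS)
        ;
    return aio_return(&cb);
}

/* Usage: write a small temporary file, then read it back through AIO. */
ssize_t aio_demo(char *out, size_t outlen) {
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    fputs("hello", f);
    fflush(f);
    ssize_t n = read_with_aio_signal(fileno(f), out, outlen);
    fclose(f);
    return n;
}
```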

The Linux-specific io_uring and Windows' IOCP are much more convenient: these interfaces simply post completed requests into a queue that userspace can reap; in io_uring's case, this is a ring buffer in shared memory that can be read without entering the kernel at all. This allows users to get rid of dedicated I/O threads entirely. For high-performance applications, io_uring also has a polling mode (IORING_SETUP_IOPOLL) in which the device is busy-polled for completions instead of waiting for IRQs. POSIX AIO cannot really take advantage of these features.
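The completion side can be modeled as a single-producer/single-consumer ring: the kernel writes an entry and then publishes a new tail; the application reads entries up to that tail and publishes its head, so neither side needs a system call to exchange completions. This is a simplified sketch under stated assumptions (hypothetical names, fixed size, no overflow handling), not io_uring's real memory layout or the liburing API; only the entry fields loosely mirror io_uring's struct io_uring_cqe (user_data, res, flags).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CQ_ENTRIES 8u
#define CQ_MASK    (CQ_ENTRIES - 1u)
static_assert((CQ_ENTRIES & CQ_MASK) == 0, "CQ_ENTRIES must be a power of two");

/* Hypothetical completion entry, loosely modeled on io_uring's CQE. */
struct cq_entry {
    uint64_t user_data;  /* cookie the submitter attached to the request */
    int32_t  res;        /* bytes transferred, or a negative errno */
    uint32_t flags;
};

/* Lives in memory shared by producer and consumer. head is advanced only by
   the consumer, tail only by the producer; both are free-running counters
   and the slot is index & CQ_MASK. */
struct cq_ring {
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
    struct cq_entry entries[CQ_ENTRIES];
};

/* Producer ("kernel") side. Returns false if the ring is full; a real
   interface would need an overflow strategy here. */
bool cq_post(struct cq_ring *cq, uint64_t user_data, int32_t res) {
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_relaxed);
    /* Acquire pairs with the consumer's release: its reads of a slot are
       finished before we overwrite that slot. */
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_acquire);
    if (tail - head == CQ_ENTRIES)
        return false;
    struct cq_entry *e = &cq->entries[tail & CQ_MASK];
    e->user_data = user_data;
    e->res = res;
    e->flags = 0;
    /* Release: the entry contents are visible before the new tail is. */
    atomic_store_explicit(&cq->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer (application) side. Returns false if nothing has completed.
   This is a plain memory access; no system call is involved. */
bool cq_reap(struct cq_ring *cq, struct cq_entry *out) {
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = cq->entries[head & CQ_MASK];
    /* Release: we are done with the slot before the producer may reuse it. */
    atomic_store_explicit(&cq->head, head + 1, memory_order_release);
    return true;
}
```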
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
nexos
Member
Posts: 1078
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Modern storage is plenty fast. It is the APIs that are b

Post by nexos »

Korona wrote:Man, I ask you to discuss a topic in a non-inflammatory way and you start your response with "For one, no serious dev would ever [..]".

It's sad that it's no longer possible to discuss interesting topics without toxicity on this forum.
Yeah, I know. I have started threads about monolithic vs. microkernels and UEFI Wiki updates and they have disintegrated into flame wars. Something needs to be done....
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by bzt »

nexos wrote:
Korona wrote:Man, I ask you to discuss a topic in a non-inflammatory way and you start your response with "For one, no serious dev would ever [..]".

It's sad that it's no longer possible to discuss interesting topics without toxicity on this forum.
Yeah, I know. I have started threads about monolithic vs. microkernels and UEFI Wiki updates and they have disintegrated into flame wars. Something needs to be done....
I agree. I always answer politely, then someone starts trolling with sentences like "Or maybe, maybe, maybe, that guy is an expert in his field and your understanding of the blog post is flawed." and then unsuccessfully tries to blame it on me. It would be great if everyone here could reason and provide links to back up their statements, but I think that might be too much to ask.

@Korona: if you don't agree with one of my arguments, then just say so. I've quoted the author and made it clear why he is not an expert; you haven't responded to THAT at all. This is bad manners and toxic to this forum. I hope you understand.

Have a nice day,
bzt
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Korona »

bzt, you have made some valid points and some invalid ones that demonstrate that you did not really read the post but are just trying to justify your previous wrong comments (e.g., the post says that paging costs are in the same order of magnitude as storage access, and you make it sound as if the author claims that RAM is slower than disk). But the reason I did not respond to your post is that your tone is not appropriate and that you apparently do not want to discuss topics on a technical level without insulting people (such as the author of this blog post). You've insulted the readers of this forum so many times already that I do not really understand why the moderators are not taking action, but if it is not clear to you where your tone is inappropriate, let me remind you. Your first comments were more or less calm, but they only got worse as this thread went on:
bzt wrote:But the author's mistaken in many points:
Attributing the flaws to the author and not to the post. One of the basics of efficient communication: discuss ideas, not people.
bzt wrote:No, he does not know. @eekee is right, the author is confusing API concept and implementation, plus he is completely forgetting that readahead needs a buffer too!
Claiming that a substantial contributor to the Linux readahead path does "not know" how readahead works is just ridiculous. Plus, the claim that "he does not know" is baseless: you cannot know the extent of the author's knowledge, only what he posts about on his blog.
bzt wrote:It is very sad that people who don't understand the difference between concept and implementation (and blame the API for their ignorance) are working on the Linux kernel. Just sad. Maybe it's time to switch to one of the BSDs?
Now this is just flaming. We are not on 4chan, are we?
bzt wrote:For one, no serious dev would ever say that [...]
And here we arrive at the realm of insults and ad-hominem attacks, which is why I just won't respond to that post until you rephrase it to reflect your technical points and not your feelings.
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Korona »

I don't know about others but I don't want to come here to read/discuss rude comments, I want to discuss technical ideas. It does not matter if one is right or wrong (whether it's bzt or the author of that blog post). What matters to me is having a nice discussion board where the focus is on technical details, without the usual internet flame wars that one can also find on 4chan, parts of reddit or Discord.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by bzt »

Korona wrote:you make it sound as if the author claims that RAM is slower than disk
I never said that. What the author claims is that the RAM has the same speed as the NVMe, here's the quote (again):
Those operations [page fault, interrupts, copies or virtual memory mapping update] are now in the same order of magnitude of the I/O operation itself.
Korona wrote:that your tone is not appropriate
What would you say about the tone of "Or maybe, maybe, maybe, that guy is an expert in his field and your understanding of the blog post is flawed."? Do you honestly think that this is not a personal insult? Do you really think that writing something like that is within the scope of a "nice technical discussion"?

You also wrote "That's something that people cannot really verify since you are posting under a pseudonym here." which is essentially calling me a liar. No, I don't lie, I really do have that certificate, and by the way, I don't use a pseudonym, I use a monogram.
Korona wrote:Claiming that a substaintial contributor to the Linux readahead path does "not know" how readahead works is just ridiculous.
Why would it be ridiculous? Do you really think that the Linux source is written only by genius experts? Reading through the Linux source and looking at the insane number of bugs, I'm certain it was written mostly by incompetent developers, and the really experienced developers are having a hard time cleaning up the mess after them. Just for the record, Con Kolivas and Torvalds agree (here and here and here and here). Unlike you, I back up my claims, because that's what makes a discussion civilized.

Look, from your posts it's obvious that you have put the blog's author on a pedestal and I'm sorry that you were mistaken about that. But that's still no reason for you to make personal insults the way you did.

Cheers,
bzt
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Korona »

You are right that my tone was also not constructive in that "maybe maybe" message. I will be the first to admit that; I apologise. But your claims are still ridiculous. And no, you do not keep the discussion civilized, you escalate it. You do have a track record of insulting people. Threads that you participate in regularly degrade into flame wars, that is a repeating pattern.

I am not calling you a liar; your qualifications are just not backed up by verifiable evidence, while I can easily run git log to verify that the author of that post has indeed made good contributions to the Linux I/O code.

And no, Linux is not only written by experts but core Linux code has reasonably high quality (drivers are a different story) and the buffered I/O path is certainly battle-tested.
nexos
Member
Posts: 1078
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Modern storage is plenty fast. It is the APIs that are b

Post by nexos »

bzt, just apologize to Korona.
Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Ethin »

As a newbie to this forum (compared to many users like Korona), I have to, sadly, agree with Korona. Though I did not read the article, it was not overly difficult for me to spot the toxicity and the (almost condescending-sounding) manner in which it was delivered.

I don't want to derail this topic too much, but I'll leave you with this, bzt: as someone who is a member of a forum that can get incredibly toxic, I joined this forum for two reasons. First, I needed help with OS dev and was interested in it; and second, I looked at this forum and went "wow, this forum is full of adults like myself who aren't as toxic as this other forum I'm a member of, where the majority of users are under eighteen." I even went to one of my friends and proudly stated that I had found a place among others who were just like me, who enjoyed our craft of OS and embedded systems development, who knew the difference between a debate and a war of words, and who knew when to draw that line and go "Okay, I need to step back and think for a bit".

Though I have seen some toxicity as I've browsed the forum, and (perhaps accidentally) contributed to it, I do not strive to be a toxic individual, and I try to stay out of toxic threads as much as possible unless someone is throwing around misinformation or nonsense. I also enjoy debates, as people have probably already determined.

Anyway, back to what I was saying: please, please try not to make a liar out of me. I don't want to go back to that friend one day and tell him that I was wrong about my initial assessment of this forum, and I especially don't want this place to devolve into a cesspit of antagonism. A certain amount of toxicity is tolerable, and I acknowledge that there's no way to eliminate it completely, but there's no need to stir the pot wherever you go.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by bzt »

nexos wrote:bzt, just apologize to Korona.
Hold your horses! Korona insulted me and called me a liar. Why should I be the one to apologize? I did absolutely nothing against Korona!

Never apologize for telling the truth! End of discussion.

Have a nice day,
bzt
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Korona »

I do not demand an apology, you did not insult me in this thread. My remarks in this thread were never about insults against me personally but about insults in general, whether they are against people of this forum or external people. I simply do not want to see that toxic attitude in a technical forum that I've been part of for 13 years.
What I demand is that you change your tone in this forum (towards all members and also external people) and stop igniting the ever-repeating flame wars. Discuss technical issues, not people.
nexos
Member
Posts: 1078
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Modern storage is plenty fast. It is the APIs that are b

Post by nexos »

When you said "For one, no serious dev....", I thought you were talking about Korona. My bad. Still, flame wars break out too easily on this forum.
OSwhatever
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by OSwhatever »

Korona wrote:That's true. Unfortunately, the POSIX AIO API is not implemented efficiently on common platforms, and that is partly due to the way it is specified: POSIX AIO notifies users about completed I/O operations via signals (SIGEV_SIGNAL), which leads to very awkward signal-handling code (the same can be said for all asynchronous signal handling). Technically, POSIX AIO also allows notifications to run on a thread (SIGEV_THREAD), but then you have to deal with a libc-managed thread pool that you have little control over. In addition, it integrates poorly with epoll (as the latter is file-descriptor-based and does not monitor individual I/O operations).

The Linux-specific io_uring and Windows' IOCP are much more convenient: these interfaces simply post completed requests into a queue that userspace can reap; in io_uring's case, this is a ring buffer in shared memory that can be read without entering the kernel at all. This allows users to get rid of dedicated I/O threads entirely. For high-performance applications, io_uring also has a polling mode (IORING_SETUP_IOPOLL) in which the device is busy-polled for completions instead of waiting for IRQs. POSIX AIO cannot really take advantage of these features.
Back to topic. This connects more to the system architecture than to the filesystem API. If I have understood correctly, the thesis is that IO is now so fast that it is worth busy-waiting and filling the destination buffer directly. For small transfers that might be true; however, it also demands a few things of your OS.

The kernel must provide a kernel filesystem interface of some sort and then operate directly on the user-space buffer. In microkernels, where the filesystem resides in another process, the benefit is pretty much gone. With larger data transfers we shouldn't busy-wait, and the system must use another method, yielding the thread until the IO operation has finished. Asynchronous IO is of course out of the question here, since the whole benefit comes from busy-waiting. Signalling finished operations is in general slow.

While this might make small IO transfers faster, the question is whether it will matter in a real-life program. One scenario is a server that serves thousands of clients: with busy-waiting there is a risk of congestion, and requests would take longer because hardware resources aren't available. For many concurrent IO operations, I still think that a client-server filesystem architecture might be better, where IO operations are queued at the block level.
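One common way to get both behaviours is a hybrid wait: poll for a bounded number of iterations (cheap if the device answers quickly), then fall back to sleeping until the completing side wakes the thread. A sketch using pthreads in user space; completion_signal stands in for whatever would run on the interrupt path, and the names and the spin budget are hypothetical tuning knobs rather than anything taken from a particular kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-request completion record. */
struct completion {
    atomic_bool done;
    int result;               /* written before done is set */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

void completion_init(struct completion *c) {
    atomic_init(&c->done, false);
    c->result = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
}

void completion_destroy(struct completion *c) {
    pthread_cond_destroy(&c->cond);
    pthread_mutex_destroy(&c->lock);
}

/* Called by whoever finishes the IO (standing in for the interrupt path). */
void completion_signal(struct completion *c, int result) {
    pthread_mutex_lock(&c->lock);
    c->result = result;
    atomic_store_explicit(&c->done, true, memory_order_release);
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Poll up to spin_budget times, then block so the CPU can run other
   threads. *spun reports whether the spinning phase saw the completion. */
int completion_wait(struct completion *c, unsigned spin_budget, bool *spun) {
    assert(spun != NULL);
    for (unsigned i = 0; i < spin_budget; i++) {
        if (atomic_load_explicit(&c->done, memory_order_acquire)) {
            *spun = true;
            return c->result;
        }
    }
    *spun = false;
    pthread_mutex_lock(&c->lock);
    while (!atomic_load_explicit(&c->done, memory_order_relaxed))
        pthread_cond_wait(&c->cond, &c->lock);
    int r = c->result;
    pthread_mutex_unlock(&c->lock);
    return r;
}

/* Usage: another thread completes the request. */
static void *finish_request(void *arg) {
    completion_signal(arg, 512);
    return NULL;
}

int demo_blocking_wait(bool *spun) {
    struct completion c;
    completion_init(&c);
    pthread_t t;
    pthread_create(&t, NULL, finish_request, &c);
    int r = completion_wait(&c, 0, spun);  /* budget 0: skip spinning entirely */
    pthread_join(t, NULL);
    completion_destroy(&c);
    return r;
}
```

With a very large spin budget this degenerates into pure busy-waiting, and with a budget of 0 it is the classic interrupt-driven sleep, which roughly matches the small-versus-large transfer split described in the post.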
Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Ethin »

OSwhatever wrote:
Back to topic. This connects more to the system architecture than to the filesystem API. If I have understood correctly, the thesis is that IO is now so fast that it is worth busy-waiting and filling the destination buffer directly. For small transfers that might be true; however, it also demands a few things of your OS.

The kernel must provide a kernel filesystem interface of some sort and then operate directly on the user-space buffer. In microkernels, where the filesystem resides in another process, the benefit is pretty much gone. With larger data transfers we shouldn't busy-wait, and the system must use another method, yielding the thread until the IO operation has finished. Asynchronous IO is of course out of the question here, since the whole benefit comes from busy-waiting. Signalling finished operations is in general slow.

While this might make small IO transfers faster, the question is whether it will matter in a real-life program. One scenario is a server that serves thousands of clients: with busy-waiting there is a risk of congestion, and requests would take longer because hardware resources aren't available. For many concurrent IO operations, I still think that a client-server filesystem architecture might be better, where IO operations are queued at the block level.
Not necessarily (I don't think so, anyway). If I'm understanding all of this right, you could, in theory, do something like this for IO requests, especially on something like NVMe:
  1. Processes all send IO requests to the kernel
  2. Kernel has some kind of indicator that there aren't any more requests (e.g., wait 2 timer ticks of really short intervals, say 500 µs if not smaller).
  3. Kernel knows that we're done for this timeslice, so it notifies the controller to service all the requests.
  4. Yield to other threads and wait for an interrupt to notify the kernel that all requests have been serviced.
  5. Return the read data in the system call return value (or a pointed-to buffer).
  6. Return from syscall.
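Steps 1 to 3 boil down to coalescing requests and notifying the controller once the submission stream goes quiet (or the batch fills up). A minimal model under stated assumptions: BATCH_MAX, QUIET_TICKS and the ring_doorbell callback are hypothetical (on NVMe the notification would be a submission-queue tail doorbell write), and steps 4 to 6, waiting for the interrupt and returning to the caller, are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX   32  /* hypothetical: most requests handed over per doorbell */
#define QUIET_TICKS  2  /* hypothetical: flush after this many idle ticks */

struct io_req {
    uint64_t lba;       /* starting block */
    uint32_t blocks;    /* length in blocks */
};

/* Hypothetical hook that hands n queued requests to the controller at once. */
typedef void (*doorbell_fn)(const struct io_req *reqs, size_t n, void *ctx);

struct batcher {
    struct io_req pending[BATCH_MAX];
    size_t count;
    unsigned idle_ticks;    /* timer ticks since the last submission */
    doorbell_fn ring_doorbell;
    void *ctx;
};

static void batcher_flush(struct batcher *b) {
    if (b->count == 0)
        return;
    b->ring_doorbell(b->pending, b->count, b->ctx);
    b->count = 0;
    b->idle_ticks = 0;
}

/* Step 1: a process submits a request. It only reaches the device early
   if the batch has filled up. */
void batcher_submit(struct batcher *b, struct io_req r) {
    assert(b->count < BATCH_MAX);
    b->pending[b->count++] = r;
    b->idle_ticks = 0;
    if (b->count == BATCH_MAX)
        batcher_flush(b);
}

/* Steps 2-3: called on every timer tick. Once no new request has arrived
   for QUIET_TICKS ticks, notify the controller about the whole batch. */
void batcher_tick(struct batcher *b) {
    if (b->count > 0 && ++b->idle_ticks >= QUIET_TICKS)
        batcher_flush(b);
}

/* Example doorbell for testing: counts batches and requests. */
struct doorbell_stats {
    size_t batches;
    size_t requests;
};

void count_doorbell(const struct io_req *reqs, size_t n, void *ctx) {
    (void)reqs;
    struct doorbell_stats *s = ctx;
    s->batches++;
    s->requests += n;
}
```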
This may not work. I know that my kernel, for instance, has a physical memory offset that I can use to convert physical addresses to virtual ones. However, I don't know if I can use that to ask HW to write to those virtual addresses. (Theoretically, if physical and virtual addresses are completely separate, the buffer can be the same address in both and you can then just do a quick memmove to copy it into the virtual buffer.)
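The "physical memory offset" here is a direct map: physical memory is mapped at a fixed virtual base, so the kernel can reach any frame at phys + offset. The device is always programmed with the physical address, and the kernel reads the data back through the direct-map alias, so no intermediate buffer is needed for that step. A sketch; HHDM_OFFSET is a hypothetical value (the real one depends on your kernel or bootloader), and the reverse helper only works for addresses inside the direct map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base of the direct map (a "higher-half direct map"). */
#define HHDM_OFFSET 0xffff800000000000ull

/* Physical -> virtual: valid for any frame the direct map covers. */
static inline uint64_t phys_to_virt(uint64_t phys) {
    return phys + HHDM_OFFSET;
}

/* Virtual -> physical: only valid for direct-map addresses; other kernel or
   user addresses need a page-table walk instead. */
static inline uint64_t direct_virt_to_phys(uint64_t virt) {
    assert(virt >= HHDM_OFFSET);
    return virt - HHDM_OFFSET;
}
```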
Octocontrabass
Member
Posts: 5513
Joined: Mon Mar 25, 2013 7:01 pm

Re: Modern storage is plenty fast. It is the APIs that are b

Post by Octocontrabass »

Ethin wrote:This may not work. I know that my kernel, for instance, has a physical memory offset that I can use to convert physical addresses to virtual ones. However, I don't know if I can use that to ask HW to write to those virtual addresses. (Theoretically, if physical and virtual addresses are completely separate, the buffer can be the same address in both and you can then just do a quick memmove to copy it into the virtual buffer.)
Converting physical addresses to virtual addresses is hard, but converting virtual addresses to physical addresses is easy: all you need to do is walk the page table to find the correct physical address for the hardware to use. I would have to run benchmarks to be sure, but I think telling the hardware to write directly into the correct address will be faster than allocating a separate buffer and copying from there.