Re: I keep recoding my OS!

Posted: Wed Dec 23, 2020 12:20 pm
by thewrongchristian
PeterX wrote: You probably mean this article:
https://www.nextplatform.com/2017/09/11 ... -posix-io/
It's an interesting read.
It's also mistaken, I believe. For example:
Perhaps the biggest limitation to scalability presented by the POSIX I/O standard is not in its API, but in its semantics. Consider the following semantic requirement taken from the POSIX 2008 specification for the write() function:
After a write() to a regular file has successfully returned:

– Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

– Any subsequent successful write() to the same byte position in the file shall overwrite that file data.
That is, writes must be strongly consistent–that is, a write() is required to block application execution until the system can guarantee that any other read() call will see the data that was just written. While this is not too onerous to accomplish on a single workstation that is writing to a locally attached disk, ensuring such strong consistency on networked and distributed file systems is very challenging.
Nothing in the quoted text from the standard says the data written by write() must be persisted when write() returns. Subsequent read()s will read the data from the cache write() has just updated, and be totally consistent with the above requirements.

If writes are meant to be persisted immediately, then the file must be opened with the O_SYNC option, or fsync() must be called to synchronize outstanding writes for files not opened with O_SYNC.
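To make the distinction concrete, here is a minimal sketch in Python, whose os module is a thin wrapper over the POSIX calls. It assumes a POSIX system where os.O_SYNC is available; the function names and the temporary file are made up for illustration.

```python
import os
import tempfile


def write_then_read(path: str, payload: bytes) -> bytes:
    """write() and read back immediately, with no fsync() in between."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Real code should loop on short writes; a small payload to a
        # regular file is normally accepted in one call.
        os.write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)
        # The read sees the new data whether or not it has reached the disk.
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


def write_durably(path: str, payload: bytes) -> None:
    """write() followed by an explicit fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # blocks until the file's data is flushed to storage
    finally:
        os.close(fd)


def write_with_o_sync(path: str, payload: bytes) -> None:
    """Open with O_SYNC so each write() is synchronized before it returns."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.dat")
        print(write_then_read(demo_path, b"hello"))
        write_durably(demo_path, b"persisted via fsync")
        write_with_o_sync(demo_path, b"persisted via O_SYNC")
```

The first function satisfies the quoted read-after-write requirement without forcing anything to disk; only the other two ask for persistence.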

Re: I keep recoding my OS!

Posted: Wed Dec 23, 2020 1:16 pm
by eekee
PeterX wrote: https://www.nextplatform.com/2017/09/11/whats-bad-posix-io/
It's an interesting read.
It is indeed. It reminds me of when I wanted to challenge Plan 9's filesystem semantics. POSIX semantics are rather different, but the consistency guarantee problem is very similar. Plan 9 programs are allowed to violate consistency, but of all the programs included with the system, only one does: a copy utility similar to cp. It's a problem on a very different scale from NVMe and hypercomputers, but recoding all the shell commands to violate consistency would probably break a lot of scripts. :)

EDIT: @thewrongchristian: I don't see your objection as being related to the problems the article discusses. POSIX doesn't care about caches so long as reads which follow writes return the data just written. That's impossible to do in a networked filesystem if each machine caches the filesystem independently.

(This in turn reminds me Plan 9 has a mount cache, but it's off by default. 9front's mount cache is on by default.)
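The independent-cache problem can be shown with a toy model. This is only an in-memory sketch, not any real network filesystem: Server, CachingClient and invalidate are hypothetical names, and the "network" is just a method call.

```python
class Server:
    """Stands in for the shared file server; holds the authoritative data."""

    def __init__(self):
        self.files = {}

    def read(self, name: str) -> bytes:
        return self.files.get(name, b"")

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data


class CachingClient:
    """One machine with its own private cache of the server's files."""

    def __init__(self, server: Server):
        self.server = server
        self.cache = {}

    def read(self, name: str) -> bytes:
        # Serve from the local cache if possible, without asking the server.
        if name not in self.cache:
            self.cache[name] = self.server.read(name)
        return self.cache[name]

    def write(self, name: str, data: bytes) -> None:
        # Write-through: update the local cache and the server.
        self.cache[name] = data
        self.server.write(name, data)

    def invalidate(self, name: str) -> None:
        # Coherence needs some cross-machine message like this one.
        self.cache.pop(name, None)


if __name__ == "__main__":
    server = Server()
    a, b = CachingClient(server), CachingClient(server)
    b.read("notes")                 # b caches the (empty) file
    a.write("notes", b"new data")   # a writes; b is not told
    print(a.read("notes"))          # a sees its own write
    print(b.read("notes"))          # b still serves its stale copy
    b.invalidate("notes")
    print(b.read("notes"))          # after invalidation b refetches
```

Machine a always sees its own writes, but b keeps returning its cached copy until something tells it to drop it, and that extra traffic is where the cost of the read-after-write guarantee shows up.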

Re: I keep recoding my OS!

Posted: Wed Dec 23, 2020 5:05 pm
by thewrongchristian
eekee wrote: @thewrongchristian: I don't see your objection as being related to the problems the article discusses. POSIX doesn't care about caches so long as reads which follow writes return the data just written. That's impossible to do in a networked filesystem if each machine caches the filesystem independently.

(This in turn reminds me Plan 9 has a mount cache, but it's off by default. 9front's mount cache is on by default.)
I only got as far as the text I quoted, as I was under other time pressures. Reading further, it explains the context about POSIX and distributed filesystems.

But the point stands: POSIX semantics are not much of a problem for 99% of use cases, and designing an OS round the remaining 1% makes no sense. That 1% can just live without POSIX semantics, or split up its data sufficiently that each piece can be processed independently.

Based on the article, the argument seems to be that POSIX prevents scaling where data must be shared between distributed nodes. In fact, if the data must be shared and updated concurrently in a consistent manner (and you do want the data to be consistent, else it is probably wrong), the barrier to scalability is the distributed nature of the program and fundamentals such as the speed of light.

To be properly scalable, the problem being solved concurrently must be appropriately partitioned.
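As a rough sketch of what "appropriately partitioned" can mean: route every record to a partition by hashing its key, so each worker owns a disjoint set of keys and never needs to see another worker's writes. The function names and the summing workload are invented for illustration, and the partitions are processed sequentially here to keep the example short.

```python
import zlib
from collections import Counter


def partition_of(key: str, n_partitions: int) -> int:
    """Deterministically map a key to one partition."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def split(records, n_partitions: int):
    """Group (key, value) records so each key lives in exactly one partition."""
    parts = [[] for _ in range(n_partitions)]
    for key, value in records:
        parts[partition_of(key, n_partitions)].append((key, value))
    return parts


def process_partition(part):
    """Work a single node could do alone: sum the values per key."""
    totals = Counter()
    for key, value in part:
        totals[key] += value
    return dict(totals)


def run(records, n_partitions: int):
    """Process every partition independently, then merge the results."""
    merged = {}
    for part in split(records, n_partitions):
        # Key sets are disjoint across partitions, so merging cannot conflict.
        merged.update(process_partition(part))
    return merged


if __name__ == "__main__":
    sample = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    print(run(sample, 3))
```

Because no key is shared between partitions, each call to process_partition could run on a different node with no consistency traffic between them.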

Re: I keep recoding my OS!

Posted: Thu Dec 24, 2020 1:49 am
by bloodline
nexos wrote: Yeah, that makes sense. I read an article a couple of days ago saying the only thing preventing computers with 100000+ nodes is POSIX I/O. Basically, it was saying POSIX I/O scales horribly, and I have to agree. Anyway, I think I have an architectural plan for my kernel, tell me any issues you see in it :)
So, my idea is "everything is a message port". I'm going to revive the old Mach port idea, but message ports won't necessarily represent a message queue. A port could represent something like a semaphore or a process control block; any resource inside the kernel is going to be a message port. A port representing one of these objects doesn't context switch, so it contains hardcoded function pointers called message handlers. The message manager takes the message ID and uses it to find that message's handler, passing the message packet as a parameter. The object we sent the message to then acts upon it and returns, all without a context switch. Those are user mode to kernel mode messages. Then there are user mode to user mode messages, which work in the normal way. What do you say about that?
This is an interesting discussion point. Perhaps it is time to start a new thread about the relative merits of different microkernel designs; the current thread topic doesn’t really apply.
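For anyone picking this up in a new thread, the handler-table dispatch nexos describes in the quote could be sketched roughly as below. This is a user-space Python toy, not kernel code: Message, Port, send, the message IDs and the semaphore port are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical message IDs for a semaphore-like kernel object.
MSG_SIGNAL = 1
MSG_TRY_WAIT = 2


@dataclass
class Message:
    msg_id: int
    payload: Any = None


class Port:
    """A port backed by a fixed handler table instead of a message queue."""

    def __init__(self, handlers: Dict[int, Callable[[Message], Any]]):
        self.handlers = dict(handlers)


def send(port: Port, msg: Message) -> Any:
    """The 'message manager': look up the handler by ID and call it directly."""
    handler = port.handlers.get(msg.msg_id)
    if handler is None:
        raise ValueError(f"port has no handler for message {msg.msg_id}")
    # A plain function call: the object acts on the message and returns,
    # with no queueing and no switch to another task.
    return handler(msg)


def make_semaphore_port(initial: int) -> Port:
    """Build a port representing a counting semaphore."""
    state = {"count": initial}

    def on_signal(msg: Message) -> int:
        state["count"] += 1
        return state["count"]

    def on_try_wait(msg: Message) -> bool:
        if state["count"] > 0:
            state["count"] -= 1
            return True
        return False

    return Port({MSG_SIGNAL: on_signal, MSG_TRY_WAIT: on_try_wait})


if __name__ == "__main__":
    sem = make_semaphore_port(1)
    print(send(sem, Message(MSG_TRY_WAIT)))  # acquires the only unit
    print(send(sem, Message(MSG_TRY_WAIT)))  # nothing left to acquire
    print(send(sem, Message(MSG_SIGNAL)))    # releases one unit
```

In a real kernel the table would hold function pointers and send would sit behind the system call entry, but the lookup-then-call shape is the same.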