rdos wrote:Kevin wrote:You're more talking about "not relevant any more" parts of the OS, because if they hadn't been relevant at some point, I wouldn't have written them in the first place. I don't consider any parts of the OS "not relevant any more".
OSes of considerable age usually have some parts that are no longer used, and which could have been removed but aren't because they are linked into the kernel in some way.
Oh yes, that's obviously not what I meant. I was talking about parts of the OS that aren't relevant any more for Brendan because he's working on implementing a design change, and therefore his kernel wouldn't be ready for these parts any more.
Brendan wrote:You're probably right - incremental change means you have to be careful to do things in a way that doesn't break existing code ("tip toe through the prickle patch"), and creates extra hassle/difficulty. More difficulty is likely to increase the rate of bugs; and therefore assuming the same rate of bugs is probably a bad assumption. We should be assuming that (for changes that affect a significant amount of the code) "incremental change" increases the bugs per line of code written while also increasing the number of lines of code written.
Okay, I hate to do it, but now I have to ask: Have you ever been part of a large project that changed some fundamental aspect of its design? You have an admirable knowledge of hardware and many low-level details, but when it comes to managing software projects you always bring up theories that are 100% contradictory to my own experiences.
I might even agree with the higher difficulty of incremental changes - I'm not sure about this - but this difficulty comes from the fact that you put much more effort into making easily verifiable steps, each with a good justification. This is not the kind of difficulty that produces more bugs; on the contrary, you actively do hard work to avoid bugs, and especially to avoid regressions that can easily slip into a rewrite from scratch.
Brendan wrote:You don't seem to understand the difference between design changes that affect a significant amount of the code, and minor/trivial changes.
There's no difference. Divide et impera. Just split the design changes into many trivial changes.
Brendan wrote:You think about changing the physical memory manager, but realise that you couldn't test the new locking because everything is still protected by the big kernel lock (that you can't remove yet).
You don't remove the BKL all at once. You push it down and gradually make more code run without holding the BKL. Implementing the locking in the PMM is part of this process and not a separate work item. The PMM is probably not where you start with it because it's not directly called from top-level functions, but there's nothing that stops you from doing other parts first.
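To illustrate what "pushing it down" means, here's a minimal sketch. All names (big_kernel_lock, pmm_lock, free_list_pop and so on) are hypothetical, not from any particular kernel: the PMM grows its own lock, and then the BKL can retreat from call sites one at a time.

Code:
typedef struct spinlock spinlock_t;
extern void spin_lock(spinlock_t *l);
extern void spin_unlock(spinlock_t *l);

extern spinlock_t big_kernel_lock;   /* still exists, used elsewhere */
extern spinlock_t pmm_lock;          /* new fine-grained lock */
extern void *free_list_pop(void);    /* PMM-internal helper */

/* Before: only safe because every caller holds the BKL. */
void *pmm_alloc_page_old(void)
{
    return free_list_pop();
}

/* After: the PMM protects itself, so callers can stop taking
 * the BKL one call site at a time instead of all at once. */
void *pmm_alloc_page(void)
{
    void *page;

    spin_lock(&pmm_lock);
    page = free_list_pop();
    spin_unlock(&pmm_lock);
    return page;
}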
Brendan wrote:In addition, every piece of code that allocates pages would need to do something like "alloc_page(page_colour, NUMA_domain);" instead of just "alloc_page();", and because the virtual memory manager and scheduler aren't NUMA aware yet they can't tell the physical memory manager which NUMA domain they're allocating pages for.
Introduce a NUMA_ANY_DOMAIN that means that the caller doesn't care. Get rid of the callers using NUMA_ANY_DOMAIN as soon as you can; it doesn't have to be now.
This is a general rule: If you enhance an interface with a new parameter, there is almost always a value that retains the old behaviour. If there isn't, it's usually not hard to add a special case that does.
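As a sketch of that rule (alloc_page_in, COLOUR_ANY and NUMA_ANY_DOMAIN are made-up names): give the extended interface sentinel values that reproduce the old behaviour, and keep the old signature around as a trivial wrapper.

Code:
#define COLOUR_ANY      (-1)   /* caller doesn't care about the page colour */
#define NUMA_ANY_DOMAIN (-1)   /* caller doesn't care about the NUMA domain */

void *alloc_page_in(int page_colour, int numa_domain);

/* Existing callers keep compiling unchanged; new code passes real
 * hints, and old call sites get converted whenever convenient. */
static inline void *alloc_page(void)
{
    return alloc_page_in(COLOUR_ANY, NUMA_ANY_DOMAIN);
}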
Brendan wrote:You think about splitting kernel space into the 2 different areas, but that would mean changing any piece of code that modifies data in the "per NUMA domain kernel space area" so that it propagates changes to the other NUMA domains without race conditions (which you can't test due to the big kernel lock), and that seems too hard.
Splitting into two different areas has no immediate effects on locking. You can just do it. (Or maybe you work on the BKL removal first.) But anyway, the default area for existing callers is the "global kernel space area", so any users of the "per NUMA domain" area are new and can be written to use it properly from day one.
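In other words, the split starts out as nothing more than an address range convention. A sketch, with invented numbers (a 32-bit kernel living in the top gigabyte):

Code:
#define KERNEL_GLOBAL_BASE     0xC0000000UL  /* identical in every NUMA domain */
#define KERNEL_PER_DOMAIN_BASE 0xE0000000UL  /* mapped per NUMA domain */

/* Until the first per-domain user exists, nothing else changes:
 * all current kernel data lives in the global area, so no change
 * propagation and no new locking is needed yet. */
static inline int addr_is_per_domain(unsigned long vaddr)
{
    return vaddr >= KERNEL_PER_DOMAIN_BASE;
}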
Brendan wrote:You think about adding support for "out of physical memory", but that means implementing swap space support in user-space (e.g. a "swap manager" process), and you decide that because all the messaging is going to change from synchronous to asynchronous it'd be a waste of time doing it now.
Nah, swap space is finite as well. You should just implement proper error handling. This is something you can always do.
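A sketch of what "proper error handling" means here, again with hypothetical names (pmm_alloc_page, map_page, struct address_space): the failure is reported instead of hidden, and adding swap later only changes how often this path is taken, not whether it must exist.

Code:
#include <errno.h>

struct address_space;

extern void *pmm_alloc_page(void);  /* returns NULL when no page is free */
extern int map_page(struct address_space *as, unsigned long vaddr, void *page);

int map_anonymous_page(struct address_space *as, unsigned long vaddr)
{
    void *page = pmm_alloc_page();

    /* The caller decides what -ENOMEM means: fail the syscall,
     * kill the task, retry after reclaiming memory, ... */
    if (page == NULL)
        return -ENOMEM;

    return map_page(as, vaddr, page);
}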
Brendan wrote:You think about changing the IPC, but realise that you couldn't test the new locking because everything is still protected by the big kernel lock (that you can't remove yet). You also realise that the existing (synchronous messaging) code causes a task switch to the message receiver whenever a message is sent, and if this task switch doesn't happen it breaks the scheduling (e.g. you have threads that send messages in a loop that would become instant CPU hogs that never block).
Add asynchronous IPC by the side of the existing synchronous IPC. New code can use it if it wants. Now you're ready to implement your swap manager as well.
Convert users of the synchronous API gradually, or change the implementation of the synchronous API to use asynchronous IPC internally.
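The second option might look like this sketch (port_t, ipc_send_async and ipc_wait_reply are made-up names): the old synchronous call becomes a wrapper over the new asynchronous primitives, so both APIs share one implementation.

Code:
typedef int port_t;
struct message;

extern int ipc_send_async(port_t dest, const struct message *msg);
extern int ipc_wait_reply(port_t dest, struct message *reply);

int ipc_send_sync(port_t dest, const struct message *request,
                  struct message *reply)
{
    int err = ipc_send_async(dest, request);

    if (err != 0)
        return err;

    /* Blocking here preserves the old observable behaviour,
     * including giving up the CPU, so senders in a loop still
     * can't become instant CPU hogs. */
    return ipc_wait_reply(dest, reply);
}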
Brendan wrote:At this point you give up and decide that "incremental change" was a huge mistake. You create an empty "my OS version n+1" directory, then copy the old boot code into it, and start writing your new physical memory manager without having any reason to care about all the annoying dependencies between all the other pieces. Now you're able to make real, meaningful progress!
Yes, that's a valid approach for your "Hello world!" kernel, because you don't lose much. However, it's not a valid approach for a serious project that has grown beyond a certain size.
Brendan wrote:Would you waste your time writing a sound card driver and then radically change the device driver interface? Would you waste your time writing a GUI that relies heavily on a "framebuffer" and then radically change your video driver interface to a "video driver draws everything" model? Would you waste your time writing a file system that caches file data, and then redesign your VFS so it does all caching? The sound card driver, GUI and file system code would all be relevant eventually.
I wrote a network driver and then radically changed (or in fact, introduced) the device driver interfaces. Some actual driver code had to be adapted to the new interfaces, which was mostly mechanical. The rest of the code moved into the implementation of the driver interface. Not a huge deal.
We did have caching in the file system drivers before it was implemented in the VFS. We learned from the mistakes in the FS driver and hopefully did better in the VFS.
I don't think any of this work was wasted.