Brendan wrote:
Your argument might make sense for something like Gnome or Microsoft Office or Oracle Database, where the code actually has been used and tested by many people on many computers. However...
It does not make sense for almost everyone here, where the supposedly "well tested mature code" has never actually been used on any production system; and has probably spent a total of less than 4 hours being "partially tested" (due to never having many real applications, drivers, etc) on less than 5 computers.
Not so. Maturity doesn't arrive suddenly; it comes gradually. That means the argument is relevant for typical hobby OSes as well. If you have spent a lot of time developing drivers, and possibly test applications, you can test far more thoroughly than if you have just started writing a new boot-loader and kernel that can basically run nothing, neither drivers nor applications.
Brendan wrote:
Due to this "there is no mature code to begin with" problem; what you're really suggesting is "don't replace old code (that was written when you had less experience and is therefore probably full of hidden bugs) with new code (that you will write when you've got more experience and will therefore probably have less hidden bugs)". Can you see how, in the context of hobby OSs, this is completely idiotic?
Not the same thing. I've replaced basically all of the code from the 90s without doing a complete rewrite or starting from scratch. That means the new code had much better test-cases than if I had rewritten everything from scratch, where basically nothing would be testable. Much of the old code had no test-cases at all, nor was it regularly tested in the kernel debugger, as that was before I wrote that tool. If I did a complete rewrite, basic functionality (paging, physical memory handling, scheduling and SMP) would have no test-cases, and would therefore need years to become stable. Without adequate testing, even almost trivial bugs in the SMP implementation or in the locking could lurk for a very long time before being detected.
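To make "test-cases" concrete, here is a minimal sketch of the kind of in-kernel regression test I mean. The entry points alloc_physical_page() and free_physical_page() are hypothetical stand-ins, not the actual RDOS interface:

Code:
#include <stdint.h>

#define TEST_PAGES 64

/* Hypothetical allocator entry points; alloc returns 0 on failure. */
extern uintptr_t alloc_physical_page(void);
extern void free_physical_page(uintptr_t frame);

int test_physical_allocator(void)
{
    uintptr_t frames[TEST_PAGES];

    /* Allocate a batch and check for the obvious failure modes:
       out-of-memory, misaligned frames, or the same frame handed
       out twice. */
    for (int i = 0; i < TEST_PAGES; i++) {
        frames[i] = alloc_physical_page();
        if (frames[i] == 0 || (frames[i] & 0xFFF) != 0)
            return -1;
        for (int j = 0; j < i; j++)
            if (frames[j] == frames[i])
                return -1;          /* double allocation */
    }

    /* Free in reverse order to exercise a different path in the
       allocator's free-list handling. */
    for (int i = TEST_PAGES - 1; i >= 0; i--)
        free_physical_page(frames[i]);

    return 0;
}

A test like this only becomes possible once the rest of the system is far enough along to host it, which is the whole point.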
Let's take the implementation of PAE (and long-mode) paging in RDOS as an example. This was done in the existing code-base, and thus could be put through elaborate tests by drivers and userland. If you wrote long-mode paging from scratch, you would have no test-cases whatsoever, since paging is so basic that nothing else runs until it works. The same goes for the recent change to the physical memory allocation, which could be stressed by existing drivers and applications. Even the long-mode pagefault handler could be partially tested by triggering faults from 32-bit drivers.
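As a sketch of what "stressed by existing drivers" can look like in practice (map_page() and unmap_page() are again hypothetical wrappers around the page-table code, not RDOS's real interface):

Code:
#include <stdint.h>

extern uintptr_t alloc_physical_page(void);
extern void free_physical_page(uintptr_t frame);
extern int map_page(void *vaddr, uintptr_t frame, int writable);
extern void unmap_page(void *vaddr);

/* Repeatedly map, touch and unmap a scratch page so that a broken
   PAE/long-mode page-table entry faults immediately instead of
   lurking until some rare code path hits it. */
int stress_paging(void *scratch, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        uintptr_t frame = alloc_physical_page();
        if (frame == 0 || map_page(scratch, frame, 1) != 0)
            return -1;

        volatile uint32_t *p = (volatile uint32_t *)scratch;
        *p = 0xDEADBEEF;            /* write through the new mapping */
        if (*p != 0xDEADBEEF)
            return -1;

        unmap_page(scratch);
        free_physical_page(frame);
    }
    return 0;
}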
Edit: And despite this initial testing and stressing, I still missed a trivial bug in the physical memory handler, and I missed that the SMP scheduler could mess up on single-core CPUs and let a misbehaved task run forever, locking out everything at the same priority level.
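For those who haven't hit this class of bug, a hypothetical sketch (not the RDOS source) of how a scheduler can starve same-priority tasks when the time-slice handler forgets to rotate the ready queue:

Code:
#include <stddef.h>

struct task {
    struct task *next;              /* singly-linked ready queue */
};

static struct task *ready_head, *ready_tail;

/* The scheduler always runs the head of the queue. */
struct task *pick_next(void)
{
    return ready_head;
}

/* Called when the running task's time-slice expires. Without the
   rotation below, pick_next() keeps returning the same task, and a
   task that never blocks runs forever, locking out everything at
   the same priority level. */
void timer_expired(void)
{
    struct task *t = ready_head;
    if (t == NULL || t->next == NULL)
        return;                     /* nothing to rotate */

    ready_head = t->next;           /* move preempted task to tail */
    t->next = NULL;
    ready_tail->next = t;
    ready_tail = t;
}

On an SMP machine the bug can stay hidden, because another core eventually picks up the waiting tasks; on a single core nothing does.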
I'd like to state that OS code written without proper debugging and stress-testing tools is "alpha" quality and highly unreliable, and needs to be replaced (or properly debugged and stressed, which is tedious and very time-consuming) before it can be considered stable. If you restart your cycle from scratch every time, all of your code will stay in this alpha state forever, regardless of how sophisticated your system eventually becomes. I bet that whatever time you save in the development cycle will be multiplied by ten and used up hunting bugs once the code matures. However, if you never aim for a stable release, the start-over approach might seem efficient, as you can spend most of your time on development and very little on bug-fixing.