OSDev.org

Posted: **Sat Dec 10, 2011 3:55 am**

gerryg400 wrote:BTW2, the reason Rdos could simply search and replace the cli instructions with spinlocks in his OS was because his OS was (probably) already re-entrant and supported multiple interrupts etc. already. Thus it needed fewer changes to be SMP supporting. Is that correct Rdos ?

Yes, RDOS was designed for multitasking in the kernel from the beginning, and already had server-threads supporting various hardware devices. Another design philosophy that helped was to keep ISRs to a minimum and delegate as much of the work as possible to server-threads. That's why SMP mostly affected the synchronization between ISRs and the server-threads. There was also a suitable synchronization interface with critical sections and signals that was used throughout the kernel and its device-drivers, and that could be ported to SMP with no change in the visible interface. The hardest issues were actually self-modifying code (syscalls) and providing page-fault handlers for applications that could handle SMP.
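As a rough illustration of the "minimal ISR, work done in a server-thread" split described above, here is a minimal sketch in C11. It is not RDOS code: the names `event_ring`, `ring_push` and `ring_pop` are made up for this example, and it assumes exactly one producer (the ISR) and one consumer (the server-thread) per ring.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0, "RING_SIZE must be a power of two");

/* Single-producer / single-consumer ring (hypothetical, for illustration). */
struct event_ring {
    uint8_t     data[RING_SIZE];
    atomic_uint head;   /* advanced only by the producer (the ISR)           */
    atomic_uint tail;   /* advanced only by the consumer (the server-thread) */
};

/* ISR side: store one byte and return at once. Never blocks; drops when full. */
static bool ring_push(struct event_ring *r, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                       /* ring is full */
    r->data[head % RING_SIZE] = byte;
    /* Release: the data write becomes visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Server-thread side: fetch one byte if any is pending. */
static bool ring_pop(struct event_ring *r, uint8_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                       /* ring is empty */
    *out = r->data[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

With a layout like this, the ISR and the server-thread can run on different cores without a lock, as long as there really is only one of each per ring. Waking the server-thread (the "signal" part) is left out of the sketch.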

Posted: **Sat Dec 10, 2011 4:05 am**

Brendan wrote:The remaining 80% is replacing code that works on multi-CPU but has "exponential suckage" (poor scalability). This means researching lockless algorithms for IPC, splitting "one lock for physical memory management" into "thousands of locks", finding ways to minimise the work done in critical sections, finding ways to minimise IPIs (for TLB shootdown, etc), figuring out how to avoid a "global tick" while keeping each CPU's time in sync, finding things that idle CPUs can do to improve performance in future (when the system is under load), CPU load balancing (and a whole pile of IO prioritisation if you don't have it already), cache management (reducing false sharing), etc.

Yes, but for a gradual move, these issues are not critical, but can be implemented as needed. What is critical (and pretty hard) is to switch the OS to multicore and actually make it work without crashing too much. Until you have passed that threshold, your OS would be unstable on SMP. Even if you delete your kernel, and start from scratch, you will have a state where your OS is either unusable or unstable. The current state of RDOS is that it is stable on single core, but slightly unstable on multicore. That is no problem, since the commercial application running on RDOS runs on a single-core PC. I decided to provide different locking schemes for single core and multicore, so at boot time the scheduler selects the correct one. After all, the more complex locking needed for multicore should not slow down single-core machines. On single core, the locking is essentially cli/sti.
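The boot-time choice between a single-core and a multicore locking scheme could look roughly like the sketch below. This is not the RDOS implementation: every name here (`klock`, `lock_ops`, `locking_init`, and so on) is hypothetical, and `irq_disable`/`irq_enable` only flip a variable so the example runs in user space, where a real kernel would execute cli/sti.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the CPU interrupt flag (assumption: a kernel would use cli/sti). */
static bool irq_enabled = true;
static void irq_disable(void) { irq_enabled = false; }   /* models cli */
static void irq_enable(void)  { irq_enabled = true;  }   /* models sti */

struct klock { atomic_flag held; };

struct lock_ops {
    void (*acquire)(struct klock *);
    void (*release)(struct klock *);
};

/* Single core: masking interrupts is enough, the flag is never touched. */
static void up_acquire(struct klock *l) { (void)l; irq_disable(); }
static void up_release(struct klock *l) { (void)l; irq_enable();  }

/* Multicore: mask local interrupts, then spin until the flag is ours. */
static void smp_acquire(struct klock *l)
{
    irq_disable();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait; a real kernel would add a pause hint here */
}

static void smp_release(struct klock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    irq_enable();   /* simplification: real code would restore the saved state */
}

static const struct lock_ops up_ops  = { up_acquire,  up_release  };
static const struct lock_ops smp_ops = { smp_acquire, smp_release };
static const struct lock_ops *active_ops = &up_ops;

/* Called once at boot, after the number of usable cores is known. */
static void locking_init(unsigned cpu_count)
{
    assert(cpu_count >= 1);
    active_ops = (cpu_count > 1) ? &smp_ops : &up_ops;
}

static void klock_acquire(struct klock *l) { active_ops->acquire(l); }
static void klock_release(struct klock *l) { active_ops->release(l); }
```

Callers only ever see `klock_acquire`/`klock_release`, so the visible interface stays the same on both kinds of machine, and the single-core path pays only for an indirect call on top of the interrupt masking.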

Posted: **Sat Dec 10, 2011 4:23 am**

Hi,

rdos wrote:Yes, but for a gradual move, these issues are not critical, but can be implemented as needed. What is critical (and pretty hard) is to switch the OS to multicore and actually make it work without crashing too much. Until you have passed that threshold, your OS would be unstable on SMP. Even if you delete your kernel, and start from scratch, you will have a state where your OS is either unusable or unstable. The current state of RDOS is that it is stable on single core, but slightly unstable on multicore.

I've been trying to find something in an article I saw somewhere... and I've found it.

The paper is called "How to NOT write kernel driver" (and is written by one of Red Hat's kernel hackers, Arjan van de Ven). The specific piece I was thinking of is at the start of Chapter 4 (SMP):

Despite popular belief, SMP safety is not something you “weld” into your code as a hindsight. SMP safety is something you need to take into account right from the start. Making a (largish) piece of code SMP safe in hindsight leads to all kinds of lock ordering nightmares, makes you wish there were recursive locks and generally results in a suboptimal solution. While I could give numerous drivers as examples of how to not do it, the main kernel with the Big Kernel Lock (BKL) is the best example of this. The BKL was put in to make the kernel work on SMP, in hindsight, and well, it still results in nightmares with dozens and dozens of races. It is taking 5 years so far to fix all the core subsystems to have proper locking of their own.

Cheers,

Brendan

Posted: **Sat Dec 10, 2011 6:32 am**

If you are starting to write new code, I agree that you should consider things like that from the beginning. However, the situation looks different when you already have a big pile of code and your options are using a BKL style approach or rewriting everything from scratch. I think there's a reason Linux did the BKL thing. Going through that kind of pain certainly sucks, but the cost and pain of rewriting everything from scratch would have been much higher. If you have existing code, in most cases you want an incremental approach.

Posted: **Sat Dec 10, 2011 7:01 am**

Hi,

Kevin wrote:If you are starting to write new code, I agree that you should consider things like that from the beginning. However, the situation looks different when you already have a big pile of code and your options are using a BKL style approach or rewriting everything from scratch. I think there's a reason Linux did the BKL thing. Going through that kind of pain certainly sucks, but the cost and pain of rewriting everything from scratch would have been much higher. If you have existing code, in most cases you want an incremental approach.

I personally think that Linux did the whole BKL thing because they didn't have the benefit of hindsight. However, even with hindsight it may not have been the best approach for Linux, as its developers are cooperating but independent volunteers and not people that can be controlled by a benevolent dictator. Basically the chance of forks and the chance of people (volunteers/developers and end-users) moving to different projects instead (e.g. FreeBSD) may have justified the increased pain of retrofitting SMP to existing code.

None of this really applies to hobbyist developers like ourselves though - you don't need to worry about losing volunteers or market share when you have no volunteers and no market share to lose.

In addition, for hobbyist developers like ourselves, it's rarely a case of "restart to add SMP support alone". More often it'd be "restart to add SMP support, and improve lots of other things at the same time" or even just "restart to improve lots of other things at the same time (and add SMP too)".

Cheers,

Brendan

Posted: **Sat Dec 10, 2011 7:23 am**

I can imagine that any kernel maintained by "ordinary programmers / users" will suffer from SMP-related problems, because so many of them are used to writing sequential application code with no locks. If the people who write device-drivers are unaware of how to do proper locking, and of how different things interact, they will write buggy device-drivers. In that respect, having a kernel that is designed for SMP won't help, as every part of the system needs to be designed with SMP in mind. The kernel itself can only hand out a working scheduler and synchronization primitives; it cannot enforce the correct usage of these in specific drivers.

I'm also amazed that Linux still has a BKL, and that it cannot handle floating-point usage in kernel.

Posted: **Sat Dec 10, 2011 7:56 am**

Hi,

rdos wrote:I'm also amazed that Linux still has a BKL, and that it cannot handle floating-point usage in kernel.

I'm not too sure when that paper was written. It says "It is taking 5 years so far to fix all the core subsystems to have proper locking of their own.", and if I remember right SMP (and the big kernel lock) was added to Linux in around 1995. That means this paper would've been written in around the year 2000, but the legalese at the end shows an IBM copyright statement from 2001. Therefore I'm going to assume this paper was written in late 2001.

The big kernel lock was finally removed in Linux version 2.6.39, in May this year. Basically, Linux was only about 5 years old when they added the big kernel lock (1991 to 1995), and it took about 15 years to get rid of the big kernel lock (1995 to 2011). Of course they did work on a few other things during the last 15 years too.

Cheers,

Brendan

Posted: **Sat Dec 10, 2011 11:12 am**

Brendan wrote:None of this really applies to hobbyist developers like ourselves though - you don't need to worry about losing volunteers or market share when you have no volunteers and no market share to lose.

On the contrary: Something like Linux would be hurt if all volunteers left, but it would survive because there are companies interested in it.

Our hobby OSes only exist because there is at least one volunteer. There are a couple of projects that are done by a team rather than a single person, but these teams are usually relatively small. And this means that a small hobby project cannot afford to lose any volunteer.

Brendan wrote:In addition, for hobbyist developers like ourselves, it's rarely a case of "restart to add SMP support alone". More often it'd be "restart to add SMP support, and improve lots of other things at the same time" or even just "restart to improve lots of other things at the same time (and add SMP too)".

Yeah, been there, done that (not about SMP, but the lots of other things part). I wouldn't do it again.

While you're rewriting stuff from scratch, development comes to a halt. You're only working on getting things running that already worked in the old version. This, plus the fact that it's a really big rewrite with no end in sight, makes developers lose their motivation. You cannot present anything new, so you lose testers. The longer it takes until you're back to "normal" development, the lower motivation gets and the slower progress becomes, which makes it take even longer.

There hasn't been a tyndur release for almost two years now, guess why.

Posted: **Sat Dec 10, 2011 11:48 am**

Well, basically the situation for me is that I suddenly (and unexpectedly) have a fair amount of time on my hands so I'm dusting off toy OS code I wrote about 10 years ago to play with. I decided that it was truly terrible so I started again from scratch and decided that the right way to do it would be to design for MP from the ground up. I'm still learning about ACPI and the like (have a LOT of reading still to do) so I'm still not fully aware if an ISR can be running on two cores simultaneously and that sort of thing.

To be honest I'm still monkeying around with the memory manager until I find a scheme I'm happy with. Right now I'm using a bitmap to track memory allocations and walking the page tables to find the physical address for a page, but this doesn't include reference counting and the like, so I need to rethink that some more.
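One possible way to extend a bitmap allocator with reference counting is to keep a small counter per physical frame next to the bitmap. The sketch below is only an illustration under assumptions: the frame count, the names `frame_alloc`, `frame_ref` and `frame_unref`, and the 16-bit counter width are all invented for the example, and it is not safe on SMP as written (the bitmap and counters would need a lock or atomic updates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_COUNT 64u                 /* hypothetical, tiny on purpose */
#define NO_FRAME    ((size_t)-1)

static uint8_t  frame_bitmap[FRAME_COUNT / 8];  /* one bit per frame, 1 = in use */
static uint16_t frame_refs[FRAME_COUNT];        /* number of mappings per frame  */

/* Find the first free frame, mark it used, and give it one reference. */
static size_t frame_alloc(void)
{
    for (size_t f = 0; f < FRAME_COUNT; f++) {
        uint8_t mask = (uint8_t)(1u << (f % 8));
        if (!(frame_bitmap[f / 8] & mask)) {
            frame_bitmap[f / 8] |= mask;
            frame_refs[f] = 1;
            return f;
        }
    }
    return NO_FRAME;                    /* out of physical memory */
}

/* Another mapping now shares this frame (e.g. shared or copy-on-write pages). */
static void frame_ref(size_t f)
{
    assert(f < FRAME_COUNT && frame_refs[f] > 0);
    frame_refs[f]++;
}

/* Drop one reference; the bitmap bit is cleared only when the last one goes. */
static void frame_unref(size_t f)
{
    assert(f < FRAME_COUNT && frame_refs[f] > 0);
    if (--frame_refs[f] == 0)
        frame_bitmap[f / 8] &= (uint8_t)~(1u << (f % 8));
}
```

The bitmap stays the fast "is this frame free" index, while the counter array decides when a frame may actually be returned to it.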

Any and all advice is greatly appreciated, as this is very much a 100% new topic for me.

Thanks!
Mike

Reentrant and interruptable kernel theory question
