OSDev.org

Posted: **Thu Dec 08, 2011 5:15 pm**

This might be in the wrong section and if so I apologize.

I've been working on a hobby OS for a while now and through several iterations have managed to get most basic functionality working. The sole exception to this is working across multiple cores. I realize that there could be serious issues with interrupting the kernel at various stages but am unaware of details of exactly what to look for in the code. I checked the wiki and there isn't an article talking about either concept (reentrancy or interruptable).

So, can anybody give me a heads up of where to find information about these concepts? I have done some multi-threaded coding on various platforms but this is at a new level for me

Thanks!
Mike

Posted: **Thu Dec 08, 2011 5:37 pm**

Basically it's the same as with mulitthreaded userspace programs: Whenever you access data that can be access from multiple CPUs (i.e. generally everything that is not on the stack), you need locking.

Posted: **Fri Dec 09, 2011 6:34 am**

berkus wrote:Try this.

Bringing SMP to Your UP Operating System.

There are a few errors in that description. For instance, it is claimed that the IO-APIC is "backward compatible" with the PIC, which is not true. Most motherboards still have both one or more IO-APICs and a legacy PIC.

The article also only describes how to start the additional AP-cores, which is the easy part of providing SMP support in an OS.

Posted: **Fri Dec 09, 2011 6:37 am**

Kevin wrote:Basically it's the same as with mulitthreaded userspace programs: Whenever you access data that can be access from multiple CPUs (i.e. generally everything that is not on the stack), you need locking.

Basically, yes, but in addition to that, you need to replace everything that uses cli/sti with spinlocks. This is mostly in single-core OSes to synchronize between IRQs and tasks. Cli/sti no longer is sufficient with SMP. Then you need a SMP-aware scheduler, and synchronization primitives that can handle multiple cores.

Posted: **Fri Dec 09, 2011 6:48 am**

Yes, this follows logically. cli/sti is not a lock, so it's not sufficient to protect shared data.

Posted: **Fri Dec 09, 2011 7:13 am**

Kevin wrote:Yes, this follows logically. cli/sti is not a lock, so it's not sufficient to protect shared data.

Right, but it is typically used as a lock in single-core OSes because of it's simplicity and efficiency. When porting from single core to multi core it might be a good idea to eliminate all occurences of cli/cti outside of scheduler. Provided the single core OS was written to support multithreading in kernel, and doesn't use some kind of exclusive kernel-lock. I'm still not quite finished with this task, but it approaches completion.

Posted: **Fri Dec 09, 2011 8:14 am**

There is hardware that can lock down a bus which in the past have been used by some systems to do Multi processing.
Look at the TAS instruction on the 68K, and Som dual port RAM supports it too as a function.(test and set)

Just be sure you make your locking mechanism atomic.

Posted: **Fri Dec 09, 2011 10:28 am**

Hi,

berkus wrote:Try this.

Bringing SMP to Your UP Operating System.

That's a "tip of the iceberg" summary...

For the paragraph entitled "Other Considerations", I'd recommend the following approach:

First, use "grep" to find each occurrence of the "CLI" instruction; and write down which sub-system it was in (e.g. physical memory manager, scheduler, Realtek ethernet driver, whatever). Once you've got your list of effected sub-systems, completely redesign each sub-system to suit multi-CPU and make sure it can handle at least 32 CPUs pounding the daylights out of it without significant problems with lock contention, etc. Once you've finished that, completely redesign any other sub-system that wasn't on the original list in the same way.

If this sounds like a lot of work, then there *is* a faster way - delete your old kernel (or do a "feature freeze") and start from scratch.

There's also a much slower and much more painful way; and that's to do many smaller redesigns for each small piece (one at a time) while trying very hard not to break compatibility with any of the other pieces. This method is only sane for projects with a large number of volunteers (to reduce the chance of volunteers losing interest while it's happening).

Cheers,

Brendan

Posted: **Fri Dec 09, 2011 1:38 pm**

Brendan wrote:First, use "grep" to find each occurrence of the "CLI" instruction; and write down which sub-system it was in (e.g. physical memory manager, scheduler, Realtek ethernet driver, whatever). Once you've got your list of effected sub-systems, completely redesign each sub-system to suit multi-CPU and make sure it can handle at least 32 CPUs pounding the daylights out of it without significant problems with lock contention, etc. Once you've finished that, completely redesign any other sub-system that wasn't on the original list in the same way.

If this sounds like a lot of work, then there *is* a faster way - delete your old kernel (or do a "feature freeze") and start from scratch.

To use grep to find all occurrences of "CLI" is good advice. If most of these relate to synchronization issues not related to ISRs, then you have big trouble and need to take Brendan's advice - delete your kernel

Here are some other considerations:

1. Does the kernel support kernel threads, or at least multiple threads executing kernel code - if not, delete your kernel (such designs are likely not done with multitasking in mind, and thus require huge efforts to update)

2. Is it possible to keep the kernel synchronization interface intact with a SMP solution - if not, delete your kernel (changing the synchronization interface for SMP will likely break a lot of code and thus cause huge trouble).

Brendan wrote:There's also a much slower and much more painful way; and that's to do many smaller redesigns for each small piece (one at a time) while trying very hard not to break compatibility with any of the other pieces. This method is only sane for projects with a large number of volunteers (to reduce the chance of volunteers losing interest while it's happening).

Nah, for a single-developper project with 195,000 source lines, deleting the kernel is not an attractive option. I only deleted the scheduler (more or less). Currently, I have removed all "cli" from device-drivers, and only have 30-40 questionable occurences left in kernel.

Posted: **Fri Dec 09, 2011 1:55 pm**

My experience is that rewriting from scratch is in most cases the wrong approach. It means that you go a big step backwards and will have to redo all that boring stuff that you already did before. There are a few people who don't seem to have a problem with this, for the rest of us it kills the motivation.

Posted: **Fri Dec 09, 2011 2:05 pm**

Kevin wrote:My experience is that rewriting from scratch is in most cases the wrong approach. It means that you go a big step backwards and will have to redo all that boring stuff that you already did before. There are a few people who don't seem to have a problem with this, for the rest of us it kills the motivation.

Exactly. I have zero motivation for starting a new OS project from scratch. If the x86 platform becomes obsolete, I will simply do something else than writing OSes rather than start from scratch with another processor / language.

Posted: **Fri Dec 09, 2011 10:21 pm**

Actually, I'm already at the "redesign and rewrite" portion. Basically I was looking for resources so I can learn enough about it to design it properly. I've done several toy OS in the past but this is my first 64-bit and first attempt at an MP design so I want to be sure to do it right the first time. Heck, I've not even written the memory manager yet as I'm trying to see what issues lie ahead

Thanks!
Mike

Posted: **Fri Dec 09, 2011 11:38 pm**

There are very few resources and in fact I couldn't really find anything useful. Most of my SMP stuff came from people on this list and my own experiments.

My advice is to start over, building your kernel from scratch and retesting as you go. Of course there should be lots of code in your kernel that you can re-use. Don't delete your current kernel but be prepared to rewrite huge chunks of it.

Actually getting multiple cores up and running is not too difficult and once you do that you should spend some time making sure that your locking is working and working efficiently. I was very surprised at how bad my initial spinlock implementation performed. You'll need to really think though your locking strategy and decide early on whether you will allow lock-holders to sleep or re-schedule. Decisions like that really affect the design that you will need and adding bits on at the end might be pretty difficult.

There will be a number of things needed that you probably haven't considered in a single-core system. One example is reference counting. There is very little information around about how this should be done in an SMP system and it may be that you find out by trial and error the best way to implement it. In the end I've basically done it myself to fit my kernel.

Many hobby OS'es don't allow system calls to be interrupted by the schedule and thus allowing 2 threads in the kernel at once will be a problem. If your OS is like this the design change might be significant.

BTW, what Rdos said about bringing up multiple cores is true. It's about 1% of the problem. I suspect that work on BLT stalled right after the multiple cores were brought up because the rest of SMP is just very difficult.

BTW2, the reason Rdos could simply search and replace the cli instructions with spinlocks in his OS was because his OS was (probably) already re-entrant and supported multiple interrupts etc. already. Thus it needed fewer changes to be SMP supporting. Is that correct Rdos ?

Posted: **Sat Dec 10, 2011 3:37 am**

Hi,

gerryg400 wrote:BTW, what Rdos said about bringing up multiple cores is true. It's about 1% of the problem. I suspect that work on BLT stalled right after the multiple cores were brought up because the rest of SMP is just very difficult.

As a very rough guide, I'd say that topology detection (how many NUMA domains, how many cores in each, and how many logical CPUs) plus AP CPU startup is about 5% of the work. ACPI or MP Spec table parsing and rewriting the IRQ (IO APIC) handling is another 5%. Making it work on multi-CPUs (e.g. replacing "CLI" with spinlocks, replacing a "current_process_ID" global variable with per-CPU data, etc) would be another 10%.

The remaining 80% is replacing code that works on multi-CPU but has "exponential suckage" (poor scalability). This means researching lockless algorithms for IPC, splitting "one lock for physical memory management" into "thousands of locks", finding ways to minimise the work done in critical sections, finding ways to minimise IPIs (for TLB shootdown, etc), figuring out how to avoid a "global tick" while keeping each CPU's time in sync, finding things that idle CPUs can do to improve performance in future (when the system is under load), CPU load balancing (and a whole pile of IO prioritisation if you don't have it already), cache management (reducing false sharing), etc.

Of course that's only the first 100%. The second 100% is in user-space (rewriting GUIs, applications, utilities, etc to make use of multiple threads/CPUs efficiently).

Kevin wrote:My experience is that rewriting from scratch is in most cases the wrong approach. It means that you go a big step backwards and will have to redo all that boring stuff that you already did before. There are a few people who don't seem to have a problem with this, for the rest of us it kills the motivation.

That depends. If you don't have much experience and can only see the initial 5% to 10%, then it's easy to think that it won't be too hard. By the time you get about 20% done you realise there's another 20% after that, but you've already done the first 20% so another 20% doesn't look too hard either. By the time you actually know what is involved for the entire 100%, it's too late and most of it has already been rewritten twice.

If you do know exactly what the entire 100% involves from the start, it's a completely different proposition. When you're sitting on the ground trying to see the top of "turd mountain" high above in the clouds and know that one false move during your climb will start an avalanche; it's much easier to just wipe the slate clean and start building a new foundation of roses.

Cheers,

Brendan

Posted: **Sat Dec 10, 2011 3:50 am**

Very poetic =) I like it

I just wan't to add that the initial 10% probably doesn't need a rewrite so you could start with that and then dive into the pile of turd later when you have thought things through.

OSDev.org

Reentrant and interruptable kernel theory question

Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question

Re: Reentrant and interruptable kernel theory question