Why SMP?

rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: Why SMP?

Post by rdos »

I wrote much of my OS and scheduler before there were any (reasonable) SMP machines to buy, and it is a lot of hassle to convert a single-core OS to a multicore one. I eventually succeeded, but it was a lot of work, and I truly would have failed without extensive use of SVN. At one point I had hundreds of commits over several months that made the SMP core crash regularly, and I had no idea what I had done wrong. I eventually solved it by merging.

The only really significant difference between an SMP core and a single-core OS is that in an SMP core you must use spinlocks instead of cli/sti for synchronization. So, if you put spinlocks in your code from the start instead of sti/cli, then the move to SMP will be much easier.
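For readers who have not written one yet, here is a minimal sketch of the kind of spinlock meant here, using C11 atomics. The names (spinlock_t, spin_lock, counter_increment and so on) are hypothetical and are not taken from any particular kernel; treat it as an illustration under those assumptions rather than a finished implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration only. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline void spin_lock(spinlock_t *l)
{
    /* test_and_set returns the previous value: keep spinning while the
       flag was already set by another core. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        /* A real x86 kernel would typically execute PAUSE here. */
    }
}

static inline bool spin_trylock(spinlock_t *l)
{
    /* Succeeds only if the flag was clear before this call. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static inline void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Example critical section: a counter shared between cores. */
static spinlock_t counter_lock = SPINLOCK_INIT;
static unsigned long shared_counter;

unsigned long counter_increment(void)
{
    spin_lock(&counter_lock);
    unsigned long v = ++shared_counter;
    spin_unlock(&counter_lock);
    return v;
}
```

Note that a lock which is also taken from an interrupt handler usually still needs local interrupts disabled while it is held, so in practice the spinlock tends to wrap cli/sti rather than replace it in those places.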

And I think the main motivation for making your OS able to run on multiple cores is that it is a really cool feature to have code run truly in parallel. :-)

I would also like to one day create some kind of monitor / hard real-time extension that would run on a dedicated core. It could be used if I ever get hold of those fast A/D converters that sample at up to several hundred MHz, and so would need to be read regularly in real time.
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Why SMP?

Post by Schol-R-LEA »

rdos wrote: The only really significant difference between an SMP core and a single-core OS is that in an SMP core you must use spinlocks instead of cli/sti for synchronization. So, if you put spinlocks in your code from the start instead of sti/cli, then the move to SMP will be much easier.
Fair point. There are also some 'lockless' synchronization models which work for multicore/multiprocessor systems, but they all have various requirements or limitations, and none of them are as straightforward as spinlocks IIUC - they would require a much more radical departure when starting from a lock-based single-core system than spinlocks would.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Craze Frog
Member
Posts: 368
Joined: Sun Sep 23, 2007 4:52 am

Re: Why SMP?

Post by Craze Frog »

Adding more cores is like adding more carriages to a train. The train doesn't go faster, in fact it could even go a bit slower, but it can carry twice the amount of cargo in nearly the same time.

Of course, it is required that the cargo can be divided into parts that don't touch each other, because it goes into separate carriages.
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: Why SMP?

Post by Korona »

Lock-free data structures are not a general replacement for locks. You can implement certain abstract data types in a lock-free way but lock-free operations do not compose (i.e., the composition is not atomic anymore).
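A small sketch of what "do not compose" means in practice, assuming two hypothetical lock-free counters (pool_a and pool_b, names invented here). Each operation is atomic on its own, but a "move" built from two of them exposes an intermediate state to other cores.

```c
#include <stdatomic.h>

/* Two hypothetical lock-free counters, e.g. free pages in two pools. */
static atomic_long pool_a = 10;
static atomic_long pool_b = 0;

/* Step 1 of "move n from A to B": atomic on its own. */
void move_begin(long n)
{
    atomic_fetch_sub(&pool_a, n);
}

/* Step 2: also atomic on its own. */
void move_finish(long n)
{
    atomic_fetch_add(&pool_b, n);
}

/* What another core would see if it summed both pools right now. */
long observed_total(void)
{
    return atomic_load(&pool_a) + atomic_load(&pool_b);
}
```

Between move_begin and move_finish an observer sees n units missing from the total. Making the pair atomic again needs a lock or a purpose-built combined structure.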
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: Why SMP?

Post by rdos »

I would recommend the use of lock-free physical memory allocation. It's not easy, but since pagefaults that potentially will need physical memory allocation can happen almost everywhere (unless you use extensive protective methods), this will make that code a lot easier in multicore systems.
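One possible shape for such an allocator, as a rough sketch only: a bitmap with one bit per physical frame, claimed with compare-and-swap. The sizes and names (FRAME_WORDS, frame_alloc, frame_free) are assumptions made for the example, and this is not a description of how any existing kernel does it.

```c
#include <stdatomic.h>
#include <stdint.h>

#define FRAME_WORDS     4    /* hypothetical: 4 * 64 = 256 frames */
#define FRAMES_PER_WORD 64

/* One bit per physical frame; 1 = free, 0 = in use. */
static _Atomic uint64_t frame_bitmap[FRAME_WORDS];

void frames_init(void)
{
    for (int w = 0; w < FRAME_WORDS; w++)
        atomic_store(&frame_bitmap[w], UINT64_MAX);
}

/* Returns a frame number, or -1 if no frame is free. Never blocks. */
long frame_alloc(void)
{
    for (int w = 0; w < FRAME_WORDS; w++) {
        uint64_t old = atomic_load(&frame_bitmap[w]);
        while (old != 0) {
            uint64_t lowest = old & (~old + 1);   /* isolate lowest set bit */
            /* Claim the bit only if the word is unchanged; on failure
               'old' is refreshed and the loop retries with the new value. */
            if (atomic_compare_exchange_weak(&frame_bitmap[w], &old,
                                             old & ~lowest)) {
                int bit = 0;
                while (!((lowest >> bit) & 1))
                    bit++;
                return (long)w * FRAMES_PER_WORD + bit;
            }
        }
    }
    return -1;
}

void frame_free(long frame)
{
    uint64_t mask = UINT64_C(1) << (frame % FRAMES_PER_WORD);
    atomic_fetch_or(&frame_bitmap[frame / FRAMES_PER_WORD], mask);
}
```

Because no lock is ever held, a page fault handler can call frame_alloc regardless of what the interrupted code was doing, which is the property that matters here.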
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Why SMP?

Post by LtG »

rdos wrote:I would recommend the use of lock-free physical memory allocation. It's not easy, but since pagefaults that potentially will need physical memory allocation can happen almost everywhere (unless you use extensive protective methods), this will make that code a lot easier in multicore systems.
What do you mean "everywhere"? As in kernel? That's easy, don't let the kernel use demand paged memory.

As for lock-free physical memory allocation, that one I agree with. Though I wouldn't use lockless in the general sense, but rather divide physical memory for each core and let each core do their own PMEM allocation. If one runs out, rebalance, which should be rare and it's ok to take a tiny hit when that happens, instead of a tiny hit with every allocation.
rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: Why SMP?

Post by rdos »

LtG wrote:
rdos wrote:I would recommend the use of lock-free physical memory allocation. It's not easy, but since pagefaults that potentially will need physical memory allocation can happen almost everywhere (unless you use extensive protective methods), this will make that code a lot easier in multicore systems.
What do you mean "everywhere"? As in kernel? That's easy, don't let the kernel use demand paged memory.
That means you need to validate all user data that is passed to the kernel for presence, which results in poor performance. I rely on the page fault handler in those situations instead.
LtG wrote:As for lock-free physical memory allocation, that one I agree with. Though I wouldn't use lockless in the general sense, but rather divide physical memory for each core and let each core do their own PMEM allocation. If one runs out, rebalance, which should be rare and it's ok to take a tiny hit when that happens, instead of a tiny hit with every allocation.
My HRT data acquisition tool might allocate 100GB on a 128GB machine. :-)
eekee
Member
Posts: 872
Joined: Mon May 22, 2017 5:56 am
Location: Kerbin
Discord: eekee
Contact:

Re: Why SMP?

Post by eekee »

Interesting thread. I shall have to brush up on spinlocks.
LtG wrote:...divide physical memory for each core and let each core do their own PMEM allocation. If one runs out, rebalance, which should be rare and it's ok to take a tiny hit when that happens, instead of a tiny hit with every allocation.
What happens when threads/processes want to share memory? Do other cores get to look into the memory owned by the core where the shared region was allocated?
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Why SMP?

Post by LtG »

rdos wrote: That means you need to validate all user data that is passed to the kernel for presence, which results in poor performance. I rely on the page fault handler in those situations instead.
Do you mean a syscall where the argument is a struct pointing to ten different places in memory, so that you would have to validate presence to ensure a page fault is not triggered in kernel land?

If so, then that's not an issue for me, as I don't do complex kernel operations. For those who need such structs to be passed to the kernel, that would be a problem.
rdos wrote:
LtG wrote:As for lock-free physical memory allocation, that one I agree with. Though I wouldn't use lockless in the general sense, but rather divide physical memory for each core and let each core do their own PMEM allocation. If one runs out, rebalance, which should be rare and it's ok to take a tiny hit when that happens, instead of a tiny hit with every allocation.
My HRT data acquisition tool might allocate 100GB on a 128GB machine. :-)
Shouldn't be a problem =)
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Why SMP?

Post by LtG »

eekee wrote:
LtG wrote:...divide physical memory for each core and let each core do their own PMEM allocation. If one runs out, rebalance, which should be rare and it's ok to take a tiny hit when that happens, instead of a tiny hit with every allocation.
What happens when threads/processes want to share memory? Do other cores get to look into the memory owned by the core where the shared region was allocated?
My proposal doesn't change any of that. I'm talking about ownership of the region of memory while it's owned by the kernel, so when it's free (or disk cache, which is pretty much the same thing).

So when PMEM is allocated, each core can simply consult its own free PMEM list; this is the happy path. Only when a core runs out of its own free PMEM list does it need to do synchronization, so it can "steal" free PMEM from other cores. It doesn't need to do full re-balancing either; it can just steal from one core.
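A rough sketch of that scheme, with invented names and sizes (NCORES, LIST_CAP, pmem_alloc). One simplification to be aware of: each per-core list here has its own small lock, which the owner also takes. It is uncontended in the happy path, but a fast path with no synchronization at all would need stealing to go through some request mechanism, which is not shown.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NCORES   4     /* hypothetical core count */
#define LIST_CAP 64    /* hypothetical per-core capacity */

struct core_pool {
    atomic_flag lock;                 /* taken by the owner and by thieves */
    size_t count;
    unsigned long frames[LIST_CAP];   /* stack of free frame numbers */
};

static struct core_pool pools[NCORES];

static void pool_lock(struct core_pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;
}

static void pool_unlock(struct core_pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Give core 'c' the frames [first, first + n), capped at LIST_CAP. */
void pool_init(int c, unsigned long first, size_t n)
{
    struct core_pool *p = &pools[c];
    atomic_flag_clear(&p->lock);
    p->count = 0;
    for (size_t i = 0; i < n && i < LIST_CAP; i++)
        p->frames[p->count++] = first + i;
}

/* Pop one frame from pool p; returns 1 on success, 0 if it is empty. */
static int pool_pop(struct core_pool *p, unsigned long *out)
{
    int ok = 0;
    pool_lock(p);
    if (p->count > 0) {
        *out = p->frames[--p->count];
        ok = 1;
    }
    pool_unlock(p);
    return ok;
}

/* Allocate on behalf of core 'self'; returns 1 and stores the frame. */
int pmem_alloc(int self, unsigned long *out)
{
    if (pool_pop(&pools[self], out))          /* happy path: own list */
        return 1;
    for (int i = 1; i < NCORES; i++)          /* slow path: steal one frame */
        if (pool_pop(&pools[(self + i) % NCORES], out))
            return 1;
    return 0;                                 /* every list is empty */
}

/* Return a frame to core 'self'; returns 0 if its list is full. */
int pmem_free(int self, unsigned long frame)
{
    struct core_pool *p = &pools[self];
    int ok = 0;
    pool_lock(p);
    if (p->count < LIST_CAP) {
        p->frames[p->count++] = frame;
        ok = 1;
    }
    pool_unlock(p);
    return ok;
}
```

Stealing a single frame at a time keeps the sketch short; taking a batch from the victim would amortize the slow path better.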

I go a bit further: I treat each core separately, each having its own "kernel", instead of one kernel being called on all cores. The per-core free PMEM list is essentially just a consequence of that.
rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: Why SMP?

Post by rdos »

LtG wrote: Do you mean a syscall where the argument is a struct pointing to ten different places in memory, so that you would have to validate presence to ensure a page fault is not triggered in kernel land?

If so, then that's not an issue for me, as I don't do complex kernel operations. For those who need such structs to be passed to the kernel, that would be a problem.
No, I've forbidden both structs and enums in syscalls. :-)

I also forbid the implementation of ioctl. :mrgreen:

Still, if you pass a large string or data buffer to something like a filesystem driver, it will need to copy it to or from some kernel buffer, and that operation could trigger demand paging. The same applies if you pass audio data to the audio driver. Of course, if you want to pass it directly to a buffer ring of some device, then you will need to get the physical pages anyway, and so validation will be relatively cheap.
loonie
Posts: 7
Joined: Sat Jul 06, 2019 3:24 pm

Re: Why SMP?

Post by loonie »

rdos wrote: The only really significant difference between an SMP core and a single-core OS is that in an SMP core you must use spinlocks instead of cli/sti for synchronization. So, if you put spinlocks in your code from the start instead of sti/cli, then the move to SMP will be much easier.
Exactly. Plus, one of the bugs that is easier to catch on a single CPU with spinlocks is when ring 3 makes a syscall that accesses a buffer shared with a device driver, and all of a sudden an interrupt arrives whose handler "exits" and does "sti" to do the heavy task of populating the shared buffer with events, and then it waits until the syscall releases the lock.
On multiple CPUs this could simply manifest itself as a slowdown, which is harder and harder to catch on modern super-fast CPUs. (A single x64 core is something like 5-10 times faster than in 2001, I think.)
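The single-CPU case can be modelled in user space. The sketch below is only a simulation under stated assumptions: the "interrupt" is an ordinary function call made while the lock is held, the spin is bounded so the program terminates, and all names (buf_lock, irq_handler, syscall_touch_buffer) are made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag buf_lock = ATOMIC_FLAG_INIT;   /* guards the shared buffer */

/* Spin at most 'limit' times; a real spinlock would have no limit. */
static bool lock_bounded(atomic_flag *l, unsigned long limit)
{
    while (limit--) {
        if (!atomic_flag_test_and_set_explicit(l, memory_order_acquire))
            return true;
    }
    return false;   /* a real spinlock would still be spinning */
}

/* Simulated interrupt handler: needs the lock to fill the shared buffer. */
bool irq_handler(void)
{
    if (!lock_bounded(&buf_lock, 1000000UL))
        return false;                 /* models the hang */
    /* ... populate the shared buffer with events ... */
    atomic_flag_clear_explicit(&buf_lock, memory_order_release);
    return true;
}

/* Simulated syscall on one CPU. 'irq_during' models an interrupt firing
   while the lock is held: the handler runs on the same CPU, so the
   syscall cannot continue (and unlock) until the handler returns. */
bool syscall_touch_buffer(bool irq_during)
{
    bool handler_ok = true;
    /* Take the lock; it is known to be free at this point in the model. */
    atomic_flag_test_and_set_explicit(&buf_lock, memory_order_acquire);
    if (irq_during)
        handler_ok = irq_handler();
    atomic_flag_clear_explicit(&buf_lock, memory_order_release);
    return handler_ok;
}
```

With a second CPU the handler would eventually get the lock once the syscall finished, which is why the same bug shows up there only as lost time. A common remedy is to keep local interrupts disabled while holding any lock that an interrupt handler also takes.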