Multiprocessing critical regions

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
User avatar
Jeko
Member
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Multiprocessing critical regions

Post by Jeko »

Which are typical critical regions for multiprocessing? Where I must put the spinlocks?

For now I put them only in the scheduler function.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Post by JamesM »

Anywhere you change non-thread-local data.
User avatar
zaleschiemilgabriel
Member
Member
Posts: 232
Joined: Mon Feb 04, 2008 3:58 am

Post by zaleschiemilgabriel »

I've been asking myself that same question about hard disk access. Can two processors issue commands to an ATA device at the same time? I assume the command registers have to be set accordingly, so that's a big NO, but what would happen if a CPU accesses the status register and figures it's ok to send a command, but another CPU immediately steps in and sends another command before the first one? I've never written a hdd driver before, so I might be talking nonsense (maybe this sort of sync is done automatically by the APICs or something).

If anyone knows of similar cases with other devices, I'm making a list. :)

Cheers,
Gabriel
User avatar
JoeKayzA
Member
Member
Posts: 79
Joined: Wed Aug 24, 2005 11:00 pm
Location: Graz/Austria

Post by JoeKayzA »

zaleschiemilgabriel wrote:If anyone knows of similar cases with other devices, I'm making a list. :)
Well, the IDE controller or hd doesn't care or know about how many cpus there are nor do they do any syncing or multiplexing, so you have to make sure on your own that only one cpu at a time (and only one thread at a time, if you are in preemtible code areas) deals with it and locks the device as long as it's busy (i.e., as long as it has inconsistent state). This is obviously the case for *all* devices. The APICs don't have anything to do with this, an APIC just delivers interrupts to a cpu core, but it isn't involved in io operations or anything device specific.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Post by Korona »

You have to make sure that only one command is sent to the ata controller at the same time. That applies to every other device (e.g. floppy disk, network cards) that is not part of the cpu itself, too (e.g. you don't need locks to protect the local apic registers). The cpu cannot do that kind of synchronization for you as it does not understand the ata protocol. It just sends bytes over the pci bus.
User avatar
AJ
Member
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK
Contact:

Post by AJ »

Hi,

I'm sorting out the same kind of thing at the moment.
what would happen if a CPU accesses the status register and figures it's ok to send a command, but another CPU immediately steps in and sends another command before the first one?
This is why you need to use spinlocks with an atomic test and write function. The spinlock really needs to be at the point of your hard disk driver, not at the point of port access. This is because you need to ensure that a process can either lock *all* or *none* of the resources it needs for a particular task in order to avoid deadlock and data corruption (particularly if a device has a range of ports).

It is also no good if one process sends its HDD request, finishes and releases the port range and then a second process locks and writes to the same ports, halfway through the disk access. For device access, therefore, the entire device needs locking out for the entirity of the operation. This means using some kind of messaging system or other queue control.
Which are typical critical regions for multiprocessing? Where I must put the spinlocks?
Agree with JamesM. To expand with some examples, you will need locks for physcal memory allocation, "kernel-space" and shared heap allocation, device drivers, VFS functions, thread queue accesses, semaphores and so on...

Cheers,
Adam
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Post by Combuster »

Well, there's still the lock-free/wait-free class of algorithms. Which means that in my case, large parts of memory management require no locking, just a few atomic read-modify-write operations. (XADD, CMPXCHG and BTS rock)
I can imagine the same principles being applied to hardware too.

However, I agree that locking is the easier route to take.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
zaleschiemilgabriel
Member
Member
Posts: 232
Joined: Mon Feb 04, 2008 3:58 am

Post by zaleschiemilgabriel »

IMO it's not the easier route, but the safest. Can you please explain the lock-free/wait-free method a bit?
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Post by Combuster »

zaleschiemilgabriel wrote:IMO it's not the easier route, but the safest.
singletasking is easier, true, but we want correctness in a multiprocessor environment, no?
Can you please explain the lock-free/wait-free method a bit?
Have you done your own research yet?
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Post by JamesM »

zaleschiemilgabriel wrote:IMO it's not the easier route, but the safest. Can you please explain the lock-free/wait-free method a bit?
It's *definately* the easier route. Locking is much easier than atomic ops.

In an algorithm where you can identify that the invariant is maintained throughout all code bar a few lines, for example - you calculate something (in thread local storage) then add it to a global accumulator. All this code except the global addition is preemptible and safe. The last one needs exclusivity - you could make a lock, grab it, update the global and release the lock, OR you could use some atomic ops to atomically change the value of that global variable.

Whether you can or not depends on the processor, but instructions like CMPXCHG on the x86 allow you to do this.
User avatar
bewing
Member
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Post by bewing »

Or, you could just insist that only one thread on one core is ever allowed to access the device, and all requests are funneled through that one process, through IPC. Then you don't have to ever lock a damned thing.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Post by JamesM »

bewing wrote:Or, you could just insist that only one thread on one core is ever allowed to access the device, and all requests are funneled through that one process, through IPC. Then you don't have to ever lock a damned thing.
That method is actually described in the wiki link Combuster posted.
User avatar
zaleschiemilgabriel
Member
Member
Posts: 232
Joined: Mon Feb 04, 2008 3:58 am

Post by zaleschiemilgabriel »

Plainly put, those instructions are sync'ed among processors? Is there anything specific that makes them that way? I think this is the point where I get overly inquisitive and should just STFU and google? :D
Just wanted to say thanks... and also to point out that actually accessing thread-local-data is not the only circumstance in which you need locks; you also need them for accessing device registers in an atomic fashion. :wink:
If there are any other resources you might know of that require similar sync'ing, I'm still making that list...
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Post by Combuster »

That list of devices depends on the functional behaviour of an individual device, your requirements, and your experience points in Advanced Wizardry.

In other words: different per person.

I wouldn't lock thread-local data. As the name suggests, it's limited to a thread, hence there exist no concurrent access.

As for how specific instructions work, see the intel manuals. I recommend getting some background education on concurrent programming first as they don't make sense really without that underlying knowledge.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
Jeko
Member
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Post by Jeko »

Code: Select all

typedef struct spinlock

{

	volatile unsigned long lock;

} spinlock_t;

//! Lock the semaphore.

static __INLINE__ void spin_lock( spinlock_t *lock )

{

	__asm__ __volatile__(

		"1: lock ; decb %0\n"

		"jge 3f\n"

		"2:\n"

		"cmpb $0, %0\n"

		"rep; nop\n"

		"jle 2b\n"

		"jmp 1b\n"

		"3:\n"

		: "=m"(lock->lock) : : "memory");

}



//! Unlock the semaphore.

static __INLINE__ void spin_unlock( spinlock_t *lock )

{

	__asm__ __volatile__(

		"movb $1, %0"

		: "=m"(lock->lock) : : "memory");

}
This is my code for spinlocks.
In a critical section I do:

Code: Select all

void scheduler(unsigned int stack)
{
       unsigned int flags;
       local_irq_save(flags);
       spin_lock(sched_lock);

       ...
       //Code of the scheduler
       ...

       spin_unlock(sched_lock);
       local_irq_restore(flags);
}
Is this a good method?
Or there are faster or safer methods?
Post Reply