Page 1 of 2

Multiprocessing critical regions

Posted: Mon Apr 14, 2008 1:46 am
by Jeko
Which are typical critical regions for multiprocessing? Where I must put the spinlocks?

For now I put them only in the scheduler function.

Posted: Mon Apr 14, 2008 1:51 am
by JamesM
Anywhere you change non-thread-local data.

Posted: Mon Apr 14, 2008 3:18 am
by zaleschiemilgabriel
I've been asking myself that same question about hard disk access. Can two processors issue commands to an ATA device at the same time? I assume the command registers have to be set accordingly, so that's a big NO, but what would happen if a CPU accesses the status register and figures it's ok to send a command, but another CPU immediately steps in and sends another command before the first one? I've never written a hdd driver before, so I might be talking nonsense (maybe this sort of sync is done automatically by the APICs or something).

If anyone knows of similar cases with other devices, I'm making a list. :)

Cheers,
Gabriel

Posted: Mon Apr 14, 2008 3:39 am
by JoeKayzA
zaleschiemilgabriel wrote:If anyone knows of similar cases with other devices, I'm making a list. :)
Well, the IDE controller or hd doesn't care or know about how many cpus there are nor do they do any syncing or multiplexing, so you have to make sure on your own that only one cpu at a time (and only one thread at a time, if you are in preemtible code areas) deals with it and locks the device as long as it's busy (i.e., as long as it has inconsistent state). This is obviously the case for *all* devices. The APICs don't have anything to do with this, an APIC just delivers interrupts to a cpu core, but it isn't involved in io operations or anything device specific.

Posted: Mon Apr 14, 2008 3:42 am
by Korona
You have to make sure that only one command is sent to the ata controller at the same time. That applies to every other device (e.g. floppy disk, network cards) that is not part of the cpu itself, too (e.g. you don't need locks to protect the local apic registers). The cpu cannot do that kind of synchronization for you as it does not understand the ata protocol. It just sends bytes over the pci bus.

Posted: Mon Apr 14, 2008 3:43 am
by AJ
Hi,

I'm sorting out the same kind of thing at the moment.
what would happen if a CPU accesses the status register and figures it's ok to send a command, but another CPU immediately steps in and sends another command before the first one?
This is why you need to use spinlocks with an atomic test and write function. The spinlock really needs to be at the point of your hard disk driver, not at the point of port access. This is because you need to ensure that a process can either lock *all* or *none* of the resources it needs for a particular task in order to avoid deadlock and data corruption (particularly if a device has a range of ports).

It is also no good if one process sends its HDD request, finishes and releases the port range and then a second process locks and writes to the same ports, halfway through the disk access. For device access, therefore, the entire device needs locking out for the entirity of the operation. This means using some kind of messaging system or other queue control.
Which are typical critical regions for multiprocessing? Where I must put the spinlocks?
Agree with JamesM. To expand with some examples, you will need locks for physcal memory allocation, "kernel-space" and shared heap allocation, device drivers, VFS functions, thread queue accesses, semaphores and so on...

Cheers,
Adam

Posted: Mon Apr 14, 2008 4:40 am
by Combuster
Well, there's still the lock-free/wait-free class of algorithms. Which means that in my case, large parts of memory management require no locking, just a few atomic read-modify-write operations. (XADD, CMPXCHG and BTS rock)
I can imagine the same principles being applied to hardware too.

However, I agree that locking is the easier route to take.

Posted: Mon Apr 14, 2008 4:47 am
by zaleschiemilgabriel
IMO it's not the easier route, but the safest. Can you please explain the lock-free/wait-free method a bit?

Posted: Mon Apr 14, 2008 4:53 am
by Combuster
zaleschiemilgabriel wrote:IMO it's not the easier route, but the safest.
singletasking is easier, true, but we want correctness in a multiprocessor environment, no?
Can you please explain the lock-free/wait-free method a bit?
Have you done your own research yet?

Posted: Mon Apr 14, 2008 4:56 am
by JamesM
zaleschiemilgabriel wrote:IMO it's not the easier route, but the safest. Can you please explain the lock-free/wait-free method a bit?
It's *definately* the easier route. Locking is much easier than atomic ops.

In an algorithm where you can identify that the invariant is maintained throughout all code bar a few lines, for example - you calculate something (in thread local storage) then add it to a global accumulator. All this code except the global addition is preemptible and safe. The last one needs exclusivity - you could make a lock, grab it, update the global and release the lock, OR you could use some atomic ops to atomically change the value of that global variable.

Whether you can or not depends on the processor, but instructions like CMPXCHG on the x86 allow you to do this.

Posted: Mon Apr 14, 2008 8:00 am
by bewing
Or, you could just insist that only one thread on one core is ever allowed to access the device, and all requests are funneled through that one process, through IPC. Then you don't have to ever lock a damned thing.

Posted: Mon Apr 14, 2008 8:05 am
by JamesM
bewing wrote:Or, you could just insist that only one thread on one core is ever allowed to access the device, and all requests are funneled through that one process, through IPC. Then you don't have to ever lock a damned thing.
That method is actually described in the wiki link Combuster posted.

Posted: Mon Apr 14, 2008 8:09 am
by zaleschiemilgabriel
Plainly put, those instructions are sync'ed among processors? Is there anything specific that makes them that way? I think this is the point where I get overly inquisitive and should just STFU and google? :D
Just wanted to say thanks... and also to point out that actually accessing thread-local-data is not the only circumstance in which you need locks; you also need them for accessing device registers in an atomic fashion. :wink:
If there are any other resources you might know of that require similar sync'ing, I'm still making that list...

Posted: Mon Apr 14, 2008 9:09 am
by Combuster
That list of devices depends on the functional behaviour of an individual device, your requirements, and your experience points in Advanced Wizardry.

In other words: different per person.

I wouldn't lock thread-local data. As the name suggests, it's limited to a thread, hence there exist no concurrent access.

As for how specific instructions work, see the intel manuals. I recommend getting some background education on concurrent programming first as they don't make sense really without that underlying knowledge.

Posted: Mon Apr 14, 2008 12:34 pm
by Jeko

Code: Select all

typedef struct spinlock

{

	volatile unsigned long lock;

} spinlock_t;

//! Lock the semaphore.

static __INLINE__ void spin_lock( spinlock_t *lock )

{

	__asm__ __volatile__(

		"1: lock ; decb %0\n"

		"jge 3f\n"

		"2:\n"

		"cmpb $0, %0\n"

		"rep; nop\n"

		"jle 2b\n"

		"jmp 1b\n"

		"3:\n"

		: "=m"(lock->lock) : : "memory");

}



//! Unlock the semaphore.

static __INLINE__ void spin_unlock( spinlock_t *lock )

{

	__asm__ __volatile__(

		"movb $1, %0"

		: "=m"(lock->lock) : : "memory");

}
This is my code for spinlocks.
In a critical section I do:

Code: Select all

void scheduler(unsigned int stack)
{
       unsigned int flags;
       local_irq_save(flags);
       spin_lock(sched_lock);

       ...
       //Code of the scheduler
       ...

       spin_unlock(sched_lock);
       local_irq_restore(flags);
}
Is this a good method?
Or there are faster or safer methods?