Fun with volatile

Colonel Kernel · Post by **Colonel Kernel** » Sat Sep 03, 2005 12:49 am

I've become concerned with my lack of use of the keyword "volatile" in my C and C++ code. Paranoia is creeping in.

Rather than launch into a lengthy example from my C code, I'll present a much simpler one from C++. Assume this is compiled on an x86 box and that InterlockedIncrement() and InterlockedDecrement() are written in assembler and use the LOCK prefix.

Here's the C++ code. It's useless, but it only serves to make a point:

Code: Select all

class Foo
{
public:
    Foo()
    {
        this->refcount = 0;
    }

    long addRef() const
    {
        return ::InterlockedIncrement( &(this->refcount) );
    }

    long release() const
    {
        long ref = ::InterlockedDecrement( &(this->refcount) );
        if (ref == 0)
        {
            delete this;
        }
        return ref;
    }

private:
    mutable long refcount;
};

Is this code thread-safe, or does it need volatile somewhere? If it needs volatile, where should I put it? If I don't want to pay the cost of volatile for cases when I know a particular Foo instance isn't shared by multiple threads/processors, what can I do? ???

df · Post by df » Sat Sep 03, 2005 10:23 am

are you sure volatile has anything to do with thread safe? for that you should be using mutexs or locking etc... volatile doesnt mean thread safe...

LOCK just holds the buss (iirc) unles things have changed...

meh i need food and a cup of tea...

Colonel Kernel · Post by **Colonel Kernel** » Sat Sep 03, 2005 11:06 am

I thought about it some more, and the code makes sense as-is. I'm not using mutexes because it would be overkill for something as simple as incrementing a refcount... Also because the example is similar to the long & complicated C example from my kernel, which has no mutexes.

I'm trying to characterize the situations in which you actually need volatile. It seems to me that if you encapsulate all global variables in objects, you can avoid volatile altogether.

Here's where I think you need volatile:

If you're reading from a dereferenced pointer more than once in the same function, where that pointer points to data that may be modified by another thread/interrupt handler/signal handler/etc.
If you're reading from a global variable more than once in the same function that might be modified by another thread/interrupt handler/signal handler/etc.
If you're reading from a local variable more than once in the same function and some other thread/interrupt handler/signal handler/etc. has a pointer to that local variable through which they can modify it (yucky, but possible).

<edit>Add "writing to" in addition to "reading from" above, where the other thread/interrupt handler/signal handler/etc. is doing the reading...</edit>

I think that's an exhaustive list, but I'm not sure. I want to be really sure of what I'm doing, or I may encounter strange bugs in my kernel in the future (I didn't put this on the OS forum since I figured it was a C/C++ language question).

So... does the list look true and complete?

Colonel Kernel · Post by **Colonel Kernel** » Mon Sep 05, 2005 10:07 pm

<bump/>

I would have thought our resident language lawyers and experienced OS devers would have had fun with this one. It seems that "volatile" is like the elephant in the room... no one can ignore it, but everyone wants to pretend that it isn't there.

Doesn't anyone else want to try and understand volatile better to avoid "debugging sessions darker than night"?

Ok, maybe I'll help by providing an example that isn't completely off-base, like my last one was.

Code: Select all

class GimmeStatus
{
public:
    GimmeStatus()
    {
        this->status = 0;
    }

    int getStatus() const
    {
        // Hmmm... unsynchronized read. Could there be problems?
        return this->status;
    }

    void setStatus( int newStatus )
    {
        // Use this as an atomic write, throw away the return value.
        ::InterlockedExchange( &(this->status), newStatus );
    }

private:
    int status;
};

Ok, according to what I know about ::InterlockedExchange() and x86, this code should be fine as long as the "status" field is aligned on a 4-byte boundary. By "fine", I mean multiple threads can access the status of a GimmeStatus simultaneously and get the same answer, and if one of the threads sets the status, the other threads will at some point see the new status, and not some messed up half-and-half (which might happen if the reads and writes were to unaligned memory).

With me so far?

Here's where things get scary IMO:

Code: Select all

static GimmeStatus globalStatus;

void thread1()
{
    // Wait for the status to change.
    while (globalStatus.getStatus() == 0)
    {
        do_something_else();
    }

    do_something_important();
}

void thread2()
{
    do_something_time_consuming();
    globalStatus.setStatus( 1 );
    do_something_else();
}

Would this work? What happens if the compiler inlines the call to getStatus() and the optimizer puts the status field in a register? Could this happen?

This is why volatile worries me -- I don't understand enough about the situations where it's necessary.

gaf · Post by **gaf** » Wed Sep 07, 2005 6:54 am

Colonel Kernel wrote:Multiple threads can access the status of a GimmeStatus simultaneously and get the same answer, and if one of the threads sets the status, the other threads will at some point see the new status, and not some messed up half-and-half (which might happen if the reads and writes were to unaligned memory).

The x86 instructions are (almost) all atomic which means that you won't see any half-written variables. On a multiprocessor the lock prefix can be used to make sure that only one processor at a time can access the systembus to write to a specific memory word. This would for example be necessary when several processes on multiple CPUs are waiting for a mutex protecting a region of shared memory (x86 synchronisation).

Colonel Kernel wrote:What happens if the compiler inlines the call to getStatus() and the optimizer puts the status field in a register? Could this happen?

Yep, this is possible and according to Murphy's law it will happen sooner or later. Volatile won't help you here as it only says that a memory word may not be cached but has to be read/written every single time it is accessed from memory. To me your problem looks like a classical race situation which might be solved using a mutex/semaphore..

regards,
gaf

Colonel Kernel · Post by **Colonel Kernel** » Wed Sep 07, 2005 9:37 am

gaf wrote:Volatile won't help you here as it only says that a memory word may not be cached but has to be read/written every single time it is accessed from memory.

Not "cached" per se, but "cached in a register", right?

From here:

Incidentally, this issue is not related to CPU caches except that re-loading of variables into registers may involve cache hits or misses.

To me your problem looks like a classical race situation which might be solved using a mutex/semaphore..

I don't see this as a race condition, but rather as a problem with thread1 always reading stale data because of some inappropriate compiler trickery.

If the status field were read from memory on every pass through thread1's while loop, and that status field is properly aligned, why wouldn't the problem be solved? Eventually thread1 would see thread2's change to the value, and pop out of the while loop like it's supposed to. Check out this example -- it seems to be the same in spirit, although it's vanilla C with no classes. Adding classes makes the whole issue a lot harder to understand IMO. :-\

I realize that if I implemented an "atomic_read()" function it would make this problem go away -- not because the read would be atomic (which it probably already is), but because it would be an asm function that the compiler can't inline and the optimizer can't touch. It's just that this seems to me to be the purpose of volatile in the first place...

JoeKayzA · Post by **JoeKayzA** » Wed Sep 07, 2005 12:14 pm

Colonel Kernel wrote: I realize that if I implemented an "atomic_read()" function it would make this problem go away -- not because the read would be atomic (which it probably already is), but because it would be an asm function that the compiler can't inline and the optimizer can't touch. It's just that this seems to me to be the purpose of volatile in the first place...

Had a quick look at your code. 'volatile' would do exactly what you need, IMO, so why not use it? I find it strange too, however, that you need a special keyword to make sure the code really behaves like it was expected to (that a variable is a memory location)

.

OTOH, the reason why you don't see volatile that often is probably that there are so few occasions where you really need it (the things you listed above - provided that you deny using mutexes for some reasons), and don't forget that a call to a function 'atomic_read()' is probably more self-explaining than 'volatile'.

cheers Joe

Colonel Kernel · Post by **Colonel Kernel** » Wed Sep 07, 2005 12:37 pm

JoeKayzA wrote:Had a quick look at your code. 'volatile' would do exactly what you need, IMO, so why not use it?

I could, I suppose. It's a little wonky thinking about how volatile applies to objects instead of simple primitive-typed variables.

More importantly, I'm worried that my "spider senses" might fail me in the future.

It's difficult to come up with good rules of thumb on when you really need to use volatile.

Joel (not logged in) · Wed Sep 07, 2005 2:34 pm

volatile should be used whenever there's a chance that a variable's value can be modified somewhere other than in the code that's using it, for exactly the reason that optimizers will frequently optimize away spinners and the like if it sees that the value never changes in the loop.

See http://www.flounder.com/badprogram.htm#volatile

Colonel Kernel · Post by **Colonel Kernel** » Wed Sep 07, 2005 6:28 pm

Thanks for the link. Unfortunately, the example there is much too simple (as usual).

I discovered this article which is quite interesting. However, there is something there that I can't quite bring myself to believe:

Inside a critical section defined by a mutex, only one thread has access. Consequently, inside a critical section, the executing code has single-threaded semantics. The controlled variable is not volatile anymore ? you can remove the volatile qualifier.

Imagine in my example that I use a mutex instead of InterlockedExchange(), and I lock the same mutex in getStatus(). IMO, the "nightmare scenario" I described is still possible in principle. Why wouldn't the optimizer just produce code like this for thread1?

Code: Select all

void thread1()
{
    // Wait for the status to change.
    LockMutex( globalStatus.mutex );
    register int cachedStatus = globalStatus.status;
    if (cachedStatus != 0) goto end_loop;
    UnlockMutex( globalStatus.mutex );
    do_something_else();

start_loop_pass_2:    
    LockMutex( globalStatus.mutex );
    if (cachedStatus != 0) goto end_loop;
    UnlockMutex( globalStatus.mutex );
    do_something_else();
    goto start_loop_pass_2;

end_loop:
    do_something_important();
}

The main reason I can think of why an optimizer wouldn't do this is because it can't determine that LockMutex() and UnlockMutex() won't change globalStatus.status as a side effect. In other words, most multithreaded code that doesn't use volatile currently works by accident, not by design.

Does this make sense?

This is why I'm so concerned. If this was just about writing new code, you'd just be thinking "why doesn't this crazy guy just use volatile and move on?"

The real problem is that I'm responsible for many thousands of lines of production code that is supposed to be thread-safe and yet doesn't use volatile anywhere (yes, it does use critical sections and interlocked operations where needed). This is why I'm so anxious to understand all the ramifications of not using volatile (beyond the dirt-simple examples that litter the web).

Joel (not logged in) · Wed Sep 07, 2005 7:33 pm

Unfortunately, I can't give anything more useful, because the only time I've ever been bitten by not using volatile is in exactly that kind of oversimple code.

In fact, the only time not using volatile would be bad is if you've got a value that might be cached that can be modified away from the code that's doing the caching.

The trouble is that what exactly falls under this umbrella is hard to say. I think the answer is more along the lines that, yes, any code that accesses values that might be changed concurrently in another thread but doesn't use volatile is technically broken; however it is very unlikely to actually break.

Actually, based on what MSDN says about volatile it also sounds like it could bite you if something else is reading the data and you're writing to it:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/key_s-z_10.asp

but that just sounds like the other side of the fence. They're the same problem: the effects of optimization on concurrent access.

AR · Post by AR » Thu Sep 08, 2005 1:46 am

Essentially "volatile" simply means "do not cache contents, always reread first" so it would be used in any circumstance fitting that criteria which would basically fit any variable that could be modified by another thread whilst it's in use within the current scope of the language (ie. within the function, code block, if statement, etc). The compiler can't tell that the variable may be modified so volatile informs it that it may be and it is unfortunately your only way to tell it as much other than manually manipulating the scope.

Your globalStatus example would probably fit this whereas if you mutexed the whole function rather than individual accesses then you could avoid volatile [since the variable is locked for the scope of the function, it can be guaranteed that the only modifications will come from the current path of execution]. This basically sets the rule of thumb to: "If a variable may be modified by another thread and we can't mutex the variable for the entire function [or whatever the current scope is] then it must be declared volatile to prevent caching between individual 'sessions' (lock/unlocks)"

A possible optimization to avoid blanket global volatile is to trick your way around the compiler with case by case volatile using pointers:

Code: Select all

int globalvar;
int multithreadproblem()
{
    //Although it applies to the pointer rather than the source, it should make no difference as it will continually reread the source in case the pointer changed
    volatile int *globalvarptr = &globalvar;

    ...
}

(Note that this is derived mainly from logical deductions based on various bits of documentation without direct testing)

gaf · Post by **gaf** » Thu Sep 08, 2005 6:00 am

Colonel Kernel wrote:I discovered this article which is quite interesting. However, there is something there that I can't quite bring myself to believe:

"Inside a critical section defined by a mutex, only one thread has access. Consequently, inside a critical section, the executing code has single-threaded semantics. The controlled variable is not volatile anymore - you can remove the volatile qualifier."

The 'remove' doesn't mean that you the class needn't be declared as volatile any longer but just that, while inside the mutex, the data can be accessed as in a single threaded app and caching is thus possible. Once the critical region is left, however, the data is volatile again as it can be modified by another thread.

"You enter a critical section by locking a mutex. You remove the volatile qualifier from a type by applying a const_cast. If we manage to put these two operations together, we create a connection between C++'s type system and an application's threading semantics. We can make the compiler check race conditions for us."

How this might work in detail is actually described in the paragraph following the one I've just quoted. It's pretty similiar to what AR has already said and allows an apps to benefit from optimization wherever possible while keeping it thread-safe.

regards,
gaf

Colonel Kernel · Post by **Colonel Kernel** » Thu Sep 08, 2005 11:41 am

AR wrote:Your globalStatus example would probably fit this whereas if you mutexed the whole function rather than individual accesses then you could avoid volatile [since the variable is locked for the scope of the function, it can be guaranteed that the only modifications will come from the current path of execution].

Which would render the example pointless, since thread1() would never exit the loop. But I understand what you're saying...

Code: Select all
int globalvar;
int multithreadproblem()
{
    //Although it applies to the pointer rather than the source, it should make no difference as it will continually reread the source in case the pointer changed
    volatile int *globalvarptr = &globalvar;

    ...
}
(Note that this is derived mainly from logical deductions based on various bits of documentation without direct testing)

The way you've declared it, volatile actually does apply to the pointed-to variable and not to the pointer itself, which is exactly what you want. A volatile pointer to a non-volatile int would look like this:

Code: Select all

int* volatile foo;

Same rules as for const.

gaf wrote:The 'remove' doesn't mean that you the class needn't be declared as volatile any longer but just that, while inside the mutex, the data can be accessed as in a single threaded app and caching is thus possible. Once the critical region is left, however, the data is volatile again as it can be modified by another thread.

Ahhh... you're right (and so is Andrei... how could I ever doubt the Amazing Alexandrescu?

). I thought about my example some more and realized that getStatus() would be written like this according to the article (minus the smart pointer mojo):

Code: Select all

    int getStatus() const volatile
    {
        int myStatus = 0;
        LockMutex( this->mutex );
        GimmeStatus* nonVolatileThis =
            const_cast<const GimmeStatus*>( this );
        myStatus = nonVolatileThis->GetStatus();
        UnlockMutex( this->mutex );
        return myStatus;
    }

    // Single-threaded version.
    int getStatus() const
    {
        return this->status;
    }

This would be ok, I think, as long as the static GimmeStatus instance is also declared volatile. This makes my head hurt...

NotTheCHEAT · Post by **NotTheCHEAT** » Thu Sep 08, 2005 9:25 pm

I don't think you need volatile in that example. The one thing I would worry about is if your compiler makes any rash assumptions about the value it assumes that variable has, during run-time. If the asm function is supposed to modfiy the variable then it shouldn't make a lot of difference, except where your overly optimizing compiler makes bad assumptions. If the variable needs to be saved before and restored after the asm function, then you might want a volatile, unless the asm code PUSH and POP's the variable or something.

I'm not an expert on this, I only say what I know. Forgive me if i'm only pointing out the obvious :-[

OSDev.org

Fun with volatile

Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile

Re:Fun with volatile