OS Support For Transactional Memory

Brendan · Post by **Brendan** » Thu May 03, 2007 12:34 am

Hi,

Lately I've been thinking about transactional memory (the wikipedia page has a very good explanation of transactional memory).

I don't think transactional memory is suitable for situations where you need good performance under high contention, but IMHO transactional memory would work very well for software that doesn't expect much contention. It would make it much easier to write thread-safe code, and it would make a very nice addition to an OS (especially considering that a lot of application software doesn't really have high contention).

What I'm considering is supporting transactional memory directly in the OS's linear memory manager (instead of as a special compiler extension, like researchers currently seem to be doing). This would make it easier to use for any language (without modifying compilers, etc to suit), allow object files created from different compilers/languages to be linked together without worrying about conflicting transactional memory support, and even allow transactional memory to be used in assembly language without serious hassles.

Basically a process would have an area of it's address space set aside as a transactional memory area, and the kernel would provide a function to start a transaction and another function to end a transaction (or commit the transaction). If the "commitTransaction()" function fails software would retry the transaction.

The kernel would also keep track of a nesting level, so that if a thread calls the "startTransaction()" function N times and then calls the "commitTransaction()" N times, then only the last call to "commitTransaction()" could result in changes being comitted to the process' address space..

Any thoughts?

Cheers,

Brendan

mystran · Post by **mystran** » Thu May 03, 2007 8:01 am

Actually, if I've not misunderstood anything, transactional memory would be especially suited for situations where locking would be too expensive, because there would be either lots of overhead from fine-grained locking, or lots of contention to a few global locks.

I'm not quite convinced that it makes sense to do transactional memory on page granularity though, because unless you can localize the changes to a few pages, you'll end up with lots of false conflicts if unrelated objects modified by unrelated transactions happen to reside on the same pages.

For some special situations might be useful, but the applications taking advantage of it will need to deal with the details anyway, at least to some extent, so I'm not sure what advantages you gain my involving kernel. I don't think equating an object with a page is realistic, just like equating malloc with page allocator isn't realistic.

Kernel support might be useful on platforms that can't support atomic test-and-set on hardware, but how many general purpose systems don't? And unless I've misunderstood something, test-and-set is more or less all that you need to do atomic commit to shared memory.

Colonel Kernel · Post by **Colonel Kernel** » Thu May 03, 2007 9:02 am

mystran wrote:Actually, if I've not misunderstood anything, transactional memory would be especially suited for situations where locking would be too expensive, because there would be either lots of overhead from fine-grained locking, or lots of contention to a few global locks.

No, Brendan is right -- under high contention, transactional memory will cause a lot of re-tries. It is not meant to solve a performance problem, but rather a complexity problem.

--disclaimer--
The next part will discuss implementation concepts and techniques, but not details. In other words, I'm not going to talk about how hardware transactional memory would be implemented on a particular processor. So don't use this as a reason to reject the idea of an "advanced topics" forum.

And now back to the topic at hand...

I'm not quite convinced that it makes sense to do transactional memory on page granularity though

I don't think current prototypes of HTM use the paging system (at least not primarily). AFAIK, they use the multiprocessor cache coherence protocol to keep track of conflicts and so on. Yes, this means that HTM requires direct hardware support and doesn't really "exist" yet commercially.

Personally I prefer software transactional memory for its flexibility, but to do it properly (i.e. -- not using paging) you really need intervention from the compiler, since it has to emit the proper read and write barriers... basically it instruments the code within an "atomic block" so that it can keep track of transactions.

To put it simply, I don't think transactional memory is really the OS' responsibility. HTM is up to the hardware (although this probably requires some kernel support) and STM is up to the compiler and language run-time environment.

Combuster · Post by **Combuster** » Thu May 03, 2007 10:46 am

Brendan wrote:The kernel would also keep track of a nesting level, so that if a thread calls the "startTransaction()" function N times and then calls the "commitTransaction()" N times, then only the last call to "commitTransaction()" could result in changes being comitted to the process' address space..

well, then what do you do when you try to commit on top level and you detect a conflict - how would the nested calls know that things have not been committed while the call returned true? (or only being able to redo part of the commit) I think nesting causes more trouble than it solves.

Candy · Post by **Candy** » Thu May 03, 2007 11:10 am

Combuster wrote:
Brendan wrote:The kernel would also keep track of a nesting level, so that if a thread calls the "startTransaction()" function N times and then calls the "commitTransaction()" N times, then only the last call to "commitTransaction()" could result in changes being comitted to the process' address space..
well, then what do you do when you try to commit on top level and you detect a conflict - how would the nested calls know that things have not been committed while the call returned true? (or only being able to redo part of the commit) I think nesting causes more trouble than it solves.

If I recall correctly, lockless algorithms work by not doing anything permanent until the commit. They won't know, then. The concept is:

Read state
Do work
If state equals old state, write result of work
If it didn't, try again

That means that if your part of the work wasn't stored, you won't know because you'll just be invoked again. The functions used for this should be "pure" functions with respect to other variables except for perhaps a retry-counter or such - they should hold no state and not touch anything but the explicit state.

grimpy · Post by **grimpy** » Thu May 03, 2007 11:52 am

[quote]If I recall correctly, lockless algorithms work by not doing anything permanent until the commit. They won't know, then. The concept is:[/quote]

The vast difference between lockless algorithms (LA) and sw transactions is that LA's make progress for at least one process (at least if they are wait free). In both cases, you definitely gain less complexity in most cases.

SW transactions has the same problems as optimistic locking in databases; they lead to high rollback rates on hotspots. Alas, an OS is littered with hotspots

Of course, in the case there is no hot spots, you can us Copy-Modify-Update.

That's my take, at least.

Kevin McGuire · Post by **Kevin McGuire** » Thu May 03, 2007 7:14 pm

So this optimistic locking in databases, transactional memory, and copy-modify-update are essentially the same technique.

Candy wrote: Read state
Do work
If state equals old state, write result of work
If it didn't, try again

Oddly enough optimistic locking could slow down transactions with a shared object I would imagine since you have to perform the comparison (or in relation to the size of the object) versus a exchange.

Let C be the time of a comparison operation on one unit of object size.
Let S be the object size.
Let M be the time of a test and set operation.
Let T be the number of threads.

CS > M (I would imagine this would almost always be true.)
CS*T = M*T

Colonel Kernel wrote: No, Brendan is right -- under high contention, transactional memory will cause a lot of re-tries. It is not meant to solve a performance problem, but rather a complexity problem.

I can imagine the thread constantly spinning between reading, processing, and attempting a commit.

The only way I could think of doing it off the top of my head would be a example in C here being more relevant to Brandon's original post.
http://kmcguire.jouleos.galekus.com/dok ... d=kmcguire

And as you guys know that little peanut sized particle up there I have only works half the time... never the less the example uses a linked list of people with support for family members to be added and such not operations. However each operation uses a tmLog and then uses tmCommit to actually record the changes. I figured you could use no blocking at all and just move full speed ahead using some sort of callback mechanism for error recovery in the case a commit failed.

I dunno.. Nothing I would investigate deeper myself unless it actually looked viable for usage somewhere, but maybe just a idea for someone else to get inspired over.

@Brandon:

But you specified it being implemented directly in the "OS's linear memory manager". I am wondering more about the details of how this could work such as using page faults or something?

grimpy · Post by **grimpy** » Thu May 03, 2007 7:57 pm

Kevin McGuire wrote:So this optimistic locking in databases, transactional memory, and copy-modify-update are essentially the same technique.

On the surface, yes.

But comparing optimistic locking to CMU or transactional memory isn't really fair.

Why? Because databases in a sense have a much easier job, because they have subsystems ensuring consistency and isolation - and they only deal with data.

The things you lock in an operating system isn't just data, you also protect access to I/O ports and other resources that makes it impossible to provide isolation (in a db, a transaction can be inconsistent while it executes, but no other transaction generally observes that => the I in ACID). Many actions are irreversible (i.e. undoing sending a packet, writing to port that starts a device motor, whatever). Hence we still need locks.

In short, I think lockfree ADT's are fine, but I don't see a general transaction mechanism happening before we have hardware support at most levels.

Colonel Kernel · Post by **Colonel Kernel** » Thu May 03, 2007 9:22 pm

grimpy wrote:In short, I think lockfree ADT's are fine, but I don't see a general transaction mechanism happening before we have hardware support at most levels.

Hardware support for pulling packets back off the wire or un-writing blocks from disk?

Some actions are irrevocable, IMO. This doesn't invalidate the usefulness of TM though -- but it suggests that STM at least should prevent system calls and other "unknowable, possibly irrevocable" actions from being done in an atomic block.

@kmcguire: I found a really good paper on transactional memory, but I seem to have misplaced it. If I find it, I'll post it (or a link to it).

Candy · Post by **Candy** » Thu May 03, 2007 11:33 pm

Colonel Kernel wrote:
grimpy wrote:In short, I think lockfree ADT's are fine, but I don't see a general transaction mechanism happening before we have hardware support at most levels.
Hardware support for pulling packets back off the wire or un-writing blocks from disk? Some actions are irrevocable, IMO. This doesn't invalidate the usefulness of TM though -- but it suggests that STM at least should prevent system calls and other "unknowable, possibly irrevocable" actions from being done in an atomic block.

@kmcguire: I found a really good paper on transactional memory, but I seem to have misplaced it. If I find it, I'll post it (or a link to it).

I would expect a transactional memory with any non-memory output to cache the "write" to each of the outputs until the transaction is approved and then sends them all out. I'm not sure how you would do that atomically as they're separate actions but if I'm thinking correctly, the only device that requires atomicity is memory.

Hmm... On the other hand, files too...

Colonel Kernel · Post by **Colonel Kernel** » Thu May 03, 2007 11:44 pm

Candy wrote:I would expect a transactional memory with any non-memory output to cache the "write" to each of the outputs until the transaction is approved and then sends them all out. I'm not sure how you would do that atomically as they're separate actions but if I'm thinking correctly, the only device that requires atomicity is memory.

Hmm... On the other hand, files too...

There are already concurrency mechanisms in place for files... why extend something that's supposed to be for memory to other things when it's not needed? Especially when it adds so much complexity. It reminds me of finalizers in Java & C#. Freeing files and sockets is what deterministic destruction is for! Extending STM to non-memory resources is just as crazy as extending GC to non-memory resources IMO...

About that paper I was going to look for... I can't find it.

I suspect it was on Intel's research site, which seems to have moved. If you Google "software transactional memory", you'll find a ton of links though. Unforunately many of the examples are in Haskell and are therefore beyond the reach of mere mortals (I was taught by a prof who was one of the inventors of Haskell, and now I hate the language with a passion.

)

grimpy · Post by **grimpy** » Fri May 04, 2007 8:02 am

Colonel Kernel wrote: Hardware support for pulling packets back off the wire or un-writing blocks from disk?

With proper device queuing and, say, SATA support for transaction logs, this is not as sci-fi as it seems.

That said, I don't think it will happend in my lifetime.

So we'll have a mix of lock-free algos and locks. There's no practical away around it IMO.

Colonel Kernel · Post by **Colonel Kernel** » Fri May 04, 2007 8:58 am

grimpy wrote:With proper device queuing and, say, SATA support for transaction logs, this is not as sci-fi as it seems.

I think the only way for this to work in general is for someone to invent time travel.

Consider what would happen if a transaction tried to send something over the network, then do a blocking receive waiting for the reply, then take some action based on the content of the reply. I can't see how this could be revocable.

grimpy · Post by **grimpy** » Fri May 04, 2007 9:45 am

Colonel Kernel wrote:
grimpy wrote:With proper device queuing and, say, SATA support for transaction logs, this is not as sci-fi as it seems.
I think the only way for this to work in general is for someone to invent time travel. Consider what would happen if a transaction tried to send something over the network, then do a blocking receive waiting for the reply, then take some action based on the content of the reply. I can't see how this could be revocable.

Thinking outside the box, and assuming all hardware and software is written for the new fancy Transactional Computer Programming TCP (tm)

, it could work.

Of course, it would mean the receiving end would need to understand and participate in a distributed transaction, potentially with compensating transactions.

Yes, it's highly unlikely, but probably doable in a near-perfect world.

Brendan · Post by **Brendan** » Fri May 04, 2007 10:30 am

Hi,

mystran wrote:I'm not quite convinced that it makes sense to do transactional memory on page granularity though, because unless you can localize the changes to a few pages, you'll end up with lots of false conflicts if unrelated objects modified by unrelated transactions happen to reside on the same pages.

I'm not convinced that (for software with low contention) page granular STM won't give better performance than fine grained "compiler based STM".

For an example, consider a large transaction that reads from 1000 addresses spread across 10 pages and writes to 100 addresses spread across 2 pages. In this case "compiler based STM" has all the overhead for 1000 reads and 100 writes, while "OS based STM" has the overhead for 10 reads and 2 writes. Also, "OS based STM" can make use of existing mechanisms built into the CPU (page faults, accessed & dirty flags) to keep track of accesses, while "compiler based STM" needs to insert additional code at every access.

mystran wrote:Personally I prefer software transactional memory for its flexibility, but to do it properly (i.e. -- not using paging) you really need intervention from the compiler, since it has to emit the proper read and write barriers... basically it instruments the code within an "atomic block" so that it can keep track of transactions.

To put it simply, I don't think transactional memory is really the OS' responsibility. HTM is up to the hardware (although this probably requires some kernel support) and STM is up to the compiler and language run-time environment.

OS based STM wouldn't be less flexible - it needs to be optional (as not all processes would use STM) and therefore wouldn't prevent compiler based STM from being used.

But, to do STM properly (i.e. not forcing people to use a specific compiler or a specific library, not causing problems with linking code from different compilers, and perhaps providing a transperant upgrade path to future HTM), you really want intervention from the linear memory manager...

The only real question is whether or not the performance, convenience and other advantages of OS based STM justify the additional time and complexity involved in designing and implementing it. If it takes ages to implement and performance is poor, then it'd be much less worthwile.

There are also other considerations here - I'm always looking for things my OS could do that other OSs can't (even if it's just for eventual marketting purposes), and AFAIK it would be the first time anyone has ever implemented STM using the CPU's paging mechanism (making the OS itself interesting for research purposes).

Cheers,

Brendan

OSDev.org

OS Support For Transactional Memory

OS Support For Transactional Memory

Re: OS Support For Transactional Memory

Re: OS Support For Transactional Memory

Re: OS Support For Transactional Memory