OS Support For Transactional Memory
OS Support For Transactional Memory
Hi,
Lately I've been thinking about transactional memory (the wikipedia page has a very good explanation of transactional memory).
I don't think transactional memory is suitable for situations where you need good performance under high contention, but IMHO transactional memory would work very well for software that doesn't expect much contention. It would make it much easier to write thread-safe code, and it would make a very nice addition to an OS (especially considering that a lot of application software doesn't really have high contention).
What I'm considering is supporting transactional memory directly in the OS's linear memory manager (instead of as a special compiler extension, like researchers currently seem to be doing). This would make it easier to use for any language (without modifying compilers, etc to suit), allow object files created from different compilers/languages to be linked together without worrying about conflicting transactional memory support, and even allow transactional memory to be used in assembly language without serious hassles.
Basically a process would have an area of it's address space set aside as a transactional memory area, and the kernel would provide a function to start a transaction and another function to end a transaction (or commit the transaction). If the "commitTransaction()" function fails software would retry the transaction.
The kernel would also keep track of a nesting level, so that if a thread calls the "startTransaction()" function N times and then calls the "commitTransaction()" N times, then only the last call to "commitTransaction()" could result in changes being comitted to the process' address space..
Any thoughts?
Cheers,
Brendan
Lately I've been thinking about transactional memory (the wikipedia page has a very good explanation of transactional memory).
I don't think transactional memory is suitable for situations where you need good performance under high contention, but IMHO transactional memory would work very well for software that doesn't expect much contention. It would make it much easier to write thread-safe code, and it would make a very nice addition to an OS (especially considering that a lot of application software doesn't really have high contention).
What I'm considering is supporting transactional memory directly in the OS's linear memory manager (instead of as a special compiler extension, like researchers currently seem to be doing). This would make it easier to use for any language (without modifying compilers, etc to suit), allow object files created from different compilers/languages to be linked together without worrying about conflicting transactional memory support, and even allow transactional memory to be used in assembly language without serious hassles.
Basically a process would have an area of it's address space set aside as a transactional memory area, and the kernel would provide a function to start a transaction and another function to end a transaction (or commit the transaction). If the "commitTransaction()" function fails software would retry the transaction.
The kernel would also keep track of a nesting level, so that if a thread calls the "startTransaction()" function N times and then calls the "commitTransaction()" N times, then only the last call to "commitTransaction()" could result in changes being comitted to the process' address space..
Any thoughts?
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Actually, if I've not misunderstood anything, transactional memory would be especially suited for situations where locking would be too expensive, because there would be either lots of overhead from fine-grained locking, or lots of contention to a few global locks.
I'm not quite convinced that it makes sense to do transactional memory on page granularity though, because unless you can localize the changes to a few pages, you'll end up with lots of false conflicts if unrelated objects modified by unrelated transactions happen to reside on the same pages.
For some special situations might be useful, but the applications taking advantage of it will need to deal with the details anyway, at least to some extent, so I'm not sure what advantages you gain my involving kernel. I don't think equating an object with a page is realistic, just like equating malloc with page allocator isn't realistic.
Kernel support might be useful on platforms that can't support atomic test-and-set on hardware, but how many general purpose systems don't? And unless I've misunderstood something, test-and-set is more or less all that you need to do atomic commit to shared memory.
I'm not quite convinced that it makes sense to do transactional memory on page granularity though, because unless you can localize the changes to a few pages, you'll end up with lots of false conflicts if unrelated objects modified by unrelated transactions happen to reside on the same pages.
For some special situations might be useful, but the applications taking advantage of it will need to deal with the details anyway, at least to some extent, so I'm not sure what advantages you gain my involving kernel. I don't think equating an object with a page is realistic, just like equating malloc with page allocator isn't realistic.
Kernel support might be useful on platforms that can't support atomic test-and-set on hardware, but how many general purpose systems don't? And unless I've misunderstood something, test-and-set is more or less all that you need to do atomic commit to shared memory.
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
No, Brendan is right -- under high contention, transactional memory will cause a lot of re-tries. It is not meant to solve a performance problem, but rather a complexity problem.mystran wrote:Actually, if I've not misunderstood anything, transactional memory would be especially suited for situations where locking would be too expensive, because there would be either lots of overhead from fine-grained locking, or lots of contention to a few global locks.
--disclaimer--
The next part will discuss implementation concepts and techniques, but not details. In other words, I'm not going to talk about how hardware transactional memory would be implemented on a particular processor. So don't use this as a reason to reject the idea of an "advanced topics" forum.
And now back to the topic at hand...
I don't think current prototypes of HTM use the paging system (at least not primarily). AFAIK, they use the multiprocessor cache coherence protocol to keep track of conflicts and so on. Yes, this means that HTM requires direct hardware support and doesn't really "exist" yet commercially.I'm not quite convinced that it makes sense to do transactional memory on page granularity though
Personally I prefer software transactional memory for its flexibility, but to do it properly (i.e. -- not using paging) you really need intervention from the compiler, since it has to emit the proper read and write barriers... basically it instruments the code within an "atomic block" so that it can keep track of transactions.
To put it simply, I don't think transactional memory is really the OS' responsibility. HTM is up to the hardware (although this probably requires some kernel support) and STM is up to the compiler and language run-time environment.
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: OS Support For Transactional Memory
well, then what do you do when you try to commit on top level and you detect a conflict - how would the nested calls know that things have not been committed while the call returned true? (or only being able to redo part of the commit) I think nesting causes more trouble than it solves.Brendan wrote:The kernel would also keep track of a nesting level, so that if a thread calls the "startTransaction()" function N times and then calls the "commitTransaction()" N times, then only the last call to "commitTransaction()" could result in changes being comitted to the process' address space..
Re: OS Support For Transactional Memory
If I recall correctly, lockless algorithms work by not doing anything permanent until the commit. They won't know, then. The concept is:Combuster wrote:well, then what do you do when you try to commit on top level and you detect a conflict - how would the nested calls know that things have not been committed while the call returned true? (or only being able to redo part of the commit) I think nesting causes more trouble than it solves.Brendan wrote:The kernel would also keep track of a nesting level, so that if a thread calls the "startTransaction()" function N times and then calls the "commitTransaction()" N times, then only the last call to "commitTransaction()" could result in changes being comitted to the process' address space..
Read state
Do work
If state equals old state, write result of work
If it didn't, try again
That means that if your part of the work wasn't stored, you won't know because you'll just be invoked again. The functions used for this should be "pure" functions with respect to other variables except for perhaps a retry-counter or such - they should hold no state and not touch anything but the explicit state.
Re: OS Support For Transactional Memory
[quote]If I recall correctly, lockless algorithms work by not doing anything permanent until the commit. They won't know, then. The concept is:[/quote]
The vast difference between lockless algorithms (LA) and sw transactions is that LA's make progress for at least one process (at least if they are wait free). In both cases, you definitely gain less complexity in most cases.
SW transactions has the same problems as optimistic locking in databases; they lead to high rollback rates on hotspots. Alas, an OS is littered with hotspots Of course, in the case there is no hot spots, you can us Copy-Modify-Update.
That's my take, at least.
The vast difference between lockless algorithms (LA) and sw transactions is that LA's make progress for at least one process (at least if they are wait free). In both cases, you definitely gain less complexity in most cases.
SW transactions has the same problems as optimistic locking in databases; they lead to high rollback rates on hotspots. Alas, an OS is littered with hotspots Of course, in the case there is no hot spots, you can us Copy-Modify-Update.
That's my take, at least.
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
So this optimistic locking in databases, transactional memory, and copy-modify-update are essentially the same technique.
Let C be the time of a comparison operation on one unit of object size.
Let S be the object size.
Let M be the time of a test and set operation.
Let T be the number of threads.
CS > M (I would imagine this would almost always be true.)
CS*T = M*T
The only way I could think of doing it off the top of my head would be a example in C here being more relevant to Brandon's original post.
http://kmcguire.jouleos.galekus.com/dok ... d=kmcguire
And as you guys know that little peanut sized particle up there I have only works half the time... never the less the example uses a linked list of people with support for family members to be added and such not operations. However each operation uses a tmLog and then uses tmCommit to actually record the changes. I figured you could use no blocking at all and just move full speed ahead using some sort of callback mechanism for error recovery in the case a commit failed.
I dunno.. Nothing I would investigate deeper myself unless it actually looked viable for usage somewhere, but maybe just a idea for someone else to get inspired over.
@Brandon:
But you specified it being implemented directly in the "OS's linear memory manager". I am wondering more about the details of how this could work such as using page faults or something?
Oddly enough optimistic locking could slow down transactions with a shared object I would imagine since you have to perform the comparison (or in relation to the size of the object) versus a exchange.Candy wrote: Read state
Do work
If state equals old state, write result of work
If it didn't, try again
Let C be the time of a comparison operation on one unit of object size.
Let S be the object size.
Let M be the time of a test and set operation.
Let T be the number of threads.
CS > M (I would imagine this would almost always be true.)
CS*T = M*T
I can imagine the thread constantly spinning between reading, processing, and attempting a commit.Colonel Kernel wrote: No, Brendan is right -- under high contention, transactional memory will cause a lot of re-tries. It is not meant to solve a performance problem, but rather a complexity problem.
The only way I could think of doing it off the top of my head would be a example in C here being more relevant to Brandon's original post.
http://kmcguire.jouleos.galekus.com/dok ... d=kmcguire
And as you guys know that little peanut sized particle up there I have only works half the time... never the less the example uses a linked list of people with support for family members to be added and such not operations. However each operation uses a tmLog and then uses tmCommit to actually record the changes. I figured you could use no blocking at all and just move full speed ahead using some sort of callback mechanism for error recovery in the case a commit failed.
I dunno.. Nothing I would investigate deeper myself unless it actually looked viable for usage somewhere, but maybe just a idea for someone else to get inspired over.
@Brandon:
But you specified it being implemented directly in the "OS's linear memory manager". I am wondering more about the details of how this could work such as using page faults or something?
On the surface, yes.Kevin McGuire wrote:So this optimistic locking in databases, transactional memory, and copy-modify-update are essentially the same technique.
But comparing optimistic locking to CMU or transactional memory isn't really fair.
Why? Because databases in a sense have a much easier job, because they have subsystems ensuring consistency and isolation - and they only deal with data.
The things you lock in an operating system isn't just data, you also protect access to I/O ports and other resources that makes it impossible to provide isolation (in a db, a transaction can be inconsistent while it executes, but no other transaction generally observes that => the I in ACID). Many actions are irreversible (i.e. undoing sending a packet, writing to port that starts a device motor, whatever). Hence we still need locks.
In short, I think lockfree ADT's are fine, but I don't see a general transaction mechanism happening before we have hardware support at most levels.
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
Hardware support for pulling packets back off the wire or un-writing blocks from disk? Some actions are irrevocable, IMO. This doesn't invalidate the usefulness of TM though -- but it suggests that STM at least should prevent system calls and other "unknowable, possibly irrevocable" actions from being done in an atomic block.grimpy wrote:In short, I think lockfree ADT's are fine, but I don't see a general transaction mechanism happening before we have hardware support at most levels.
@kmcguire: I found a really good paper on transactional memory, but I seem to have misplaced it. If I find it, I'll post it (or a link to it).
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
I would expect a transactional memory with any non-memory output to cache the "write" to each of the outputs until the transaction is approved and then sends them all out. I'm not sure how you would do that atomically as they're separate actions but if I'm thinking correctly, the only device that requires atomicity is memory.Colonel Kernel wrote:Hardware support for pulling packets back off the wire or un-writing blocks from disk? Some actions are irrevocable, IMO. This doesn't invalidate the usefulness of TM though -- but it suggests that STM at least should prevent system calls and other "unknowable, possibly irrevocable" actions from being done in an atomic block.grimpy wrote:In short, I think lockfree ADT's are fine, but I don't see a general transaction mechanism happening before we have hardware support at most levels.
@kmcguire: I found a really good paper on transactional memory, but I seem to have misplaced it. If I find it, I'll post it (or a link to it).
Hmm... On the other hand, files too...
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
There are already concurrency mechanisms in place for files... why extend something that's supposed to be for memory to other things when it's not needed? Especially when it adds so much complexity. It reminds me of finalizers in Java & C#. Freeing files and sockets is what deterministic destruction is for! Extending STM to non-memory resources is just as crazy as extending GC to non-memory resources IMO...Candy wrote:I would expect a transactional memory with any non-memory output to cache the "write" to each of the outputs until the transaction is approved and then sends them all out. I'm not sure how you would do that atomically as they're separate actions but if I'm thinking correctly, the only device that requires atomicity is memory.
Hmm... On the other hand, files too...
About that paper I was going to look for... I can't find it. I suspect it was on Intel's research site, which seems to have moved. If you Google "software transactional memory", you'll find a ton of links though. Unforunately many of the examples are in Haskell and are therefore beyond the reach of mere mortals (I was taught by a prof who was one of the inventors of Haskell, and now I hate the language with a passion. )
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
With proper device queuing and, say, SATA support for transaction logs, this is not as sci-fi as it seems.Colonel Kernel wrote: Hardware support for pulling packets back off the wire or un-writing blocks from disk?
That said, I don't think it will happend in my lifetime.
So we'll have a mix of lock-free algos and locks. There's no practical away around it IMO.
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
I think the only way for this to work in general is for someone to invent time travel. Consider what would happen if a transaction tried to send something over the network, then do a blocking receive waiting for the reply, then take some action based on the content of the reply. I can't see how this could be revocable.grimpy wrote:With proper device queuing and, say, SATA support for transaction logs, this is not as sci-fi as it seems.
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Thinking outside the box, and assuming all hardware and software is written for the new fancy Transactional Computer Programming TCP (tm) , it could work.Colonel Kernel wrote:I think the only way for this to work in general is for someone to invent time travel. Consider what would happen if a transaction tried to send something over the network, then do a blocking receive waiting for the reply, then take some action based on the content of the reply. I can't see how this could be revocable.grimpy wrote:With proper device queuing and, say, SATA support for transaction logs, this is not as sci-fi as it seems.
Of course, it would mean the receiving end would need to understand and participate in a distributed transaction, potentially with compensating transactions.
Yes, it's highly unlikely, but probably doable in a near-perfect world.
Hi,
For an example, consider a large transaction that reads from 1000 addresses spread across 10 pages and writes to 100 addresses spread across 2 pages. In this case "compiler based STM" has all the overhead for 1000 reads and 100 writes, while "OS based STM" has the overhead for 10 reads and 2 writes. Also, "OS based STM" can make use of existing mechanisms built into the CPU (page faults, accessed & dirty flags) to keep track of accesses, while "compiler based STM" needs to insert additional code at every access.
But, to do STM properly (i.e. not forcing people to use a specific compiler or a specific library, not causing problems with linking code from different compilers, and perhaps providing a transperant upgrade path to future HTM), you really want intervention from the linear memory manager...
The only real question is whether or not the performance, convenience and other advantages of OS based STM justify the additional time and complexity involved in designing and implementing it. If it takes ages to implement and performance is poor, then it'd be much less worthwile.
There are also other considerations here - I'm always looking for things my OS could do that other OSs can't (even if it's just for eventual marketting purposes), and AFAIK it would be the first time anyone has ever implemented STM using the CPU's paging mechanism (making the OS itself interesting for research purposes).
Cheers,
Brendan
I'm not convinced that (for software with low contention) page granular STM won't give better performance than fine grained "compiler based STM".mystran wrote:I'm not quite convinced that it makes sense to do transactional memory on page granularity though, because unless you can localize the changes to a few pages, you'll end up with lots of false conflicts if unrelated objects modified by unrelated transactions happen to reside on the same pages.
For an example, consider a large transaction that reads from 1000 addresses spread across 10 pages and writes to 100 addresses spread across 2 pages. In this case "compiler based STM" has all the overhead for 1000 reads and 100 writes, while "OS based STM" has the overhead for 10 reads and 2 writes. Also, "OS based STM" can make use of existing mechanisms built into the CPU (page faults, accessed & dirty flags) to keep track of accesses, while "compiler based STM" needs to insert additional code at every access.
OS based STM wouldn't be less flexible - it needs to be optional (as not all processes would use STM) and therefore wouldn't prevent compiler based STM from being used.mystran wrote:Personally I prefer software transactional memory for its flexibility, but to do it properly (i.e. -- not using paging) you really need intervention from the compiler, since it has to emit the proper read and write barriers... basically it instruments the code within an "atomic block" so that it can keep track of transactions.
To put it simply, I don't think transactional memory is really the OS' responsibility. HTM is up to the hardware (although this probably requires some kernel support) and STM is up to the compiler and language run-time environment.
But, to do STM properly (i.e. not forcing people to use a specific compiler or a specific library, not causing problems with linking code from different compilers, and perhaps providing a transperant upgrade path to future HTM), you really want intervention from the linear memory manager...
The only real question is whether or not the performance, convenience and other advantages of OS based STM justify the additional time and complexity involved in designing and implementing it. If it takes ages to implement and performance is poor, then it'd be much less worthwile.
There are also other considerations here - I'm always looking for things my OS could do that other OSs can't (even if it's just for eventual marketting purposes), and AFAIK it would be the first time anyone has ever implemented STM using the CPU's paging mechanism (making the OS itself interesting for research purposes).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.