Posted: Fri May 04, 2007 10:39 am
I'm still not sure. What if you have a disk resource and a memory resource that both need to work under one transaction? You can do either atomically, but not both, unless I'm missing something.
The Place to Start for Operating System Developers
https://f.osdev.org/
I don't think typical transactions would be that large. The kind of lock-based code I find myself writing (that would still be a pain to re-write using lock-free techniques) usually updates less than 10 shared memory locations, all on the same page (likely near other shared variables that get updated concurrently). But I'm only one data point.Brendan wrote:For an example, consider a large transaction that reads from 1000 addresses spread across 10 pages and writes to 100 addresses spread across 2 pages. In this case "compiler based STM" has all the overhead for 1000 reads and 100 writes, while "OS based STM" has the overhead for 10 reads and 2 writes.
I'm not mystran.Brendan wrote:mystran wrote:Personally I prefer software transactional memory for its flexibility, but to do it properly (i.e. -- not using paging) you really need intervention from the compiler, since it has to emit the proper read and write barriers... basically it instruments the code within an "atomic block" so that it can keep track of transactions.
To put it simply, I don't think transactional memory is really the OS' responsibility. HTM is up to the hardware (although this probably requires some kernel support) and STM is up to the compiler and language run-time environment.
You can't have small, fine-grained transactions. Applications have to rely on your OS' policies for optimistic versus pessimistic transaction management. With the compiler-based approach, transactions can be very fine-grained and you can change the transaction policy by choosing a different run-time library.OS based STM wouldn't be less flexible
How often have you needed to change something in a file atomically, while holding a lock on a piece of shared memory?Candy wrote:I'm still not sure. What if you have a disk resource and a memory resource that both need to work under one transaction? You can do either atomically, but not both, unless I'm missing something.
You're post was broad for me, but if you intention was just that then I suppose asking for clarification would be out of the way. However, if it is not out of the way I would interested in you're concept of these:Colonel Kernel wrote: You can't have small, fine-grained transactions. Applications have to rely on your OS' policies for optimistic versus pessimistic transaction management. With the compiler-based approach, transactions can be very fine-grained and you can change the transaction policy by choosing a different run-time library.
a. I had one-hundred threads do one write each across two pages would that not increase the collision domain and come up around the same figure as the "compiler based STM" overhead considering each instance of the transaction expense to be one to one with you're paging mechanism?Brandon wrote: For an example, consider a large transaction that reads from 1000 addresses spread across 10 pages and writes to 100 addresses spread across 2 pages. In this case "compiler based STM" has all the overhead for 1000 reads and 100 writes, while "OS based STM" has the overhead for 10 reads and 2 writes. Also, "OS based STM" can make use of existing mechanisms built into the CPU (page faults, accessed & dirty flags) to keep track of accesses, while "compiler based STM" needs to insert additional code at every access.
Data size, and also data locality. With compiler-based STM, you can have independent transactions involving words that are right next to each other in memory without having one transaction trip up the other one and cause it to re-try. With a paging-based approach, collisions are far more likely (as you've pointed out).Kevin McGuire wrote:a. What would be considered a fine-grained transaction. I am assuming the data size of the transaction, right?
The primary factor is your implementation technique. Compiler-based STM can be very fine-grained. Hardware transactional memory (HTM) is typically based on cache lines, so it is more coarse-grained, but still not as coarse-grained as a page-based approach, which is what Brendan is proposing.a1. What is the primary factor in preventing or limiting fine-grained transactions?
It's not so much providing functionality as it is providing a choice of performance trade-offs. Depending on the implementation technique, there can be "optimistic" or "pessimistic" approaches to reading and writing in transaction. I sometimes get these two confused, so bear with me. I think an optimistic write policy means that the transaction writes to the target location immediately, keeping an undo log off to the side. This makes commits cheap and roll-backs expensive. A pessimistic policy re-directs writes to a separate copy of the target data, making rollbacks cheap and commits expensive. Depending on the run-time behaviour of the transactions, the performance of the two approaches varies. In general it isn't really possible to claim that one is always better than the other, because the situation can change (just like there is no single page-replacement policy that is optimal for all applications).b. What functionality could a transaction policy provide?
It would only be limited within whatever mechanism was offered by the kernel. So, if an app didn't use the kernel's mechanism, it could do whatever it liked. But that begs the question of why should the kernel implement it in the first place. Especially if, as I suspect, a page-based approach would be sub-optimal for most applications (if I find any relevant papers that dispute this, I'll post them here).b1. How could a application be limited to one policy because transactional memory was implemented in kernel?
For a high performance database engine that may need to handle high contention, I wouldn't use any form of transactional memory. I also wouldn't let any programmer who lacks the necessary experience with traditional locking anywhere need the code.Colonel Kernel wrote:I don't think typical transactions would be that large. The kind of lock-based code I find myself writing (that would still be a pain to re-write using lock-free techniques) usually updates less than 10 shared memory locations, all on the same page (likely near other shared variables that get updated concurrently). But I'm only one data point.Brendan wrote:For an example, consider a large transaction that reads from 1000 addresses spread across 10 pages and writes to 100 addresses spread across 2 pages. In this case "compiler based STM" has all the overhead for 1000 reads and 100 writes, while "OS based STM" has the overhead for 10 reads and 2 writes.
In general, the use of large transactions is discouraged for performance reasons anyway (in databases and TM).
So far I've seen a few STM libraries for C and C++, and an obscure language that supports STM (Haskell).Colonel Kernel wrote:It would only be limited within whatever mechanism was offered by the kernel. So, if an app didn't use the kernel's mechanism, it could do whatever it liked. But that begs the question of why should the kernel implement it in the first place. Especially if, as I suspect, a page-based approach would be sub-optimal for most applications (if I find any relevant papers that dispute this, I'll post them here).b1. How could a application be limited to one policy because transactional memory was implemented in kernel?
IMHO this paper suggests that proposed HTM techniques are broken, and then offers ways of fixing the proposed HTM techniques using page-based extensions. Ironically one of the reasons that the proposed HTM techniques are broken is that transactions may be too large for the cache.Colonel Kernel wrote:Ok, here's one:
http://www.cse.ucsd.edu/~calder/papers/ ... 06-PTM.pdf
The down side is, even this technique assumes additional hardware support. I haven't finished reading it yet, but I'm getting a hunch that page-based TM is not feasible with today's CPUs...
You misunderstood me. I was talking about recommended database transaction sizes, not about using STM in the implementation of a database engine (which really has little or nothing to do with implementing database transactions). It was an analogy.Brendan wrote:For a high performance database engine that may need to handle high contention, I wouldn't use any form of transactional memory.
True, but:IMHO all forms of STM are only useful for low contention situations, and exist to make multi-threaded software easier to write rather than to improve performance.
What I've been trying to tell you is that transactional memory is currently being researched. Those libraries you've seen are mostly for experimentation, to get a sense of the characteristics of the way multi-threaded code would interact with STM. They are research tools, not commercial tools.So far I've seen a few STM libraries for C and C++, and an obscure language that supports STM (Haskell).
If you're writing something in Pascal and want to use STM, what are your current choices? Would you write your own Pascal compiler (or Pascal library, if that's possible), or manually insert code to support transactions into your application? What if you're an assembly programmer like me?
Of course not. That's why researchers with PhDs that are much smarter than both of us are trying to figure this out.Would someone who lacks the experience or time necessary to use traditional re-entrancy locking correctly have the experience or time necessary to implement their own STM support correctly?
I don't agree with that interpretation. What the paper is saying is that the first HTM techniques have really good performance, but impose hard resource limits on the programmer (cache size limits, and time limits imposed by interrupts and context switches). It isn't that the techniques are broken, it's that they are at too low a level of abstraction to be used directly. The proposal is really about virtualizing the transactions to abstract the time & space limits away from the programmer, in much the same way that virtual memory abstracts away the limits of physical memory (by analogy, not by implementation). The idea is that the current HTM techniques will make the common case fast, while the virtualization makes the whole system easy enough to actually use.IMHO this paper suggests that proposed HTM techniques are broken, and then offers ways of fixing the proposed HTM techniques using page-based extensions. Ironically one of the reasons that the proposed HTM techniques are broken is that transactions may be too large for the cache.
It is entirely possible to implement paged-based STM. The only critical flaw with page-based STM is that it's not very portable (i.e. you'd need different code on each architecture and the target architecture must have a suitable MMU).Colonel Kernel wrote:This will all turn out to be moot anyway if your idea isn't feasible to implement. I'm assuming for now that since none of the papers talk about a paging-based approach on current hardware, that must mean there is some critical flaw with the idea that we're not seeing yet.
The policy would be implemented by the run-time environment, not the compiler.Brendan wrote:Compiler-based STM - allows different performance trade-offs to be made by different libraries and/or different compilers.
Only if the policy is "pessimistic" (i.e. -- lazy writes). For an "optimistic" policy (i.e. -- eager writes), the overhead is on abort, not commit.The "commit()" has "per accessed location" overhead.
I disagree -- since most of the complexity is in the run-time environment, I think it would be possible for a cross-language environment like the CLR to support STM for any of the supported languages. The compilers would have to support it too, to some degree (it may be possible to implement a lot of the instrumentation at the step where native code is generated from IL).Is unlikely to allow programmers to mix languages
While this is true, it may be possible to have different policies in the OS support code that goes along with the special HTM hardware... I haven't read enough about it to say for sure.HTM - doesn't allow different performance trade-offs to be made by different libraries and/or different compilers.
Probably more than just the linear memory manager. What part of the OS maintains transaction state across interrupts, thread switches, and thread migrations?Is unlikely to be usable in practice without linear memory manager assistance.
Given the massive industry-wide push into multi-core, and the realization that new techniques are required to make multi-threaded programming easier (necessary in order to sell more multi-core chips), I think Intel and AMD will be all over this as soon as the research has stabilized somewhat and reached some consensus.May never exist unless the CPU manufacturers consider HTM to be a profitable/marketable feature.
I said feasible, not possible.It is entirely possible to implement paged-based STM.
That's not a critical flaw -- lots of kernel code is not portable. I was thinking about it some more, and my initial hunch that conflict detection would be too slow doesn't really hold up, assuming I understand your proposed implementation correctly:The only critical flaw with page-based STM is that it's not very portable (i.e. you'd need different code on each architecture and the target architecture must have a suitable MMU).
If it's terribly slow under normal conditions, I would say it isn't really helping.It would also have different performance characteristics, but I wouldn't consider that a critical flaw (that'd be like refusing to consider interpretted languages because compiled code is faster, even though there's good reasons to use interpretted languages for some things).
True, but given the efficiency of Google and the popularity of the topic, I don't find this to be very likely.Of course just because we can't find papers talking about the page-based approach doesn't necessarily mean that they've never been written...
I discount all optimistic policies, as they tend to allow transactions to read inconsistent state (see this Wikipedia page). While it's technically possible to detect this in exception handlers and recover (e.g. abort transactions if an exception occurs while a transaction is working on inconsistent data), it's messy and doesn't help if the code goes into an infinite loop.Colonel Kernel wrote:Only if the policy is "pessimistic" (i.e. -- lazy writes). For an "optimistic" policy (i.e. -- eager writes), the overhead is on abort, not commit.The "commit()" has "per accessed location" overhead.
I was thinking more of using GCC to compile some C and linking it with object files compiled by Delphi and NASM. I was also thinking of the current implementations of STM (which are mostly libraries, not run-time environments).Colonel Kernel wrote:I disagree -- since most of the complexity is in the run-time environment, I think it would be possible for a cross-language environment like the CLR to support STM for any of the supported languages. The compilers would have to support it too, to some degree (it may be possible to implement a lot of the instrumentation at the step where native code is generated from IL).Is unlikely to allow programmers to mix languages
I guess it'd also be technically possible to have different policies in an OS's page-based STM support code too...Colonel Kernel wrote:While this is true, it may be possible to have different policies in the OS support code that goes along with the special HTM hardware... I haven't read enough about it to say for sure.HTM - doesn't allow different performance trade-offs to be made by different libraries and/or different compilers.
I don't know, but whatever part of the OS does would probably just call the linear memory manager routines for shifting data from the CPUs tranactional cache to temporary pages of RAM (where the temporary pages of RAM would need to be handled by the linear memory manager during "transaction abort exceptions" and "transaction commit exceptions").Colonel Kernel wrote:Probably more than just the linear memory manager. What part of the OS maintains transaction state across interrupts, thread switches, and thread migrations?Is unlikely to be usable in practice without linear memory manager assistance.
I see 4 possibilities:Colonel Kernel wrote:Given the massive industry-wide push into multi-core, and the realization that new techniques are required to make multi-threaded programming easier (necessary in order to sell more multi-core chips), I think Intel and AMD will be all over this as soon as the research has stabilized somewhat and reached some consensus.May never exist unless the CPU manufacturers consider HTM to be a profitable/marketable feature.
I'm used to having one heap for process space and another heap for thread space. Transactional space would need at least another heap, but there's no reason transactional space can't have several more heaps (one for each set of related data). How processes manage their heaps is up to the processes (and to be honest, I have a tendancy to use statically allocated space with large sparse arrays, "allocation on demand" and explicit "freeLinearPage()" calls, with no heap managers at all).Colonel Kernel wrote:Besides the issue of conflicts, which IMO are both expensive and likely with this scheme, it has another big flaw: Only a certain region of the address space can be used for transactional memory. Programs that have large, complex data structures allocated in the heap that are shared by multiple threads would need extra support from the user-level heap allocator as well (to avoid "false conflicts", it would have to try and interleave "related" allocations across many different pages -- but it doesn't know which allocations are "related"!). The compiler and linker would also have to arrange to have global data loaded in the transactional area. Suddenly this technique doesn't look very transparent to the rest of the system.
The overhead (currently) consists of:Colonel Kernel wrote:If it's terribly slow under normal conditions, I would say it isn't really helping.It would also have different performance characteristics, but I wouldn't consider that a critical flaw (that'd be like refusing to consider interpretted languages because compiled code is faster, even though there's good reasons to use interpretted languages for some things).
I'm not convinced that the infinite powers of Google extend to hardcopies sitting on shelves in university libraries that were never released in electronic form to the general public, or to work done by large commercial companies like Intel or IBM who may not want competitors to see their results (regardless of whether the results are actually useful or not). As for the papers that Google does list, I've probably only looked at 10% of them...Colonel Kernel wrote:True, but given the efficiency of Google and the popularity of the topic, I don't find this to be very likely.Of course just because we can't find papers talking about the page-based approach doesn't necessarily mean that they've never been written...
Do you have a link?ruibo wrote:you should read this paper:
Tradeoffs in Transactional Memory Virtualization
First they explain that HTM isn't really that useful by itself, and propose "XTM" which virtualizes HTM so that it can cope with task switches, working sets larger than HTM alone can handle, etc.Colonel Kernel wrote:I've started reading it (am being interrupted by this annoyingly persistent thing called my job ). It looks like a discussion of how to use paging to virtualize TM beyond the limits imposed by hardware-based approaches. I don't think it's a scheme that can perform well enough without dedicated hardware support (I could be wrong though... I'm only a few pages in).