Is register allocation so important?

Programming, for all ages and all languages.
Post Reply
User avatar
AndrewAPrice
Member
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Is register allocation so important?

Post by AndrewAPrice »

LuaJIT is pretty competitive in performance, even against many static optimizing compilers.

I'm looking through the LuaJIT source, and here is the DynASM file for generating x86 assembly: https://github.com/LuaDist/luajit/blob/master/src/vm_x86.dasc and rather than doing some fancy register allocation (as I was expecting for a pretty competitive JIT compiler), it uses a static register layout (unless I'm reading the source wrong.) It looks like it keeps the registers on the stack, performs an operation, then stores it back on the stack.

If this is competitive with the register allocation in an optimizing compiler like GCC (which I'd assume would be close to state of the art), then it makes me wonder how important register allocation actually is. Perhaps if you're only referencing memory within the L1 cache (such as your local variable stack) - the CPU's microcode can optimize or pipeline most of this to be just as fast as operating on registers?

If this is true, it would make implementing my own JIT compiler much simpler.

Are there benchmarks out that that measure the execution performance of register allocation algorithms? I have tried Googling, but I only found information about the performance of the algorithm during compilation rather than the execution of the resulting code.
My OS is Perception.
User avatar
Rusky
Member
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Is register allocation so important?

Post by Rusky »

It's probably a combination of the particular benchmark you linked, the hardware doing pipelining like you suggest, and the hardware doing register renaming. You'll probably get decent performance without a register allocator just by running native code, but there will be situations where a register allocator would significantly improve performance.

I found this: http://compilers.cs.ucla.edu/fernando/p ... periments/ They benchmarked a few register allocators using the LLVM test suite and SPEC CPU.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Is register allocation so important?

Post by Brendan »

Hi,
MessiahAndrw wrote:LuaJIT is pretty competitive in performance, even against many static optimizing compilers.

I'm looking through the LuaJIT source, and here is the DynASM file for generating x86 assembly: https://github.com/LuaDist/luajit/blob/master/src/vm_x86.dasc and rather than doing some fancy register allocation (as I was expecting for a pretty competitive JIT compiler), it uses a static register layout (unless I'm reading the source wrong.) It looks like it keeps the registers on the stack, performs an operation, then stores it back on the stack.
MessiahAndrw wrote:If this is competitive with the register allocation in an optimizing compiler like GCC (which I'd assume would be close to state of the art), then it makes me wonder how important register allocation actually is. Perhaps if you're only referencing memory within the L1 cache (such as your local variable stack) - the CPU's microcode can optimize or pipeline most of this to be just as fast as operating on registers?
GCC gets beaten by some other compilers (e.g. Intel's ICC) regularly, and while it's possibly better than average I wouldn't say it's state of the art. There are major differences in languages (for example, a compiler for a language that doesn't need to worry about "pointer aliasing" can optimise better than a compiler for a language that does need to worry about pointer aliasing); and major differences in toolchains and environments (for example, whether libraries destroy any hope of effective "whole program optimisation"); and compromises between start-up overhead and "after startup overhead". There's also the ability to optimise code for a wide variety of quite different CPUs (e.g. older "in-order Atom" vs. newer "out-of-order Atom").

Mostly what I'm saying here is that there are many factors that effect performance that are likely to influence the results you get from a benchmark significantly more than register allocation; and to get usable statistics (for comparing with and without register allocation) you'd have to eradicate all of those other things.

Now, here's an interesting paragraph from the wikipedia page:

"Graph coloring allocators produce efficient code, but their allocation time is high. In cases of static compilation, allocation time is not a significant concern. In cases of dynamic compilation, such as just-in-time (JIT) compilers, fast register allocation is important. An efficient technique proposed by Poletto and Sarkar is linear scan allocation. This technique requires only a single pass over the list of variable live ranges. Ranges with short lifetimes are assigned to registers, whereas those with long lifetimes tend to be spilled, or reside in memory. The results are on average only 12% less efficient than graph coloring allocators.

This implies that register allocation is quite important; but for JIT you have to take into account the overhead of the code that does register allocation and the best solution is a compromise between efficient use of registers and register allocator overhead (and isn't purely about efficient use of registers alone).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
sandras
Member
Member
Posts: 146
Joined: Thu Nov 03, 2011 9:30 am

Re: Is register allocation so important?

Post by sandras »

@OP: I think I've read in another thread that you have implemented a stack based virtual machine for your language. As a sort of meeting point between stack machines and JIT compilation you may be interested in colorForth. It is in essence a macro assembler. It generates native x86 code, in which registers have predefined roles, and the code manipulates values on the two stacks. As you have probably already guessed, two of the registers are for the stack pointers.

Compared to stack machine byte-code, the stack machine macro-assembly is faster, but is also easy to generate from the byte-code. Here are the colorForth register assignment and macros.

It seems you really do your research, so you may have already known about this, but I thought I'd point this possibility out just in case.
User avatar
AndrewAPrice
Member
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: Is register allocation so important?

Post by AndrewAPrice »

Thanks guys.

I already translate my bytecode into SSA form and preform some dead code elimination and constant propagation. Those basic operations alone (which in retrospect after implementing them were relatively basic) have eliminated a lot of my redundant/useless instructions such as shuffling variables around needlessly, and reduces a lot of variables to a minimal set needed. I feel that interpreting my SSA form would be much faster than interpreting my stack byte code (which is fantastic as a storage format because it's very compact) simply due to the fewer instructions needed to represent the same thing.

The simplest JIT would be to keep my SSA variables on the stack and use fixed register allocation. Grab operand one, grab operand two, perform operation, store result. This seems really inefficient but it's basically what non-optimizing compilers do and they seem to perform okay.

I am wondering what sort of execution speedup would register allocation result in? 5%? 25%? 100%? Should I focus on a really advanced algorithm or would there not be that much advantage over a simple linear scan?

I'll look more into colorforth. I could learn something from it, but I'm interested in doing as much of the assembling as possible myself.
My OS is Perception.
embryo

Re: Is register allocation so important?

Post by embryo »

MessiahAndrw wrote:I am wondering what sort of execution speedup would register allocation result in? 5%? 25%? 100%? Should I focus on a really advanced algorithm or would there not be that much advantage over a simple linear scan?
Unfortunately the current processor development trend prevents us from calculating such speedup, at least for x86. But you can switch to another architecture, like MIPS, which is more open about the hardware implementation details and there is some information about timings and cache interaction. You even can allocate some variables in cache without bothering with cache line hit and other related stuff.
kscguru
Member
Member
Posts: 27
Joined: Sat Jan 19, 2008 12:29 pm

Re: Is register allocation so important?

Post by kscguru »

My answer is similar to Brendan's but from different principles.

Whether register allocation helps is a property of function length. In C/C++, functions tend to be long - and inlining heuristics tend to make them even longer - and long functions is where register allocation really shines since it saves push/pop pairs. I'd vaguely guess 10-50% performance difference, but it's highly dependent on the code itself.

JITs are an entirely different beast; JITs (regardless of language) tend to operate on basic blocks (average 5 high-level instructions, typical implementations max at 10-20 instructions in a basic block). When you need to return to the ABI contract to jump between basic blocks every 5 instructions, there is really no room for a register allocator to make any contribution (and that's before including runtime cost). To a JIT, it's easier to hard-code the perfect allocation for the 100 most common blocks into the JIT engine than to compute allocations on the fly.

More broadly: register allocation makes a difference when the optimizer can allocate registers across branches. Compiled languages tend to optimize across branches, JITs tend to not optimize across branches, because optimizing across branches is orders of magnitude more expensive than optimizing non-branching code. This is why normally, a compiled language beats a JITed language unless the JITed language can extract large runtime optimizations from high-level code (e.g. Java optimizing out redundant type checks / virtual function lookups).
User avatar
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: Is register allocation so important?

Post by Owen »

Your benchmark tests how efficient your hashtable implementation is, not how good your JIT is (other than as it pertains to inlining hashtable operations)

The hash table is a primitive in Lua and JavaScript, therefore V8 and Lua can see into them, then probably optimize away all the pointless operations your microbenchmark did. Even if they fail at that, they intern their strings therefore don't pay the cost of a hash function.

The C++ hash table implementation is slow, because libstdc++'s implementation of unordered_map is pretty bad and unfixable because of ABI constraints (and because it's tuned for larger sizes). The C hash table implementation is terrible (linked list based hash maps exhibit appalling performance)

LuaJIT and co succeed for high level benchmarks. They fail for number crunching, because they have poor register allocation.
Post Reply