And or double shift to select middle bits?

Programming, for all ages and all languages.
User avatar
MuchLearning
Member
Member
Posts: 26
Joined: Sun Feb 12, 2017 1:50 am

And or double shift to select middle bits?

Post by MuchLearning »

If I have 32 bits and want to grab some group from the middle is it faster to use an and followed by a shift, or two shifts?

Example, say I want bits 16-23.

Code: Select all

uint32_t someBits;
uint8_t optionA = (someBits & 0xff0000) >> 16; //isolate the bits and shift
uint8_t optionB = (someBits << 8) >> 24; //double shift to clear unwanted bits
Which is faster?
Octocontrabass
Member
Member
Posts: 5516
Joined: Mon Mar 25, 2013 7:01 pm

Re: And or double shift to select middle bits?

Post by Octocontrabass »

Have you checked the code your compiler generates for those? You might be surprised by what you see.
User avatar
MuchLearning
Member
Member
Posts: 26
Joined: Sun Feb 12, 2017 1:50 am

Re: And or double shift to select middle bits?

Post by MuchLearning »

With no optimizations g++ did an and followed by a shift for the first case and two shifts for the second case.
With any optimizations on the compiler replaced the values with pre-calculated constants.
Octocontrabass
Member
Member
Posts: 5516
Joined: Mon Mar 25, 2013 7:01 pm

Re: And or double shift to select middle bits?

Post by Octocontrabass »

I'm guessing you actually want to see what happens when GCC can't optimize it to a constant, so try making your test cases functions:

Code: Select all

uint8_t optionA( uint32_t someBits )
{
    return (someBits & 0xff0000) >> 16;
}
uint8_t optionB( uint32_t someBits )
{
    return (someBits << 8) >> 24;
}
uint8_t optionC( uint32_t someBits )
{
    return someBits >> 16;
}
User avatar
dozniak
Member
Member
Posts: 723
Joined: Thu Jul 12, 2012 7:29 am
Location: Tallinn, Estonia

Re: And or double shift to select middle bits?

Post by dozniak »

First shift right and then mask low bits could make it more efficient code (due to using smaller constants in an and).
Learn to read.
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: And or double shift to select middle bits?

Post by Schol-R-LEA »

We really can't make a general reply to this, TBH, because it depends on too many factors to really say.
  • Does the program or compiler know which bits you need to be selected at compile time?
  • Is the final value one which can be pre-computed at compile time (as you saw, GCC will do this when optimizations are on - it is a pretty basic optimization strategy, but it isn't always going to be obvious to the client programmer when this is possible).
  • Is the mask already loaded into a register (or can it be pre-loaded into one while the processor is executing some other operations - something which on an out-of-order superscalar like the current x86s can only be determined absolutely by the pipeline hardware, though the compiler can trying to re-order it to make it more likely).
  • Are any memory operands going to be in cache when the load is made or does the CPU need to do a fetch from memory (which can take hundreds of cycles and probably cause a pipeline stall even with re-ordering)? On most modern architectures, this literally cannot be predicted until you have the instruction fetched and decoded (and it's dicey even then), because if a task switch occurs between the values being loaded and them being used, the cache manager can and will evict the memory cells to make room for memory used by the other process.
  • Is the instruction preceded by a jump/branch of some kind (it almost certainly will be, especially since the only way the difference between the two approaches would matter is if the operation is in a tight loop)? If so, then a branch prediction miss will leave this question pointless, as the pipeline stalls will take several times longer than the difference between using one single cycle instruction or two could ever be.
This is just a few possible complications I came up with off the top of my head.

Unless this is some sort of bottleneck in the program - and you didn't mention it being one - this is the sort of pre-mature optimization that is literally only academic. There is just no way to guess which would work better on any x86 system more recent than the Pentium Pro (for reference, that's from 1995), and it is very likely that it will vary over different runs of the program because of cache and/or branch prediction misses.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
User avatar
Geri
Member
Member
Posts: 442
Joined: Sun Jul 14, 2013 6:01 pm

Re: And or double shift to select middle bits?

Post by Geri »

double shift will be faster IN MOST CASES IN THEORY becouse it results smaller code, and it needs fewer transistors so out of order superscalar almost always can kick in. not much difference, since the AND overhead can be hidden by pipelines - but with the power you asked it here as a question, you could benchmark it, maybe there will not be any difference. if yes, shift will be a bit faster (only a few percent, dont expect spectacular difference)
Operating system for SUBLEQ cpu architecture:
http://users.atw.hu/gerigeri/DawnOS/download.html
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: And or double shift to select middle bits?

Post by Brendan »

Hi,
MuchLearning wrote:Example, say I want bits 16-23.

Code: Select all

uint32_t someBits;
uint8_t optionA = (someBits & 0xff0000) >> 16; //isolate the bits and shift
uint8_t optionB = (someBits << 8) >> 24; //double shift to clear unwanted bits
Which is faster?
For this specific example; fastest would probably be:
uint8_t optionC = (someBits >> 16) & 0xff;[/code]

This is because most CPUs have a relatively fast way to convert 8-bit integers into larger integers. For example, on 80x86 I'd expect this code to be compiled to "SHR" then "MOVZX" without much optimisation at all.

With good optimisation, I'd expect that the shift would also disappear most of the time - e.g. the compiler deciding to do something like "movzx eax,byte [ebp-4+(16/8)]" instead of doing something like "mov eax,[ebp-4]; shr eax,16; movzx eax,al".


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: And or double shift to select middle bits?

Post by Schol-R-LEA »

Perhaps we need to start over from the top:

@MuchLearning, why do you want to know? I am not saying you shouldn't want to know, but the intent is relevant here, because (as I said earlier) this is the sort of micro--optimization that is entirely irrelevant in actual practice - any gain or loss from it is going be swamped by other factors unless it is a bottleneck in a very tight loop.

If you're just curious, fine; the answers already given are more than suitable, with Geri and Brendan's answers being pretty much equivalent in terms of cycles used (assuming that the compiler uses an 8-bit arguments for both the shift and mask, which I would surely hope it would but I wouldn't count on it).

If the intent is to break a bottleneck, then we'd need a lot more detail as to what the actual code is doing, rather than just a mocked-up test. The test serves to illustrate the question, but to fix the live code, we'd need to see it.

Beyond that, meh. It really isn't something that you get a real advantage on one way or the other. How you do it is up to you. That having been said, I would be curious as to what you are using it for, in case there is some aspect of it that could make it amenable to an entirely different solution (one which your question doesn't give traction for, making this a shoe or bottle problem).
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
User avatar
dchapiesky
Member
Member
Posts: 204
Joined: Sun Dec 25, 2016 1:54 am
Libera.chat IRC: dchapiesky

Re: And or double shift to select middle bits?

Post by dchapiesky »

Plagiarize. Plagiarize. Let not one line escape thine eyes...
User avatar
MuchLearning
Member
Member
Posts: 26
Joined: Sun Feb 12, 2017 1:50 am

Re: And or double shift to select middle bits?

Post by MuchLearning »

Schol-R-LEA wrote:Perhaps we need to start over from the top:

@MuchLearning, why do you want to know? I am not saying you shouldn't want to know, but the intent is relevant here, because (as I said earlier) this is the sort of micro--optimization that is entirely irrelevant in actual practice - any gain or loss from it is going be swamped by other factors unless it is a bottleneck in a very tight loop.

Pure and unbridled curiosity. :lol:
User avatar
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: And or double shift to select middle bits?

Post by Solar »

1) Pick an architecture, compiler, and optimization level.

2) Measure.
Every good solution is obvious once you've found it.
arframme
Posts: 2
Joined: Sun Sep 03, 2017 4:29 am
Contact:

Re: And or double shift to select middle bits?

Post by arframme »


Thank you!
simeonz
Member
Member
Posts: 360
Joined: Fri Aug 19, 2016 10:28 pm

Re: And or double shift to select middle bits?

Post by simeonz »

I know that the question refers to C for clarity, but it has to be said, that you can't optimize in such detail using C. The goal in such a language is to make the intent as easy to assimilate by the compiler as possible, which requires more knowledge about the compiler than the machine. That's why you mostly aim for expression that can be suitably implemented, but without delving into the specifics. Since the instruction output may vary from one version of the compiler to the next.

That being said, I intuitively prefer different operations when possible, because on some architecture they may be implemented in different execution units. This is not the case here, as a matter of fact, but it is a general tendency for me. You also want to eliminate data dependencies, which fails in both cases here. As Brendan pointed out, extracting the data from the least significant byte is good for x86, possibly for arm. The expression can be rewritten without the masking:

Code: Select all

uint8_t optionC = (uint8_t)(someBits >> 16);
I hate the readability though. The cast may be omitted, but it still messes up the clarity, because you have to think about twos complement roll over/modulo assignment in both cases. For cores with modern instruction caches, since the microops themselves are cached, the size and encoding complexity of the instructions in the inner loop should not matter that much. But this again is the kind of reasoning that requires at least some inline comments in the code, if used in C at all.
User avatar
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: And or double shift to select middle bits?

Post by Solar »

Encapsule the actual operation in a mnemonically-named inline function (or macro, if you must).

That way, you achieve
  • readability / clarity of intention no matter how obscure the operation, and
  • you can use #ifdef to select the fastest operation depending on platform / option switch (if you must).
You could even (gasp!) quickly compare the two different implementations, and see for yourself if they actually make any measurable difference in the big scheme...
Every good solution is obvious once you've found it.
Post Reply