Optimization to get a number in a range from a given value

~ · Post by ~ » Mon May 16, 2016 2:30 pm

I need an optimized way to limit a value and adjust it to get the offset that would correspond to it.

If I have a very big buffer in one place, but I want to copy only a tiny buffer, I still know the offset that would correspond if I want to access it (for example, if I want to read only 1 or 2 sectors of a FAT12 table at a time).

A slow way would be like this:

Code: Select all

var originalValue=24531;

//The following will result in 467,
//but will be uselessly slow:
///
 while(originalValue>=512)
 {
  originalValue-=512;
 }

And it seems that a way would be to use an AND with a mask set to all 1's for the size that corresponds to the size of the partial buffer. For example, for a 512-byte partial buffer, it would just take:

Code: Select all

var originalValue=24531;

//The following will result in 467, with base address of 0:
///
 originalValue&=parseInt("111111111", 2);   //AND with 511 to get a value between 0 to 511

___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
(FAT) Reading 2 File Allocation Table Sectors At a Time Instead of 1

It seems to be a trick similar to that from the Graphics Programming Black Book, which talks about reading and searching through partially-loaded files with resume.

It looks like some Cluster Numbers when stored on disk traverse and go beyond a single sector, specifically those that happen to start at byte 511 of a 512-byte sector. Each cluster for FAT12 uses 2 bytes, so it would end up with 1 byte in each sector, the first at the very end of the first sector (offset 511) and the second at the very start of the second/next sector (offset 0).

So, as long as we have already adjusted and know the actual offset of the cluster into the FAT, we will always avoid that division of a cluster value between 2 sectors if we simply read 2 FAT sectors instead of 1. Then the partial value will now always be available with this trick.

The code to handle it will also be smaller and we will just need a 1024-byte buffer instead of a 512-byte one.

It seems to be very good trick to be able to read partial values, be able to resume them, and avoid reading the whole data set (the whole FAT table).

___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Getting Lowest and Highest Number from Bigger Value?

Now the question is, how could I also get an offset only between N and M, for example, only between 1-511 or only between 127-511 instead of the typical one 0-511 where only the highest value matters but not the lowest?

Rusky · Post by **Rusky** » Mon May 16, 2016 2:47 pm

The bitwise-and trick is a special case of modular arithmetic, where the modulus is a power of two. In general, you can get X into the range [0,N) with X % N.

If I understand you correctly, your [N,M) range can also be seen as N+[0,M-N), so you could use (X - N) % M + N.

Schol-R-LEA · Post by **Schol-R-LEA** » Tue May 17, 2016 10:39 am

Rusky wrote:The bitwise-and trick is a special case of modular arithmetic, where the modulus is a power of two. In general, you can get X into the range [0,N) with X % N.

If I understand you correctly, your [N,M) range can also be seen as N+[0,M-N), so you could use (X - N) % M + N.

While this is true, the trick is still one that could be of use, as it is both easy to code and faster than the modulo operator (which uses integer division to get the modulus, or rather, the remainder). OTOH, it only us usable for powers of two (though that should be the case for most values you would use), and the intent is less clear than using the modulo operator would be. It would probably be premature optimization to commit to it right at the start.

On the gripping hand, I would expect a good C compiler to recognize the special case of a modulo by a constant power of two and perform the optimization automagically (just as I would expect it to replace a multiply or divide with a shift when appropriate), so it is probably a moot point from a performance standpoint.

onlyonemac · Post by **onlyonemac** » Tue May 17, 2016 1:14 pm

Schol-R-LEA wrote:On the gripping hand, I would expect a good C compiler to recognize the special case of a modulo by a constant power of two and perform the optimization automagically (just as I would expect it to replace a multiply or divide with a shift when appropriate), so it is probably a moot point from a performance standpoint.

I would be interested to see if that is actually the case though, as I usually work on the assumption that the compiler doesn't perform such specific optimizations.

Octocontrabass · Post by **Octocontrabass** » Tue May 17, 2016 1:56 pm

onlyonemac wrote:I would be interested to see if that is actually the case though, as I usually work on the assumption that the compiler doesn't perform such specific optimizations.

GCC does. Here's a function that does nothing but return a value modulo 512 compiled for i386:

Code: Select all

mov    eax,DWORD PTR [esp+0x4]
and    eax,0x1ff
ret

Here's the same function for m68k:

Code: Select all

linkw %fp,#0
movel %fp@(8),%d0
andil #511,%d0
unlk %fp
rts

And powerpc:

Code: Select all

clrlwi  r3,r3,23
blr

I tried -O0, but it still generated the same optimized code (with prologue and epilogue fluff). I expect you'll see similar results with other optimizing compilers.

Schol-R-LEA · Post by **Schol-R-LEA** » Wed May 18, 2016 2:47 pm

Thank you, Octocontrabass, I should have thought to check before posting that myself. I didn't think I would need to, though; like constant folding and tail-call optimization, power-of-two optimizations for integer arithmetic are pretty well established, and unlike TCO, it is hard to see any possible downsides to it (TCO folds the stack, making tracing it in debugging impossible without adding large data structures and expensive operations while losing most of the advantages of TCO). It would come well before more complicated techniques such as even basic register painting in the list of things a compiler writer would implement, and would be feasible even with a fairly simple recursive-descent direct-emit compiler, which is problematic even for something as simple as constant folding (which could require a peephole pass over the output of such a compiler, since a basic aspect of RDDE parsers is that they don't need to keep any information about previous statements aside from the symbol table).

Schol-R-LEA · Post by **Schol-R-LEA** » Wed May 18, 2016 3:19 pm

Oh, and just to show that TCO is simple enough to do (sometimes) with an RDDE parser that emits assembly code, here's how it works.

1) at the beginning of every function, emit a secondary entry point label right after the activation frame is created.
2) Keep a table with the sizes of the activation records of every compiled function, and another with the sizes of their their parameter lists.

3) when parsing a return statement:

Code: Select all

 if the value is the return of another function, then it is a tail call
    if ((AR_size[callee] != NULL && PL_size[callee] != NULL) && (AR_size[callee] <= AR_size[caller]  &&  PL_size[callee] <= PL_size[callee]))
        replace the current arguments on the stack with the function call's arguments
        emit jump to the secondary entry point of the callee
    else
        emit normal function call
else
    not a tail call, emit value return as usual

While this is a simplified version, and It isn't a perfect solution (as it won't work if the callee hasn't been compiled yet or is an external call, or if the functions in question are type void), it will always work for simple tail recursion on a non-void function, and should work for many other cases as well. Still, the point is that yes, TCO is indeed a fairly basic technique that pretty much even the simplest compilers should be able to offer as an optional setting.

Now, just to prove the point, I'm going to try to add this to my own RDDE-based Suntiger Algol compiler, which admittedly I haven't touched in several years.

Schol-R-LEA · Post by **Schol-R-LEA** » Wed May 18, 2016 6:31 pm

sigh Me and my big fat mouth...

It seems I forgot a few things about that project, not the least of which being giving you a link to the Suntiger Algol GitHub repo. Most significantly, I forgot a) that I never did write the code to parse user-defined functions and function calls, and b) just what an unholy mess the code I wrote in the last three weeks of that course was. I think that taking some time to work on this might be a good idea, if only as a confidence-building warm-up exercise. So, the agenda for the rest of this week:

Fix the code for storing and retrieving labels, emitting code, and who knows what else,
add a symbol table subclass specifically for function names and signatures,
Write a simple function call parser,
write a simple function definition parser, and
extend those to handle tail call optimization

Mind you, I have long wanted to do this, and eventually want to re-write the parser itself to generate an AST (as I wanted to do from the first) instead of directly emitting code. Maybe this is a good thing, I know this is something I can do and it is probably worth doing if we ever get the CompilerDev wiki going.

OSDev.org

Optimization to get a number in a range from a given value

Optimization to get a number in a range from a given value

Re: Optimization to get a number in a range from a given val

Re: Optimization to get a number in a range from a given val

Re: Optimization to get a number in a range from a given val

Re: Optimization to get a number in a range from a given val

Re: Optimization to get a number in a range from a given val

Re: Optimization to get a number in a range from a given val

Re: Optimization to get a number in a range from a given val