float/double/long double Implementation

01000101 · Post by **01000101** » Sat Oct 11, 2008 6:03 pm

I've been doing some research on public-key cryptography and it is well known that keys for algorithms such as RSA need to be very large ( > 128 bits). This is an issue seeing as how the largest variable available is only 64-bits wide. I would like to get basic functionality from precision integers (float/double/long double) so that I may implement these algorithms, but without having to go through the horrendous code for arbitrary-precision logic (a ton of chars making up a logical very large integer).

I have the FPU setup and ready to go, I can use and alter floats/doubles/long doubles just fine and without any GPF's. I believe precision is to 64 bits, but I'm not too sure as I can't find the documentation/explanation of the FPU control register/words, btw, does anyone have information on this? I would really like to figure out the default precision and how to set it.

I thought about implementing the MPFR library as it is under the LGPL and is somewhat small, but it is not very portable and has alot that I don't need (arbitrary-precision support).

I have some 'extremely' basic functions that only deal with positive numbers (i think). What I really need to work on is a float/double/etc.. 2 char array function so that I may produce keys for these algorithms.

Code: Select all

long double ld_shl(long double ld, uint16 count)
{
    while(count--)
    {ld *= 2;}
    return ld;
}

long double ld_shr(long double ld, uint16 count)
{
    while(count--)
    {ld /= 2;}
    return ld;
}

long double ld_pow(long double ld, long double exp)
{
    long double n = ld;
    if(!exp){return 1;}
    while(--exp){ld *= n;}
    return ld;
}

Am I even going about this the right way to produce large RSA primes? or should I just give up on this and go for a library (last resort).

Also, is there a way to assign an integer >= 64-bits wide to a long double without GCC freaking out about it being to large? Does this have to be done in ASM with xmm's?

bewing · Post by **bewing** » Sat Oct 11, 2008 6:34 pm

I can only comment on a little of this, but here goes:

The way that integer math works on an x86, it is very easy to make "integers" that are some multiple in length of the basic integer size -- in ASM. For addition, subtraction, shifts, etc., all you need to do is "carry" the carry bit from one instruction to the next.
Multiplication and division are already done in sizes that are twice the size of the basic integer size.
If you need to do more than that, some simple algebra can quickly add another 32 or 64 bits to any operation you are doing.

For FPU -- the size of the value is 80 bits. This is a "double" float. 64 bits for the "mantissa", and 16 bits for the exponent.

Craze Frog · Post by **Craze Frog** » Sun Oct 12, 2008 9:52 am

Values on the FPU are 80 bits. This is an "extended" float. Double floats are 64 bits, single floats are 32 bits.

Here is how to easily do huge but finite-precision integers in asm:
http://net.pku.edu.cn/~course/cs201/200 ... hmetic.pdf

Solar · Post by **Solar** » Sun Oct 12, 2008 10:48 am

Erm... where to begin?

I think it's best not to tell you "it cannot be done because...", but rather point a couple of things out that you should research before you continue down the road you chose.

normalized vs. denormalized numbers,
NaNs, especially signalling NaNs,
Infinity values,
FPU exceptions.

The x86 can handle floating point numbers up to 80 bits, but IEEE 754 defines "doubles" as 64 bit (which can optionally be set on the x86).

JamesM · Post by **JamesM** » Sun Oct 12, 2008 10:58 am

FWIW, My understanding was that even though the x87 FPU uses an 80bit representation internally, it's just that - internal. An IEEE754 float/double-precision float is converted into an intermediate representation with a larger mantissa so that rounding errors during operations are reduced. That 80bit value is then truncated again back into a float/double.

So although it's 80bit, those extra 16 bits are only there to avoid rounding errors, not to support larger numbers.

01000101 · Post by **01000101** » Sun Oct 12, 2008 1:27 pm

@JamesM: What is FWIW?? "From What I ..." ?

I have started reading the PDF provided earlier and it seems to address the exact issue I'm having related to calculating huge integers. Using multiple 32-bit values, it is not that hard to create psuedo 128+ integers using the array. I already have a working addition function that adds variable bit integers together.

I really thought that since the "long double" type had a very large exponent container, that it would somehow produce the actual fully extended integer, eg: 123456^1234.

Brynet-Inc · Post by **Brynet-Inc** » Sun Oct 12, 2008 6:59 pm

01000101 wrote:@JamesM: What is FWIW?? "From What I ..." ?

It means "For what it's worth", please learn to Google.

Thank you.

01000101 · Post by **01000101** » Sun Oct 12, 2008 11:39 pm

Sorry for my lack of enthusiasm to Google every acronym I see online, if I did that, I would never be off line.

Anyways,
I need some help understanding how to implement the multiplication and division functions. The addition and subtraction functions were pretty straight forward as far as carry usage goes, but these two are seemingly far more complex of beasts.

The examples in the previously posted PDF shows a basic representation of the action, but I believe it is very intertwined with the specified bit length that the author has picked (128).

Does anyone know of any basic examples of these using variable lengths? I have been searching for a while and have come up with pretty much nothing, I even searched for pseudo-code just to get a glimpse of a basic implementation. I could just (for multiplying) use the addition function in a loop, but that seems to be very inefficient.

bontanu · Post by **bontanu** » Mon Oct 13, 2008 12:11 am

JamesM : you are wrong.

Combuster · Post by **Combuster** » Mon Oct 13, 2008 4:00 am

@01000101: for basic RSA you only need an large-width add, multiply (exponent is a series of multiplications) and modulus operation. I don't know an algorithm for computing large primes though, but since its integer maths I doubt you'll need floating point support for that.
Try SIMD stuff instead

bontanu wrote:JamesM : you are wrong.

Proof wanted.

AJ · Post by AJ » Mon Oct 13, 2008 4:02 am

bontanu wrote:JamesM : you are wrong.

This is not a very helpful comment. I would also like to see proof and even if you are correct, it would have been much more helpful to anyone reading this thread if you explained how JamesM is wrong.

Cheers,
Adam

Craze Frog · Post by **Craze Frog** » Mon Oct 13, 2008 7:22 am

AJ wrote:
bontanu wrote:JamesM : you are wrong.
This is not a very helpful comment. I would also like to see proof and even if you are correct, it would have been much more helpful to anyone reading this thread if you explained how JamesM is wrong.

Cheers,
Adam

It's not very helpful to post misleading information, either (even though it was just his understanding).

Code: Select all

fld tword [my_extended_var]
fstp tword [my_extended_var]

Those loads and stores 80-bit floats. How can 80-bit floats possibly be only internal to the fpu, when they can be loaded and stored to and from memory???

Double precision floats have an 11-bit exponent and a 52-bit significand. Extended precision floats have a 15-bit exponent and a 64-bit significand. Obviously these numbers can be bigger, anything else would be nonsense.

JamesM · Post by **JamesM** » Mon Oct 13, 2008 7:38 am

Craze Frog wrote:
AJ wrote:
bontanu wrote:JamesM : you are wrong.
This is not a very helpful comment. I would also like to see proof and even if you are correct, it would have been much more helpful to anyone reading this thread if you explained how JamesM is wrong.

Cheers,
Adam
It's not very helpful to post misleading information, either (even though it was just his understanding).
Code: Select all
fld tword [my_extended_var]
fstp tword [my_extended_var]
Those loads and stores 80-bit floats. How can 80-bit floats possibly be only internal to the fpu, when they can be loaded and stored to and from memory???

Double precision floats have an 11-bit exponent and a 52-bit significand. Extended precision floats have a 15-bit exponent and a 64-bit significand. Obviously these numbers can be bigger, anything else would be nonsense.

Apologies, I was confusing the x86's FPU with the PowerPC, or SPARC, or something. That definately applies to one mainstream architecture - I thought it was x86. The information posted wasn't deliberately misleading!

James

Brendan · Post by **Brendan** » Mon Oct 13, 2008 7:57 am

Hi,

01000101 wrote:I need some help understanding how to implement the multiplication and division functions. The addition and subtraction functions were pretty straight forward as far as carry usage goes, but these two are seemingly far more complex of beasts.

For multiplication, back in school you would've been taught how to multiply 2 large numbers. For example:

Code: Select all

    1 2 3 4
*       1 2
-----------
    2 4 6 8
+ 1 2 3 4 0
-----------
= 1 4 8 0 8

This is "base 10" mathmatics, where there's 10 digits. The thing is, it works the same in any base. What if you used "base 256" instead of "base 10"?

Code: Select all

       0x01 0x02 0x03 (66051 in decimal)
*           0x01 0x02 (258 in decimal)
---------------------
       0x02 0x04 0x06 (132102 in decimal)
+ 0x01 0x02 0x03 0x00 (16909056 decimal)
---------------------
= 0x01 0x04 0x07 0x06 (17041158 decimal)

See how it works the same?

You could multiply a pair of 256-bit integers together using this method without much trouble, but with a 32-bit CPU why use "base 256"? You could just as easily use "base 4294967296". With "base 4294967296" working on 256-bit integers, you'd actually be multiplying an 8-digit number with another 8-digit number, which works out to a maximum of 64 smaller (32-bit) multiplications, and the final 512-bit addition would be done with sixteen 32-bit additions (using the "ADC" instruction). Note: when multiplying 256-bit integers together, the result will be a 512-bit integer (or a smaller integer that could overflow).

Also, back in school you would've been taught "long division". For example:

Code: Select all

         1 2 3 4      <-answer
 1 2 ) 1 4 8 0 9
     - 1 2
      ----
         2 8
       - 2 4
        ----
           4 0
         - 3 6
          ----
             4 9
           - 4 8
            ----
             0 1         <- remainder

This also works in any number base, including "base 4294967296".

Lastly, for shifting there's a pair of instructions "SHLD" and "SHRD" which shift N bits from the source operand into the destination operand. For example, if EAX = 0x01234567 and EBX = 0x89ABCDEF, then after doing "SHLD eax, ebx, 16" you'd end up with EAX = 0x456789AB. This saves you the hassle of shifting one bit at a time in a loop.

However, if the shift count is too large you'll need to "pre-shift". For example, if you're operating on a string of dwords and the shift count is larger than 32, then you'd need to use "rep movsd" to pre-shift it by "N / 32 bits" and then use "shift count = shift_count & 0x1F" with SHLD/SHRD.

Also note that it's possible to use addition and shifting to implement multiplication; and subtraction, comparison and shifting to implement division. This is actually the exact same thing as above, but using "base 2". AFAIK this is the method that CPU's use internally to implement multiplication and division.

For performance reasons you'd probably want to use the largest base you can. For e.g. it's better to multiply a pair of 8-digit "base 4294967296" numbers together than it is to multiply a pair of 32-digit "base 256" numbers together or a pair of 256-digit "base 2" numbers (even though you'd get identical results).

Cheers,

Brendan

Craze Frog · Post by **Craze Frog** » Mon Oct 13, 2008 12:22 pm

JamesM wrote:Apologies, I was confusing the x86's FPU with the PowerPC, or SPARC, or something. That definately applies to one mainstream architecture - I thought it was x86. The information posted wasn't deliberately misleading!

James

It could well be that you got that information presented as correct, as loading and storing 80-bit floats is not normal practice, so whoever told you simply didn't know it was possible. It is usually regarded as an internal format since it's seldom stored to memory. But that doesn't mean it can't be stored to memory.

OSDev.org

float/double/long double Implementation

float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation

Re: float/double/long double Implementation