comp.lang.idl-pvwave archive: archive » Re: converting floats to doubles

Re: converting floats to doubles [message #44102]

Fri, 20 May 2005 14:35

Michael Wallace
Messages: 409
Registered: December 2003

Senior Member

>> dblarr(n) is the same as double(fltarr(n)). The fltarr(n) will create
>> n many floating point numbers, all of which are 0. Converting all
>> these floats into doubles will yield a double array where all values
>> are 0 and this is the very same thing as dblarr(n).
>
>
> Actually I meant dindgen(n) vs. double(findgen(n)) in the first place. I
> always mix them up ... But I think I got the point.

Ah, okay. Same logic still applies.

Because doubles have the same structure as floats, but more capacity,
the set of all possible double precision values on a particular
architecture is a superset of the set of all possible single precision
(float) values on the same architecture. That, in short, was what all
my rambling before was trying to say. I have never been one to really
understand what the word "concise" means. :-)
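
As a rough illustration of the superset claim, here is a sketch in Python
rather than IDL, using only the standard struct module. The helper names and
the sampling stride are assumptions made for this example. It walks through a
sample of 32-bit patterns, reads each one as a float, lets Python widen it to
a double, and narrows it back. If every float value is also a double value,
each pattern should survive the round trip unchanged:

```python
import struct

def bits_to_float(bits):
    """Read a 32-bit integer as an IEEE 754 single; Python stores it as a double."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float_to_bits(value):
    """Narrow a Python float (a double) to single precision and return its 32 bits."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

checked = 0
mismatches = 0
# Sample the 2**32 possible bit patterns with an arbitrary stride (assumption).
for bits in range(0, 2**32, 65537):
    as_double = bits_to_float(bits)
    if as_double != as_double:
        continue  # skip NaNs; their payload bits need not survive conversion
    checked += 1
    if float_to_bits(as_double) != bits:
        mismatches += 1

print(checked, "patterns checked,", mismatches, "changed by the round trip")
```

This only samples the float bit patterns, so it is evidence rather than a
proof; dropping the stride would check all of them at the cost of run time.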

-Mike


Re: converting floats to doubles [message #44103 is a reply to message #44102]

Fri, 20 May 2005 13:58

Benjamin Hornberger
Messages: 258
Registered: March 2004

Senior Member

Michael Wallace wrote:
>
> dblarr(n) is the same as double(fltarr(n)). The fltarr(n) will create n
> many floating point numbers, all of which are 0. Converting all these
> floats into doubles will yield a double array where all values are 0 and
> this is the very same thing as dblarr(n).

Actually I meant dindgen(n) vs. double(findgen(n)) in the first place. I
always mix them up ... But I think I got the point.

Thanks,
Benjamin


Re: converting floats to doubles [message #44104 is a reply to message #44103]

Fri, 20 May 2005 13:52

Michael Wallace
Messages: 409
Registered: December 2003

Senior Member

First, let me assure you that when you convert any number from a float
to a double, there is absolutely no change in value. When you print out
a value and do not supply a specific format, IDL will show a different
number of decimal places depending on whether the number is a float or
double. Here's an example of what I mean:

IDL> print, float(3), double(3)
3.00000 3.0000000

The above results differ only because IDL's default format for floats is
different from its default format for doubles.

Floating point numbers cannot represent every real number exactly. A
floating point number is composed of a sign bit, an exponent and a
mantissa. These three values are fed into an equation which produces
the actual floating point number we see. This is why I can say that
converting a number from a float to a double doesn't change anything.
The sign bit is copied directly, the exponent is re-encoded for the
double's wider exponent field without changing its value, and the
float's mantissa bits are copied into the leading bits of the double's
mantissa. The extra bits in the double are just left at 0.
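
The zero padding can be made visible with a small sketch. This is Python, not
IDL, and it uses only the standard struct module; the helper names and the
choice of 0.1 as the test value are assumptions for illustration. A double has
52 stored mantissa bits and a float has 23, so a float widened to a double
should have its lowest 29 mantissa bits clear, while 0.1 computed directly in
double precision should not:

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

single = to_float32(0.1)       # 0.1 rounded to single precision, held as a double
widened = double_bits(single)  # bit pattern after the float -> double conversion
native = double_bits(0.1)      # 0.1 stored in double precision from the start

LOW_29 = (1 << 29) - 1         # the 52 - 23 mantissa bits a float cannot supply
low_bits_widened = widened & LOW_29
low_bits_native = native & LOW_29

print(f"{widened:064b}  float widened to double")
print(f"{native:064b}  native double")
print("low 29 bits of widened value are zero:", low_bits_widened == 0)
```

Comparing the two printed patterns also shows that the exponent field differs
in encoding between the formats even though the represented value is the same.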

dblarr(n) is the same as double(fltarr(n)). The fltarr(n) will create n
many floating point numbers, all of which are 0. Converting all these
floats into doubles will yield a double array where all values are 0 and
this is the very same thing as dblarr(n).

sqrt(dblarr(n)) is not computed the same way as double(sqrt(fltarr(n))).
(For arrays of zeros the results happen to agree, since sqrt(0) is
exactly 0, but the difference matters for general values.) In the
latter, you are taking the sqrt of the floating point numbers first.
Precision will be lost because the float mantissa is only capable of
storing so much information. If you take this value and cast it into a
double, the less precise float value is preserved and the other bits of
the double's mantissa are left at 0. Had you done sqrt(dblarr(n)), the
sqrt operation would have been calculated using double precision
arithmetic and the entire mantissa of the double would be available.
Because the double's mantissa is larger than the float's mantissa, it is
able to store more precision.
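
A minimal sketch of the difference, in Python rather than IDL, with a nonzero
input so the effect shows up. The helper name is hypothetical, and
single-precision sqrt is only emulated here by rounding the double result to a
float, which is an assumption of the example rather than a description of
what IDL does internally:

```python
import math
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Emulates double(sqrt(float(2))): the root is rounded to single precision,
# then widened back to a double.
single_then_widened = to_float32(math.sqrt(to_float32(2.0)))

# Emulates sqrt(double(2)): the root is carried in double precision throughout.
double_throughout = math.sqrt(2.0)

difference = abs(double_throughout - single_then_widened)

print(f"{single_then_widened:.17f}  float sqrt, then widened")
print(f"{double_throughout:.17f}  double sqrt")
print(f"{difference:.3e}  absolute difference")
```

The two values agree only to roughly seven significant digits, which is all a
float can carry; widening afterwards cannot restore the digits that were
rounded away.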

A lot of the gory details of IEEE 754, the specification of floating
point numbers, can be found here: http://en.wikipedia.org/wiki/IEEE_754.

-Mike

Benjamin Hornberger wrote:
> Hi computation gurus,
>
> is dblarr(n) equivalent in precision to double(fltarr(n))? I know that
> in a case like sqrt(dblarr(n)) vs. double(sqrt(fltarr(n))), they are not
> equivalent (the second version is not true double precision). But I
> thought when I start with whole numbers anyway, it might be the case.
>
> In other words, when a floating point number is converted to double, are
> the additional digits always set to zero, or is it possible that they
> aren't? I tried it out by printing some numbers, and it looks like they
> add only zeroes, but I would be happy if the experts could confirm.
>
> Thanks for any insight,
>
> Benjamin


Re: converting floats to doubles [message #44105 is a reply to message #44104]

Fri, 20 May 2005 13:17

Dick Jackson
Messages: 347
Registered: August 1998

Senior Member

Hi,

"Benjamin Hornberger" <benjamin.hornberger@stonybrook.edu> wrote in message
news:428e3721_4@marge.ic.sunysb.edu...
> Hi computation gurus,
>
> is dblarr(n) equivalent in precision to double(fltarr(n))? I know that in
> a case like sqrt(dblarr(n)) vs. double(sqrt(fltarr(n))), they are not
> equivalent (the second version is not true double precision). But I
> thought when I start with whole numbers anyway, it might be the case.
>
> In other words, when a floating point number is converted to double, are
> the additional digits always set to zero, or is it possible that they
> aren't? I tried it out by printing some numbers, and it looks like they
> add only zeroes, but I would be happy if the experts could confirm.

You're right, it's good to be careful about these things, but indeed there
are a lot of integers that are precisely correct in Float (and even more in
Double). Empirically:

;; Run a loop until the Double version of an integer is not equal to
;; the Float version (this took several seconds to run)

IDL> for i=0D,1D9 do if i ne Double(Float(i)) then break

;; Variable 'i' (Double) has the first mismatch...

IDL> print,i,format='(F20.10)'
16777217.0000000000
IDL> print,Float(i),format='(F20.10)'
16777216.0000000000

Well, look at that, the number of good integer Floats is 256^3:
IDL> print,256L*256*256
16777216
... which makes all kinds of sense, as a Float effectively has 3 bytes
(24 bits) for the mantissa (or significand): 23 stored bits plus an
implicit leading 1. For more, see:

http://en.wikipedia.org/wiki/Floating_point
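
For readers without IDL at hand, the same experiment can be sketched in
Python with the standard struct module. The helper name and the window size
are assumptions for this example. To keep it fast it scans only a window
around 2^24 instead of counting up from zero, so it confirms where the first
mismatch falls within that window rather than repeating the full loop above:

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

LIMIT = 2**24  # 16777216, the same number as 256**3

first_mismatch = None
# Scan a window around 2**24; the width of 1000 is an arbitrary choice.
for i in range(LIMIT - 1000, LIMIT + 1000):
    if float(i) != to_float32(float(i)):
        first_mismatch = i
        break

print(first_mismatch)                     # 16777217
print(to_float32(float(first_mismatch)))  # 16777216.0
```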

Cheers,
--
-Dick

Dick Jackson / dick@d-jackson.com
D-Jackson Software Consulting / http://www.d-jackson.com
Calgary, Alberta, Canada / +1-403-242-7398 / Fax: 241-7392


Re: converting floats to doubles [message #44194 is a reply to message #44103]

Sun, 22 May 2005 16:01

Peter Mason
Messages: 145
Registered: June 1996

Senior Member

Benjamin Hornberger wrote:
<...>
> Actually I meant dindgen(n) vs. double(findgen(n)) in the first
> place. I always mix them up ... But I think I got the point.

Hi Benjamin,

A single-precision float (as in findgen()) has a mantissa that's effectively
24 bits. Consequently it represents every integer in the range
[-16777216, 16777216] *exactly*. So if your "n" in "findgen(n)" is no larger
than 16777217 (so that the largest element is 16777216 = 2^24), it doesn't
matter whether you use double(findgen(n)) or dindgen(n).
For bigger "n", you will start getting repeat values if you use
double(findgen(n)) as the float mantissa will have run out of bits and the
double conversion happens after the fact. Here, dindgen() is better. The
double mantissa is much bigger (effectively 53 bits) so dindgen ~should~
only start hitting repeats for "n" >= 2^53. (I can't actually check if the
algorithm within dindgen() behaves properly outside the 32-bit unsigned
integer range. Dindgen(2LL^32LL) alone would be a 32GB array! :-))
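
The repeats can be previewed without allocating a huge array. The following
is a sketch in Python, not IDL, and it assumes that each element behaves like
its index rounded to single precision; how findgen() actually fills the array
internally is not something this example verifies. The helper name and the
eight-element window are also choices made for illustration:

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

start = 2**24 - 2  # begin just below the last exactly representable run

# Emulates double(findgen(n)) near index 2**24: round each index to a float.
from_float = [to_float32(float(i)) for i in range(start, start + 8)]

# Emulates dindgen(n): the same indices held in double precision.
from_double = [float(i) for i in range(start, start + 8)]

repeats_float = len(from_float) - len(set(from_float))
repeats_double = len(from_double) - len(set(from_double))

for f, d in zip(from_float, from_double):
    print(f"{d:12.1f} -> {f:12.1f}")
print("repeated values via float:", repeats_float)
print("repeated values via double:", repeats_double)
```

Past 2^24 the float column can only step in units of 2, so neighbouring
indices collapse onto the same value, while the double column keeps every
index distinct.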

Peter Mason

