comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Home » Public Forums » archive » Re: Histogram indeterminate results
Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend 
Switch to threaded view of this topic Create a new topic Submit Reply
Re: Histogram indeterminate results [message #33188] Thu, 12 December 2002 15:12 Go to next message
K. Bowman is currently offline  K. Bowman
Messages: 330
Registered: May 2000
Senior Member
In article <ataqhb$prr$1@skates.gsfc.nasa.gov>,
thompson@orpheus.nascom.nasa.gov (William Thompson) wrote:

> I think the problem is that NBINS should be
>
> nbins = nx*ny*nz + 1
>
> to accommodate points which fall on exactly the maxima, i.e. points with
> x=360, y=1, and z=1000, or so close that round-off error makes it look that
> way. I would change your code to read

That is plausible, but the actual values of the input to histogram at
the time the error occurred are clearly not out of the range [0, nbins]

>> IDL> n = nx*ny*LONG((z0 - z_min)/dz) + $
>> IDL> nx*LONG((y0 - y_min)/dy) + $
>> IDL> LONG((x0 - x_min)/dx)
>> IDL> print, min(n), max(n), nbins
>>
>>
>> 30 143991 144000

> P.S. I would also check that there isn't some problem with the way the
> keyword
> SINE_LAT is handled.

I was worried about round-off error, in that calculation in particular,
but as the diagnosis shows, 0 is LE n and LT nbins.

The really disturbing thing is that running the same program repeatedly
on the same data generates errors in different places.

Ken
Re: Histogram indeterminate results [message #33193 is a reply to message #33188] Thu, 12 December 2002 12:11 Go to previous messageGo to next message
thompson is currently offline  thompson
Messages: 584
Registered: August 1991
Senior Member
Kenneth:

I think the problem is that NBINS should be

nbins = nx*ny*nz + 1

to accommodate points which fall on exactly the maxima, i.e. points with
x=360, y=1, and z=1000, or so close that round-off error makes it look that
way. I would change your code to read

n = nx*ny*LONG((z0 - z_min)/dz) + $
nx*LONG((y0 - y_min)/dy) + $
LONG((x0 - x_min)/dx)
if (MIN(n) lt 0) OR (MAX(n) GE nbins) THEN $
MESSAGE, 'Some data out of range'
hh = HISTOGRAM(n, MIN = 0, BINSIZE = 1, NBINS = nbins)

(If you lengthen NBINS by one, then change the "GE" to "GT".) That way, you
can isolate whether the problem is within HISTOGRAM or in the data that you're
passing to it.

William Thompson

P.S. I would also check that there isn't some problem with the way the keyword
SINE_LAT is handled.


Kenneth Bowman <k-bowman@null.tamu.edu> writes:

> Since HISTOGRAM is undoubtedly the most important procedure in all of
> IDL, I thought many folks would be interested in this problem. I'll be
> happy if someone tells me that the fault lies with my code rather than
> IDL. I am sending a problem report to RSI.

> Ken Bowman


> This is IDL 5.6 under Mac OS X 10.2.

> My program reads a large number of data files containing the positions
> of points in a finite 3-D volume (x,y,z). It uses histogram to find the
> number of particles in discrete boxes (bins) within the 3-D volume. To
> check the calculations, after each file I compare the number of
> particles with the cumulative total of the histogram. If they do not
> match, the program stops and issues an error message. This is a check
> to make sure the input points are all in the volume.

> Repeatedly running the program on the same input files results in the
> program stopping at different points in the execution (that is, while
> processing different input files).


> Here is the basic code:

> h = LONARR(nx, ny, nz)
> nbins = nx*ny*nz
> FOR ifile = 0, nfiles-1 DO BEGIN
> iid = NCDF_OPEN(infile[ifile])
> NCDF_VARGET, iid, x_name, x0
> NCDF_VARGET, iid, y_name, y0
> NCDF_VARGET, iid, z_name, z0
> NCDF_CLOSE, iid
> np = N_ELEMENTS(x0)

> IF KEYWORD_SET(sine_lat) THEN y0 = SIN(!DTOR*y0)

> hh = HISTOGRAM(nx*ny*LONG((z0 - z_min)/dz) + $
> nx*LONG((y0 - y_min)/dy) + $
> LONG((x0 - x_min)/dx), $
> MIN = 0, BINSIZE = 1, NBINS = nbins)

> IF (ROUND(TOTAL(hh, /DOUBLE)) NE np) THEN $
> MESSAGE, 'Some particles not counted in histogram.'

> h = h + hh
> ENDFOR



> Here is the output and some diagnostics. The program stops at the
> MESSAGE statement in the FOR loop above.

> % MEAN_HIST_XYZ: Some particles not counted in histogram.

> IDL> print, ROUND(TOTAL(hh, /DOUBLE)), np
> 399999 400000

> The cumulative histogram is less than the number of input points.

> First I check to make sure that the input values are in the correct
> range, x = {0, 360}, y = {-1, 1}, and z = {0, 1000}.

> IDL> print, min(x0), max(x0), min(y0), max(y0), min(z0), max(z0)
> 0.00134277 359.999 -0.999998 0.999993 3.78302e-06
> 1000.00
>
> The input values look OK.

> Next I check to be sure that I am computing the bin indices correctly:

> IDL> n = nx*ny*LONG((z0 - z_min)/dz) + $
> IDL> nx*LONG((y0 - y_min)/dy) + $
> IDL> LONG((x0 - x_min)/dx)
> IDL> print, min(n), max(n), nbins
>
> 30 143991 144000
>
> These are also in the correct range, {0, 144000}.

> Finally, I re-compute the histogram. (Sorry, I should have saved the
> old one for comparison.)

> IDL> hh = HISTOGRAM(nx*ny*LONG((z0 - z_min)/dz) + $
> IDL> nx*LONG((y0 - y_min)/dy) + $
> IDL> LONG((x0 - x_min)/dx), $
> IDL> MIN = 0, BINSIZE = 1, NBINS = nbins)
> IDL> print, ROUND(TOTAL(hh, /DOUBLE)), np
>
> 400000 400000


> This time it (apparently) counts the particles correctly.

> It does not appear to be an I/O problem (no I/0 since the error
> occurred). It looks like histogram is producing indeterminate results.
Re: Histogram indeterminate results [message #33250 is a reply to message #33188] Sat, 14 December 2002 07:29 Go to previous message
Wolf Schweitzer is currently offline  Wolf Schweitzer
Messages: 21
Registered: October 2001
Junior Member
I can't offer help, but I can offer another view on this which relates
to similar errors of IDL.

A while ago, I was intrigued about not finding the correct solution for
a couple of trigonometric operations. My results were OFF! I used
textbooks. I used Mathematica. I used paper and pencil. I used IDL break
points and error visualization. And lo and behold, I found the problem.

Now know that mathematical results can be *simply* numerically unstable
if calculated in the wrong data type on IDL.

My solution is to test code items in critical ranges and go down to that
level of numerical stability / instability, and have errors or
confindence levels, typed out and included in your operations where
applicable (some calls to 'minimum' functions, for example, should add
for slack of about 3* machine epsilon), and use appropriate data types
if necessary. It may mean to alter routines or functions called by your
code.

I visualized a crass example here:
http://www.swisswuff.ch/pnphoenix721/html/modules.php?op=mod load&name=News&file=article&sid=11&mode=thre ad&order=0&thold=0

This example is nothing else but trying to find the center of a circle
in 3 3d-points, by cutting the planes through the midpoints of the
triangle sides (result: centerline) with the triangle plane.

What you see here is the error between floating point ('wrong') and
double precision ('correct') solutions. The closer two triangle points
of the initial triangle are, the larger the error in getting the
centerpoint using floating point variables. Of course, double precision
variables also, eventually, will reach a similar limit.

Does this, in any abstract way, relate to your problem?

Wolf Schweitzer
http:/www.swisswuff.ch


Kenneth Bowman wrote:
> In article <ataqhb$prr$1@skates.gsfc.nasa.gov>,
> thompson@orpheus.nascom.nasa.gov (William Thompson) wrote:
>
>
>> I think the problem is that NBINS should be
>>
>> nbins = nx*ny*nz + 1
>>
>> to accommodate points which fall on exactly the maxima, i.e. points with
>> x=360, y=1, and z=1000, or so close that round-off error makes it look that
>> way. I would change your code to read
>
>
> That is plausible, but the actual values of the input to histogram at
> the time the error occurred are clearly not out of the range [0, nbins]
>
>
>>> IDL> n = nx*ny*LONG((z0 - z_min)/dz) + $
>>> IDL> nx*LONG((y0 - y_min)/dy) + $
>>> IDL> LONG((x0 - x_min)/dx)
>>> IDL> print, min(n), max(n), nbins
>>>
>>>
>>> 30 143991 144000
>>
>
>> P.S. I would also check that there isn't some problem with the way the
>> keyword
>> SINE_LAT is handled.
>
>
> I was worried about round-off error, in that calculation in particular,
> but as the diagnosis shows, 0 is LE n and LT nbins.
>
> The really disturbing thing is that running the same program repeatedly
> on the same data generates errors in different places.
>
> Ken
  Switch to threaded view of this topic Create a new topic Submit Reply
Previous Topic: Re: Proper use of assoc
Next Topic: Histogram indeterminate results

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ] [ PDF ]

Current Time: Wed Oct 08 13:39:00 PDT 2025

Total time taken to generate the page: 0.00665 seconds