Re: Chunk Array Decimation [message #32420 is a reply to message #32363]
Wed, 09 October 2002 17:18
JD Smith
On Mon, 07 Oct 2002 08:51:21 -0700, Jaco van Gorkom wrote:
> "JD Smith" <jdsmith@as.arizona.edu> wrote in message
> news:pan.2002.10.04.22.07.45.664757.24772@as.arizona.edu...
>
> JD, we mortals need a tutorial or two in order to even begin to
> understand that Sparse Array trick you pulled there. My humble
> contribution to this thread is to propose an additional and hopefully
> simpler (or at least more intuitive) algorithm: let us first sort the data array
> based on the index array, and then use a cumulative total to do the
> chunking.
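For concreteness, that proposal in straight-SORT() form might look
something like this (my sketch, with hypothetical data and inds;
UNIQ marks the chunk boundaries):

s = sort(inds)                                 ; the expensive step
sortData = data[s]                             ; data ordered by chunk
totSortData = [0., total(sortData, /CUMULATIVE)]
u = uniq(inds[s])                              ; last element of each chunk
lo = [0L, u[0:n_elements(u)-2] + 1]            ; first element of each chunk
vec = totSortData[u+1] - totSortData[lo]       ; per-chunk totals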
Well, think of an array multiplication of a very large array on a very
long data vector. Consider only the first row: if almost everything
in that row is 0., save a few choice 1.'s, the product is the same as
selecting those elements of the data vector and totaling them. If you
really had to do all the null operations, like

...+0.*data[12]+0.*data[13]+...

then it would do you no good at all: all those useless
multiply-by-zeroes would waste far too much time to be efficient.
Fortunately, "sparse" arrays were invented for just this problem: you
specify only the non-zero elements, and, when multiplying using
SPRSAX, they alone are computed. If you adjust the array values, you
could obviously use this to do much more than totaling selected data
elements.
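Here's a little sketch of that trick with made-up data: an 8-element
vector cut into three chunks. SPRSIN wants a square array, and I'm
assuming the usual Numerical-Recipes row-times-vector convention for
SPRSAX; if your results come out transposed, hand SPRSIN the
TRANSPOSE of a instead:

data = findgen(8)                            ; 0., 1., ..., 7.
a = fltarr(8, 8)                             ; selection matrix, mostly 0.
a[0, 0] = [1., 1., 0., 0., 0., 0., 0., 0.]   ; chunk 0 <- data[0:1]
a[0, 1] = [0., 0., 1., 1., 1., 0., 0., 0.]   ; chunk 1 <- data[2:4]
a[0, 2] = [0., 0., 0., 0., 0., 1., 1., 1.]   ; chunk 2 <- data[5:7]
sa = sprsin(a)                               ; store only non-zero elements
vec = sprsax(sa, data)                       ; only those get multiplied
print, vec[0:2]                              ; 1.0  9.0  18.0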
> The above algorithm is not nearly fast enough because of the very slow
> SORT() operation. If this is replaced by a sorting routine which is more
> optimised for the problem at hand, such as, well, I'll let you guess;) ,
> then things get much better:
> h = HISTOGRAM(inds, REVERSE_INDICES=ri)
> nh = N_ELEMENTS(h)
> sortData = data[ inds[ri[nh+1:*]] ]
> totSortData = [0., TOTAL(sortData, /CUMULATIVE)]
> vec8 = totSortData[ri[1:nh]-nh-1] - $
>        totSortData[ri[0:nh-1]-nh-1]
Aha, a very interesting and compact submission. I think there's one
error there, though: should it not be data[ri[nh+1:*]], without the
inds? The reverse indices already point into data directly. I
translated it as:
h = histogram(inds, REVERSE_INDICES=ri)              ; one bin per chunk index
nh = n_elements(h)
tdata = [0., total(data[ri[nh+1:*]], /CUMULATIVE)]   ; running total of bin-ordered data
vec9 = tdata[ri[1:nh]-nh-1] - tdata[ri[0:nh-1]-nh-1] ; difference at bin boundaries
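A quick sanity check with made-up data and chunk indices (my sketch;
the explicit loop gives the reference totals):

n = 1000L
data = randomu(seed, n)
inds = long(50 * randomu(seed, n))                   ; chunk indices 0..49
h = histogram(inds, REVERSE_INDICES=ri)
nh = n_elements(h)
tdata = [0., total(data[ri[nh+1:*]], /CUMULATIVE)]
vec9 = tdata[ri[1:nh]-nh-1] - tdata[ri[0:nh-1]-nh-1]
vecRef = fltarr(nh)                                  ; brute-force reference
for i = 0L, n-1 do vecRef[inds[i]] = vecRef[inds[i]] + data[i]
print, max(abs(vec9 - vecRef))                       ; non-zero: roundoff at work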
The biggest problem with this method, though, is roundoff error, which
accumulates like mad in a cumulative total of any length. If you do
it in floating point, as you've written, I find unacceptably large
roundoff errors for as few as 500 individual indices. If I convert
the addition to DOUBLE, this mitigates (but does not eliminate) the
roundoff errors, but then it erases the slight time advantage this
method has over the prior champ (the dual histogram). Neither, of
course, comes close to the compiled DLM :(.
Thanks for the example.
JD