comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Home » Public Forums » archive » Re: Speed-up of code
Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend 
Switch to threaded view of this topic Create a new topic Submit Reply
Re: Speed-up of code [message #67848] Tue, 25 August 2009 15:30
JDS is currently offline  JDS
Messages: 94
Registered: March 2009
Member
On Aug 25, 12:12 pm, Philip Elson <philipel...@googlemail.com> wrote:
>>   h = HISTOGRAM(day, REVERSE_INDICES=ri)
>>   nh = N_ELEMENTS(h)
>>   sortData = value[ri[nh+1:*]]
>>   totSortData = [0., TOTAL(sortData, /CUMULATIVE)]
>>   vec8 = totSortData[ri[1:nh]-nh-1] - $
>>          totSortData[ri[0:nh-1]-nh-1]
>>   print,vec8/h  ;May need to check for zero values first
>
> Aha! Thank you all very much!
>
> If I was to extend this further, any suggestions on the obtaining the
> maximum rather than the average of each bin?
>
> This would indeed provide some nice examples of gathering basic
> statistics using the histogram function!

The "dual histogram" method is useful for this. I'd use it like this:

h=histogram(day,REVERSE_INDICES=ri)
mx=fltarr(n_elements(h))
h2=histogram(h,OMIN=om,REVERSE_INDICES=ri2)
for k=long(om eq 0),n_elements(h2)-1 do begin
if h2[k] eq 0 then continue
bins=ri2[ri2[k]:ri2[k+1]-1] ;bins of the histogram with this
repeat count

if k+om eq 1 then begin
mx[bins]=value[ri[ri[bins]]]
endif else begin
targ=[h2[k],k+om]
points=ri[ri[rebin(bins,targ)]+rebin(lindgen(1,k+om),targ)]
mx[bins]=max(value[points],DIMENSION=2)
endelse
endfor

You'll note this does use a loop, but it's a loop over the number of
different bin occupation counts, i.e. a very small loop typically,
even with very large inputs. Please note that this is really only
useful compared to your 2nd example above only if you need speed at
the expense of clarity. You might want to set the array to NaN
originally, to notice any "missing" days in this example. Beware that
the cumulative total method of binning suffers greater round-off error
than the other methods, so you might be advised to use it with /
DOUBLE.

JD
Re: Speed-up of code [message #67856 is a reply to message #67848] Tue, 25 August 2009 09:18 Go to previous message
philipelson is currently offline  philipelson
Messages: 9
Registered: March 2009
Junior Member
> Those are the techniques I would have tried! Be careful: in your
> second example, you don't handle the case where the histogram bin h[i]
> is empty. You just need an "if h[i] GT 0" test there.

Wayne,

To fix the note you have added in your code

> print,vec8/h ;May need to check for zero values first

I added the line:

print,vec8/(h>1)

Which does the trick nicely.

Regards,

Philip
Re: Speed-up of code [message #67857 is a reply to message #67856] Tue, 25 August 2009 09:14 Go to previous message
philipelson is currently offline  philipelson
Messages: 9
Registered: March 2009
Junior Member
P.S. I promise not to move the goalposts again! :-)
Re: Speed-up of code [message #67858 is a reply to message #67857] Tue, 25 August 2009 09:12 Go to previous message
philipelson is currently offline  philipelson
Messages: 9
Registered: March 2009
Junior Member
> h = HISTOGRAM(day, REVERSE_INDICES=ri)
> nh = N_ELEMENTS(h)
> sortData = value[ri[nh+1:*]]
> totSortData = [0., TOTAL(sortData, /CUMULATIVE)]
> vec8 = totSortData[ri[1:nh]-nh-1] - $
> totSortData[ri[0:nh-1]-nh-1]
> print,vec8/h ;May need to check for zero values first


Aha! Thank you all very much!

If I was to extend this further, any suggestions on the obtaining the
maximum rather than the average of each bin?

This would indeed provide some nice examples of gathering basic
statistics using the histogram function!

Many Thanks,

Philip
Re: Speed-up of code [message #67860 is a reply to message #67858] Tue, 25 August 2009 08:42 Go to previous message
Craig Markwardt is currently offline  Craig Markwardt
Messages: 1869
Registered: November 1996
Senior Member
On Aug 25, 10:16 am, Philip Elson <philipel...@googlemail.com> wrote:
> Dear All,
>
> I have a question relating to the optimization of some code which
> averages an array based on the values in another array.
> Its much easier to explain in an example:
>
> day    = [ 1, 1, 2, 3, 3, 3, 3]
> value = [ 2, 4, 5, 2, 3, 2, 1]
>
> Which should return, depending on which is easier, either
> avg    = [ 3, 5, 2]
> or
> avg    = [ 3, 3, 5, 2, 2, 2, 2]
>
> This is fairly straightforward using a for loop, but how to do it in
> the IDL way?
>
> You can see two examples of the basic code below:
>
> ; ==================================
> ;                      FIRST EXAMPLE
> ; ==================================
> unique = uniq(day)
> avg = intarr(n_elements(unique))
> FOR i=0, n_elements(unique) -1 DO BEGIN
>   res = WHERE(day EQ day[unique[i]], count)
>   if count GT 0 THEN avg[i] = total(value[res],/DOUBLE) / count
> ENDFOR
> print, avg
>
> ; ==================================
> ;                     SECOND EXAMPLE
> ; ==================================
> h = histogram(day, REVERSE_INDICES=ri)
> avg = h*0
> FOR i=0, n_elements(h)-1 DO BEGIN
>   data_inds = ri[ri[i]:ri[i+1]-1]
>   avg[i] = total(value[data_inds],/DOUBLE) / h[i]
> ENDFOR
> print, avg
>
> At this stage I open the floor; I essentially want to achieve the
> results as above without the need for the for loop.
>
> My assumption is that the HISTOGRAM function will be helpful, but
> having spent quite some time on this I am beginning to think that it
> cannot be done - though I would love to be proved wrong by any
> histogram guru out there.

Those are the techniques I would have tried! Be careful: in your
second example, you don't handle the case where the histogram bin h[i]
is empty. You just need an "if h[i] GT 0" test there.

Craig
Re: Speed-up of code [message #67861 is a reply to message #67860] Tue, 25 August 2009 08:36 Go to previous message
wlandsman is currently offline  wlandsman
Messages: 743
Registered: June 2000
Senior Member
On Aug 25, 10:16 am, Philip Elson <philipel...@googlemail.com> wrote:


> My assumption is that the HISTOGRAM function will be helpful, but
> having spent quite some time on this I am beginning to think that it
> cannot be done - though I would love to be proved wrong by any
> histogram guru out there.

I think you want to read the "drizzling" article on David's Web page
(http://www.dfanning.com/code_tips/drizzling.html ). (Your question
differs only in that in you want to average the values rather than sum
them).

I think "Histogram plus cumulative total" usually had the winning
time:

h = HISTOGRAM(day, REVERSE_INDICES=ri)
nh = N_ELEMENTS(h)
sortData = value[ri[nh+1:*]]
totSortData = [0., TOTAL(sortData, /CUMULATIVE)]
vec8 = totSortData[ri[1:nh]-nh-1] - $
totSortData[ri[0:nh-1]-nh-1]
print,vec8/h ;May need to check for zero values first

--Wayne
Re: Speed-up of code [message #67865 is a reply to message #67861] Tue, 25 August 2009 08:01 Go to previous message
David Fanning is currently offline  David Fanning
Messages: 11724
Registered: August 2001
Senior Member
Philip Elson writes:

> My assumption is that the HISTOGRAM function will be helpful, but
> having spent quite some time on this I am beginning to think that it
> cannot be done - though I would love to be proved wrong by any
> histogram guru out there.

Just a note for any new members of the IDL newsgroup
out there. THIS is the way you ask a question that
gets results! :-)

Cheers,

David

--
David Fanning, Ph.D.
Coyote's Guide to IDL Programming (www.dfanning.com)
Sepore ma de ni thui. ("Perhaps thou speakest truth.")
  Switch to threaded view of this topic Create a new topic Submit Reply
Previous Topic: Histogram Too Many Array Elements
Next Topic: Re: Digits in IDL

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ] [ PDF ]

Current Time: Wed Oct 08 13:47:16 PDT 2025

Total time taken to generate the page: 0.00726 seconds