comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Hist_nd 3D +1 gridding / binning data [message #94249] Mon, 06 March 2017 14:42
clement.feller@obspm.
Hello everyone,

To get straight to the point: I had a problem and found a solution, but I am writing to ask for your comments and advice.

Looking through the posts, I have read several referring to the use of hist_nd or of reverse_indices. I have also found solutions using rebin/interpol for (longitude, latitude, temperature) problems, or referring to ngp and cic (from the astron library) or grid3.
But I found no explicit reference to the concurrent binning of 3 independent variables and 1 associated quantity. Do correct me if I'm mistaken on that point, but in the meantime here's what I came up with.

From images, I have assembled a large table (4 columns of single-precision floats and about 160 million lines), i.e. 3 independent variables and 1 quantity, which I will later use to invert a radiative transfer model with MPFIT.
Given hardware limitations, I sought to bin/resample/grid the data. Hence the following lines:

; col1..col4 each have dimensions [1,P], so their concatenation is a 4xP array
density = hist_nd([col1, col2, col3, col4], nbins=50, $
                  reverse_indices=ri)
index = where(density ne 0, cts)   ; find the non-empty bins

newcol1 = fltarr(cts)   ; a better way to allocate memory than density*0.
newcol2 = newcol1
newcol3 = newcol1
newcol4 = newcol1

for ijk=0L, cts-1L do begin
   ; ri[first:last] holds the indices of the points falling in this bin
   first = ri[index[ijk]]
   last  = ri[index[ijk]+1L]-1L   ; renamed from "stop" to avoid shadowing the STOP procedure
   members = ri[first:last]       ; look the members up once, reuse for every column
   newcol1[ijk] = mean(col1[members])
   newcol2[ijk] = mean(col2[members])
   newcol3[ijk] = mean(col3[members])
   newcol4[ijk] = mean(col4[members])
endfor
..... save data and move on to the next task

It takes about 15-20 secs to do the hist_nd task using 4 threads on an Intel Core i5-3230M CPU (3rd gen) @ 2.60GHz, which is pretty awesome.
But the averaging takes a few hours and maxes out the CPU the whole time.

Since my initial data are images, I binned them down to a 512x512 size (a fourfold reduction) and ended up with a table of 8.5 million lines instead.
In this case, hist_nd takes less than a second and the averaging takes about 15 minutes.

Do you have any advice, or have you ever tried to do this kind of task in a different way?

I look forward to reading your replies.
/C.

PS: For the Python enthusiasts out there who don't know it already, I found out that such a task can be done with the scipy.stats.binned_statistic_dd function.
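
For what it's worth, here is a minimal sketch of that SciPy route on made-up data. It assumes the cells are defined by the 3 independent variables only and that the 4th column is simply averaged in each cell, which is not quite the same as the 4-column hist_nd call above. The toy values, the 2-bins-per-axis grid and the variable names are all illustrative.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Toy stand-in for the real table: 4 points, 3 independent variables each,
# plus 1 associated quantity per point (all values invented for illustration).
coords = np.array([[0.1, 0.1, 0.1],
                   [0.2, 0.2, 0.2],
                   [0.9, 0.9, 0.9],
                   [0.8, 0.8, 0.8]])
quantity = np.array([1.0, 3.0, 10.0, 20.0])

# Average the quantity in every cell of a 2x2x2 grid spanning [0, 1] on each axis.
result = binned_statistic_dd(coords, quantity, statistic="mean",
                             bins=2, range=[(0.0, 1.0)] * 3)

cell_means = result.statistic        # shape (2, 2, 2); NaN marks empty cells
occupied = ~np.isnan(cell_means)     # mask of non-empty cells
binned_quantity = cell_means[occupied]
```

Empty cells come back as NaN with statistic="mean", so the mask plays the role of the where(density ne 0) step in the IDL version.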


Disclaimer: What pushed me to post and spell out this solution is that I was slow to grasp, from JD Smith's histogram tutorial and the documentation of hist_2d and hist_nd, that the reverse_indices vector has to be applied to *each* and *every* variable to get your data properly binned.
Yes, in the end it's glaringly obvious, but to quote JD, “one needs to learn to flex his/her histogram muscle.”
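
To make that point concrete for readers who have not met reverse_indices yet, here is a small pure-Python emulation of its layout, written as a sketch and not as IDL code. The helper names reverse_indices and bin_means are hypothetical, and the bin ids are assumed to be already computed. The first nbins+1 entries are offsets into the vector itself, and the rest are the data indices grouped by bin, so one vector serves every column.

```python
def reverse_indices(bin_ids, nbins):
    """Emulate the layout of IDL's REVERSE_INDICES for pre-computed bin ids."""
    counts = [0] * nbins
    for b in bin_ids:
        counts[b] += 1
    # Offsets part: entry b points to where bin b's members start inside ri.
    offsets = [nbins + 1]
    for c in counts:
        offsets.append(offsets[-1] + c)
    # Members part: data indices grouped by bin (stable sort keeps input order).
    grouped = sorted(range(len(bin_ids)), key=lambda i: bin_ids[i])
    return offsets + grouped


def bin_means(column, ri, nbins):
    """Mean of one column in each non-empty bin, using a shared ri vector."""
    means = {}
    for b in range(nbins):
        first, past_last = ri[b], ri[b + 1]
        if past_last > first:                  # skip empty bins
            members = ri[first:past_last]
            means[b] = sum(column[i] for i in members) / len(members)
    return means


# Toy data: 5 points already assigned to 3 bins (bin 1 is empty).
bin_ids = [2, 0, 2, 0, 2]
col_a = [10.0, 1.0, 20.0, 3.0, 30.0]
col_b = [5.0, 7.0, 5.0, 9.0, 5.0]

ri = reverse_indices(bin_ids, nbins=3)
means_a = bin_means(col_a, ri, 3)   # the same ri is applied to each column
means_b = bin_means(col_b, ri, 3)
```

Note that Python slices exclude the upper bound, so ri[first:past_last] here corresponds to ri[ri[i]:ri[i+1]-1] in IDL.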