2D Operations in 3D Data Range

QUESTION: Consider that I have a 3D array with the dimensions labeled X, Y and Z. And consider that I have a list of XY pairs and I want to perform some kind of operation (add them, smooth them, find the median value of them, etc.) on these XY pairs for some data range in Z (say, from zstart to zend). How can find the indices of the points I want so I can perform the operation as an array operation, rather than doing the operation in a loop?

Here is an example. Suppose the operation I want to perform is to total these points. One way to get the result is like this. Suppose the following are my XY pairs in my array, named array.

    x = [3,2,3]    y = [4,1,7] 

And my Z range is from 1 to 3. My goal is to create the sum total, like this.

    sum = Total( array[3,4,1] + array[3,4,2] + array[3,4,3] + $
                array[2,1,1] + array[2,1,2] + array[2,1,3] + $
                array[3,7,1] + array[3,7,2] + array[3,7,3] ) 

I wish to come up with a method to do this in a general way.

I can think of one IDL Way to do this using simple dimensional juggling. Suppose we put the XY values into an npair-by-2 array named pair, like this:

    pair = [[x],[y]]
   npair = N_Elements(pair) / 2 

I would proceed as follows.

    zstart = 1    zend = 3    nz = zend - zstart + 1
   zindices = Rebin( Reform( zstart + Lindgen(nz), 1, nz ), npair, nz )
   xindices = Rebin( pair[*,0], npair, nz )
   yindices = Rebin( pair[*,1], npair, nz )
   sum = Total( array[xindices, yindices, zindices] ) 

The problem with this method is that if the number of XY pairs is large, then generating all of those 2D index arrays is wasteful of memory. Here is another solution with the same problem.

   sum = Total( array[pair[*,0], pair[*,1], zstart:zend] * $
         Rebin( Identity(npair), npair, npair, nz ) ) ) 

But, again, two intermediate npair*nz arrays are created. Is there a method that avoids creating these wasteful arrays?

ANSWER: The answer was provided by Dick Jackson, with a hint from Chris Beaumont, on the IDL Newsgroup.

The first step is to construct an array, so you can see what is going on.

    array = Transpose( IndGen(10, 10, 10), [2,1,0] ) 

Using this array, the value array[3,0,4] is equal to 304, for example. This will allow us to determine if we are subscripting the array properly.

The next step is to reform the array to a different shape temporarily.

    dims = Size( array, /DIMENSIONS )
   array = Reform( array, dims[0]*dims[1], dims[2], /OVERWRITE ) 

Next, we make a pairs index array and using that with the Z data range, we can immediately find the values (placed into the array, subset) that we are looking for. You can, of course, perform any operation you like on these values.

    pairIndicies = pair[*,0] + (pair[*,1] * dims[0])
   subset = array[pairIndicies, zstart:zend]
   array = Reform( array, dims, /OVERWRITE ) 

If we look closely at the subset value array, we see it contains all nine values we are looking for in a 3-by-3 array. And if we print the values, you can see that we have indexed the array properly.

    IDL> Help, subset 
        SUBSET          INT       = Array[3, 3]    IDL> Print, subset 
        341     211     371         342     212     372
        343     213     373 

Dick points out that the Reforms take effectively no time at all to perform, and that only one set of indices are created to subscript the array.

Comment

JD Smith, our usual expert in the IDL Way, had this to say about Dick's solution.

This is an really cool way of doing it, but it's nothing that straightforward Rebin can't handle:

  nxy = dims[0] * dims[1] 
 nz = z1 - z0+1  t = [nz, n_elements(xy)/2]
 indices = rebin(xy[0,*] + dims[0]*xy[1,*], t) + rebin(nxy * (z0+lindgen(nz)), t)

I regard the computation of an index array as a benefit not a liability here and in many cases. The reason? IDL happily computes its own array of indices for you behind the scenes when you use the higher-order indexing function, e.g. a[xyIndices, z0:z1]. Especially problematic are statements like a[*,*,1:3] (as above) which don't just make an index vector, but one which is much larger than needed, wasting time and memory. The advantage of computing the index vector yourself is that you can re-use it without throwing it away and computing it all over again (which is what IDL would do).

What was extremely interesting about this problem was the relative performance of these two methods for very large lists of xy indices and large arrays. For small to moderate lists of xy pairs, the Reform method Dick presented was roughly two times faster for me. However, as the size of the xy list got large, the Rebin method catches up and eventually overtakes the Reform method. Over about 5 million xy pairs by 100 z planes, the Rebin method keeps getting faster compared to the IDL-native calculations of the indices. Probably a memory usage difference, or perhaps related to the use of the thread pool (dual processor system). Still, getting IDL to do all the index computation almost entirely internally, as Dick's method does, seems to be a real benefit at least for some problem sizes.

Version of IDL used to prepare this article: IDL 7.0.3.

Google
 
Web Coyote's Guide to IDL Programming