Dimensional Juggling Tutorial

QUESTION: I've heard of J.D. Smith performing all kinds of array manipulation magic with Rebin and Reform. Do you know anything about it?

ANSWER: J.D. has kindly offered to let me post his Dimensional Juggling Tutorial here. This information was first made available to the IDL newsgroup on April 1st, 2001. We all thought it was a joke until we tried it and saw that it worked exactly as he described it. Go figure. Here is his tutorial. You might also be interested in J.D.'s Array Concatenation Tutorial.

I've sensed lots of confusion recently on how exactly Rebin and Reform work together to allow dimensional manipulations. It often looks cryptic, but it's actually pretty simple. The basic idea is as follows:

To "vectorize" some operations, you often need to "inflate" a vector to something larger than its former self.

Example:

   IDL> a=indgen(2,3)
   IDL> print,a
          0       1
          2       3
          4       5
   IDL> b=randomu(sd,3)
   IDL> print,b
        0.662640     0.991186     0.479801

Suppose we'd like to multiply each column of "a" by "b". If we had a new array, "b2", the same size as "a", but with a copy of "b" in all its columns, we could perform the multiplication trivially. How do we get such a beast? Rebin is the answer. First point to memorize: Rebin only will inflate numeric-type arrays to integer multiples of their present dimensions (each dimension can have a different integer). That is, something with dimensions [2,3] could become [2,6] or [4,3] or [4,6], but not [3,2].

We can also add new trailing dimensions with Rebin, as long as all dimensions before it follow this rule. For example, [2,3] could become [2,3,5] without trouble, but not [3,2,5]. (You can think of this by imagining a vector/array has implicity as many trailing shallow dimensions as you want (see below). IDL often truncates these, but also auto-creates them as necessary, as in this case!)

Back to the task at hand. Our "b" vector only has a single dimension, 3. Now consider:

   IDL> print,rebin(b,3,2)
        0.662640     0.991186     0.479801
        0.662640     0.991186     0.479801

Aha, our first dimension has remained the same, but now we have two identical rows. Fine, you say, but we wanted identical columns. Well, as we've seen, we can't just say:

   IDL> print,rebin(b,2,3)
   % REBIN: Result dimensions must be integer factor of original dimensions

The problem here is we're trying to change the single dimension "3" (the 1st of "b") to dimension "2". Not gonna happen. How can we proceed? If only b had dimensions [1,3] : [2,3] certainly follows the "integer multiple" rules then. That leading unit dimension makes this what I call a "column vector", a terminology some people object to, but which is descriptive nonetheless. I like to call dimensions of size 1 "shallow". For example, I say, "b has a shallow leading dimension."

We can add shallow dimensions with, you guessed it, Reform:

   IDL> print, reform(b,1,3)
        0.662640
        0.991186
        0.479801

Aha, printing like a column now. Now we put these two things together:

   IDL> print, rebin(reform(b,1,3),2,3)
        0.662640     0.662640
        0.991186     0.991186
        0.479801     0.479801

Two columns side by side. Just what we needed! And of course our multiplication is trivial now:

   IDL>  print, rebin(reform(b,1,3),2,3)*a
         0.00000     0.662640
         1.98237      2.97356
         1.91920      2.39901

This isn't the only technique, but I think it's the most straightforward. Another common approach is to make an array of indices (using, say lindgen) of the correct target size, and then use some arithmetic (% and /) to force the indices to be correct, then use this as a subscript into the original array. It gets complicated fast for higher dimensions, but it's the only possible method for arrays of non-numeric types (like strings, structures, pointers, or objects).

N.B. There are other ways to make "column vectors". Examples:

   transpose(b)
   rotate(b,1)
   1#b
   b##1

You'll often see these in place of reform(b,1,3), but don't be confused, they do exactly the same thing: prepend a leading shallow dimension. They are nice because they relieve you from having to know they length of b (3 here). However, they only work for creating column vectors, though: i.e. up to 2D data, and the latter two only work for numeric data.

What if you have 3D data, and you'd like to inflate things over the third dimension? Well, you guessed it, we'll need another shallow dimension:

   IDL> print, size(reform(b,1,1,3),/DIMENSIONS)
              1           1           3

And we can thread this "down" through the depth of a data cube, just as easily:

   IDL> print, rebin(reform(b,1,1,3),3,3,3)
        0.662640     0.662640     0.662640
        0.662640     0.662640     0.662640
        0.662640     0.662640     0.662640

        0.991186     0.991186     0.991186
        0.991186     0.991186     0.991186
        0.991186     0.991186     0.991186

        0.479801     0.479801     0.479801
        0.479801     0.479801     0.479801
        0.479801     0.479801     0.479801

Imagine those three groups as "planes" in a data cube, and you can see our vector is now filling downwards.

You should also easily see how to thread across all rows on every plane:

   IDL> print, rebin(b,3,3,3)

or columns on every plane

   IDL> print, rebin(reform(b,1,3),3,3,3)

With these tricks you should be able to perform all kinds of complex vector operations.

And now we get to an advanced application, which isn't yet easy. Could we write a generic routine to do this for any given dimension: i.e. could we say "replicate this vector over dimension #4". Yes, but not as easily as you might hope.

The key is the ability to specify dimensions all together, as a vector, instead of as individual arguments. E.g.

   IDL> print, reform(b,[1,1,3]) ; Notice the [ and ]

Now we can custom make the second argument to have as many leading shallow dimensions as are needed. To complete the operation, you'd need Rebin to have the same functionality. Alas, it does not. Why? No reason. Typical RSI incomplete implementation. I know Craig agrees with me here.

So, RSI, if you're listening, why not allow Rebin to interpret it's first argument as a vector also? Or use a keyword DIMENSION ala Make_Array? (And while I'm at it -- nothing like two different mechanisms for exactly the same thing, there).

Update: RSI apparently was listening. As of IDL v5.5, a Reform-style dimension-as-array argument is accepted by Rebin, among many other routines.

Perhaps I'll also write a Histogram Tutorial, revealing all my tricks. Then I could pass the torch to Pavel...

Case Study 1

QUESTION: I have the following problem. I want to get a subset of data out of a 3D array. This is how I do it now, but I want to eliminate the loop:

   num_lon = 360               ; longitude dimension
   num_lat = 181               ; latitude dimension
   num_mod_levs = 60           ; number of model levels
   temp3D  = FltArr(num_lon, num_lat, num_mod_levs)
   OpenR, 1, case_data.ecmwf_path+case_data.prefix+time+'_T.dump'
   ReadU, 1, temp3D
   Close, 1

   num_records = 6500
   T = FltArr(num_mod_levs, num_records)

   ; latindex and lonindex are of dimension num_records
   ; and contain latitude and longitude
   ; indices. I.e. temp3D[lonindex[0],latindex[0],*] picks
   ; out one column in my global map and
   ; gives me the temperature for that colum. I want to get
   ; those temperature columns for
   ; 6500 latindex/lonindex combinations, and save
   ; them in the array T.

   FOR i=0,num_records-1 DO BEGIN
     T[*,i] = Reform(temp3D[lonindex[i], latindex[i], *], 60)
   ENDFOR

It seems to me I should be able to do something like this (with maybe a Rotate or something thrown in) and avoid the loop.

   T = temp3D[lonindex, latindex, *]

But this doesn't seem to work. Would it work if num_mod_levs were the leading dimension?

ANSWER: First, an answer by Greg Michael that gets you thinking in the correct manner.

If I understand what you've written, you have a mask of 6500 points that you're interested in out of a grid of 360x181 (=65160), described by your two vectors latindex and lonindex.

I would first convert this to a 2-d mask:
   mask = BytArr(360,181)
   mask[lonindex,latindex] = 1
Then use this mask to extract the columns by finding the indices of the elements of the mask which you need:
   indices = Where(mask EQ 1)
And rearrange the cube into 2D, so you can use the indices to pick out the columns:
   temp2d = Reform(temp3d, 360*181, 60)
And then, finally, extract the columns:
   result = temp2d[indices, *]
The result should be an array of (6500,60).

And then, finally, a more concise method by JD Smith, without comment, using the principles of array juggling.

   s = Size(temp3D, /DIMENSIONS)
   s1 = Size(T, /DIMENSIONS)
   T = temp3d[Rebin(Transpose(lon+s[0]*lat), s1) + Rebin(Lindgen(60)*s[0]*s[1], s1)]