Programming a NULL Array

QUESTION: I often find myself wanting to do things like this with arrays:

   data_I_want_vec =  Init
   FOR ix = 0,(size(data_array))[1]-1 DO $
     IF condition(data_array[ix]) THEN $
        Data_I_want_vec = [data_I_want_vec, data_array[ix]]

I realize there is analogous code for a vectorized version of my question, but I think the for-loop version is clearer.

The questions that arise are things like: What to use for 'Init' ? What do I do to test for a null result? What if there's no data?

I know Matlab has an empty vector ([]) that serves this purpose. Is there something similar in IDL? If not, what is the "standard" way of doing this kind of thing?

ANSWER: Today's answer is written by J.D. Smith, who gave the answer in an IDL newsgroup article on 28 May 2003.

No there is no Matlab "empty vector", but there is a standard way of dealing with this: by vectorizing it. :-)

Despite your awareness of vector techniques, you might be unaware that in your code you've combined two of the absolute least efficient operations in all of IDL: long loops (assuming your data is substantial), and array concatenation.

Loops are slow because each trip around a FOR loop takes a spin through the full IDL interpreter loop, which is necessarily pokey, thanks to all the wonderful conveniences it provides. With concatenation, every time you append an element, you're allocating space for a new larger array, copying the entire original array into it, and then copying the appended element(s). With time and experience, you'll find that vectorized versions are more terse and quickly understandable. It's not the cleanest syntax, but it does become second-nature in time.

Of course, there are plenty of times where "just vectorize it" is not an option, and you'd really like to build dynamic-sized arrays efficiently. You have (at least) two choices:

I for one would love to have true efficient array concatenation in IDL, ala:


These functions from Perl are efficient because Perl does all the ugly work typified in #2 above for you. Each PUSH does not necessarily copy the entire array. Instead, arrays are pre-allocated to be larger than you need, and grown in intelligently-sized chunks, shifted, unshifted, spliced, etc., but only re-allocated and copied if it's absolutely necessary. This would come with a tradeoff though: arrays would need to be smarter, and hence, in more pedestrian usage (e.g. a=b+c) would be slower. Perhaps two kinds of interoperable arrays would be in order: the fancy kind that grow/shrink/prepend/append/pre-allocate automagically, and those that are built purely for speed, and never take up more space than the data they contain.

By the way, if you don't like to type all the repetitive "if n_elements()" bit, you can use a routine to do it for you (albeit even slightly more inefficiently thanks to the routine-call overhead):

   pro push, array, append
     if n_elements(array) eq 0 then array=[append] else array=[array,append]

By the way, there was talk of adding a zero-length-array type a few years ago, but the consensus was that it's probably too late to fit it easily into a 25 year old program.

Web Coyote's Guide to IDL Programming