# Averaging Time Series Components

**QUESTION:** I have been trying to average components of a time series when they
meet certain criteria. For example, I have values in which I use a **Where **statement to specify the critera:

criteria = Where(time LT 1000)I get an array returned which looks like this:

data = [0,1,2,3,4,6,8,9,10]

Now, given this array, I would like to specify individual arrays for any “block” of subsequent numbers with size greater than three. In this example, the result would look like this:

b1 = [0,1,2,3,4] b2 = [8,9,10]

I tried using the complement keyword in the “Where” statement to put the null values into an array and try to subscript my way through the answer. But that turned out to be quite messy, especially since I am using multiple files which all have a different “patterns of three or more”.

I've messed around with FOR loops and IF statements, but again I have to change those for each individual file. It would be nice to know of a technique that could find multiple sequences of consecutive integer series in an array and make individual arrays out of those sequences. This seems like a simple problem, but I haven't been able to figure it out. Can you help?

** ANSWER:** As usual, our guide for this journey is JD Smith.

The solution to this problem is actually not as simple as it seems. Here is a method that uses
**LABEL_REGION **and **HISTOGRAM** for the solution:

data = [0,1,2,3,4,6,8,9,10] labels = Label_Region([0L, (Shift(data,-1) - data) EQ 1, 0L]) h = Histogram(labels[1:N_Elements(labels)-2], MIN=1, REVERSE_INDICES=ri) wh = Where(h GE 3-1, count) p = PtrArr(count) FOR j=0,count-1 DO p[j] = Ptr_New(data[ri[ri[wh[j]]]:ri[ri[wh[j]+1]-1]+1])

To see the answer:

IDL> FOR j=0,count-1 DO Print, *p[j] 0 1 2 3 4 8 9 10

There are a couple of tricky bits in this solution that are worth pointing out.
First, note that we have to bracket the **LABEL_REGION **array with zeros. This is
because **LABEL_REGION **has the annoying habit of labelling the end points as “no region”.
If we fail to do this, there is a good chance we will not have the correct number of points in
the end sequences. For example, compare this:

IDL> Print, Label_Region( (Shift(data,-1) - data) EQ 1 ) 0 1 1 1 0 0 2 2 0

To this:

IDL> Print, Label_Region( [0, (Shift(data,-1) - data), 0] EQ 1 ) 0 1 1 1 1 0 0 2 2 0 0

And be very careful of the usage of **REVERSE_INDICES **here. Instead of the normal
semantic:

data[ri[ri[j]:ri[j+1]-1]]

which is like *data[index_vector]*, I instead use an explicit range:

data[ri[ri[j]]:ri[ri[j+1]-1]+1]which is like

*data[low:high]*.

Stare at this until you see the difference. Here I put the two descriptions together so you can see the difference more easily.

data[ri[ri[j]:ri[j+1]-1]] data[ri[ri[j]]:ri[ri[j+1]-1]+1]

I did this because I knew that the elements in the
labeled regions had consecutive indices, and because we are
always missing one member of the “consecutive run” (due to the
**SHIFT **call). The latter is also the reason for the final +1.
In general the indices in a given **HISTOGRAM **bucket
are not adjacent in the original array, so while either of these forms
would be correct, the second is more convenient for our purposes.

Another way to say this is that it's just the difference between *data[[1,2,3,4,5]]*
and *data[1:5]*. No big
deal. If you want to add “6”, it's easier to use the second form than the
first form.

Be sure to free up your pointers when you are finished with them.

IDL> Ptr_Free, p

Copyright © 2006 David W. Fanning

Last Updated 23 May 2006