Autocorrelation with (LOTS) of missing data. [message #59140]
Tue, 11 March 2008 10:27
jameskuyper
Messages: 79 Registered: October 2007
Member
I've got a time series 807793 bins long, with missing data in all but
48945 of those bins. Only 7392 of those bins have a non-zero event
count. Those bins have a total count of about 1 million events, which
tells you that events are highly clustered, at least at the time scale
of the bin size (5 minutes).
I want to use autocorrelation analysis to investigate the clustering
of these events on longer time scales. The large amount of missing
data makes such analysis difficult, but the non-missing data is
clustered on time spans of 9 bins or so. Therefore, it seems to me
that with the right algorithm, it should be possible to estimate the
autocorrelation at lags of less than 9 bins. Does anyone know what
the right algorithm would be?
|
|
|
Re: Autocorrelation with (LOTS) of missing data. [message #59236 is a reply to message #59140]
Fri, 14 March 2008 08:59
jameskuyper
Messages: 79 Registered: October 2007
Member
Brian Larsen wrote:
> On Mar 11, 1:27 pm, jameskuy...@verizon.net wrote:
>> I've got a time series 807793 bins long, with missing data in all but
>> 48945 of those bins. Only 7392 of those bins have a non-zero event
>> count. Those bins have a total count of about 1 million events, which
>> tells you that events are highly clustered, at least at the time scale
>> of the bin size (5 minutes).
>>
>> I want to use autocorrelation analysis to investigate the clustering
>> of these events on longer time scales. The large amount of missing
>> data makes such analysis difficult, but the non-missing data is
>> clustered on time spans of 9 bins or so. Therefore, it seems to me
>> that with the right algorithm, it should be possible to estimate the
>> autocorrelation at lags of less than 9 bins. Does anyone know what
>> the right algorithm would be?
>
> Seems to me that this is an issue; I would use normal techniques on
> subsets of the data. There might be other ways but clusters of
> missing data are kinda like small data sets.
The individual clusters are too small to calculate meaningful
autocorrelation values; I would need to know an appropriate way to
combine autocorrelation functions calculated from different sets of
varying lengths.
I've found an article <http://sankhya.isical.ac.in/search/61a2/61a27036.pdf>
which describes three estimators that can be used
for this purpose. I was hoping I could use code that had already been
written, but it should be pretty straightforward to write a program to
calculate those estimators.
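
One plausible way to combine short segments of varying lengths is to pool, for each lag, every pair of bins that are both observed. The Python sketch below does that. It is not taken from the cited article and is not claimed to match any of its three estimators; the function name and the use of NaN to mark a missing bin are assumptions made for illustration.

```python
import math

def pairwise_autocorr(series, max_lag):
    """Estimate autocorrelation at lags 0..max_lag, skipping missing bins.

    `series` is a sequence of floats in which math.nan marks a missing bin
    (an assumed convention). For each lag k, only pairs (x[t], x[t+k]) with
    both values present contribute, so short observed runs from anywhere in
    the series are pooled into one estimate.

    Returns a list of (estimate, number_of_pairs) tuples, one per lag.
    """
    observed = [x for x in series if not math.isnan(x)]
    if not observed:
        raise ValueError("series has no observed bins")
    # Mean and variance come from all observed bins, not per segment.
    mean = sum(observed) / len(observed)
    var = sum((x - mean) ** 2 for x in observed) / len(observed)

    result = []
    for lag in range(max_lag + 1):
        total = 0.0
        pairs = 0
        for t in range(len(series) - lag):
            a, b = series[t], series[t + lag]
            if math.isnan(a) or math.isnan(b):
                continue
            total += (a - mean) * (b - mean)
            pairs += 1
        # Report nan when a lag has no complete pairs or the variance is zero.
        if pairs == 0 or var == 0.0:
            result.append((math.nan, pairs))
        else:
            result.append((total / pairs / var, pairs))
    return result
```

The pair count returned with each lag indicates how much data supports that estimate. With observed runs of about 9 bins, the counts should shrink quickly as the lag approaches 9, so `max_lag` would be kept below that.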
|
|
|
Re: Autocorrelation with (LOTS) of missing data. [message #59266 is a reply to message #59140]
Wed, 12 March 2008 12:48
Brian Larsen
Messages: 270 Registered: June 2006
Senior Member
On Mar 11, 1:27 pm, jameskuy...@verizon.net wrote:
> I've got a time series 807793 bins long, with missing data in all but
> 48945 of those bins. Only 7392 of those bins have a non-zero event
> count. Those bins have a total count of about 1 million events, which
> tells you that events are highly clustered, at least at the time scale
> of the bin size (5 minutes).
>
> I want to use autocorrelation analysis to investigate the clustering
> of these events on longer time scales. The large amount of missing
> data makes such analysis difficult, but the non-missing data is
> clustered on time spans of 9 bins or so. Therefore, it seems to me
> that with the right algorithm, it should be possible to estimate the
> autocorrelation at lags of less than 9 bins. Does anyone know what
> the right algorithm would be?
Seems to me that this is an issue; I would use normal techniques on
subsets of the data. There might be other ways but clusters of
missing data are kinda like small data sets.
Cheers,
Brian
--------------------------------------------------------------------------
Brian Larsen
Boston University
Center for Space Physics
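
The suggestion above, treating each stretch of observed data as a small data set, first needs the series split into contiguous runs of non-missing bins. A minimal Python sketch of that step follows. The function name and the use of NaN to mark missing bins are assumptions for illustration, not something specified in the thread.

```python
import math

def observed_runs(series):
    """Return (start, stop) index pairs for each contiguous run of observed bins.

    `series` is a sequence of floats where math.nan marks a missing bin
    (an assumed convention). `stop` is exclusive, so series[start:stop]
    contains no missing values.
    """
    runs = []
    start = None
    for i, x in enumerate(series):
        if math.isnan(x):
            # A missing bin closes any run that is currently open.
            if start is not None:
                runs.append((start, i))
                start = None
        elif start is None:
            # First observed bin after a gap opens a new run.
            start = i
    # Close a run that extends to the end of the series.
    if start is not None:
        runs.append((start, len(series)))
    return runs
```

Each slice `series[start:stop]` could then be handed to an ordinary autocorrelation routine. The run lengths (`stop - start`) bound the largest lag that any single subset can support.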
|
|
|