comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Home » Public Forums » archive » Handle big data files
Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend 
Switch to threaded view of this topic Create a new topic Submit Reply
Handle big data files [message #92221] Sun, 01 November 2015 17:44 Go to next message
lucesmm is currently offline  lucesmm
Messages: 26
Registered: October 2014
Junior Member
Hello
I have been working with big data files (~18MB each)

And everytime I need to handle anything with the data, mostly plotting, takes a long time.
So I was trying to put them all in one file per year (the data is arrange by date)
But just running this takes a few hours.

PRO Save_1year
GET_LUN, outlun
Openw, outlun, 'name.txt'
Year = ''
READ, Year, PROMPT='Enter Year:'
openw, outlun, 'C:data\year'+ Year+'.txt', /get_lun

files=FILE_SEARCH('C:data'+Year+'*', COUNT=nfiles)
Print, files
data = []

FOR i=0,nfiles-1 DO BEGIN
filename=files[i]
fileNumber = STRMID(filename,51,5)
nlines = FILE_LINES(files[i])
thisFile = files[i]

;line=fltarr(17,1)
OpenR, inLun, thisFile, /Get_Lun
SKIP_LUN, inLun, 1, /LINES
while not EOF(inlun) do begin & $
line=make_array(17,1, type=5)
Readf, inLun, line, FORMAT='...'
data=[[data],[line]]

Endwhile
Free_Lun, inLun

ENDFOR
PrintF, outLun, data, FORMAT= '...'
END

I was wondering if there is an easier way to handle this data, I don't know anything about SAVE files, may be this is easier?
Please help

Thank you

Luz Maria
Re: Handle big data files [message #92223 is a reply to message #92221] Mon, 02 November 2015 01:13 Go to previous messageGo to next message
Helder Marchetto is currently offline  Helder Marchetto
Messages: 520
Registered: November 2011
Senior Member
Hi,
I think that this line is responsible for making things slow:

data=[[data],[line]]

If the array data gets to be lon, then it will take a long time to copy the previous data to a new variable and add one element...

You have two options:
1) only valid for IDL version >8.0. Use a list(). before the for use:
data = list()
then instead of data=[[data],[line]] use:
data->add, line
Then at the end:
PrintF, outLun, data->toArray(), FORMAT= '...'

2) it's more complicated, but general. Create the data array loooong, then fill it up. You could also actually guess it's length:
nData = 0l
FOR i=0,nfiles-1 DO BEGIN
nlines = FILE_LINES(files[i])
nData += nlines-1 ;one line you always disregard
ENDFOR

Now create data so that it is long enough:
myDataStructure = make_array(17,1, type=5)
data = replicate(myDataStructure, nData)

and in the cycle you fill up. You will also need a "fill-up" counter:

fillCounter = 0l
FOR...
...
while...
...
data[fillCounter] = line
fillCounter++
endwhile
...
endfor

I hope it helps...

Cheers,
Helder
Re: Handle big data files [message #92225 is a reply to message #92223] Mon, 02 November 2015 07:02 Go to previous messageGo to next message
lucesmm is currently offline  lucesmm
Messages: 26
Registered: October 2014
Junior Member
Hello Helder
Thank you for your help, I am implementing the first option you suggested. I am having trouble now because one of the columns is calendar format and I am getting this error
PRINTF: Value of Julian date is out of allowed range
:( is there an easy way to solve this, or shoudl I just keep date info in different colums?

HEre is what the data look like and my format:

2014-12-01T00:00:12.905
C(CYI, X ,CMOI02, X ,CDI02,X, CHI02,X, CMI02, X, CSF0)
I am using the same fomat in both reading and writing the data
is this correct?

Now I wanted to ask something else. I have a bunch of columns that I don't really need, is there a way to create a save file just with the date column and the one that I care? (I am guessing this is the easiest version of files to save big data because they are binary)

Thanks again

-Luz Maria
On Monday, November 2, 2015 at 1:13:43 AM UTC-8, Helder wrote:
> Hi,
> I think that this line is responsible for making things slow:
>
> data=[[data],[line]]
>
> If the array data gets to be lon, then it will take a long time to copy the previous data to a new variable and add one element...
>
> You have two options:
> 1) only valid for IDL version >8.0. Use a list(). before the for use:
> data = list()
> then instead of data=[[data],[line]] use:
> data->add, line
> Then at the end:
> PrintF, outLun, data->toArray(), FORMAT= '...'
>
> 2) it's more complicated, but general. Create the data array loooong, then fill it up. You could also actually guess it's length:
> nData = 0l
> FOR i=0,nfiles-1 DO BEGIN
> nlines = FILE_LINES(files[i])
> nData += nlines-1 ;one line you always disregard
> ENDFOR
>
> Now create data so that it is long enough:
> myDataStructure = make_array(17,1, type=5)
> data = replicate(myDataStructure, nData)
>
> and in the cycle you fill up. You will also need a "fill-up" counter:
>
> fillCounter = 0l
> FOR...
> ...
> while...
> ...
> data[fillCounter] = line
> fillCounter++
> endwhile
> ...
> endfor
>
> I hope it helps...
>
> Cheers,
> Helder
Re: Handle big data files [message #92226 is a reply to message #92225] Mon, 02 November 2015 08:26 Go to previous messageGo to next message
Helder Marchetto is currently offline  Helder Marchetto
Messages: 520
Registered: November 2011
Senior Member
On Monday, November 2, 2015 at 4:02:15 PM UTC+1, luc...@gmail.com wrote:
> Hello Helder
> Thank you for your help, I am implementing the first option you suggested. I am having trouble now because one of the columns is calendar format and I am getting this error
> PRINTF: Value of Julian date is out of allowed range
> :( is there an easy way to solve this, or shoudl I just keep date info in different colums?
>
> HEre is what the data look like and my format:
>
> 2014-12-01T00:00:12.905
> C(CYI, X ,CMOI02, X ,CDI02,X, CHI02,X, CMI02, X, CSF0)
> I am using the same fomat in both reading and writing the data
> is this correct?
>
> Now I wanted to ask something else. I have a bunch of columns that I don't really need, is there a way to create a save file just with the date column and the one that I care? (I am guessing this is the easiest version of files to save big data because they are binary)
>
> Thanks again
>
> -Luz Maria
> On Monday, November 2, 2015 at 1:13:43 AM UTC-8, Helder wrote:
>> Hi,
>> I think that this line is responsible for making things slow:
>>
>> data=[[data],[line]]
>>
>> If the array data gets to be lon, then it will take a long time to copy the previous data to a new variable and add one element...
>>
>> You have two options:
>> 1) only valid for IDL version >8.0. Use a list(). before the for use:
>> data = list()
>> then instead of data=[[data],[line]] use:
>> data->add, line
>> Then at the end:
>> PrintF, outLun, data->toArray(), FORMAT= '...'
>>
>> 2) it's more complicated, but general. Create the data array loooong, then fill it up. You could also actually guess it's length:
>> nData = 0l
>> FOR i=0,nfiles-1 DO BEGIN
>> nlines = FILE_LINES(files[i])
>> nData += nlines-1 ;one line you always disregard
>> ENDFOR
>>
>> Now create data so that it is long enough:
>> myDataStructure = make_array(17,1, type=5)
>> data = replicate(myDataStructure, nData)
>>
>> and in the cycle you fill up. You will also need a "fill-up" counter:
>>
>> fillCounter = 0l
>> FOR...
>> ...
>> while...
>> ...
>> data[fillCounter] = line
>> fillCounter++
>> endwhile
>> ...
>> endfor
>>
>> I hope it helps...
>>
>> Cheers,
>> Helder

Hi,
sorry, but I don't know much about dates and Julian in particular. You should either wait for somebody else to answer or repost with new subject.
I don't understand your second question what you mean by "just with the date column and the one that I care". Could you make an example, be more specific?

cheers,
Helder
Re: Handle big data files [message #92227 is a reply to message #92226] Mon, 02 November 2015 08:36 Go to previous messageGo to next message
lucesmm is currently offline  lucesmm
Messages: 26
Registered: October 2014
Junior Member
Hi again,
I worked aorunf the date issue, and now I have year in one column, month in other column, etc.
Also I re did my data with only the number I care about
newline=[line(1),line(2),line(3),line(4),line(5),line(6),lin e(10),line(19),line(20),line(21)]
So now this is how the data looks like
[year, month, day, hour, min, secs, data1, data2, data3, data4]
2003.00 1 1 0.0 0.0 44.2630 19.7620 0.173730 8.0 0.0

But I have another issue now...

When I do
PrintF, outLun, Newdata->toArray(), FORMAT='(...)' I only get the first column onto my array (this is what the file shows):
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
2003 2003 2003 2003 2003 2003.000000 2003.000000 2003.000000 2003 2003
Re: Handle big data files [message #92228 is a reply to message #92227] Mon, 02 November 2015 09:03 Go to previous message
lucesmm is currently offline  lucesmm
Messages: 26
Registered: October 2014
Junior Member
I solve it!
I need /TRANSPOSE keyword in the toArray()
  Switch to threaded view of this topic Create a new topic Submit Reply
Previous Topic: Read data from file
Next Topic: not an IDL question.... scrolling in nedit

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ] [ PDF ]

Current Time: Wed Oct 08 15:06:40 PDT 2025

Total time taken to generate the page: 0.00600 seconds