comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

the last line of a large file [message #55246] Thu, 09 August 2007 12:49
queiny
Hi,

I am processing a large file whose records all have the same format,
except for the last line, which states how many records the file
contains.

So the structure of my program is:

while not eof(in_unit) do begin
   readf, in_unit, input_line
   if (input_line ne 'last line') then begin
      ...
   endif else begin
      ...
   endelse
endwhile

Do I have to use 'if/then' to test whether every input_line is the
last line of the file? Since there are many data records in the file,
repeated 'if/then' tests can be time consuming. But if I don't do the
test, the program will halt when it reads the last line.

An easy way I can think of is to delete the last line, but sometimes
we are not supposed to change the input files.
Re: the last line of a large file [message #55311 is a reply to message #55246] Tue, 14 August 2007 10:42
David Fanning
queiny writes:

> Thank you all for the valuable suggestions.
>
> I just found out that 'file_lines' and a 'for' loop do not always work:
>
> IDL> Loop limit expression too large for loop variable type.
> <LONG64 ( 3548257)>.

As long as you are learning new things today, look up
COMPILE_OPT defint32. You will be glad you did. :-)
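
For context, a minimal sketch of what that buys you (the procedure name
is made up): by default IDL integer literals are 16-bit, so a plain
'for' loop index overflows above 32,767, which is exactly the error
quoted above; defint32 makes default integers 32-bit.

```idl
pro read_big_file, file
   compile_opt defint32        ; default integers become LONG (32-bit)
   n = file_lines(file)        ; e.g. 3,548,257 lines
   for i = 0, n-1 do begin     ; loop index can now exceed 32,767
      ; ... read and process one line ...
   endfor
end
```

Equivalently, starting the loop at 0L forces a LONG loop variable
without the compile option.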

Cheers,

David
--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Re: the last line of a large file [message #55312 is a reply to message #55246] Tue, 14 August 2007 10:22
queiny
Thank you all for the valuable suggestions.

I just found out that 'file_lines' and a 'for' loop do not always work:

IDL> Loop limit expression too large for loop variable type.
<LONG64 ( 3548257)>.

And thanks to Conor for the suggestion to chunk the data into smaller
pieces and only worry about the last-line problem once. I think that
is what I will do.
Re: the last line of a large file [message #55326 is a reply to message #55246] Fri, 10 August 2007 14:15
little davey
On Aug 9, 2:49 pm, queiny <quein...@yahoo.com> wrote:
> I am processing a large file with data in the same format till the
> last line. In the last line, it states how many records are included
> in this file.
> [...]
> Do I have to use 'if/then' to test whether every input_line is the
> last line of the file?

Would it be faster to use ON_IOERROR? One could process the "normal"
cases of reading the input lines, and when an I/O error occurs, trap
it and close the file. Naturally, if a real I/O error happens, you'd
have to distinguish a "real" input error from the end of the file, but
that would mean IF statements executed only when I/O errors occur.
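
A minimal sketch of that approach (hypothetical; in_unit, input_line,
and the bad_read label are made-up names, and it assumes a read of the
final count line raises an I/O error):

```idl
openr, in_unit, file, /get_lun
input_line = ''
on_ioerror, bad_read          ; jump here if a read fails
while not eof(in_unit) do begin
   readf, in_unit, input_line
   ; ... process the record ...
endwhile

bad_read:
on_ioerror, null              ; restore default error handling
if not eof(in_unit) then print, 'Real I/O error: ', !error_state.msg
free_lun, in_unit
```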

-- Little Davey --
Re: the last line of a large file [message #55327 is a reply to message #55246] Fri, 10 August 2007 10:39
James Kuyper
On Aug 10, 11:00 am, Carsten Lechte <c...@toppoint.de> wrote:
> Conor wrote:
>> lol! Really! What in the world is the point of putting the number of
>> lines at the end of the file?
>
> One legitimate reason would be that sometimes you only know how much
> data you have until after you have processed it all, especially if the
> data sets are so large that you only ever have a small subset in RAM.

That's easily worked around: write a dummy length at the beginning of
the file, process data until you reach the end of the file, then seek
back to the beginning and overwrite the dummy length with the correct
one. For ASCII-format files, that requires being careful to make the
dummy length a sufficiently long string, and to pad the final length
with spaces (or leading 0s) so it has exactly the same size.

There are obvious limitations to this technique (e.g. the output has
to be seekable), but they do not seem applicable in this context.
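
A minimal sketch of that write-then-patch idea (hypothetical file and
unit names; assumes the real count fits in the padded 10-character
field):

```idl
openw, out_unit, 'data.txt', /get_lun
printf, out_unit, '0000000000'        ; dummy count, fixed width
count = 0L
; ... write each record, incrementing count ...
point_lun, out_unit, 0                ; seek back to the start
printf, out_unit, string(count, format='(I10)')  ; same width as the dummy
free_lun, out_unit
```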
Re: the last line of a large file [message #55330 is a reply to message #55246] Fri, 10 August 2007 08:51
Conor
On Aug 10, 11:00 am, Carsten Lechte <c...@toppoint.de> wrote:
> Conor wrote:
>> lol! Really! What in the world is the point of putting the number of
>> lines at the end of the file?
>
> One legitimate reason would be that sometimes you only know how much
> data you have until after you have processed it all, especially if the
> data sets are so large that you only ever have a small subset in RAM.
>
> A legitimate example is zip archives, where the table of contents is
> written to the end of the file, because the compressed sizes of
> the archive members cannot be known in advance; it would double
> the running time to determine the compressed size beforehand, it would
> furthermore use twice the disk space to re-write the file with the
> contents in front, it would be impossible to keep the whole archive
> in RAM before writing it, and finally, one may not be able to leave space
> for the contents table at the beginning of the file, to be filled in
> later, because one would have to know how long the table will be
> beforehand...
>
> Of course, this does not mean that the original poster's data has a
> legitimate reason for being organised like this.
>
> For the original poster's problem, one idea is to get the file size
> in bytes, skip to position file_size-1000, read that small chunk and
> parse it for the desired metadata. This might even be faster than
> actually counting the lines with FILE_LINES, but it is probably only
> worth it if the metadata contains more useful information than just
> the number of lines in the file.
>
> chl
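
Carsten's tail-read idea could be sketched like this (a hypothetical
sketch; the 1000-byte tail size is arbitrary, and lun/tail are made-up
names):

```idl
openr, lun, file, /get_lun
info = fstat(lun)                        ; file size in bytes
tail = bytarr(1000 < info.size)
point_lun, lun, (info.size - 1000) > 0   ; seek to near the end
readu, lun, tail
free_lun, lun
lines = strsplit(string(tail), string(10B), /extract)
last_line = lines[n_elements(lines)-1]   ; parse the record count from this
```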

As an actual suggestion, if file_lines doesn't take too long you can
always just count the number of lines and break the file down into
manageable chunks. Imagine for a moment that the following file has
1,000,000 lines and your computer can only make arrays with 10,000
rows at a time (which you would know in advance). You might do
something like this:

max_size = 10000
num_rows = file_lines(file) ; 1,000,000
num_parts = num_rows/max_size ; 100 parts
num_cols = 10

data = fltarr(num_cols, max_size) ; IDL fills the first dimension fastest, so columns go first

openr, lun, file, /get_lun
for i=0,num_parts-1 do begin
   readf, lun, data
   ; do something with data, then read the next chunk
endfor

There are a couple of other things you can do. For starters, if you
don't already know it, you can calculate the number of columns in the
file by reading in the first line, using strsplit, and then rewinding
the file to the beginning. Also, I haven't included it in the above
code, but you'll still have to keep track of the last line. In this
case, what you would probably do is calculate how many lines you want
to read in the last chunk of data, and worry about it then. For
instance, imagine the same example but now the file has 75,000 lines
and you don't want to read the last one:

max_size = 10000
num_rows = file_lines(file) ; 75,000
num_parts = ceil(num_rows/float(max_size)) ; 8 parts
num_cols = 10
last_read = max_size - (num_parts*max_size - num_rows) - 1 ; 4999, skipping the count line

data = fltarr(num_cols, max_size)

openr, lun, file, /get_lun
for i=0,num_parts-1 do begin
   if i eq num_parts-1 then begin
      data = fltarr(num_cols, last_read) ; smaller final chunk
      readf, lun, data
   endif else begin
      readf, lun, data
   endelse
   ; do something with data, then read the next chunk
endfor

Not exactly elegant, but it should work for your problem.
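
The column-counting trick mentioned above could be sketched as
(hypothetical names, assuming whitespace-separated fields):

```idl
openr, lun, file, /get_lun
first_line = ''
readf, lun, first_line
num_cols = n_elements(strsplit(first_line, /extract))  ; count the fields
point_lun, lun, 0                                      ; rewind to the beginning
```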