comp.lang.idl-pvwave archive: archive

multiple delimiters [message #21677]

Wed, 13 September 2000 20:58

nrh
Messages: 19
Registered: September 2000

Junior Member

Has anyone nutted out a way to read ASCII files with multiple
delimiters? Our current solution involves some messy string operations
that are restricted to IDL 5.3, and I/O operations we would like to avoid.
Am I asking the impossible?

Re: multiple delimiters [message #21756 is a reply to message #21677]

Fri, 15 September 2000 00:00

Martin Schultz
Messages: 515
Registered: August 1997

Senior Member

nrh@imag.wsahs.nsw.gov.au wrote:
>
> Well, it's actually a whole heap of strings, most fields separated by
> blanks, and some fields, where there is more than one word, are
> encased by quotation marks. The fields inside the quotes have spaces as
> well, but we want them to be all one field, if you know what I mean.
> Right now we pull out the strings within the quotes, replace all the
> spaces with '_', put it back in, remove the quotes and then we can use
> the strsplit function to remove the extra white spaces created by
> replacing the quotes.
> so, in a nutshell, we have:
> ....PROJECTION-R OTYP DP EXTN img PROC "CM CARDIAC MIBI" .....
> and turn it into (through many painful string ops - it is a huge
> database file)
> PROJECTION-R OTYP DP EXTN img PROC CM_CARDIAC_MIBI ........
> and then we have to arrange it in a struct as every second field is the
> info we actually need. Odd fields are the descriptors.
> Clear as mud?

Your life could be a LOT easier if you had formatted output, i.e. if
all columns were aligned (I am sure the database you are using can
produce this). Then you could simply use a formatted read statement
readf, lun, proj, otyp, dp, extn, img, proc, label, ..., $
format='(i4,i6,...,A20,...)'
or (even more elegantly) read into a structure
temp = { proj:1.0, otyp:0L, ..., label:'', ... }
readf, lun, temp, format='...'
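To illustrate what a formatted read buys you, here is a minimal Python sketch of the same idea (Python is used only for illustration). Once every column has a fixed width, each field is simply a slice of the line, so blanks inside a label no longer matter. The column names and widths in LAYOUT are hypothetical, chosen to echo a format like '(I4,I6,A20)'.

```python
# Hypothetical column layout echoing a FORMAT like '(I4,I6,A20)'.
# Each entry is (field name, column width, converter).
LAYOUT = [
    ("proj", 4, int),
    ("otyp", 6, int),
    ("label", 20, str.strip),
]


def read_fixed(line, layout=LAYOUT):
    """Cut one fixed-width record into a dict, one column at a time."""
    record = {}
    pos = 0
    for name, width, convert in layout:
        # int() ignores the padding blanks; str.strip drops them from text
        record[name] = convert(line[pos:pos + width])
        pos += width
    return record


line = f"{12:4d}{345:6d}{'CM CARDIAC MIBI':<20}"
print(read_fixed(line))  # {'proj': 12, 'otyp': 345, 'label': 'CM CARDIAC MIBI'}
```

Because fields are defined by position rather than by delimiters, the blanks inside 'CM CARDIAC MIBI' survive without any quoting or underscore substitution.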

It probably boils down to workflow optimization: if you have to do it
once, don't bother and just write simple code. If you have to do it
several times, always with the same data set, convert the data set
once into a better format (something binary to speed up reading,
ideally a scientific data format, since these are easier to use and
self-describing). If you need to do this several times with
changing data sets, make sure the original data set producer changes
their format ;-)
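A minimal Python sketch of the "convert once" advice, assuming Python's pickle module as the binary format (a stand-in only; a self-describing scientific format would fill the same role). load_records and its parse_line argument are hypothetical names: the slow text parse runs only when no cache exists or the text file is newer than the cache.

```python
import os
import pickle


def load_records(txt_path, cache_path, parse_line):
    """Parse txt_path once; afterwards reuse a pickled copy while it is up to date."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(txt_path)):
        # Cache is at least as new as the text file: skip the string parsing
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Slow path: parse every non-blank line of the ASCII file
    with open(txt_path) as f:
        records = [parse_line(line) for line in f if line.strip()]
    with open(cache_path, "wb") as f:
        pickle.dump(records, f)
    return records
```

For example, load_records('db.txt', 'db.pkl', str.split) parses the (hypothetical) db.txt on the first call and loads db.pkl on later calls. Only load pickle files you created yourself.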

Cheers,
Martin

--
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[[ Dr. Martin Schultz Max-Planck-Institut fuer Meteorologie [[
[[ Bundesstr. 55, 20146 Hamburg [[
[[ phone: +49 40 41173-308 [[
[[ fax: +49 40 41173-298 [[
[[ martin.schultz@dkrz.de [[
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[

Re: multiple delimiters [message #21758 is a reply to message #21677]

Thu, 14 September 2000 21:58

Chris Rennie
Messages: 6
Registered: October 1999

Junior Member

nrh@imag.wsahs.nsw.gov.au wrote:
>
> Well, it's actually a whole heap of strings, most fields separated by
> blanks, and some fields, where there is more than one word, are
> encased by quotation marks. The fields inside the quotes have spaces as
> well, but we want them to be all one field, if you know what I mean.
> Right now we pull out the strings within the quotes, replace all the
> spaces with '_', put it back in, remove the quotes and then we can use
> the strsplit function to remove the extra white spaces created by
> replacing the quotes.
> so, in a nutshell, we have:
> ....PROJECTION-R OTYP DP EXTN img PROC "CM CARDIAC MIBI" .....
> and turn it into (through many painful string ops - it is a huge
> database file)
> PROJECTION-R OTYP DP EXTN img PROC CM_CARDIAC_MIBI ........
> and then we have to arrange it in a struct as every second field is the
> info we actually need. Odd fields are the descriptors.
> Clear as mud?

This is my suggestion:

PRO ParseLine, line, structure
; This routine first separates the line into 'coarse' chunks, using
; quotation marks as delimiters. This intermediate result is a set
; of strings. The 0th, 2nd, 4th, ... strings are then separated
; further using spaces as delimiters, and the 1st, 3rd, 5th, ...
; strings have their spaces translated to underscores.

CoarseChunks=str_sep(line,'"')
; An even number of chunks means a quotation mark is unmatched
if (n_elements(CoarseChunks) mod 2) ne 1 then stop, 'ParseLine error'

; Process 0th coarse chunk (trim and compress first, so that leading
; or repeated blanks do not produce empty fields)
FineChunks=str_sep(strtrim(strcompress(CoarseChunks[0]),2),' ')
structure.field0=FineChunks[0]
structure.field1=FineChunks[1]
structure.field2=FineChunks[2]
structure.field3=FineChunks[3]
structure.field4=FineChunks[4]
structure.field5=FineChunks[5]

; Process 1st coarse chunk (the quoted text): blanks (byte 32) -> '_'
ByteArray=byte(CoarseChunks[1])
spaces=where(ByteArray eq 32, NSpaces)
if NSpaces gt 0 then ByteArray[spaces]=byte('_')
structure.field6=string(ByteArray)

; Process 2nd coarse chunk
CoarseChunks[2]=strtrim(strcompress(CoarseChunks[2]),2)
FineChunks=str_sep(CoarseChunks[2],' ')
structure.field7=FineChunks[0]
structure.field8=FineChunks[1]
end ; ParseLine

;;;;;;;;;;;;;;;;;;;;;;;;; main ;;;;;;;;;;;;;;;;;;;;;;;;
TestLine='PROJECTION-R OTYP DP EXTN img PROC "CM CARDIAC MIBI" etc etc'
TestStruct={field0:'', field1:'', field2:'', field3:'', field4:'', $
field5:'', field6:'', field7:'', field8:''}
ParseLine, TestLine, TestStruct

print, TestStruct
end

This is the result:

IDL> help, /struct, TestStruct
** Structure <8192ab4>, 9 tags, length=72, refs=1:
FIELD0 STRING 'PROJECTION-R'
FIELD1 STRING 'OTYP'
FIELD2 STRING 'DP'
FIELD3 STRING 'EXTN'
FIELD4 STRING 'img'
FIELD5 STRING 'PROC'
FIELD6 STRING 'CM_CARDIAC_MIBI'
FIELD7 STRING 'etc'
FIELD8 STRING 'etc'
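The same coarse/fine idea generalises to any number of quoted fields. Here is a hedged Python sketch (split_quoted is a hypothetical name, not part of any library): splitting on '"' yields chunks that alternate between unquoted text (even positions) and quoted text (odd positions), so a loop can treat each kind accordingly instead of hard-coding field positions.

```python
def split_quoted(line):
    """Split on blanks, but keep each "..." group as one field with blanks -> '_'."""
    chunks = line.split('"')
    # An even chunk count means an unmatched quotation mark
    if len(chunks) % 2 != 1:
        raise ValueError("unbalanced quotation marks: " + line)
    fields = []
    for i, chunk in enumerate(chunks):
        if i % 2 == 0:
            # Unquoted text: any run of whitespace separates fields
            fields.extend(chunk.split())
        else:
            # Quoted text: one field, with its blanks turned into underscores
            fields.append(chunk.replace(" ", "_"))
    return fields


print(split_quoted('PROJECTION-R OTYP DP EXTN img PROC "CM CARDIAC MIBI" etc etc'))
```

Run on the test line, this yields the same nine strings as the structure above, but a line with two or more quoted groups needs no extra code.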

--
Chris Rennie rennie@physics.usyd.edu.au
Rm 466, School of Physics
Building A29 Tel: +61 (2) 9351 5799
University of Sydney
NSW 2006, Australia Fax: +61 (2) 9351 7726

Re: multiple delimiters [message #21762 is a reply to message #21677]

Thu, 14 September 2000 00:00

nrh
Messages: 19
Registered: September 2000

Junior Member

Well, it's actually a whole heap of strings, most fields separated by
blanks, and some fields, where there is more than one word, are
enclosed in quotation marks. The fields inside the quotes have spaces as
well, but we want them to be all one field, if you know what I mean.
Right now we pull out the strings within the quotes, replace all the
spaces with '_', put them back in, remove the quotes, and then we can use
the strsplit function to remove the extra white spaces created by
replacing the quotes.
So, in a nutshell, we have:
....PROJECTION-R OTYP DP EXTN img PROC "CM CARDIAC MIBI" .....
and turn it into (through many painful string ops - it is a huge
database file)
PROJECTION-R OTYP DP EXTN img PROC CM_CARDIAC_MIBI ........
and then we have to arrange it in a struct, as every second field is the
info we actually need. Odd fields are the descriptors.
Clear as mud?

> Did you have data as
>
> 1 ,3 <TAB> 5<SPACE>6
> 1 ,3 5 6
>
> please give me an example.
>
> Reimar
>
> --
> R.Bauer
>
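If the data can be handled outside IDL, Python's standard shlex module already splits on blanks while keeping "..." groups together, so no underscore substitution is needed. A minimal sketch, assuming each record starts with a descriptor and then alternates descriptor, value, descriptor, value (parse_record is a hypothetical helper):

```python
import shlex


def parse_record(line):
    """Tokenise with shlex (quotes group words), then pair descriptor -> value."""
    fields = shlex.split(line)  # quotes are removed; blanks inside them are kept
    if len(fields) % 2:
        raise ValueError("odd number of fields: %r" % fields)
    # fields[0::2] are the descriptors, fields[1::2] the values we want
    return dict(zip(fields[0::2], fields[1::2]))


print(parse_record('OTYP DP EXTN img PROC "CM CARDIAC MIBI"'))
# {'OTYP': 'DP', 'EXTN': 'img', 'PROC': 'CM CARDIAC MIBI'}
```

One caveat: shlex also interprets backslashes and single quotes, so a field containing an apostrophe would raise ValueError; splitting on '"' by hand avoids that.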

Re: multiple delimiters [message #21769 is a reply to message #21677]

Thu, 14 September 2000 00:00

R.Bauer
Messages: 1424
Registered: November 1998

Senior Member

Re: multiple delimiters [message #21772 is a reply to message #21677]

Thu, 14 September 2000 00:00

meron
Messages: 51
Registered: July 1995

Member

In article <39C09FE1.C9E0B54@dkrz.de>, Martin Schultz <martin.schultz@dkrz.de> writes:
> nrh@imag.wsahs.nsw.gov.au wrote:
>>
>> Has anyone nutted out a way to read ASCII files with multiple
>> delimiters? Our current solution involves some messy string operations
>> that are restricted to 5.3, and I/O operations we would like to avoid.
>> Am I asking the impossible?
>>
>
> 1. Read the file line by line as strings
> 2. use my StrRepl function to replace all delimiters with one value
> e.g.
> line = StrRepl(line, ';',' ')
> line = StrRepl(line, ',',' ')
> line = StrRepl(line, ':',' ')
> 3. Use ReadS, Str_Sep or StrSplit (5.3) to extract the numbers.
> Caution: With Str_Sep or StrSplit you should always add a
> StrTrim(StrCompress(line),2) before
>
> You can find StrRepl at
> http://www.mpimet.mpg.de/~schultz.martin/idl/html/libmartin_schultz.html
>
Or, you can use my STRPARSE, which takes any number of delimiters.
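STRPARSE's own interface is not shown in this thread, so the following does not reproduce it. It is a hedged Python sketch of the general technique of splitting on any of several delimiters in one pass with a regular-expression character class; split_multi and its default delimiter set are assumptions.

```python
import re


def split_multi(line, delimiters=",;:\t "):
    """Split on any of several single-character delimiters in one pass."""
    # re.escape keeps characters such as '-' or ']' literal inside the class
    pattern = "[" + re.escape(delimiters) + "]+"
    # Runs of delimiters act as one separator; empty leading/trailing tokens are dropped
    return [tok for tok in re.split(pattern, line) if tok]


print(split_multi("1 ,3\t5 6"))  # ['1', '3', '5', '6']
```

Because runs of delimiters collapse, this cannot represent empty fields such as the middle one in "1,,3"; keep the empty tokens if column positions matter.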

Mati Meron | "When you argue with a fool,
meron@cars.uchicago.edu | chances are he is doing just the same"

Re: multiple delimiters [message #21773 is a reply to message #21677]

Thu, 14 September 2000 00:00

Martin Schultz
Messages: 515
Registered: August 1997

Senior Member

... or you use some Unix tool like sed or awk to get rid of the
multiple delimiters before you even touch IDL. This will certainly
speed things up if you have large files and intend to access them
frequently.

Martin

Re: multiple delimiters [message #21774 is a reply to message #21677]

Thu, 14 September 2000 00:00

Martin Schultz
Messages: 515
Registered: August 1997

Senior Member

nrh@imag.wsahs.nsw.gov.au wrote:
>
> Has anyone nutted out a way to read ASCII files with multiple
> delimiters? Our current solution involves some messy string operations
> that are restricted to 5.3, and I/O operations we would like to avoid.
> Am I asking the impossible?
>

1. Read the file line by line as strings.
2. Use my StrRepl function to replace all delimiters with a single one,
e.g.
line = StrRepl(line, ';',' ')
line = StrRepl(line, ',',' ')
line = StrRepl(line, ':',' ')
3. Use ReadS, Str_Sep or StrSplit (IDL 5.3) to extract the numbers.
Caution: with Str_Sep or StrSplit you should always apply
StrTrim(StrCompress(line),2) first.
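The three steps above map directly onto a short Python sketch, shown only for illustration (read_numbers is a hypothetical name, and StrRepl itself is not reproduced). Here str.translate plays the role of the repeated StrRepl calls, and str.split() with no argument already collapses runs of blanks and ignores leading and trailing ones, which is what the StrTrim(StrCompress(line),2) caution guards against.

```python
def read_numbers(lines, delimiters=";,:"):
    """Map every delimiter to a blank, then split on whitespace and convert."""
    # One translation table replaces all delimiters in a single pass
    table = str.maketrans({d: " " for d in delimiters})
    rows = []
    for line in lines:                     # step 1: line by line, as strings
        line = line.translate(table)       # step 2: one common delimiter
        rows.append([float(tok) for tok in line.split()])  # step 3: extract numbers
    return rows


print(read_numbers(["1;2,3:4", " 5 ,6 "]))  # [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0]]
```

Since a Python file object yields its lines one at a time, read_numbers(open(path)) works on a whole file as well as on a list of strings.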

You can find StrRepl at
http://www.mpimet.mpg.de/~schultz.martin/idl/html/libmartin_schultz.html

Cheers,
Martin
