Null terminated strings [message #28677]
Mon, 07 January 2002 10:11
James Kuyper
I'm reading a string-valued file attribute from an HDF file that was
created using C code. As seems quite reasonable for C programs, the
attribute was written with a length that includes a terminating null
character. When I read it in using IDL, that null character got included
as well. This causes a number of bizarre effects, most notably:
IDL> print,date
2001-10-07
IDL> print,date+'T12:00:00'
2001-10-07'T12:00:0
I can handle this particular case by using strmid(date,0,10), but in
general a file attribute might contain multiple null-delimited strings,
of unknown length. Is there an efficient way of converting such a string
into an IDL string array?
Re: Null terminated strings [message #28741 is a reply to message #28677]
Wed, 09 January 2002 02:26
Malcolm Walters
"James Kuyper" <kuyper@gscmail.gsfc.nasa.gov> wrote in message
news:3C3B5D8D.9050806@gscmail.gsfc.nasa.gov...
>
> You're relying there on the fact that btest is a two-dimensional array;
> string() converts each row into a separate string. In the case I'm
> worrying about, I would have a single IDL string, containing null
> characters at arbitrary positions. Try the following:
>
> btest = [84B,104B,105B,115B,0B,105B,115B,0B,97B,0B,106B,101B,115B,116B]
> stest = string(btest)
>
> All you get in stest is the 'This'.
>
Personally, I would do this in two stages.
First, replace each 0B with 10B (a line feed) using
btest[where(btest eq 0B)]=10B
If you now print string(btest), you get line breaks (this can be used in
message boxes, etc.).
Or, if you want to go one step further, split the pieces using
stest = strsplit(string(btest),string(10B),/extract)
print, stest
This is a jest
help,stest
<Expression> STRING = Array[4]
I hope this is of some help.
Malcolm Walters
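Malcolm's two steps can be wrapped into a small function. This is a minimal sketch, assuming the attribute arrives either as a 1-D byte array or as a scalar IDL string with embedded nulls, and that the data contain no genuine line feeds; the name SPLIT_NULLS is hypothetical, not an IDL library routine. Unlike the one-liner above, it checks WHERE's count before subscripting, because WHERE returns -1 when there are no nulls and using that as a subscript does not do what is intended.

```idl
; SPLIT_NULLS: hypothetical helper, not an IDL library routine.
; Splits a null-delimited byte array (or a scalar string with embedded
; nulls) into a string array. Assumes the data contain no real line feeds.
function split_nulls, input
  b = byte(input)                    ; byte copy of the input
  idx = where(b eq 0B, count)        ; positions of the null delimiters
  if count gt 0 then b[idx] = 10B    ; turn each null into a line feed
  ; STRSPLIT drops empty fields unless /PRESERVE_NULL is set, so a
  ; trailing terminator or a run of nulls yields no empty strings
  return, strsplit(string(b), string(10B), /extract)
end
```

Called on the btest array above, it should return the four strings 'This', 'is', 'a' and 'jest', matching the STRSPLIT output shown in the post.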
Re: Null terminated strings [message #28751 is a reply to message #28677]
Tue, 08 January 2002 12:58
James Kuyper
William Thompson wrote:
> James Kuyper <kuyper@gscmail.gsfc.nasa.gov> writes:
...
>> I'm still wondering how best to convert a null-delimited list of strings
>> into an IDL string array (it's just curiosity, I don't have any
>> immediate need for that ability). My best solution so far is to convert
>> it to a byte array, find the null delimiting characters with where(),
>> and then write a loop to convert each subarray into a separate IDL
>> string. This should work, but I'm always suspicious of the efficiency of
>> any solution for an IDL problem that involves an explicit loop.
>
>
>
> As far as I can determine, that should work equally as well with arrays as
> with strings. For example,
>
> IDL> test = ['This','is','a','test']
> IDL> btest=byte(test)
> IDL> print,btest
> 84 104 105 115
> 105 115 0 0
> 97 0 0 0
> 116 101 115 116
> IDL> stest = string(btest)
> IDL> help,stest
> STEST STRING = Array[4]
> IDL> print,strlen(stest)
> 4 2 1 4
> IDL> print,stest
> This is a test
>
> You shouldn't have to use a loop.
You're relying there on the fact that btest is a two-dimensional array;
string() converts each row into a separate string. In the case I'm
worrying about, I would have a single IDL string, containing null
characters at arbitrary positions. Try the following:
btest = [84B,104B,105B,115B,0B,105B,115B,0B,97B,0B,106B,101B,115B,116B]
stest = string(btest)
All you get in stest is the 'This'.
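The two-dimensional behaviour in Bill's example can in fact be applied to the single-string case without an explicit loop: scatter the bytes into a zero-padded 2-D array with one row per field, then let STRING() convert the rows. This is an illustrative sketch, not code posted in the thread; SPLIT_NULLS_2D is a hypothetical name, and it assumes a scalar string or 1-D byte array as input. Like STRSPLIT's default behaviour, it drops empty fields.

```idl
; SPLIT_NULLS_2D: hypothetical helper illustrating a loop-free split.
; Input: a 1-D byte array or scalar string with embedded nulls.
function split_nulls_2d, input
  b = [byte(input), 0B]                     ; guarantee a final terminator
  n = n_elements(b)
  isnull = (b eq 0B)                        ; 1B at each delimiter
  ; field index of every byte = number of nulls strictly before it
  field = long(total(isnull, /cumulative, /double)) - isnull
  nulls = where(isnull, nfield)             ; nfield >= 1 (appended null)
  starts = [0L, nulls + 1]                  ; first byte of each field...
  starts = starts[0:nfield-1]               ; ...dropping the one past the end
  col = lindgen(n) - starts[field]          ; column of each byte in its row
  width = max(col) > 1L                     ; longest field (at least 1)
  arr = bytarr(width, nfield)               ; zero-padded, one row per field
  keep = where(isnull eq 0B, nkeep)         ; the non-null bytes
  if nkeep gt 0 then arr[col[keep] + width*field[keep]] = b[keep]
  result = string(arr)                      ; one string per row, as in Bill's example
  good = where(result ne '', ngood)         ; drop empty fields
  if ngood eq 0 then return, ''
  return, result[good]
end
```

The trick relies on the same fact Bill's example shows: STRING() stops each row at its first zero byte, so the padding never reaches the output.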
Re: Null terminated strings [message #28752 is a reply to message #28677]
Tue, 08 January 2002 11:51
Craig Markwardt
James Kuyper <kuyper@gscmail.gsfc.nasa.gov> writes:
> Craig Markwardt wrote:
...
>> What happens when you swizzle it through a STRING-BYTE-STRING
>> transformation?
>>
>> I.e.,
>>
>> date = string(byte(date))
>>
>> I believe that STRING will ignore any trailing 0-bytes, hence this may
>> solve your problem exactly, at the expense of some extra CPU.
>
> Thanks - that worked. It only solves the single-string case, but that's
> the case I am currently facing. It saves me the trouble of figuring out
> how long the string is, and it does the right thing, whether or not the
> string is null-terminated.
>
> I'm still wondering how best to convert a null-delimited list of strings
> into an IDL string array (it's just curiosity, I don't have any
> immediate need for that ability). My best solution so far is to convert
> it to a byte array, find the null delimiting characters with where(),
> and then write a loop to convert each subarray into a separate IDL
> string. This should work, but I'm always suspicious of the efficiency of
> any solution for an IDL problem that involves an explicit loop.
The problem is that IDL has no way to represent the 0th ASCII
character in a string, at least none that I can find, other than
bringing the data in from outside, as you have done with HDF.
My best solution is to do as you have, which is to convert to BYTEs,
then locate the 0's. But at this stage you can quickly replace the
0's with some other control character, say ASCII 1. [ This assumes
that 1 = CTRL-A never appears in your strings. ] Then you can convert
back to string and use STRSPLIT or STR_SEP to split it up.
Be aware, though, that STR_SEP itself uses FOR loops, and I have
never really noticed a performance impact when I've used it. Unless your
code is *actually* a dog with FOR loops, rather than *hypothetically* a dog,
there is no reason to optimize it. :-)
Craig
--
--------------------------------------------------------------------------
Craig B. Markwardt, Ph.D. EMAIL: craigmnet@cow.physics.wisc.edu
Astrophysics, IDL, Finance, Derivatives | Remove "net" for better response
--------------------------------------------------------------------------
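For comparison, the explicit-loop version James describes (convert to bytes, locate the nulls with WHERE, convert each run between them) might look like the sketch below. It is a hypothetical illustration, not code from the thread; the name SPLIT_NULLS_LOOP is made up, and it assumes a scalar string or 1-D byte array as input. The loop runs once per field rather than once per byte, which is one reason, as Craig suggests, that it may well be fast enough in practice.

```idl
; SPLIT_NULLS_LOOP: hypothetical helper showing the explicit-loop approach.
; Input: a 1-D byte array or scalar string with embedded nulls.
function split_nulls_loop, input
  b = [byte(input), 0B]              ; guarantee a final terminator
  ends = where(b eq 0B, n)           ; n >= 1 because of the appended null
  result = strarr(n)
  start = 0L
  for i = 0L, n - 1 do begin         ; one iteration per field, not per byte
    ; convert the bytes between the previous null and this one
    if ends[i] gt start then result[i] = string(b[start:ends[i]-1])
    start = ends[i] + 1
  endfor
  good = where(result ne '', ngood)  ; drop empty fields, like STRSPLIT's default
  if ngood eq 0 then return, ''
  return, result[good]
end
```

Appending a null up front means a string that is already null-terminated and one that is not are handled the same way; the resulting empty field is then discarded.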
Re: Null terminated strings [message #28759 is a reply to message #28677]
Tue, 08 January 2002 09:17
thompson
James Kuyper <kuyper@gscmail.gsfc.nasa.gov> writes:
> Craig Markwardt wrote:
>> James Kuyper <kuyper@gscmail.gsfc.nasa.gov> writes:
>>
>>
>>> I'm reading a string-valued file attribute from an HDF file that was
>>> created using C code. As seems quite reasonable for C programs, the
>>> attribute was written with a length that includes a terminating null
>>> character. When I read it in using IDL, that null character got included
>>> as well. This causes a number of bizarre effects, most notably:
>>>
>>> IDL> print,date
>>> 2001-10-07
>>> IDL> print,date+'T12:00:00'
>>> 2001-10-07'T12:00:0
>>>
>>> I can handle this particular case by using strmid(date,0,10), but in
>>> general a file attribute might contain multiple null-delimited strings,
>>> of unknown length. Is there an efficient way of converting such a string
>>> into an IDL string array?
>>
>>
>> What happens when you swizzle it through a STRING-BYTE-STRING
>> transformation?
>>
>> I.e.,
>>
>> date = string(byte(date))
>>
>> I believe that STRING will ignore any trailing 0-bytes, hence this may
>> solve your problem exactly, at the expense of some extra CPU.
> Thanks - that worked. It only solves the single-string case, but that's
> the case I am currently facing. It saves me the trouble of figuring out
> how long the string is, and it does the right thing, whether or not the
> string is null-terminated.
> I'm still wondering how best to convert a null-delimited list of strings
> into an IDL string array (it's just curiosity, I don't have any
> immediate need for that ability). My best solution so far is to convert
> it to a byte array, find the null delimiting characters with where(),
> and then write a loop to convert each subarray into a separate IDL
> string. This should work, but I'm always suspicious of the efficiency of
> any solution for an IDL problem that involves an explicit loop.
As far as I can determine, that should work equally as well with arrays as
with strings. For example,
IDL> test = ['This','is','a','test']
IDL> btest=byte(test)
IDL> print,btest
84 104 105 115
105 115 0 0
97 0 0 0
116 101 115 116
IDL> stest = string(btest)
IDL> help,stest
STEST STRING = Array[4]
IDL> print,strlen(stest)
4 2 1 4
IDL> print,stest
This is a test
You shouldn't have to use a loop.
Bill Thompson
Re: Null terminated strings [message #28812 is a reply to message #28738]
Thu, 10 January 2002 06:57
Struan Gray
Craig Markwardt, craigmnet@cow.physics.wisc.edu writes:
> Struan Gray <struan.gray@sljus.lu.se> writes:
>> James Kuyper, kuyper@gscmail.gsfc.nasa.gov writes:
>>
>>> I'm still wondering how best to convert a null-delimited
>>> list of strings into an IDL string array
>>
>> Sounds like a job for supersonic HISTOGRAM and his
>> ever-eager sidekick REVERSE_INDICES.
>
> That's a good idea, although I think you can't avoid a FOR loop. In
> fact, it is my belief that by using REVERSE_INDICES to look at more
> than one bin in a histogram, you are *guaranteed* to use a FOR loop or
> equivalent.
Aahh. But in this case, you are only going to look at
one bin, and the first one at that - which avoids the usual
problem of having to step through the REVERSE_INDICES array.
I haven't tried it, but it might even be possible to force
Histogram to construct a histogram with just one bin. Then
you're laughing.
Of course, this is much the same as using where(),
except that as those who have read the HISTOGRAM
documentation know, it's faster doing it the non-obvious
way.
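A sketch of the one-bin HISTOGRAM idea, for anyone who wants to try it: with MIN=0 and MAX=0 (and the default BINSIZE of 1) only zero bytes fall into the single bin, so REVERSE_INDICES lists the null positions, the same information WHERE(b EQ 0B) gives. No timing comparison is implied; SPLIT_NULLS_HIST is a hypothetical name, and, like Malcolm's version, it assumes the data contain no genuine line feeds.

```idl
; SPLIT_NULLS_HIST: hypothetical helper trying the one-bin HISTOGRAM idea.
; Assumes the data contain no real line feeds (10B).
function split_nulls_hist, input
  b = byte(input)
  ; MIN=0, MAX=0 and the default BINSIZE=1 give a single bin holding the
  ; zero bytes; every other value lies outside [MIN, MAX] and is ignored
  h = histogram(b, min=0, max=0, reverse_indices=ri)
  ; ri[ri[0]:ri[1]-1] lists the subscripts of the members of bin 0
  if ri[1] gt ri[0] then b[ri[ri[0]:ri[1]-1]] = 10B
  return, strsplit(string(b), string(10B), /extract)
end
```

Because only the first bin is ever read, there is no need to step through REVERSE_INDICES bin by bin, which is the point made above.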
Mind you, on further reflection, I would probably just
adapt Malcolm Walters' idea to use array compares, which
in theory at least should be faster than either HISTOGRAM or
WHERE.
btext = byte(text)
btext = btext + 10B*(btext eq 0B)   ; add 10 only where the byte is 0
textarr = strsplit(string(btext), string(10B), /extract)
Struan