Re: Unicode Question [message #46876 is a reply to message #46872]
Sat, 07 January 2006 14:57
R.Bauer
Hi all,
I think it's time to submit a feature request to RSI.
In Python it's done this way:
import codecs
f = codecs.open("file.txt","rb","utf8").read()
Are there any known plans for when UTF-8 support will be added to IDL?
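For anyone taking the Python route in the meantime, here is a minimal sketch, assuming the file really is UTF-8. The helper name read_utf8_text is made up for illustration, and it uses io.open with the "utf-8-sig" codec instead of the codecs.open call above so that a leading byte order mark, if present, is dropped.

```python
import io


def read_utf8_text(path):
    """Return the contents of a UTF-8 file as text (hypothetical helper)."""
    # "utf-8-sig" decodes UTF-8 and also strips a leading byte order mark,
    # so files written with or without the mark give the same result.
    with io.open(path, "r", encoding="utf-8-sig") as handle:
        return handle.read()
```

If the file turns out to be in some other encoding, this raises UnicodeDecodeError rather than silently returning nonsense, which at least makes the problem visible.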
cheers
Reimar
David Fanning wrote:
> grunes@yahoo.com writes:
>
>> First, you might look at
>>
>> http://www.unicode.org
>>
>> to see what the Unicode codes are.
>>
>> Don't forget that some people write the ASCII subset in 8 bits, while others
>> include a null byte to make it 16.
>>
>> Open the file and read an 8-bit code from it in the usual way:
>> a=0b & b=a
>> openr,1,'yourfilename'
>> readu,1,a
>>
>> Then if a is 0, drop it. If it is not 0 and it is a legitimate ASCII
>> character (rather than one of the Unicode prefixes, which you can tell
>> from its range), the character is string(a). Otherwise, read a second byte:
>> readu,1,b
>> and the character will be in the string
>> string([a,b])
>> Of course, that string is two bytes long, which is right for 16-bit Unicode.
>>
>> I haven't checked this out, as I don't have a licensed IDL where I am
>> now, but it should work.
>
> Yeah, that's kinda what I thought, too. But I'm not so sure
> it is as simple as this anymore. :-)
>
> But I am handicapped by not having the actual file, too.
> I really was just wondering if anyone had any experience
> with this. My suggestions are still resulting in a lot of
> *&%^$ type of nonsense.
>
> Cheers,
>
> David
>
Re: Unicode Question [message #46886 is a reply to message #46876]
Fri, 06 January 2006 13:30
David Fanning
grunes@yahoo.com writes:
> First, you might look at
>
> http://www.unicode.org
>
> to see what the Unicode codes are.
>
> Don't forget that some people write the ASCII subset in 8 bits, while others
> include a null byte to make it 16.
>
> Open the file and read an 8-bit code from it in the usual way:
> a=0b & b=a
> openr,1,'yourfilename'
> readu,1,a
>
> Then if a is 0, drop it. If it is not 0 and it is a legitimate ASCII
> character (rather than one of the Unicode prefixes, which you can tell
> from its range), the character is string(a). Otherwise, read a second byte:
> readu,1,b
> and the character will be in the string
> string([a,b])
> Of course, that string is two bytes long, which is right for 16-bit Unicode.
>
> I haven't checked this out, as I don't have a licensed IDL where I am
> now, but it should work.
Yeah, that's kinda what I thought, too. But I'm not so sure
it is as simple as this anymore. :-)
But I am handicapped by not having the actual file, too.
I really was just wondering if anyone had any experience
with this. My suggestions are still resulting in a lot of
*&%^$ type of nonsense.
Cheers,
David
--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Re: Unicode Question [message #46887 is a reply to message #46886]
Fri, 06 January 2006 13:16
mitch grunes
First, you might look at
http://www.unicode.org
to see what the Unicode codes are.
Don't forget that some people write the ASCII subset in 8 bits, while others
include a null byte to make it 16.
Open the file and read an 8-bit code from it in the usual way:
a=0b & b=a
openr,1,'yourfilename'
readu,1,a
Then if a is 0, drop it. If it is not 0 and it is a legitimate ASCII
character (rather than one of the Unicode prefixes, which you can tell
from its range), the character is string(a). Otherwise, read a second byte:
readu,1,b
and the character will be in the string
string([a,b])
Of course, that string is two bytes long, which is right for 16-bit Unicode.
I haven't checked this out, as I don't have a licensed IDL where I am
now, but it should work. I'll let you figure out the prefix codes.
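As a starting point for the prefix codes, here is a rough Python sketch of the same idea: look at the first bytes of the file for a byte order mark and pick the decoder from that. It is only an illustration under some assumptions. The function name decode_with_bom and the Latin-1 fallback are my own choices, not anything from IDL or from the file in question, and only the UTF-8 and UTF-16 marks are handled.

```python
import codecs

# Byte order marks (the "prefix codes") and the codec each one implies.
# Assumption: only UTF-8 and UTF-16 are considered. A UTF-32 little-endian
# mark starts with the same two bytes as UTF-16 little-endian and would be
# misread by this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(raw, default="latin-1"):
    """Decode raw bytes, using a leading byte order mark to pick the codec."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # Drop the mark itself, then decode the rest.
            return raw[len(bom):].decode(encoding)
    # No mark found: fall back to an assumed 8-bit encoding.
    return raw.decode(default)
```

Reading the whole file as bytes (open(path, "rb").read()) and passing the result to decode_with_bom covers both the plain 8-bit case and the "null byte to make it 16" case, as long as the writer included the mark.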