comp.lang.idl-pvwave archive: archive

Re: Unicode Question [message #46872]

Sat, 07 January 2006 21:40

mitch grunes
Messages: 6
Registered: November 1999

Junior Member

Oops. Based on

http://www.unicode.org/versions/Unicode4.0.0

there are more characters than 16 bits can account for. Scrap everything I
said. It used to be true.
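
A small illustration of the point above, as a sketch only: it takes one character beyond U+FFFF (the choice of U+1D11E, MUSICAL SYMBOL G CLEF, is just an example and is not from the thread) and shows that it no longer fits in a single 16-bit unit.

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) is an arbitrary example of a character
# whose code point lies outside the 16-bit range.
clef = "\U0001D11E"
code_point = ord(clef)

# Big-endian UTF-16 without a byte-order mark: the character needs a
# surrogate pair, i.e. two 16-bit units (four bytes).
utf16 = clef.encode("utf-16-be")

# UTF-8 also needs four bytes for this character.
utf8 = clef.encode("utf-8")

print(hex(code_point), utf16.hex(), utf8.hex())
```

So any reader that assumes "one character is at most two bytes" will split such characters in half.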

Re: Unicode Question [message #46876 is a reply to message #46872]

Sat, 07 January 2006 14:57

R.Bauer
Messages: 1424
Registered: November 1998

Senior Member

Hi all

I think it's time to submit a feature request to RSI.

In Python it's done this way:

import codecs
f = codecs.open("file.txt","rb","utf8").read()

Are there any known plans for when UTF-8 support will be added to IDL?
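
For comparison, here is a minimal sketch of the same read using the built-in open() in Python 3, which accepts an encoding argument directly. The helper name read_utf8_text and the sample file contents are made up for illustration; the temporary file exists only so the snippet runs on its own.

```python
import os
import tempfile


def read_utf8_text(path):
    """Return the whole file decoded as UTF-8 text."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Hypothetical sample file, created here only so the sketch is runnable.
sample_path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write("Grüße, IDL")

text = read_utf8_text(sample_path)
print(text)
```

The with block closes the file even if decoding fails, which the one-liner above does not guarantee.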

cheers
Reimar

David Fanning wrote:

> grunes@yahoo.com writes:
>
>> First, you might look at
>>
>> http://www.unicode.org
>>
>> to see what unicode codes are.
>>
>> Don't forget that some people write the ASCII subset in 8 bits, others
>> include a null byte to make it 16.
>>
>> Open the file and read one 8-bit code from it in the usual way:
>> a=0b & b=a
>> openr,1,'yourfilename'
>> readu,1,a
>>
>> Then if a is 0, drop it. If it is not 0 and it is a legitimate ASCII character
>> rather than one of the Unicode prefixes, as you can determine from its range,
>> the character is string(a). Otherwise, read a second byte:
>> readu,1,b
>> and the character will be in the string
>> string([a,b])
>> Of course, that string is two bytes long, which is right for Unicode.
>>
>> I haven't checked this out, as I don't have a licensed IDL where I am
>> now, but it should work.
>
> Yeah, that's kinda what I thought, too. But I'm not so sure
> it is as simple as this anymore. :-)
>
> But I am handicapped by not having the actual file, too.
> I really was just wondering if anyone had any experience
> with this. My suggestions are still resulting in a lot of
> *&%^$ type of nonsense.
>
> Cheers,
>
> David
>

Re: Unicode Question [message #46886 is a reply to message #46876]

Fri, 06 January 2006 13:30

David Fanning
Messages: 11724
Registered: August 2001

Senior Member

grunes@yahoo.com writes:

> First, you might look at
>
> http://www.unicode.org
>
> to see what unicode codes are.
>
> Don't forget that some people write the ASCII subset in 8 bits, others
> include a null byte to make it 16.
>
> Open the file and read one 8-bit code from it in the usual way:
> a=0b & b=a
> openr,1,'yourfilename'
> readu,1,a
>
> Then if a is 0, drop it. If it is not 0 and it is a legitimate ASCII character
> rather than one of the Unicode prefixes, as you can determine from its range,
> the character is string(a). Otherwise, read a second byte:
> readu,1,b
> and the character will be in the string
> string([a,b])
> Of course, that string is two bytes long, which is right for Unicode.
>
> I haven't checked this out, as I don't have a licensed IDL where I am
> now, but it should work.

Yeah, that's kinda what I thought, too. But I'm not so sure
it is as simple as this anymore. :-)

But I am handicapped by not having the actual file, too.
I really was just wondering if anyone had any experience
with this. My suggestions are still resulting in a lot of
*&%^$ type of nonsense.

Cheers,

David

--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/

Re: Unicode Question [message #46887 is a reply to message #46886]

Fri, 06 January 2006 13:16

mitch grunes
Messages: 6
Registered: November 1999

Junior Member

First, you might look at

http://www.unicode.org

to see what unicode codes are.

Don't forget that some people write the ASCII subset in 8 bits, others
include a null byte to make it 16.

Open the file and read one 8-bit code from it in the usual way:
a=0b & b=a
openr,1,'yourfilename'
readu,1,a

Then if a is 0, drop it. If it is not 0 and it is a legitimate ASCII character
rather than one of the Unicode prefixes, as you can determine from its range,
the character is string(a). Otherwise, read a second byte:
readu,1,b
and the character will be in the string
string([a,b])
Of course, that string is two bytes long, which is right for Unicode.

I haven't checked this out, as I don't have a licensed IDL where I am
now, but it should work. I'll let you figure out the prefix codes.
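
The idea above (ASCII stored either in 8 bits or padded with a null byte to 16) can be sketched as follows. This is written in Python rather than IDL so it can be run without a license, and it is only a rough heuristic, not a tested recipe for the file in question. The function name guess_and_decode, the byte-order-mark check, and the "more than a quarter of the bytes are nulls" threshold are all assumptions added here. UTF-32 is not handled.

```python
import codecs


def guess_and_decode(raw):
    """Decode bytes as UTF-8 or UTF-16 using a BOM or a null-byte heuristic."""
    # A byte-order mark, when present, identifies the encoding directly.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return raw.decode("utf-16")  # the "utf-16" codec consumes the BOM

    # No BOM: ASCII text stored in 16 bits has a null in every other byte.
    even_nulls = raw[0::2].count(0)  # nulls at offsets 0, 2, 4, ...
    odd_nulls = raw[1::2].count(0)   # nulls at offsets 1, 3, 5, ...
    threshold = len(raw) // 4        # assumed cut-off, not from the thread
    if odd_nulls > threshold and odd_nulls > even_nulls:
        return raw.decode("utf-16-le")  # low byte first, null second
    if even_nulls > threshold and even_nulls > odd_nulls:
        return raw.decode("utf-16-be")  # null first, low byte second

    # Otherwise assume 8-bit text; plain ASCII is valid UTF-8.
    return raw.decode("utf-8")


print(guess_and_decode(b"abc"))
print(guess_and_decode(b"a\x00b\x00c\x00"))
print(guess_and_decode(b"\x00a\x00b\x00c"))
```

Decoding the whole buffer at once avoids having to track prefix bytes by hand, at the cost of reading the file into memory first.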
