comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: What? You can't histogram a string array? [message #51546 is a reply to message #51545] Tue, 28 November 2006 10:08
JD Smith
On Tue, 28 Nov 2006 09:16:12 -0800, Braedley wrote:

> JD, a small nitpick: ind_int_sort will occasionally take the index from
> [a, b], and not from just a. This can quickly lead to out of bounds
> conditions if the user doesn't want to index [a, b], but just wants to
> index a. In my case, a is a column from a 2D string array, where b is
> just a 1D string array. I think a where statement is all that is
> needed to fix this (I know, it'll slow it down for large sets).

This is not good, and much worse than a minor nitpick. The
IND_INT_SORT algorithm relies on SORT doing the right thing. That is,
for two identical elements in the concatenated vector [a,b], SORT
should place the first one first, i.e. the matching elements from 'a'
will show up before those from 'b'. That's the only reason it
works. There was always the concern that IDL's SORT would change and
this would no longer be the case (the element from b would come
first), in which case the algorithm would be broken.
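
One way to remove the dependence on SORT's tie ordering would be to take, for each matching adjacent pair, whichever of the two indices is smaller, since in [a,b] every index into a precedes every index into b. The following is only a sketch, assuming the routine has roughly the structure implied by the names s, srt and wh used later in this message. The name ind_int_sort_safe is hypothetical, and this is not the routine as originally posted:

```idl
; Sketch only: hypothetical variant, not the originally posted function.
function ind_int_sort_safe, a, b
   na = n_elements(a)
   s = [a, b]                    ; concatenated vector
   n = n_elements(s)
   if n lt 2 then return, -1
   srt = sort(s)                 ; indices into [a,b], in sorted order
   s = s[srt]
   from_b = srt ge na            ; 1 where the sorted element came from b
   ; adjacent equal values, one taken from a and the other from b
   wh = where(s[0:n-2] eq s[1:n-1] and from_b[0:n-2] ne from_b[1:n-1], cnt)
   if cnt eq 0 then return, -1
   ; "<" is IDL's minimum operator: keep the index that lies in a
   return, srt[wh] < srt[wh+1]
end
```

Because the smaller index of each pair is always the one below n_elements(a), the result should stay within a whichever of the two equal elements SORT happens to place first.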

Can you provide an example where this isn't happening? I just tried
it on a simulated set of 100,000 random 6-character strings, and it
didn't show this behavior: all ~30 matching elements were selected
from a. I then ran this test 100 times, and in all cases it behaved
as expected. Perhaps it depends on the machine/OS? I'm actually not
sure whether SORT calls a library sort function (which might make the
algorithm non-portable) or uses its own. You can try this test
yourself, like this:

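for i=1,100 do begin
   ; two independent sets of 100,000 random 6-letter uppercase strings
   a=string(byte(randomu(sd,6,100000)*26)+65b)
   b=string(byte(randomu(sd,6,100000)*26)+65b)
   s=ind_int_sort(a,b)
   print,strtrim(n_elements(s),2),' matches found'
   ; every returned index should point into a, i.e. be below 100000
   m=max(s)
   if m ge 100000 then begin
      print,'Out of bounds: ',m
      break
   endif
endfor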

Let me know if it runs through without error for you. For anyone else
who wants to test this, it would be appreciated. Here I run:

IDL> help,!VERSION,/st
** Structure !VERSION, 8 tags, length=76, data length=76:
ARCH STRING 'x86'
OS STRING 'linux'
OS_FAMILY STRING 'unix'
OS_NAME STRING 'linux'
RELEASE STRING '6.3'
BUILD_DATE STRING 'Mar 23 2006'
MEMORY_BITS INT 32
FILE_OFFSET_BITS INT 64
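
The tie-ordering assumption can also be checked directly, without going through IND_INT_SORT at all. The snippet below is a rough sketch of such a check (the variable names are illustrative): it builds an array with many repeated values, sorts it, and counts adjacent equal elements whose original indices come out in decreasing order.

```idl
; Sketch: count ties that SORT returns out of their original order.
n = 100000L
s = string(byte(randomu(sd,2,n)*26)+65b)   ; 2-letter strings, so many ties
srt = sort(s)
ss = s[srt]
; adjacent equal values whose original positions are reversed
bad = where(ss[0:n-2] eq ss[1:n-1] and srt[0:n-2] gt srt[1:n-1], nbad)
print, strtrim(nbad,2), ' adjacent ties out of original order'
```

A count of zero on a given machine is consistent with the ordering IND_INT_SORT needs, but it does not prove that SORT behaves this way on every platform or release.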

BTW, if you only want the *values*, not the positions, where the match
occurred, replace:

return,srt[wh]

with

return,s[wh]

and this will "solve" the problem for you (with this change, it's
equivalent to the CONTAIN function I posted long long ago). This is
insensitive to the order in which SORT places matching elements of
a and b.

Also note that IND_INT_SORT only returns *one* match for repeated
elements, which may or may not be what you want.
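
If every position of a repeated match is wanted, one simple (if slow for large sets) approach is a WHERE per matched value. This is a small sketch with made-up data; vals stands in for the matched values, for example as returned by the values-only variant above:

```idl
; Sketch: list all positions in a of each matched value.
a = ['x','y','x','z']
vals = ['x','z']                     ; hypothetical matched values
for i=0, n_elements(vals)-1 do begin
   idx = where(a eq vals[i], cnt)    ; every position of that value in a
   if cnt gt 0 then $
      print, vals[i], ': ', strtrim(cnt,2), ' occurrence(s) at ', strtrim(idx,2)
endfor
```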

JD