comp.lang.idl-pvwave archive: archive » Re: What? You can't histogram a string array?

Re: What? You can't histogram a string array? [message #51543]

Tue, 28 November 2006 10:49

David Fanning
Messages: 11724
Registered: August 2001

Senior Member

David Fanning writes:

> In running the test program, I get immediate out-of-bounds
> errors with IDL's SORT routine. But nothing of the sort
> (a pun!) with the NASA BSORT routine I always use when I
> need to sort something "for real".

Whoops! I forgot the mandatory link:

http://www.dfanning.com/tips/sort.html

Cheers,

David
--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")

Re: What? You can't histogram a string array? [message #51544 is a reply to message #51543]

Tue, 28 November 2006 10:43

David Fanning

JD Smith writes:

> This is not good, and much worse than a minor nitpick. The
> IND_INT_SORT algorithm relies on SORT doing the right thing. That is,
> for two identical elements in the concatenated vector [a,b], SORT
> should place the first one first, i.e. the matching elements from 'a'
> will show up before those from 'b'. That's the only reason it
> works. There was always the concern that IDL's SORT would change and
> this would no longer be the case (the element from b would come
> first), in which case the algorithm would be broken.

In running the test program, I get immediate out-of-bounds
errors with IDL's SORT routine. But nothing of the sort
(a pun!) with the NASA BSORT routine I always use when I
need to sort something "for real".

Running on Windows XP.

Cheers,

David
--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")

Re: What? You can't histogram a string array? [message #51545 is a reply to message #51544]

Tue, 28 November 2006 10:12

JD Smith
Messages: 850
Registered: December 1999

Senior Member

On Tue, 28 Nov 2006 09:52:06 -0800, Braedley wrote:

>
> Braedley wrote:
>> JD, a small nitpick: ind_int_sort will occasionally take the index from
>> [a, b], and not from just a. This can quickly lead to out of bounds
>> conditions if the user doesn't want to index [a, b], but just wants to
>> index a. In my case, a is a column from a 2D string array, where b is
>> just a 1D string array. I think a where statement is all that is
>> needed to fix this (I know, it'll slow it down for large sets).
>>
>> Braedley
>
> Actually, the fix was much easier than previously thought. Instead of
> return, srt[wh]
> use
> return, srt[wh]<srt[wh+1]
>
> I haven't done any tests, but it shouldn't take much longer for sparse
> or small sets.

That is a clever fix, but if the ordering of equal elements from a and b is
random, and a repeated set in a matches a repeated set in b, their
interleaved sorted order is random too, and you'll get back a random number
of the matching repeats (not 1, as was intended).

See my other post though, and let me know your findings w.r.t. SORT.

Thanks,

JD
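
To make the concern concrete, here is a small hand-built case. The index vector srt below is hypothetical (it is not the output of a real SORT call); it mimics a sort that interleaves equal elements from a = ['x','x'] and b = ['x','x'], so indices 0 and 1 belong to a and indices 2 and 3 to b:

```idl
; Hypothetical sorted order in which equal elements of a and b interleave.
srt  = [0, 2, 1, 3]
flag = srt ge 2                  ; 0 = element came from a, 1 = from b
n    = n_elements(srt)
; All four values are equal here, so only the flag test matters.
wh   = where(flag[0:n-2] ne flag[1:*])
; The "<" minimum operator always picks the index that lies inside a ...
print, srt[wh] < srt[wh+1]
; ... but three indices come back (0, 1, 1), not one.
```

With an order-preserving layout (srt = [0, 1, 2, 3]) the same expression yields the single index 1, which is the behaviour the original routine intended.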

Re: What? You can't histogram a string array? [message #51546 is a reply to message #51545]

Tue, 28 November 2006 10:08

JD Smith

On Tue, 28 Nov 2006 09:16:12 -0800, Braedley wrote:

> JD, a small nitpick: ind_int_sort will occasionally take the index from
> [a, b], and not from just a. This can quickly lead to out of bounds
> conditions if the user doesn't want to index [a, b], but just wants to
> index a. In my case, a is a column from a 2D string array, where b is
> just a 1D string array. I think a where statement is all that is
> needed to fix this (I know, it'll slow it down for large sets).

This is not good, and much worse than a minor nitpick. The
IND_INT_SORT algorithm relies on SORT doing the right thing. That is,
for two identical elements in the concatenated vector [a,b], SORT
should place the first one first, i.e. the matching elements from 'a'
will show up before those from 'b'. That's the only reason it
works. There was always the concern that IDL's SORT would change and
this would no longer be the case (the element from b would come
first), in which case the algorithm would be broken.

Can you provide an example where this isn't happening? I just tried
it on a simulated set of 100,000 random 6 character strings, and it
didn't show this behavior: all ~30 matching elements were selected
from a. I then ran this test 100 times, and in all cases it behaved
as expected. Perhaps it depends on the machine/OS? I'm actually not
sure if SORT calls a library sort function (which might make the
algorithm non-portable), or uses its own. You can try this test
yourself, like this:

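; Repeat the test 100 times with fresh random 6-character strings.
for i=1,100 do begin
   a=string(byte(randomu(sd,6,100000)*26)+65b)
   b=string(byte(randomu(sd,6,100000)*26)+65b)
   s=ind_int_sort(a,b)
   print,strtrim(n_elements(s),2),' matches found'
   ; Valid indices into a are < 100000; anything larger points into b.
   m=max(s)
   if m ge 100000 then begin
      print,'Out of bounds: ',m
      break
   endif
endfor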

Let me know if it runs through without error for you. For anyone else
who wants to test this, it would be appreciated. Here I run:

IDL> help,!VERSION,/st
** Structure !VERSION, 8 tags, length=76, data length=76:
   ARCH            STRING    'x86'
   OS              STRING    'linux'
   OS_FAMILY       STRING    'unix'
   OS_NAME         STRING    'linux'
   RELEASE         STRING    '6.3'
   BUILD_DATE      STRING    'Mar 23 2006'
   MEMORY_BITS     INT             32
   FILE_OFFSET_BITS
                   INT             64

BTW, if you only want the *values*, not the positions, where a match
occurred, replace:

return,srt[wh]

with

return,s[wh]

and this will "solve" the problem for you (with this change, it's
equivalent to the CONTAIN function I posted long long ago). This is
insensitive to how SORT orders equal elements from a and b.

Also note that IND_INT_SORT only returns *one* match for repeated
elements, which may or may not be what you want.

JD
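
For reference, the approach under discussion can be sketched as follows. This is reconstructed from the description in this thread and may differ from the code on the dfanning.com page; the function name and the flag variable are assumptions. It returns correct indices only if SORT keeps equal elements in their original order, and it assumes a and b are non-empty one-dimensional arrays:

```idl
; Sketch of a sort-based intersection: returns indices into a of
; elements that also occur in b, or -1 if there are none.
; ASSUMPTION: sort() leaves equal elements in their original order,
; so a match from a always lands just before its partner from b.
function ind_int_sort_sketch, a, b
   ; flag = 0 for elements that came from a, 1 for those from b
   flag = [replicate(0b, n_elements(a)), replicate(1b, n_elements(b))]
   s = [a, b]
   n = n_elements(s)
   srt = sort(s)
   s = s[srt]
   flag = flag[srt]
   ; neighbouring equal values that come from different inputs are matches
   wh = where(s[0:n-2] eq s[1:*] and flag[0:n-2] ne flag[1:*], cnt)
   if cnt eq 0 then return, -1
   return, srt[wh]
end
```

On a platform where SORT does not preserve order, some of the returned indices can be n_elements(a) or larger, which is the out-of-bounds symptom reported in this thread.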

Re: What? You can't histogram a string array? [message #51547 is a reply to message #51546]

Tue, 28 November 2006 09:52

Braedley
Messages: 57
Registered: September 2006

Member

Braedley wrote:
> JD, a small nitpick: ind_int_sort will occasionally take the index from
> [a, b], and not from just a. This can quickly lead to out of bounds
> conditions if the user doesn't want to index [a, b], but just wants to
> index a. In my case, a is a column from a 2D string array, where b is
> just a 1D string array. I think a where statement is all that is
> needed to fix this (I know, it'll slow it down for large sets).
>
> Braedley

Actually, the fix was much easier than previously thought. Instead of
return, srt[wh]
use
return, srt[wh]<srt[wh+1]

I haven't done any tests, but it shouldn't take much longer for sparse
or small sets.

Braedley

Re: What? You can't histogram a string array? [message #51549 is a reply to message #51547]

Tue, 28 November 2006 09:27

David Fanning

R.G. Stockwell writes:

> Wow, I must admit I have not noticed the ads before, how long
> have they been around?

Since the unemployment insurance ran out. :-(

They bring in a solid dollar a day, rain or shine.
It's just about enough to keep the site on the air.

Cheers,

David

--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")

Re: What? You can't histogram a string array? [message #51553 is a reply to message #51549]

Tue, 28 November 2006 09:19

news.qwest.net
Messages: 137
Registered: September 2005

Senior Member

"David Fanning" <news@dfanning.com> wrote in message
news:MPG.1fd6160cadb472da989def@news.frii.com...
> JD Smith writes:
>
>> I for one try to click on one of your ads of interest when I visit
>> to keep the hosting fees covered.
>
> Well, if you and ten thousand of your friends keep this up,
> it's possible I may be able to buy a case or two of beer
> for the next IEPA gathering. :-)

Wow, I must admit I have not noticed the ads before, how long
have they been around? I'll have to start paying attention to them.
(And perhaps assign clicking duties to some underlings)

Cheers,
bob

Re: What? You can't histogram a string array? [message #51555 is a reply to message #51553]

Tue, 28 November 2006 09:16

Braedley

JD, a small nitpick: ind_int_sort will occasionally take the index from
[a, b], and not from just a. This can quickly lead to out of bounds
conditions if the user doesn't want to index [a, b], but just wants to
index a. In my case, a is a column from a 2D string array, where b is
just a 1D string array. I think a where statement is all that is
needed to fix this (I know, it'll slow it down for large sets).

Braedley

Re: What? You can't histogram a string array? [message #51558 is a reply to message #51555]

Tue, 28 November 2006 08:47

David Fanning

JD Smith writes:

> I for one try to click on one of your ads of interest when I visit
> to keep the hosting fees covered.

Well, if you and ten thousand of your friends keep this up,
it's possible I may be able to buy a case or two of beer
for the next IEPA gathering. :-)

Cheers,

David

P.S. I *really* appreciate it, though!

--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")

Re: What? You can't histogram a string array? [message #51559 is a reply to message #51558]

Tue, 28 November 2006 08:31

JD Smith

On Mon, 27 Nov 2006 16:28:49 -0700, R.G. Stockwell wrote:

>
> "David Fanning" <news@dfanning.com> wrote in message
> news:MPG.1fd510686e7366e2989de7@news.frii.com...
>> JD Smith writes:
>>
>>> (Has anyone else noticed that most reply posts these days start by
>>> referencing some dfanning.com link?)
>>
>> Maybe I need to implement better search technology. :-(
>>
>> This site has expanded quite a ways beyond my original
>> vision for it. If anyone has suggestions for how it might
>> be better organized, and the suggestion doesn't take man-years
>> to implement, I'm all ears.
>
> I think you have the optimal search algorithm as it stands now.
> 1) user posts to comp.lang.idl-pvwave
> 2) read response with link to dfanning.com page
> 3) click on link
>
> It is very effective.

Yeah, my point was a positive one. Without your site, we'd be reduced to
"Search the archive for something we may have written regarding this topic
last year or maybe before that", instead of pointing to a nicely
formatted page with oft-humorous editorial introductory notes. An
excellent resource whose value grows day by day. Thanks for keeping it up,
David. I for one try to click on one of your ads of interest when I visit
to keep the hosting fees covered.

JD

Re: What? You can't histogram a string array? [message #51563 is a reply to message #51559]

Tue, 28 November 2006 04:41

Braedley

David Fanning wrote:
> JD Smith writes:
>
>> (Has anyone else noticed that most reply posts these days start by
>> referencing some dfanning.com link?)
>
> Maybe I need to implement better search technology. :-(
>
> This site has expanded quite a ways beyond my original
> vision for it. If anyone has suggestions for how it might
> be better organized, and the suggestion doesn't take man-years
> to implement, I'm all ears.
>
> Cheers,
>
> David
> --
> David Fanning, Ph.D.
> Fanning Software Consulting, Inc.
> Coyote's Guide to IDL Programming: http://www.dfanning.com/
> Sepore ma de ni thui. ("Perhaps thou speakest truth.")

The ironic thing is that I have read this article in the past, but
forgot which section it was in.

Braedley

Re: What? You can't histogram a string array? [message #51572 is a reply to message #51563]

Mon, 27 November 2006 15:28

news.qwest.net

"David Fanning" <news@dfanning.com> wrote in message
news:MPG.1fd510686e7366e2989de7@news.frii.com...
> JD Smith writes:
>
>> (Has anyone else noticed that most reply posts these days start by
>> referencing some dfanning.com link?)
>
> Maybe I need to implement better search technology. :-(
>
> This site has expanded quite a ways beyond my original
> vision for it. If anyone has suggestions for how it might
> be better organized, and the suggestion doesn't take man-years
> to implement, I'm all ears.

I think you have the optimal search algorithm as it stands now.
1) user posts to comp.lang.idl-pvwave
2) read response with link to dfanning.com page
3) click on link

It is very effective.

Re: What? You can't histogram a string array? [message #51578 is a reply to message #51572]

Mon, 27 November 2006 14:11

David Fanning

JD Smith writes:

> (Has anyone else noticed that most reply posts these days start by
> referencing some dfanning.com link?)

Maybe I need to implement better search technology. :-(

This site has expanded quite a ways beyond my original
vision for it. If anyone has suggestions for how it might
be better organized, and the suggestion doesn't take man-years
to implement, I'm all ears.

Cheers,

David
--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")

Re: What? You can't histogram a string array? [message #51580 is a reply to message #51578]

Mon, 27 November 2006 13:56

JD Smith

On Mon, 27 Nov 2006 10:26:20 -0800, Braedley wrote:

> I'm very disappointed. I had a beautiful solution to a problem which
> involved determining if all the elements in one array exist in a second
> using histogram, but apparently I can't do that with string arrays. Oh
> well, I think I've seen something else in the built in library that'll
> do it just as fast and easily.

http://www.dfanning.com/tips/set_operations.html

ind_int_SORT is probably what you want.

(Has anyone else noticed that most reply posts these days start by
referencing some dfanning.com link?)

JD
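
If speed is not critical, the original question (do all elements of one string array exist in a second?) can also be answered without HISTOGRAM or SORT. A minimal sketch, assuming a brute-force scan is acceptable; the function name all_in_sketch is made up for illustration:

```idl
; Returns 1 if every element of a occurs somewhere in b, otherwise 0.
; Works for string arrays; the cost grows as n_elements(a)*n_elements(b),
; so this is only sensible for small or moderate inputs.
function all_in_sketch, a, b
   for i = 0L, n_elements(a) - 1 do begin
      ; cnt is the number of elements of b equal to a[i]
      junk = where(b eq a[i], cnt)
      if cnt eq 0 then return, 0b
   endfor
   return, 1b
end
```

For example, all_in_sketch(['a','b'], ['b','c','a']) should return 1, while all_in_sketch(['a','z'], ['b','c','a']) should return 0. Because it never sorts, the result does not depend on how SORT treats equal elements.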

Re: What? You can't histogram a string array? [message #51680 is a reply to message #51544]

Tue, 28 November 2006 13:17

JD Smith

On Tue, 28 Nov 2006 11:43:20 -0700, David Fanning wrote:

> JD Smith writes:
>
>> [quoted text muted]
>
> In running the test program, I get immediate out-of-bounds
> errors with IDL's SORT routine. But nothing of the sort
> (a pun!) with the NASA BSORT routine I always use when I
> need to sort something "for real".

OK, so far OSX and Windows XP throw out of bounds errors. Can anyone
on Linux confirm that this runs without error? I checked libidl.so,
and it mentions qsort, of the GLIBC variety. So it must be that the
qsort implementation in my GLIBC preserves order, while others do
not. Ouch.

Might want to add a note to that page. If you don't have repeated
elements, then the fix Braedley offered works fine. BSORT from
Nasalib sorts and then reorders duplicates to preserve the original
order. It will compromise speed somewhat, but is a good alternative.

JD

P.S. How long has it been the case that SORT scrambles order on Windows?
I'm surprised the issue with IND_INT_SORT didn't come up before.
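
The reorder-the-duplicates idea can be sketched in a few lines. This is an illustration of the approach described above, not BSORT's actual source; the function name stable_sort_sketch is hypothetical, and it assumes a non-empty one-dimensional input:

```idl
; Order-preserving sort sketch: sort as usual, then put the indices
; inside each run of equal values back into ascending (original) order.
function stable_sort_sketch, x
   idx = sort(x)
   xs = x[idx]
   ; uniq() returns the position of the last element of each run
   ends = uniq(xs)
   first = 0L
   for i = 0L, n_elements(ends) - 1 do begin
      last = ends[i]
      if last gt first then begin
         ; indices within a run are distinct, so this sort has no ties
         run = idx[first:last]
         idx[first:last] = run[sort(run)]
      endif
      first = last + 1
   endfor
   return, idx
end
```

Calling this in place of SORT inside IND_INT_SORT should put elements from a ahead of equal elements from b regardless of how the underlying sort treats ties, at the cost of a loop over the runs of duplicates.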

Re: What? You can't histogram a string array? [message #51689 is a reply to message #51545]

Tue, 28 November 2006 11:23

Braedley

JD Smith wrote:
> On Tue, 28 Nov 2006 09:52:06 -0800, Braedley wrote:
>
>>
>> Braedley wrote:
>>> JD, a small nitpick: ind_int_sort will occasionally take the index from
>>> [a, b], and not from just a. This can quickly lead to out of bounds
>>> conditions if the user doesn't want to index [a, b], but just wants to
>>> index a. In my case, a is a column from a 2D string array, where b is
>>> just a 1D string array. I think a where statement is all that is
>>> needed to fix this (I know, it'll slow it down for large sets).
>>>
>>> Braedley
>>
>> Actually, the fix was much easier than previously thought. Instead of
>> return, srt[wh]
>> use
>> return, srt[wh]<srt[wh+1]
>>
>> I haven't done any tests, but it shouldn't take much longer for sparse
>> or small sets.
>
> That is a clever fix, but if the ordering of equal elements from a and b is
> random, and a repeated set in a matches a repeated set in b, their
> interleaved sorted order is random too, and you'll get back a random number
> of the matching repeats (not 1, as was intended).
>
> See my other post though, and let me know your findings w.r.t. SORT.
>
> Thanks,
>
> JD

I hit an out of bounds on my first try. Running MacOSX, 10.4.8,
IDLv6.2. Unfortunately, I do need the indices, as I pointed out
earlier. Perhaps I'll use BSORT instead.

Re: What? You can't histogram a string array? [message #51691 is a reply to message #51543]

Tue, 28 November 2006 10:53

news.qwest.net

"David Fanning" <news@dfanning.com> wrote in message
news:MPG.1fd632bc5a98f848989df5@news.frii.com...
> David Fanning writes:
>
>> In running the test program, I get immediate out-of-bounds
>> errors with IDL's SORT routine. But nothing of the sort
>> (a pun!) with the NASA BSORT routine I always use when I
>> need to sort something "for real".
>
> Whoops! I forgot the mandatory link:
>
> http://www.dfanning.com/tips/sort.html
>

You know, those ad links are actually pretty good. I'm looking
at the Princeton Instruments brochure right now, to see the latest
in near infrared imaging systems. I wonder if the tech has gotten to
the point where we can use off the shelf stuff now, instead of building
our own.
