comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Home » Public Forums » archive » Re: Random selection
Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend 
Return to the default flat view Create a new topic Submit Reply
Re: Random selection [message #51324 is a reply to message #51242] Mon, 13 November 2006 10:46 Go to previous message
JD Smith is currently offline  JD Smith
Messages: 850
Registered: December 1999
Senior Member
On Sun, 12 Nov 2006 05:07:19 -0800, greg michael wrote:

> I'm not sure doubles are going to help - you'll get the same problem at
> 3 digits. And if you use the sort method, it doesn't really matter if
> you get two the same - they'll still map to unique indices.

If you only want to pick 10 random elements from a list 100,000 long,
it's very inefficient to generate a set of 100,000 random numbers, sort
them all, and then take the first 10 indices. There are all sorts of
iterative higher-order algorithms for selection without replacement, but
they don't match to IDL well. One simple trick would be to start by
generating M random numbers, check for duplicates, and generate M-n more,
accumulating until you have enough.

M=10
len=100000L
inds=lonarr(n,/NOZERO)
n=M
while n gt 0 do begin
inds[M-n]=long(randomu(sd,n)*len)
inds=inds[sort(inds)]
u=uniq(inds)
n=M-n_elements(u)
inds[0]=inds[u]
end

For this case, the speedup is immense, on average about 3500x faster.
What about a case with more duplicates likely? How about len=100000,
M=25000?

Sort All randoms: 0.13349121
Brute force replacement: 0.091892505

Still about 1.5x faster.

Obviously, if you wanted len-1 random indices, this won't scale, but in
that case, you could just invert the problem, choose the random indices
to be *discarded*, and use HISTOGRAM to generate the "real" list.
Here's a general function which does this for you.

function random_indices, len, n_in
swap=n_in gt len/2
if swap then n=len-n_in else n=n_in
inds=lonarr(n,/NOZERO)
M=n
while n gt 0 do begin
inds[M-n]=long(randomu(sd,n)*len)
inds=inds[sort(inds)]
u=uniq(inds)
n=M-n_elements(u)
inds[0]=inds[u]
endwhile

if swap then inds=where(histogram(inds,MIN=0,MAX=len-1) eq 0)
return,inds
end

It is outperformed by the simple sort method:

r=randomu(sd,len)
inds=(sort(r))[0:M-1]

only when M is close to len/2. For example, I found that selecting from
length 100000 fewer than 30000 or more than 70000 elements favored
RANDOM_INDICES. At worst case (M=len/2), it's roughly 3x slower. The
RANDOM_INDICES method also returns the indices in sorted order (which
you may or may not care about). You could obviously also make a hybrid
approach which switches from one method to the other for

abs(M-len/2) lt len/5

or so, but the tuning would be machine-specific.

JD
[Message index]
 
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Previous Topic: Re: colorbar positioning in postscript
Next Topic: canonical correlations

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ] [ PDF ]

Current Time: Fri Oct 10 06:29:44 PDT 2025

Total time taken to generate the page: 0.40065 seconds