# Obtaining Random Indices

**QUESTION:** I have an array with two columns and a thousand rows. Each row has a
latitude and longitude value. For my analysis, I need to randomly select 100 latitude/longitude pairs.
What is the best way to do this in IDL?

**ANSWER:** The obvious answer to this question is to simply use **RandomU** to
select 100 random values between 0 and 999:

    indices = Round( RandomU(seed, 100) * 999 )

But it was pointed out in the IDL newsgroup
that this doesn't guarantee *unique* indices. In other words,
it is possible, using this method, to return the same index multiple times. To guarantee that only unique
indices are returned requires a different technique.
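To see the duplicate problem concretely, here is a quick check in Python (a hypothetical stand-in for IDL's **RandomU**; the seed value is arbitrary): draw 100 values in the range 0 to 999 and count how many are distinct.

```python
import random

random.seed(7)
# Naive approach: 100 independent draws, each rounded into [0, 999]
indices = [round(random.random() * 999) for _ in range(100)]
# Nothing prevents the same index from appearing twice,
# so len(set(indices)) may be smaller than 100.
print(len(indices), len(set(indices)))
```

By the birthday problem, 100 draws from 1000 values repeat an index far more often than intuition suggests.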

Ken Bowman suggested a simple technique in which the random numbers are first sorted, and then
the first 100 are selected, which *does* result in unique index numbers:

    randomNumbers = RandomU(seed, 1000)
    sortedRandomNumbers = Sort(randomNumbers)
    indices = sortedRandomNumbers[0:99]
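The trick works because IDL's **Sort** returns the *indices* that would sort the array, and those indices are a permutation of 0 to 999, so any 100 of them are unique. A Python sketch of the same idea (using a hand-rolled argsort in place of IDL's Sort):

```python
import random

random.seed(0)
# One random number per candidate index
r = [random.random() for _ in range(1000)]
# Argsort: the indices that would sort r form a permutation of 0..999
order = sorted(range(1000), key=lambda i: r[i])
# The first 100 entries of a permutation are 100 unique indices
indices = order[:100]
```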

This works well, but JD Smith points out that it can be terribly inefficient if you wanted, say, 10 random elements from an array of 100,000: you would have to generate and sort 100,000 values to get the 10 you needed. He writes:

There are all sorts of iterative higher-order algorithms for selection without replacement, but they don't map well to IDL. One simple trick would be to start by generating M random numbers, check for duplicates, and generate M-n more, accumulating until you have enough.

    M = 10
    len = 100000L
    inds = LonArr(M, /NOZERO)
    n = M
    WHILE n GT 0 DO BEGIN
       inds[M-n] = Long(RandomU(seed, n) * len)
       inds = inds[Sort(inds)]
       u = Uniq(inds)
       n = M - N_Elements(u)
       inds[0] = inds[u]
    ENDWHILE

For this case, the speedup is immense: on average this method is about 3500 times faster than sorting. What about a case where more duplicates are likely? How about len=100000, with M=25000?

    Sort all randoms:        0.13349121
    Brute force replacement: 0.091892505

The method is still about 1.5 times faster.
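The accumulate-until-unique loop can be sketched in Python as follows (a hypothetical helper, not JD Smith's code; a set plays the role of the Sort/Uniq pass in the IDL version):

```python
import random

def sample_without_replacement(M, length, seed=None):
    """Return M unique random indices in [0, length), in sorted order."""
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < M:
        n = M - len(chosen)          # how many indices are still missing
        # Draw n candidates; the set silently drops any duplicates,
        # just as Sort + Uniq compacts the IDL array.
        chosen.update(int(rng.random() * length) for _ in range(n))
    return sorted(chosen)            # the IDL version is also sorted
```

Each pass draws only as many candidates as are still missing, so when duplicates are rare the loop usually terminates after one or two passes.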

Obviously, if you wanted len-1 random indices, this won't scale, but in that case you could just invert the problem: choose the random indices to be *discarded*, and use Histogram to generate the "real" list. Here's a general function which does this for you.

    FUNCTION Random_Indices, len, n_in
       swap = n_in GT len/2
       IF swap THEN n = len - n_in ELSE n = n_in
       inds = LonArr(n, /NOZERO)
       M = n
       WHILE n GT 0 DO BEGIN
          inds[M-n] = Long(RandomU(seed, n) * len)
          inds = inds[Sort(inds)]
          u = Uniq(inds)
          n = M - N_Elements(u)
          inds[0] = inds[u]
       ENDWHILE
       IF swap THEN inds = Where(Histogram(inds, MIN=0, MAX=len-1) EQ 0)
       RETURN, inds
    END

It is outperformed by the simple sort method only when M is close to len/2:

    r = RandomU(sd, len)
    inds = (Sort(r))[0:M-1]

For example, I found that selecting fewer than 30,000 or more than 70,000 elements from a length of 100,000 favored Random_Indices. At worst case (M = len/2), it's roughly 3 times slower. The Random_Indices method also returns the indices in sorted order (which you may or may not care about). You could obviously also make a hybrid approach which switches from one method to the other for Abs(M - len/2) LT len/5 or so, but the tuning would be machine-specific.
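The inversion trick at the heart of Random_Indices can be sketched in Python (a hypothetical helper; a set-difference stands in for the `Histogram(...) EQ 0` step in IDL):

```python
import random

def random_indices(length, n_in, seed=None):
    """Return n_in unique random indices in [0, length), in sorted order."""
    rng = random.Random(seed)
    # If more than half the indices are wanted, it is cheaper to draw
    # the complement (the indices to *discard*) instead.
    swap = n_in > length // 2
    n = length - n_in if swap else n_in
    chosen = set()
    while len(chosen) < n:
        chosen.update(int(rng.random() * length)
                      for _ in range(n - len(chosen)))
    if swap:
        # Invert: keep every index NOT drawn (Histogram(...) EQ 0 in IDL)
        chosen = set(range(length)) - chosen
    return sorted(chosen)
```

As in the IDL version, asking for nearly all the indices costs no more than asking for a few, since only the small complement is ever drawn at random.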

This algorithm has been implemented in a slightly more robust way in the Coyote Library program, cgRandomIndices.

Copyright © 2006 David W. Fanning

Written: 15 November 2006

Last Updated: 22 December 2011