Random Numbers Are...Uh, Not Random!

QUESTION: Depending upon how you create random numbers in your IDL programs, you may find that your random numbers are not very random. Here is a typical user experience.

So I'm beginning to wonder if I have encountered a strange IDL bug. (Although everytime I thought that in the past, I've been wrong). Still, I thought I would ask the question.

I have an IDL program that performs some simulations. It calls a function (Generate_Population) which generates 1000 variables populated along a certain distribution. Later in my program I generate another set of 1000 variables to select a random sample of the previously generated population. In both of these cases IDL has to initialize the random seed generator because (of course) the Generate_Population program is a separate IDL function and so the main routine that calls it can't use the random number seed variable created in the Generate_Population program.

However, I noticed that the cut being made was very strange and not at all what it should have been. After some examination, I discovered that the random variables being created by Generate_Population the second time I called it were exactly the same random variables that were returned from it the first time I called it. This can't be right! What in the world is going on here?

ANSWER: While I was trying to carve out time to provide an answer to this question, our disillusioned protagonist was making more discoveries.

OK, I did a simple test and have a better idea of how the seed initialization works with RandomU. In my mind, this is not good news! It seems IDL has some sort of hidden global "seed" variable. The first time you pass an undeclared variable to RandomU and let it initialize the seed, it is storing the seed it generated in this hidden variable, and using that same hidden variable the next time RandomU is called with an undeclared variable.

Why is this a problem? Because if you are calling RandomU with both a declared seed variable and undeclared seed variables, it will end up returning the same set of random numbers! This will happen, as in my case, if you have one routine that is generating random variables, and you are calling subroutines that also generate random variables. If you don't pass your seed value to the subroutine, then it will be working with an undeclared variable and will end up re-generating the exactly the same random numbers you've already used! Here is an example.

   IDL> Print, randomu(seed,1)         0.456874
   IDL> Print, RandomU(first_var_undeclared,1)         0.528637
   IDL> Print, RandomU(seed,1)         0.528637
   IDL> Print, RandomU(second_var_undeclared,1)         0.783314
   IDL> Print, RandomU(seed,1)         0.783314
   IDL> Print, RandomU(third_var_undeclared,1)         0.0769241
   IDL> Print, RandomU(seed,1)         0.0769241 

Notice that there is no visible relation between my calls to RandomU with the seed variable and my calls to RandomU with a series of undeclared and undefined variables. However since my seed variable was seeded by the same hidden variable used in all calls to RandomU, I end up getting duplicate results from subsequent calls to RandomU. This is not good! Moreover this hidden variable is global and is used in calls in subroutines as well. Consider these two simple routines.

   PRO TEST2, SEED=seed       Print, RandomU(seed,1)    END     PRO TEST
      Test2, SEED=seed       Print, RandomU(seed,1)       Test2
      Print, RandomU(seed,1)       Test2       Print, RandomU(seed,1)
      Test2    END 

When I run the Test program, I get the following results.

     0.0217081      0.345711      0.345711      0.370247      0.370247
     0.143830      0.143830 

I really wish I had understood this behavior before (it is not documented, at least in any way I could understand it). I have had routines which keep track of their own seed and call subroutines which use an undeclared seed. That means that my main routine and my subroutine are re-using the same random numbers, which is decidedly bad. Ultimately this is my fault. I should have been passing the seed back and forth between subroutines. But how would I have known that?

Uh, I don't know. The documentation doesn't seem to indicate this to me, either. But, it is a fact. If you want your numbers to be random, you have to make sure you keep track of and use the seed that is returned from RandomU each time you make another call to RandomU.

The IDL documentation recommends you keep track of the seed with a Common block. But this is a pain to implement sometimes. I prefer to do it another way, with an object, which can keep track of the seed thoughout the entire IDL session. To this end, I have written a RandomNumberGenerator object, which keeps careful track of the seed variable and always uses the last seed to generate the next random number.

This is, essentially, a wrapper for the RandomU function, and the same keywords and parameters can be used in its GetRandomNumbers method as you previously used to call RandomU. The only difference is that the seed variable supplied to RandomU is continuously updated from one call to the next.

There are many ways the RandomNumberGenerator object can be used, but one way is to create the RandomNumberGenerator as a system variable in an IDL startup file. This way, any program running in the IDL session can create truly random numbers. (RandomU is, of course, a pseudo-random number generator. But the repeat sequence is massively long if the seed variable is properly used.)

    IDL> DefSysV, '!RNG', Obj_New('RandomNumberGenerator')
   IDL> randomNumbers = !RNG -> GetRandomNumbers(3)    IDL> Print, randomNumbers
         0.16575761      0.70548638      0.20411179 

It is possible to set the seed yourself when you initialize the object, but if you don't, the number of seconds after 1 January 1970 is used.

   IDL> DefSysV, '!RNG', Obj_New('RandomNumberGenerator', -3L)
   IDL> Print, !RNG -> GetRandomNumbers(2,3)         0.89791570      0.76692993
        0.060318137     0.037889217         0.14239354      0.89490415 

Of course, any of the keywords you would normally use with RandomU can also be used. For example, if you wish to have a Poisson distribution of 100 random numbers, which a mean number of events per unit time of 10, you could do this.

   IDL> randomNumbers = !RNG -> GetRandomNumbers(100, POISSON=10) 

Using the Random Number Generator to Produce Random Digits

The RandomNumberGenerator object can also be used to generate a sequence of random digits of any length. These digits are normally returned as a string, which can be used, for example, in creating a unique file name. Or, the sequence of random numbers can be returned as a byte array.

   IDL> Print, !RNG -> GetRandomDigits(8)         37340547
   IDL> digits = !RNG -> GetRandomDigits(8, /BYTES)    IDL> Help, digits
        DIGITS          BYTE      = Array[8]    IDL> Print, digits
        2   5   8   3   9   4   2   6 

Version of IDL used to prepare this article: IDL 7.1.

Google
 
Web Coyote's Guide to IDL Programming