# Calculating the Statistical Mode

**QUESTION:** I have an integer array and I would like to calculate the statistical MODE
of the array. How do I do this in IDL?

** ANSWER:** I'm trying to remember all the way back to grade school, I think, but
isn't the statistical mode the value having the largest frequency distribution? I
remember that the mode isn't unique, because two values
could be represented with the same frequency in the sample (*bimodal*, I guess),
and it is theoretically possible to have no mode at all (all distribution frequencies
in the sample are the same).

But saying all that, if you have an integer array, I think I would calculate the mode of the array like this:

IDL> array = [1, 1, 2 , 4, 1, 3, 3, 2, 4, 5, 3, 2, 2, 1, 2, 6, -3] IDL> distfreq = Histogram(array, MIN=Min(array)) IDL> maxfreq= Max(distfreq) IDL> mode = Where(distfreq EQ maxfreq) + Min(array) IDL> Print, mode 2

As usual, going for readability is not always the “IDL way.” JD Smith offers this correction.

Just a hint on your

HISTOGRAMusage. This might be slightly preferred, since it skips theWHEREandMIN:IDL> array = [1, 1, 2 , 4, 1, 3, 3, 2, 4, 5, 3, 2, 2, 1, 2, 6] IDL> void = Max(Histogram(array,OMIN=mn), mxpos) IDL> mode = mn + mxposThis method, of course, will be

veryproblematic if you have, for example:array = [1, 1, 2 , 4, 1, 3, 3, 2, 4, 5, 3, 2, 2, 1, 2, 6, 10000000]Another option, if you worry about this, would be to re-cast the solution as a sorting problem, using the method discussed in a recent IDL newsgroup thread:

array = array[Sort(array)] wh = where(array ne Shift(array,-1), cnt) if cnt eq 0 then mode = array[0] else begin void = Max(wh-[-1,wh], mxpos) mode = array[wh[mxpos]] endelse Print, modeBoth methods will give you the lowest number in the case of ties for the mode. The second will be slower, but more robust against large dynamic range in your array. You could use both, deciding which to use by the

min/maxof the array.

If I were going to turn this simple idea of mine into a (readable) function, I would probably add some warnings and error messages.

FUNCTION Mode, array ; Calculates the MODE (value with the maximum frequency distribution) ; of an array. Works ONLY with integer data. On_Error, 2 ; Check for arguments. IF N_Elements(array) EQ 0 THEN Message, 'Must pass an array argument.' ; Is the data an integer type? If not, exit. dataType = Size(array, /Type) IF ((dataType GT 3) AND (dataType LT 12)) THEN $ Message, 'Data is not INTEGER type.' ; Calculate the distribution frequency distfreq = Histogram(array, MIN=Min(array)) ; Find the maximum of the frequency distribution. maxfreq = Max(distfreq) ; Find the mode. mode = Where(distfreq EQ maxfreq, count) + Min(array) ; Warn the user if the mode is not singular. IF count NE 1 THEN ok = Dialog_Message('The MODE is not singular.') RETURN, mode END

Several years after this article was written, Allan Whiteford had this to add.

I think all the discussions on finding the mode probably pre-dated the ++ operator. It could be that using the vectorised ++ operator is a better way to do it. (Although I doubt it. Normally if histogram can do something then histogram will be the best way!)

It would make this example into something like this:

array = [1, 1, 2 , 4, 1, 3, 3, 2, 4, 5, 3, 2, 2, 1, 2, 6,-3] f = IntArr(Max(array) - Min(array)+1) f[array - Min(array)]++ void = Max(f,idx) mode = idx + Min(array) Print, modeAgain, I have no idea whether this would be more efficient. If you're doing analysis on measurements (typically non-integers) then you'd need to invoke histogram anyway to bin them before trying to find the mode.

Copyright © 2006 David W. Fanning

Written 25 January 2006

Last Updated 11 February 2008