Fanning Software Consulting

Drawing a Box and Whisker Plot

QUESTION: Can you show me how to draw a box and whisker plot in IDL? The box on the plot should be drawn between the 25th and 75th percentiles (the first and third quartiles) of the data, and the whiskers should extend out to the largest and smallest values within 1.5 times the interquartile range (IQR) of the box. Outliers should be marked with circles.

ANSWER: The idea behind a box and whisker plot is that the data should first be divided into two equal groups by finding the median value of the data. Then, each of these two sub-groups should be divided in the same way. If done properly, this divides the data into four equally populated sub-groups. The divisions between the groups are called the first quartile (the 25th percentile), the median, and the third quartile (the 75th percentile).

In the cgBoxPlot program I wrote to do this, I sort the data and find the quartiles and the IQR like this. (The IQR is stored in the variable irq.)

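   ; Sort the data, then split it at the median into lower and upper halves.
   sortedData = data[Sort(data)]
   IF N_Elements(sortedData) MOD 2 EQ 0 THEN BEGIN
      ; Even count: the median is the mean of the two middle values.
      index = N_Elements(sortedData)/2
      medianData = (sortedData[index-1] + sortedData[index]) / 2.0
      lowerGroup = sortedData[0:index-1]
      higherGroup = sortedData[index:N_Elements(data)-1]
   ENDIF ELSE BEGIN
      ; Odd count: the median is the middle value, left out of both halves.
      index = N_Elements(sortedData)/2
      medianData = sortedData[index]
      lowerGroup = sortedData[0:index-1]
      higherGroup = sortedData[index+1:N_Elements(data)-1]
   ENDELSE
   ; The quartiles are the medians of the two halves.
   quartile_25 = Median(lowerGroup, /EVEN)
   quartile_75 = Median(higherGroup, /EVEN)
   ; Interquartile range.
   irq = quartile_75 - quartile_25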
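
As a quick check of these steps, here is a minimal sketch that applies the same approach to a small, made-up data set of eight values. The array and its values are assumptions chosen only for illustration. Because the count is even, only the even branch is needed.

   ; Hypothetical test data: eight floating point values, unsorted.
   data = [7.0, 1.0, 15.0, 3.0, 11.0, 5.0, 13.0, 9.0]
   sortedData = data[Sort(data)]              ; 1, 3, 5, 7, 9, 11, 13, 15
   index = N_Elements(sortedData)/2           ; 4
   medianData = (sortedData[index-1] + sortedData[index]) / 2.0  ; (7+9)/2 = 8
   lowerGroup = sortedData[0:index-1]         ; 1, 3, 5, 7
   higherGroup = sortedData[index:*]          ; 9, 11, 13, 15
   quartile_25 = Median(lowerGroup, /EVEN)    ; (3+5)/2 = 4
   quartile_75 = Median(higherGroup, /EVEN)   ; (11+13)/2 = 12
   irq = quartile_75 - quartile_25            ; 12 - 4 = 8
   Print, quartile_25, medianData, quartile_75, irq

With these values the whiskers could extend at most 1.5 * 8 = 12 units beyond the box, that is, from -8 to 24, so none of the eight values would be an outlier.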

The next step is easy. All we have to do is use IDL's graphics commands to draw lines and symbols on a plot. Given the width of the box and the location along the X axis where the box should be drawn (in the variables width and xlocation, respectively), we can draw the box plot like this. Note how I use Value_Locate to find the data values that mark the ends of the whiskers.

   minData = MIN(data, MAX=maxData)
   halfwidth = width / 2.0
   x1 = xlocation - halfwidth
   x2 = xlocation + halfwidth
   y1 = quartile_25
   y2 = quartile_75
   cgPlotS, [x1,x1,x2,x2,x1], [y1,y2,y2,y1,y1], COLOR=color
   cgPlotS, [x1, x2], [medianData, medianData], COLOR=color
      
   ; Are there any data greater than quartile_75 + 1.5*irq?
   imax = Where(data GT quartile_75 + (1.5 * irq), maxcount)
   IF maxcount EQ 0 THEN BEGIN
      top = maxData
   ENDIF ELSE BEGIN
      index = Value_Locate(sortedData, quartile_75 + (1.5 * irq))
      top = sortedData[0 > (index) < (N_Elements(data)-1)]
   ENDELSE

   ; Are there any data less than quartile_25 - 1.5*irq?
   imin = Where(data LT quartile_25 - (1.5 * irq), mincount)
   IF mincount EQ 0 THEN BEGIN
      bottom = minData
   ENDIF ELSE BEGIN
      index = Value_Locate(sortedData, quartile_25 - (1.5 * irq))
      bottom = sortedData[0 > (index+1) < (N_Elements(data)-1)]
   ENDELSE
      
   ; Draw the whiskers.
   cgPlotS, [xlocation, xlocation], [quartile_75, top], COLOR=color
   cgPlotS, [xlocation, xlocation], [quartile_25, bottom], COLOR=color
   cgPlotS, [xlocation - (halfwidth*0.5), xlocation + (halfwidth*0.5)], $
          [top, top], COLOR=color
   cgPlotS, [xlocation - (halfwidth*0.5), xlocation + (halfwidth*0.5)], $
          [bottom, bottom], COLOR=color
      
   ; Draw outliers if there are any.
   IF maxcount GT 0 THEN BEGIN
      FOR j=0,maxcount-1 DO cgPlotS, xlocation, data[imax[j]], $
         PSYM=cgSymCat(9), COLOR=color
   ENDIF
   IF mincount GT 0 THEN BEGIN
      FOR j=0,mincount-1 DO cgPlotS, xlocation, data[imin[j]], $
         PSYM=cgSymCat(9), COLOR=color
   ENDIF
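
To see what Value_Locate returns in the whisker calculation, here is a small sketch using a made-up sorted array with one low and one high outlier. The data values are assumptions for illustration, and the quartiles are entered by hand rather than computed.

   ; Hypothetical sorted data with a low outlier (1) and a high outlier (30).
   sortedData = [1.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 30.0]
   quartile_25 = 10.5                  ; median of 1, 10, 11, 12
   quartile_75 = 14.5                  ; median of 13, 14, 15, 30
   irq = quartile_75 - quartile_25     ; 4.0
   n = N_Elements(sortedData)

   ; The upper limit is 20.5. Value_Locate returns 6, since 15 <= 20.5 < 30.
   index = Value_Locate(sortedData, quartile_75 + (1.5 * irq))
   top = sortedData[0 > index < (n-1)]          ; 15

   ; The lower limit is 4.5. Value_Locate returns 0, since 1 <= 4.5 < 10,
   ; so the first value inside the limit is at index+1.
   index = Value_Locate(sortedData, quartile_25 - (1.5 * irq))
   bottom = sortedData[0 > (index+1) < (n-1)]   ; 10
   Print, bottom, top

Value_Locate returns the index of the last element that is less than or equal to the search value, which is why the upper whisker uses index directly while the lower whisker uses index+1.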

As an example, you can download data from the Michelson-Morley experiment, in which they measured the speed of light. The data is in a file named mm_data.dat. You can use this code to open and read the data in the file and display it as a box plot.

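   ; Let IDL pick a free logical unit number rather than hard-coding one.
   OpenR, lun, 'mm_data.dat', /Get_Lun
   ; Skip the two header lines, then read 5 columns by 20 rows of data.
   header = Strarr(2)
   Readf, lun, header
   data = Intarr(5, 20)
   Readf, lun, data
   Free_Lun, lun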
   cgBoxPlot, data, XTITLE='Experiment Number', $
      YTITLE='Speed of Light (km/s minus 299,000)', /Window
   cgPlotS, !X.CRange, [792.458,792.458], Color='red', /AddCmd
   cgText, 3, 775, /Data, Color='red', 'True Speed', Alignment=0.5, /AddCmd

You can see the results in the figure below.

Figure: A box and whisker plot in IDL using data from the Michelson-Morley experiment.

A different version of this plot can be found in the Coyote Plot Gallery.
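
If you don't have the data file at hand, a sketch like the following exercises the same cgBoxPlot call with synthetic data. It assumes the Coyote Library is on your IDL path. The random values (mean 850, standard deviation 75) are made up and have no connection to the real measurements.

   ; Synthetic stand-in for the data file: 5 columns of 20 values each.
   seed = 42L
   data = 850.0 + 75.0 * RandomN(seed, 5, 20)
   cgBoxPlot, data, XTITLE='Experiment Number', YTITLE='Simulated Value'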
