Drawing a Box and Whisker Plot
QUESTION: Can you show me how to draw a box and whisker plot in IDL? The box on the plot should be drawn between the 25th and 75th percentiles (the first and third quartiles) of the data, and the whiskers should extend out to the largest and smallest values within 1.5 times the interquartile range (IQR) of the box. Outliers should be marked with circles.
ANSWER: The idea behind a box and whisker plot is that the data should first be divided into two equal groups by finding the median value of the data. Then, each of these two sub-groups should be divided in the same way. If done properly, this divides the data into four equally populated sub-groups. The divisions between the groups are called the first quartile (the 25th percentile), the median, and the third quartile (the 75th percentile).
In the cgBoxplot program I wrote to do this, I sort the data and find the quartiles and the IQR (stored in the variable irq) like this.
    sortedData = data[Sort(data)]
    IF N_Elements(sortedData) MOD 2 EQ 0 THEN BEGIN
       index = N_Elements(sortedData)/2
       medianData = (sortedData[index-1] + sortedData[index]) / 2.0
       lowerGroup = sortedData[0:index-1]
       higherGroup = sortedData[index:N_Elements(data)-1]
    ENDIF ELSE BEGIN
       index = N_Elements(sortedData)/2
       medianData = sortedData[index]
       lowerGroup = sortedData[0:index-1]
       higherGroup = sortedData[index+1:N_Elements(data)-1]
    ENDELSE
    quartile_25 = Median(lowerGroup, /EVEN)
    quartile_75 = Median(higherGroup, /EVEN)
    irq = quartile_75 - quartile_25
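To make the arithmetic concrete, here is a small worked sketch of the odd-count branch. The nine data values are made up for illustration and do not come from cgBoxplot or any real data set. The values in the comments follow from working through the code by hand.

```idl
; Hypothetical data: nine values, so the element count is odd.
data = [7.0, 1.0, 9.0, 3.0, 5.0, 2.0, 8.0, 4.0, 6.0]
sortedData = data[Sort(data)]             ; 1, 2, 3, 4, 5, 6, 7, 8, 9

; Integer division: 9/2 is 4, the index of the middle value.
index = N_Elements(sortedData)/2
medianData = sortedData[index]            ; 5.0

; The median itself is left out of both halves.
lowerGroup = sortedData[0:index-1]        ; 1, 2, 3, 4
higherGroup = sortedData[index+1:*]       ; 6, 7, 8, 9

; Each half has four values, so /EVEN averages the two middle ones.
quartile_25 = Median(lowerGroup, /EVEN)   ; (2 + 3) / 2 = 2.5
quartile_75 = Median(higherGroup, /EVEN)  ; (7 + 8) / 2 = 7.5
irq = quartile_75 - quartile_25           ; 7.5 - 2.5 = 5.0

Print, medianData, quartile_25, quartile_75, irq
```

Without the EVEN keyword, Median returns one of the two middle elements of an even-length group rather than their average, which would shift the quartiles in this example.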
The next step is easy. All we have to do is use IDL's graphics commands to draw lines and symbols on a plot. Given the width of the box and the location along the X axis where the box should be drawn (in the variables width and xlocation, respectively), we can draw the box plot like this. Note how I use Value_Locate to find the last data values that still fall inside the range covered by the whiskers.
    minData = MIN(data, MAX=maxData)
    halfwidth = width / 2.0
    x1 = xlocation - halfwidth
    x2 = xlocation + halfwidth
    y1 = quartile_25
    y2 = quartile_75

    ; Draw the box and the median line.
    cgPlotS, [x1,x1,x2,x2,x1], [y1,y2,y2,y1,y1], COLOR=color
    cgPlotS, [x1, x2], [medianData, medianData], COLOR=color

    ; Are there any data more than 1.5*irq above the 75th percentile?
    imax = Where(data GT quartile_75 + (1.5 * irq), maxcount)
    IF maxcount EQ 0 THEN BEGIN
       top = maxData
    ENDIF ELSE BEGIN
       index = Value_Locate(sortedData, quartile_75 + (1.5 * irq))
       top = sortedData[0 > (index) < (N_Elements(data)-1)]
    ENDELSE

    ; Are there any data more than 1.5*irq below the 25th percentile?
    imin = Where(data LT quartile_25 - (1.5 * irq), mincount)
    IF mincount EQ 0 THEN BEGIN
       bottom = minData
    ENDIF ELSE BEGIN
       index = Value_Locate(sortedData, quartile_25 - (1.5 * irq))
       bottom = sortedData[0 > (index+1) < (N_Elements(data)-1)]
    ENDELSE

    ; Draw the whiskers and their end caps.
    cgPlotS, [xlocation, xlocation], [quartile_75, top], COLOR=color
    cgPlotS, [xlocation, xlocation], [quartile_25, bottom], COLOR=color
    cgPlotS, [xlocation - (halfwidth*0.5), xlocation + (halfwidth*0.5)], $
       [top, top], COLOR=color
    cgPlotS, [xlocation - (halfwidth*0.5), xlocation + (halfwidth*0.5)], $
       [bottom, bottom], COLOR=color

    ; Draw outliers if there are any.
    IF maxcount GT 0 THEN BEGIN
       FOR j=0,maxcount-1 DO cgPlotS, xlocation, data[imax[j]], $
          PSYM=cgSymCat(9), COLOR=color
    ENDIF
    IF mincount GT 0 THEN BEGIN
       FOR j=0,mincount-1 DO cgPlotS, xlocation, data[imin[j]], $
          PSYM=cgSymCat(9), COLOR=color
    ENDIF
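The whisker logic can be checked without drawing anything. The sketch below uses made-up data with one obvious outlier. The quartile values are set by hand to the values the quartile code gives for this data, and the names upperFence and lowerFence are introduced here only for readability; they are not variables in cgBoxplot.

```idl
; Hypothetical data: eight ordinary values and one large value.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
sortedData = data[Sort(data)]

; Quartiles for this data, set by hand for the illustration.
quartile_25 = 2.5
quartile_75 = 7.5
irq = quartile_75 - quartile_25               ; 5.0

upperFence = quartile_75 + (1.5 * irq)        ; 15.0
lowerFence = quartile_25 - (1.5 * irq)        ; -5.0

; Count the values lying beyond each fence.
imax = Where(data GT upperFence, maxcount)    ; only 30.0, so maxcount is 1
imin = Where(data LT lowerFence, mincount)    ; nothing, so mincount is 0

; Value_Locate returns the index of the last sorted value that is
; less than or equal to the fence. Here that is index 7 (the value 8.0).
index = Value_Locate(sortedData, upperFence)
top = sortedData[0 > index < (N_Elements(data)-1)]   ; 8.0

; No low outliers, so the lower whisker ends at the minimum.
IF mincount EQ 0 THEN bottom = Min(data)      ; 1.0

Print, top, bottom, maxcount, mincount
```

The upper whisker stops at 8.0, the largest value still inside the fence, and 30.0 is the single value that would be drawn as an outlier circle.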
As an example, you can download data from the Michelson-Morley experiment, in which the speed of light was measured. The data are in a file named mm_data.dat. You can use this code to open the file, read the data, and display it as a box plot.
    OpenR, 1, 'mm_data.dat'
    header = Strarr(2)
    Readf, 1, header
    data = Intarr(5, 20)
    Readf, 1, data
    Close, 1
    cgBoxPlot, data, XTITLE='Experiment Number', $
       YTITLE='Speed of Light (km/s minus 299,000)', /Window
    cgPlotS, !X.CRange, [792.458,792.458], Color='red', /AddCmd
    cgText, 3, 775, /Data, Color='red', 'True Speed', Alignment=0.5, /AddCmd
You can see the results in the figure below.
Figure: A box and whisker plot in IDL using data from the Michelson-Morley experiment.
A different version of this plot can be found in the Coyote Plot Gallery.
Copyright © 2007-2009 David W. Fanning
Updated 26 August 2007
Last Updated 4 March 2009