# The Confidence Interval of the Correlation Coefficient

QUESTION: How can I calculate the 95% confidence interval of a correlation coefficient between two variables?

ANSWER: A quick search of the Internet suggests that the best way to do this is to use the Fisher Z Transformation, which converts the correlation coefficient into a quantity whose sampling distribution is approximately normal, and from which a confidence interval can be determined. The IDL code I will show you came from a snippet of SAS code found in a reasonably lucid account of the process by Shen and Lu. Since I am no statistician, I had a bit of trouble with one step in their process (picking the critical value for a 95 percent confidence limit), but I got some help with this from an article by David Lane and from a more visual diagram.

Let's start by plotting some data and obtaining the correlation coefficient, *rho*.

    n = 101
    x = cgDemoData(1) + RandomU(-3L, n) * 10
    y = cgDemoData(1) + RandomU(-5L, n) * 10
    cgScatter2D, x, y, Coefficient=rho

A plot of the data is shown here.

The scatter plot of the data with the correlation coefficient.

There are other ways to calculate the correlation coefficient. For example, we could have done this.

    rho = Correlate(x, y)
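As a cross-check on either method, the Pearson correlation coefficient can also be computed directly from its definition. This is only a minimal sketch, assuming the `x` and `y` vectors defined above; the names `dx`, `dy`, and `rho_check` are made up for this illustration.

```idl
; Deviations of each variable from its mean.
dx = x - Mean(x)
dy = y - Mean(y)

; Pearson correlation: sum of cross products divided by the
; square root of the product of the sums of squares.
rho_check = Total(dx * dy) / Sqrt(Total(dx^2) * Total(dy^2))

; The two values should agree to within floating point round-off.
Print, rho_check, Correlate(x, y)
```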

However you get it, you need to apply the Fisher Z Transformation to it. The number 1.96 in the code is the two-sided critical value of the standard normal distribution for a 95 percent confidence level. The value for a 99 percent confidence level would be 2.58, which, of course, would result in a wider confidence interval. The code, taken from the Shen and Lu paper, looks like this.

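    ; Fisher Z Transformation of the correlation coefficient.
    fisherTransform = 0.5D * (Alog(1+rho) - Alog(1-rho))

    ; Standard error of the transformed value.
    sigmaz = 1.0D / Sqrt(n-3)

    ; Lower and upper 95 percent limits in the transformed space.
    l_95 = fisherTransform - (1.96D * sigmaz)
    h_95 = fisherTransform + (1.96D * sigmaz)

    ; Transform the limits back to correlation coefficients.
    lo95 = (Exp(2*l_95)-1)/(Exp(2*l_95)+1)
    hi95 = (Exp(2*h_95)-1)/(Exp(2*h_95)+1)
    Print, '95% Confidence Interval: ', [lo95, hi95]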

The results looked like this.

95% Confidence Interval: 0.83528709 0.92187240
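For readers who prefer to see the calculation as equations, the steps in the code can be written as follows. This is my own restatement of the code, not a quotation from the Shen and Lu paper. Here $\rho$ is the sample correlation coefficient (the `rho` variable) and $n$ is the number of points.

$$
z = \tfrac{1}{2}\ln\frac{1+\rho}{1-\rho} = \operatorname{artanh}(\rho), \qquad \sigma_z = \frac{1}{\sqrt{n-3}}
$$

$$
z_{\mathrm{lo}} = z - 1.96\,\sigma_z, \qquad z_{\mathrm{hi}} = z + 1.96\,\sigma_z
$$

$$
\rho_{\mathrm{lo}} = \frac{e^{2 z_{\mathrm{lo}}}-1}{e^{2 z_{\mathrm{lo}}}+1} = \tanh(z_{\mathrm{lo}}), \qquad \rho_{\mathrm{hi}} = \frac{e^{2 z_{\mathrm{hi}}}-1}{e^{2 z_{\mathrm{hi}}}+1} = \tanh(z_{\mathrm{hi}})
$$

Working this through by hand with $\rho = 0.886077$ and $n = 101$ gives $z \approx 1.4034$ and $\sigma_z \approx 0.1010$, so the limits in the transformed space are about $1.2054$ and $1.6014$. Transformed back, these are about $0.8353$ and $0.9219$, in agreement with the printed result.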

As a sanity check to be sure I coded this correctly, I used another page from my Google search to find a confidence interval calculator. I plugged my numbers in and the results matched.
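If you need a confidence level other than 95 percent, the steps above might be collected into a small function along the following lines. This is only a sketch: the function name `CorrelationCI` and its `Level` keyword are hypothetical (they are not part of IDL or of any library mentioned here). It assumes the built-in `Gauss_Cvf` function is used to look up the critical value instead of hard-coding 1.96, and it uses `Tanh` for the back-transformation, which is mathematically the same as the `Exp` expression above.

```idl
; Hypothetical helper, for illustration only.
FUNCTION CorrelationCI, rho, n, Level=level

   Compile_Opt idl2

   ; Default to a 95 percent confidence level.
   IF N_Elements(level) EQ 0 THEN cl = 0.95D ELSE cl = Double(level)

   ; Two-sided critical value of the standard normal distribution.
   ; Gauss_Cvf(p) returns the value V for which P(X > V) = p.
   zcrit = Gauss_Cvf((1.0D - cl) / 2.0D)

   ; Fisher Z Transformation and its standard error.
   z = 0.5D * (Alog(1.0D + rho) - Alog(1.0D - rho))
   sigmaz = 1.0D / Sqrt(n - 3.0D)

   ; Transform the lower and upper limits back to correlations.
   RETURN, Tanh([z - zcrit*sigmaz, z + zcrit*sigmaz])

END
```

Saved in a file named `correlationci.pro` on your IDL path, it could be called as `Print, CorrelationCI(rho, n)` for the 95 percent interval, or as `Print, CorrelationCI(rho, n, Level=0.99D)` for the wider 99 percent interval.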

Ken Bowman has provided IDL users on the IDL newsgroup with a program to find confidence intervals for other statistical values. The program is named Regression_Statistics_KPB.

Here is an example of how the program can be used and its output.

    stats = Regression_Statistics_KPB(x, y, CI=95, /Verbose)

The output from the program (which is also returned in the variable *stats*) looks like this. You see that it provides confidence intervals for both the slope and the intercept of the regression line.

    Intercept (a)             : 2.77016
    Slope (b)                 : 0.858863
    Correlation coefficient r : 0.886077
    r^2                       : 0.785133
    F-statistic               : 361.750
    Chi-square statistic      : 1373.92
    n                         : 101
    Mean of x                 : 19.5523
    Mean of y                 : 19.5629
    S.D. of residuals         : 3.72532
    S_xx                      : 6805.92
    S_yy                      : 6394.28
    S_xy                      : 5845.35
    S.E. of a                 : 0.957571
    S.E. of b                 : 0.045156
    Confidence limit          : 95%
    Level for t-test          : 0.025000000
    t-statistic               : 1.9842176
    Confidence interval for a : 2.77016+/-1.9000288 [ 0.87013186, 4.67018950]
    Confidence interval for b : 0.858863+/-0.089600280 [ 0.76926272, 0.94846328]
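The half-widths of the two intervals in this output appear to be the t-statistic multiplied by the corresponding standard error. The sketch below reproduces them from the printed values. It assumes the program uses a Student's t distribution with n-2 degrees of freedom, which is the usual choice for a straight-line fit and is consistent with the printed t-statistic, but I have not confirmed this against the program's source. The variable names are made up for this illustration.

```idl
n = 101

; Critical value of Student's t with an upper tail probability of
; 0.025 (a two-sided 95 percent level). The n-2 degrees of freedom
; is an assumption about what the program does.
t = T_Cvf(0.025, n-2)

; Standard errors copied from the output above.
se_a = 0.957571D
se_b = 0.045156D

; Half-widths of the confidence intervals. These should be close to
; the +/- values printed for the intercept (a) and the slope (b).
Print, 'Half-width for a: ', t * se_a
Print, 'Half-width for b: ', t * se_b
```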

*Version of IDL used to prepare this article: IDL 8.2.3.*