Re: Regression fit and random noise [message #83675]
Fri, 29 March 2013 10:24
kisCA
> Philip and Craig, thank you for your example.
I still don't understand the asymptotic value I reach. My point is, if you "drown" your signal in noise, even if the noise is between 0 and 1, the R^2 should tend to zero.
Craig, what do you mean by: "R^2 itself has sample variance."
Thank you again for your help, things are starting to clear up in my mind.
Re: Regression fit and random noise [message #83683 is a reply to message #83675]
Thu, 28 March 2013 18:16
Craig Markwardt
On Thursday, March 28, 2013 7:37:34 PM UTC-4, kisCA wrote:
> I understand the process of "destroying" the correlation. What I don't get is why does the coefficient of determination (R2) reach a plateau value (0.3) and doesn't get closer to zero as I raise the noise_ratio a lot (like a hundred)...
>
>> "underlying statistics"
The little cut-and-paste example below shows that the correlation factors do indeed converge to zero as the noise value is increased. Of course, an individual sample of random scale factors may not produce an R^2 value that goes all the way to zero: R^2 itself has sample variance.
Craig
x = randomu(seed,100) ;; Random X positions
ym = 0.3 - 0.7 * x ;; Pure Y model (no noise)
ye = 0.01 ;; Initial scatter
;; Sampled y value
ys = ym + randomn(seed,100)*ye
print, r_correlate(x, ys)
;; NOISE_FACTOR multiples
noise_factors = [1, 10, 100, 1000, 10000]
;; Try different noise factors
for i = 0, n_elements(noise_factors)-1 do begin & ys1 = ym + randomn(seed,100)*ye * noise_factors[i] & print, r_correlate(x, ys1) & endfor
;; EXAMPLE RUN:
;; i NOISE SPEARMAN SIGNIF
;; 0 1 -0.998248 0.00000
;; 1 10 -0.904374 5.05695e-38
;; 2 100 -0.175770 0.0802491
;; 3 1000 0.0576417 0.568917
;; 4 10000 -0.0107651 0.915328
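For readers without IDL, here is a rough Python analogue of the same experiment (a sketch using only numpy; the hand-rolled Spearman below assumes no tied values, which holds for continuous random data):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(size=100)   # random X positions
ym = 0.3 - 0.7 * x          # pure Y model (no noise)
ye = 0.01                   # initial scatter

def spearman_rho(a, b):
    # Rank-transform both arrays, then take the Pearson
    # correlation of the ranks (valid when there are no ties).
    ra = a.argsort().argsort()
    rb = b.argsort().argsort()
    return np.corrcoef(ra, rb)[0, 1]

# Spearman rho for increasing noise factors
rhos = [spearman_rho(x, ym + rng.normal(size=100) * ye * f)
        for f in [1, 10, 100, 1000, 10000]]
```

As in the IDL run above, rho starts near -1 and wanders toward zero (but not exactly zero for any single sample) as the noise grows.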
Re: Regression fit and random noise [message #83685 is a reply to message #83683]
Thu, 28 March 2013 16:40
Phillip Bitzer
This is actually a good point, and likely explains your asymptotic value of the coefficient. Check out the sample code I posted for an example of how to add uniform noise distributed about zero:
----> noiseFactor*(RANDOMU(seed, npts)-0.5)
Depending on the data, and what you're trying to do, you could instead add Gaussian-distributed noise as mentioned. You'll probably want to modify the width of the Gaussian noise to *really* test the model as well.
See the help for more:
http://www.exelisvis.com/docs/RANDOMU.html
http://www.exelisvis.com/docs/RANDOMN.html
On Thursday, March 28, 2013 6:26:30 PM UTC-5, Mats Löfdahl wrote:
>
>
> Maybe I misunderstand what you are trying to do but... Are you aware that randomu has a uniform distribution between 0 and 1? So you are adding, on average, something like 0.5*noise_ratio to your original signal. So maybe you want to add noise_ratio*(randomu(...)-0.5) instead. Or, since randomn is normally distributed with zero mean, simply noise_ratio*randomn(...).
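The point about the mean of the noise is easy to check directly. A minimal sketch (Python/numpy standing in for the IDL calls; `rng.uniform` plays the role of RANDOMU and `rng.normal` of RANDOMN):

```python
import numpy as np

rng = np.random.default_rng(0)
npts = 100_000
noise_ratio = 10.0

raw_uniform = noise_ratio * rng.uniform(size=npts)          # mean ~ 0.5 * noise_ratio
centered    = noise_ratio * (rng.uniform(size=npts) - 0.5)  # mean ~ 0
gaussian    = noise_ratio * rng.normal(size=npts)           # mean ~ 0
```

The uncentered uniform noise adds a systematic offset of about 0.5*noise_ratio to the signal; the centered and Gaussian forms do not.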
Re: Regression fit and random noise [message #83688 is a reply to message #83687]
Thu, 28 March 2013 16:18
Phillip Bitzer
The (linear) correlation coefficient, r, is a measure of how well the independent/dependent variables are correlated. For perfectly correlated data, r = 1, and the data plots as a straight line with positive slope. Perfectly anti-correlated data has r = -1, and the data plots as a straight line with negative slope. Uncorrelated data has r = 0; in this case, the best fit line has a slope of zero (imagine data points that are scattered with no perceptible trend). (You're dealing with the multiple correlation coefficient, but the concept is similar. There's a nice discussion in Bevington, among other places. BTW, the multiple correlation coefficient can be shown to be a linear combination of the linear correlation coefficients for each variable x_i. Further, the linear correlation coefficient can be used to assess the usefulness of a predictor in the model.)
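These three cases can be verified numerically; a small sketch (Python/numpy rather than IDL, with made-up example data):

```python
import numpy as np

x = np.arange(10.0)
r_pos = np.corrcoef(x,  2.0 * x + 1.0)[0, 1]   # straight line, positive slope: r = +1
r_neg = np.corrcoef(x, -3.0 * x + 5.0)[0, 1]   # straight line, negative slope: r = -1

rng = np.random.default_rng(1)
r_none = np.corrcoef(rng.normal(size=2000),
                     rng.normal(size=2000))[0, 1]  # independent data: r near 0
```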
In your case, setting noise ratio = 0 should provide the same value as if no noise was present because no (artificial) noise is present! As you increase noise_ratio, you're essentially "destroying" the correlation, in a manner of speaking. I bet if you crank up noise_ratio far enough you can get essentially uncorrelated data.
Be careful when you speak of a "good fit" - there are ways to quantify what is a good fit (for example, using the chi squared value to test the null hypothesis). Depending on the SNR, the model may still be a "good fit" to the (noisy) data.
Ultimately, the answer to your question lies in the underlying statistics - there isn't (shouldn't be?) anything wonky going on in IDL.
Hope this helps!
Re: Regression fit and random noise [message #83690 is a reply to message #83689]
Thu, 28 March 2013 14:51
Phillip Bitzer
Not quite sure what you're asking here - we need a little more info. What routine are you using to do the fit? When you say noise/signal ratio, do you mean signal to noise ratio (SNR)? Do you have some sort of example data?
Regardless, consider the following "simple" linear regression, adapted from the IDL help:
PRO test_model
  npts = 100
  x = FINDGEN(npts)
  noiseFactor = 0.
  y = x + noiseFactor*(RANDOMU(seed, npts)-0.5)
  PLOT, x, y, PSYM=2
  result = REGRESS(x, y, SIGMA=sigma, CONST=const, CORRELATION=r2, $
                   MEASURE_ERRORS=measure_errors)
  PRINT, 'Constant: ', const
  PRINT, 'Coefficients: ', result[*]
  PRINT, 'Standard errors: ', sigma
  PRINT, 'Correl Coeff: ', r2
END
Notice that as you increase the noise factor, the correlation coefficient gets worse. This is entirely expected and is not an IDL-only thing. Basically, the "signal" gets swamped out by the "noise". You should get your hands on a good statistics book (e.g., Data Reduction and Error Analysis in the Physical Sciences by Bevington, Statistical Methods in the Atmospheric Sciences by Wilks) to better interpret what's going on "under the hood". For instance, according to regress.pro, the fit is done via chi squared minimization.
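The same sweep can be sketched outside IDL; a rough Python/numpy equivalent of the test_model procedure, computing R^2 from the least-squares residuals (the noise_factor values are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
npts = 100
x = np.arange(npts, dtype=float)

r2_by_noise = {}
for noise_factor in [0.0, 10.0, 100.0, 1000.0]:
    y = x + noise_factor * (rng.uniform(size=npts) - 0.5)
    slope, const = np.polyfit(x, y, 1)       # linear least-squares fit
    yfit = slope * x + const
    ss_res = np.sum((y - yfit) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2_by_noise[noise_factor] = 1.0 - ss_res / ss_tot
```

With no noise R^2 is essentially 1; as the noise factor grows, R^2 falls off, just as with REGRESS.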
Good luck!
Re: Regression fit and random noise [message #83808 is a reply to message #83675]
Fri, 29 March 2013 16:39
Craig Markwardt
On Friday, March 29, 2013 1:24:32 PM UTC-4, kisCA wrote:
> Philip and Craig, thank you for your example.
>
> I still don't understand the asymptotic value I reach. My point is, if you "drown" your signal in noise, even if it's between 0 and 1, the R^2 should tend to zero.
Yes, it does. I gave you an example.
> Craig, what do you mean by: "R^2 itself has sample variance."
Given a random sample of data, the computed R^2 value will not exactly equal its expected theoretical value. There is variance about the expected mean value. Only in the limit of averaging over many samples does it converge to the expectation value.
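This sample variance is easy to see by simulation; a minimal sketch (Python/numpy, with arbitrary sample size and trial count):

```python
import numpy as np

rng = np.random.default_rng(3)
npts, ntrials = 100, 2000

r2 = np.empty(ntrials)
for k in range(ntrials):
    x = rng.uniform(size=npts)
    y = rng.normal(size=npts)       # pure noise: the true correlation is zero
    r = np.corrcoef(x, y)[0, 1]
    r2[k] = r * r
```

Even though the true correlation is zero, each trial's R^2 sits somewhat above zero and the trials scatter; only the average over many trials settles near the small expected value (roughly 1/(npts-1) for independent data).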
Craig