Sameer Mathur
What is the effect on regression analyses if the predictors are nearly uncorrelated versus highly correlated?
Multicollinearity
---
Let's focus on the relationships among the response BP and the predictors Weight, BSA, and Stress.
# reading data
bp.df <- read.delim("BloodPressureData.txt")
# attaching data columns of the dataframe
attach(bp.df)
# dimension of the dataframe
dim(bp.df)
[1] 20 8
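Since the question is how correlation among the predictors affects the fits, it may help to look at the pairwise correlations before fitting anything. The sketch below assumes `bp.df` has been read as above and contains columns named BP, Weight, BSA and Stress (the names used in the model formulas that follow). The helper name `predictor_cor` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not part of the original slides):
# pairwise Pearson correlations for the chosen columns, rounded to 3 digits.
predictor_cor <- function(data, vars) {
  round(cor(data[, vars]), 3)
}

# Apply it to the blood pressure data, if it has been loaded as above.
# Column names are assumed from the model formulas used in these slides.
if (exists("bp.df")) {
  print(predictor_cor(bp.df, c("BP", "Weight", "BSA", "Stress")))
}
```

Off-diagonal entries near 0 between two predictors suggest they are nearly uncorrelated; entries near 1 or -1 suggest they carry overlapping information.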
summary(lm(BP ~ Weight, data = bp.df))
Call:
lm(formula = BP ~ Weight, data = bp.df)
Residuals:
    Min      1Q  Median      3Q     Max
-2.6933 -0.9318 -0.4935  0.7703  4.8656
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.20531    8.66333   0.255    0.802
Weight       1.20093    0.09297  12.917 1.53e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.74 on 18 degrees of freedom
Multiple R-squared: 0.9026, Adjusted R-squared: 0.8972
F-statistic: 166.9 on 1 and 18 DF, p-value: 1.528e-10
summary(lm(BP ~ Stress, data = bp.df))
Call:
lm(formula = BP ~ Stress, data = bp.df)
Residuals:
    Min      1Q  Median      3Q     Max
-8.6394 -3.3014  0.0722  2.2181  9.9287
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 112.71997    2.19345  51.389   <2e-16 ***
Stress        0.02399    0.03404   0.705     0.49
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.502 on 18 degrees of freedom
Multiple R-squared: 0.02686, Adjusted R-squared: -0.0272
F-statistic: 0.4969 on 1 and 18 DF, p-value: 0.4899
summary(lm(BP ~ Weight + Stress, data = bp.df))
Call:
lm(formula = BP ~ Weight + Stress, data = bp.df)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4865 -0.9395  0.1950  0.5080  4.0023
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.71023    8.09054   0.211   0.8351
Weight       1.19522    0.08683  13.765  1.2e-10 ***
Stress       0.01924    0.01006   1.913   0.0727 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.625 on 17 degrees of freedom
Multiple R-squared: 0.9199, Adjusted R-squared: 0.9105
F-statistic: 97.59 on 2 and 17 DF, p-value: 4.807e-10
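In the two fits above, the Weight estimate barely moves when Stress is added (1.20093 with standard error 0.09297 alone, 1.19522 with standard error 0.08683 alongside Stress). A side-by-side view of one term across several models makes this kind of comparison easier to read. The following is a minimal sketch, assuming `bp.df` is loaded as above; `term_across_models` is a hypothetical helper name, not something from the original slides.

```r
# Hypothetical helper: estimate and standard error of one term
# across a named list of fitted lm models (one row per model).
term_across_models <- function(models, term) {
  t(sapply(models, function(m) {
    coef(summary(m))[term, c("Estimate", "Std. Error")]
  }))
}

# Compare the Weight coefficient with and without Stress in the model,
# if the blood pressure data has been loaded as above.
if (exists("bp.df")) {
  fits <- list(
    weight_only   = lm(BP ~ Weight, data = bp.df),
    weight_stress = lm(BP ~ Weight + Stress, data = bp.df)
  )
  print(term_across_models(fits, "Weight"))
}
```

The same helper can be pointed at any other pair of models fitted to `bp.df`, as long as the chosen term appears in each of them.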
summary(lm(BP ~ BSA, data = bp.df))
Call:
lm(formula = BP ~ BSA, data = bp.df)
Residuals:
   Min     1Q Median     3Q    Max
-5.314 -1.963 -0.197  1.934  4.831
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   45.183      9.392   4.811  0.00014 ***
BSA           34.443      4.690   7.343 8.11e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.79 on 18 degrees of freedom
Multiple R-squared: 0.7497, Adjusted R-squared: 0.7358
F-statistic: 53.93 on 1 and 18 DF, p-value: 8.114e-07
summary(lm(BP ~ Weight + BSA, data = bp.df))
Call:
lm(formula = BP ~ Weight + BSA, data = bp.df)
Residuals:
    Min      1Q  Median      3Q     Max
-1.8932 -1.1961 -0.4061  1.0764  4.7524
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.6534     9.3925   0.602    0.555
Weight        1.0387     0.1927   5.392 4.87e-05 ***
BSA           5.8313     6.0627   0.962    0.350
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.744 on 17 degrees of freedom
Multiple R-squared: 0.9077, Adjusted R-squared: 0.8968
F-statistic: 83.54 on 2 and 17 DF, p-value: 1.607e-09
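With BSA in the model, the standard error of Weight roughly doubles (0.09297 alone, 0.1927 with BSA) and the BSA estimate drops from 34.443 to 5.8313 and is no longer significant. One common way to quantify this inflation is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below computes it by hand with base R; `vif_manual` is a hypothetical helper name, and it assumes at least two predictors are supplied and that `bp.df` is loaded as above.

```r
# Hypothetical helper: variance inflation factor for each predictor,
# computed as 1 / (1 - R^2) from regressing it on the remaining predictors.
# Requires at least two predictor names.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Compare the two predictor pairs used in these slides,
# if the blood pressure data has been loaded as above.
if (exists("bp.df")) {
  print(vif_manual(bp.df, c("Weight", "Stress")))
  print(vif_manual(bp.df, c("Weight", "BSA")))
}
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others; larger values mean its coefficient variance is inflated by that factor relative to the uncorrelated case.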