library(UsingR)
data(father.son)
fit = lm(sheight ~ fheight, data = father.son)
y = father.son$sheight
x = father.son$fheight
b1 = cor(y, x) * sd(y) / sd(x)
b0 = mean(y) - b1 * mean(x)
rbind(coef(fit), c(b0, b1))
## (Intercept) fheight
## [1,] 33.8866 0.514093
## [2,] 33.8866 0.514093
library(ggplot2)
g = ggplot(father.son, aes(x = fheight, y = sheight))
g = g + geom_point()
g = g + geom_smooth(method = lm, se = FALSE, lwd = 2)
g
## `geom_smooth()` using formula = 'y ~ x'
fit = lm(sheight ~ fheight, data = father.son)
summary(fit)
##
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8772 -1.5144 -0.0079 1.6285 8.9685
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.88660 1.83235 18.49 <2e-16 ***
## fheight 0.51409 0.02705 19.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.437 on 1076 degrees of freedom
## Multiple R-squared: 0.2513, Adjusted R-squared: 0.2506
## F-statistic: 361.2 on 1 and 1076 DF, p-value: < 2.2e-16
xc = x - mean(x)
yc = y - mean(y)
sum(xc * yc) / sum(xc^2)
## [1] 0.514093
lm(yc ~ xc - 1)
##
## Call:
## lm(formula = yc ~ xc - 1)
##
## Coefficients:
## xc
## 0.5141
xn = (x - mean(x)) / sd(x)
yn = (y - mean(y)) / sd(y)
lm(yn ~ xn)
##
## Call:
## lm(formula = yn ~ xn)
##
## Coefficients:
## (Intercept) xn
## 1.820e-15 5.013e-01
cor(xn, yn)
## [1] 0.5013383
lm(formula = xn ~ yn)
##
## Call:
## lm(formula = xn ~ yn)
##
## Coefficients:
## (Intercept) yn
## -2.216e-15 5.013e-01
predict(fit, newdata = data.frame(fheight = 63))
## 1
## 66.27447
coef(fit)
## (Intercept) fheight
## 33.886604 0.514093
b0 = coef(fit)[1]; b1 = coef(fit)[2]
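The prediction returned by `predict()` can also be reproduced by hand from the two coefficients. The following is a minimal sketch that assumes the rounded coefficient values printed by `coef(fit)` above are copied in as plain numbers, so it runs without the data set; the name `pred_by_hand` is introduced here for illustration only.

```r
# Coefficients copied from the coef(fit) output above (rounded values, an assumption
# made so that this snippet is self-contained)
b0 = 33.886604   # intercept
b1 = 0.514093    # slope for fheight

# Predicted son's height for a father who is 63 inches tall
pred_by_hand = b0 + b1 * 63
pred_by_hand
```

Because the coefficients are rounded, the result should agree with the `predict()` output only up to roughly four decimal places.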
Consider a data set where the standard deviation of the outcome variable is double that of the predictor, and the two variables have a correlation of 0.3. If you fit a linear regression model, what would be the estimate of the slope?
Given: $\mathrm{sd}(Y)/\mathrm{sd}(X) = 2$ and $\mathrm{cor}(Y, X) = 0.3$.
$\beta_1 = \mathrm{cor}(Y, X) \times \mathrm{sd}(Y)/\mathrm{sd}(X) = 0.3 \times 2 = 0.6$

Consider the previous problem. The outcome variable has a mean of 1 and the predictor has a mean of 0.5. What would be the intercept?
$\beta_0 = \bar{Y} - \beta_1 \bar{X} = 1 - 0.6 \times 0.5 = 0.7$

True or false: if the predictor variable has mean 0, the estimated intercept from linear regression will be the mean of the outcome?
True, because $\beta_0 = \bar{Y} - \beta_1 \bar{X} = \bar{Y} - \beta_1 \times 0 = \bar{Y}$.

Consider problem 5 again. What would be the estimated slope if the predictor and outcome were reversed?
$\beta_1 = \mathrm{cor}(Y, X) \times \mathrm{sd}(X)/\mathrm{sd}(Y) = 0.3 \times 1/2 = 0.15$
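
The answers above can be checked numerically. The sketch below is one possible way to do so, not part of the original exercises: it builds an artificial data set whose sample moments match the problem statement exactly (sd ratio of 2, correlation of 0.3, means of 1 and 0.5) and then fits the regression in both directions. The sample size, the seed, and all variable names are assumptions chosen for illustration.

```r
set.seed(1)   # arbitrary; the fitted coefficients do not depend on the seed
n = 100       # arbitrary sample size

# Standardized predictor: sample mean 0, sample sd 1
x0 = as.numeric(scale(rnorm(n)))

# Noise made exactly uncorrelated with x0 (residuals of a regression on x0),
# then rescaled to sample sd 1
e = as.numeric(scale(residuals(lm(rnorm(n) ~ x0))))

# Standardized outcome whose sample correlation with x0 is exactly 0.3
y0 = 0.3 * x0 + sqrt(1 - 0.3^2) * e

x = 0.5 + x0      # predictor: mean 0.5, sd 1
y = 1 + 2 * y0    # outcome:   mean 1,   sd 2

fit_yx = lm(y ~ x)   # outcome regressed on predictor
fit_xy = lm(x ~ y)   # roles reversed

coef(fit_yx)         # intercept and slope, to compare with 0.7 and 0.6
coef(fit_xy)["y"]    # reversed slope, to compare with 0.15
```

Constructing the data from residuals, rather than drawing from a bivariate normal, makes the sample correlation exactly 0.3 instead of approximately 0.3, so the fitted coefficients match the hand calculations up to floating-point error.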