1. Install and load the package UsingR and load the father.son data with data(father.son). Get the linear regression fit where the son’s height is the outcome and the father’s height is the predictor. Give the intercept and the slope, plot the data and overlay the fitted regression line.
library(UsingR)
data(father.son)
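# Son's height is the outcome, father's height the predictor
fit = lm(sheight ~ fheight, data = father.son)
y = father.son$sheight
x = father.son$fheight
# Closed-form least-squares estimates
b1 = cor(y, x) * sd(y) / sd(x)
b0 = mean(y) - b1 * mean(x)
# Row 1: coefficients from lm(); row 2: the same values computed by hand
rbind(coef(fit), c(b0, b1))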
##      (Intercept)  fheight
## [1,]     33.8866 0.514093
## [2,]     33.8866 0.514093
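The fitted intercept is 33.887 and the slope is 0.514. Each additional inch of father's height is associated with an increase of about 0.51 inches in the predicted son's height.
library(ggplot2)
# Scatterplot of son's height against father's height
g = ggplot(father.son, aes(x = fheight, y = sheight))
g = g + geom_point()
# Overlay the least-squares line; giving the formula explicitly suppresses the "using formula" message
g = g + geom_smooth(method = "lm", formula = y ~ x, se = FALSE, lwd = 2)
g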
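The same least-squares coefficients can also be obtained by solving the normal equations (X'X) beta = X'y directly. The sketch below does this on simulated data. It rests on two assumptions: the simulation settings are arbitrary, and they are not the father.son values. It then compares the result with `lm()`. The names `sim_x`, `sim_y`, `X_design` and `beta_hat` are illustrative only, and they avoid overwriting `x` and `y` from the main analysis.

```r
# Simulated data (hypothetical, NOT the father.son data): a line plus noise
set.seed(2023)
n <- 200
sim_x <- rnorm(n, mean = 68, sd = 2.7)
sim_y <- 34 + 0.5 * sim_x + rnorm(n, sd = 2.4)

# Design matrix: a column of 1s for the intercept, then the predictor
X_design <- cbind(1, sim_x)

# Solve (X'X) beta = X'y without forming an explicit matrix inverse
beta_hat <- solve(crossprod(X_design), crossprod(X_design, sim_y))

# Compare with lm(); both columns should agree up to floating-point error
fit_lm <- lm(sim_y ~ sim_x)
cbind(normal_eq = drop(beta_hat), lm = coef(fit_lm))
```

In practice `lm()` uses a QR decomposition rather than the normal equations, because QR is numerically more stable. For a small, well-behaved problem like this one, the two approaches agree.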

2. Refer to problem 1. Center the father and son variables and refit the model omitting the intercept. Verify that the slope estimate is the same as the linear regression fit from problem 1.
fit = lm(sheight ~ fheight, data = father.son)
summary(fit)
## 
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8772 -1.5144 -0.0079  1.6285  8.9685 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.88660    1.83235   18.49   <2e-16 ***
## fheight      0.51409    0.02705   19.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.437 on 1076 degrees of freedom
## Multiple R-squared:  0.2513, Adjusted R-squared:  0.2506 
## F-statistic: 361.2 on 1 and 1076 DF,  p-value: < 2.2e-16
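# Center both variables at their means
xc = x - mean(x)
yc = y - mean(y)
# Least-squares slope for a line through the origin
sum(xc * yc) / sum(xc^2)
## [1] 0.514093
# Same fit with lm(); "- 1" drops the intercept
lm(yc ~ xc - 1)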
## 
## Call:
## lm(formula = yc ~ xc - 1)
## 
## Coefficients:
##     xc  
## 0.5141
Both approaches give a slope of 0.5141, the same as the fit from problem 1.
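The slope survives centering for a simple reason. Because b0 = mean(y) - b1 * mean(x), each residual of the full fit equals (y - mean(y)) - b1 * (x - mean(x)), which is exactly the residual of the centered, no-intercept fit. The sketch below checks both the slope and the residuals on simulated data. The simulation parameters and all `sim_`/`fit_` names are arbitrary assumptions, not values from father.son.

```r
# Simulated data (hypothetical stand-in for father.son)
set.seed(42)
sim_x <- rnorm(300, mean = 68, sd = 2.7)
sim_y <- 34 + 0.5 * sim_x + rnorm(300, sd = 2.4)

fit_full <- lm(sim_y ~ sim_x)               # ordinary fit with an intercept
sim_xc <- sim_x - mean(sim_x)
sim_yc <- sim_y - mean(sim_y)
fit_centered <- lm(sim_yc ~ sim_xc - 1)     # centered data, no intercept

# Same slope...
c(full = unname(coef(fit_full)["sim_x"]),
  centered = unname(coef(fit_centered)["sim_xc"]))
# ...and identical residuals: centering only moves the point cloud to the origin
all.equal(unname(resid(fit_full)), unname(resid(fit_centered)))
```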
3. Refer to problem 1. Normalize the father and son data and verify that the fitted slope equals the correlation.
# Standardize: subtract the mean, then divide by the standard deviation
xn = (x - mean(x)) / sd(x)
yn = (y - mean(y)) / sd(y)
lm(yn ~ xn)
## 
## Call:
## lm(formula = yn ~ xn)
## 
## Coefficients:
## (Intercept)           xn  
##   1.820e-15    5.013e-01
cor(xn, yn)
## [1] 0.5013383
# Swapping outcome and predictor gives the same slope, since cor(x, y) = cor(y, x)
lm(formula = xn ~ yn)
## 
## Call:
## lm(formula = xn ~ yn)
## 
## Coefficients:
## (Intercept)           yn  
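##  -2.216e-15    5.013e-01
Both slopes equal the correlation, 0.5013. The intercepts (on the order of 1e-15) are zero up to floating-point rounding, because standardized variables have mean 0.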
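Base R's `scale()` performs the same standardization as `(x - mean(x)) / sd(x)`. The sketch below uses it on simulated data, which is an assumption and not the father.son values. It confirms that the standardized slope equals the correlation in both directions and that the intercept is zero up to rounding. The `sim_` names are illustrative.

```r
# Simulated data (hypothetical)
set.seed(7)
sim_x <- rnorm(250)
sim_y <- 0.5 * sim_x + rnorm(250)

# scale() centers and divides by the sample SD; as.vector() drops its attributes
sim_xs <- as.vector(scale(sim_x))
sim_ys <- as.vector(scale(sim_y))

fit_std <- lm(sim_ys ~ sim_xs)
slope_y_on_x <- unname(coef(fit_std)[2])
slope_x_on_y <- unname(coef(lm(sim_xs ~ sim_ys))[2])
r_sim <- cor(sim_x, sim_y)   # standardizing does not change the correlation

c(slope_y_on_x = slope_y_on_x, slope_x_on_y = slope_x_on_y, cor = r_sim)
# Intercept is zero except for floating-point noise
abs(unname(coef(fit_std)[1])) < 1e-10
```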
4. Go back to the linear regression line from problem 1. If a father's height were 63 inches, what would you predict the son's height to be?
fit = lm(sheight ~ fheight, data = father.son)
predict(fit, newdata = data.frame(fheight = 63))
##        1 
## 66.27447
coef(fit)
## (Intercept)     fheight 
##   33.886604    0.514093
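# Same prediction by hand from the coefficients: b0 + b1 * 63
b0 = coef(fit)[1]; b1 = coef(fit)[2]
unname(b0 + b1 * 63)
For a 63-inch father, the predicted son's height is about 66.27 inches.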
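Substituting b0 = mean(y) - b1 * mean(x) and b1 = r * sd(y) / sd(x) gives the prediction in "regression toward the mean" form: yhat = mean(y) + r * sd(y) / sd(x) * (x0 - mean(x)). In standard-deviation units, the predicted son is r times as far from the mean as the father. The sketch below checks this form against `predict()`. Two assumptions apply: the heights are simulated rather than taken from father.son, and the names `sim_father`, `sim_son`, `by_hand` and `by_predict` are illustrative.

```r
# Simulated heights (hypothetical, NOT the father.son data)
set.seed(99)
sim_father <- rnorm(500, mean = 68, sd = 2.7)
sim_son <- 34 + 0.5 * sim_father + rnorm(500, sd = 2.4)
fit_sim <- lm(sim_son ~ sim_father)

x0 <- 63
r_sim <- cor(sim_father, sim_son)

# Regression-toward-the-mean form of the fitted line
by_hand <- mean(sim_son) + r_sim * sd(sim_son) / sd(sim_father) * (x0 - mean(sim_father))
by_predict <- unname(predict(fit_sim, newdata = data.frame(sim_father = x0)))
c(by_hand = by_hand, by_predict = by_predict)
```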
5. Consider a data set where the standard deviation of the outcome variable is double that of the predictor. Also, the variables have a correlation of 0.3. If you fit a linear regression model, what would be the estimate of the slope?

$$\frac{\mathrm{sd}(Y)}{\mathrm{sd}(X)} = 2, \qquad \mathrm{cor}(Y, X) = 0.3$$

$$\hat\beta_1 = \mathrm{cor}(Y, X)\,\frac{\mathrm{sd}(Y)}{\mathrm{sd}(X)} = 0.3 \times 2 = 0.6$$
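To check the answer with `lm()`, one can build a data set whose sample statistics match the problem exactly. The problem fixes only the SD ratio, so the sketch below assumes sd(X) = 1 and sd(Y) = 2, and it borrows the means from problem 6. The helper `make_data` is hypothetical. It standardizes x, makes a noise vector exactly uncorrelated with it by taking residuals from a regression on x, and mixes the two so that cor(X, Y) is exactly r.

```r
# Hypothetical helper: data with exactly the requested sample means, SDs and correlation
make_data <- function(n, mean_x, mean_y, sd_x, sd_y, r) {
  xs <- rnorm(n)
  xs <- (xs - mean(xs)) / sd(xs)              # sample mean 0, sample SD 1
  noise <- rnorm(n)
  z <- unname(resid(lm(noise ~ xs)))          # mean 0, exactly uncorrelated with xs
  zs <- z / sd(z)
  ys <- r * xs + sqrt(1 - r^2) * zs           # SD 1 and cor(xs, ys) = r
  data.frame(x = mean_x + sd_x * xs, y = mean_y + sd_y * ys)
}

set.seed(1)
sim_d <- make_data(1000, mean_x = 0.5, mean_y = 1, sd_x = 1, sd_y = 2, r = 0.3)
c(sd_ratio = sd(sim_d$y) / sd(sim_d$x), cor = cor(sim_d$x, sim_d$y))

fit_e <- lm(y ~ x, data = sim_d)
coef(fit_e)["x"]   # slope: 0.3 * 2
```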
6. Consider the previous problem. The outcome variable has a mean of 1 and the predictor has a mean of 0.5. What would be the intercept?

$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 1 - 0.6 \times 0.5 = 0.7$$
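Problems 5 and 6 show that the least-squares slope and intercept depend only on five summary statistics. The function `ols_from_summary` below is a hypothetical helper that packages the two formulas. Because only the SD ratio is given, it assumes sd_x = 1 and sd_y = 2.

```r
# Hypothetical helper: OLS slope and intercept from summary statistics alone
ols_from_summary <- function(mean_x, mean_y, sd_x, sd_y, r) {
  b1 <- r * sd_y / sd_x          # slope
  b0 <- mean_y - b1 * mean_x     # intercept: the line passes through (mean_x, mean_y)
  c(intercept = b0, slope = b1)
}

# Problems 5 and 6
est <- ols_from_summary(mean_x = 0.5, mean_y = 1, sd_x = 1, sd_y = 2, r = 0.3)
est
```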
7. True or false: if the predictor variable has mean 0, the estimated intercept from linear regression will be the mean of the outcome.

TRUE

$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = \bar{Y} - \hat\beta_1 \times 0 \;\Rightarrow\; \hat\beta_0 = \bar{Y}$$
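The same fact explains why problem 2 could drop the intercept. Once both variables are centered, the outcome's mean is 0, so the intercept is 0 as well. The sketch below checks the mean-zero case on simulated data. The data and the names `sim_x0`, `sim_y0` and `fit_zero` are illustrative assumptions.

```r
# Simulated data (hypothetical) with the predictor forced to sample mean 0
set.seed(3)
sim_x0 <- rnorm(100, mean = 5)
sim_x0 <- sim_x0 - mean(sim_x0)
sim_y0 <- 2 + 1.5 * sim_x0 + rnorm(100)

fit_zero <- lm(sim_y0 ~ sim_x0)
c(intercept = unname(coef(fit_zero)[1]), mean_y = mean(sim_y0))
```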
8. Consider problem 5 again. What would be the estimated slope if the predictor and outcome were reversed?

$$\hat\beta_1 = \mathrm{cor}(Y, X)\,\frac{\mathrm{sd}(X)}{\mathrm{sd}(Y)} = 0.3 \times \frac{1}{2} = 0.15$$
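Reversing the regression is not the same as inverting the fitted line: 1 / 0.6 is about 1.67, not 0.15. The two slopes multiply to cor(Y, X)^2, so they are reciprocals only when the correlation is ±1. The short arithmetic check below uses the numbers from problem 5. The variable names are illustrative.

```r
r_given <- 0.3
sd_ratio <- 2                       # sd(Y) / sd(X)
b_y_on_x <- r_given * sd_ratio      # problem 5
b_x_on_y <- r_given / sd_ratio      # predictor and outcome reversed
c(b_y_on_x = b_y_on_x, b_x_on_y = b_x_on_y,
  product = b_y_on_x * b_x_on_y, r_squared = r_given^2)
```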