Linear Regression in R

Using R’s lm function, perform regression analysis and measure the significance of the independent variables for the following two data sets. In the first case, you are evaluating the commonly heard claim that a person's maximum heart rate is related to their age by the following equation:

MaxHR = 220 − Age

Perform a linear regression analysis fitting the Max Heart Rate to Age using the lm function in R. What is the resulting equation? Is the effect of Age on Max HR significant? What is the significance level? Please also plot the fitted relationship between Max HR and Age.

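# Observed maximum heart rates and the corresponding ages
MaxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
Age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
HR_data <- data.frame(Age, MaxHR)
# Fit MaxHR as a linear function of Age and store the model summary
f <- lm(MaxHR ~ Age, data = HR_data)
l <- summary(f)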

Regression line: MaxHR = 210.04846 - 0.79773 * Age

Is the effect of Age on Max HR significant?

H0: B1 = 0, i.e. Age has no effect on Max HR.

Ha: B1 ≠ 0, i.e. Age has a significant effect on Max HR.

Since the p-value of B1 (the Age coefficient) is very low (p = 3.847987e-08), we reject the null hypothesis. The effect of Age on Max HR is significant, well below the 0.001 level.
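The fitted equation and the p-value quoted above can be read directly from the coefficient table of the model summary. The following is a minimal sketch that refits the model on the same data; the names `hr_fit`, `coef_table`, `intercept`, `slope` and `p_age` are illustrative and do not appear in the original analysis.

```r
# Data copied from the assignment above.
MaxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
Age   <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
hr_fit <- lm(MaxHR ~ Age)   # hr_fit is an illustrative name

# Coefficient table: one row per term, with columns
# Estimate, Std. Error, t value and Pr(>|t|).
coef_table <- summary(hr_fit)$coefficients

intercept <- coef_table["(Intercept)", "Estimate"]
slope     <- coef_table["Age", "Estimate"]
p_age     <- coef_table["Age", "Pr(>|t|)"]

# Print the fitted line and the p-value used in the hypothesis test.
cat(sprintf("MaxHR = %.5f + (%.5f) * Age\n", intercept, slope))
cat(sprintf("p-value for Age: %.3e\n", p_age))
```

Indexing the table by row and column name avoids depending on the position of a term, which matters once a model has more than one predictor.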

# Plot the fitted relationship between Max HR and Age
library(ggplot2)
ggplot(HR_data, aes(Age, MaxHR)) +
  geom_point() +
  stat_smooth(method = "lm", level = 0.95)

In the second case, perform a linear regression analysis on the auto data set, using mpg as the dependent variable and the other 4 (displacement, horsepower, weight, acceleration) as independent variables. What is the final linear regression fit equation? Which of the 4 independent variables have a significant impact on mpg? What are their corresponding significance levels? What are the standard errors on each of the coefficients? Please perform this experiment in two ways. First, take any random 40 data points from the entire auto data sample, perform the linear regression fit and measure the 95% confidence intervals. Then, take the entire data set (all 392 points), perform the linear regression and measure the 95% confidence intervals. Please report the resulting fit equation, the significance values and the confidence intervals for each of the two runs.

auto_data<-read.table("/Users/auto-mpg.txt")
names(auto_data) <- c('displacement', 'horsepower', 'weight', 'acceleration', 'mpg')

Sample Model with 40 records

set.seed(32)
samp_data <- auto_data[sample(nrow(auto_data), 40), ]
samp_model<-lm(mpg~displacement + horsepower + weight + acceleration, data=samp_data)
summary(samp_model)
## 
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration, 
##     data = samp_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3190 -2.7041 -0.1152  1.7718  9.4217 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  60.701653   7.050627   8.609 3.66e-10 ***
## displacement -0.012841   0.026361  -0.487   0.6292    
## horsepower   -0.102870   0.068556  -1.501   0.1424    
## weight       -0.003996   0.003434  -1.164   0.2524    
## acceleration -0.729648   0.372362  -1.960   0.0581 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.841 on 35 degrees of freedom
## Multiple R-squared:  0.775,  Adjusted R-squared:  0.7493 
## F-statistic: 30.14 on 4 and 35 DF,  p-value: 6.717e-11

Regression equation: mpg = 60.701653 - 0.012841 * displacement - 0.102870 * horsepower - 0.003996 * weight - 0.729648 * acceleration

# significance Impact
summary(samp_model)$coefficients[2:5,4] 
## displacement   horsepower       weight acceleration 
##   0.62920407   0.14244619   0.25237259   0.05805476

No variable is significant at the 5% level. Acceleration comes closest (p = 0.058), so it is significant only at the 10% level. The other variables (displacement, horsepower and weight) are not significant.
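The significance decision is just a comparison of each p-value against a chosen threshold. A small sketch of that check, using the p-values printed above; `significant_at` is a hypothetical helper written for this illustration, not a function from R or from the original analysis.

```r
# p-values copied from the sample-model output above.
p_values <- c(displacement = 0.62920407,
              horsepower   = 0.14244619,
              weight       = 0.25237259,
              acceleration = 0.05805476)

# Hypothetical helper: names of the predictors whose p-value is below alpha.
significant_at <- function(p, alpha) names(p)[p < alpha]

sig_05 <- significant_at(p_values, 0.05)   # threshold for the 5% level
sig_10 <- significant_at(p_values, 0.10)   # threshold for the 10% level

print(sig_05)
print(sig_10)
```

With a fitted model at hand, the same helper could be fed `summary(samp_model)$coefficients[2:5, 4]` instead of the hard-coded vector.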

# 95% Confidence Intervals
sampCI <- confint(samp_model, level = 0.95)
sampCI
##                    2.5 %       97.5 %
## (Intercept)  46.38811940 75.015185925
## displacement -0.06635710  0.040674605
## horsepower   -0.24204658  0.036306103
## weight       -0.01096753  0.002974751
## acceleration -1.48558280  0.026285853
# Standard Error
summary(samp_model)$coefficients[2:5,2] 
## displacement   horsepower       weight acceleration 
##  0.026361087  0.068556129  0.003433876  0.372361645
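The confidence intervals and the standard errors are linked: for a linear model, a 95% interval is the estimate plus or minus a t quantile times the standard error. The sketch below rebuilds the acceleration interval from the numbers printed above; the variable names are illustrative.

```r
# Estimate and standard error for acceleration, copied from the sample-model output.
estimate  <- -0.729648
std_error <- 0.372361645
df_resid  <- 35   # residual degrees of freedom reported by summary()

# Two-sided 95% interval: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
print(ci)
```

The result should agree with the `confint` row for acceleration up to rounding of the printed estimate. Because the interval contains zero, it is consistent with a p-value above 0.05.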

Full Model with all records

full_model<-lm(mpg~displacement + horsepower + weight + acceleration, data=auto_data)
summary(full_model)
## 
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration, 
##     data = auto_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.378  -2.793  -0.333   2.193  16.256 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.2511397  2.4560447  18.424  < 2e-16 ***
## displacement -0.0060009  0.0067093  -0.894  0.37166    
## horsepower   -0.0436077  0.0165735  -2.631  0.00885 ** 
## weight       -0.0052805  0.0008109  -6.512  2.3e-10 ***
## acceleration -0.0231480  0.1256012  -0.184  0.85388    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared:  0.707,  Adjusted R-squared:  0.704 
## F-statistic: 233.4 on 4 and 387 DF,  p-value: < 2.2e-16

Regression equation: mpg = 45.2511397 - 0.0060009 * displacement - 0.0436077 * horsepower - 0.0052805 * weight - 0.0231480 * acceleration

# significance Impact
summary(full_model)$coefficients[2:5,4] 
## displacement   horsepower       weight acceleration 
## 3.716584e-01 8.848982e-03 2.302545e-10 8.538765e-01

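Horsepower (p = 0.00885, significant at the 0.01 level) and weight (p = 2.3e-10, significant at the 0.001 level) are significant. Displacement (p = 0.372) and acceleration (p = 0.854) are not.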

# 95% Confidence Intervals
fmodel_CI <- confint(full_model, level = 0.95)
fmodel_CI
##                     2.5 %       97.5 %
## (Intercept)  40.422278855 50.080000544
## displacement -0.019192122  0.007190380
## horsepower   -0.076193029 -0.011022433
## weight       -0.006874738 -0.003686277
## acceleration -0.270094049  0.223798050
# Standard Error
summary(full_model)$coefficients[2:5,2] 
## displacement   horsepower       weight acceleration 
## 0.0067093055 0.0165734633 0.0008108541 0.1256011622
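To compare the two runs, it can help to look at how much the 95% intervals narrow when going from 40 rows to all 392. A rough sketch using only the standard errors and residual degrees of freedom printed in the two outputs; the variable names are illustrative.

```r
# Standard errors copied from the two outputs above.
se_sample <- c(displacement = 0.026361087, horsepower = 0.068556129,
               weight = 0.003433876, acceleration = 0.372361645)
se_full   <- c(displacement = 0.0067093055, horsepower = 0.0165734633,
               weight = 0.0008108541, acceleration = 0.1256011622)

# Width of a 95% interval = 2 * t quantile * standard error.
width_sample <- 2 * qt(0.975, df = 35)  * se_sample   # 40-row model, 35 residual df
width_full   <- 2 * qt(0.975, df = 387) * se_full     # 392-row model, 387 residual df

# A ratio above 1 means the 40-row interval is wider than the 392-row interval.
width_ratio <- width_sample / width_full
print(round(width_ratio, 2))
```

Every interval is narrower in the full model, which is why horsepower and weight reach significance there but not in the 40-row sample.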