Linear Regression in R
Using R’s lm function, perform a regression analysis and measure the significance of the independent variables for the following two data sets. In the first case, you are evaluating the commonly heard statement that a person's maximum heart rate is related to their age by the following equation:
MaxHR = 220 − Age
Perform a linear regression analysis fitting the Max Heart Rate to Age using the lm function in R. What is the resulting equation? Is the effect of Age on Max HR significant? What is the significance level? Please also plot the fitted relationship between Max HR and Age.
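MaxHR <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)
Age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
HR_data <- data.frame(Age, MaxHR)
# Fit MaxHR as a linear function of Age with lm
f <- lm(MaxHR ~ Age, data = HR_data)
# Store the model summary (coefficients, standard errors, p-values)
l <- summary(f)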
The regression line is MaxHR = 210.04846 - 0.79773 * Age
H0: B1 = 0, i.e. Age has no effect on Max HR.
Ha: B1 ≠ 0, i.e. Age has a significant effect on Max HR.
Since the p-value for B1 (the Age coefficient) is very low (p = 3.847987e-08), we reject the null hypothesis. The effect of Age on Max HR is significant at the 0.05 level, and even at the 0.001 level.
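The numbers quoted above can be read directly off the fitted model. The following is a minimal sketch that rebuilds the same data so it runs on its own; the names `fit`, `coef_table`, `slope_p` and `ci` are illustrative and not part of the original script. Checking whether 220 and -1 fall inside the confidence intervals is one informal way to compare the fit with MaxHR = 220 − Age, not a formal test.

```r
# Rebuild the data so this block is self-contained
MaxHR <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)
Age <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
HR_data <- data.frame(Age, MaxHR)

fit <- lm(MaxHR ~ Age, data = HR_data)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- coef(summary(fit))
print(coef_table)

# p-value for the Age slope (the test of H0: B1 = 0)
slope_p <- coef_table["Age", "Pr(>|t|)"]
print(slope_p)

# 95% confidence intervals for intercept and slope; compare them
# informally with the claimed values 220 and -1
ci <- confint(fit, level = 0.95)
print(ci)
```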
# Plot the fitted relationship between Max HR and Age
library(ggplot2)
ggplot(HR_data, aes(Age, MaxHR)) +
  geom_point() +
  stat_smooth(method = lm, level = 0.95)
In the second case, perform a linear regression analysis using mpg as the dependent variable and the other four variables (displacement, horsepower, weight, acceleration) as independent variables. What is the final linear regression fit equation? Which of the four independent variables have a significant impact on mpg? What are their corresponding significance levels? What are the standard errors on each of the coefficients? Please perform this experiment in two ways. First, take any random 40 data points from the entire auto data sample, perform the linear regression fit, and measure the 95% confidence intervals. Then take the entire data set (all 392 points), perform the linear regression, and measure the 95% confidence intervals. Please report the resulting fit equation, the significance values, and the confidence intervals for each of the two runs.
auto_data<-read.table("/Users/auto-mpg.txt")
names(auto_data) <- c('displacement', 'horsepower', 'weight', 'acceleration', 'mpg')
set.seed(32)
samp_data <- auto_data[sample(nrow(auto_data), 40), ]
samp_model<-lm(mpg~displacement + horsepower + weight + acceleration, data=samp_data)
summary(samp_model)
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = samp_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.3190 -2.7041 -0.1152  1.7718  9.4217
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  60.701653   7.050627   8.609 3.66e-10 ***
## displacement -0.012841   0.026361  -0.487   0.6292
## horsepower   -0.102870   0.068556  -1.501   0.1424
## weight       -0.003996   0.003434  -1.164   0.2524
## acceleration -0.729648   0.372362  -1.960   0.0581 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.841 on 35 degrees of freedom
## Multiple R-squared: 0.775, Adjusted R-squared: 0.7493
## F-statistic: 30.14 on 4 and 35 DF, p-value: 6.717e-11
Regression equation: mpg = 60.701653 - 0.012841 * displacement - 0.102870 * horsepower - 0.003996 * weight - 0.729648 * acceleration
# significance Impact
summary(samp_model)$coefficients[2:5,4]
## displacement horsepower weight acceleration
## 0.62920407 0.14244619 0.25237259 0.05805476
None of the four predictors is significant at the 0.05 level in the 40-point sample. Acceleration comes closest (p ≈ 0.058), so it is significant only at the 0.10 level. Displacement, horsepower and weight are not significant.
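To make the threshold comparison explicit, here is a small sketch that uses the p-values printed above. The helper `significant_at` is a hypothetical name introduced for illustration, not a function from any package.

```r
# p-values copied from the 40-point sample output above
p_values <- c(displacement = 0.62920407, horsepower = 0.14244619,
              weight = 0.25237259, acceleration = 0.05805476)

# Hypothetical helper: names of predictors with p-value below alpha
significant_at <- function(p, alpha) names(p)[p < alpha]

sig_05 <- significant_at(p_values, 0.05)  # predictors significant at 5%
sig_10 <- significant_at(p_values, 0.10)  # predictors significant at 10%
print(sig_05)
print(sig_10)
```

The same helper can be applied to the p-values of any other fit, with the caveat that a fixed cutoff is a convention rather than a property of the data.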
# 95% Confidence Intervals
sampCI <- confint(samp_model, level = 0.95)
sampCI
##                    2.5 %       97.5 %
## (Intercept)  46.38811940 75.015185925
## displacement -0.06635710  0.040674605
## horsepower   -0.24204658  0.036306103
## weight       -0.01096753  0.002974751
## acceleration -1.48558280  0.026285853
# Standard Error
summary(samp_model)$coefficients[2:5,2]
## displacement horsepower weight acceleration
## 0.026361087 0.068556129 0.003433876 0.372361645
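The confidence intervals and standard errors above are linked: for an ordinary least squares fit, each 95% interval is the estimate plus or minus a t critical value times the standard error. The sketch below reproduces the intervals by hand from the printed estimates and standard errors, using the 35 residual degrees of freedom reported in the summary. The variable names are illustrative.

```r
# Estimates and standard errors copied from the 40-point sample output
estimate <- c(displacement = -0.012841, horsepower = -0.102870,
              weight = -0.003996, acceleration = -0.729648)
std_error <- c(displacement = 0.026361087, horsepower = 0.068556129,
               weight = 0.003433876, acceleration = 0.372361645)

# Residual degrees of freedom: 40 observations minus 5 coefficients
df_resid <- 35

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df = df_resid)

# Interval = estimate -/+ t_crit * standard error
manual_ci <- cbind(lower = estimate - t_crit * std_error,
                   upper = estimate + t_crit * std_error)
print(manual_ci)
```

Up to rounding of the copied estimates, the result should agree with the `confint` output shown earlier.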
full_model<-lm(mpg~displacement + horsepower + weight + acceleration, data=auto_data)
summary(full_model)
##
## Call:
## lm(formula = mpg ~ displacement + horsepower + weight + acceleration,
## data = auto_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -11.378  -2.793  -0.333   2.193  16.256
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  45.2511397  2.4560447  18.424  < 2e-16 ***
## displacement -0.0060009  0.0067093  -0.894  0.37166
## horsepower   -0.0436077  0.0165735  -2.631  0.00885 **
## weight       -0.0052805  0.0008109  -6.512  2.3e-10 ***
## acceleration -0.0231480  0.1256012  -0.184  0.85388
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared: 0.707, Adjusted R-squared: 0.704
## F-statistic: 233.4 on 4 and 387 DF, p-value: < 2.2e-16
Regression equation: mpg = 45.2511397 - 0.0060009 * displacement - 0.0436077 * horsepower - 0.0052805 * weight - 0.0231480 * acceleration
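As a quick illustration of how the fitted equation is used, the sketch below evaluates it by hand for one car. The car's values (displacement 200, horsepower 100, weight 3000, acceleration 15) are made up for the example and do not come from the data set. With the model object available, `predict(full_model, newdata = ...)` would be the usual route.

```r
# Coefficients copied from the full-data fit above
coefs <- c(intercept = 45.2511397, displacement = -0.0060009,
           horsepower = -0.0436077, weight = -0.0052805,
           acceleration = -0.0231480)

# Hypothetical car, invented for illustration only
car <- c(displacement = 200, horsepower = 100,
         weight = 3000, acceleration = 15)

# Intercept plus the sum of coefficient * predictor value
predicted_mpg <- coefs[["intercept"]] + sum(coefs[names(car)] * car)
print(predicted_mpg)
```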
# significance Impact
summary(full_model)$coefficients[2:5,4]
## displacement horsepower weight acceleration
## 3.716584e-01 8.848982e-03 2.302545e-10 8.538765e-01
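Horsepower (p ≈ 0.0088, significant at the 0.01 level) and weight (p ≈ 2.3e-10, significant at the 0.001 level) are found to be significant. Displacement and acceleration are not significant.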
# 95% Confidence Intervals
fmodel_CI <- confint(full_model, level = 0.95)
fmodel_CI
##                     2.5 %       97.5 %
## (Intercept)  40.422278855 50.080000544
## displacement -0.019192122  0.007190380
## horsepower   -0.076193029 -0.011022433
## weight       -0.006874738 -0.003686277
## acceleration -0.270094049  0.223798050
# Standard Error
summary(full_model)$coefficients[2:5,2]
## displacement horsepower weight acceleration
## 0.0067093055 0.0165734633 0.0008108541 0.1256011622
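Comparing the two runs, the standard errors shrink considerably when moving from 40 points to all 392, which is why the full-data confidence intervals are narrower. The sketch below computes the ratio of the standard errors reported above. The square-root-of-sample-size figure is only a rough rule of thumb included for orientation, since the actual ratio also depends on which 40 rows were sampled.

```r
# Standard errors copied from the two outputs above
se_sample <- c(displacement = 0.026361087, horsepower = 0.068556129,
               weight = 0.003433876, acceleration = 0.372361645)
se_full <- c(displacement = 0.0067093055, horsepower = 0.0165734633,
             weight = 0.0008108541, acceleration = 0.1256011622)

# How many times larger each standard error is in the 40-point sample
se_ratio <- se_sample / se_full
print(round(se_ratio, 2))

# Rough heuristic: standard errors scale about as 1 / sqrt(n)
rough_expected <- sqrt(392 / 40)
print(rough_expected)
```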