Using R’s lm function, perform regression analysis and measure the significance of the independent variables for the following two data sets. In the first case, you are evaluating the statement that we hear that Maximum Heart Rate of a person is related to their age by the following equation.
#load data and packages
suppressWarnings(library(faraway))
age <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
maxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)
Perform a linear regression analysis fitting the Max Heart Rate to Age using the lm function in R. What is the resulting equation? Is the effect of Age on Max HR significant? What is the significance level?
model1<-lm(age~maxHR)
#equation
model1$call
## lm(formula = age ~ maxHR)
#variable/target coefficients // model equation
coef(model1)
## (Intercept)       maxHR
##  242.766917   -1.139609
sumary(model1)
##               Estimate Std. Error t value  Pr(>|t|)
## (Intercept) 242.766917  18.072391  13.433 5.341e-09
## maxHR        -1.139609   0.099947 -11.402 3.848e-08
##
## n = 15, p = 2, Residual SE = 5.47152, R-Squared = 0.91
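Yes, based upon the results. Note that the model as fitted, lm(age~maxHR), regresses Age on Max HR rather than Max HR on Age as the question asks, so the equation it gives is Age = 242.77 - 1.14 * MaxHR. In a simple linear regression the t-test on the slope gives the same p-value in either direction, so the conclusion about significance is unaffected. The p-value for the slope is < 0.05, so the relationship between Age and Max HR is significant. The significance level (P) is 3.848e-08.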
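Since the question asks for Max HR as a function of Age, the fit can also be run in that direction. The following is a minimal sketch under these assumptions: `model_hr` and `model_rev` are illustrative names, and base R's `summary()` is used in place of `sumary()` so the snippet needs no extra package.

```r
# Data copied from the exercise above
age   <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
maxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)

# Max HR is the response, Age is the predictor
model_hr <- lm(maxHR ~ age)

# Intercept and slope of MaxHR = b0 + b1 * Age
print(coef(model_hr))

# Estimates, standard errors, t values and p-values
print(summary(model_hr)$coefficients)

# Compare the slope p-value with the reversed model (age ~ maxHR)
model_rev <- lm(age ~ maxHR)
p_fwd <- summary(model_hr)$coefficients["age", "Pr(>|t|)"]
p_rev <- summary(model_rev)$coefficients["maxHR", "Pr(>|t|)"]
print(all.equal(p_fwd, p_rev))
```

The coefficients differ between the two directions, but the slope p-value does not, which is why the significance conclusion carries over.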
Please also plot the fitted relationship between Max HR and Age.
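Calling `plot()` on an `lm` object draws regression diagnostic plots (residuals, Q-Q, and so on), not the fitted line itself. One possible way to show the fitted relationship directly is a scatter plot with the regression line overlaid. This is only a sketch: the name `fit_hr`, the axis labels, and the styling are illustrative choices.

```r
# Data copied from the exercise above
age   <- c(18, 23, 25, 35, 65, 54, 34, 56, 72, 19, 23, 42, 18, 39, 37)
maxHR <- c(202, 186, 187, 180, 156, 169, 174, 172, 153, 199, 193, 174, 198, 183, 178)

# Max HR as the response, Age as the predictor
fit_hr <- lm(maxHR ~ age)

# Scatter plot of the observed data
plot(age, maxHR,
     xlab = "Age", ylab = "Max HR",
     main = "Max HR vs. Age", pch = 19)

# Overlay the fitted regression line
abline(fit_hr, col = "red", lwd = 2)
```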
old.par <- par(mfrow=c(1, 2))
plot(model1)
par(old.par)
Using the Auto data set from Assignment 5 (also attached here) perform a Linear Regression analysis using mpg as the dependent variable and the other 4 (displacement, horsepower, weight, acceleration) as independent variables.
Please perform this experiment in two ways. First take any random 40 data points from the entire auto data sample and perform the linear regression fit and measure the 95% confidence intervals.
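Because the 40 points are drawn at random, the sample fit changes every time the code is run unless a seed is fixed. The sketch below shows one way to make the draw repeatable. It uses the built-in `mtcars` data purely as a stand-in for the Auto data; the seed value, the sample size of 20, and all variable names are assumptions made for illustration.

```r
# mtcars ships with R; used here only as a stand-in for the Auto data
cars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

set.seed(42)                    # arbitrary seed, fixed for repeatability
idx <- sample(nrow(cars), 20)   # row indices of the random subsample
sample_set <- cars[idx, ]

# Same formula on the subsample and on the full data
model_sample <- lm(mpg ~ ., data = sample_set)
model_full   <- lm(mpg ~ ., data = cars)

# Resetting the seed reproduces exactly the same rows
set.seed(42)
idx_again <- sample(nrow(cars), 20)
print(identical(idx, idx_again))
```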
#full data set#
data <- suppressWarnings(read.table("https://raw.githubusercontent.com/RobertSellers/dataWarehouse/master/auto-mpg.data"))
names(data) <- c("mpg", "displacement", "horsepower","weight","acceleration")
#sample of 40
sampleSet<-data[sample(nrow(data), 40), ]
model40<-lm(mpg~.,sampleSet)
modelFull<-lm(mpg~.,data)
What is the final linear regression fit equation?
#equation
model40$call
## lm(formula = mpg ~ ., data = sampleSet)
#variable/target coefficients // model equation
coef(model40)
##   (Intercept)  displacement    horsepower        weight  acceleration
## -235.27711038    1.72368189    0.06479871    2.11936733    1.12521423
#equation
modelFull$call
## lm(formula = mpg ~ ., data = data)
#variable/target coefficients // model equation
coef(modelFull)
##  (Intercept) displacement   horsepower       weight acceleration
## -55.47963630   0.63155975   0.08281334  -3.51392367  -0.34375687
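The coefficient vectors above define the fit equations term by term. As a sketch of how they can be written out, the hypothetical helper `format_equation()` below (not part of base R or faraway) turns a named coefficient vector into a readable equation. It is applied here to the full-data coefficients printed above, rounded to 4 significant digits.

```r
# Hypothetical helper: build an equation string from named coefficients.
# The first element is assumed to be the intercept.
format_equation <- function(coefs, response, digits = 4) {
  intercept <- signif(coefs[[1]], digits)
  slopes <- coefs[-1]
  signs <- ifelse(slopes < 0, " - ", " + ")
  terms <- paste0(signs, signif(abs(slopes), digits), " * ", names(slopes),
                  collapse = "")
  paste0(response, " = ", intercept, terms)
}

# Full-data coefficients copied from the coef(modelFull) output above
coefs_full <- c("(Intercept)" = -55.47963630,
                displacement  =   0.63155975,
                horsepower    =   0.08281334,
                weight        =  -3.51392367,
                acceleration  =  -0.34375687)

eq_full <- format_equation(coefs_full, "mpg")
print(eq_full)
```

With a fitted model in hand, the same helper could be called as `format_equation(coef(model40), "mpg")`.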
Which of the 4 independent variables have a significant impact on mpg? What are their corresponding significance levels? What are the standard errors on each of the coefficients?
#variable/target coefficients
sumary(model40)
##                 Estimate  Std. Error t value  Pr(>|t|)
## (Intercept)  -235.277110   97.481322 -2.4136   0.02117
## displacement    1.723682    0.347974  4.9535 1.846e-05
## horsepower      0.064799    0.013828  4.6861 4.125e-05
## weight          2.119367    3.381309  0.6268   0.53486
## acceleration    1.125214    1.262144  0.8915   0.37874
##
## n = 40, p = 5, Residual SE = 34.56110, R-Squared = 0.91
sumary(modelFull)
##                 Estimate  Std. Error t value  Pr(>|t|)
## (Intercept)  -55.4796363  25.3119536 -2.1918 0.0289866
## displacement   0.6315597   0.1224161  5.1591 3.972e-07
## horsepower     0.0828133   0.0049061 16.8796 < 2.2e-16
## weight        -3.5139237   0.9337417 -3.7633 0.0001937
## acceleration  -0.3437569   0.3843392 -0.8944 0.3716584
##
## n = 392, p = 5, Residual SE = 32.14197, R-Squared = 0.91
The sumary function provides each of these values.
Sample Data Set
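The significant predictors for the sample dataset shown above are displacement (P=1.846e-05, SE=0.347974) and horsepower (P=4.125e-05, SE=0.013828). Weight (P=0.53486, SE=3.381309) and acceleration (P=0.37874, SE=1.262144) are insignificant (P>0.05). Because the 40 rows are drawn at random without a fixed seed, these values change from run to run.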
Full Data Set
The significant predictors for the full dataset are displacement (P=3.972e-07, SE=0.1224161), horsepower (P<2.2e-16, SE=0.0049061), and weight (P=0.0001937, SE=0.9337417). Acceleration is insignificant (P=0.3716584, SE=0.3843392).
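Rather than reading the significant predictors off the printed table, they can be filtered from the coefficient matrix that base R's `summary()` returns. The sketch below uses the built-in `mtcars` data as a stand-in for the Auto data; the 0.05 threshold follows the text, while the variable names are illustrative.

```r
# mtcars ships with R; used here only as a stand-in for the Auto data
cars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
fit <- lm(mpg ~ ., data = cars)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients

# Drop the intercept so only the predictors remain
predictors <- coef_table[rownames(coef_table) != "(Intercept)", , drop = FALSE]

# Keep predictors whose p-value is below the 0.05 threshold
alpha <- 0.05
is_sig <- predictors[, "Pr(>|t|)"] < alpha
significant <- predictors[is_sig, c("Std. Error", "Pr(>|t|)"), drop = FALSE]
print(significant)
```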
#confidence intervals
confint(model40, level=0.95)
##                      2.5 %      97.5 %
## (Intercept)  -433.17471457 -37.3795062
## displacement    1.01725631   2.4301075
## horsepower      0.03672671   0.0928707
## weight         -4.74505460   8.9837893
## acceleration   -1.43707415   3.6875026
confint(modelFull, level=0.95)
##                      2.5 %      97.5 %
## (Intercept)  -105.24579153 -5.71348108
## displacement    0.39087586  0.87224363
## horsepower      0.07316736  0.09245933
## weight         -5.34976508 -1.67808227
## acceleration   -1.09941104  0.41189731
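Based on all of these results, every standard error is smaller and every 95% confidence interval is narrower for the complete data set (n = 392) than for the 40-point sample. The p-values for displacement, horsepower, and weight also drop, and weight goes from insignificant in the sample (P=0.53486) to significant in the full data (P=0.0001937), while acceleration stays insignificant in both. The smaller our sample size, the less precise our coefficient estimates become.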
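The effect of sample size on precision can be illustrated with a small simulation. This is a sketch on made-up data, not the Auto data: the true model, noise level, and seed are all assumptions. Only the two sample sizes (392 and 40) are taken from the text.

```r
set.seed(1)                               # arbitrary seed for repeatability
n_full <- 392
x <- runif(n_full, 0, 10)
y <- 3 + 2 * x + rnorm(n_full, sd = 5)    # hypothetical true relationship
sim <- data.frame(x = x, y = y)

# Fit on all rows and on a random subsample of 40 rows
fit_full  <- lm(y ~ x, data = sim)
fit_small <- lm(y ~ x, data = sim[sample(n_full, 40), ])

# Standard error of the slope in each fit
se_full  <- summary(fit_full)$coefficients["x", "Std. Error"]
se_small <- summary(fit_small)$coefficients["x", "Std. Error"]

# Width of the 95% confidence interval for the slope
width_full  <- diff(confint(fit_full,  "x", level = 0.95)[1, ])
width_small <- diff(confint(fit_small, "x", level = 0.95)[1, ])

print(c(se_small = se_small, se_full = se_full))
print(c(width_small = unname(width_small), width_full = unname(width_full)))
```

Standard errors shrink roughly in proportion to the square root of the sample size, so a gap of about a factor of three between the two fits would be expected here.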