Load Packages

Problem Statement

Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Multiple Linear Regression:

## 
## Call:
## lm(formula = age ~ trestbps + chol + thalach + oldpeak, data = heart)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.5862  -5.6714   0.2889   5.8933  23.5759 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 52.873743   5.026193  10.520  < 2e-16 ***
## trestbps     0.126013   0.026385   4.776 2.83e-06 ***
## chol         0.029522   0.008853   3.335 0.000964 ***
## thalach     -0.149234   0.021216  -7.034 1.43e-11 ***
## oldpeak      0.091232   0.424752   0.215 0.830082    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.85 on 292 degrees of freedom
## Multiple R-squared:  0.2578, Adjusted R-squared:  0.2476 
## F-statistic: 25.36 on 4 and 292 DF,  p-value: < 2.2e-16
##                   2.5 %      97.5 %
## (Intercept) 42.98158533 62.76589970
## trestbps     0.07408423  0.17794128
## chol         0.01209884  0.04694573
## thalach     -0.19098955 -0.10747933
## oldpeak     -0.74473150  0.92719518
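
The code chunk that produced this output is not shown in the report. A minimal sketch that would generate a summary and confidence-interval table of this shape follows. It assumes `heart` is a data frame with numeric columns `age`, `trestbps`, `chol`, `thalach`, and `oldpeak`; the helper name `fit_age_model` and the file name `heart.csv` are hypothetical and only illustrate the workflow.

```r
# Assumption: `heart` is a data frame with numeric columns
# age, trestbps, chol, thalach, and oldpeak.
# `fit_age_model` is a hypothetical helper name, not part of the original report.
fit_age_model <- function(heart) {
  lm(age ~ trestbps + chol + thalach + oldpeak, data = heart)
}

# Hypothetical file name; replace it with the actual data source.
if (file.exists("heart.csv")) {
  heart <- read.csv("heart.csv")
  model <- fit_age_model(heart)
  print(summary(model))  # coefficient table, R-squared, F-statistic
  print(confint(model))  # 95% confidence intervals (the default level)
}
```

`confint()` uses a 95% level unless `level` is set, which matches the `2.5 %` and `97.5 %` column headers in the output.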

Residual Analysis

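Based on the residual analysis, I conclude that the linear model is appropriate. In the normal Q-Q plot, the points follow a straight line, which indicates that the residuals are approximately normally distributed. Normality is only one of the assumptions, so the residuals-versus-fitted plot should also show no curvature and a roughly constant spread around zero. An appropriate model is not necessarily a strong one: this model explains only about 26% of the variance in age (multiple R-squared 0.2578), and oldpeak is not a significant predictor (p = 0.83).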
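
The residual checks described here can be sketched in base R. This is one possible set of diagnostics, not necessarily the plots used in the original analysis; the function name `residual_diagnostics` is hypothetical, and `model` is assumed to be a fitted `lm` object such as the age model above.

```r
# Assumption: `model` is a fitted lm object.
# `residual_diagnostics` is a hypothetical helper name.
residual_diagnostics <- function(model) {
  res <- resid(model)
  fit <- fitted(model)

  # Draw the two plots side by side, then restore the previous layout.
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))

  # Residuals vs fitted values: look for curvature (non-linearity)
  # or a funnel shape (non-constant variance).
  plot(fit, res,
       xlab = "Fitted values", ylab = "Residuals",
       main = "Residuals vs fitted")
  abline(h = 0, lty = 2)

  # Normal Q-Q plot: points close to the reference line suggest
  # approximately normal residuals.
  qqnorm(res)
  qqline(res)

  # Shapiro-Wilk test as a numeric complement to the Q-Q plot.
  invisible(shapiro.test(res))
}
```

Calling `residual_diagnostics(model)` draws both plots and returns the Shapiro-Wilk result invisibly, so it can be stored and printed when a numeric check is wanted alongside the visual one.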