Prelude

We have moved from simple linear models to multiple regression models, where more than one predictor is tested for its ability to explain the observed variation in the data. Some of this variation is systematic (explained by a variable of theoretical importance, e.g., temperature in evaporation~temperature), while other variation remains unexplained (either because we didn’t measure the responsible variable or because we don’t know what is causing it). We can add covariates to reduce the unexplained variance, but we need to do so with care. In an experimental setting, you design the experiment so that the most likely explanatory variables are included. In an observational setting, we may need to select which variables help explain the distribution of our observations. In our simple example with evaporation, we know temperature is important, but we may also have access to other variables like humidity, wind speed, or the like. Which of these helps explain our observed evaporation rates? We would construct a model to find out.

Today we will explore some of the diagnostics associated with model fitting and touch on model selection. You are not done fitting an lm until you’ve assessed the output and the residuals. We can select a model based on hypothesis tests (t, F), \(R^2\), and backward/forward selection, or we can use an information-theoretic approach. Information theory helps us answer the question: what is the probability of the hypothesis given our data (e.g., P[H1 | Data])? Information-theoretic analyses, like Bayesian analyses, are grounded in likelihood: we need the log-likelihood as estimated with Maximum Likelihood (ML). This is how R fits some models behind the scenes; when data are normally distributed, ML and Ordinary Least Squares (OLS) converge on the same solution (a lot of matrix algebra is involved that we won’t get into). At any rate, Akaike’s Information Criterion (AIC) is one information-theoretic approach to model selection, and we will get into that. There is a fairly accessible paper on I-T approaches here.

Think about your workflow:

Sure, there is more to it than this, but we’ve already covered a lot of the nuance. It’s up to you to find a system that works for you to make sure you methodically complete the analysis.

Simple Refresher

library(jtools)
## Warning: package 'jtools' was built under R version 4.3.2
library(sjPlot)
## Warning: package 'sjPlot' was built under R version 4.3.2
## Registered S3 methods overwritten by 'broom':
##   method            from  
##   tidy.glht         jtools
##   tidy.summary.glht jtools
## Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
trees
##    Girth Height Volume
## 1    8.3     70   10.3
## 2    8.6     65   10.3
## 3    8.8     63   10.2
## 4   10.5     72   16.4
## 5   10.7     81   18.8
## 6   10.8     83   19.7
## 7   11.0     66   15.6
## 8   11.0     75   18.2
## 9   11.1     80   22.6
## 10  11.2     75   19.9
## 11  11.3     79   24.2
## 12  11.4     76   21.0
## 13  11.4     76   21.4
## 14  11.7     69   21.3
## 15  12.0     75   19.1
## 16  12.9     74   22.2
## 17  12.9     85   33.8
## 18  13.3     86   27.4
## 19  13.7     71   25.7
## 20  13.8     64   24.9
## 21  14.0     78   34.5
## 22  14.2     80   31.7
## 23  14.5     74   36.3
## 24  16.0     72   38.3
## 25  16.3     77   42.6
## 26  17.3     81   55.4
## 27  17.5     82   55.7
## 28  17.9     80   58.3
## 29  18.0     80   51.5
## 30  18.0     80   51.0
## 31  20.6     87   77.0
dim(trees)
## [1] 31  3
str(trees)
## 'data.frame':    31 obs. of  3 variables:
##  $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
##  $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
##  $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
summary(trees)
##      Girth           Height       Volume     
##  Min.   : 8.30   Min.   :63   Min.   :10.20  
##  1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
##  Median :12.90   Median :76   Median :24.20  
##  Mean   :13.25   Mean   :76   Mean   :30.17  
##  3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
##  Max.   :20.60   Max.   :87   Max.   :77.00
plot(trees)

hist(trees$Volume)

# maybe log-normal; not very normal
hist(trees$Girth)

# normal-ish
hist(trees$Height)

# Normal
plot(trees$Girth, trees$Volume)

# pretty linear (+)

Create a simple linear regression of volume as a function of girth:

m1 <- lm(Volume ~ Girth, data = trees)
summary(m1)
## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## Girth         5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
anova(m1)
## Analysis of Variance Table
## 
## Response: Volume
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Girth      1 7581.8  7581.8  419.36 < 2.2e-16 ***
## Residuals 29  524.3    18.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Girth is a significant predictor of volume. The slope of the model is 5.066 and is significantly different from 0 (t = 20.48, p < 0.001). The high \(R^2\) value (0.935) shows that the majority of the variance in volume is explained by girth (\(F\)(1,29) = 419.36, p < 0.001).
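
If you also want interval estimates to go with these tests, base R’s confint() returns confidence intervals for the fitted coefficients:

confint(m1, level = 0.95)   # 95% confidence intervals for the intercept and the Girth slope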

par(mfrow = c(2,2))
plot(m1)

The residuals are drawn down a bit in the middle - not equally scattered. There is some variation in the QQ plot around the tails, but our data set is small. The standardized residuals look pretty good. There does appear to be one point that has high leverage - a high predictor value. Let’s check point 31.

trees[21:31,] # observations 21-30 for comparison, plus point 31 itself
##    Girth Height Volume
## 21  14.0     78   34.5
## 22  14.2     80   31.7
## 23  14.5     74   36.3
## 24  16.0     72   38.3
## 25  16.3     77   42.6
## 26  17.3     81   55.4
## 27  17.5     82   55.7
## 28  17.9     80   58.3
## 29  18.0     80   51.5
## 30  18.0     80   51.0
## 31  20.6     87   77.0

Seems a bit high.

mean(trees$Volume, na.rm = TRUE)
## [1] 30.17097
summary(trees$Volume)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.20   19.40   24.20   30.17   37.30   77.00

The 3rd quartile value is 37.3 and the max is 77. Observation 31 was a giant.

We’ve investigated: it seems like we just have a big tree. Let’s proceed for now.
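
If you’d rather not just eyeball the leverage panel, base R’s influence measures put numbers on it; a minimal sketch using the usual rule-of-thumb cutoffs:

# leverage (hat values): flag points above the common 2*p/n cutoff (p = 2 coefficients here)
which(hatvalues(m1) > 2 * 2 / nrow(trees))
# Cook's distance: flag points above the conventional 4/n cutoff
which(cooks.distance(m1) > 4 / nrow(trees))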

Back to the lm:

m1 <- lm(Volume~Girth, data = trees)
summary(m1)
## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## Girth         5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
anova(m1)
## Analysis of Variance Table
## 
## Response: Volume
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Girth      1 7581.8  7581.8  419.36 < 2.2e-16 ***
## Residuals 29  524.3    18.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We have done our due diligence and followed the model through. Now we make some inference and interpret our findings:

Tree volume is a function of girth. Our linear model (Volume = -36.94 + 5.07 × Girth) showed that for each additional inch in diameter, tree volume of timber increases by roughly 5 cubic feet (slope = 5.066, t = 20.48, p < 0.001). (Search “trees” in the R help for full details on the data set and its units.) Tree girth explains significant variation in tree volume (\(R^2\) = 0.93), and girth is a good predictor of volume (F = 419.4, p < 0.001). Thus, the timber volume of a black cherry tree can be estimated from its girth (DBH).
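
Given the fitted equation, we can also use the model for prediction; a small sketch with predict() (the 15-inch girth is just an arbitrary value for illustration):

# predicted volume, with a 95% prediction interval, for a hypothetical tree with a 15-inch girth
predict(m1, newdata = data.frame(Girth = 15), interval = "prediction")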

Okay, that was a simple example. Let’s bump it up with an additional variable:

m2 <- lm(Volume ~ Girth + Height, data = trees)
summary(m2)
## 
## Call:
## lm(formula = Volume ~ Girth + Height, data = trees)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.4065 -2.6493 -0.2876  2.2003  8.4847 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
## Girth         4.7082     0.2643  17.816  < 2e-16 ***
## Height        0.3393     0.1302   2.607   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.882 on 28 degrees of freedom
## Multiple R-squared:  0.948,  Adjusted R-squared:  0.9442 
## F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16

With both girth and height in the model, both appear to have a significant effect on volume. The slope for girth is 4.708 and is significantly different from 0 (t = 17.816, p < 0.001). The slope for height is 0.339 and is also significantly different from 0 (t = 2.607, p = 0.0145). The model explains a great deal of the variance in volume and is well fit (\(R^2\) = 0.944, \(F\)(2,28) = 255, p < 0.001).
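
jtools (loaded above) offers summ() as a compact alternative to summary(); shown here only as another view of the same fit (the confint argument, if present in your jtools version, swaps standard errors for confidence intervals):

summ(m2)                    # condensed coefficient table from jtools
# summ(m2, confint = TRUE)  # same table with confidence intervals, if supported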

Now we can drill down into the two explanatory variables a bit more using anova(). This is different from aov(), which is used for a true ANOVA analysis. With anova() we are looking at the amount of variation explained by each variable relative to the unexplained residual error. The anova() function is almost like a post-hoc test to determine which of the variables in the model contribute to the overall model fit. Note that for a single model, anova() computes sequential (Type I) sums of squares, so each term is assessed after the terms listed before it in the formula. If you go to the help and input “anova” you’ll see that this function computes analysis of variance (or deviance) tables for one or more fitted model objects.

anova(m2)
## Analysis of Variance Table
## 
## Response: Volume
##           Df Sum Sq Mean Sq  F value  Pr(>F)    
## Girth      1 7581.8  7581.8 503.1503 < 2e-16 ***
## Height     1  102.4   102.4   6.7943 0.01449 *  
## Residuals 28  421.9    15.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both girth and height are significant predictors. That is, a model with these variables is better than one without. And the residual mean square has dropped (15.1 here versus 18.1 in the girth-only model).
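
Because these sums of squares are sequential, the order of the predictors in the formula matters; reversing it changes the table (a quick check on the same data and terms):

# same additive model, predictors in the opposite order
anova(lm(Volume ~ Height + Girth, data = trees))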

par(mfrow=c(2,2))
plot(m2)

These diagnostic plots look okay - a bit wonky in spots, but nothing systematic, especially given our small sample of 31 observations.

There is one concern here - are girth and height collinear?

cor.test(trees$Girth, trees$Height)
## 
##  Pearson's product-moment correlation
## 
## data:  trees$Girth and trees$Height
## t = 3.2722, df = 29, p-value = 0.002758
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2021327 0.7378538
## sample estimates:
##       cor 
## 0.5192801

Girth and height are moderately positively correlated (r = 0.519), but the correlation is not strong enough to rule out including both. We can also include their interaction with a *.
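
Since car is already loaded, we can also put a number on the collinearity concern with variance inflation factors (values near 1 indicate little collinearity; values above roughly 5-10 are generally taken as problematic):

vif(m2)   # variance inflation factors for Girth and Height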

Interaction

What about a potential interaction between height and girth? We can evaluate that. First, let’s define an interaction between two continuous predictors: it means the slope relating one predictor to the response changes as the value of the second predictor changes. Darn. You can probably see the problem here.

m3 <- lm(Volume ~ Girth * Height, data = trees)
summary(m3)
## 
## Call:
## lm(formula = Volume ~ Girth * Height, data = trees)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5821 -1.0673  0.3026  1.5641  4.6649 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  69.39632   23.83575   2.911  0.00713 ** 
## Girth        -5.85585    1.92134  -3.048  0.00511 ** 
## Height       -1.29708    0.30984  -4.186  0.00027 ***
## Girth:Height  0.13465    0.02438   5.524 7.48e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.709 on 27 degrees of freedom
## Multiple R-squared:  0.9756, Adjusted R-squared:  0.9728 
## F-statistic: 359.3 on 3 and 27 DF,  p-value: < 2.2e-16
anova(m3)
## Analysis of Variance Table
## 
## Response: Volume
##              Df Sum Sq Mean Sq  F value    Pr(>F)    
## Girth         1 7581.8  7581.8 1033.469 < 2.2e-16 ***
## Height        1  102.4   102.4   13.956 0.0008867 ***
## Girth:Height  1  223.8   223.8   30.512 7.484e-06 ***
## Residuals    27  198.1     7.3                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It seems there is an interaction effect: the effect of girth on volume depends on height. As girth increases, volume increases, but the rate of that increase differs depending on how tall the tree is.

plot(trees$Girth, trees$Height)

ggplot(trees, aes(x = Girth, y = Height)) +
  geom_point(aes(col=Volume))+
  theme_bw()

Neither of these plots really shows us this interaction very well.

library(sjPlot)
plot_model(m3, type="int")

Here you can nicely see that the slopes differ at different values of height. But these are only two selected values of height, and height is continuous, so we need to consider carefully which values we want to plot.

summary(trees$Height) #63-87 (plotted), mean = 76
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      63      72      76      76      80      87

So we can try a different method, which uses the mean and sd:

plot_model(m3, type ="int", mdrt.values = "meansd")

This gives us a range of values from the mean +/- 1 sd; we can see that the slope for Volume~Girth is steepest when height is greatest. This makes intuitive sense (more board feet come from bigger trees in height and girth).
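
If you want full control over which moderator values are displayed, you can build the same kind of plot by hand with predict() over a grid. A minimal sketch (assuming ggplot2 is loaded, as it was for the scatterplot above; grid and Volume_hat are throwaway names, and the heights 63, 76, and 87 come from the summary):

grid <- expand.grid(Girth  = seq(min(trees$Girth), max(trees$Girth), length.out = 50),
                    Height = c(63, 76, 87))
grid$Volume_hat <- predict(m3, newdata = grid)   # predicted volume from the interaction model
ggplot(grid, aes(x = Girth, y = Volume_hat, colour = factor(Height))) +
  geom_line() +
  labs(y = "Predicted Volume", colour = "Height") +
  theme_bw()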

Back to the models. We aren’t sure which is the best model. Model 3 has the highest \(R^2\) value, but it also has a pesky interaction term. Here we can interpret that interaction, but in some cases it may be hard to interpret, so thought needs to be given to its inclusion and to how you will explain it.

So, is m3 the best? Maybe. It also has the largest number of estimated coefficients (3 slope terms + intercept = 4).

Model Selection

So here is where we desire some unbiased (or at least less biased) way of selecting the best model. We can compare models, provided they are fit to exactly the same input data, with Akaike’s Information Criterion (AIC). By now, you know something about AIC, including its equation:

\(AIC = 2k - 2\ln(\hat{L})\)

This is 2 times the number of estimated parameters (k) minus 2 times the log of the maximized likelihood (\(\hat{L}\)).
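
You can verify the formula directly in R: logLik() returns the maximized log-likelihood of a fitted lm, and its df attribute counts the estimated coefficients plus the residual variance (which is why m3 shows 5 parameters in the AIC table below). A quick check:

ll <- logLik(m3)               # maximized log-likelihood for m3
k  <- attr(ll, "df")           # 4 coefficients + 1 residual variance = 5
2 * k - 2 * as.numeric(ll)     # should reproduce AIC(m3)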

Which model would you prefer using AIC? The implementation is simple.

AIC(m1, m2, m3)
##    df      AIC
## m1  3 181.6447
## m2  4 176.9100
## m3  5 155.4692

Recall that AIC is a model selection tool based on “badness of fit”, where lower AIC values are preferred. In general, models within \(\Delta\)AIC < 3 of the best model are similarly supported (one would not be preferred over the other). Here m3 has the lowest AIC at 155.5, and it is well below the others, so we can call it the best-fitting model of this set.
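
If you prefer the comparison expressed as differences from the best model, the \(\Delta\)AIC values are easy to compute (aic_tab is just a throwaway name for this sketch):

aic_tab <- AIC(m1, m2, m3)
aic_tab$dAIC <- aic_tab$AIC - min(aic_tab$AIC)   # 0 for the best-supported model
aic_tab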

Being king of dipshits doesn’t mean you’re smart: there will always be a winner with AIC, but it still could be a bad model.

Where to go from here.

We may be able to do better. We know the response variable is roughly log-normal. Let’s refit model 3, but with log(Volume) as the response.

m4 <- lm(log(Volume) ~ Girth * Height, data = trees)
summary (m4)
## 
## Call:
## lm(formula = log(Volume) ~ Girth * Height, data = trees)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14154 -0.05912 -0.01248  0.05700  0.14502 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.1485424  0.7425484  -2.893  0.00745 ** 
## Girth         0.3319765  0.0598548   5.546 7.05e-06 ***
## Height        0.0453027  0.0096524   4.693 6.94e-05 ***
## Girth:Height -0.0023796  0.0007594  -3.133  0.00413 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08438 on 27 degrees of freedom
## Multiple R-squared:  0.9769, Adjusted R-squared:  0.9743 
## F-statistic:   380 on 3 and 27 DF,  p-value: < 2.2e-16

The \(R^2\) for this model is a bit higher still (0.977), suggesting a good fit, though keep in mind it now describes fit to log(Volume) rather than Volume.

par(mfrow=c(2,2))
plot(m4)

What about the QQ plot and the residuals? The leverage is still a bit off due to some influential points, but the other plots look good.

Should we check the AIC?

AIC(m4)
## [1] -59.59979

Why is this so low? This should be a sign that something is off.

With AIC, you can only compare models fit to the SAME input data. When we log-transformed the response, we fundamentally changed the data, so the AIC values are not comparable. Bad, very bad. Don’t do this! See below for how different the data are:

summary(trees$Volume)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.20   19.40   24.20   30.17   37.30   77.00
summary(log(trees$Volume))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.322   2.965   3.186   3.273   3.619   4.344

Another approach: Stepwise Regression.

Stepwise regression. We are going to use another built-in dataset, mtcars. We will build a linear model of mpg as a function of displacement, horsepower, weight, and cylinders, and determine which terms should stay and which should go. We start with the fully saturated model (the full model), as opposed to the null model, which would be just the mean of y (a horizontal line).
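
For reference, the null model mentioned above is just an intercept-only fit, and its single coefficient is the sample mean of mpg (c0 is a throwaway name for this sketch):

c0 <- lm(mpg ~ 1, data = mtcars)   # intercept-only (null) model
coef(c0)                           # equals mean(mtcars$mpg)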

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
dim(mtcars)
## [1] 32 11

There are 11 variables with 32 data points. Conceptually, gear, carb, vs, cyl, and am are categorical variables (vs and am are 0/1 indicators), while mpg, disp, hp, drat, wt, and qsec are continuous. Note from str() below that R stores all of them as numeric, so a variable like cyl enters a model as a continuous predictor unless you convert it with factor().

We start with the full model.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
c1 <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)
summary(c1)
## 
## Call:
## lm(formula = mpg ~ cyl + disp + hp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0562 -1.4636 -0.4281  1.2854  5.8269 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 40.82854    2.75747  14.807 1.76e-14 ***
## cyl         -1.29332    0.65588  -1.972 0.058947 .  
## disp         0.01160    0.01173   0.989 0.331386    
## hp          -0.02054    0.01215  -1.691 0.102379    
## wt          -3.85390    1.01547  -3.795 0.000759 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.513 on 27 degrees of freedom
## Multiple R-squared:  0.8486, Adjusted R-squared:  0.8262 
## F-statistic: 37.84 on 4 and 27 DF,  p-value: 1.061e-10
anova(c1)
## Analysis of Variance Table
## 
## Response: mpg
##           Df Sum Sq Mean Sq  F value    Pr(>F)    
## cyl        1 817.71  817.71 129.5335 8.251e-12 ***
## disp       1  37.59   37.59   5.9552 0.0215174 *  
## hp         1   9.37    9.37   1.4844 0.2336201    
## wt         1  90.92   90.92  14.4034 0.0007589 ***
## Residuals 27 170.44    6.31                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Displacement has the least significant test of slope (t = 0.99, p = 0.33), but the sequential ANOVA suggests hp is contributing the least (F(1,27) = 1.48, p = 0.23). We’ll use the F test to sequentially drop terms. In stepwise (backward) regression, you sequentially delete terms that aren’t significant until you arrive at the most parsimonious model - the simplest model in which every remaining term still contributes significantly.

Now let’s do that (in reality, you would evaluate each model against the assumptions - in the interest of time, we’ll just fit them).

c2 <- lm(mpg ~ cyl + disp + wt, data = mtcars)
summary(c2)
## 
## Call:
## lm(formula = mpg ~ cyl + disp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4035 -1.4028 -0.4955  1.3387  6.0722 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 41.107678   2.842426  14.462 1.62e-14 ***
## cyl         -1.784944   0.607110  -2.940  0.00651 ** 
## disp         0.007473   0.011845   0.631  0.53322    
## wt          -3.635677   1.040138  -3.495  0.00160 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.595 on 28 degrees of freedom
## Multiple R-squared:  0.8326, Adjusted R-squared:  0.8147 
## F-statistic: 46.42 on 3 and 28 DF,  p-value: 5.399e-11

We dropped hp, as it had the least significant contribution to the fit of the model (\(F\)(1,27) = 1.48, p = 0.23) and its slope was not significantly different from 0 (t = -1.69, p = 0.10).
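
For completeness, base R can automate this pruning: drop1() reports the F test for removing each term from the current model one at a time (a marginal test, so it can suggest a different drop order than the sequential ANOVA above), and step() runs backward selection using AIC instead of F tests. A minimal sketch:

drop1(c1, test = "F")             # F test for dropping each term, one at a time
step(c1, direction = "backward")  # AIC-based backward elimination from the full model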