library(tidyverse)
## -- Attaching packages ----------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.0
## v tidyr   1.1.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts -------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# ggplot2 is already attached by tidyverse, so no separate library() call is needed.
data <- mtcars

Introduction

Confounding variables are variables related to both the predictors and the response. When a model may involve confounding variables, it is essential to identify them and evaluate them during model selection.
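
To make the definition concrete, here is a minimal simulated sketch. It is not part of the mtcars analysis: the variables `x`, `y`, `z`, the effect sizes, and the seed are all assumptions chosen for illustration. The variable `z` drives both the predictor and the response, so leaving it out distorts the estimated slope of `x`.

```r
# Simulated illustration (assumed values, not mtcars): z confounds the x -> y association.
set.seed(1)
n <- 1000
z <- rnorm(n)                      # confounder
x <- 0.8 * z + rnorm(n)            # predictor depends on z
y <- 1.0 * x + 2.0 * z + rnorm(n)  # response depends on x and z; true x effect is 1

crude    <- lm(y ~ x)       # omits the confounder
adjusted <- lm(y ~ x + z)   # includes the confounder

b_crude    <- unname(coef(crude)["x"])
b_adjusted <- unname(coef(adjusted)["x"])

# The crude slope absorbs part of z's effect; the adjusted slope should be near 1.
c(crude = b_crude, adjusted = b_adjusted)
```

The gap between the two slopes is the kind of shift that the two-model comparison on mtcars looks for.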

Model

The dataset is mtcars, which ships with base R (in the datasets package). Two models are built to check for confounding variables. Model 1 regresses miles per gallon (mpg) on gross horsepower (hp) and engine shape (vs); Model 2 adds one extra variable, weight (wt, in 1000 lbs).

summary(data)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
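# Scatter plot of mpg against weight with a fitted linear trend.
# vs is stored as a numeric 0/1 column, so it maps to a continuous colour scale.
data %>%
  ggplot(mapping = aes(x = wt, y = mpg, color = vs)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = 'lm')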
## `geom_smooth()` using formula 'y ~ x'
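
Since a confounder must be related to both the predictor and the response, one quick check is to look at how weight correlates with the other variables used in the models. The snippet below is a sketch of that check; the object names `vars` and `cor_matrix` are illustrative, and Pearson correlation is only one of several reasonable measures.

```r
# wt is a candidate confounder if it is associated with both the predictor (hp)
# and the response (mpg). mtcars ships with base R (datasets package).
vars <- mtcars[, c("mpg", "hp", "wt", "vs")]
cor_matrix <- cor(vars)        # pairwise Pearson correlations

# Correlation of weight with each variable used in the models.
round(cor_matrix[, "wt"], 2)
```

In mtcars, weight is strongly negatively correlated with mpg and positively correlated with horsepower, which is consistent with it acting as a confounder.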

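# Model 1: mpg on horsepower and engine shape.
model1 <- lm(mpg ~ hp + vs, data = data)
# Model 2: Model 1 plus weight, the candidate confounder.
model2 <- lm(mpg ~ hp + wt + vs, data = data)
# Percentage change in the hp coefficient after adding weight.
Percentage_Change <- (coef(model1)["hp"] - coef(model2)["hp"]) / coef(model1)["hp"] * 100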

Percentage_Change
##     hp 
## 53.383
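
The same comparison can be wrapped in a small function so that it can be repeated for other terms or other candidate confounders. This is a sketch only: `coef_pct_change` is a hypothetical helper, not part of the original analysis or of any package, and it assumes the term is present in both models.

```r
# Hypothetical helper (not part of the original analysis): percentage change in
# one coefficient between a reduced model and a fuller model.
coef_pct_change <- function(reduced, full, term) {
  b_reduced <- coef(reduced)[[term]]
  b_full    <- coef(full)[[term]]
  (b_reduced - b_full) / b_reduced * 100
}

# Refit the two models so the example is self-contained.
model1 <- lm(mpg ~ hp + vs, data = mtcars)
model2 <- lm(mpg ~ hp + wt + vs, data = mtcars)

hp_change <- coef_pct_change(model1, model2, "hp")
hp_change
```

Calling `coef_pct_change(model1, model2, "vs")` would give the corresponding change for the engine-shape coefficient.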

Conclusion

From the analysis above, in Model 2, which includes weight, the weight coefficient is -3.78. This means that each 1-unit (1000 lb) increase in weight is associated with an expected decrease of 3.78 miles per gallon, holding horsepower and engine shape constant. Comparing the two models, the horsepower coefficient changes by 53.383% once weight is added. A change this large suggests that weight confounds the association between gross horsepower and mpg, so weight should be kept in the model.
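
The weight coefficient quoted above can be read directly from the fitted model, together with a confidence interval that shows how precisely it is estimated. The snippet is a sketch that refits Model 2 so that it runs on its own; the names `wt_est` and `wt_ci` are illustrative, and the 95% level is an assumption.

```r
# Refit Model 2 so the example is self-contained.
model2 <- lm(mpg ~ hp + wt + vs, data = mtcars)

wt_est <- coef(model2)[["wt"]]                  # estimated change in mpg per 1000 lbs
wt_ci  <- confint(model2, "wt", level = 0.95)   # 1 x 2 matrix: lower and upper bound

wt_est
wt_ci
```

An interval that lies entirely below zero would support reading the coefficient as a real decrease in mpg with increasing weight.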