library(tidyverse)
## -- Attaching packages ----------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts -------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
data <- mtcars
Confounding variables are variables related to both the predictors and the response. When a model may involve confounding variables, it is essential to identify them and evaluate them during model selection.
The dataset is mtcars, which ships with base R in the datasets package. Two models are built to check whether there are any confounding variables. Model 1 uses gross horsepower (hp) and engine shape (vs: 0 = V-shaped, 1 = straight); model 2 adds one extra variable: weight (wt, in 1000 lbs).
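As a quick first check of the definition above, a candidate confounder should be correlated with both the predictor of interest and the response. The following is a minimal sketch, not part of the original analysis; the object name `cor_mat` is introduced here for illustration only.

```r
# Pairwise correlations among the response (mpg), the predictor of
# interest (hp), and the candidate confounder (wt).
# `cor_mat` is a hypothetical name used only in this sketch.
cor_mat <- cor(mtcars[, c("mpg", "hp", "wt")])
round(cor_mat, 2)

# A confounder is expected to be associated with both hp and mpg.
cor_mat["wt", c("hp", "mpg")]
```

Strong correlations here do not prove confounding on their own, but they indicate that weight is worth adding to the model and comparing coefficients.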
summary(data)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
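# Treat vs (engine shape) as a category so each group gets its own colour and fit line
data %>%
  ggplot(mapping = aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = 'lm')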
## `geom_smooth()` using formula 'y ~ x'
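# Model 1: horsepower and engine shape only
model1 <- lm(mpg ~ hp + vs, data = data)
# Model 2: the same predictors plus weight, the candidate confounder
model2 <- lm(mpg ~ hp + wt + vs, data = data)
# Percent change in the hp coefficient (second coefficient) after adding wt
Percentage_Change <- (model1$coefficients[2] - model2$coefficients[2]) / model1$coefficients[2] * 100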
Percentage_Change
## hp
## 53.383
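The same comparison can be wrapped in a small function that looks up the coefficient by name rather than by position, which keeps working if the order of terms in the formula changes. This is a sketch only: `pct_change_coef`, `reduced`, and `full` are hypothetical names, not part of the original analysis or of any package.

```r
# Hypothetical helper: percent change in one coefficient between a
# reduced model (without the candidate confounder) and a full model.
pct_change_coef <- function(reduced, full, term) {
  b_reduced <- coef(reduced)[[term]]
  b_full <- coef(full)[[term]]
  (b_reduced - b_full) / b_reduced * 100
}

# Refit the two models from the text so the block runs on its own.
reduced <- lm(mpg ~ hp + vs, data = mtcars)
full <- lm(mpg ~ hp + wt + vs, data = mtcars)

# Percent change in the hp coefficient once wt is added.
hp_change <- pct_change_coef(reduced, full, "hp")
hp_change
```

Passing a different `term` (for example `"vs"`) gives the change in that coefficient instead, as long as the term appears in both models.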
From the analysis above, model 2, which includes weight, gives a weight coefficient of -3.78. This means that each 1-unit (1000 lbs) increase in weight is associated with an expected decrease of 3.78 miles per gallon, holding horsepower and engine shape constant. Comparing the two models, the horsepower coefficient changes by 53.383% once weight is added. A change of this size suggests that weight is associated with both horsepower and mpg, so it should be treated as a confounding variable when modelling the relationship between mpg, engine shape, and gross horsepower.
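To read the weight coefficient together with its uncertainty, the coefficient table and a confidence interval can be pulled from the fitted model. This is a minimal sketch that refits model 2 so it runs on its own; `coef_table` and `wt_ci` are names introduced here for illustration.

```r
# Refit model 2 from the text so this block is self-contained.
model2 <- lm(mpg ~ hp + wt + vs, data = mtcars)

# Coefficient table: estimate, standard error, t value, and p-value per term.
coef_table <- coef(summary(model2))
coef_table["wt", ]

# 95% confidence interval for the weight coefficient.
wt_ci <- confint(model2, "wt", level = 0.95)
wt_ci
```

The interval gives a sense of how precisely the effect of weight is estimated, which a point estimate alone does not show.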