Confounding variables (also known as lurking variables) are factors that affect both the independent and the dependent variable. Failing to account for them can lead to incorrect conclusions when analyzing your dataset.
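To make the definition concrete, here is a small simulation (invented numbers, not the wine data). A hypothetical confounder `z` drives both `x` and `y`, and `x` has no direct effect on `y` at all. A naive regression of `y` on `x` still finds a strong slope, which largely disappears once `z` is included as a covariate:

```r
set.seed(42)
n <- 5000

# z is the (hypothetical) confounder: it influences both x and y
z <- rnorm(n)
x <- 2 * z + rnorm(n)   # x depends on z only
y <- 3 * z + rnorm(n)   # y depends on z only; x has no direct effect on y

naive    <- lm(y ~ x)       # ignores the confounder
adjusted <- lm(y ~ x + z)   # conditions on the confounder

coef(naive)[["x"]]      # clearly positive (about 1.2 given these coefficients)
coef(adjusted)[["x"]]   # close to zero
```

The naive slope is entirely an artifact of `z`; this is the kind of misleading association the rest of this post checks for.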
To illustrate with a basic example, we look at data on a commodity that had its share of consumption growth during the quarantine period: wine.
For this exercise we use the wine dataset from https://www.kaggle.com/rajyellow46/wine-quality, which contains information on the Portuguese "Vinho Verde" wine, including observations on physicochemical properties as well as a taste-quality score for each wine.
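The plotting code in this post refers to a column named `residual sugar` (with a space), so the data has to be loaded without R's default renaming of column names. A minimal sketch with base R, assuming the CSV from the Kaggle page has been saved locally; the helper `load_wine()` is hypothetical, and the tiny two-row file below is a toy stand-in used only to show the column-name behavior:

```r
# Hypothetical helper: read the wine CSV while keeping column names such as
# "residual sugar" exactly as they appear in the file header.
load_wine <- function(path) {
  # check.names = FALSE stops read.csv() from rewriting "residual sugar"
  # to "residual.sugar", which the ggplot code would then not find
  read.csv(path, check.names = FALSE)
}

# Toy file standing in for the real download (not real measurements)
tmp <- tempfile(fileext = ".csv")
writeLines(c("type,residual sugar,quality",
             "white,6.5,6",
             "red,1.9,5"), tmp)

toy <- load_wine(tmp)
names(toy)             # c("type", "residual sugar", "quality")

# Default behavior, for comparison
names(read.csv(tmp))   # c("type", "residual.sugar", "quality")
```

`readr::read_csv()` also keeps such names unchanged, so either loader works with the backtick-quoted column in the plots.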
By plotting wine quality against residual sugar, we find that the fitted linear trend slopes downward: as the sweetness level goes up, the quality of the wine actually diminishes.
```r
library(ggplot2)

# Pooled linear trend of quality against residual sugar, all wines together
# (assumes `wine` holds the Kaggle wine-quality data)
ggplot(wine) +
  aes(x = `residual sugar`, y = quality) +
  geom_smooth(method = 'lm', size = 1.5) +
  theme_minimal()
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 2 rows containing non-finite values (stat_smooth).
```

Before we readily assume that wine drinkers always prefer wine that is completely devoid of sweetness, we first have to consider whether there are confounding variables that could affect our analysis.
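An obvious candidate that is available in our dataset is the type of wine (red or white). It can plausibly influence both variables: white wines in this dataset generally contain more residual sugar than reds, and tasters may judge sweetness differently depending on the type.
If we produce the same plot split by type of wine, we discover that our premature conclusion does not hold. Perceived quality improves with sweetness for red wine, while the inverse is true only for white wine.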
```r
# Same linear trend, fitted separately for each wine type
ggplot(wine) +
  aes(x = `residual sugar`, y = quality, colour = type) +
  geom_smooth(method = 'lm', size = 1) +
  scale_color_hue() +
  theme_minimal()
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 2 rows containing non-finite values (stat_smooth).
```

You can develop this further by extending the analysis to the following variables, which could plausibly be confounders in this particular example (neither is part of this dataset, so they would have to be collected separately):
- Demographic data: different consumers perceive wine quality differently. Men might prefer a different type of wine than women, and older generations might appreciate other blends than millennials.
- Wine brands: certain expectations come with the brand of wine that people drink. For some brands, a sweeter taste may be more acceptable, or a higher alcohol content may be expected.
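The difference between the pooled plot and the split plot can also be summarized numerically by comparing the pooled regression slope with one slope per wine type. The sketch below uses simulated data whose column names mirror the wine dataset (`type`, `residual sugar`, `quality`); the numbers are invented purely to reproduce the pattern described above and are not the real measurements:

```r
set.seed(1)
n <- 1000  # observations per wine type (arbitrary)

# Simulated stand-in for the wine data, NOT the real measurements:
# red wines cover a narrow sugar range with an upward trend,
# white wines cover a wide range with a downward trend.
red_sugar   <- runif(n, 1, 4)
white_sugar <- runif(n, 1, 20)
sim <- data.frame(
  type             = rep(c("red", "white"), each = n),
  `residual sugar` = c(red_sugar, white_sugar),
  quality          = c(5.0 + 0.2 * red_sugar, 6.5 - 0.1 * white_sugar) +
                     rnorm(2 * n, sd = 0.5),
  check.names = FALSE  # keep the space in "residual sugar"
)

# Slope of quality on residual sugar, ignoring type (as in the first plot)
pooled_slope <- coef(lm(quality ~ `residual sugar`, data = sim))[[2]]

# One slope per type (as in the second plot)
type_slopes <- sapply(split(sim, sim$type), function(d) {
  coef(lm(quality ~ `residual sugar`, data = d))[[2]]
})

pooled_slope  # negative
type_slopes   # "red" positive, "white" negative
```

With the real data you could run the same two fits on `wine`; `lm()` drops rows with missing values by default. A single model, `lm(quality ~ `residual sugar` * type, data = wine)`, gives the same per-type slopes through its interaction term.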