```r
## load libraries, installing them first if they are missing
if (!require("fastDummies")) { install.packages("fastDummies"); library(fastDummies) }
if (!require("tidyverse")) { install.packages("tidyverse"); library(tidyverse) } # metapackage (includes dplyr)
```

Dummy variables (or binary variables) are commonly used in statistical analyses and in simpler descriptive statistics. A dummy column takes the value 1 when a categorical event occurs and 0 when it does not. For example, if we had a dummy variable called `male`, 1 would indicate that the individual is male and 0 would indicate that the individual is female (or, more precisely, not male).
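A minimal base-R sketch of the `male` example, using a hypothetical vector `sex` made up for illustration: comparing each entry with the category of interest gives a logical vector, and `as.integer()` turns `TRUE`/`FALSE` into 1/0.

```r
# Hypothetical data: the recorded sex of five individuals
sex <- c("male", "female", "female", "male", "female")

# Dummy variable: 1 when the category occurs ("male"), 0 otherwise
male <- as.integer(sex == "male")
male  # 1 0 0 1 0
```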
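```r
set.seed(1)
# Import training data and keep the zoning class, lot size and sale price
train <- read.csv('https://raw.githubusercontent.com/Vinayak234/DATA605/master/train.csv')
train <- train %>%
  select(MSZoning, LotArea, SalePrice)
head(train)
```

Here `MSZoning` is categorical (a zoning class code), while `LotArea` and `SalePrice` are numeric. Once a categorical variable is coded as dummy variables, using it in a regression is straightforward. The summary below is for a model fitted to `results`, a copy of the data in which `MSZoning` has been expanded into 0/1 columns such as `MSZoning_FV` and `MSZoning_RL`; the formula `SalePrice ~ . - MSZoning` regresses `SalePrice` on every column except the original `MSZoning`.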
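The code that builds `results` is not shown in this post. The block below is a minimal sketch of one way to produce it, assuming `results` came from `fastDummies::dummy_cols()` with `remove_first_dummy = TRUE`; that assumption is consistent with the summary, where the `C (all)` zoning class has no coefficient of its own. The sketch re-reads the CSV so it runs on its own (it needs network access), uses base R subsetting instead of `select()` to keep dependencies small, and converts `MSZoning` to a factor so that the dropped "first" dummy is the alphabetically first level. For comparison, it also fits the same model by passing the factor straight to `lm()`, which applies R's default treatment coding; the two sets of estimates should agree.

```r
# Sketch (assumption): one way to build `results` and fit the model summarised below
library(fastDummies)

url <- "https://raw.githubusercontent.com/Vinayak234/DATA605/master/train.csv"
train <- read.csv(url)[, c("MSZoning", "LotArea", "SalePrice")]
# A factor makes the level order explicit; "C (all)" sorts first
train$MSZoning <- factor(train$MSZoning)

# One 0/1 column per zoning level except the first ("C (all)"),
# which becomes the reference category and avoids perfect collinearity
results <- dummy_cols(train, select_columns = "MSZoning",
                      remove_first_dummy = TRUE)

# Regress on every column except the original categorical one
fit_dummy <- lm(SalePrice ~ . - MSZoning, data = results)
summary(fit_dummy)

# Same model, letting lm() build treatment-coded dummies from the factor
fit_factor <- lm(SalePrice ~ LotArea + MSZoning, data = train)
all.equal(unname(coef(fit_dummy)), unname(coef(fit_factor)))
```

Building the dummies by hand makes the columns explicit and reusable outside `lm()`; for a plain regression, passing the factor directly is shorter and gives the same fit.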
```
##
## Call:
## lm(formula = SalePrice ~ . - MSZoning, data = results)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -235390  -45628  -12485   26231  546762
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.756e+04  2.320e+04   2.481   0.0132 *
## LotArea     1.786e+00  1.960e-01   9.110  < 2e-16 ***
## MSZoning_FV 1.446e+05  2.484e+04   5.820 7.22e-09 ***
## MSZoning_RH 6.082e+04  2.948e+04   2.063   0.0393 *
## MSZoning_RL 1.128e+05  2.323e+04   4.856 1.33e-06 ***
## MSZoning_RM 5.736e+04  2.365e+04   2.425   0.0154 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 73120 on 1454 degrees of freedom
## Multiple R-squared:  0.1558,  Adjusted R-squared:  0.1528
## F-statistic: 53.65 on 5 and 1454 DF,  p-value: < 2.2e-16
```
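Writing the fitted model out from the summary makes the interpretation concrete. Below, $D_{\text{FV}}, \dots, D_{\text{RM}}$ stand for the four dummy columns in the output; a lot in the zoning class without its own column (`C (all)` in this dataset) has all four equal to 0. Coefficients are rounded as printed, `LotArea` is measured in square feet, and the 10,000 sq ft lot is a hypothetical input chosen only for the arithmetic.

$$
\widehat{\text{SalePrice}} = 57{,}560 + 1.786\,\text{LotArea} + 144{,}600\,D_{\text{FV}} + 60{,}820\,D_{\text{RH}} + 112{,}800\,D_{\text{RL}} + 57{,}360\,D_{\text{RM}}
$$

$$
\begin{aligned}
\text{RL lot: } & 57{,}560 + 1.786 \times 10{,}000 + 112{,}800 = 188{,}220 \\
\text{C (all) lot: } & 57{,}560 + 1.786 \times 10{,}000 = 75{,}420
\end{aligned}
$$

The gap, 188,220 - 75,420 = 112,800, is exactly the `MSZoning_RL` coefficient: with 0/1 dummies, each coefficient is the predicted price difference from the reference class at the same lot size.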
As this example shows, linear regression is not limited to quantitative variables: once a qualitative explanatory variable is coded as dummy variables, least squares handles it like any other predictor. Each dummy coefficient is the estimated difference in expected `SalePrice` between that zoning class and the omitted reference class, holding `LotArea` fixed. Furthermore, while dummy variables do not have to take the values 1 and 0, that coding makes the model much easier to interpret. We can extend linear regression with dummy variables a bit further, but let's save that for another time.
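To illustrate why 0/1 coding helps interpretation, the sketch below uses simulated, hypothetical data (not the housing data) and fits the same model twice: once with a 0/1 dummy and once with the same information coded as -1/+1. Both fits give identical predictions, but only under 0/1 coding is the slope directly the difference in group means; under -1/+1 coding the slope is half of that difference.

```r
# Hypothetical simulated data (not the housing data): one binary group
set.seed(1)
n <- 200
group <- rbinom(n, size = 1, prob = 0.5)   # 0/1 dummy
y <- 10 + 5 * group + rnorm(n)             # true group difference is 5

fit_01 <- lm(y ~ group)                    # 0/1 coding
group_pm <- ifelse(group == 1, 1, -1)      # same information coded -1/+1
fit_pm <- lm(y ~ group_pm)

# Identical fitted values under both codings
all.equal(fitted(fit_01), fitted(fit_pm))

# 0/1 slope = difference in group means; -1/+1 slope = half of it
group_means <- tapply(y, group, mean)
c(slope_01         = unname(coef(fit_01)["group"]),
  slope_pm_times_2 = unname(2 * coef(fit_pm)["group_pm"]),
  mean_diff        = unname(group_means["1"] - group_means["0"]))
```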