In this section, we’ll walk through a linear regression analysis of our dataset. The aim is to estimate how a categorical variable (`Category`) and a quantitative variable (`QuantVar`) relate to our response variable (`ResponseVar`).
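The dataset is simulated. We recoded the categorical variable as 0 (`Category_of_Interest`) or 1 (`Other_Category`) and centered the quantitative variable at its mean. Note that `ResponseVar` is drawn independently of both predictors, so we should not expect to find a real relationship. Let’s quickly review those steps.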
```r
# Data preparation
set.seed(123)
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
# Recode: Category_of_Interest = 0 (reference group), Other_Category = 1
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
# Center QuantVar so that 0 corresponds to its sample mean
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
```
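The manual 0/1 recoding is one way to enter a two-level categorical variable. As a sketch (the object names `raw`, `manual`, `factored`, `fit_manual` and `fit_factor` are ours, not from the original analysis), the block below re-creates the simulated data and compares it with letting R build the dummy variable from a factor whose reference level is `Category_of_Interest`:

```r
# Re-create the simulated data, centering QuantVar but leaving Category as text
set.seed(123)
n <- 100
raw <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
raw$QuantVar <- raw$QuantVar - mean(raw$QuantVar)

# Manual 0/1 coding, as in the data-preparation step
manual <- raw
manual$Category <- ifelse(manual$Category == "Category_of_Interest", 0, 1)

# Factor coding with Category_of_Interest explicitly set as the reference level
factored <- raw
factored$Category <- relevel(factor(factored$Category), ref = "Category_of_Interest")

fit_manual <- lm(ResponseVar ~ Category + QuantVar, data = manual)
fit_factor <- lm(ResponseVar ~ Category + QuantVar, data = factored)

# Identical estimates; only the name of the Category coefficient differs
cbind(manual = coef(fit_manual), factor = coef(fit_factor))
```

With R's default treatment contrasts, the factor version names the slope `CategoryOther_Category`, which makes the reference group explicit in the output.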
Now, let’s fit a linear regression model to examine the relationships between our variables.
```r
# Fit linear regression model
model <- lm(ResponseVar ~ Category + QuantVar, data = df)
# Display regression results
summary(model)
```

```
##
## Call:
## lm(formula = ResponseVar ~ Category + QuantVar, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.4635 -0.6559 -0.1982  0.5091  3.0627
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.09711    0.12396   0.783    0.435
## Category    -0.19062    0.18904  -1.008    0.316
## QuantVar    -0.08419    0.09750  -0.863    0.390
##
## Residual standard error: 0.9359 on 97 degrees of freedom
## Multiple R-squared:  0.01784,    Adjusted R-squared:  -0.002411
## F-statistic: 0.881 on 2 and 97 DF,  p-value: 0.4177
```
The linear regression model is represented by the equation:
\[ \text{ResponseVar} = \beta_0 + \beta_1 \times \text{Category} + \beta_2 \times \text{QuantVar} + \epsilon \]
where:

- \(\beta_0\) is the intercept,
- \(\beta_1\) and \(\beta_2\) are the coefficients for `Category` and `QuantVar`, respectively,
- \(\epsilon\) is the error term.
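To make the equation concrete, here is a small sketch that rebuilds each fitted value by hand as \(\hat\beta_0 + \hat\beta_1 \times \text{Category} + \hat\beta_2 \times \text{QuantVar}\) and recovers the estimated error terms as residuals. It re-creates the simulated data so it runs on its own; `b`, `manual_fit` and `resid_manual` are illustrative names of our own.

```r
# Re-create the simulated data and model from this section
set.seed(123)
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
model <- lm(ResponseVar ~ Category + QuantVar, data = df)

# Estimated beta_0, beta_1, beta_2
b <- coef(model)

# Plug the estimates into the regression equation for every observation
manual_fit <- b["(Intercept)"] + b["Category"] * df$Category + b["QuantVar"] * df$QuantVar

# The residuals are the estimated error terms (epsilon-hat)
resid_manual <- df$ResponseVar - manual_fit

# Should agree with what lm() reports
all.equal(unname(manual_fit), unname(fitted(model)))
```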
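The estimated coefficients are as follows:

- \(\hat{\beta}_0\) (intercept) = 0.09711: the predicted `ResponseVar` for `Category_of_Interest` (coded 0) at the mean of `QuantVar` (which is 0 after centering),
- \(\hat{\beta}_1\) (`Category`) = -0.19062: the estimated difference in `ResponseVar` between `Other_Category` and `Category_of_Interest`,
- \(\hat{\beta}_2\) (`QuantVar`) = -0.08419: the estimated change in `ResponseVar` per one-unit increase in `QuantVar`.

Each slope is the expected change in the response variable for a one-unit change in its predictor, holding the other predictor constant. For the 0/1-coded `Category`, a one-unit change simply means switching from `Category_of_Interest` to `Other_Category`.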
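One way to check these interpretations is to compare model predictions at chosen predictor values. The sketch below (the grid data frames and the names `p` and `p_q` are our own) re-creates the data, then shows that the intercept is the prediction for the reference group at average `QuantVar`, and that each slope is a difference of two predictions with the other predictor held fixed.

```r
# Re-create the simulated data and model from this section
set.seed(123)
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
model <- lm(ResponseVar ~ Category + QuantVar, data = df)

# Both categories at the mean of QuantVar (0 after centering)
p <- predict(model, newdata = data.frame(Category = c(0, 1), QuantVar = c(0, 0)))

# Intercept = prediction for Category_of_Interest at average QuantVar
all.equal(unname(p[1]), unname(coef(model)["(Intercept)"]))

# Category slope = difference between the groups, QuantVar held fixed
all.equal(unname(p[2] - p[1]), unname(coef(model)["Category"]))

# QuantVar slope = change for a one-unit increase in QuantVar, Category held fixed
p_q <- predict(model, newdata = data.frame(Category = c(0, 0), QuantVar = c(0, 1)))
all.equal(unname(diff(p_q)), unname(coef(model)["QuantVar"]))
```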
The p-value for each coefficient tests the null hypothesis that the coefficient equals zero, given the other predictor in the model. A small p-value (conventionally < 0.05) is evidence that the predictor is associated with the response. Here none of the p-values falls below that threshold: 0.435 for the intercept, 0.316 for `Category`, and 0.390 for `QuantVar`.
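The t values and p-values in the summary follow directly from the estimates and standard errors: \(t = \hat\beta / \text{SE}(\hat\beta)\), and the two-sided p-value comes from a t distribution with the residual degrees of freedom (97 here). A minimal sketch reproducing them, with `est`, `t_manual` and `p_manual` as our own names:

```r
# Re-create the simulated data and model from this section
set.seed(123)
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
model <- lm(ResponseVar ~ Category + QuantVar, data = df)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
est <- coef(summary(model))

# t statistic for H0: beta = 0
t_manual <- est[, "Estimate"] / est[, "Std. Error"]

# Two-sided p-value from the t distribution with residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(model), lower.tail = FALSE)

cbind(t = t_manual, p = p_manual)
```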
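The overall model fit is summarized by R-squared: the Multiple R-squared is 0.01784 and the Adjusted R-squared is -0.002411.
R-squared is the proportion of variance in the response variable explained by the model; higher values mean the predictors account for more of the variation. Adjusted R-squared penalizes each additional predictor, so it can drop below zero when the predictors explain very little, as they do here. The overall F-test (F = 0.881 on 2 and 97 DF, p = 0.4177) likewise gives no evidence that the model improves on an intercept-only model.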
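The fit statistics can be reproduced from sums of squares. With \(n\) observations and \(k\) predictors, \(R^2 = 1 - SS_{res}/SS_{tot}\), \(R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - k - 1)\), and \(F = (R^2/k)/((1 - R^2)/(n - k - 1))\). The sketch below (variable names are our own) computes all three and compares them with `summary(model)`.

```r
# Re-create the simulated data and model from this section
set.seed(123)
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
model <- lm(ResponseVar ~ Category + QuantVar, data = df)

y <- df$ResponseVar
ss_res <- sum(residuals(model)^2)   # unexplained variation
ss_tot <- sum((y - mean(y))^2)      # total variation around the mean
n_obs <- nrow(df)
k <- length(coef(model)) - 1        # number of predictors, excluding the intercept

r2 <- 1 - ss_res / ss_tot
adj_r2 <- 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)

# Overall F-test against the intercept-only model
f_stat <- (r2 / k) / ((1 - r2) / (n_obs - k - 1))
f_p <- pf(f_stat, k, n_obs - k - 1, lower.tail = FALSE)

c(r2 = r2, adj_r2 = adj_r2, F = f_stat, p = f_p)
```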
In conclusion, this analysis finds no statistically significant association between either predictor and the response: each coefficient is small relative to its standard error, and the model explains less than 2% of the variance in `ResponseVar`. That is the expected outcome here, because `ResponseVar` was simulated independently of `Category` and `QuantVar`. With real data the same workflow applies, and further interpretation and exploration would be needed for a comprehensive understanding of the dataset.
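Confidence intervals offer a complementary view of the same conclusion. For a linear model, a 95% interval built from the t distribution excludes zero exactly when the coefficient's two-sided p-value is below 0.05. A brief sketch using `confint()` (the names `ci`, `pvals` and `excludes_zero` are our own):

```r
# Re-create the simulated data and model from this section
set.seed(123)
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
model <- lm(ResponseVar ~ Category + QuantVar, data = df)

# 95% confidence intervals for each coefficient
ci <- confint(model, level = 0.95)
pvals <- coef(summary(model))[, "Pr(>|t|)"]

# An interval excludes zero when it lies entirely above or below it
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0

data.frame(lower = ci[, 1], upper = ci[, 2], p = pvals, excludes_zero)
```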
Thank you for reading!