In this blog post, we’ll explore the association between our explanatory variables and a response variable using a linear regression model. We’ll also visualize the data to gain additional insights. Before diving into the results, we’ll walk through the data preparation steps for both categorical and quantitative explanatory variables.
# Generate a random dataset
set.seed(123) # for reproducibility
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
# Recode the categorical variable: 0 = Category_of_Interest, 1 = Other_Category
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
# Generate a frequency table for checking coding
frequency_table <- table(df$Category)
frequency_table
##
## 0 1
## 57 43
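As a side note, the manual 0/1 recode is not strictly required: lm() will dummy-code a factor automatically. A minimal sketch of that alternative, stored in a hypothetical CategoryFactor column so the 0/1 variable above stays untouched:
# Represent the same coding as a labelled factor; the first level acts as the
# reference group, mirroring the 0 = Category_of_Interest, 1 = Other_Category scheme above
df$CategoryFactor <- factor(df$Category, levels = c(0, 1),
                            labels = c("Category_of_Interest", "Other_Category"))
table(df$CategoryFactor)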
# Center the quantitative variable
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
# Check centering by calculating the mean
mean_after_centering <- mean(df$QuantVar)
mean_after_centering
## [1] -1.54965e-17
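The same centering can also be done with scale(); a quick, equivalent sketch:
# scale() with scale = FALSE subtracts the column mean only; as.numeric() converts
# the returned one-column matrix back to a plain vector
df$QuantVar <- as.numeric(scale(df$QuantVar, center = TRUE, scale = FALSE))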
Let’s visualize the data to better understand its distribution and relationships.
hist(df$QuantVar, main = "Histogram of Quantitative Variable", xlab = "QuantVar")
boxplot(df$QuantVar ~ df$Category, main = "Boxplot of QuantVar by Category", xlab = "Category", ylab = "QuantVar")
plot(df$QuantVar, df$ResponseVar, main = "Scatterplot of QuantVar vs. ResponseVar", xlab = "QuantVar", ylab = "ResponseVar")
barplot(frequency_table, main = "Bar Chart of Categorical Variable", xlab = "Category", ylab = "Frequency")
plot(density(df$QuantVar), main = "Density Plot of Quantitative Variable", xlab = "QuantVar", ylab = "Density")
cor_matrix <- cor(df[, c("QuantVar", "ResponseVar")])
# scale = "none" keeps the raw correlations instead of row-standardizing them
heatmap(cor_matrix, main = "Correlation Heatmap", xlab = "Variable", ylab = "Variable", scale = "none", col = colorRampPalette(c("blue", "white", "red"))(20))
time_variable <- 1:n
plot(time_variable, df$ResponseVar, type = "l", main = "Line Chart - Time Series", xlab = "Time", ylab = "ResponseVar")
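If you want all the pairwise relationships in one view, a quick scatterplot-matrix sketch is also an option:
# Scatterplot matrix of all three variables
pairs(df[, c("Category", "QuantVar", "ResponseVar")], main = "Scatterplot Matrix")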
library(ggplot2)
ggplot(df, aes(x = as.factor(Category), y = QuantVar)) +
  geom_violin(fill = "skyblue", width = 0.5) +
  labs(title = "Violin Plot of QuantVar by Category", x = "Category", y = "QuantVar")
pie(frequency_table, main = "Pie Chart of Categorical Variable", labels = c("Category 0", "Category 1"))
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
plot_ly(df, x = ~QuantVar, y = ~ResponseVar, z = ~Category, color = ~Category, type = "scatter3d", mode = "markers")
Now, let’s proceed with testing our linear regression model using the prepared data.
# Fit linear regression model
model <- lm(ResponseVar ~ Category + QuantVar, data = df)
# Display regression results
summary(model)
##
## Call:
## lm(formula = ResponseVar ~ Category + QuantVar, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4635 -0.6559 -0.1982 0.5091 3.0627
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.09711 0.12396 0.783 0.435
## Category -0.19062 0.18904 -1.008 0.316
## QuantVar -0.08419 0.09750 -0.863 0.390
##
## Residual standard error: 0.9359 on 97 degrees of freedom
## Multiple R-squared: 0.01784, Adjusted R-squared: -0.002411
## F-statistic: 0.881 on 2 and 97 DF, p-value: 0.4177
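Before interpreting the coefficients, it is worth a quick look at the usual model checks; a minimal sketch using base R, nothing here is specific to this dataset:
# Standard lm() diagnostics: residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))
# 95% confidence intervals for the coefficients
confint(model)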
The frequency table and the near-zero mean confirm that the categorical coding and the centering worked as intended. As for the regression itself, neither Category (b = -0.19, p = 0.316) nor QuantVar (b = -0.08, p = 0.390) is significantly associated with ResponseVar, and the model as a whole explains essentially none of the variance (R² = 0.018, F(2, 97) = 0.88, p = 0.418). That is exactly what we should expect here, since all three variables were generated independently at random; with real data, this is where you would interpret the coefficients and discuss any significant findings or insights.
In this analysis, we explored the association between our explanatory variables and the response variable using linear regression, and the visualizations gave additional insight into the data's distribution and relationships. Careful preparation, recoding the categorical variable and centering the quantitative one, keeps the coefficients easy to interpret, and further checks and model refinements may be needed for a more comprehensive understanding.
Thank you for reading!
Feel free to customize the charts and visualizations based on your preferences and data characteristics.