In this blog post, we’ll explore the association between our explanatory variables and a response variable using a linear regression model. We’ll also visualize the data to gain additional insights. Before diving into the results, we’ll walk through the data preparation steps for both categorical and quantitative explanatory variables.
# Generate a random dataset
set.seed(123) # for reproducibility
n <- 100
df <- data.frame(
  Category = sample(c("Category_of_Interest", "Other_Category"), n, replace = TRUE),
  QuantVar = rnorm(n),
  ResponseVar = rnorm(n)
)
# Recode the categorical variable: 0 = Category_of_Interest, 1 = Other_Category
df$Category <- ifelse(df$Category == "Category_of_Interest", 0, 1)
# Generate a frequency table for checking coding
frequency_table <- table(df$Category)
frequency_table
##
## 0 1
## 57 43
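As a side note, the manual 0/1 recode is not strictly required: lm() will dummy-code a factor automatically. A minimal sketch of that alternative, stored in a hypothetical CategoryFactor column so the 0/1 variable above stays untouched:
# Represent the same coding as a labelled factor; the first level acts as the
# reference group, mirroring the 0 = Category_of_Interest, 1 = Other_Category scheme above
df$CategoryFactor <- factor(df$Category, levels = c(0, 1),
                            labels = c("Category_of_Interest", "Other_Category"))
table(df$CategoryFactor)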
# Center the quantitative variable
df$QuantVar <- df$QuantVar - mean(df$QuantVar)
# Check centering by calculating the mean
mean_after_centering <- mean(df$QuantVar)
mean_after_centering
## [1] -1.54965e-17
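The same centering can also be done with scale(); a quick, equivalent sketch:
# scale() with scale = FALSE subtracts the column mean only; as.numeric() converts
# the returned one-column matrix back to a plain vector
df$QuantVar <- as.numeric(scale(df$QuantVar, center = TRUE, scale = FALSE))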
Let’s visualize the data to better understand its distribution and relationships.
hist(df$QuantVar, main = "Histogram of Quantitative Variable", xlab = "QuantVar")
boxplot(df$QuantVar ~ df$Category, main = "Boxplot of QuantVar by Category", xlab = "Category", ylab = "QuantVar")
plot(df$QuantVar, df$ResponseVar, main = "Scatterplot of QuantVar vs. ResponseVar", xlab = "QuantVar", ylab = "ResponseVar")
barplot(frequency_table, main = "Bar Chart of Categorical Variable", xlab = "Category", ylab = "Frequency")
plot(density(df$QuantVar), main = "Density Plot of Quantitative Variable", xlab = "QuantVar", ylab = "Density")
cor_matrix <- cor(df[, c("QuantVar", "ResponseVar")])
# scale = "none" keeps the raw correlations instead of row-standardizing them
heatmap(cor_matrix, main = "Correlation Heatmap", xlab = "Variable", ylab = "Variable", scale = "none", col = colorRampPalette(c("blue", "white", "red"))(20))
time_variable <- 1:n
plot(time_variable, df$ResponseVar, type = "l", main = "Line Chart - Time Series", xlab = "Time", ylab = "ResponseVar")
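If you want all the pairwise relationships in one view, a quick scatterplot-matrix sketch is also an option:
# Scatterplot matrix of all three variables
pairs(df[, c("Category", "QuantVar", "ResponseVar")], main = "Scatterplot Matrix")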
library(ggplot2)
ggplot(df, aes(x = as.factor(Category), y = QuantVar)) +
  geom_violin(fill = "skyblue", width = 0.5) +
  labs(title = "Violin Plot of QuantVar by Category", x = "Category", y = "QuantVar")
pie(frequency_table, main = "Pie Chart of Categorical Variable", labels = c("Category 0", "Category 1"))
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
plot_ly(df, x = ~QuantVar, y = ~ResponseVar, z = ~Category, color = ~Category, type = "scatter3d", mode = "markers")
Now, let’s proceed with testing our linear regression model using the prepared data.
# Fit linear regression model
model <- lm(ResponseVar ~ Category + QuantVar, data = df)
# Display regression results
summary(model)
##
## Call:
## lm(formula = ResponseVar ~ Category + QuantVar, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4635 -0.6559 -0.1982 0.5091 3.0627
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.09711 0.12396 0.783 0.435
## Category -0.19062 0.18904 -1.008 0.316
## QuantVar -0.08419 0.09750 -0.863 0.390
##
## Residual standard error: 0.9359 on 97 degrees of freedom
## Multiple R-squared: 0.01784, Adjusted R-squared: -0.002411
## F-statistic: 0.881 on 2 and 97 DF, p-value: 0.4177
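Before interpreting the coefficients, it is worth a quick look at the usual model checks; a minimal sketch using base R, nothing here is specific to this dataset:
# Standard lm() diagnostics: residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))
# 95% confidence intervals for the coefficients
confint(model)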
The frequency table and the near-zero mean confirm that the categorical coding and the centering worked as intended. As for the regression itself, neither Category (b = -0.19, p = 0.316) nor QuantVar (b = -0.08, p = 0.390) is significantly associated with ResponseVar, and the model as a whole explains essentially none of the variance (R² = 0.018, F(2, 97) = 0.88, p = 0.418). That is exactly what we should expect here, since all three variables were generated independently at random; with real data, this is where you would interpret the coefficients and discuss any significant findings or insights.
In this analysis, we explored the association between our explanatory variables and the response variable using linear regression, and the visualizations gave additional insight into the data's distribution and relationships. Careful preparation, recoding the categorical variable and centering the quantitative one, keeps the coefficients easy to interpret, and further checks and model refinements may be needed for a more comprehensive understanding.
Thank you for reading!
Feel free to customize the charts and visualizations based on your preferences and data characteristics.