Multiple Linear Regression

# I am performing multiple linear regression to analyze the relationship between reaction yield
# and three chemical additives: additive_A, additive_B, and additive_C. Fitting all three in one
# model lets me estimate the effect of each additive on yield while holding the other two constant.
# The model is purely additive (it contains no interaction terms), so it does not test whether the
# additives interact with one another in the reaction.
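# For reference, the model fitted below can be written in standard notation. A_i, B_i, and C_i are
# the concentrations of the three additives for observation i. The normal error term matches how
# the data are simulated later in this document. The closed-form least-squares solution assumes the
# design matrix X (a column of ones plus columns for A, B, and C) has full column rank.

```latex
\text{yield}_i = \beta_0 + \beta_1 A_i + \beta_2 B_i + \beta_3 C_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad i = 1, \ldots, n

\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n}
    \left(\text{yield}_i - \beta_0 - \beta_1 A_i - \beta_2 B_i - \beta_3 C_i\right)^2
  = (X^\top X)^{-1} X^\top \mathbf{y}
```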

# In this analysis, I interpret three scatterplots that explore the relationship between reaction
# yield (%) and each additive (A, B, and C) separately. Each plot shows the observed data points along
# with a blue line from a simple (one-predictor) regression of yield on that additive. These lines
# are marginal fits, not the partial effects estimated by the multiple regression model.

# Starting with the first plot, "Additive A vs Reaction Yield," I observe that the orange data points
# are scattered around the blue regression line with a noticeable upward trend: as the concentration
# of Additive A increases, reaction yield tends to increase as well. The regression line captures this
# positive linear relationship, which suggests that Additive A has a beneficial effect on yield. The
# points still scatter widely around the line, which is expected because Additive B and random error
# also drive yield. A scatterplot alone cannot establish statistical significance; that comes from the
# model summary, where additive_A has t = 7.60 and p = 1.99e-11.

# In the second plot, "Additive B vs Reaction Yield," I notice a similar upward trend. The green data
# points indicate that higher Additive B concentrations also correspond to higher reaction yield, and
# the regression line shows a clear positive relationship. The points appear somewhat more dispersed
# than in the first plot, which hints that Additive B's effect is estimated less precisely than
# Additive A's. The model summary is consistent with this: additive_B has the larger coefficient per
# mg/L (0.41 versus 0.33) but a larger standard error (0.062 versus 0.043) and a smaller t value
# (6.56 versus 7.60).

# Finally, the third plot, "Additive C vs Reaction Yield," looks quite different. The purple data points
# show no clear trend, and the blue regression line is almost flat. In the model summary, the
# additive_C coefficient (0.068) is not statistically distinguishable from zero (p = 0.443). This does
# not prove that Additive C has no effect. The data were simulated with a true coefficient of 0.1 for
# Additive C, an effect too small to detect reliably with 100 observations and this much noise. With
# real data, other explanations would also be worth considering: Additive C may be chemically
# irrelevant to this reaction, or it may have a threshold effect outside the tested concentration range.
# Either way, this dataset provides no evidence that Additive C influences reaction yield.
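# To see why a true coefficient of 0.1 can go undetected, here is a minimal Monte Carlo sketch of
# statistical power. It assumes the same generating model as this document's simulation. The number
# of replications (n_reps), the seed, and the 5% significance level are arbitrary choices for
# illustration. A rough calculation gives a standard error for beta_3 of about 5 / (5 * sqrt(100)) = 0.1,
# so the expected t statistic is near 1. That suggests the estimated power should come out well below
# the conventional 80% target.

```r
# Sketch: Monte Carlo estimate of the power to detect beta_3 = 0.1 with n = 100.
# The generating model mirrors this document's simulation; n_reps and the seed are arbitrary.
set.seed(1)
n <- 100
n_reps <- 500

p_values_C <- replicate(n_reps, {
  A_sim <- rnorm(n, mean = 50, sd = 10)
  B_sim <- rnorm(n, mean = 30, sd = 8)
  C_sim <- rnorm(n, mean = 20, sd = 5)
  y_sim <- 40 + 0.3 * A_sim + 0.4 * B_sim + 0.1 * C_sim + rnorm(n, mean = 0, sd = 5)
  fit <- lm(y_sim ~ A_sim + B_sim + C_sim)
  # Two-sided p-value of the t test for the C coefficient
  summary(fit)$coefficients["C_sim", "Pr(>|t|)"]
})

# Share of simulated datasets in which the C coefficient is significant at the 5% level
power_C <- mean(p_values_C < 0.05)
cat("Estimated power to detect beta_3 = 0.1:", power_C, "\n")
```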

# From these visualizations and the model summary, I conclude that Additives A and B both contribute to
# higher reaction yield. Additive A has the most precisely estimated effect, while Additive B has the
# larger effect per mg/L. Additive C shows no detectable effect on yield in these data, so I would
# deprioritize it in this context. Combining these visual trends with further statistical analysis
# should give more concrete guidance on how to optimize reaction yield with these additives.
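# One piece of "further statistical analysis" that bears directly on deprioritizing Additive C is a
# partial F test. It compares the full model against a reduced model without additive_C. This is a
# sketch that re-creates the document's simulated data so it runs on its own. When only one term is
# dropped, the F statistic equals the square of that term's t statistic. Given t = 0.770 in the summary
# output, F should be about 0.59, with the same p-value of 0.443.

```r
# Re-create the simulated data from this document so the sketch is self-contained
set.seed(42)
n <- 100
additive_A <- rnorm(n, mean = 50, sd = 10)
additive_B <- rnorm(n, mean = 30, sd = 8)
additive_C <- rnorm(n, mean = 20, sd = 5)
error <- rnorm(n, mean = 0, sd = 5)
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)

# Full model versus a reduced model that omits additive_C
full_fit    <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
reduced_fit <- lm(yield ~ additive_A + additive_B, data = chemistry_data)

# Partial F test: does adding additive_C reduce the residual sum of squares beyond chance?
comparison <- anova(reduced_fit, full_fit)
print(comparison)

# With a single dropped term, the partial F statistic equals the squared t statistic
F_C <- comparison$F[2]
t_C <- summary(full_fit)$coefficients["additive_C", "t value"]
cat("Partial F:", F_C, " squared t:", t_C^2, "\n")
```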


# Simulate a dataset to explore multiple linear regression concepts
set.seed(42)  # Setting a seed for reproducibility
n <- 100  # Number of observations

# Generating random concentrations for each additive (predictors)
additive_A <- rnorm(n, mean = 50, sd = 10)  # Concentration of additive A (in mg/L)
additive_B <- rnorm(n, mean = 30, sd = 8)  # Concentration of additive B (in mg/L)
additive_C <- rnorm(n, mean = 20, sd = 5)  # Concentration of additive C (in mg/L)

# Define the true relationship: yield = β0 + β1 * A + β2 * B + β3 * C + error,
# with β0 = 40, β1 = 0.3, β2 = 0.4, β3 = 0.1, and error ~ N(0, 5^2)
error <- rnorm(n, mean = 0, sd = 5)  # Random error term
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error  # Response variable (yield in percentage)

# Combine the data into a data frame
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)

# Fit a multiple linear regression model
mlr_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)

# Display the summary of the regression model
summary(mlr_fit)

## 
## Call:
## lm(formula = yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9720 -2.9335 -0.5192  3.0941 11.6400 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.18034    3.41691  11.467  < 2e-16 ***
## additive_A   0.32888    0.04328   7.598 1.99e-11 ***
## additive_B   0.40552    0.06181   6.560 2.71e-09 ***
## additive_C   0.06839    0.08882   0.770    0.443    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.434 on 96 degrees of freedom
## Multiple R-squared:  0.5228, Adjusted R-squared:  0.5079 
## F-statistic: 35.06 on 3 and 96 DF,  p-value: 2.173e-15
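# The p-values above can be complemented with 95% confidence intervals from confint(). This sketch
# re-creates the simulated data so it runs on its own. It also checks the standard duality: a 95%
# interval excludes 0 exactly when the two-sided p-value is below 0.05. From the additive_C estimate
# (0.068) and standard error (0.089) reported above, its interval should span roughly -0.11 to 0.24.
# That range covers both 0 and the simulated true value of 0.1.

```r
# Re-create the simulated data and model from this document
set.seed(42)
n <- 100
additive_A <- rnorm(n, mean = 50, sd = 10)
additive_B <- rnorm(n, mean = 30, sd = 8)
additive_C <- rnorm(n, mean = 20, sd = 5)
error <- rnorm(n, mean = 0, sd = 5)
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
mlr_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)

# 95% confidence intervals for every coefficient
ci <- confint(mlr_fit, level = 0.95)
print(round(ci, 3))

# An interval excludes 0 exactly when the two-sided p-value is below 0.05
p_vals <- summary(mlr_fit)$coefficients[, "Pr(>|t|)"]
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
print(data.frame(p_value = signif(p_vals, 3), excludes_zero))
```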

# Extract coefficients and interpret them
# (stored as `coefs` to avoid masking base R's coefficients() function)
coefs <- coef(mlr_fit)
cat("Intercept (β0):", coefs["(Intercept)"], "\n")

## Intercept (β0): 39.18034

cat("Additive A Coefficient (β1):", coefs["additive_A"], "\n")

## Additive A Coefficient (β1): 0.328877

cat("Additive B Coefficient (β2):", coefs["additive_B"], "\n")

## Additive B Coefficient (β2): 0.4055183

cat("Additive C Coefficient (β3):", coefs["additive_C"], "\n")

## Additive C Coefficient (β3): 0.06838914

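# Visualize the relationships
# Each line below is a simple regression of yield on one additive, so its slope is a marginal
# (unadjusted) slope rather than the multiple regression coefficient.
# Additive A vs Yield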
plot(chemistry_data$additive_A, chemistry_data$yield, 
     main = "Additive A vs Reaction Yield", 
     xlab = "Additive A Concentration (mg/L)", ylab = "Reaction Yield (%)", 
     col = "darkorange", pch = 16)
abline(lm(yield ~ additive_A, data = chemistry_data), col = "blue", lwd = 2)

# Additive B vs Yield
plot(chemistry_data$additive_B, chemistry_data$yield, 
     main = "Additive B vs Reaction Yield", 
     xlab = "Additive B Concentration (mg/L)", ylab = "Reaction Yield (%)", 
     col = "darkgreen", pch = 16)
abline(lm(yield ~ additive_B, data = chemistry_data), col = "blue", lwd = 2)

# Additive C vs Yield
plot(chemistry_data$additive_C, chemistry_data$yield, 
     main = "Additive C vs Reaction Yield", 
     xlab = "Additive C Concentration (mg/L)", ylab = "Reaction Yield (%)", 
     col = "purple", pch = 16)
abline(lm(yield ~ additive_C, data = chemistry_data), col = "blue", lwd = 2)
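# The scatterplots above show marginal slopes. An added-variable (partial regression) plot instead
# shows the slope the multiple regression actually estimates. The method residualizes both yield and
# additive_A on the other two additives, then regresses one set of residuals on the other. By the
# Frisch-Waugh-Lovell theorem, that slope equals the additive_A coefficient from mlr_fit exactly. This
# sketch re-creates the simulated data so it runs on its own. The car package's avPlots() function
# draws such plots for every predictor, if that package is available.

```r
# Re-create the simulated data and model from this document
set.seed(42)
n <- 100
additive_A <- rnorm(n, mean = 50, sd = 10)
additive_B <- rnorm(n, mean = 30, sd = 8)
additive_C <- rnorm(n, mean = 20, sd = 5)
error <- rnorm(n, mean = 0, sd = 5)
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
mlr_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)

# Remove the parts of yield and additive_A that are explained by additives B and C
res_yield <- resid(lm(yield ~ additive_B + additive_C, data = chemistry_data))
res_A     <- resid(lm(additive_A ~ additive_B + additive_C, data = chemistry_data))
av_fit <- lm(res_yield ~ res_A)

# Added-variable plot for additive A
plot(res_A, res_yield,
     main = "Added-variable plot: Additive A",
     xlab = "Additive A | B, C (residuals)", ylab = "Yield | B, C (residuals)",
     col = "darkorange", pch = 16)
abline(av_fit, col = "blue", lwd = 2)

cat("Added-variable slope:", coef(av_fit)[["res_A"]], "\n")
cat("Multiple regression coefficient:", coef(mlr_fit)[["additive_A"]], "\n")
```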

# Interpret the results
# The regression model includes three predictors: the concentrations of additives A, B, and C.
# Each coefficient is the average change in reaction yield associated with a 1 mg/L increase in that
# additive while the other two are held constant. Because yield is measured in percent, this change is
# in percentage points. For example, the estimate β1 = 0.33 means that increasing additive A by 1 mg/L
# is associated with an average increase of about 0.33 percentage points in reaction yield.
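# The "holding the others constant" interpretation can be checked with predict(). Two hypothetical
# formulations differ only by 1 mg/L of additive A. The baseline concentrations (50, 30, and 20 mg/L,
# the simulation means) are an arbitrary choice for illustration. Because the model is linear with no
# interactions, the difference in predicted yield reproduces the additive_A coefficient (0.329 in the
# output above) at any baseline.

```r
# Re-create the simulated data and model from this document
set.seed(42)
n <- 100
additive_A <- rnorm(n, mean = 50, sd = 10)
additive_B <- rnorm(n, mean = 30, sd = 8)
additive_C <- rnorm(n, mean = 20, sd = 5)
error <- rnorm(n, mean = 0, sd = 5)
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
mlr_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)

# Hypothetical baseline formulation and the same formulation with 1 mg/L more additive A
baseline   <- data.frame(additive_A = 50, additive_B = 30, additive_C = 20)
plus_one_A <- transform(baseline, additive_A = additive_A + 1)

pred <- predict(mlr_fit, newdata = rbind(baseline, plus_one_A))
cat("Predicted yield at baseline (%):", pred[[1]], "\n")
cat("Change from +1 mg/L of additive A (percentage points):", pred[[2]] - pred[[1]], "\n")
```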

# Use diagnostic metrics for assessing model fit
cat("Residual Standard Error (RSE):", summary(mlr_fit)$sigma, "\n")

## Residual Standard Error (RSE): 4.433626

cat("R-squared (R^2):", summary(mlr_fit)$r.squared, "\n")

## R-squared (R^2): 0.5228029

cat("Adjusted R-squared:", summary(mlr_fit)$adj.r.squared, "\n")

## Adjusted R-squared: 0.5078905

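# I interpret these metrics to assess the model fit:
# - RSE (4.43) estimates the standard deviation of the error term: observed yields typically differ
#   from the model's predictions by about 4.4 percentage points (the simulation used an error SD of 5).
# - R-squared (0.523) is the proportion of variability in reaction yield explained by the predictors,
#   here about 52%.
# - Adjusted R-squared (0.508) penalizes R-squared for the number of predictors. It increases only when
#   an added predictor's t statistic exceeds 1 in absolute value, which makes it better suited than
#   R-squared for comparing models with different numbers of predictors.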
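# To make these definitions concrete, here is a sketch that computes all three metrics directly from
# the residuals and checks them against summary(). It re-creates the simulated data so it runs on its
# own. Here p is the number of predictors (3), so the residual degrees of freedom are n - p - 1 = 96,
# matching the summary output.

```r
# Re-create the simulated data and model from this document
set.seed(42)
n <- 100
additive_A <- rnorm(n, mean = 50, sd = 10)
additive_B <- rnorm(n, mean = 30, sd = 8)
additive_C <- rnorm(n, mean = 20, sd = 5)
error <- rnorm(n, mean = 0, sd = 5)
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
mlr_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)

y   <- chemistry_data$yield
res <- resid(mlr_fit)
p   <- length(coef(mlr_fit)) - 1          # number of predictors, excluding the intercept

rss <- sum(res^2)                         # residual sum of squares
tss <- sum((y - mean(y))^2)               # total sum of squares

rse    <- sqrt(rss / (n - p - 1))         # residual standard error
r2     <- 1 - rss / tss                   # R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)  # adjusted R-squared

cat("RSE:", rse, " R^2:", r2, " adjusted R^2:", adj_r2, "\n")
```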

# Conclusion
# I conclude that multiple linear regression captures the combined effects of additives A, B, and C on
# reaction yield. Fitting all predictors in one model estimates each additive's effect while adjusting
# for the others, which matters whenever the additives are correlated. In this simulation the additives
# were generated independently, so the adjusted and unadjusted slopes should be similar. With real
# formulation data, where concentrations may be varied together, the adjustment can change the
# conclusions substantially. Because the model has no interaction terms, it describes additive effects
# only; testing whether the additives interact would require terms such as additive_A:additive_B.
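# The difference between analyzing each additive separately and fitting them jointly can be made
# explicit. This sketch compares the simple (marginal) slopes with the multiple regression
# coefficients and prints the predictor correlations. It then verifies the omitted-variable identity
# for additive A: the simple slope equals the multiple regression coefficient for A, plus each other
# coefficient times the slope from regressing that additive on A. This identity holds exactly for
# least-squares fits. Since the additives were simulated independently, their sample correlations
# should be small, so the two sets of slopes should be close.

```r
# Re-create the simulated data and model from this document
set.seed(42)
n <- 100
additive_A <- rnorm(n, mean = 50, sd = 10)
additive_B <- rnorm(n, mean = 30, sd = 8)
additive_C <- rnorm(n, mean = 20, sd = 5)
error <- rnorm(n, mean = 0, sd = 5)
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
mlr_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)

predictors <- c("additive_A", "additive_B", "additive_C")

# Slope from a separate one-predictor regression for each additive
simple_slopes <- sapply(predictors, function(v) {
  coef(lm(reformulate(v, response = "yield"), data = chemistry_data))[[v]]
})
multiple_slopes <- coef(mlr_fit)[predictors]
print(round(cbind(simple = simple_slopes, multiple = multiple_slopes), 4))
print(round(cor(chemistry_data[, predictors]), 3))

# Omitted-variable identity for additive A
delta_B <- coef(lm(additive_B ~ additive_A, data = chemistry_data))[["additive_A"]]
delta_C <- coef(lm(additive_C ~ additive_A, data = chemistry_data))[["additive_A"]]
reconstructed_A <- multiple_slopes[["additive_A"]] +
  multiple_slopes[["additive_B"]] * delta_B +
  multiple_slopes[["additive_C"]] * delta_C
cat("Simple slope for A:", simple_slopes[["additive_A"]],
    " reconstructed:", reconstructed_A, "\n")
```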

# I combine all plots into a single view for better comparison
# Set up a 1x3 grid layout using par(), saving the previous settings so they can be restored
old_par <- par(mfrow = c(1, 3), mar = c(4, 4, 2, 1))  # 1 row, 3 columns, adjusted margins

# Plot 1: Additive A vs Yield
plot(chemistry_data$additive_A, chemistry_data$yield,
     main = "Additive A vs Reaction Yield",
     xlab = "Additive A (mg/L)", ylab = "Yield (%)",
     col = "darkorange", pch = 16, cex = 0.8)
abline(lm(yield ~ additive_A, data = chemistry_data), col = "blue", lwd = 2)

# Plot 2: Additive B vs Yield
plot(chemistry_data$additive_B, chemistry_data$yield,
     main = "Additive B vs Reaction Yield",
     xlab = "Additive B (mg/L)", ylab = "",  # y-axis label shown only on the first panel
     col = "darkgreen", pch = 16, cex = 0.8)
abline(lm(yield ~ additive_B, data = chemistry_data), col = "blue", lwd = 2)

# Plot 3: Additive C vs Yield
plot(chemistry_data$additive_C, chemistry_data$yield,
     main = "Additive C vs Reaction Yield",
     xlab = "Additive C (mg/L)", ylab = "",
     col = "purple", pch = 16, cex = 0.8)
abline(lm(yield ~ additive_C, data = chemistry_data), col = "blue", lwd = 2)

# Restore the previous graphical parameters (both layout and margins)
par(old_par)


Avery Holloman

2024-11-08