# I am performing multiple linear regression to analyze the relationship between reaction yield
# and three chemical additives: additive_A, additive_B, and additive_C. Fitting all three in one
# model estimates the effect of each additive on yield while holding the other two constant. The
# model is additive (it has no interaction terms), so it describes the combined effects of the
# additives but not how they interact with one another in the reaction. The data are simulated.
# In this analysis, I interpret three scatterplots of reaction yield (%) against each additive
# (A, B, and C). Each plot shows the observed data points and a blue line from a simple regression
# of yield on that additive alone. These lines therefore show marginal trends, not the adjusted
# coefficients of the multiple regression model.
# Starting with the first plot, "Additive A vs Reaction Yield," I observe that the orange data points
# are scattered around the blue regression line with a noticeable upward trend: as the concentration
# of Additive A increases, the reaction yield tends to increase as well. The multiple regression
# confirms a positive association (estimate 0.329 percentage points per mg/L, p < 0.001). The points
# are not tightly clustered, however. Much of the scatter comes from the other additives and from
# random error, and the full model explains only about half of the variation in yield (R² ≈ 0.52).
# In the second plot, "Additive B vs Reaction Yield," I notice a similar upward trend. The green data
# points show that higher Additive B concentrations also correspond to higher reaction yield, and the
# multiple regression estimate is positive and clearly significant (0.406 percentage points per mg/L,
# p < 0.001). The points look slightly more dispersed than in the first plot. However, scatter around
# a one-predictor line reflects the omitted additives and random error, not variability in Additive B's
# own effect. In the simulation, that effect is a fixed 0.4 per mg/L.
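# The scatter around each marginal line can be quantified instead of judged by eye. This is a
# minimal sketch, and the names marginal_A and marginal_B are illustrative. It compares the residual
# standard deviation and R-squared of the one-predictor fits for Additives A and B. The guard
# re-creates the simulated data defined later in this script, so the snippet also runs on its own.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
marginal_A <- lm(yield ~ additive_A, data = chemistry_data)
marginal_B <- lm(yield ~ additive_B, data = chemistry_data)
# Residual standard deviation around each one-predictor line (percentage points)
print(c(A = sigma(marginal_A), B = sigma(marginal_B)))
# Share of the variance in yield that each additive explains on its own
print(c(A = summary(marginal_A)$r.squared, B = summary(marginal_B)$r.squared))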
# Finally, the third plot, "Additive C vs Reaction Yield," stands in contrast. The purple data points
# show no clear trend, and the blue regression line is almost flat. In the multiple regression, the
# Additive C coefficient is small and not statistically significant (0.068, p = 0.443). The data were
# simulated with a true coefficient of 0.1, so this is a small effect that 100 observations cannot
# distinguish from zero. It is not evidence that Additive C has no effect. With real experimental
# data, plausible explanations could include Additive C being chemically irrelevant to the reaction,
# or a threshold effect outside the tested concentration range. I therefore conclude only that
# Additive C has no detectable influence on reaction yield in this dataset.
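# The significance and threshold questions can both be posed as nested-model F-tests. This is a
# minimal sketch that assumes a hinge (piecewise-linear) term with a hypothetical knot at the median
# of Additive C. Row 2 of the anova() table tests whether a linear C term improves on A and B alone.
# Row 3 tests whether letting C's slope change at the knot improves the fit further. Note that
# anova() with several models uses the residual variance of the largest model for every F-test.
# Because the knot is hand-picked, this is an exploratory check only.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
knot <- median(chemistry_data$additive_C)  # hypothetical threshold location (an assumption)
no_C <- lm(yield ~ additive_A + additive_B, data = chemistry_data)
linear_C <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
# pmax(additive_C - knot, 0) is zero below the knot and rises linearly above it
hinge_C <- lm(yield ~ additive_A + additive_B + additive_C + pmax(additive_C - knot, 0),
              data = chemistry_data)
nested_tests <- anova(no_C, linear_C, hinge_C)
print(nested_tests)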
# From these plots and the regression estimates, I conclude that Additives A and B both contribute
# positively to reaction yield. Per mg/L, Additive B has the larger estimated coefficient (0.406 vs
# 0.329), while Additive A's estimate is the more precise of the two (t = 7.6 vs 6.6). Additive C
# shows no detectable effect, so I would deprioritize its use in this context. Combining these visual
# trends with further statistical analysis (confidence intervals and residual diagnostics) should
# support more concrete conclusions about how to optimize reaction yields using these additives.
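# Which additive is "strongest" depends on the scale used for comparison. One common convention,
# though not the only way to rank predictors, is the standardized coefficient. This minimal sketch
# refits the model on z-scored variables, so each coefficient is the change in yield (in standard
# deviations) per one-SD increase in an additive, with the others held constant.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
# scale() centers each column and divides it by its standard deviation
standardized_data <- as.data.frame(scale(chemistry_data))
standardized_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = standardized_data)
# Standardized slopes (the intercept is dropped; it is zero up to rounding)
print(round(coef(standardized_fit)[-1], 3))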
# Simulate a dataset to explore multiple linear regression concepts
set.seed(42) # Setting a seed for reproducibility
n <- 100 # Number of observations
# Generating random concentrations for each additive (predictors)
additive_A <- rnorm(n, mean = 50, sd = 10) # Concentration of additive A (in mg/L)
additive_B <- rnorm(n, mean = 30, sd = 8) # Concentration of additive B (in mg/L)
additive_C <- rnorm(n, mean = 20, sd = 5) # Concentration of additive C (in mg/L)
# Define the true relationship: yield = β0 + β1 * A + β2 * B + β3 * C + error,
# with β0 = 40, β1 = 0.3, β2 = 0.4, β3 = 0.1, and error ~ Normal(mean 0, sd 5)
error <- rnorm(n, mean = 0, sd = 5) # Random error term
yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error # Response variable (yield in percentage)
# Combine the data into a data frame
chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
# Fit a multiple linear regression model
mlr_fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
# Display the summary of the regression model
summary(mlr_fit)
##
## Call:
## lm(formula = yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.9720 -2.9335 -0.5192  3.0941 11.6400
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.18034    3.41691  11.467  < 2e-16 ***
## additive_A   0.32888    0.04328   7.598 1.99e-11 ***
## additive_B   0.40552    0.06181   6.560 2.71e-09 ***
## additive_C   0.06839    0.08882   0.770    0.443
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.434 on 96 degrees of freedom
## Multiple R-squared: 0.5228, Adjusted R-squared: 0.5079
## F-statistic: 35.06 on 3 and 96 DF, p-value: 2.173e-15
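# The Estimate and Std. Error columns above can be reproduced from the ordinary least squares
# formulas: beta_hat = (X'X)^{-1} X'y and SE = sqrt(diag(sigma^2 (X'X)^{-1})). This is a minimal
# sketch for illustration only. lm() itself uses a QR decomposition, which is numerically more
# stable than explicitly inverting X'X.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
# Design matrix (with an intercept column) and response vector
X <- model.matrix(~ additive_A + additive_B + additive_C, data = chemistry_data)
y <- chemistry_data$yield
XtX_inv <- solve(crossprod(X))                  # (X'X)^{-1}
beta_hat <- drop(XtX_inv %*% crossprod(X, y))   # OLS coefficient estimates
names(beta_hat) <- colnames(X)
residuals_hat <- y - drop(X %*% beta_hat)
sigma2_hat <- sum(residuals_hat^2) / (nrow(X) - ncol(X))  # n - p - 1 = 96 degrees of freedom
se_hat <- sqrt(diag(sigma2_hat * XtX_inv))      # standard errors of the estimates
print(cbind(Estimate = beta_hat, `Std. Error` = se_hat, `t value` = beta_hat / se_hat))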
# Extract the estimated coefficients by name and print them
coefs <- coef(mlr_fit)
cat("Intercept (β0):", coefs["(Intercept)"], "\n")
## Intercept (β0): 39.18034
cat("Additive A Coefficient (β1):", coefs["additive_A"], "\n")
## Additive A Coefficient (β1): 0.328877
cat("Additive B Coefficient (β2):", coefs["additive_B"], "\n")
## Additive B Coefficient (β2): 0.4055183
cat("Additive C Coefficient (β3):", coefs["additive_C"], "\n")
## Additive C Coefficient (β3): 0.06838914
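# Because the data are simulated, the estimates can be checked against the true parameters. This
# minimal sketch uses confint(), which returns 95% t-based intervals by default. The true_beta
# vector only restates the values used in the simulation code. The covers_truth and excludes_zero
# columns show whether each interval contains the true value and whether it rules out zero.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
# Parameter values used to simulate the data
true_beta <- c("(Intercept)" = 40, additive_A = 0.3, additive_B = 0.4, additive_C = 0.1)
ci <- confint(fit, level = 0.95)
comparison <- data.frame(true = true_beta, estimate = coef(fit),
                         lower = ci[, 1], upper = ci[, 2])
comparison$covers_truth <- comparison$lower <= comparison$true & comparison$true <= comparison$upper
comparison$excludes_zero <- comparison$lower > 0 | comparison$upper < 0
print(comparison)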
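# Visualize each additive's marginal relationship with yield. Each blue line is a simple regression
# of yield on one additive alone, so its slope is not the multiple-regression coefficient.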
# Additive A vs Yield
plot(chemistry_data$additive_A, chemistry_data$yield,
main = "Additive A vs Reaction Yield",
xlab = "Additive A Concentration (mg/L)", ylab = "Reaction Yield (%)",
col = "darkorange", pch = 16)
abline(lm(yield ~ additive_A, data = chemistry_data), col = "blue", lwd = 2)

# Additive B vs Yield
plot(chemistry_data$additive_B, chemistry_data$yield,
main = "Additive B vs Reaction Yield",
xlab = "Additive B Concentration (mg/L)", ylab = "Reaction Yield (%)",
col = "darkgreen", pch = 16)
abline(lm(yield ~ additive_B, data = chemistry_data), col = "blue", lwd = 2)

# Additive C vs Yield
plot(chemistry_data$additive_C, chemistry_data$yield,
main = "Additive C vs Reaction Yield",
xlab = "Additive C Concentration (mg/L)", ylab = "Reaction Yield (%)",
col = "purple", pch = 16)
abline(lm(yield ~ additive_C, data = chemistry_data), col = "blue", lwd = 2)
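# The lines above are marginal fits. An added-variable (partial regression) plot shows the adjusted
# relationship instead. First regress both yield and Additive A on the other two additives, then
# plot one set of residuals against the other. By the Frisch-Waugh-Lovell theorem, the slope of that
# plot equals the multiple-regression coefficient for Additive A. This is a minimal base-graphics
# sketch; the car package's avPlots() provides a packaged version.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
# Residuals of yield and of additive_A after removing the linear effects of B and C
yield_resid <- resid(lm(yield ~ additive_B + additive_C, data = chemistry_data))
A_resid <- resid(lm(additive_A ~ additive_B + additive_C, data = chemistry_data))
av_fit <- lm(yield_resid ~ A_resid)
plot(A_resid, yield_resid,
     main = "Added-variable plot: Additive A",
     xlab = "Additive A | B, C (residuals)", ylab = "Yield | B, C (residuals)",
     col = "darkorange", pch = 16)
abline(av_fit, col = "blue", lwd = 2)
# Marginal slope vs. adjusted slope for Additive A
print(c(marginal = unname(coef(lm(yield ~ additive_A, data = chemistry_data))[2]),
        added_variable = unname(coef(av_fit)[2]),
        multiple_regression = unname(coef(lm(yield ~ additive_A + additive_B + additive_C,
                                             data = chemistry_data))["additive_A"])))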

# Interpret the results
# The model includes three predictors: the concentrations of additives A, B, and C. Each coefficient
# is the estimated average change in reaction yield, in percentage points, for a 1 mg/L increase in
# that additive while the other two additives are held constant. For example, the estimate for
# additive A is about 0.33, so increasing additive A by 1 mg/L is associated with an average increase
# of about 0.33 percentage points in yield (the simulation's true value is 0.3). Additive B's estimate
# is about 0.41 per mg/L. Additive C's estimate (0.07) is not statistically distinguishable from zero.
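# The "holding the other additives constant" interpretation can be checked numerically. This
# minimal sketch predicts yield for two hypothetical conditions that differ only by 1 mg/L of
# Additive A, with B and C fixed at their sample means. The difference between the two predictions
# equals the Additive A coefficient. The conditions are illustrative, not experimental settings.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
# Two hypothetical conditions: A = 50 vs. A = 51 mg/L, with B and C held at their means
new_conditions <- data.frame(additive_A = c(50, 51),
                             additive_B = mean(chemistry_data$additive_B),
                             additive_C = mean(chemistry_data$additive_C))
predicted <- predict(fit, newdata = new_conditions)
print(diff(predicted))             # change in yield (percentage points) for +1 mg/L of A
print(coef(fit)["additive_A"])     # the same number, read directly from the model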
# Report summary measures of model fit
cat("Residual Standard Error (RSE):", summary(mlr_fit)$sigma, "\n")
## Residual Standard Error (RSE): 4.433626
cat("R-squared (R^2):", summary(mlr_fit)$r.squared, "\n")
## R-squared (R^2): 0.5228029
cat("Adjusted R-squared:", summary(mlr_fit)$adj.r.squared, "\n")
## Adjusted R-squared: 0.5078905
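# I interpret these metrics as follows:
# - RSE (about 4.4 percentage points) estimates the standard deviation of the residuals, i.e., the
#   typical size of the gap between observed and predicted reaction yield.
# - R-squared (about 0.52) is the proportion of variance in reaction yield explained by the
#   predictors. About half of the variation remains unexplained; the simulation adds noise with SD 5.
# - Adjusted R-squared (about 0.51) penalizes R-squared for the number of predictors. Unlike
#   R-squared, it does not automatically increase when another predictor is added.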
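# All three metrics follow directly from the residuals. This minimal sketch reproduces the values
# from summary() with n = 100 observations and p = 3 predictors, using the formulas
# RSE = sqrt(RSS / (n - p - 1)), R^2 = 1 - RSS / TSS, and
# adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
fit <- lm(yield ~ additive_A + additive_B + additive_C, data = chemistry_data)
rss <- sum(resid(fit)^2)                                            # residual sum of squares
tss <- sum((chemistry_data$yield - mean(chemistry_data$yield))^2)   # total sum of squares
n_obs <- nrow(chemistry_data)
p <- 3                                                              # number of predictors
rse <- sqrt(rss / (n_obs - p - 1))
r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)
print(c(RSE = rse, R2 = r2, adj_R2 = adj_r2))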
# Conclusion
# Multiple linear regression estimates the separate contributions of additive A, additive B, and
# additive C to reaction yield within a single model. Each coefficient is adjusted for the other
# additives, which matters when the additives are correlated. In this simulation they were generated
# independently, so the adjusted and marginal slopes should be similar. The model recovers clear
# positive effects for A and B but cannot detect C's small effect. Because it contains no interaction
# terms, it says nothing about how the additives interact chemically.
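# Whether the adjusted and marginal views differ depends on how correlated the additives are. This
# minimal sketch prints the predictor correlation matrix and computes variance inflation factors by
# hand as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing additive j on the other two.
# Values near 1 indicate little collinearity. The car package's vif() computes the same quantity.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
predictors <- c("additive_A", "additive_B", "additive_C")
# Pairwise correlations among the additives (expected to be small, since they were drawn independently)
print(round(cor(chemistry_data[, predictors]), 3))
# Regress each additive on the other two and convert that R-squared into a VIF
vif_values <- sapply(predictors, function(j) {
  r2_j <- summary(lm(reformulate(setdiff(predictors, j), response = j),
                     data = chemistry_data))$r.squared
  1 / (1 - r2_j)
})
print(round(vif_values, 3))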
# I combine all plots into a single view for better comparison
# Set up a 1x3 grid layout using par()
par(mfrow = c(1, 3), mar = c(4, 4, 2, 1)) # 1 row, 3 columns, adjusted margins
# Plot 1: Additive A vs Yield
plot(chemistry_data$additive_A, chemistry_data$yield,
main = "Additive A vs Reaction Yield",
xlab = "Additive A (mg/L)", ylab = "Yield (%)",
col = "darkorange", pch = 16, cex = 0.8)
abline(lm(yield ~ additive_A, data = chemistry_data), col = "blue", lwd = 2)
# Plot 2: Additive B vs Yield
plot(chemistry_data$additive_B, chemistry_data$yield,
main = "Additive B vs Reaction Yield",
xlab = "Additive B (mg/L)", ylab = "", # Omit the repeated y-axis label to save space
col = "darkgreen", pch = 16, cex = 0.8)
abline(lm(yield ~ additive_B, data = chemistry_data), col = "blue", lwd = 2)
# Plot 3: Additive C vs Yield
plot(chemistry_data$additive_C, chemistry_data$yield,
main = "Additive C vs Reaction Yield",
xlab = "Additive C (mg/L)", ylab = "",
col = "purple", pch = 16, cex = 0.8)
abline(lm(yield ~ additive_C, data = chemistry_data), col = "blue", lwd = 2)

# Reset the layout and margins to R's defaults (mfrow and mar were both changed above)
par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1)
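# The three panel blocks above repeat the same pattern, so a loop can replace them. This minimal
# sketch uses a hypothetical helper, plot_additive_panels(). It saves the graphical parameters
# returned by par() and restores them with on.exit(), so the layout is reset even if a plot call
# fails. Panel titles here are built from the column names rather than typed by hand.
# Re-create the simulated data (same seed and draw order) if this snippet is run on its own
if (!exists("chemistry_data")) {
  set.seed(42)
  n <- 100
  additive_A <- rnorm(n, mean = 50, sd = 10)
  additive_B <- rnorm(n, mean = 30, sd = 8)
  additive_C <- rnorm(n, mean = 20, sd = 5)
  error <- rnorm(n, mean = 0, sd = 5)
  yield <- 40 + 0.3 * additive_A + 0.4 * additive_B + 0.1 * additive_C + error
  chemistry_data <- data.frame(yield, additive_A, additive_B, additive_C)
}
# Hypothetical helper: one scatterplot with a marginal regression line per additive
plot_additive_panels <- function(data, additives, colors) {
  old_par <- par(mfrow = c(1, length(additives)), mar = c(4, 4, 2, 1))
  on.exit(par(old_par))  # restore the previous layout and margins when the function exits
  for (i in seq_along(additives)) {
    x <- data[[additives[i]]]
    plot(x, data$yield,
         main = paste(additives[i], "vs yield"),
         xlab = paste(additives[i], "(mg/L)"),
         ylab = if (i == 1) "Yield (%)" else "",  # y-axis label only on the first panel
         col = colors[i], pch = 16, cex = 0.8)
    abline(lm(data$yield ~ x), col = "blue", lwd = 2)
  }
  invisible(NULL)
}
plot_additive_panels(chemistry_data,
                     additives = c("additive_A", "additive_B", "additive_C"),
                     colors = c("darkorange", "darkgreen", "purple"))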