# I am performing multiple linear regression to study the relationship between pH
# and three chemical elements: Sodium (Na), Magnesium (Mg), and Calcium (Ca). In this
# updated analysis, I also visualize the regression plane in a three-dimensional setting.
# In this analysis, I am interpreting a 3D regression plot that visualizes the relationship between pH, sodium (Na) concentration, and magnesium (Mg) concentration. This plot helps me explore how the two
# independent variables, sodium and magnesium, collectively influence the pH levels.
# As I examine the plot, I see a regression plane that cuts through the data points, which are represented
# by blue dots. This plane represents my model's predicted pH values based on the combination of sodium
# and magnesium concentrations. I notice that the plane has an upward slope, which suggests that both
# sodium and magnesium concentrations positively contribute to increasing pH.
# The blue data points scattered around the regression plane tell me about the actual observed values.
# I observe that many points lie close to the regression plane, indicating that my model captures the
# relationship between these variables well. However, I also see some points that deviate from the plane,
# reminding me of the residuals—differences between the observed and predicted values. These deviations
# highlight the random variability in the data, which is expected in real-world datasets.
# As I study the dimensions of the plot, I find that sodium concentrations range from 20 to 80 mg/L,
# while magnesium concentrations range from 0 to 60 mg/L. The pH values span from approximately 7.5 to 11.
# This broad range gives me confidence that my model is based on diverse data and is not overly constrained
# to specific conditions.
# I also notice the gridlines on the regression plane, which help me interpret the interaction effects
# between sodium and magnesium concentrations. For example, I see that when sodium concentration is high,
# even a moderate increase in magnesium concentration causes the pH to rise significantly. This indicates
# a possible synergistic effect between sodium and magnesium in influencing pH.
# Reflecting on this analysis, I feel confident that the model provides meaningful insights into how sodium
# and magnesium interact to affect pH. The upward slope of the regression plane aligns with my expectation
# that these ions play a role in increasing pH levels. I believe this visualization strengthens my
# understanding of the system and offers a robust foundation for further statistical analysis.
# Additionally, the combination of 3D visualization and regression modeling allows me to better assess
# the predictive power and limitations of my model.
# Load necessary library for 3D visualization
library(scatterplot3d)
# Simulate a dataset representing chemical concentrations and pH
set.seed(42) # Setting a seed for reproducibility
n <- 100 # Number of observations
# Generate random concentrations for each element
sodium <- rnorm(n, mean = 50, sd = 10) # Sodium concentration (mg/L)
magnesium <- rnorm(n, mean = 30, sd = 8) # Magnesium concentration (mg/L)
calcium <- rnorm(n, mean = 20, sd = 5) # Calcium concentration (mg/L)
# Define the true relationship: pH = β0 + β1 * Na + β2 * Mg + β3 * Ca + error
error <- rnorm(n, mean = 0, sd = 0.5) # Random error term
pH <- 7.0 + 0.02 * sodium + 0.05 * magnesium - 0.01 * calcium + error # Response variable (pH)
# Combine the data into a data frame
chemistry_data <- data.frame(pH, sodium, magnesium, calcium)
# Fit a multiple linear regression model
mlr_fit <- lm(pH ~ sodium + magnesium + calcium, data = chemistry_data)
# Display the summary of the regression model
summary(mlr_fit)
##
## Call:
## lm(formula = pH ~ sodium + magnesium + calcium, data = chemistry_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.89720 -0.29335 -0.05192 0.30941 1.16400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.918034 0.341691 20.246 < 2e-16 ***
## sodium 0.022888 0.004328 5.288 7.77e-07 ***
## magnesium 0.050552 0.006181 8.178 1.19e-12 ***
## calcium -0.013161 0.008882 -1.482 0.142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4434 on 96 degrees of freedom
## Multiple R-squared: 0.514, Adjusted R-squared: 0.4988
## F-statistic: 33.84 on 3 and 96 DF, p-value: 5.194e-15
# Extract coefficients and interpret them
coefficients <- coef(mlr_fit)
cat("Intercept (β0):", coefficients[1], "\n")
## Intercept (β0): 6.918034
cat("Sodium Coefficient (β1):", coefficients["sodium"], "\n")
## Sodium Coefficient (β1): 0.0228877
cat("Magnesium Coefficient (β2):", coefficients["magnesium"], "\n")
## Magnesium Coefficient (β2): 0.05055183
cat("Calcium Coefficient (β3):", coefficients["calcium"], "\n")
## Calcium Coefficient (β3): -0.01316109
# Create a 3D scatterplot for Sodium, Magnesium, and pH
scatterplot3d(
chemistry_data$sodium, chemistry_data$magnesium, chemistry_data$pH,
main = "3D Regression Plane: Sodium, Magnesium, and pH",
xlab = "Sodium (Na) Concentration (mg/L)",
ylab = "Magnesium (Mg) Concentration (mg/L)",
zlab = "pH",
pch = 16, color = "blue", grid = TRUE, angle = 45
) -> s3d
# Add the regression plane
plane_intercept <- coefficients[1]
plane_sodium <- coefficients["sodium"]
plane_magnesium <- coefficients["magnesium"]
# Generate points for the regression plane
x_vals <- seq(min(chemistry_data$sodium), max(chemistry_data$sodium), length.out = 10)
y_vals <- seq(min(chemistry_data$magnesium), max(chemistry_data$magnesium), length.out = 10)
grid_vals <- expand.grid(x_vals, y_vals)
z_vals <- plane_intercept + plane_sodium * grid_vals[, 1] + plane_magnesium * grid_vals[, 2]
# Add the plane to the plot
s3d$plane3d(plane_intercept, plane_sodium, plane_magnesium, draw_polygon = TRUE, color = "lightblue", alpha = 0.5)
## Warning in segments(x, z1, x + y.max * yx.f, z2 + yz.f * y.max, lty = ltya, :
## "color" is not a graphical parameter
## Warning in segments(x, z1, x + y.max * yx.f, z2 + yz.f * y.max, lty = ltya, :
## "alpha" is not a graphical parameter
## Warning in segments(x.min + y * yx.f, z1 + y * yz.f, x.max + y * yx.f, z2 + :
## "color" is not a graphical parameter
## Warning in segments(x.min + y * yx.f, z1 + y * yz.f, x.max + y * yx.f, z2 + :
## "alpha" is not a graphical parameter

# Evaluate diagnostic metrics for the model fit
cat("Residual Standard Error (RSE):", summary(mlr_fit)$sigma, "\n")
## Residual Standard Error (RSE): 0.4433626
cat("R-squared (R^2):", summary(mlr_fit)$r.squared, "\n")
## R-squared (R^2): 0.5139794
cat("Adjusted R-squared:", summary(mlr_fit)$adj.r.squared, "\n")
## Adjusted R-squared: 0.4987913
# 2D Visualization for the individual predictors
par(mfrow = c(1, 3), mar = c(4, 4, 2, 1)) # Set up a grid layout
# Plot 1: Sodium vs pH
plot(chemistry_data$sodium, chemistry_data$pH,
main = "Sodium vs pH",
xlab = "Sodium (Na) Concentration (mg/L)",
ylab = "pH",
col = "skyblue", pch = 16)
abline(lm(pH ~ sodium, data = chemistry_data), col = "darkblue", lwd = 2)
# Plot 2: Magnesium vs pH
plot(chemistry_data$magnesium, chemistry_data$pH,
main = "Magnesium vs pH",
xlab = "Magnesium (Mg) Concentration (mg/L)",
ylab = "", # No y-axis label for consistency
col = "lightgreen", pch = 16)
abline(lm(pH ~ magnesium, data = chemistry_data), col = "darkgreen", lwd = 2)
# Plot 3: Calcium vs pH
plot(chemistry_data$calcium, chemistry_data$pH,
main = "Calcium vs pH",
xlab = "Calcium (Ca) Concentration (mg/L)",
ylab = "",
col = "lightcoral", pch = 16)
abline(lm(pH ~ calcium, data = chemistry_data), col = "darkred", lwd = 2)

# Reset graphical parameters
par(mfrow = c(1, 1))
# Interpretation of the 3D plane
# In this three-dimensional setting, the regression plane illustrates the relationship between two predictors (Sodium and Magnesium)
# and the response variable (pH). The plane minimizes the sum of squared vertical distances (residuals) between the observed points
# and the plane, providing the best fit for the data in 3D space.