# I am performing multiple linear regression to study the relationship between pH
# and three chemical elements: Sodium (Na), Magnesium (Mg), and Calcium (Ca). In this
# updated analysis, I also visualize the regression plane in a three-dimensional setting.

# In this analysis, I am interpreting a 3D regression plot that visualizes the relationship between pH, sodium (Na) concentration, and magnesium (Mg) concentration. This plot helps me explore how the two 
# independent variables, sodium and magnesium, collectively influence the pH levels.

# As I examine the plot, I see a regression plane that cuts through the data points, which are represented 
# by blue dots. This plane represents my model's predicted pH values based on the combination of sodium 
# and magnesium concentrations. I notice that the plane has an upward slope, which suggests that both 
# sodium and magnesium concentrations positively contribute to increasing pH.

# The blue data points scattered around the regression plane tell me about the actual observed values. 
# I observe that many points lie close to the regression plane, indicating that my model captures the 
# relationship between these variables well. However, I also see some points that deviate from the plane, 
# reminding me of the residuals—differences between the observed and predicted values. These deviations 
# highlight the random variability in the data, which is expected in real-world datasets.

# As I study the dimensions of the plot, I find that sodium concentrations range from 20 to 80 mg/L, 
# while magnesium concentrations range from 0 to 60 mg/L. The pH values span from approximately 7.5 to 11. 
# This broad range gives me confidence that my model is based on diverse data and is not overly constrained 
# to specific conditions.

# I also notice the gridlines on the regression plane, which help me interpret the interaction effects 
# between sodium and magnesium concentrations. For example, I see that when sodium concentration is high, 
# even a moderate increase in magnesium concentration causes the pH to rise significantly. This indicates 
# a possible synergistic effect between sodium and magnesium in influencing pH.

# Reflecting on this analysis, I feel confident that the model provides meaningful insights into how sodium 
# and magnesium interact to affect pH. The upward slope of the regression plane aligns with my expectation 
# that these ions play a role in increasing pH levels. I believe this visualization strengthens my 
# understanding of the system and offers a robust foundation for further statistical analysis. 
# Additionally, the combination of 3D visualization and regression modeling allows me to better assess 
# the predictive power and limitations of my model.


# Load necessary library for 3D visualization
library(scatterplot3d)

# Simulate a dataset representing chemical concentrations and pH
set.seed(42)  # Setting a seed for reproducibility
n <- 100  # Number of observations

# Generate random concentrations for each element
sodium <- rnorm(n, mean = 50, sd = 10)  # Sodium concentration (mg/L)
magnesium <- rnorm(n, mean = 30, sd = 8)  # Magnesium concentration (mg/L)
calcium <- rnorm(n, mean = 20, sd = 5)  # Calcium concentration (mg/L)

# Define the true relationship: pH = β0 + β1 * Na + β2 * Mg + β3 * Ca + error
error <- rnorm(n, mean = 0, sd = 0.5)  # Random error term
pH <- 7.0 + 0.02 * sodium + 0.05 * magnesium - 0.01 * calcium + error  # Response variable (pH)

# Combine the data into a data frame
chemistry_data <- data.frame(pH, sodium, magnesium, calcium)

# Fit a multiple linear regression model
mlr_fit <- lm(pH ~ sodium + magnesium + calcium, data = chemistry_data)

# Display the summary of the regression model
summary(mlr_fit)
## 
## Call:
## lm(formula = pH ~ sodium + magnesium + calcium, data = chemistry_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.89720 -0.29335 -0.05192  0.30941  1.16400 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.918034   0.341691  20.246  < 2e-16 ***
## sodium       0.022888   0.004328   5.288 7.77e-07 ***
## magnesium    0.050552   0.006181   8.178 1.19e-12 ***
## calcium     -0.013161   0.008882  -1.482    0.142    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4434 on 96 degrees of freedom
## Multiple R-squared:  0.514,  Adjusted R-squared:  0.4988 
## F-statistic: 33.84 on 3 and 96 DF,  p-value: 5.194e-15
# Extract coefficients and interpret them
coefficients <- coef(mlr_fit)
cat("Intercept (β0):", coefficients[1], "\n")
## Intercept (β0): 6.918034
cat("Sodium Coefficient (β1):", coefficients["sodium"], "\n")
## Sodium Coefficient (β1): 0.0228877
cat("Magnesium Coefficient (β2):", coefficients["magnesium"], "\n")
## Magnesium Coefficient (β2): 0.05055183
cat("Calcium Coefficient (β3):", coefficients["calcium"], "\n")
## Calcium Coefficient (β3): -0.01316109
# Create a 3D scatterplot for Sodium, Magnesium, and pH
scatterplot3d(
  chemistry_data$sodium, chemistry_data$magnesium, chemistry_data$pH,
  main = "3D Regression Plane: Sodium, Magnesium, and pH",
  xlab = "Sodium (Na) Concentration (mg/L)",
  ylab = "Magnesium (Mg) Concentration (mg/L)",
  zlab = "pH",
  pch = 16, color = "blue", grid = TRUE, angle = 45
) -> s3d

# Add the regression plane
plane_intercept <- coefficients[1]
plane_sodium <- coefficients["sodium"]
plane_magnesium <- coefficients["magnesium"]

# Generate points for the regression plane
x_vals <- seq(min(chemistry_data$sodium), max(chemistry_data$sodium), length.out = 10)
y_vals <- seq(min(chemistry_data$magnesium), max(chemistry_data$magnesium), length.out = 10)
grid_vals <- expand.grid(x_vals, y_vals)
z_vals <- plane_intercept + plane_sodium * grid_vals[, 1] + plane_magnesium * grid_vals[, 2]

# Add the plane to the plot
s3d$plane3d(plane_intercept, plane_sodium, plane_magnesium, draw_polygon = TRUE, color = "lightblue", alpha = 0.5)
## Warning in segments(x, z1, x + y.max * yx.f, z2 + yz.f * y.max, lty = ltya, :
## "color" is not a graphical parameter
## Warning in segments(x, z1, x + y.max * yx.f, z2 + yz.f * y.max, lty = ltya, :
## "alpha" is not a graphical parameter
## Warning in segments(x.min + y * yx.f, z1 + y * yz.f, x.max + y * yx.f, z2 + :
## "color" is not a graphical parameter
## Warning in segments(x.min + y * yx.f, z1 + y * yz.f, x.max + y * yx.f, z2 + :
## "alpha" is not a graphical parameter

# Evaluate diagnostic metrics for the model fit
cat("Residual Standard Error (RSE):", summary(mlr_fit)$sigma, "\n")
## Residual Standard Error (RSE): 0.4433626
cat("R-squared (R^2):", summary(mlr_fit)$r.squared, "\n")
## R-squared (R^2): 0.5139794
cat("Adjusted R-squared:", summary(mlr_fit)$adj.r.squared, "\n")
## Adjusted R-squared: 0.4987913
# 2D Visualization for the individual predictors
par(mfrow = c(1, 3), mar = c(4, 4, 2, 1))  # Set up a grid layout

# Plot 1: Sodium vs pH
plot(chemistry_data$sodium, chemistry_data$pH, 
     main = "Sodium vs pH", 
     xlab = "Sodium (Na) Concentration (mg/L)", 
     ylab = "pH", 
     col = "skyblue", pch = 16)
abline(lm(pH ~ sodium, data = chemistry_data), col = "darkblue", lwd = 2)

# Plot 2: Magnesium vs pH
plot(chemistry_data$magnesium, chemistry_data$pH, 
     main = "Magnesium vs pH", 
     xlab = "Magnesium (Mg) Concentration (mg/L)", 
     ylab = "",  # No y-axis label for consistency
     col = "lightgreen", pch = 16)
abline(lm(pH ~ magnesium, data = chemistry_data), col = "darkgreen", lwd = 2)

# Plot 3: Calcium vs pH
plot(chemistry_data$calcium, chemistry_data$pH, 
     main = "Calcium vs pH", 
     xlab = "Calcium (Ca) Concentration (mg/L)", 
     ylab = "", 
     col = "lightcoral", pch = 16)
abline(lm(pH ~ calcium, data = chemistry_data), col = "darkred", lwd = 2)

# Reset graphical parameters
par(mfrow = c(1, 1))

# Interpretation of the 3D plane
# In this three-dimensional setting, the regression plane illustrates the relationship between two predictors (Sodium and Magnesium) 
# and the response variable (pH). The plane minimizes the sum of squared vertical distances (residuals) between the observed points
# and the plane, providing the best fit for the data in 3D space.