ABC Beverage leadership has asked the Data Science team to build a predictive model of pH using historical manufacturing data. This report summarizes our findings in plain business language. A separate technical report documents the full methodology.
Why pH Matters:
pH is a fundamental measure of a beverage’s chemical balance. It affects:
| Item | Value |
|---|---|
| Historical records used for training | 2,571 |
| Predictor variables (process sensors) | 32 |
| Response variable | pH |
| Brand codes present | A, B, C, D |
| Batches needing pH prediction | 267 |
The chart below shows how pH differs across our four brand codes. Brand D consistently runs at a higher pH than Brands A, B, and C.
Our model identified the process variables that have the greatest influence on pH. The longer the bar, the more that variable matters.
We tested five modeling approaches on the historical data using a rigorous technique called 10-fold cross-validation. The data is split into 10 chunks, and each model is repeatedly trained on nine chunks and tested on the one chunk it has never seen before.
| Model | Approach (Plain Language) | CV Error (RMSE, lower is better) | Selected |
|---|---|---|---|
| Linear Regression | Draws a straight-line relationship between each predictor and pH | ~0.116 | |
| Elastic Net (Regularized) | Like linear regression, but automatically reduces the importance of weak predictors | ~0.112 | |
| K-Nearest Neighbors | Predicts pH based on the most similar historical batches | ~0.103 | |
| Random Forest | Builds hundreds of decision trees and averages their predictions | ~0.085 ✓ Best | YES |
| XGBoost (Boosted Trees) | Builds trees sequentially, each one correcting errors from the last | ~0.089 | |
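For readers who want to see the mechanics, the 10-fold procedure behind the RMSE column can be sketched roughly as follows. This is a minimal illustration and not the team's actual pipeline: the data is synthetic, the one-predictor straight-line model stands in for the five real models, and the helper names (`k_fold_indices`, `fit_line`, `cross_validated_rmse`) are hypothetical.

```python
import math
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and deal them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (stand-in model)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction miss."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def cross_validated_rmse(xs, ys, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        predictions = [intercept + slope * xs[i] for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], predictions))
    return sum(scores) / k


# Synthetic stand-in data (an assumption, NOT the plant's records):
# one made-up process reading and a pH-like value with random noise.
rng = random.Random(42)
xs = [rng.uniform(0, 4) for _ in range(200)]
ys = [8.5 + 0.05 * x + rng.gauss(0, 0.1) for x in xs]

cv_rmse = cross_validated_rmse(xs, ys)
print(f"10-fold CV RMSE on synthetic data: {cv_rmse:.3f}")
```

Because every batch is held out exactly once, the averaged RMSE reflects how each model performs on data it was not trained on, which is what makes the figures in the table comparable across models.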
The model explains approximately 70.6% of the variation in pH across all historical batches.
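"Variation explained" is conventionally measured by R-squared: one minus the ratio of the model's squared errors to the squared spread of pH around its average. The small sketch below shows that arithmetic on four made-up values, which are purely illustrative and not taken from the production data. The function name `r_squared` is likewise a hypothetical helper.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Squared error left over after using the model's predictions
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared spread of the actual values around their own average
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Made-up illustrative values (assumption), not real batches
actual_ph = [8.40, 8.50, 8.60, 8.70]
predicted_ph = [8.45, 8.50, 8.55, 8.70]

print(f"R-squared: {r_squared(actual_ph, predicted_ph):.3f}")
```

A value of 1 would mean perfect predictions, and a value of 0 would mean the model does no better than always guessing the average pH.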
We applied the final model to all 267 batches where pH was not
recorded. The predictions are included in the accompanying Excel file
(pH_Predictions.xlsx).
| Statistic | Value |
|---|---|
| Minimum Predicted pH | 8.168 |
| Average Predicted pH | 8.546 |
| Median Predicted pH | 8.530 |
| Maximum Predicted pH | 8.791 |
Brand D batches will consistently trend higher in pH.
The top controllable factors (Balling Level, Alch Rel, Carb Rel, and Pressure Vacuum) are worth monitoring closely during production.
This model can be updated as new production data comes in.
The model’s predictions are typically off by roughly ±0.085 pH units, which is the cross-validated RMSE of the selected Random Forest model.
| Deliverable | File | Purpose |
|---|---|---|
| This report | NonTechnical_Report_pH.html | Business summary for leadership |
| Technical report | Technical_Report_pH_Prediction.html | Full methodology and model details |
| pH Predictions | pH_Predictions.xlsx | Predicted pH for 267 evaluation batches |