Introduction

ABC Beverage leadership has asked the Data Science team to build a predictive model of pH using historical manufacturing data. This report summarizes our findings in plain business language. A separate technical report documents the full methodology.

Why pH Matters

pH is a fundamental measure of a beverage’s chemical balance. It affects:

  • Taste and quality
  • Microbial safety
  • Regulatory compliance
  • Cost

Data Overview

Historical records used for training: 2,571
Predictor variables (process sensors): 32
Response variable: pH
Brand codes present: A, B, C, D
Batches needing pH prediction: 267

pH Across Our Product Lines

The chart below shows how pH differs across our four brand codes. Brand D consistently runs at a higher pH than Brands A, B, and C.


The Key Drivers of pH

Our model identified the process variables that have the greatest influence on pH. The longer the bar, the more that variable matters.


How We Built the Model

We tested five modeling approaches on the historical data using 10-fold cross-validation. The data is split into 10 chunks; each chunk in turn is held out while the model is trained on the other nine, so every model is always scored on data it did not see during training.

| Model | Approach (Plain Language) | CV Error (RMSE, lower is better) | Selected |
| --- | --- | --- | --- |
| Linear Regression | Draws a straight-line relationship between each predictor and pH | ~0.116 | No |
| Elastic Net (Regularized) | Like linear regression, but automatically reduces the importance of weak predictors | ~0.112 | No |
| K-Nearest Neighbors | Predicts pH based on the most similar historical batches | ~0.103 | No |
| Random Forest | Builds hundreds of decision trees and averages their predictions | ~0.085 | Yes (best) |
| XGBoost (Boosted Trees) | Builds trees sequentially, each one correcting errors from the last | ~0.089 | No |
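
For readers who want to see the mechanics, the 10-fold cross-validation procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for this report: the data is synthetic, and the stand-in model (`fit_mean` / `predict_mean`, hypothetical names) simply predicts the average pH of the training chunks. Any of the five approaches could be plugged in through the same `fit` and `predict` arguments.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction miss."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle the row numbers and deal them into k nearly equal chunks."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_rmse(X, y, fit, predict, k=10, seed=0):
    """Average RMSE over k folds; each chunk is scored by a model that never saw it."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        scores.append(rmse([y[i] for i in held_out], preds))
    return sum(scores) / k

# Hypothetical stand-in model: always predicts the training-set average pH.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, X_new):
    return [model for _ in X_new]

# Synthetic data for illustration only (not ABC Beverage records).
rng = random.Random(42)
X = [[rng.random()] for _ in range(200)]
y = [8.5 + 0.2 * row[0] + rng.gauss(0, 0.05) for row in X]

baseline_rmse = cross_validated_rmse(X, y, fit_mean, predict_mean)
print(round(baseline_rmse, 3))
```

A simple baseline like this is a useful yardstick: a model is only worth deploying if its cross-validated RMSE is clearly lower than what "always predict the average" achieves.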

How Accurate Is the Model?

The model explains approximately 70.6% of the variation in pH across all historical batches.
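
A figure such as "explains 70.6% of the variation" is normally the R-squared statistic: one minus the ratio of the model's squared errors to the squared errors of always guessing the average. The sketch below shows that calculation on four made-up pH readings. It assumes the report's figure is a standard R-squared, and the numbers are illustrative, not production data.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # model's squared errors
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # errors of guessing the mean
    return 1 - ss_res / ss_tot

# Made-up values for illustration only.
actual_ph = [8.4, 8.5, 8.6, 8.7]
predicted_ph = [8.45, 8.5, 8.6, 8.65]

example_r2 = r_squared(actual_ph, predicted_ph)
print(f"{example_r2:.1%} of the variation explained")
```

A value of 100% would mean every prediction is exact, and 0% would mean the model does no better than guessing the average pH every time.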


Predictions for the 267 New Batches

We applied the final model to all 267 batches where pH was not recorded. The predictions are included in the accompanying Excel file (pH_Predictions.xlsx).

| Statistic | Value |
| --- | --- |
| Minimum Predicted pH | 8.168 |
| Average Predicted pH | 8.546 |
| Median Predicted pH | 8.530 |
| Maximum Predicted pH | 8.791 |

What This Means for Operations

  1. Brand D batches will consistently trend higher in pH.

  2. The top controllable factors (Balling Level, Alch Rel, Carb Rel, and Pressure Vacuum) are worth monitoring closely during production.

  3. This model can be updated as new production data comes in.

  4. The model’s predictions carry a typical error of roughly ±0.085 pH units (the cross-validated RMSE of the selected Random Forest model).
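
One practical way to use the ±0.085 figure is to treat each prediction as a range rather than a single number. The sketch below does this for the average and maximum predicted pH reported above. The function names and the review limits of 8.2 and 8.8 are hypothetical placeholders chosen for illustration, not ABC Beverage specifications, and the range is a rough guide rather than a formal confidence interval.

```python
def prediction_band(predicted_ph, typical_error=0.085):
    """Return (low, high) bounds around a predicted pH, rounded to 3 decimals."""
    return (round(predicted_ph - typical_error, 3),
            round(predicted_ph + typical_error, 3))

def needs_review(predicted_ph, lower_limit, upper_limit, typical_error=0.085):
    """Flag a batch when its plausible range crosses either limit."""
    low, high = prediction_band(predicted_ph, typical_error)
    return low < lower_limit or high > upper_limit

# Hypothetical limits for illustration only.
LOWER_LIMIT, UPPER_LIMIT = 8.2, 8.8

average_band = prediction_band(8.546)                           # average predicted pH
flag_average = needs_review(8.546, LOWER_LIMIT, UPPER_LIMIT)
flag_maximum = needs_review(8.791, LOWER_LIMIT, UPPER_LIMIT)    # maximum predicted pH
print(average_band, flag_average, flag_maximum)
```

Under these placeholder limits, a batch predicted near the average would pass, while a batch predicted near the maximum would be flagged for a manual pH check.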


Deliverables Provided

| Deliverable | File | Purpose |
| --- | --- | --- |
| This report | NonTechnical_Report_pH.html | Business summary for leadership |
| Technical report | Technical_Report_pH_Prediction.html | Full methodology and model details |
| pH Predictions | pH_Predictions.xlsx | Predicted pH for 267 evaluation batches |

```