Hide Assignment Information Instructions Project #2 (Team) Assignment
This is role playing. I am your new boss. I am in charge of production at ABC Beverage and you are a team of data scientists reporting to me. My leadership has told me that new regulations are requiring us to understand our manufacturing process, the predictive factors and be able to report to them our predictive model of PH.
Please use the historical data set I am providing. Build and report the factors in BOTH a technical and non-technical report. I like to use Word and Excel. Please provide your non-technical report in a business friendly readable document and your predictions in an Excel readable format. The technical report should show clearly the models you tested and how you selected your final approach.
Please submit both Rpubs links and .rmd files or other readable formats for technical and non-technical reports. Also submit the excel file showing the prediction of your models for pH.
Due on Dec 15, 2024 11:59 PM
The following code below loops through the list of necessary packages and checks to determine if each is installed. If the package is not found it is installed and loaded.
## Warning: package 'summarytools' was built under R version 4.3.3
## Warning: package 'caret' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'randomForest' was built under R version 4.3.3
## Warning: package 'glmnet' was built under R version 4.3.3
## Warning: package 'reshape2' was built under R version 4.3.2
## Warning in check_dep_version(): ABI version mismatch:
## lme4 was built with Matrix ABI version 1
## Current Matrix ABI version is 0
## Please re-install lme4 from source or restore original 'Matrix' package
## Warning: package 'mice' was built under R version 4.3.3
The script retrieves training (StudentData.xlsx) and test datasets (StudentEvaluation.xlsx) from GitHub, reads them into data frames, and removes the temporary files to maintain a clean workspace.
Performs analysis of the Student_Train dataset and display a summary of the structure of the dataset, helping to understand the data’s composition.
## There are 2571 observations/cases in the Student Training dataset.
## There are 33 columns/elements in the Student Training dataset
## There are 1 Categorical Variables in the Student Training dataset.
## There are 32 Predictor Variables in the Student Training dataset.
Display both a structural overview and a statistical summary of the training dataset.
## Rows: 2,571
## Columns: 33
## $ `Brand Code` <chr> "B", "A", "B", "A", "A", "A", "A", "B", "B", "B", …
## $ `Carb Volume` <dbl> 5.340000, 5.426667, 5.286667, 5.440000, 5.486667, …
## $ `Fill Ounces` <dbl> 23.96667, 24.00667, 24.06000, 24.00667, 24.31333, …
## $ `PC Volume` <dbl> 0.2633333, 0.2386667, 0.2633333, 0.2933333, 0.1113…
## $ `Carb Pressure` <dbl> 68.2, 68.4, 70.8, 63.0, 67.2, 66.6, 64.2, 67.6, 64…
## $ `Carb Temp` <dbl> 141.2, 139.6, 144.8, 132.6, 136.8, 138.4, 136.8, 1…
## $ PSC <dbl> 0.104, 0.124, 0.090, NA, 0.026, 0.090, 0.128, 0.15…
## $ `PSC Fill` <dbl> 0.26, 0.22, 0.34, 0.42, 0.16, 0.24, 0.40, 0.34, 0.…
## $ `PSC CO2` <dbl> 0.04, 0.04, 0.16, 0.04, 0.12, 0.04, 0.04, 0.04, 0.…
## $ `Mnf Flow` <dbl> -100, -100, -100, -100, -100, -100, -100, -100, -1…
## $ `Carb Pressure1` <dbl> 118.8, 121.6, 120.2, 115.2, 118.4, 119.6, 122.2, 1…
## $ `Fill Pressure` <dbl> 46.0, 46.0, 46.0, 46.4, 45.8, 45.6, 51.8, 46.8, 46…
## $ `Hyd Pressure1` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure2` <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure3` <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure4` <dbl> 118, 106, 82, 92, 92, 116, 124, 132, 90, 108, 94, …
## $ `Filler Level` <dbl> 121.2, 118.6, 120.0, 117.8, 118.6, 120.2, 123.4, 1…
## $ `Filler Speed` <dbl> 4002, 3986, 4020, 4012, 4010, 4014, NA, 1004, 4014…
## $ Temperature <dbl> 66.0, 67.6, 67.0, 65.6, 65.6, 66.2, 65.8, 65.2, 65…
## $ `Usage cont` <dbl> 16.18, 19.90, 17.76, 17.42, 17.68, 23.82, 20.74, 1…
## $ `Carb Flow` <dbl> 2932, 3144, 2914, 3062, 3054, 2948, 30, 684, 2902,…
## $ Density <dbl> 0.88, 0.92, 1.58, 1.54, 1.54, 1.52, 0.84, 0.84, 0.…
## $ MFR <dbl> 725.0, 726.8, 735.0, 730.6, 722.8, 738.8, NA, NA, …
## $ Balling <dbl> 1.398, 1.498, 3.142, 3.042, 3.042, 2.992, 1.298, 1…
## $ `Pressure Vacuum` <dbl> -4.0, -4.0, -3.8, -4.4, -4.4, -4.4, -4.4, -4.4, -4…
## $ PH <dbl> 8.36, 8.26, 8.94, 8.24, 8.26, 8.32, 8.40, 8.38, 8.…
## $ `Oxygen Filler` <dbl> 0.022, 0.026, 0.024, 0.030, 0.030, 0.024, 0.066, 0…
## $ `Bowl Setpoint` <dbl> 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, …
## $ `Pressure Setpoint` <dbl> 46.4, 46.8, 46.6, 46.0, 46.0, 46.0, 46.0, 46.0, 46…
## $ `Air Pressurer` <dbl> 142.6, 143.0, 142.0, 146.2, 146.2, 146.6, 146.2, 1…
## $ `Alch Rel` <dbl> 6.58, 6.56, 7.66, 7.14, 7.14, 7.16, 6.54, 6.52, 6.…
## $ `Carb Rel` <dbl> 5.32, 5.30, 5.84, 5.42, 5.44, 5.44, 5.38, 5.34, 5.…
## $ `Balling Lvl` <dbl> 1.48, 1.56, 3.28, 3.04, 3.04, 3.02, 1.44, 1.44, 1.…
| Brand Code | Carb Volume | Fill Ounces | PC Volume | Carb Pressure | Carb Temp | PSC | PSC Fill | PSC CO2 | Mnf Flow | Carb Pressure1 | Fill Pressure | Hyd Pressure1 | Hyd Pressure2 | Hyd Pressure3 | Hyd Pressure4 | Filler Level | Filler Speed | Temperature | Usage cont | Carb Flow | Density | MFR | Balling | Pressure Vacuum | PH | Oxygen Filler | Bowl Setpoint | Pressure Setpoint | Air Pressurer | Alch Rel | Carb Rel | Balling Lvl | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Length:2571 | Min. :5.040 | Min. :23.63 | Min. :0.07933 | Min. :57.00 | Min. :128.6 | Min. :0.00200 | Min. :0.0000 | Min. :0.00000 | Min. :-100.20 | Min. :105.6 | Min. :34.60 | Min. :-0.80 | Min. : 0.00 | Min. :-1.20 | Min. : 52.00 | Min. : 55.8 | Min. : 998 | Min. :63.60 | Min. :12.08 | Min. : 26 | Min. :0.240 | Min. : 31.4 | Min. :-0.170 | Min. :-6.600 | Min. :7.880 | Min. :0.00240 | Min. : 70.0 | Min. :44.00 | Min. :140.8 | Min. :5.280 | Min. :4.960 | Min. :0.00 | |
| Class :character | 1st Qu.:5.293 | 1st Qu.:23.92 | 1st Qu.:0.23917 | 1st Qu.:65.60 | 1st Qu.:138.4 | 1st Qu.:0.04800 | 1st Qu.:0.1000 | 1st Qu.:0.02000 | 1st Qu.:-100.00 | 1st Qu.:119.0 | 1st Qu.:46.00 | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.: 86.00 | 1st Qu.: 98.3 | 1st Qu.:3888 | 1st Qu.:65.20 | 1st Qu.:18.36 | 1st Qu.:1144 | 1st Qu.:0.900 | 1st Qu.:706.3 | 1st Qu.: 1.496 | 1st Qu.:-5.600 | 1st Qu.:8.440 | 1st Qu.:0.02200 | 1st Qu.:100.0 | 1st Qu.:46.00 | 1st Qu.:142.2 | 1st Qu.:6.540 | 1st Qu.:5.340 | 1st Qu.:1.38 | |
| Mode :character | Median :5.347 | Median :23.97 | Median :0.27133 | Median :68.20 | Median :140.8 | Median :0.07600 | Median :0.1800 | Median :0.04000 | Median : 65.20 | Median :123.2 | Median :46.40 | Median :11.40 | Median :28.60 | Median :27.60 | Median : 96.00 | Median :118.4 | Median :3982 | Median :65.60 | Median :21.79 | Median :3028 | Median :0.980 | Median :724.0 | Median : 1.648 | Median :-5.400 | Median :8.540 | Median :0.03340 | Median :120.0 | Median :46.00 | Median :142.6 | Median :6.560 | Median :5.400 | Median :1.48 | |
| NA | Mean :5.370 | Mean :23.97 | Mean :0.27712 | Mean :68.19 | Mean :141.1 | Mean :0.08457 | Mean :0.1954 | Mean :0.05641 | Mean : 24.57 | Mean :122.6 | Mean :47.92 | Mean :12.44 | Mean :20.96 | Mean :20.46 | Mean : 96.29 | Mean :109.3 | Mean :3687 | Mean :65.97 | Mean :20.99 | Mean :2468 | Mean :1.174 | Mean :704.0 | Mean : 2.198 | Mean :-5.216 | Mean :8.546 | Mean :0.04684 | Mean :109.3 | Mean :47.62 | Mean :142.8 | Mean :6.897 | Mean :5.437 | Mean :2.05 | |
| NA | 3rd Qu.:5.453 | 3rd Qu.:24.03 | 3rd Qu.:0.31200 | 3rd Qu.:70.60 | 3rd Qu.:143.8 | 3rd Qu.:0.11200 | 3rd Qu.:0.2600 | 3rd Qu.:0.08000 | 3rd Qu.: 140.80 | 3rd Qu.:125.4 | 3rd Qu.:50.00 | 3rd Qu.:20.20 | 3rd Qu.:34.60 | 3rd Qu.:33.40 | 3rd Qu.:102.00 | 3rd Qu.:120.0 | 3rd Qu.:3998 | 3rd Qu.:66.40 | 3rd Qu.:23.75 | 3rd Qu.:3186 | 3rd Qu.:1.620 | 3rd Qu.:731.0 | 3rd Qu.: 3.292 | 3rd Qu.:-5.000 | 3rd Qu.:8.680 | 3rd Qu.:0.06000 | 3rd Qu.:120.0 | 3rd Qu.:50.00 | 3rd Qu.:143.0 | 3rd Qu.:7.240 | 3rd Qu.:5.540 | 3rd Qu.:3.14 | |
| NA | Max. :5.700 | Max. :24.32 | Max. :0.47800 | Max. :79.40 | Max. :154.0 | Max. :0.27000 | Max. :0.6200 | Max. :0.24000 | Max. : 229.40 | Max. :140.2 | Max. :60.40 | Max. :58.00 | Max. :59.40 | Max. :50.00 | Max. :142.00 | Max. :161.2 | Max. :4030 | Max. :76.20 | Max. :25.90 | Max. :5104 | Max. :1.920 | Max. :868.6 | Max. : 4.012 | Max. :-3.600 | Max. :9.360 | Max. :0.40000 | Max. :140.0 | Max. :52.00 | Max. :148.2 | Max. :8.620 | Max. :6.060 | Max. :3.66 | |
| NA | NA’s :10 | NA’s :38 | NA’s :39 | NA’s :27 | NA’s :26 | NA’s :33 | NA’s :23 | NA’s :39 | NA’s :2 | NA’s :32 | NA’s :22 | NA’s :11 | NA’s :15 | NA’s :15 | NA’s :30 | NA’s :20 | NA’s :57 | NA’s :14 | NA’s :5 | NA’s :2 | NA’s :1 | NA’s :212 | NA’s :1 | NA | NA’s :4 | NA’s :12 | NA’s :2 | NA’s :12 | NA | NA’s :9 | NA’s :10 | NA’s :1 |
Display the first 10 observations/cases in the Student Training dataset.
| Brand Code | Carb Volume | Fill Ounces | PC Volume | Carb Pressure | Carb Temp | PSC | PSC Fill | PSC CO2 | Mnf Flow | Carb Pressure1 | Fill Pressure | Hyd Pressure1 | Hyd Pressure2 | Hyd Pressure3 | Hyd Pressure4 | Filler Level | Filler Speed | Temperature | Usage cont | Carb Flow | Density | MFR | Balling | Pressure Vacuum | PH | Oxygen Filler | Bowl Setpoint | Pressure Setpoint | Air Pressurer | Alch Rel | Carb Rel | Balling Lvl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B | 5.340000 | 23.96667 | 0.2633333 | 68.2 | 141.2 | 0.104 | 0.26 | 0.04 | -100 | 118.8 | 46.0 | 0 | NA | NA | 118 | 121.2 | 4002 | 66.0 | 16.18 | 2932 | 0.88 | 725.0 | 1.398 | -4.0 | 8.36 | 0.022 | 120 | 46.4 | 142.6 | 6.58 | 5.32 | 1.48 |
| A | 5.426667 | 24.00667 | 0.2386667 | 68.4 | 139.6 | 0.124 | 0.22 | 0.04 | -100 | 121.6 | 46.0 | 0 | NA | NA | 106 | 118.6 | 3986 | 67.6 | 19.90 | 3144 | 0.92 | 726.8 | 1.498 | -4.0 | 8.26 | 0.026 | 120 | 46.8 | 143.0 | 6.56 | 5.30 | 1.56 |
| B | 5.286667 | 24.06000 | 0.2633333 | 70.8 | 144.8 | 0.090 | 0.34 | 0.16 | -100 | 120.2 | 46.0 | 0 | NA | NA | 82 | 120.0 | 4020 | 67.0 | 17.76 | 2914 | 1.58 | 735.0 | 3.142 | -3.8 | 8.94 | 0.024 | 120 | 46.6 | 142.0 | 7.66 | 5.84 | 3.28 |
| A | 5.440000 | 24.00667 | 0.2933333 | 63.0 | 132.6 | NA | 0.42 | 0.04 | -100 | 115.2 | 46.4 | 0 | 0 | 0 | 92 | 117.8 | 4012 | 65.6 | 17.42 | 3062 | 1.54 | 730.6 | 3.042 | -4.4 | 8.24 | 0.030 | 120 | 46.0 | 146.2 | 7.14 | 5.42 | 3.04 |
| A | 5.486667 | 24.31333 | 0.1113333 | 67.2 | 136.8 | 0.026 | 0.16 | 0.12 | -100 | 118.4 | 45.8 | 0 | 0 | 0 | 92 | 118.6 | 4010 | 65.6 | 17.68 | 3054 | 1.54 | 722.8 | 3.042 | -4.4 | 8.26 | 0.030 | 120 | 46.0 | 146.2 | 7.14 | 5.44 | 3.04 |
| A | 5.380000 | 23.92667 | 0.2693333 | 66.6 | 138.4 | 0.090 | 0.24 | 0.04 | -100 | 119.6 | 45.6 | 0 | 0 | 0 | 116 | 120.2 | 4014 | 66.2 | 23.82 | 2948 | 1.52 | 738.8 | 2.992 | -4.4 | 8.32 | 0.024 | 120 | 46.0 | 146.6 | 7.16 | 5.44 | 3.02 |
| A | 5.313333 | 23.88667 | 0.2680000 | 64.2 | 136.8 | 0.128 | 0.40 | 0.04 | -100 | 122.2 | 51.8 | 0 | 0 | 0 | 124 | 123.4 | NA | 65.8 | 20.74 | 30 | 0.84 | NA | 1.298 | -4.4 | 8.40 | 0.066 | 120 | 46.0 | 146.2 | 6.54 | 5.38 | 1.44 |
| B | 5.320000 | 24.17333 | 0.2206667 | 67.6 | 141.4 | 0.154 | 0.34 | 0.04 | -100 | 124.2 | 46.8 | 0 | 0 | 0 | 132 | 118.6 | 1004 | 65.2 | 18.96 | 684 | 0.84 | NA | 1.298 | -4.4 | 8.38 | 0.046 | 120 | 46.0 | 146.4 | 6.52 | 5.34 | 1.44 |
| B | 5.246667 | 23.98000 | 0.2626667 | 64.2 | 140.2 | 0.132 | 0.12 | 0.14 | -100 | 120.8 | 46.0 | 0 | 0 | 0 | 90 | 120.2 | 4014 | 65.4 | 18.40 | 2902 | 0.90 | 740.4 | 1.446 | -4.4 | 8.38 | 0.064 | 120 | 46.0 | 147.2 | 6.52 | 5.34 | 1.44 |
| B | 5.266667 | 24.00667 | 0.2313333 | 72.0 | 147.4 | 0.014 | 0.24 | 0.06 | -100 | 119.8 | 45.2 | 0 | 0 | 0 | 108 | 120.8 | 4028 | 66.6 | 13.50 | 3038 | 0.90 | 692.4 | 1.448 | -4.4 | 8.50 | 0.022 | 120 | 46.0 | 146.2 | 6.54 | 5.34 | 1.38 |
The following functions help clean and preprocess the dataset by handling missing values, outliers, feature scaling, and constant columns.
The following provides a comprehensive analysis of missing data by both displaying a table of missing values per column and generating a bar plot to visualize the distribution of missing values across variables. The table allows for detailed inspection, while the plot gives a quick overview of the missing data across the dataset.
The MFR has the highest percentage of missing data (8.2%), while Density has the least (0%). Many of the variables have relatively low missing data (around 0.1% to 0.5%). While some variables, particularly MFR, Brand Code, Filler Speed, and others, have missing values in the range of 1% to 8%, which will be imputed in the data cleaning process.
## Missing values per column:
| variable | n | percent |
|---|---|---|
| MFR | 212 | 8.2% |
| Brand Code | 120 | 4.7% |
| Filler Speed | 57 | 2.2% |
| PC Volume | 39 | 1.5% |
| PSC CO2 | 39 | 1.5% |
| Fill Ounces | 38 | 1.5% |
| PSC | 33 | 1.3% |
| Carb Pressure1 | 32 | 1.2% |
| Hyd Pressure4 | 30 | 1.2% |
| Carb Pressure | 27 | 1.1% |
| Carb Temp | 26 | 1% |
| PSC Fill | 23 | 0.9% |
| Fill Pressure | 22 | 0.9% |
| Filler Level | 20 | 0.8% |
| Hyd Pressure2 | 15 | 0.6% |
| Hyd Pressure3 | 15 | 0.6% |
| Temperature | 14 | 0.5% |
| Oxygen Filler | 12 | 0.5% |
| Pressure Setpoint | 12 | 0.5% |
| Hyd Pressure1 | 11 | 0.4% |
| Carb Rel | 10 | 0.4% |
| Carb Volume | 10 | 0.4% |
| Alch Rel | 9 | 0.4% |
| Usage cont | 5 | 0.2% |
| PH | 4 | 0.2% |
| Bowl Setpoint | 2 | 0.1% |
| Carb Flow | 2 | 0.1% |
| Mnf Flow | 2 | 0.1% |
| Balling | 1 | 0% |
| Balling Lvl | 1 | 0% |
| Density | 1 | 0% |
The following calculates and visualizes the correlations between numeric
variables in the Student_Train dataset using a heatmap. It highlights
the strength and direction of the relationships between variables, with
negative correlations shown in blue, positive correlations in red, and
zero correlations in gray. The heatmap provides an intuitive way to
identify strong or weak correlations in the dataset.
The following boxplots represents the distribution of values for a different variable in the Student_Train dataset (excluding the first column). Each plot is labeled with the variable name, and the boxplots are horizontally oriented with a custom color scheme. This visualization helps to identify the spread, central tendency, and potential outliers for each variable.
Variables that are highly Right-Skewed are: PSC, PSC Fill, PSC CO2,Hyd Pressure1-3, Oxygen Filler, Air Pressurer. A suggested Transformation would be to apply a logarithmic or a square root transformation to reduce skewness.
The clean_data function is a comprehensive data preprocessing pipeline that imputes missing values, removes outliers, scales features, and removes constant columns from the dataset. It is applied to the Student_Train dataset to produce a cleaned version, Student_Cleaned, ready for further analysis or modeling. The function uses various techniques like Predictive Mean Matching for imputation, Z-score normalization for scaling, and IQR for outlier detection.
## Warning: package 'janitor' was built under R version 4.3.3
Checks to see if any missing values remain in the cleaned dataset.
## Missing values per column after cleaning data:
| variable | n | percent |
|---|---|---|
| brand_code | 119 | 4.7% |
After cleaning the data we need to validate the dataset to check for remaining missing values as well as visualize the distributions to ensure scaling and outlier handling.The imputation process is visualized with a plot, and the density plot for the variable Carb_Volume compares the distribution of imputed and observed values.
Transformations For the right skewed variables we will use square root transformation
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:randomForest':
##
## combine
## The following object is masked from 'package:dplyr':
##
## combine
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Model Building
Linear Models: Split the Data 80-20 in training and testing data
Simple Linear Regression
## RMSE Rsquared MAE
## 0.7812662 0.3468832 0.6011851
Robust Linear Model
## RMSE Rsquared MAE
## 0.8129427 0.2915512 0.6439884
Partial Linear Squares
## RMSE Rsquared MAE
## 0.7812578 0.3470570 0.6038144
Enet Model
## Loading required package: lars
## Loaded lars 1.3
## RMSE Rsquared MAE
## 0.7784749 0.3502429 0.6047150
Ridge Model
## RMSE Rsquared MAE
## 0.7795994 0.3490617 0.6028884
LARS
## RMSE Rsquared MAE
## 0.7784348 0.3504221 0.6036631
Non-Linear Models:
Split the Data 80-20 in training and testing data
KNN
## RMSE Rsquared MAE
## 0.7943361 0.3678533 0.6059853
SVM
## RMSE Rsquared MAE
## 0.7384741 0.4564360 0.5392765
MARS
## Loading required package: earth
## Warning: package 'earth' was built under R version 4.3.3
## Loading required package: Formula
## Loading required package: plotmo
## Warning: package 'plotmo' was built under R version 4.3.3
## Loading required package: plotrix
## Warning: package 'plotrix' was built under R version 4.3.2
## RMSE Rsquared MAE
## 0.7628534 0.4136125 0.5601422
Random Forest
## RMSE Rsquared MAE
## 0.6101989 0.6445020 0.4355393
Model Evaluation
## RMSE Rsquared MAE
## KNN 0.7943361 0.3678533 0.6059853
## SVM 0.7384741 0.4564360 0.5392765
## MARS 0.7628534 0.4136125 0.5601422
## RandomForest 0.6101989 0.6445020 0.4355393
## SLR 0.9201597 0.1763704 0.7047426
## RLM 0.9087564 0.1771359 0.7090285
## PLS 0.9200291 0.1770999 0.7069576
## ENET 0.9092491 0.1847612 0.6967332
## Ridge 0.9154253 0.1805659 0.7018082
## LARS 0.9111737 0.1835543 0.6975926
The model with the best opimal resampling and test set performance of all the models is the Random Forest model. It has the lowest RMSE (a lower RMSE predicts a better performing model) of all the models at: 0.61. It also has the highest Rsquared (a higher Rsquared indicates a better fit) of all the models at 0.64.
For our optimal Random Forest model, Mnf Flow enjoys the greatest impact
(54.42), above others like Carb Pressure1 (27.02) and Carb Volume
(21.26), all of which contribute meaningfully to predicting PH. Minimal
contributors such as PSC Fill and PSC CO2 can be diesregarded in
follow-up analysis, as we should focus optimization strategies on top
predictors. We next employed our Random Forest model to form a
prediction.
## Warning: package 'DT' was built under R version 4.3.3
## Warning: package 'DataExplorer' was built under R version 4.3.3
The output would indicate concentration of the predictions around around
0, largely symmetric in disttribution and concentration toward the mean,
all of which suggests that our model is accurate in predicting PH values
within a fairly narrow range.