Instruction:
This is role playing. I am your new boss. I am in charge of production at ABC Beverage and you are a team of data scientists reporting to me. My leadership has told me that new regulations are requiring us to understand our manufacturing process, the predictive factors and be able to report to them our predictive model of PH.
Please use the historical data set I am providing. Build and report the factors in BOTH a technical and non-technical report. I like to use Word and Excel. Please provide your non-technical report in a business friendly readable document and your predictions in an Excel readable format. The technical report should show clearly the models you tested and how you selected your final approach.
Please submit both Rpubs links and .rmd files or other readable formats for technical and non-technical reports. Also submit the excel file showing the prediction of your models for pH.
#load data
library(readxl)
train <- read_excel("TrainData.xlsx")
#review
glimpse(train)
## Rows: 2,571
## Columns: 33
## $ `Brand Code` <chr> "B", "A", "B", "A", "A", "A", "A", "B", "B", "B", …
## $ `Carb Volume` <dbl> 5.340000, 5.426667, 5.286667, 5.440000, 5.486667, …
## $ `Fill Ounces` <dbl> 23.96667, 24.00667, 24.06000, 24.00667, 24.31333, …
## $ `PC Volume` <dbl> 0.2633333, 0.2386667, 0.2633333, 0.2933333, 0.1113…
## $ `Carb Pressure` <dbl> 68.2, 68.4, 70.8, 63.0, 67.2, 66.6, 64.2, 67.6, 64…
## $ `Carb Temp` <dbl> 141.2, 139.6, 144.8, 132.6, 136.8, 138.4, 136.8, 1…
## $ PSC <dbl> 0.104, 0.124, 0.090, NA, 0.026, 0.090, 0.128, 0.15…
## $ `PSC Fill` <dbl> 0.26, 0.22, 0.34, 0.42, 0.16, 0.24, 0.40, 0.34, 0.…
## $ `PSC CO2` <dbl> 0.04, 0.04, 0.16, 0.04, 0.12, 0.04, 0.04, 0.04, 0.…
## $ `Mnf Flow` <dbl> -100, -100, -100, -100, -100, -100, -100, -100, -1…
## $ `Carb Pressure1` <dbl> 118.8, 121.6, 120.2, 115.2, 118.4, 119.6, 122.2, 1…
## $ `Fill Pressure` <dbl> 46.0, 46.0, 46.0, 46.4, 45.8, 45.6, 51.8, 46.8, 46…
## $ `Hyd Pressure1` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure2` <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure3` <dbl> NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure4` <dbl> 118, 106, 82, 92, 92, 116, 124, 132, 90, 108, 94, …
## $ `Filler Level` <dbl> 121.2, 118.6, 120.0, 117.8, 118.6, 120.2, 123.4, 1…
## $ `Filler Speed` <dbl> 4002, 3986, 4020, 4012, 4010, 4014, NA, 1004, 4014…
## $ Temperature <dbl> 66.0, 67.6, 67.0, 65.6, 65.6, 66.2, 65.8, 65.2, 65…
## $ `Usage cont` <dbl> 16.18, 19.90, 17.76, 17.42, 17.68, 23.82, 20.74, 1…
## $ `Carb Flow` <dbl> 2932, 3144, 2914, 3062, 3054, 2948, 30, 684, 2902,…
## $ Density <dbl> 0.88, 0.92, 1.58, 1.54, 1.54, 1.52, 0.84, 0.84, 0.…
## $ MFR <dbl> 725.0, 726.8, 735.0, 730.6, 722.8, 738.8, NA, NA, …
## $ Balling <dbl> 1.398, 1.498, 3.142, 3.042, 3.042, 2.992, 1.298, 1…
## $ `Pressure Vacuum` <dbl> -4.0, -4.0, -3.8, -4.4, -4.4, -4.4, -4.4, -4.4, -4…
## $ PH <dbl> 8.36, 8.26, 8.94, 8.24, 8.26, 8.32, 8.40, 8.38, 8.…
## $ `Oxygen Filler` <dbl> 0.022, 0.026, 0.024, 0.030, 0.030, 0.024, 0.066, 0…
## $ `Bowl Setpoint` <dbl> 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, …
## $ `Pressure Setpoint` <dbl> 46.4, 46.8, 46.6, 46.0, 46.0, 46.0, 46.0, 46.0, 46…
## $ `Air Pressurer` <dbl> 142.6, 143.0, 142.0, 146.2, 146.2, 146.6, 146.2, 1…
## $ `Alch Rel` <dbl> 6.58, 6.56, 7.66, 7.14, 7.14, 7.16, 6.54, 6.52, 6.…
## $ `Carb Rel` <dbl> 5.32, 5.30, 5.84, 5.42, 5.44, 5.44, 5.38, 5.34, 5.…
## $ `Balling Lvl` <dbl> 1.48, 1.56, 3.28, 3.04, 3.04, 3.02, 1.44, 1.44, 1.…
summary(train)
## Brand Code Carb Volume Fill Ounces PC Volume
## Length:2571 Min. :5.040 Min. :23.63 Min. :0.07933
## Class :character 1st Qu.:5.293 1st Qu.:23.92 1st Qu.:0.23917
## Mode :character Median :5.347 Median :23.97 Median :0.27133
## Mean :5.370 Mean :23.97 Mean :0.27712
## 3rd Qu.:5.453 3rd Qu.:24.03 3rd Qu.:0.31200
## Max. :5.700 Max. :24.32 Max. :0.47800
## NA's :10 NA's :38 NA's :39
## Carb Pressure Carb Temp PSC PSC Fill
## Min. :57.00 Min. :128.6 Min. :0.00200 Min. :0.0000
## 1st Qu.:65.60 1st Qu.:138.4 1st Qu.:0.04800 1st Qu.:0.1000
## Median :68.20 Median :140.8 Median :0.07600 Median :0.1800
## Mean :68.19 Mean :141.1 Mean :0.08457 Mean :0.1954
## 3rd Qu.:70.60 3rd Qu.:143.8 3rd Qu.:0.11200 3rd Qu.:0.2600
## Max. :79.40 Max. :154.0 Max. :0.27000 Max. :0.6200
## NA's :27 NA's :26 NA's :33 NA's :23
## PSC CO2 Mnf Flow Carb Pressure1 Fill Pressure
## Min. :0.00000 Min. :-100.20 Min. :105.6 Min. :34.60
## 1st Qu.:0.02000 1st Qu.:-100.00 1st Qu.:119.0 1st Qu.:46.00
## Median :0.04000 Median : 65.20 Median :123.2 Median :46.40
## Mean :0.05641 Mean : 24.57 Mean :122.6 Mean :47.92
## 3rd Qu.:0.08000 3rd Qu.: 140.80 3rd Qu.:125.4 3rd Qu.:50.00
## Max. :0.24000 Max. : 229.40 Max. :140.2 Max. :60.40
## NA's :39 NA's :2 NA's :32 NA's :22
## Hyd Pressure1 Hyd Pressure2 Hyd Pressure3 Hyd Pressure4
## Min. :-0.80 Min. : 0.00 Min. :-1.20 Min. : 52.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 86.00
## Median :11.40 Median :28.60 Median :27.60 Median : 96.00
## Mean :12.44 Mean :20.96 Mean :20.46 Mean : 96.29
## 3rd Qu.:20.20 3rd Qu.:34.60 3rd Qu.:33.40 3rd Qu.:102.00
## Max. :58.00 Max. :59.40 Max. :50.00 Max. :142.00
## NA's :11 NA's :15 NA's :15 NA's :30
## Filler Level Filler Speed Temperature Usage cont Carb Flow
## Min. : 55.8 Min. : 998 Min. :63.60 Min. :12.08 Min. : 26
## 1st Qu.: 98.3 1st Qu.:3888 1st Qu.:65.20 1st Qu.:18.36 1st Qu.:1144
## Median :118.4 Median :3982 Median :65.60 Median :21.79 Median :3028
## Mean :109.3 Mean :3687 Mean :65.97 Mean :20.99 Mean :2468
## 3rd Qu.:120.0 3rd Qu.:3998 3rd Qu.:66.40 3rd Qu.:23.75 3rd Qu.:3186
## Max. :161.2 Max. :4030 Max. :76.20 Max. :25.90 Max. :5104
## NA's :20 NA's :57 NA's :14 NA's :5 NA's :2
## Density MFR Balling Pressure Vacuum
## Min. :0.240 Min. : 31.4 Min. :-0.170 Min. :-6.600
## 1st Qu.:0.900 1st Qu.:706.3 1st Qu.: 1.496 1st Qu.:-5.600
## Median :0.980 Median :724.0 Median : 1.648 Median :-5.400
## Mean :1.174 Mean :704.0 Mean : 2.198 Mean :-5.216
## 3rd Qu.:1.620 3rd Qu.:731.0 3rd Qu.: 3.292 3rd Qu.:-5.000
## Max. :1.920 Max. :868.6 Max. : 4.012 Max. :-3.600
## NA's :1 NA's :212 NA's :1
## PH Oxygen Filler Bowl Setpoint Pressure Setpoint
## Min. :7.880 Min. :0.00240 Min. : 70.0 Min. :44.00
## 1st Qu.:8.440 1st Qu.:0.02200 1st Qu.:100.0 1st Qu.:46.00
## Median :8.540 Median :0.03340 Median :120.0 Median :46.00
## Mean :8.546 Mean :0.04684 Mean :109.3 Mean :47.62
## 3rd Qu.:8.680 3rd Qu.:0.06000 3rd Qu.:120.0 3rd Qu.:50.00
## Max. :9.360 Max. :0.40000 Max. :140.0 Max. :52.00
## NA's :4 NA's :12 NA's :2 NA's :12
## Air Pressurer Alch Rel Carb Rel Balling Lvl
## Min. :140.8 Min. :5.280 Min. :4.960 Min. :0.00
## 1st Qu.:142.2 1st Qu.:6.540 1st Qu.:5.340 1st Qu.:1.38
## Median :142.6 Median :6.560 Median :5.400 Median :1.48
## Mean :142.8 Mean :6.897 Mean :5.437 Mean :2.05
## 3rd Qu.:143.0 3rd Qu.:7.240 3rd Qu.:5.540 3rd Qu.:3.14
## Max. :148.2 Max. :8.620 Max. :6.060 Max. :3.66
## NA's :9 NA's :10 NA's :1
dim(train)
## [1] 2571 33
plot_missing(train)
plot_histogram(train)
plot_density(train)
plot_boxplot(
data = train,
by = "PH")
## Warning: Removed 302 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 372 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 46 rows containing non-finite values (`stat_boxplot()`).
library("tidyr")
train_data = train %>% drop_na()
glimpse(train_data)
## Rows: 2,038
## Columns: 33
## $ `Brand Code` <chr> "A", "A", "B", "B", "B", "B", "B", "B", "B", "C", …
## $ `Carb Volume` <dbl> 5.486667, 5.380000, 5.246667, 5.266667, 5.320000, …
## $ `Fill Ounces` <dbl> 24.31333, 23.92667, 23.98000, 24.00667, 23.92000, …
## $ `PC Volume` <dbl> 0.1113333, 0.2693333, 0.2626667, 0.2313333, 0.2586…
## $ `Carb Pressure` <dbl> 67.2, 66.6, 64.2, 72.0, 66.2, 61.6, 71.6, 72.6, 68…
## $ `Carb Temp` <dbl> 136.8, 138.4, 140.2, 147.4, 139.4, 132.8, 147.8, 1…
## $ PSC <dbl> 0.026, 0.090, 0.132, 0.014, 0.078, 0.110, 0.096, 0…
## $ `PSC Fill` <dbl> 0.16, 0.24, 0.12, 0.24, 0.18, 0.18, 0.22, 0.36, 0.…
## $ `PSC CO2` <dbl> 0.12, 0.04, 0.14, 0.06, 0.04, 0.02, 0.04, 0.08, 0.…
## $ `Mnf Flow` <dbl> -100, -100, -100, -100, -100, -100, -100, -100, -1…
## $ `Carb Pressure1` <dbl> 118.4, 119.6, 120.8, 119.8, 119.6, 119.2, 113.6, 1…
## $ `Fill Pressure` <dbl> 45.8, 45.6, 46.0, 45.2, 46.6, 46.6, 46.0, 46.6, 47…
## $ `Hyd Pressure1` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure2` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure3` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure4` <dbl> 92, 116, 90, 108, 94, 86, 94, 92, 96, 92, 94, 98, …
## $ `Filler Level` <dbl> 118.6, 120.2, 120.2, 120.8, 119.6, 119.6, 120.0, 1…
## $ `Filler Speed` <dbl> 4010, 4014, 4014, 4028, 4020, 4012, 4012, 4010, 40…
## $ Temperature <dbl> 65.6, 66.2, 65.4, 66.6, 65.0, 65.4, 65.0, 65.0, 65…
## $ `Usage cont` <dbl> 17.68, 23.82, 18.40, 13.50, 19.04, 18.44, 23.44, 2…
## $ `Carb Flow` <dbl> 3054, 2948, 2902, 3038, 3056, 3110, 3040, 3056, 32…
## $ Density <dbl> 1.54, 1.52, 0.90, 0.90, 0.90, 0.92, 0.92, 0.90, 0.…
## $ MFR <dbl> 722.8, 738.8, 740.4, 692.4, 727.0, 735.0, 731.0, 7…
## $ Balling <dbl> 3.042, 2.992, 1.446, 1.448, 1.448, 1.498, 1.498, 1…
## $ `Pressure Vacuum` <dbl> -4.4, -4.4, -4.4, -4.4, -4.4, -4.4, -4.4, -4.4, -4…
## $ PH <dbl> 8.26, 8.32, 8.38, 8.50, 8.34, 8.34, 8.38, 8.40, 8.…
## $ `Oxygen Filler` <dbl> 0.030, 0.024, 0.064, 0.022, 0.030, 0.058, 0.046, 0…
## $ `Bowl Setpoint` <dbl> 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, …
## $ `Pressure Setpoint` <dbl> 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46…
## $ `Air Pressurer` <dbl> 146.2, 146.6, 147.2, 146.2, 146.2, 146.8, 146.8, 1…
## $ `Alch Rel` <dbl> 7.14, 7.16, 6.52, 6.54, 6.52, 6.52, 6.52, 6.52, 6.…
## $ `Carb Rel` <dbl> 5.44, 5.44, 5.34, 5.34, 5.34, 5.34, 5.34, 5.34, 5.…
## $ `Balling Lvl` <dbl> 3.04, 3.02, 1.44, 1.38, 1.44, 1.44, 1.44, 1.44, 1.…
training_partition = createDataPartition(train_data$PH,p=.8,list=FALSE)
training_df = train_data[training_partition,]
validation_df = train_data[-training_partition,]
olrmod = train(PH~.,data = training_df,method='lm',trControl=trainControl('cv',number=10))
olrpred = predict(olrmod,validation_df)
olrpred_results = postResample(pred = olrpred, obs = validation_df$PH)
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:corrplot':
##
## corrplot
## The following object is masked from 'package:caret':
##
## R2
## The following object is masked from 'package:stats':
##
## loadings
pls_mod = train(PH~.,data = training_df,method='pls',trControl=trainControl('cv',number=10),center=T,tunelength=20)
plot(pls_mod)
plspred = predict(pls_mod,validation_df)
plspred_results = postResample(pred = plspred, obs = validation_df$PH)
library(elasticnet)
## Loading required package: lars
## Loaded lars 1.3
##
## Attaching package: 'lars'
## The following object is masked from 'package:psych':
##
## error.bars
set.seed(100)
rrfit = train(PH~.,data=training_df,method='ridge',tuneGrid=data.frame(.lambda=seq(0,.1,length=15)),trControl=trainControl('cv',number=10))
plot(rrfit)
rrpred = predict(rrfit,validation_df)
rrpred_results = postResample(pred = rrpred, obs = validation_df$PH)
set.seed(100)
nnetmod = train(PH~.,data=training_df,
method = "avNNet",
preProc = c("center", "scale"),
tuneGrid = expand.grid( .decay = c(0, 0.01, .1), .size = c(1:10), .bag= F ),
trControl = trainControl(method = "cv",number = 10),
linout = T,
trace= F,
MaxNWts = 5 * (ncol(training_df) + 1) + 5 + 1,
maxit = 500)
nnetpred = predict(nnetmod,validation_df)
nnetpred_results = postResample(pred = nnetpred, obs = validation_df$PH)
set.seed(100)
marsmod = train(PH~.,data=training_df,method='earth',trControl=trainControl(method='cv'))
marspred = predict(marsmod,validation_df)
marspred_results = postResample(pred = marspred, obs = validation_df$PH)
set.seed(100)
svmmod = train(PH~., data = training_df,
method= "svmLinear",
trControl = trainControl(method = "repeatedcv",number = 10, repeats=3),
tuneLength = 10)
svmpred = predict(svmmod,validation_df)
svmpred_results = postResample(pred = svmpred, obs = validation_df$PH)
set.seed(100)
knnmod = train(PH~., data = training_df,
method= "knn",
trControl = trainControl(method = "repeatedcv",number = 10, repeats=3),
tuneLength = 10)
knnpred = predict(knnmod,validation_df)
knnpred_results = postResample(pred = knnpred, obs = validation_df$PH)
randomforest_model = train(PH~.,data=training_df,method='rf',
preProcess = c('center', 'scale'),trControl=trainControl('cv'))
randomforest_pred = predict(randomforest_model, validation_df)
randomforest_results = postResample(pred = randomforest_pred, obs = validation_df$PH)
cub_model = train(PH~.,data=training_df,method='cubist',
preProcess = c('center', 'scale'),trControl=trainControl('cv'))
cub_pred = predict(cub_model, validation_df)
cub_results = postResample(pred = cub_pred, obs = validation_df$PH)
Aggregating the model results we see our Cubist model performs the best among the selected model types. There is of course the risk of over fitting, however the cubist model we discerned to be best fit in this scenario.
titles <- list("OLR","PLS","RR","NNet","MARS","SVM","KNN","Random Forest","Cubist")
results <- list(olrpred_results,plspred_results,rrpred_results,nnetpred_results,marspred_results,svmpred_results,knnpred_results,randomforest_results,cub_results)
df = NULL
for (x in 1:length(results)) {
Model = titles[[x]]
RMSE = unname(results[[x]]["RMSE"])
Rsquared = unname(results[[x]]["Rsquared"])
MAE = unname(results[[x]]["MAE"])
df = rbind(df,data.frame(Model,RMSE,Rsquared,MAE))
}
knitr::kable(df[order(df$Rsquared,decreasing=TRUE),], digits=4, row.names=F)
| Model | RMSE | Rsquared | MAE |
|---|---|---|---|
| Cubist | 0.0831 | 0.7665 | 0.0622 |
| Random Forest | 0.0914 | 0.7272 | 0.0665 |
| NNet | 0.1048 | 0.6268 | 0.0792 |
| MARS | 0.1223 | 0.4921 | 0.0935 |
| RR | 0.1290 | 0.4345 | 0.1010 |
| OLR | 0.1290 | 0.4345 | 0.1010 |
| SVM | 0.1302 | 0.4246 | 0.0999 |
| KNN | 0.1323 | 0.4139 | 0.0937 |
| PLS | 0.1537 | 0.1963 | 0.1197 |
ggplot(data=df,aes(x=Model,y=RMSE)) +geom_bar(stat='identity',aes(fill=Model))+ coord_flip()
ggplot(data=df,aes(x=Model,y=Rsquared)) +geom_bar(stat='identity',aes(fill=Model))+ coord_flip()
ggplot(data=df,aes(x=Model,y=MAE)) +geom_bar(stat='identity',aes(fill=Model))+ coord_flip()
First the data is loaded and reviewed. Using the glimpse
function to take a look to the test data set.
#load test data
library(readxl)
test <- read_excel("/Users/anjalhussan/Documents/CUNY/DATA624/Project2/TestData.xlsx")
#subset features from response
test_features <- test %>%
dplyr::select(-c(PH))
#review
glimpse(test_features)
## Rows: 267
## Columns: 32
## $ `Brand Code` <chr> "D", "A", "B", "B", "B", "B", "A", "B", "A", "D", …
## $ `Carb Volume` <dbl> 5.480000, 5.393333, 5.293333, 5.266667, 5.406667, …
## $ `Fill Ounces` <dbl> 24.03333, 23.95333, 23.92000, 23.94000, 24.20000, …
## $ `PC Volume` <dbl> 0.2700000, 0.2266667, 0.3033333, 0.1860000, 0.1600…
## $ `Carb Pressure` <dbl> 65.4, 63.2, 66.4, 64.8, 69.4, 73.4, 65.2, 67.4, 66…
## $ `Carb Temp` <dbl> 134.6, 135.0, 140.4, 139.0, 142.2, 147.2, 134.6, 1…
## $ PSC <dbl> 0.236, 0.042, 0.068, 0.004, 0.040, 0.078, 0.088, 0…
## $ `PSC Fill` <dbl> 0.40, 0.22, 0.10, 0.20, 0.30, 0.22, 0.14, 0.10, 0.…
## $ `PSC CO2` <dbl> 0.04, 0.08, 0.02, 0.02, 0.06, NA, 0.00, 0.04, 0.04…
## $ `Mnf Flow` <dbl> -100, -100, -100, -100, -100, -100, -100, -100, -1…
## $ `Carb Pressure1` <dbl> 116.6, 118.8, 120.2, 124.8, 115.0, 118.6, 117.6, 1…
## $ `Fill Pressure` <dbl> 46.0, 46.2, 45.8, 40.0, 51.4, 46.4, 46.2, 40.0, 43…
## $ `Hyd Pressure1` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure2` <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Hyd Pressure3` <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Hyd Pressure4` <dbl> 96, 112, 98, 132, 94, 94, 108, 108, 110, 106, 98, …
## $ `Filler Level` <dbl> 129.4, 120.0, 119.4, 120.2, 116.0, 120.4, 119.6, 1…
## $ `Filler Speed` <dbl> 3986, 4012, 4010, NA, 4018, 4010, 4010, NA, 4010, …
## $ Temperature <dbl> 66.0, 65.6, 65.6, 74.4, 66.4, 66.6, 66.8, NA, 65.8…
## $ `Usage cont` <dbl> 21.66, 17.60, 24.18, 18.12, 21.32, 18.00, 17.68, 1…
## $ `Carb Flow` <dbl> 2950, 2916, 3056, 28, 3214, 3064, 3042, 1972, 2502…
## $ Density <dbl> 0.88, 1.50, 0.90, 0.74, 0.88, 0.84, 1.48, 1.60, 1.…
## $ MFR <dbl> 727.6, 735.8, 734.8, NA, 752.0, 732.0, 729.8, NA, …
## $ Balling <dbl> 1.398, 2.942, 1.448, 1.056, 1.398, 1.298, 2.894, 3…
## $ `Pressure Vacuum` <dbl> -3.8, -4.4, -4.2, -4.0, -4.0, -3.8, -4.2, -4.4, -4…
## $ `Oxygen Filler` <dbl> 0.022, 0.030, 0.046, NA, 0.082, 0.064, 0.042, 0.09…
## $ `Bowl Setpoint` <dbl> 130, 120, 120, 120, 120, 120, 120, 120, 120, 120, …
## $ `Pressure Setpoint` <dbl> 45.2, 46.0, 46.0, 46.0, 50.0, 46.0, 46.0, 46.0, 46…
## $ `Air Pressurer` <dbl> 142.6, 147.2, 146.6, 146.4, 145.8, 146.0, 145.0, 1…
## $ `Alch Rel` <dbl> 6.56, 7.14, 6.52, 6.48, 6.50, 6.50, 7.18, 7.16, 7.…
## $ `Carb Rel` <dbl> 5.34, 5.58, 5.34, 5.50, 5.38, 5.42, 5.46, 5.42, 5.…
## $ `Balling Lvl` <dbl> 1.48, 3.04, 1.46, 1.48, 1.46, 1.44, 3.02, 3.00, 3.…
summary(test)
## Brand Code Carb Volume Fill Ounces PC Volume
## Length:267 Min. :5.147 Min. :23.75 Min. :0.09867
## Class :character 1st Qu.:5.287 1st Qu.:23.92 1st Qu.:0.23333
## Mode :character Median :5.340 Median :23.97 Median :0.27533
## Mean :5.369 Mean :23.97 Mean :0.27769
## 3rd Qu.:5.465 3rd Qu.:24.01 3rd Qu.:0.32200
## Max. :5.667 Max. :24.20 Max. :0.46400
## NA's :1 NA's :6 NA's :4
## Carb Pressure Carb Temp PSC PSC Fill
## Min. :60.20 Min. :130.0 Min. :0.00400 Min. :0.0200
## 1st Qu.:65.30 1st Qu.:138.4 1st Qu.:0.04450 1st Qu.:0.1000
## Median :68.00 Median :140.8 Median :0.07600 Median :0.1800
## Mean :68.25 Mean :141.2 Mean :0.08545 Mean :0.1903
## 3rd Qu.:70.60 3rd Qu.:143.8 3rd Qu.:0.11200 3rd Qu.:0.2600
## Max. :77.60 Max. :154.0 Max. :0.24600 Max. :0.6200
## NA's :1 NA's :5 NA's :3
## PSC CO2 Mnf Flow Carb Pressure1 Fill Pressure
## Min. :0.00000 Min. :-100.20 Min. :113.0 Min. :37.80
## 1st Qu.:0.02000 1st Qu.:-100.00 1st Qu.:120.2 1st Qu.:46.00
## Median :0.04000 Median : 0.20 Median :123.4 Median :47.80
## Mean :0.05107 Mean : 21.03 Mean :123.0 Mean :48.14
## 3rd Qu.:0.06000 3rd Qu.: 141.30 3rd Qu.:125.5 3rd Qu.:50.20
## Max. :0.24000 Max. : 220.40 Max. :136.0 Max. :60.20
## NA's :5 NA's :4 NA's :2
## Hyd Pressure1 Hyd Pressure2 Hyd Pressure3 Hyd Pressure4
## Min. :-50.00 Min. :-50.00 Min. :-50.00 Min. : 68.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 90.00
## Median : 10.40 Median : 26.80 Median : 27.70 Median : 98.00
## Mean : 12.01 Mean : 20.11 Mean : 19.61 Mean : 97.84
## 3rd Qu.: 20.40 3rd Qu.: 34.80 3rd Qu.: 33.00 3rd Qu.:104.00
## Max. : 50.00 Max. : 61.40 Max. : 49.20 Max. :140.00
## NA's :1 NA's :1 NA's :4
## Filler Level Filler Speed Temperature Usage cont Carb Flow
## Min. : 69.2 Min. :1006 Min. :63.80 Min. :12.90 Min. : 0
## 1st Qu.:100.6 1st Qu.:3812 1st Qu.:65.40 1st Qu.:18.12 1st Qu.:1083
## Median :118.6 Median :3978 Median :65.80 Median :21.44 Median :3038
## Mean :110.3 Mean :3581 Mean :66.23 Mean :20.90 Mean :2409
## 3rd Qu.:120.2 3rd Qu.:3996 3rd Qu.:66.60 3rd Qu.:23.74 3rd Qu.:3215
## Max. :153.2 Max. :4020 Max. :75.40 Max. :24.60 Max. :3858
## NA's :2 NA's :10 NA's :2 NA's :2
## Density MFR Balling Pressure Vacuum
## Min. :0.060 Min. : 15.6 Min. :0.902 Min. :-6.400
## 1st Qu.:0.920 1st Qu.:707.0 1st Qu.:1.498 1st Qu.:-5.600
## Median :0.980 Median :724.6 Median :1.648 Median :-5.200
## Mean :1.177 Mean :697.8 Mean :2.203 Mean :-5.174
## 3rd Qu.:1.600 3rd Qu.:731.5 3rd Qu.:3.242 3rd Qu.:-4.800
## Max. :1.840 Max. :784.8 Max. :3.788 Max. :-3.600
## NA's :1 NA's :31 NA's :1 NA's :1
## PH Oxygen Filler Bowl Setpoint Pressure Setpoint
## Mode:logical Min. :0.00240 Min. : 70.0 Min. :44.00
## NA's:267 1st Qu.:0.01960 1st Qu.:100.0 1st Qu.:46.00
## Median :0.03370 Median :120.0 Median :46.00
## Mean :0.04666 Mean :109.6 Mean :47.73
## 3rd Qu.:0.05440 3rd Qu.:120.0 3rd Qu.:50.00
## Max. :0.39800 Max. :130.0 Max. :52.00
## NA's :3 NA's :1 NA's :2
## Air Pressurer Alch Rel Carb Rel Balling Lvl
## Min. :141.2 Min. :6.400 Min. :5.18 Min. :0.000
## 1st Qu.:142.2 1st Qu.:6.540 1st Qu.:5.34 1st Qu.:1.380
## Median :142.6 Median :6.580 Median :5.40 Median :1.480
## Mean :142.8 Mean :6.907 Mean :5.44 Mean :2.051
## 3rd Qu.:142.8 3rd Qu.:7.180 3rd Qu.:5.56 3rd Qu.:3.080
## Max. :147.2 Max. :7.820 Max. :5.74 Max. :3.420
## NA's :1 NA's :3 NA's :2
In order to run our model using the Cubist model, the test data needs to be cleaned in a similar fashion to that of train data used to run the model.
Visualize missing data from Test Dataset
plot_missing(test)
test_noph_nona= test_features %>% drop_na()
glimpse(test_noph_nona)
## Rows: 200
## Columns: 32
## $ `Brand Code` <chr> "A", "B", "B", "A", "A", "B", "B", "B", "C", "B", …
## $ `Carb Volume` <dbl> 5.393333, 5.293333, 5.406667, 5.480000, 5.406667, …
## $ `Fill Ounces` <dbl> 23.95333, 23.92000, 24.20000, 23.93333, 23.92000, …
## $ `PC Volume` <dbl> 0.2266667, 0.3033333, 0.1600000, 0.2433333, 0.3326…
## $ `Carb Pressure` <dbl> 63.2, 66.4, 69.4, 65.2, 66.8, 63.2, 65.0, 63.8, 64…
## $ `Carb Temp` <dbl> 135.0, 140.4, 142.2, 134.6, 138.0, 139.6, 138.8, 1…
## $ PSC <dbl> 0.042, 0.068, 0.040, 0.088, 0.246, 0.184, 0.152, 0…
## $ `PSC Fill` <dbl> 0.22, 0.10, 0.30, 0.14, 0.48, 0.26, 0.12, 0.18, 0.…
## $ `PSC CO2` <dbl> 0.08, 0.02, 0.06, 0.00, 0.04, 0.20, 0.00, 0.02, 0.…
## $ `Mnf Flow` <dbl> -100, -100, -100, -100, -100, -100, -100, -100, -1…
## $ `Carb Pressure1` <dbl> 118.8, 120.2, 115.0, 117.6, 136.0, 117.2, 117.0, 1…
## $ `Fill Pressure` <dbl> 46.2, 45.8, 51.4, 46.2, 43.8, 46.2, 45.8, 46.4, 46…
## $ `Hyd Pressure1` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure2` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure3` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Hyd Pressure4` <dbl> 112, 98, 94, 108, 110, 96, 100, 100, 92, 90, 90, 7…
## $ `Filler Level` <dbl> 120.0, 119.4, 116.0, 119.6, 121.0, 118.4, 119.6, 1…
## $ `Filler Speed` <dbl> 4012, 4010, 4018, 4010, 4010, 4010, 4010, 4016, 40…
## $ Temperature <dbl> 65.6, 65.6, 66.4, 66.8, 65.8, 65.8, 65.4, 65.6, 67…
## $ `Usage cont` <dbl> 17.60, 24.18, 21.32, 17.68, 17.70, 17.16, 20.52, 2…
## $ `Carb Flow` <dbl> 2916, 3056, 3214, 3042, 2502, 3100, 2926, 2954, 30…
## $ Density <dbl> 1.50, 0.90, 0.88, 1.48, 1.52, 0.86, 0.92, 0.94, 0.…
## $ MFR <dbl> 735.8, 734.8, 752.0, 729.8, 741.2, 735.8, 735.6, 7…
## $ Balling <dbl> 2.942, 1.448, 1.398, 2.894, 2.992, 1.348, 1.498, 1…
## $ `Pressure Vacuum` <dbl> -4.4, -4.2, -4.0, -4.2, -4.4, -4.2, -4.8, -4.8, -4…
## $ `Oxygen Filler` <dbl> 0.030, 0.046, 0.082, 0.042, 0.046, 0.048, 0.066, 0…
## $ `Bowl Setpoint` <dbl> 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, …
## $ `Pressure Setpoint` <dbl> 46, 46, 50, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46…
## $ `Air Pressurer` <dbl> 147.2, 146.6, 145.8, 145.0, 146.2, 147.0, 147.0, 1…
## $ `Alch Rel` <dbl> 7.14, 6.52, 6.50, 7.18, 7.14, 6.50, 6.54, 6.54, 6.…
## $ `Carb Rel` <dbl> 5.58, 5.34, 5.38, 5.46, 5.44, 5.38, 5.28, 5.22, 5.…
## $ `Balling Lvl` <dbl> 3.04, 1.46, 1.46, 3.02, 3.10, 1.42, 1.46, 1.44, 1.…
eval_cub_pred = predict(cub_model, test_noph_nona)
scaled_ph = scale(train_data$PH, center= TRUE, scale=TRUE)
scaledeval_predictions = eval_cub_pred * attr(scaled_ph, 'scaled:scale') + attr(scaled_ph, 'scaled:center')
eval_df = data.frame(scaledeval_predictions)
#write.xlsx(eval_df, 'FinalProject_Predictions.xlsx')
d = data.frame(
eval_df,
stringsAsFactors = FALSE
)
dt <- datatable(d, options = list(pageLength = 25))
dt