Developing a model to predict permeability could save significant resources for a pharmaceutical company while more rapidly identifying molecules that have sufficient permeability to become a drug.

a) Start R and use these commands to load the data:

#loaded relevant libraries
library(AppliedPredictiveModeling)
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(caTools)
library(elasticnet)
## Loading required package: lars
## Loaded lars 1.3
library(lars)
library(MASS)
library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
## 
##     R2
## The following object is masked from 'package:stats':
## 
##     loadings
library(stats)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ purrr::lift()   masks caret::lift()
## ✖ dplyr::select() masks MASS::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tsibble     1.1.5     ✔ feasts      0.4.1
## ✔ tsibbledata 0.4.1     ✔ fable       0.4.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ purrr::lift()        masks caret::lift()
## ✖ fabletools::MAE()    masks caret::MAE()
## ✖ fabletools::RMSE()   masks caret::RMSE()
## ✖ dplyr::select()      masks MASS::select()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(fable)
library(ggplot2)
library(e1071)
## 
## Attaching package: 'e1071'
## 
## The following object is masked from 'package:fabletools':
## 
##     interpolate
library(lattice)
library(corrplot)
## corrplot 0.95 loaded
## 
## Attaching package: 'corrplot'
## 
## The following object is masked from 'package:pls':
## 
##     corrplot
library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## 
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
## 
## Attaching package: 'VIM'
## 
## The following object is masked from 'package:datasets':
## 
##     sleep
data(permeability)
#loaded data put into data frame
p_df <- as.data.frame(permeability)

class(permeability) #matrix
## [1] "matrix" "array"
class(p_df) #data frame
## [1] "data.frame"
fingerprints <- fingerprints #no-op: data(permeability) already loaded the fingerprints matrix
head(p_df) #dataframe of the permeability
##   permeability
## 1       12.520
## 2        1.120
## 3       19.405
## 4        1.730
## 5        1.680
## 6        0.510

The data are loaded: permeability holds the response and fingerprints holds the predictors.

  1. The fingerprint predictors indicate the presence or absence of substructures of a molecule and are often sparse, meaning that relatively few of the molecules contain each substructure. Filter out the predictors that have low frequencies using the nearZeroVar function from the caret package. How many predictors are left for modeling?
#removing the columns with near zero variance / low frequency diversity
filter_fingerprints <- fingerprints[, -nearZeroVar(fingerprints)]
#how many predictors were there?
dim(fingerprints)
## [1]  165 1107
#how many predictors are left? 
dim(filter_fingerprints)
## [1] 165 388

After filtering, 388 of the original 1107 predictors remain.
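The filter above can be illustrated with a small base-R sketch. It assumes the two checks and the default thresholds documented for caret's nearZeroVar() (freqCut = 95/5, uniqueCut = 10). The helper near_zero_var() and the toy matrix are hypothetical, and this is not caret's actual implementation.

```r
# Base-R sketch of the two checks behind a near-zero-variance filter.
# Thresholds mirror caret's documented defaults (freqCut = 95/5, uniqueCut = 10).
near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  # Ratio of the most common value to the second most common value
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else Inf
  # Distinct values as a percentage of the number of samples
  pct_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && pct_unique <= unique_cut
}

# Toy binary fingerprints (hypothetical data, 100 molecules)
toy <- cbind(
  rare     = c(rep(0, 98), rep(1, 2)),   # 98:2 split, flagged
  common   = rep(c(0, 1), times = 50),   # 50:50 split, kept
  constant = rep(0, 100)                 # a single value, flagged
)
flagged <- apply(toy, 2, near_zero_var)
print(flagged)

# Logical indexing keeps the unflagged columns
kept <- toy[, !flagged, drop = FALSE]
```

Indexing with a logical vector, as in the last line, also behaves sensibly when nothing is flagged. Negating an empty index vector, as in x[, -integer(0)], would instead select no columns at all.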

  1. Split the data into a training and a test set, preprocess the data, and tune a PLS model. How many latent variables are optimal and what is the corresponding resampled estimate of R^2?
set.seed(987654321) 
#splitting the permeability data into a 75% training and a 25% test set
#select indices to use for extracting the training data
train_indices <- createDataPartition(p_df$permeability, p = 0.75, list = FALSE)

# use the indices to select the data
train_p <- permeability[train_indices, ]
test_p <- permeability[-train_indices, ]
train_fp <- filter_fingerprints[train_indices, ]
test_fp <- filter_fingerprints[-train_indices, ]

#20-fold cross-validation for resampling
ctrl <- trainControl(method = "cv", number = 20)
#fit PLS models with 1 to 20 latent variables
plsTune <- train(x = train_fp, y = train_p,
                 method = "pls",
                 tuneGrid = expand.grid(ncomp = 1:20), #candidate numbers of latent variables
                 trControl = ctrl,
                 preProc = c("center", "scale")
                 ) #by default caret selects the model with the lowest RMSE
#view info of the model
print(plsTune)
## Partial Least Squares 
## 
## 125 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 119, 119, 119, 119, 120, 119, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE      
##    1     13.25976  0.4464616  10.358836
##    2     12.00416  0.4872695   8.841882
##    3     11.87550  0.4924602   9.254769
##    4     11.43491  0.5164738   9.116062
##    5     11.16276  0.5290583   8.804034
##    6     11.13298  0.5525563   8.774824
##    7     11.02419  0.5726444   8.619423
##    8     10.77778  0.5980230   8.576113
##    9     10.68594  0.5855937   8.398050
##   10     10.59289  0.5959762   8.263912
##   11     10.68505  0.5886411   8.364086
##   12     10.67276  0.5970798   8.292205
##   13     10.82059  0.5846147   8.455137
##   14     11.22155  0.5698572   8.788628
##   15     11.27470  0.5657656   8.805156
##   16     11.64483  0.5549272   9.108120
##   17     11.83424  0.5497440   9.165310
##   18     12.12880  0.5405549   9.290222
##   19     12.15433  0.5342537   9.371949
##   20     12.38511  0.5274388   9.552238
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 10.
#see the iterations of the latent variables and their performance plotted
plot(plsTune)

#second run that selects on Rsquared instead of RMSE
#the seed is not reset here, so the cross-validation folds differ from the first run
plsTune2 <- train(x = train_fp, y = train_p,
                 method = "pls",
                 tuneGrid = expand.grid(ncomp = 1:20),
                 metric = 'Rsquared', #optimizing Rsquared
                 trControl = ctrl,
                 preProc = c("center", "scale")
                 )

print(plsTune2)
## Partial Least Squares 
## 
## 125 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 119, 118, 119, 119, 121, 118, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE      
##    1     13.27651  0.3819973  10.242590
##    2     11.93108  0.5312996   8.955680
##    3     11.76173  0.5223206   9.282503
##    4     11.69984  0.5377856   9.450407
##    5     11.47535  0.5417391   9.147189
##    6     11.43199  0.5295197   9.024695
##    7     11.30985  0.5559088   8.987647
##    8     11.27594  0.5740494   9.053436
##    9     11.42119  0.5845358   8.949448
##   10     11.41917  0.5638812   8.936092
##   11     11.70490  0.5539041   9.099776
##   12     11.83443  0.5437615   9.110245
##   13     12.01633  0.5314545   9.321229
##   14     12.20600  0.5074832   9.477129
##   15     12.50586  0.4957734   9.714224
##   16     12.80412  0.4768489   9.942856
##   17     13.03100  0.4794901  10.123784
##   18     13.13572  0.4762658  10.172648
##   19     13.31113  0.4670994  10.187537
##   20     13.31420  0.4661427  10.283183
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 9.
plot(plsTune2)
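The two tuning tables above report different numbers for the same ncomp because the random fold assignment was not repeated between runs. A minimal base-R sketch of that effect follows. make_folds() is a hypothetical helper, not a caret function; the sizes 125 and 20 match the training set and fold count used above.

```r
# Base-R sketch: assign each of n rows to one of k folds at random.
# make_folds() is a hypothetical helper, not a caret function.
make_folds <- function(n, k) sample(rep_len(seq_len(k), n))

set.seed(1)
folds_a <- make_folds(125, 20)
folds_b <- make_folds(125, 20)   # no reseed: the RNG state has advanced

set.seed(1)
folds_c <- make_folds(125, 20)   # reseeded: reproduces folds_a

identical(folds_a, folds_c)      # same seed, same folds
identical(folds_a, folds_b)      # almost surely different folds
```

Under the same assumption, calling set.seed() with the same value immediately before each train() call should make the resampled tables directly comparable across selection metrics.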

With 20 candidate values of ncomp, the model selected by the default criterion (lowest RMSE) uses 10 latent variables, with a resampled RMSE of 10.59289 and Rsquared of 0.5959762. The model selected by highest Rsquared uses 9 latent variables, with a resampled Rsquared of 0.5845358. The two runs used different cross-validation folds because the seed was not reset between them, so their tables are not directly comparable. Within the first run, ncomp = 10 has the lowest RMSE and an Rsquared close to that run's maximum (0.5980230 at ncomp = 8), so 10 latent variables is a reasonable choice.

d. Predict the response for the test set. What is the test set estimate of Rsquared?

My initial expectation was that the model would perform on the test set about as well as it did in resampling, with an Rsquared near 0.59.

#predict with the first PLS model that optimized RMSE
fp_predict <- predict(plsTune, test_fp)
#test to see how well the predictions did
postResample(fp_predict, test_p)
##      RMSE  Rsquared       MAE 
## 10.887182  0.457854  8.165262
# predict with the PLS model that optimized Rsquared
fp_predict2 <- predict(plsTune2, test_fp)
#test predictions
postResample(fp_predict2, test_p)
##       RMSE   Rsquared        MAE 
## 10.6318507  0.4760463  7.9407355
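For reference, the three numbers printed by postResample() can be reproduced by hand. The sketch below assumes Rsquared is the squared Pearson correlation between predictions and observations, which is how caret documents it. regression_metrics() is a hypothetical helper and the data are made up for illustration.

```r
# Base-R sketch of the three metrics reported above.
# Rsquared is taken as the squared Pearson correlation of predicted and observed values.
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(pred - obs)))
}

# Hypothetical values, for illustration only
obs  <- c(1, 2, 3, 4, 5)
pred <- c(1.5, 1.5, 3.5, 3.5, 5.5)
m <- regression_metrics(pred, obs)
print(round(m, 4))

# A constant bias leaves the correlation untouched but inflates the error
shifted <- regression_metrics(obs + 10, obs)
print(shifted)
```

The second call shows why the two metrics can rank models differently: predictions that are perfectly correlated with the truth but offset by 10 have an Rsquared of 1 and an RMSE of 10.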

Surprisingly, the model with fewer latent variables (9) performed slightly better on the test set (Rsquared 0.4760463 vs 0.457854), even though its resampled performance on the training set was worse. The difference between the two is small.

Try other models. Do any perform better?

set.seed(3)
lmTune <- train(x = train_fp, y = train_p,
                 method= "lm",
                 trControl = ctrl,
                 preProc = c("center", "scale")
                 )
## Warning in predict.lm(modelFit, newdata): prediction from rank-deficient fit;
## attr(*, "non-estim") has doubtful cases
print(lmTune)
## Linear Regression 
## 
## 125 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 119, 118, 119, 120, 118, 120, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   60.45876  0.3249167  36.69025
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
fp_predict3 <- predict(lmTune, test_fp)
## Warning in predict.lm(modelFit, newdata): prediction from rank-deficient fit;
## attr(*, "non-estim") has doubtful cases
postResample(fp_predict3, test_p)
##       RMSE   Rsquared        MAE 
## 52.3926831  0.1303428 32.3763365
ridgeGrid <- data.frame(.lambda = seq(0.0001,0.3,length = 10))
ridgeTune <- train(x = train_fp, y = train_p,
                 method= "ridge",
                 tuneGrid = ridgeGrid,
                 trControl = ctrl,
                 preProc = c("center", "scale")
                 )
print(ridgeTune)
## Ridge Regression 
## 
## 125 samples
## 388 predictors
## 
## Pre-processing: centered (388), scaled (388) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 120, 119, 120, 119, 118, 118, ... 
## Resampling results across tuning parameters:
## 
##   lambda      RMSE          Rsquared   MAE         
##   0.00010000  163858.79275  0.1896401  93642.221368
##   0.03342222      13.15790  0.4706041     10.324337
##   0.06674444      12.53399  0.4914283      9.766323
##   0.10006667      12.19513  0.5138133      9.437761
##   0.13338889      12.07589  0.5254832      9.345561
##   0.16671111      12.03991  0.5341925      9.345011
##   0.20003333      12.04456  0.5407807      9.355552
##   0.23335556      12.07982  0.5460861      9.384765
##   0.26667778      12.13601  0.5504579      9.427785
##   0.30000000      12.21082  0.5540042      9.493195
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.1667111.
plot(ridgeTune)

fp_predict4 <- predict(ridgeTune, test_fp)

postResample(fp_predict4, test_p)
##       RMSE   Rsquared        MAE 
## 11.0353444  0.4835548  8.5750659

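The ordinary linear regression model performed worst, with a resampled Rsquared of about 0.32 and a test set Rsquared of about 0.13. It also had by far the highest RMSE (60.45876 resampled, 52.3926831 on the test set). With 388 predictors and only 125 training samples the least squares fit is rank-deficient, which is what the warnings report.

The ridge regression model had the highest Rsquared on the test set (0.4835548). Its test set RMSE (11.0353444) was slightly worse than that of either PLS model but far better than linear regression. The final value used for the model was lambda = 0.1667111, with a resampled RMSE of 12.03991 and Rsquared of 0.5341925.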
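The contrast between the rank-deficient linear fit and the ridge fit can be sketched in base R. This uses the textbook closed form for ridge coefficients on simulated data with hypothetical sizes. The lambda scale used by the elasticnet package behind method = "ridge" may differ from the one here, so the numbers are not comparable to the table above.

```r
set.seed(42)
n <- 20; p <- 30                        # more predictors than samples (hypothetical sizes)
X <- scale(matrix(rnorm(n * p), n, p))  # centered and scaled predictors
y <- rnorm(n)
y <- y - mean(y)                        # centered response, so no intercept is needed

# X'X is p x p but its rank is at most n - 1 after centering,
# so the least squares normal equations have no unique solution
XtX <- crossprod(X)
print(qr(XtX)$rank)

# Ridge adds lambda to the diagonal, which makes the system solvable
lambda <- 0.1
beta_ridge <- solve(XtX + lambda * diag(p), crossprod(X, y))
print(length(beta_ridge))
```

Every one of the p coefficients gets a finite estimate once the penalty is added, which is why ridge produces usable predictions where lm only warns about a rank-deficient fit.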

  1. Would you recommend any of your models to replace the permeability laboratory experiment?

I would not recommend any of my models to replace the experiment. A replacement for a laboratory measurement should have a very high Rsquared and little error, and the best test set Rsquared here was about 0.48 with an RMSE above 10. The PLS or ridge model could be used for preliminary screening, but none is accurate enough to predict permeability definitively.

Start R and use these commands to load the data:

data("ChemicalManufacturingProcess")
#the book refers to processPredictors, but no object of that name exists after loading the data, so it is built from ChemicalManufacturingProcess further down
apropos('processPredictors')
## character(0)
#how many na values?
sum(is.na(ChemicalManufacturingProcess))
## [1] 106

Impute the missing predictor values. The impute package is no longer available on CRAN, so the VIM package was used instead.

#imputed with K nearest neighbors set to 5.
imputeCMP <- kNN(ChemicalManufacturingProcess, k = 5)
#imputation removed nas
sum(is.na(imputeCMP))
## [1] 0

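The missing values were imputed from the 5 nearest neighbors. By default kNN() also appends a logical _imp indicator column for every variable, which is why the models below report 115 predictors, 58 of which are ignored during preprocessing. Yield was passed to kNN() along with the predictors, so the outcome could influence the imputed predictor values.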
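One way to keep the indicator columns out of the predictor set is to drop them by name after imputation. The sketch below uses a small hypothetical data frame as a stand-in for the imputed data; the values are made up. VIM's kNN() also documents an imp_var argument that, as far as I know, suppresses these columns when set to FALSE.

```r
# Hypothetical stand-in for the imputed data frame: two measured columns
# plus the logical "_imp" indicator columns that record which cells were filled in.
imputed <- data.frame(
  Yield                    = c(38.0, 42.4, 40.1),
  BiologicalMaterial01     = c(6.25, 8.01, 7.10),
  Yield_imp                = c(FALSE, FALSE, FALSE),
  BiologicalMaterial01_imp = c(FALSE, TRUE, FALSE)
)

# Keep only columns whose names do not end in "_imp"
is_indicator <- grepl("_imp$", names(imputed))
measured <- imputed[, !is_indicator, drop = FALSE]
print(names(measured))
```

Applied to the real data, this would leave the 57 measured predictors plus Yield, so the model summaries would no longer list ignored columns.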

  1. Split the data into a training and a test set, pre-process the data, and tune a model of your choice. What is the optimal value of the performance metric?
#processPredictors is not available after loading the data, so it is created here
#by dropping the outcome and keeping the predictors
processPredictors <- dplyr::select(imputeCMP, -Yield) #dplyr:: avoids the MASS::select conflict
set.seed(987654321) 
#splitting the chemical manufacturing data into a 75% training and a 25% test set
#select indices to use for extracting the training data
train_indice <- createDataPartition(imputeCMP$Yield, p = 0.75, list = FALSE)

# use the indices to select the data
train_cmp <- imputeCMP[train_indice, ]
test_cmp <- imputeCMP[-train_indice, ]
train_cmpp <- processPredictors[train_indice, ]
test_cmpp <- processPredictors[-train_indice, ]

#20-fold cross-validation for resampling
ctrl <- trainControl(method = "cv", number = 20)
#fit PLS models with 1 to 20 latent variables
plsTunecmp <- train(x = train_cmpp, y = train_cmp$Yield,
                 method = "pls",
                 tuneGrid = expand.grid(ncomp = 1:20), #candidate numbers of latent variables
                 trControl = ctrl,
                 preProc = c("center", "scale")
                 ) #by default caret selects the model with the lowest RMSE
#info of the model
print(plsTunecmp)
## Partial Least Squares 
## 
## 132 samples
## 115 predictors
## 
## Pre-processing: centered (57), scaled (57), ignore (58) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 125, 124, 124, 126, 128, 126, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE     
##    1     1.353285  0.5325094  1.121985
##    2     1.410579  0.5684854  1.081299
##    3     1.353358  0.6275744  1.052066
##    4     1.650966  0.6288592  1.178190
##    5     1.891649  0.6280510  1.249222
##    6     1.897481  0.6350393  1.249742
##    7     1.926913  0.6325423  1.269054
##    8     1.954366  0.6260732  1.293714
##    9     1.965167  0.6144576  1.305091
##   10     2.090428  0.6003619  1.366453
##   11     2.210097  0.5925558  1.426800
##   12     2.301631  0.5861027  1.469666
##   13     2.294127  0.5762289  1.463510
##   14     2.282930  0.5681934  1.467621
##   15     2.248300  0.5684622  1.454498
##   16     2.214395  0.5582624  1.439491
##   17     2.315414  0.5546140  1.485624
##   18     2.538266  0.5527585  1.576116
##   19     2.652645  0.5441030  1.624933
##   20     2.769165  0.5450234  1.675027
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 1.
plot(plsTunecmp)

#second run that selects on Rsquared instead of RMSE
#the seed is not reset here, so the cross-validation folds differ from the first run
plsTunecmp2 <- train(x = train_cmpp, y = train_cmp$Yield,
                 method = "pls",
                 tuneGrid = expand.grid(ncomp = 1:20),
                 metric = 'Rsquared', #optimizing Rsquared
                 trControl = ctrl,
                 preProc = c("center", "scale")
                 )
# checked model that optimized Rsquared
print(plsTunecmp2)
## Partial Least Squares 
## 
## 132 samples
## 115 predictors
## 
## Pre-processing: centered (57), scaled (57), ignore (58) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 127, 126, 124, 125, 124, 125, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE     
##    1     1.362851  0.5346905  1.141341
##    2     1.393145  0.5258934  1.066228
##    3     1.289711  0.6444817  1.013242
##    4     1.515723  0.6327822  1.100800
##    5     1.728108  0.6479596  1.154141
##    6     1.756615  0.6290911  1.157878
##    7     1.798649  0.6177643  1.182533
##    8     1.798173  0.6079081  1.188746
##    9     1.828769  0.5946988  1.218899
##   10     1.939382  0.5831397  1.269384
##   11     2.028877  0.5842301  1.324990
##   12     2.113436  0.5839007  1.358965
##   13     2.112742  0.5875895  1.360486
##   14     2.086512  0.5833899  1.363457
##   15     2.054611  0.5753634  1.355360
##   16     2.016909  0.5702812  1.343235
##   17     2.076828  0.5657995  1.370536
##   18     2.142933  0.5660274  1.396796
##   19     2.206281  0.5663064  1.423895
##   20     2.297630  0.5648803  1.466473
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 5.
plot(plsTunecmp2)

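ncomp = 1 had the lowest resampled RMSE (1.353285), but its Rsquared (0.5325094) was relatively low.

Taking RMSE and Rsquared together, I expected ncomp = 3 to be the best choice. In the first run its RMSE (1.353358) is essentially tied with the minimum and its Rsquared (0.6275744) is much higher. In the second run it has the lowest RMSE (1.289711) and the second-highest Rsquared (0.6444817). I expected 1 latent variable to perform worse than 3, so I fit a 3-component model to test this.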
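The reasoning above can be written as a small selection rule: treat models whose RMSE is within a tolerance of the minimum as ties, then break the tie on Rsquared. The sketch below applies it to the first six rows of the first tuning table. The 1% tolerance is an arbitrary assumption, and this is not a selection function built into caret.

```r
# First six rows of the first PLS tuning table, copied from the output above
cv <- data.frame(
  ncomp    = 1:6,
  RMSE     = c(1.353285, 1.410579, 1.353358, 1.650966, 1.891649, 1.897481),
  Rsquared = c(0.5325094, 0.5684854, 0.6275744, 0.6288592, 0.6280510, 0.6350393)
)

best_rmse <- cv$ncomp[which.min(cv$RMSE)]       # what caret picks by default
best_r2   <- cv$ncomp[which.max(cv$Rsquared)]   # what metric = "Rsquared" would pick

# Candidates whose RMSE is within 1% of the minimum (the tolerance is an arbitrary choice)
close_to_best <- cv$ncomp[cv$RMSE <= 1.01 * min(cv$RMSE)]

# Among those near-ties, prefer the one with the highest Rsquared
near   <- cv[cv$ncomp %in% close_to_best, ]
chosen <- near$ncomp[which.max(near$Rsquared)]
print(c(best_rmse = best_rmse, best_r2 = best_r2, chosen = chosen))
```

On these rows the rule keeps ncomp = 1 and ncomp = 3 as near-ties on RMSE and then selects 3, which matches the choice argued for in the text.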

#PLS model with 3 latent variables
plsTunecmp3 <- train(x = train_cmpp, y = train_cmp$Yield,
                 method = "pls",
                 tuneGrid = expand.grid(ncomp = 3), #force it to use 3 latent variables
                 metric = 'Rsquared', #has no effect on selection with a single candidate
                 trControl = ctrl,
                 preProc = c("center", "scale")
                 )

print(plsTunecmp3)
## Partial Least Squares 
## 
## 132 samples
## 115 predictors
## 
## Pre-processing: centered (57), scaled (57), ignore (58) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 125, 125, 124, 126, 125, 125, ... 
## Resampling results:
## 
##   RMSE     Rsquared   MAE   
##   1.38745  0.6231064  1.0418
## 
## Tuning parameter 'ncomp' was held constant at a value of 3
#check performance of 3 latent variables model on test data set
cmp_predict <- predict(plsTunecmp3, test_cmpp)

postResample(cmp_predict, test_cmp$Yield)
##      RMSE  Rsquared       MAE 
## 1.5332791 0.4786589 1.2090102
#check performance of the 1 latent variable PLS model on the test data set
cmp_predict1 <- predict(plsTunecmp, test_cmpp)

postResample(cmp_predict1, test_cmp$Yield)
##      RMSE  Rsquared       MAE 
## 1.9094688 0.2028172 1.4093548

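The model with 3 latent variables performed better on the test set than the model with 1. With ncomp = 3: RMSE 1.5332791, Rsquared 0.4786589, MAE 1.2090102. With ncomp = 1: RMSE 1.9094688, Rsquared 0.2028172, MAE 1.4093548. In both cases test set performance is worse than the resampled estimates from the training set.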

Next, the ncomp = 5 model (the one selected by Rsquared) was checked on the test set.

cmp_predict2 <- predict(plsTunecmp2, test_cmpp)

postResample(cmp_predict2, test_cmp$Yield)
##      RMSE  Rsquared       MAE 
## 2.4504133 0.1541251 1.5157113

ncomp = 5 performed the worst on the test set (RMSE 2.4504133, Rsquared 0.1541251).

Which predictors are most important?

#stores the one latent variable Model
onecomp <- plsTunecmp$finalModel
#stores the three latent variable Model
threecomp <- plsTunecmp3$finalModel
#shows the loading of each predictor on the latent variable
onecomp$loadings
## 
## Loadings:
##                                Comp 1
## BiologicalMaterial01            0.257
## BiologicalMaterial02            0.296
## BiologicalMaterial03            0.249
## BiologicalMaterial04            0.256
## BiologicalMaterial05            0.113
## BiologicalMaterial06            0.284
## BiologicalMaterial07                 
## BiologicalMaterial08            0.269
## BiologicalMaterial09            0.101
## BiologicalMaterial10            0.193
## BiologicalMaterial11            0.254
## BiologicalMaterial12            0.253
## ManufacturingProcess01               
## ManufacturingProcess02         -0.187
## ManufacturingProcess03               
## ManufacturingProcess04         -0.178
## ManufacturingProcess05               
## ManufacturingProcess06          0.152
## ManufacturingProcess07               
## ManufacturingProcess08               
## ManufacturingProcess09          0.179
## ManufacturingProcess10          0.107
## ManufacturingProcess11          0.151
## ManufacturingProcess12          0.129
## ManufacturingProcess13         -0.160
## ManufacturingProcess14               
## ManufacturingProcess15          0.122
## ManufacturingProcess16               
## ManufacturingProcess17         -0.101
## ManufacturingProcess18               
## ManufacturingProcess19          0.108
## ManufacturingProcess20               
## ManufacturingProcess21               
## ManufacturingProcess22               
## ManufacturingProcess23               
## ManufacturingProcess24         -0.125
## ManufacturingProcess25               
## ManufacturingProcess26               
## ManufacturingProcess27               
## ManufacturingProcess28          0.188
## ManufacturingProcess29               
## ManufacturingProcess30          0.102
## ManufacturingProcess31               
## ManufacturingProcess32          0.230
## ManufacturingProcess33          0.195
## ManufacturingProcess34               
## ManufacturingProcess35               
## ManufacturingProcess36         -0.204
## ManufacturingProcess37               
## ManufacturingProcess38               
## ManufacturingProcess39               
## ManufacturingProcess40               
## ManufacturingProcess41               
## ManufacturingProcess42               
## ManufacturingProcess43               
## ManufacturingProcess44               
## ManufacturingProcess45               
## Yield_impTRUE                        
## BiologicalMaterial01_impTRUE         
## BiologicalMaterial02_impTRUE         
## BiologicalMaterial03_impTRUE         
## BiologicalMaterial04_impTRUE         
## BiologicalMaterial05_impTRUE         
## BiologicalMaterial06_impTRUE         
## BiologicalMaterial07_impTRUE         
## BiologicalMaterial08_impTRUE         
## BiologicalMaterial09_impTRUE         
## BiologicalMaterial10_impTRUE         
## BiologicalMaterial11_impTRUE         
## BiologicalMaterial12_impTRUE         
## ManufacturingProcess01_impTRUE       
## ManufacturingProcess02_impTRUE       
## ManufacturingProcess03_impTRUE       
## ManufacturingProcess04_impTRUE       
## ManufacturingProcess05_impTRUE       
## ManufacturingProcess06_impTRUE       
## ManufacturingProcess07_impTRUE       
## ManufacturingProcess08_impTRUE       
## ManufacturingProcess09_impTRUE       
## ManufacturingProcess10_impTRUE       
## ManufacturingProcess11_impTRUE       
## ManufacturingProcess12_impTRUE       
## ManufacturingProcess13_impTRUE       
## ManufacturingProcess14_impTRUE       
## ManufacturingProcess15_impTRUE       
## ManufacturingProcess16_impTRUE       
## ManufacturingProcess17_impTRUE       
## ManufacturingProcess18_impTRUE       
## ManufacturingProcess19_impTRUE       
## ManufacturingProcess20_impTRUE       
## ManufacturingProcess21_impTRUE       
## ManufacturingProcess22_impTRUE       
## ManufacturingProcess23_impTRUE       
## ManufacturingProcess24_impTRUE       
## ManufacturingProcess25_impTRUE       
## ManufacturingProcess26_impTRUE       
## ManufacturingProcess27_impTRUE       
## ManufacturingProcess28_impTRUE       
## ManufacturingProcess29_impTRUE       
## ManufacturingProcess30_impTRUE       
## ManufacturingProcess31_impTRUE       
## ManufacturingProcess32_impTRUE       
## ManufacturingProcess33_impTRUE       
## ManufacturingProcess34_impTRUE       
## ManufacturingProcess35_impTRUE       
## ManufacturingProcess36_impTRUE       
## ManufacturingProcess37_impTRUE       
## ManufacturingProcess38_impTRUE       
## ManufacturingProcess39_impTRUE       
## ManufacturingProcess40_impTRUE       
## ManufacturingProcess41_impTRUE       
## ManufacturingProcess42_impTRUE       
## ManufacturingProcess43_impTRUE       
## ManufacturingProcess44_impTRUE       
## ManufacturingProcess45_impTRUE       
## 
##                Comp 1
## SS loadings     1.112
## Proportion Var  0.010
#3 latent variables model
threecomp$loadings
## 
## Loadings:
##                                Comp 1 Comp 2 Comp 3
## BiologicalMaterial01            0.257 -0.191       
## BiologicalMaterial02            0.296 -0.127       
## BiologicalMaterial03            0.249              
## BiologicalMaterial04            0.256 -0.186       
## BiologicalMaterial05            0.113 -0.149       
## BiologicalMaterial06            0.284 -0.102       
## BiologicalMaterial07                         -0.270
## BiologicalMaterial08            0.269 -0.174       
## BiologicalMaterial09            0.101              
## BiologicalMaterial10            0.193 -0.238       
## BiologicalMaterial11            0.254 -0.133       
## BiologicalMaterial12            0.253 -0.114 -0.144
## ManufacturingProcess01                 0.120       
## ManufacturingProcess02         -0.187  0.261       
## ManufacturingProcess03                             
## ManufacturingProcess04         -0.178              
## ManufacturingProcess05                       -0.122
## ManufacturingProcess06          0.152  0.150       
## ManufacturingProcess07                             
## ManufacturingProcess08                             
## ManufacturingProcess09          0.179  0.282 -0.218
## ManufacturingProcess10          0.107  0.102 -0.229
## ManufacturingProcess11          0.151  0.185 -0.231
## ManufacturingProcess12          0.129  0.187       
## ManufacturingProcess13         -0.160 -0.339  0.116
## ManufacturingProcess14                -0.179  0.247
## ManufacturingProcess15          0.122 -0.148  0.210
## ManufacturingProcess16                             
## ManufacturingProcess17         -0.101 -0.394       
## ManufacturingProcess18                -0.324  0.295
## ManufacturingProcess19          0.108 -0.320  0.264
## ManufacturingProcess20                -0.322  0.173
## ManufacturingProcess21                -0.198       
## ManufacturingProcess22                             
## ManufacturingProcess23                             
## ManufacturingProcess24         -0.125         0.117
## ManufacturingProcess25                -0.121  0.140
## ManufacturingProcess26                -0.114  0.135
## ManufacturingProcess27                -0.124  0.131
## ManufacturingProcess28          0.188 -0.170       
## ManufacturingProcess29                -0.160  0.145
## ManufacturingProcess30          0.102              
## ManufacturingProcess31                        0.124
## ManufacturingProcess32          0.230         0.311
## ManufacturingProcess33          0.195 -0.110  0.280
## ManufacturingProcess34                 0.130       
## ManufacturingProcess35                             
## ManufacturingProcess36         -0.204        -0.307
## ManufacturingProcess37                       -0.167
## ManufacturingProcess38                        0.126
## ManufacturingProcess39                        0.164
## ManufacturingProcess40                             
## ManufacturingProcess41                             
## ManufacturingProcess42                        0.200
## ManufacturingProcess43                -0.143  0.154
## ManufacturingProcess44                        0.182
## ManufacturingProcess45                        0.186
## Yield_impTRUE                                      
## BiologicalMaterial01_impTRUE                       
## BiologicalMaterial02_impTRUE                       
## BiologicalMaterial03_impTRUE                       
## BiologicalMaterial04_impTRUE                       
## BiologicalMaterial05_impTRUE                       
## BiologicalMaterial06_impTRUE                       
## BiologicalMaterial07_impTRUE                       
## BiologicalMaterial08_impTRUE                       
## BiologicalMaterial09_impTRUE                       
## BiologicalMaterial10_impTRUE                       
## BiologicalMaterial11_impTRUE                       
## BiologicalMaterial12_impTRUE                       
## ManufacturingProcess01_impTRUE                     
## ManufacturingProcess02_impTRUE                     
## ManufacturingProcess03_impTRUE                     
## ManufacturingProcess04_impTRUE                     
## ManufacturingProcess05_impTRUE                     
## ManufacturingProcess06_impTRUE                     
## ManufacturingProcess07_impTRUE                     
## ManufacturingProcess08_impTRUE                     
## ManufacturingProcess09_impTRUE                     
## ManufacturingProcess10_impTRUE                     
## ManufacturingProcess11_impTRUE                     
## ManufacturingProcess12_impTRUE                     
## ManufacturingProcess13_impTRUE                     
## ManufacturingProcess14_impTRUE                     
## ManufacturingProcess15_impTRUE                     
## ManufacturingProcess16_impTRUE                     
## ManufacturingProcess17_impTRUE                     
## ManufacturingProcess18_impTRUE                     
## ManufacturingProcess19_impTRUE                     
## ManufacturingProcess20_impTRUE                     
## ManufacturingProcess21_impTRUE                     
## ManufacturingProcess22_impTRUE                     
## ManufacturingProcess23_impTRUE                     
## ManufacturingProcess24_impTRUE                     
## ManufacturingProcess25_impTRUE                     
## ManufacturingProcess26_impTRUE                     
## ManufacturingProcess27_impTRUE                     
## ManufacturingProcess28_impTRUE                     
## ManufacturingProcess29_impTRUE                     
## ManufacturingProcess30_impTRUE                     
## ManufacturingProcess31_impTRUE                     
## ManufacturingProcess32_impTRUE                     
## ManufacturingProcess33_impTRUE                     
## ManufacturingProcess34_impTRUE                     
## ManufacturingProcess35_impTRUE                     
## ManufacturingProcess36_impTRUE                     
## ManufacturingProcess37_impTRUE                     
## ManufacturingProcess38_impTRUE                     
## ManufacturingProcess39_impTRUE                     
## ManufacturingProcess40_impTRUE                     
## ManufacturingProcess41_impTRUE                     
## ManufacturingProcess42_impTRUE                     
## ManufacturingProcess43_impTRUE                     
## ManufacturingProcess44_impTRUE                     
## ManufacturingProcess45_impTRUE                     
## 
##                Comp 1 Comp 2 Comp 3
## SS loadings     1.112  1.369  1.242
## Proportion Var  0.010  0.012  0.011
## Cumulative Var  0.010  0.022  0.032

Two PLS models were tested, one with ncomp = 3 and one with ncomp = 1. In both, most of the biological materials load on component 1; the exception is biological material 7, which loads only on component 3. Blank entries in the printout are loadings too small to be displayed. Around half of the manufacturing processes show a loading on any given component. This may indicate that the biological materials have the most influence on the yield.
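
To make the comparison between the two predictor groups concrete, the printed loadings can be tallied by group. The sketch below is illustrative only: it hand-copies six rows from the component 1 to 3 loadings printed above and records blanks as `NA`, since the printout appears to suppress small loadings and their exact values are not known here. The names `load_tbl`, `shown` and `shown_share` are hypothetical. With the fitted model available, the same tally could presumably be run on the full loadings matrix instead.

```r
# Six rows hand-copied from the printed loadings above (components 1 to 3).
# NA marks an entry that was left blank in the printout.
load_tbl <- data.frame(
  predictor = c("BiologicalMaterial04", "BiologicalMaterial07",
                "BiologicalMaterial12", "ManufacturingProcess03",
                "ManufacturingProcess09", "ManufacturingProcess32"),
  comp1 = c(0.256, NA, 0.253, NA, 0.179, 0.230),
  comp2 = c(-0.186, NA, -0.114, NA, 0.282, NA),
  comp3 = c(NA, -0.270, -0.144, NA, -0.218, 0.311)
)

# Group each predictor by the prefix of its name.
load_tbl$group <- ifelse(grepl("^Biological", load_tbl$predictor),
                         "Biological", "Manufacturing")

# Share of predictors in each group with a printed loading, per component.
shown <- !is.na(load_tbl[, c("comp1", "comp2", "comp3")])
shown_share <- aggregate(shown, by = list(group = load_tbl$group), FUN = mean)
print(shown_share)
```

Because only six rows are copied, the shares printed here describe this subset and not the full table.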

varImp(threecomp) #variable importance in the three-component PLS model
##                                     Overall
## BiologicalMaterial01           0.0719437141
## BiologicalMaterial02           0.0855289360
## BiologicalMaterial03           0.0811498501
## BiologicalMaterial04           0.0708180298
## BiologicalMaterial05           0.0304753507
## BiologicalMaterial06           0.0821258726
## BiologicalMaterial07           0.0412765647
## BiologicalMaterial08           0.0772627370
## BiologicalMaterial09           0.0337012505
## BiologicalMaterial10           0.0524852896
## BiologicalMaterial11           0.0748167555
## BiologicalMaterial12           0.0758378014
## ManufacturingProcess01         0.0268683197
## ManufacturingProcess02         0.0505015392
## ManufacturingProcess03         0.0101553570
## ManufacturingProcess04         0.0540081087
## ManufacturingProcess05         0.0280197715
## ManufacturingProcess06         0.0766476793
## ManufacturingProcess07         0.0272803696
## ManufacturingProcess08         0.0062560395
## ManufacturingProcess09         0.1021141532
## ManufacturingProcess10         0.0443906051
## ManufacturingProcess11         0.0663840081
## ManufacturingProcess12         0.0646665065
## ManufacturingProcess13         0.1119905526
## ManufacturingProcess14         0.0121168424
## ManufacturingProcess15         0.0412995544
## ManufacturingProcess16         0.0067601415
## ManufacturingProcess17         0.0990486372
## ManufacturingProcess18         0.0374727510
## ManufacturingProcess19         0.0397113185
## ManufacturingProcess20         0.0435332092
## ManufacturingProcess21         0.0158719950
## ManufacturingProcess22         0.0072871263
## ManufacturingProcess23         0.0227509721
## ManufacturingProcess24         0.0372525332
## ManufacturingProcess25         0.0076477163
## ManufacturingProcess26         0.0114006449
## ManufacturingProcess27         0.0068426037
## ManufacturingProcess28         0.0548534061
## ManufacturingProcess29         0.0283002328
## ManufacturingProcess30         0.0356758676
## ManufacturingProcess31         0.0135885194
## ManufacturingProcess32         0.1211090684
## ManufacturingProcess33         0.0703036345
## ManufacturingProcess34         0.0594050721
## ManufacturingProcess35         0.0406110182
## ManufacturingProcess36         0.1106298804
## ManufacturingProcess37         0.0233492310
## ManufacturingProcess38         0.0192908740
## ManufacturingProcess39         0.0108529941
## ManufacturingProcess40         0.0044560263
## ManufacturingProcess41         0.0027232624
## ManufacturingProcess42         0.0151153456
## ManufacturingProcess43         0.0211919069
## ManufacturingProcess44         0.0093542375
## ManufacturingProcess45         0.0086845947
## Yield_impTRUE                  0.0000000000
## BiologicalMaterial01_impTRUE   0.0000000000
## BiologicalMaterial02_impTRUE   0.0000000000
## BiologicalMaterial03_impTRUE   0.0000000000
## BiologicalMaterial04_impTRUE   0.0000000000
## BiologicalMaterial05_impTRUE   0.0000000000
## BiologicalMaterial06_impTRUE   0.0000000000
## BiologicalMaterial07_impTRUE   0.0000000000
## BiologicalMaterial08_impTRUE   0.0000000000
## BiologicalMaterial09_impTRUE   0.0000000000
## BiologicalMaterial10_impTRUE   0.0000000000
## BiologicalMaterial11_impTRUE   0.0000000000
## BiologicalMaterial12_impTRUE   0.0000000000
## ManufacturingProcess01_impTRUE 0.0017519877
## ManufacturingProcess02_impTRUE 0.0017519877
## ManufacturingProcess03_impTRUE 0.0077269176
## ManufacturingProcess04_impTRUE 0.0017519877
## ManufacturingProcess05_impTRUE 0.0017519877
## ManufacturingProcess06_impTRUE 0.0017519877
## ManufacturingProcess07_impTRUE 0.0017519877
## ManufacturingProcess08_impTRUE 0.0017519877
## ManufacturingProcess09_impTRUE 0.0000000000
## ManufacturingProcess10_impTRUE 0.0046171518
## ManufacturingProcess11_impTRUE 0.0047070469
## ManufacturingProcess12_impTRUE 0.0017519877
## ManufacturingProcess13_impTRUE 0.0000000000
## ManufacturingProcess14_impTRUE 0.0004758256
## ManufacturingProcess15_impTRUE 0.0000000000
## ManufacturingProcess16_impTRUE 0.0000000000
## ManufacturingProcess17_impTRUE 0.0000000000
## ManufacturingProcess18_impTRUE 0.0000000000
## ManufacturingProcess19_impTRUE 0.0000000000
## ManufacturingProcess20_impTRUE 0.0000000000
## ManufacturingProcess21_impTRUE 0.0000000000
## ManufacturingProcess22_impTRUE 0.0017519877
## ManufacturingProcess23_impTRUE 0.0017519877
## ManufacturingProcess24_impTRUE 0.0017519877
## ManufacturingProcess25_impTRUE 0.0014452909
## ManufacturingProcess26_impTRUE 0.0014452909
## ManufacturingProcess27_impTRUE 0.0014452909
## ManufacturingProcess28_impTRUE 0.0014452909
## ManufacturingProcess29_impTRUE 0.0014452909
## ManufacturingProcess30_impTRUE 0.0014452909
## ManufacturingProcess31_impTRUE 0.0014452909
## ManufacturingProcess32_impTRUE 0.0000000000
## ManufacturingProcess33_impTRUE 0.0014452909
## ManufacturingProcess34_impTRUE 0.0014452909
## ManufacturingProcess35_impTRUE 0.0014452909
## ManufacturingProcess36_impTRUE 0.0014452909
## ManufacturingProcess37_impTRUE 0.0000000000
## ManufacturingProcess38_impTRUE 0.0000000000
## ManufacturingProcess39_impTRUE 0.0000000000
## ManufacturingProcess40_impTRUE 0.0017519877
## ManufacturingProcess41_impTRUE 0.0017519877
## ManufacturingProcess42_impTRUE 0.0000000000
## ManufacturingProcess43_impTRUE 0.0000000000
## ManufacturingProcess44_impTRUE 0.0000000000
## ManufacturingProcess45_impTRUE 0.0000000000
#arrange the variables by importance
varImp(plsTunecmp3)$importance |> 
  arrange(-Overall)
##                                    Overall
## ManufacturingProcess32         100.0000000
## ManufacturingProcess13          92.4708233
## ManufacturingProcess36          91.3473135
## ManufacturingProcess09          84.3158605
## ManufacturingProcess17          81.7846579
## BiologicalMaterial02            70.6214136
## BiologicalMaterial06            67.8114973
## BiologicalMaterial03            67.0055935
## BiologicalMaterial08            63.7959965
## ManufacturingProcess06          63.2881421
## BiologicalMaterial12            62.6194243
## BiologicalMaterial11            61.7763447
## BiologicalMaterial01            59.4040687
## BiologicalMaterial04            58.4745889
## ManufacturingProcess33          58.0498517
## ManufacturingProcess11          54.8134083
## ManufacturingProcess12          53.3952637
## ManufacturingProcess34          49.0508868
## ManufacturingProcess28          45.2925672
## ManufacturingProcess04          44.5946034
## BiologicalMaterial10            43.3372086
## ManufacturingProcess02          41.6992220
## ManufacturingProcess10          36.6534114
## ManufacturingProcess20          35.9454579
## ManufacturingProcess15          34.1011247
## BiologicalMaterial07            34.0821420
## ManufacturingProcess35          33.5325989
## ManufacturingProcess19          32.7897151
## ManufacturingProcess18          30.9413254
## ManufacturingProcess24          30.7594912
## ManufacturingProcess30          29.4576352
## BiologicalMaterial09            27.8271899
## BiologicalMaterial05            25.1635581
## ManufacturingProcess29          23.3675588
## ManufacturingProcess05          23.1359814
## ManufacturingProcess07          22.5254557
## ManufacturingProcess01          22.1852253
## ManufacturingProcess37          19.2795067
## ManufacturingProcess23          18.7855232
## ManufacturingProcess43          17.4981999
## ManufacturingProcess38          15.9285132
## ManufacturingProcess21          13.1055380
## ManufacturingProcess42          12.4807711
## ManufacturingProcess31          11.2200676
## ManufacturingProcess14          10.0049010
## ManufacturingProcess26           9.4135353
## ManufacturingProcess39           8.9613389
## ManufacturingProcess03           8.3852986
## ManufacturingProcess44           7.7238126
## ManufacturingProcess45           7.1708872
## ManufacturingProcess03_impTRUE   6.3801313
## ManufacturingProcess25           6.3147346
## ManufacturingProcess22           6.0169948
## ManufacturingProcess27           5.6499516
## ManufacturingProcess16           5.5818624
## ManufacturingProcess08           5.1656243
## ManufacturingProcess11_impTRUE   3.8866180
## ManufacturingProcess10_impTRUE   3.8123914
## ManufacturingProcess40           3.6793498
## ManufacturingProcess41           2.2486032
## ManufacturingProcess01_impTRUE   1.4466198
## ManufacturingProcess02_impTRUE   1.4466198
## ManufacturingProcess04_impTRUE   1.4466198
## ManufacturingProcess05_impTRUE   1.4466198
## ManufacturingProcess06_impTRUE   1.4466198
## ManufacturingProcess07_impTRUE   1.4466198
## ManufacturingProcess08_impTRUE   1.4466198
## ManufacturingProcess12_impTRUE   1.4466198
## ManufacturingProcess22_impTRUE   1.4466198
## ManufacturingProcess23_impTRUE   1.4466198
## ManufacturingProcess24_impTRUE   1.4466198
## ManufacturingProcess40_impTRUE   1.4466198
## ManufacturingProcess41_impTRUE   1.4466198
## ManufacturingProcess25_impTRUE   1.1933796
## ManufacturingProcess26_impTRUE   1.1933796
## ManufacturingProcess27_impTRUE   1.1933796
## ManufacturingProcess28_impTRUE   1.1933796
## ManufacturingProcess29_impTRUE   1.1933796
## ManufacturingProcess30_impTRUE   1.1933796
## ManufacturingProcess31_impTRUE   1.1933796
## ManufacturingProcess33_impTRUE   1.1933796
## ManufacturingProcess34_impTRUE   1.1933796
## ManufacturingProcess35_impTRUE   1.1933796
## ManufacturingProcess36_impTRUE   1.1933796
## ManufacturingProcess14_impTRUE   0.3928902
## Yield_impTRUE                    0.0000000
## BiologicalMaterial01_impTRUE     0.0000000
## BiologicalMaterial02_impTRUE     0.0000000
## BiologicalMaterial03_impTRUE     0.0000000
## BiologicalMaterial04_impTRUE     0.0000000
## BiologicalMaterial05_impTRUE     0.0000000
## BiologicalMaterial06_impTRUE     0.0000000
## BiologicalMaterial07_impTRUE     0.0000000
## BiologicalMaterial08_impTRUE     0.0000000
## BiologicalMaterial09_impTRUE     0.0000000
## BiologicalMaterial10_impTRUE     0.0000000
## BiologicalMaterial11_impTRUE     0.0000000
## BiologicalMaterial12_impTRUE     0.0000000
## ManufacturingProcess09_impTRUE   0.0000000
## ManufacturingProcess13_impTRUE   0.0000000
## ManufacturingProcess15_impTRUE   0.0000000
## ManufacturingProcess16_impTRUE   0.0000000
## ManufacturingProcess17_impTRUE   0.0000000
## ManufacturingProcess18_impTRUE   0.0000000
## ManufacturingProcess19_impTRUE   0.0000000
## ManufacturingProcess20_impTRUE   0.0000000
## ManufacturingProcess21_impTRUE   0.0000000
## ManufacturingProcess32_impTRUE   0.0000000
## ManufacturingProcess37_impTRUE   0.0000000
## ManufacturingProcess38_impTRUE   0.0000000
## ManufacturingProcess39_impTRUE   0.0000000
## ManufacturingProcess42_impTRUE   0.0000000
## ManufacturingProcess43_impTRUE   0.0000000
## ManufacturingProcess44_impTRUE   0.0000000
## ManufacturingProcess45_impTRUE   0.0000000

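The most important predictors are listed above in descending order of importance. The five highest-ranked are all manufacturing processes (32, 13, 36, 09, and 17), followed by biological materials 02, 06, 03, and 08.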
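
The two importance tables appear to differ only in scale: the sorted table looks like the raw scores rescaled so that the largest equals 100. The sketch below checks this on five raw values copied from the `varImp(threecomp)` output. The min-max formula is an assumption inferred from the printed numbers, not something confirmed against the caret source here, and `raw_imp` and `scaled_imp` are hypothetical names.

```r
# Raw importance scores copied from the varImp(threecomp) output above.
raw_imp <- c(
  ManufacturingProcess32 = 0.1211090684,
  ManufacturingProcess13 = 0.1119905526,
  ManufacturingProcess36 = 0.1106298804,
  BiologicalMaterial02   = 0.0855289360,
  Yield_impTRUE          = 0.0000000000
)

# Min-max rescaling to the 0-100 range (assumed, see the note above).
scaled_imp <- (raw_imp - min(raw_imp)) / (max(raw_imp) - min(raw_imp)) * 100
print(round(sort(scaled_imp, decreasing = TRUE), 4))
```

If the assumption holds, the rescaled values should agree with the sorted table to the printed precision, so the ranking of predictors is the same in both tables.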

toppredict <- varImp(plsTunecmp3)$importance |> 
  arrange(-Overall) |> 
  head(10)

#correlation heatmap of Yield against the ten most important predictors
imputeCMP |>
  select(all_of(c("Yield", row.names(toppredict)))) |>
  cor() |>
  corrplot::corrplot(method = "number", number.cex = 0.7, type = "upper")

Manufacturing process 32 appears to have the highest absolute correlation with yield, and it also has the highest importance. The top manufacturing processes have somewhat higher absolute correlations with yield than the top biological materials, but not by much. A larger share of the biological materials than of the manufacturing processes still contribute as predictors. The manufacturing processes that do rank highly, however, combine a strong correlation with yield and high importance.
The PLS model assigns the greatest importance to a small set of manufacturing processes and biological materials. Focusing on monitoring and improving those processes and materials may help produce higher yields. More resources could potentially be directed to the highly correlated materials and processes, and fewer to the weakly correlated ones.
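
The comparison of absolute correlations described above can also be read off numerically rather than from the heatmap. The following is a minimal sketch on synthetic stand-in data, not the chemical manufacturing data: the data frame `toy` and its columns `proc_a`, `proc_b` and `mat_a` are invented purely for illustration. On the real data, the same two lines could presumably be applied to the selected columns of `imputeCMP`.

```r
# Synthetic stand-in data, NOT the chemical manufacturing data set.
set.seed(1)
n <- 100
toy <- data.frame(proc_a = rnorm(n), proc_b = rnorm(n), mat_a = rnorm(n))
toy$Yield <- 2 * toy$proc_a - 1 * toy$mat_a + rnorm(n)

# Correlation of every predictor with Yield, ranked by absolute value
# so that strong negative relationships are not pushed to the bottom.
cors <- cor(toy)["Yield", setdiff(names(toy), "Yield")]
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```

Ranking by absolute value matters here because several of the top predictors may be negatively correlated with yield, and a plain descending sort would hide them.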