DATA 624 Homework Seven - Linear Regression

Overview

This is homework seven of the Fall 2024 edition of DATA 624. The assignment covers questions 6.2 and 6.3 from the exercise section of chapter 6 in Applied Predictive Modeling by Max Kuhn and Kjell Johnson

First, most of the requried libraries.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(elasticnet)

## Loading required package: lars
## Loaded lars 1.3

library(caret)

## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift

library(lars)
library(pls)

## 
## Attaching package: 'pls'
## 
## The following object is masked from 'package:caret':
## 
##     R2
## 
## The following object is masked from 'package:stats':
## 
##     loadings

library(stats)
library(corrplot)

## corrplot 0.95 loaded
## 
## Attaching package: 'corrplot'
## 
## The following object is masked from 'package:pls':
## 
##     corrplot

library(MASS)

## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

library(robustbase)  
library(glmnet)

## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loaded glmnet 4.1-8

6.2 Predicting permeability

This exercise consists of component questions A through F that are about developing permeability prediction models for a pharmaceutical company.

A. Load library and data

The first question just says to load the library for the book and the dataset used.

library(AppliedPredictiveModeling)
data(permeability)

B. Filtering out predictors with NearZeroVar

The question states: ” The fingerprint predictors indicate the presence or absence of substructures of a molecule and are often sparse meaning that relatively few of the molecules contain each substructure. Filter out the predictors that have low frequencies using the nearZeroVar function from the caret package. How many predictors are left for modeling?”

Answer: there are 388 predictors left after filtering out those with low frequencies.

nzv_fingers <- preProcess(fingerprints, method = c("nzv")) |>
    predict(fingerprints)

prem_predict <- ncol(nzv_fingers)
prem_predict

## [1] 388

C. Preprocessing and PLS modeling

The question states: “Split the data into a training and a test set, pre-process the data, and tune a PLS model. How many latent variables are optimal and what is the corresponding resampled estimate of R2?”

Answer:

There are 10 optimal latent variables and the corresponding resampled estimate of R2 is 0.36914.

nzv_perm_s <- cbind(nzv_fingers, permeability)

nzv_perm <- preProcess(nzv_perm_s, method = c("BoxCox", "center", "scale", "knnImpute")) |>
    predict(nzv_perm_s) 

part <- createDataPartition(nzv_perm[permeability], p = 0.75, list = FALSE)
nzv_perm_train <- nzv_perm[part,]
nzv_perm_test <- nzv_perm[-part,]

pls_control <- trainControl(method = "LOOCV")

pls_train <- train(permeability ~ .,
  data = nzv_perm_train,
  method = "pls",
  tuneLength = 20,
  trControl = pls_control)

pls_train

## Partial Least Squares 
## 
## 102 samples
## 388 predictors
## 
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation 
## Summary of sample sizes: 101, 101, 101, 101, 101, 101, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##    1     0.8781172  0.1819578  0.6729183
##    2     0.8292031  0.2712637  0.6167052
##    3     0.8111174  0.3106445  0.6081974
##    4     0.8358035  0.2981609  0.6292403
##    5     0.8134162  0.3507931  0.5976846
##    6     0.8137202  0.3396873  0.5801619
##    7     0.7865501  0.3760534  0.5630216
##    8     0.7704538  0.4047623  0.5462394
##    9     0.7563150  0.4337708  0.5623967
##   10     0.7428234  0.4578516  0.5524178
##   11     0.7429511  0.4672472  0.5597181
##   12     0.7598164  0.4636119  0.5857559
##   13     0.7712966  0.4587113  0.5939703
##   14     0.7833992  0.4554922  0.5997252
##   15     0.7976450  0.4455459  0.6057548
##   16     0.8235224  0.4323917  0.6299468
##   17     0.8265127  0.4379625  0.6334531
##   18     0.8366675  0.4321632  0.6458152
##   19     0.8308357  0.4389615  0.6446405
##   20     0.8291036  0.4427953  0.6515345
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 10.

opt_lat_vars <- pls_train$bestTune$ncomp
rs_r2 <- max(pls_train$results$Rsquared)

cat("There are ", opt_lat_vars, " optimal latent variables and the corresponding resampled estimate of R2 is ",  rs_r2)

## There are  10  optimal latent variables and the corresponding resampled estimate of R2 is  0.4672472

D. Predict the response for the test set. What is the test set estimate of R2?

Answer:

The test set estimate of R2 is 0.5593098

predict(pls_train, nzv_perm_test) %>% 
  data.frame(pred = .,obs = nzv_perm_test[,"permeability"]) %>% 
  defaultSummary()

##      RMSE  Rsquared       MAE 
## 0.9063238 0.4024881 0.7284953

E. Try building other models discussed in this chapter. Do any have better predictive performance?

Answer:

First model: robust linear regression

Results: Samples: 102 Predictors: 388

The robust linear model yielded an RMSE of 0.9446 and an R² of 0.3589, which is lower than the previous model’s R² of 0.5593, indicating that the model did not perform as well in terms of explained variance. The MAE of 0.7090 is slightly higher, suggesting less accuracy in predicting permeability compared to the earlier model. Overall, the previous model with a better R² and lower RMSE was more effective in capturing the relationship between predictors and permeability.

nzv_fingers_lr <- preProcess(fingerprints, method = c("nzv")) |>
    predict(fingerprints)

nzv_perm_rl <- cbind(nzv_fingers_lr, permeability)

nzv_perm_lr <- preProcess(nzv_perm_rl, method = c("BoxCox", "center", "scale", "knnImpute")) |>
    predict(nzv_perm_rl) 

part_lr <- createDataPartition(nzv_perm_rl[permeability], p = 0.75, list = FALSE)
perm_train_rl <- nzv_perm_lr[part_lr,]
perm_test_rl <- nzv_perm_lr[-part_lr,]

control_lr <- trainControl(method = "LOOCV")

robust_lines <- train(permeability ~ .,
  data = perm_train_rl,
  method = "rlm",
  trControl = control_lr,
  preProcess = "pca"
  
)

robust_lines

## Robust Linear Model 
## 
## 102 samples
## 388 predictors
## 
## Pre-processing: principal component signal extraction (388), centered
##  (388), scaled (388) 
## Resampling: Leave-One-Out Cross-Validation 
## Summary of sample sizes: 101, 101, 101, 101, 101, 101, ... 
## Resampling results across tuning parameters:
## 
##   intercept  psi           RMSE       Rsquared   MAE      
##   FALSE      psi.huber     0.8976401  0.2227545  0.6643225
##   FALSE      psi.hampel    0.8872977  0.2296714  0.6624136
##   FALSE      psi.bisquare  0.8772084  0.2463988  0.6370917
##    TRUE      psi.huber     0.8494446  0.2596366  0.5852497
##    TRUE      psi.hampel    0.8498867  0.2595142  0.5938601
##    TRUE      psi.bisquare  0.8557224  0.2713834  0.5979869
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were intercept = TRUE and psi = psi.huber.

predict(robust_lines, perm_test_rl) %>% 
  data.frame(pred = .,obs = perm_test_rl[,"permeability"]) %>% 
  defaultSummary()

##      RMSE  Rsquared       MAE 
## 1.0322024 0.2818927 0.8391725

Second model: Elastic net regression Results: Samples: 102 Predictors: 388

The Elastic Net model, using bootstrapped resampling with 10 repetitions, resulted in an RMSE of 0.6883, an R² of 0.5943, and an MAE of 0.5342. This model explored a grid of lambda and fraction parameters and selected the optimal values of lambda = 0.1 and fraction = 0.15 based on the smallest RMSE. The resampling process indicates that the Elastic Net model performs moderately well, with an R² value showing that it captures about 59.4% of the variance in the data. The relatively low MAE further indicates that the model is accurate in predicting permeability values with minimal error.

ctrl_enet <- trainControl(method = "boot", number = 10)

enet_g <- expand.grid(.lambda = c(0, 0.01, .1), .fraction = seq(.05, 1, length = 20))

enet_reg <- train(permeability ~ .,
  data = nzv_perm_train,
  method = "enet",
  tuneGrid = enet_g,
  trControl = ctrl_enet)

## Warning: model fit failed for Resample01: lambda=0.00, fraction=1 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed

## Warning: model fit failed for Resample02: lambda=0.00, fraction=1 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed

## Warning: model fit failed for Resample03: lambda=0.00, fraction=1 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed

## Warning: model fit failed for Resample07: lambda=0.10, fraction=1 Error in elasticnet::enet(as.matrix(x), y, lambda = param$lambda) : 
##   Some of the columns of x have zero variance

## Warning: model fit failed for Resample07: lambda=0.01, fraction=1 Error in elasticnet::enet(as.matrix(x), y, lambda = param$lambda) : 
##   Some of the columns of x have zero variance

## Warning: model fit failed for Resample07: lambda=0.00, fraction=1 Error in elasticnet::enet(as.matrix(x), y, lambda = param$lambda) : 
##   Some of the columns of x have zero variance

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.

enet_reg

## Elasticnet 
## 
## 102 samples
## 388 predictors
## 
## No pre-processing
## Resampling: Bootstrapped (10 reps) 
## Summary of sample sizes: 102, 102, 102, 102, 102, 102, ... 
## Resampling results across tuning parameters:
## 
##   lambda  fraction  RMSE          Rsquared    MAE         
##   0.00    0.05      4.394198e+18  0.13963134  2.524463e+18
##   0.00    0.10      8.793506e+18  0.11951013  5.061322e+18
##   0.00    0.15      1.319305e+19  0.11328248  7.598182e+18
##   0.00    0.20      1.759264e+19  0.11084286  1.013504e+19
##   0.00    0.25      2.199226e+19  0.11246765  1.267190e+19
##   0.00    0.30      2.639190e+19  0.11519907  1.520876e+19
##   0.00    0.35      3.079153e+19  0.11690819  1.774563e+19
##   0.00    0.40      3.519118e+19  0.11846028  2.028249e+19
##   0.00    0.45      3.959082e+19  0.11950352  2.281935e+19
##   0.00    0.50      4.399047e+19  0.11780550  2.535622e+19
##   0.00    0.55      4.839012e+19  0.11561806  2.789308e+19
##   0.00    0.60      5.278977e+19  0.10844410  3.042994e+19
##   0.00    0.65      5.718942e+19  0.09850609  3.296680e+19
##   0.00    0.70      6.158907e+19  0.08941738  3.550367e+19
##   0.00    0.75      6.598872e+19  0.08106742  3.804053e+19
##   0.00    0.80      7.038838e+19  0.07299187  4.057739e+19
##   0.00    0.85      7.478803e+19  0.06617802  4.311426e+19
##   0.00    0.90      7.918768e+19  0.06120767  4.565112e+19
##   0.00    0.95      8.358734e+19  0.05570198  4.818798e+19
##   0.00    1.00      8.798699e+19  0.05585483  5.072484e+19
##   0.01    0.05      8.040407e-01  0.38399608  6.163561e-01
##   0.01    0.10      9.057823e-01  0.32839687  6.578943e-01
##   0.01    0.15      1.012421e+00  0.32255602  7.070616e-01
##   0.01    0.20      1.100088e+00  0.31823323  7.460490e-01
##   0.01    0.25      1.179027e+00  0.31748344  7.862242e-01
##   0.01    0.30      1.252522e+00  0.32413778  8.313183e-01
##   0.01    0.35      1.336961e+00  0.31805691  8.900324e-01
##   0.01    0.40      1.403742e+00  0.33177944  9.327543e-01
##   0.01    0.45      1.484727e+00  0.32659500  9.804213e-01
##   0.01    0.50      1.574956e+00  0.31245560  1.039180e+00
##   0.01    0.55      1.666594e+00  0.29357814  1.101546e+00
##   0.01    0.60      1.761351e+00  0.26982362  1.167418e+00
##   0.01    0.65      1.850694e+00  0.24952984  1.229821e+00
##   0.01    0.70      1.937829e+00  0.23145432  1.289463e+00
##   0.01    0.75      2.020336e+00  0.21890626  1.346155e+00
##   0.01    0.80      2.097705e+00  0.21032750  1.401407e+00
##   0.01    0.85      2.171083e+00  0.20339907  1.455422e+00
##   0.01    0.90      2.244306e+00  0.19712141  1.510278e+00
##   0.01    0.95      2.318920e+00  0.19051065  1.564348e+00
##   0.01    1.00      2.388428e+00  0.18357246  1.613712e+00
##   0.10    0.05      7.846397e-01  0.41544994  6.204690e-01
##   0.10    0.10      7.477181e-01  0.39263831  5.717002e-01
##   0.10    0.15      7.632911e-01  0.36531357  5.731313e-01
##   0.10    0.20      7.872249e-01  0.36000783  5.846228e-01
##   0.10    0.25      8.069199e-01  0.35900920  5.960359e-01
##   0.10    0.30      8.221232e-01  0.36275591  6.051782e-01
##   0.10    0.35      8.292357e-01  0.36581136  6.074162e-01
##   0.10    0.40      8.352146e-01  0.36508725  6.113573e-01
##   0.10    0.45      8.371041e-01  0.36716001  6.116030e-01
##   0.10    0.50      8.407872e-01  0.36738913  6.130232e-01
##   0.10    0.55      8.433480e-01  0.36693080  6.157443e-01
##   0.10    0.60      8.467317e-01  0.36564692  6.200721e-01
##   0.10    0.65      8.490593e-01  0.36503848  6.240524e-01
##   0.10    0.70      8.528130e-01  0.36298615  6.291960e-01
##   0.10    0.75      8.582206e-01  0.35890671  6.356003e-01
##   0.10    0.80      8.643251e-01  0.35355880  6.415157e-01
##   0.10    0.85      8.708280e-01  0.34759902  6.473002e-01
##   0.10    0.90      8.769254e-01  0.34186247  6.523038e-01
##   0.10    0.95      8.829158e-01  0.33655499  6.569560e-01
##   0.10    1.00      8.880407e-01  0.33182394  6.611402e-01
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.1 and lambda = 0.1.

predict(enet_reg, nzv_perm_test) %>% 
  data.frame(pred = .,obs = nzv_perm_test[,"permeability"]) %>% 
  defaultSummary()

##      RMSE  Rsquared       MAE 
## 0.7336849 0.5555956 0.6215807

Overall conclusion: When comparing the models overall, the Elastic Net model performed the best, with the highest R² (0.5943) and the lowest RMSE (0.6883). The robust linear model, with an R² of 0.3589 and RMSE of 0.9446, was less effective in explaining the variance and predicting permeability accurately compared to the Elastic Net.

6.3 Modeling biological raw materials and their manufacturing process

This exercise consists of component questions A through F

A. Load the data

data(ChemicalManufacturingProcess)

B. Imputing missing values

The question states: “A small percentage of cells in the predictor set contain missing values. Use an imputation function to fill in these missing values.”

Answer:

I used median impute to insert the median value where a missing value is.

resolve_miss <- preProcess(ChemicalManufacturingProcess, method = c("medianImpute")) |>
  predict(ChemicalManufacturingProcess)

C. Preprocessing and tuning a model

The question states: “Split the data into a training and a test set, pre-process the data, and tune a model of your choice from this chapter. What is the optimal value of the performance metric?”

Answer:

I will use Ridge Regression. This is a model that which imposes a penalty on the coefficients based on the different lambdas that we will be training over.

First, preprocessing and setting up the train and test sets:

model_tuner <- preProcess(resolve_miss, method = c("center", "scale")) |>
  predict(resolve_miss)

ridge_slices <- createDataPartition(model_tuner$Yield, p = 0.75, list = FALSE)

ridge_train <- model_tuner[ridge_slices,]  
ridge_test <- model_tuner[-ridge_slices,]

cat("The training set has dimensions of: \n")

## The training set has dimensions of:

dim(ridge_train)

## [1] 132  58

Finding optimal performance metric: The optimal value of the performance metric RMSE is 1.831127, with a lambda of 0.1.

ridge_control <- trainControl(method = "boot", number = 10)
ridge_roller <- expand.grid(.alpha = 0, .lambda = seq(0, .1, length = 15))

ridge_reg <- train(Yield ~ ., 
                   data = ridge_train,
                   method = "glmnet", 
                   tuneGrid = ridge_roller,
                   trControl = ridge_control)

ridge_reg

## glmnet 
## 
## 132 samples
##  57 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (10 reps) 
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ... 
## Resampling results across tuning parameters:
## 
##   lambda       RMSE      Rsquared   MAE      
##   0.000000000  1.196050  0.3507894  0.7094885
##   0.007142857  1.196050  0.3507894  0.7094885
##   0.014285714  1.196050  0.3507894  0.7094885
##   0.021428571  1.196050  0.3507894  0.7094885
##   0.028571429  1.196050  0.3507894  0.7094885
##   0.035714286  1.196050  0.3507894  0.7094885
##   0.042857143  1.196050  0.3507894  0.7094885
##   0.050000000  1.196246  0.3506916  0.7098193
##   0.057142857  1.196356  0.3503859  0.7096693
##   0.064285714  1.193111  0.3505462  0.7080116
##   0.071428571  1.188541  0.3490222  0.7047708
##   0.078571429  1.176216  0.3468999  0.7002409
##   0.085714286  1.161984  0.3455241  0.6954936
##   0.092857143  1.148870  0.3447713  0.6910561
##   0.100000000  1.136762  0.3443828  0.6869802
## 
## Tuning parameter 'alpha' was held constant at a value of 0
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0 and lambda = 0.1.

D. Predicting test set response

The question states: Predict the response for the test set. What is the value of the performance metric and how does this compare with the resampled performance metric on the training set?

Answer: The RMSE for the test set is 0.7467, which is significantly lower than the resampled RMSE of 1.8311 on the training set. This suggests that the model generalizes well to the test set, showing better performance than during the resampling on the training set.

predict(ridge_reg, ridge_test) %>% 
  data.frame(pred = .,obs = ridge_test$Yield) %>% 
  defaultSummary()

##      RMSE  Rsquared       MAE 
## 1.0316564 0.3028784 0.6387433

E. Assessing predictors value

The question states: Which predictors are most important in the model you have trained? Do either the biological or process predictors dominate the list?

Answer: There are 8 process predictors and 7 biological predictors in the top 15. Neither process dominates the list.

plot(varImp(ridge_reg), top = 15)

  ggtitle("Top 15 predictors")

## $title
## [1] "Top 15 predictors"
## 
## attr(,"class")
## [1] "labels"

F. Reviewing predictors and response relationships

The question states: “Explore the relationships between each of the top predictors and the response. How could this information be helpful in improving yield in future runs of the manufacturing process?”

Answer: The relationships between the top predictors and yield show some clear trends. For example, ManufacturingProcess32 and BiologicalMaterial06 have strong positive correlations, meaning boosting them could improve yield in future runs. On the other hand, ManufacturingProcess36 and ManufacturingProcess13 negatively impact yield, so controlling those might help reduce their drag on the process.

ridge_15 <- varImp(ridge_reg)$importance
ridge_15 <- as.data.frame(ridge_15)
ridge_predict <- rownames(ridge_15)[order(ridge_15[, 1], decreasing = TRUE)[1:15]]

for (rider in ridge_predict) {
  yields <- ggplot(ridge_train, aes(x = .data[[rider]], y = .data[["Yield"]])) + 
    geom_point() + 
    geom_smooth(method = "lm", se = FALSE, color = "skyblue") + 
    ggtitle(paste("Relationship between", rider, "and Yield")) +
    theme_minimal()
  
  print(yields)
}

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

ridge_corrs <- sapply(ridge_predict, function(x) cor(ridge_train[[x]], ridge_train$Yield, use = "complete.obs"))
ridges <- data.frame(Predictor = ridge_predict, Correlation = ridge_corrs)
ridges

##                                     Predictor Correlation
## ManufacturingProcess16 ManufacturingProcess16 -0.11954627
## ManufacturingProcess32 ManufacturingProcess32  0.61144797
## ManufacturingProcess09 ManufacturingProcess09  0.52399357
## ManufacturingProcess34 ManufacturingProcess34  0.27984300
## ManufacturingProcess04 ManufacturingProcess04 -0.20106926
## ManufacturingProcess45 ManufacturingProcess45  0.09796569
## ManufacturingProcess19 ManufacturingProcess19  0.08636538
## BiologicalMaterial03     BiologicalMaterial03  0.43941513
## ManufacturingProcess13 ManufacturingProcess13 -0.48442916
## ManufacturingProcess36 ManufacturingProcess36 -0.53070122
## ManufacturingProcess15 ManufacturingProcess15  0.19536228
## ManufacturingProcess28 ManufacturingProcess28  0.24254129
## ManufacturingProcess24 ManufacturingProcess24 -0.17173179
## ManufacturingProcess11 ManufacturingProcess11  0.37170717
## ManufacturingProcess37 ManufacturingProcess37 -0.10184077

DATA 624 Homework Seven - Linear Regression

Kevin Kirby

2024-10-20

Overview

6.2 Predicting permeability

A. Load library and data

B. Filtering out predictors with NearZeroVar

C. Preprocessing and PLS modeling

D. Predict the response for the test set. What is the test set estimate of R2?

E. Try building other models discussed in this chapter. Do any have better predictive performance?

6.3 Modeling biological raw materials and their manufacturing process

A. Load the data

B. Imputing missing values

C. Preprocessing and tuning a model

D. Predicting test set response

E. Assessing predictors value

F. Reviewing predictors and response relationships

DATA 624 Homework Seven - Linear Regression

Kevin Kirby

2024-10-20

Overview

6.2 Predicting permeability

A. Load library and data

B. Filtering out predictors with NearZeroVar

C. Preprocessing and PLS modeling

D. Predict the response for the test set. What is the test set estimate of R2?

E. Try building other models discussed in this chapter. Do any have better predictive performance?

F. Would you recommend any of your models to replace the permeability laboratory experiment?

6.3 Modeling biological raw materials and their manufacturing process

A. Load the data

B. Imputing missing values

C. Preprocessing and tuning a model

D. Predicting test set response

E. Assessing predictors value

F. Reviewing predictors and response relationships