6.2

Developing a model to predict permeability (See Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug:

A

Start R and use these commands to load the data:

data(permeability)

B

The fingerprints predictors indicate the presense or absense of substructures of a molecule and are often sparse meaning that relatively few of the molecules contain each substructure. Filter out the predictors that have low frequencies using the nearZeroVar function from the caret package. How many are left for modeling?

fingerprints %>%
  nearZeroVar() %>%
  length()
## [1] 719

C

Split the data into training and test sets, pre-process the data, and tune a PLS model. How many latent variables are optimal and what is the corresponding resampled estimate of R2?

I will split it using a 80:20 split. I will center the data.

df <- as.data.frame(fingerprints[, nearZeroVar(fingerprints)]) %>%
  mutate(y = permeability)

# Make this reproducible
set.seed(42)

ind_train <- createDataPartition(df$y, times = 1, p = 0.8, list = FALSE)
traind_df <- df[ind_train, ]
testd_df <- df[-ind_train, ]

pls_model <- train(
  y ~ ., data = traind_df, method = "pls",
  center = TRUE,
  trControl = trainControl("cv", number = 10),
  tuneLength = 25
)

# Plot model RMSE vs different values of components
title <- paste("Training Set RMSE Minimized at",
               pls_model$bestTune$ncomp,
               "Components")
plot(pls_model, main = title)

pls_model$results %>%
  filter(ncomp == pls_model$bestTune$ncomp) %>%
  kable() %>%
  kable_styling()
ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
5 14.29229 0.263486 10.77974 3.389964 0.2388523 2.7213

The optimal number of components in the PLS model is 5. This captures 26% of the variation in the permeability.

D

Predict the response of the test set. What is the test set estimate of R2 ?

# Make predictions
pls_predictions <- predict(pls_model, testd_df)
# Model performance metrics
results <- data.frame(Model = "PLS",
                      RMSE = caret::RMSE(pls_predictions, testd_df$y),
                      Rsquared = caret::R2(pls_predictions, testd_df$y))
results %>%
  kable() %>%
  kable_styling()
Model RMSE Rsquared
permeability PLS 13.23639 0.3016825

The R2 is 0.3.

E

Try building other models discussed in this chapter. Do any have better predictive performance? ### PCR Model

pcr_model <- train(
  y ~ ., data = traind_df, method = "pcr",
  center = TRUE,
  trControl = trainControl("cv", number = 10),
  tuneLength = 25
)

title <- paste("Training Set RMSE Minimized at",
               pcr_model$bestTune,
               "Components")
plot(pcr_model, main = title)

# Make predictions
pcr_predictions <- predict(pcr_model, testd_df)
# Model performance metrics
pcr_results <- data.frame(Model = "PCR",
                          RMSE = caret::RMSE(pcr_predictions, testd_df$y),
                          Rsquared = caret::R2(pcr_predictions, testd_df$y))
pcr_results %>%
  kable() %>%
  kable_styling()
Model RMSE Rsquared
permeability PCR 15.90717 0.0344595

Ridge Regression

x <- model.matrix(y ~ ., data = traind_df)
x_test <- model.matrix(y ~ ., data = testd_df)
rr_cv <- cv.glmnet(x, traind_df$y, alpha = 0)
rr_model <- glmnet(x, traind_df$y, alpha = 0, lambda = rr_cv$lambda.min)
rr_predictions <- as.vector(predict(rr_model, x_test))
rr_results <- data.frame(Model = "Ridge Regression",
                         RMSE = caret::RMSE(rr_predictions, testd_df$y),
                         Rsquared = caret::R2(rr_predictions, testd_df$y))
rr_results %>%
  kable() %>%
  kable_styling()
Model RMSE Rsquared
permeability Ridge Regression 14.59042 0.1426659

Lasso Regression

lr_cv <- cv.glmnet(x, traind_df$y, alpha = 1)
lr_model <- glmnet(x, traind_df$y, alpha = 1, lambda = lr_cv$lambda.min)
lr_predictions <- as.vector(predict(lr_model, x_test))
lr_results <- data.frame(Model = "Lasso Regression",
                         RMSE = caret::RMSE(lr_predictions, testd_df$y),
                         Rsquared = caret::R2(lr_predictions, testd_df$y))
lr_results %>%
  kable() %>%
  kable_styling()
Model RMSE Rsquared
permeability Lasso Regression 13.98552 0.2417885

Elastic Net Regession

en_model <- train(
  y ~ ., data = traind_df, method = "glmnet",
  trControl = trainControl("cv", number = 10),
  tuneLength = 10
)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
# Best tuning parameters
en_model$bestTune
##    alpha   lambda
## 65   0.7 1.333236
en_predictions <- en_model %>% predict(x_test)
# Model performance metrics
en_results <- data.frame(Model = "Elastic Net Regression",
                         RMSE = caret::RMSE(en_predictions, testd_df$y),
                         Rsquared = caret::R2(en_predictions, testd_df$y))
en_results %>%
  kable() %>%
  kable_styling()
Model RMSE Rsquared
permeability Elastic Net Regression 14.08594 0.205574

Summary

pls_model$results %>%
  filter(ncomp == pls_model$bestTune$ncomp) %>%
  mutate("Model" = "PLS") %>%
  as.data.frame() %>%
  bind_rows(pcr_results) %>%
  bind_rows(rr_results) %>%
  bind_rows(lr_results) %>%
  bind_rows(en_results) %>%
  arrange(desc(Rsquared)) %>%
  kable() %>%
  kable_styling()
ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD Model
…1 5 14.29229 0.2634860 10.77974 3.389964 0.2388523 2.7213 PLS
permeability…2 NA 13.98552 0.2417885 NA NA NA NA Lasso Regression
permeability…3 NA 14.08594 0.2055740 NA NA NA NA Elastic Net Regression
permeability…4 NA 14.59042 0.1426659 NA NA NA NA Ridge Regression
permeability…5 NA 15.90717 0.0344595 NA NA NA NA PCR

Elasticnet model has the lowest R2.

F

Would you recommend any of your models to replace the permeability laboratory experiment?

hist(permeability)

We can see that most of permeability are under 10. The model’s accuracy is not good enough to replace lab test.

6.3

A chemical manufacturing process for a pharmaceutical produce was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurement of the raw materials (predictors), measurements of the manufacutring process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw materials before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boot revenue by approximately one hundred thousand dollars per batch:

A

Start R and use these commands to load the data:

library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)

B

A small percentage of cells in the predictor set contain missing values. Use an imputation function to fill in these missing values (e.g., see Sect. 3.8).

library(AppliedPredictiveModeling)
missmap(ChemicalManufacturingProcess)

use KNN to impute values

# Make this reproducible
(CHEM_knn_impute <- preProcess(ChemicalManufacturingProcess, method=c('knnImpute')))
## Created from 152 samples and 58 variables
## 
## Pre-processing:
##   - centered (58)
##   - ignored (0)
##   - 5 nearest neighbor imputation (58)
##   - scaled (58)
CHEM_dataframe <- predict(CHEM_knn_impute, ChemicalManufacturingProcess)
summary(CHEM_dataframe)
##      Yield         BiologicalMaterial01 BiologicalMaterial02
##  Min.   :-2.6692   Min.   :-2.5653      Min.   :-2.1858     
##  1st Qu.:-0.7716   1st Qu.:-0.6078      1st Qu.:-0.7457     
##  Median :-0.1119   Median :-0.1491      Median :-0.1484     
##  Mean   : 0.0000   Mean   : 0.0000      Mean   : 0.0000     
##  3rd Qu.: 0.7035   3rd Qu.: 0.6423      3rd Qu.: 0.7557     
##  Max.   : 3.3394   Max.   : 3.3597      Max.   : 2.2459     
##  BiologicalMaterial03 BiologicalMaterial04 BiologicalMaterial05
##  Min.   :-2.6830      Min.   :-1.6731      Min.   :-2.90576    
##  1st Qu.:-0.6811      1st Qu.:-0.6222      1st Qu.:-0.73944    
##  Median :-0.1212      Median :-0.1405      Median :-0.05891    
##  Mean   : 0.0000      Mean   : 0.0000      Mean   : 0.00000    
##  3rd Qu.: 0.6804      3rd Qu.: 0.4907      3rd Qu.: 0.70568    
##  Max.   : 2.6355      Max.   : 6.0523      Max.   : 3.38985    
##  BiologicalMaterial06 BiologicalMaterial07 BiologicalMaterial08
##  Min.   :-2.2184      Min.   :-0.1313      Min.   :-2.38535    
##  1st Qu.:-0.7622      1st Qu.:-0.1313      1st Qu.:-0.64225    
##  Median :-0.1202      Median :-0.1313      Median : 0.02249    
##  Mean   : 0.0000      Mean   : 0.0000      Mean   : 0.00000    
##  3rd Qu.: 0.6499      3rd Qu.:-0.1313      3rd Qu.: 0.56906    
##  Max.   : 2.7948      Max.   : 7.5723      Max.   : 2.43034    
##  BiologicalMaterial09 BiologicalMaterial10 BiologicalMaterial11
##  Min.   :-3.39629     Min.   :-1.7202      Min.   :-2.3116     
##  1st Qu.:-0.59627     1st Qu.:-0.5685      1st Qu.:-0.6505     
##  Median :-0.03627     Median :-0.1513      Median :-0.1811     
##  Mean   : 0.00000     Mean   : 0.0000      Mean   : 0.0000     
##  3rd Qu.: 0.67428     3rd Qu.: 0.3161      3rd Qu.: 0.5491     
##  Max.   : 2.96246     Max.   : 6.7920      Max.   : 2.4431     
##  BiologicalMaterial12 ManufacturingProcess01 ManufacturingProcess02
##  Min.   :-2.3914      Min.   :-6.149703      Min.   :-1.969253     
##  1st Qu.:-0.6074      1st Qu.:-0.223563      1st Qu.: 0.308956     
##  Median :-0.1033      Median : 0.105667      Median : 0.509627     
##  Mean   : 0.0000      Mean   : 0.001224      Mean   : 0.009518     
##  3rd Qu.: 0.7112      3rd Qu.: 0.503487      3rd Qu.: 0.568648     
##  Max.   : 2.5986      Max.   : 1.587202      Max.   : 0.686690     
##  ManufacturingProcess03 ManufacturingProcess04 ManufacturingProcess05
##  Min.   :-3.10582       Min.   :-3.323233      Min.   :-2.577803     
##  1st Qu.:-0.42705       1st Qu.:-0.613828      1st Qu.:-0.487046     
##  Median : 0.37658       Median : 0.342432      Median :-0.086583     
##  Mean   : 0.04123       Mean   : 0.003213      Mean   :-0.002534     
##  3rd Qu.: 0.46587       3rd Qu.: 0.661186      3rd Qu.: 0.230347     
##  Max.   : 2.69818       Max.   : 2.254953      Max.   : 5.686954     
##  ManufacturingProcess06 ManufacturingProcess07 ManufacturingProcess08
##  Min.   :-1.630631      Min.   :-0.9580199     Min.   :-1.111973     
##  1st Qu.:-0.630408      1st Qu.:-0.9580199     1st Qu.:-1.111973     
##  Median :-0.222910      Median :-0.9580199     Median : 0.894164     
##  Mean   :-0.006574      Mean   :-0.0009072     Mean   :-0.001759     
##  3rd Qu.: 0.480950      3rd Qu.: 1.0378549     3rd Qu.: 0.894164     
##  Max.   : 7.408415      Max.   : 1.0378549     Max.   : 0.894164     
##  ManufacturingProcess09 ManufacturingProcess10 ManufacturingProcess11
##  Min.   :-4.37787       Min.   :-2.18999       Min.   :-2.63442      
##  1st Qu.:-0.49799       1st Qu.:-0.62482       1st Qu.:-0.53867      
##  Median : 0.04519       Median :-0.10310       Median : 0.02020      
##  Mean   : 0.00000       Mean   : 0.02156       Mean   : 0.03163      
##  3rd Qu.: 0.55281       3rd Qu.: 0.54906       3rd Qu.: 0.71878      
##  Max.   : 2.39252       Max.   : 3.15768       Max.   : 2.95425      
##  ManufacturingProcess12 ManufacturingProcess13 ManufacturingProcess14
##  Min.   :-0.480694      Min.   :-2.37172       Min.   :-2.803712     
##  1st Qu.:-0.480694      1st Qu.:-0.59881       1st Qu.:-0.488202     
##  Median :-0.480694      Median : 0.09066       Median : 0.029921     
##  Mean   :-0.002731      Mean   : 0.00000       Mean   :-0.004071     
##  3rd Qu.:-0.480694      3rd Qu.: 0.68163       3rd Qu.: 0.520534     
##  Max.   : 2.068439      Max.   : 4.03046       Max.   : 3.688885     
##  ManufacturingProcess15 ManufacturingProcess16 ManufacturingProcess17
##  Min.   :-2.3137        Min.   :-12.98219      Min.   :-2.43850      
##  1st Qu.:-0.4960        1st Qu.: -0.01436      1st Qu.:-0.67597      
##  Median :-0.1273        Median :  0.06312      Median : 0.04507      
##  Mean   : 0.0000        Mean   :  0.00000      Mean   : 0.00000      
##  3rd Qu.: 0.3786        3rd Qu.:  0.15126      3rd Qu.: 0.60587      
##  Max.   : 3.3283        Max.   :  0.81376      Max.   : 4.53150      
##  ManufacturingProcess18 ManufacturingProcess19 ManufacturingProcess20
##  Min.   :-13.08836      Min.   :-3.0321        Min.   :-13.05542     
##  1st Qu.:  0.00903      1st Qu.:-0.6022        1st Qu.: -0.01063     
##  Median :  0.06890      Median :-0.1360        Median :  0.07318     
##  Mean   :  0.00000      Mean   : 0.0000        Mean   :  0.00000     
##  3rd Qu.:  0.14237      3rd Qu.: 0.4838        3rd Qu.:  0.15197     
##  Max.   :  0.43899      Max.   : 2.5846        Max.   :  0.58033     
##  ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
##  Min.   :-2.1018        Min.   :-1.6230324     Min.   :-1.814768     
##  1st Qu.:-0.5599        1st Qu.:-0.7223009     1st Qu.:-0.611797     
##  Median :-0.1745        Median :-0.1218132     Median :-0.010311     
##  Mean   : 0.0000        Mean   : 0.0003314     Mean   : 0.004726     
##  3rd Qu.: 0.2110        3rd Qu.: 0.7789183     3rd Qu.: 0.591175     
##  Max.   : 4.8365        Max.   : 1.9798937     Max.   : 1.794146     
##  ManufacturingProcess24 ManufacturingProcess25 ManufacturingProcess26
##  Min.   :-1.523304      Min.   :-12.927496     Min.   :-12.940454    
##  1st Qu.:-0.833580      1st Qu.:  0.002208     1st Qu.:  0.008505    
##  Median :-0.143857      Median :  0.069146     Median :  0.063251    
##  Mean   : 0.005061      Mean   :  0.000105     Mean   :  0.000404    
##  3rd Qu.: 0.890729      3rd Qu.:  0.128720     3rd Qu.:  0.115417    
##  Max.   : 2.442608      Max.   :  0.433287     Max.   :  0.312785    
##  ManufacturingProcess27 ManufacturingProcess28 ManufacturingProcess29
##  Min.   :-12.888994     Min.   :-1.25583       Min.   :-12.026718    
##  1st Qu.:  0.000681     1st Qu.:-1.25583       1st Qu.: -0.186978    
##  Median :  0.066362     Median : 0.72551       Median : -0.066778    
##  Mean   :  0.001121     Mean   :-0.02868       Mean   : -0.003263    
##  3rd Qu.:  0.131337     3rd Qu.: 0.78266       3rd Qu.:  0.233723    
##  Max.   :  0.416660     Max.   : 0.93507       Max.   :  1.195326    
##  ManufacturingProcess30 ManufacturingProcess31 ManufacturingProcess32
##  Min.   :-9.38589       Min.   :-12.632749     Min.   :-2.86552      
##  1st Qu.:-0.37026       1st Qu.: -0.015263     1st Qu.:-0.64216      
##  Median : 0.03954       Median :  0.110732     Median :-0.08632      
##  Mean   : 0.01114       Mean   : -0.001722     Mean   : 0.00000      
##  3rd Qu.: 0.55179       3rd Qu.:  0.218728     3rd Qu.: 0.65480      
##  Max.   : 2.08855       Max.   :  0.416720     Max.   : 2.69287      
##  ManufacturingProcess33 ManufacturingProcess34 ManufacturingProcess35
##  Min.   :-3.037737      Min.   :-3.558813      Min.   :-3.01270      
##  1st Qu.:-0.621676      1st Qu.: 0.118269      1st Qu.:-0.51725      
##  Median : 0.183677      Median : 0.118269      Median :-0.05513      
##  Mean   :-0.005764      Mean   :-0.009176      Mean   :-0.02362      
##  3rd Qu.: 0.586354      3rd Qu.: 0.118269      3rd Qu.: 0.49941      
##  Max.   : 2.599738      Max.   : 1.956810      Max.   : 2.44032      
##  ManufacturingProcess36 ManufacturingProcess37 ManufacturingProcess38
##  Min.   :-2.944307      Min.   :-2.27741       Min.   :-3.9024       
##  1st Qu.:-0.655777      1st Qu.:-0.70467       1st Qu.:-0.8225       
##  Median :-0.083645      Median :-0.03064       Median : 0.7175       
##  Mean   :-0.008228      Mean   : 0.00000       Mean   : 0.0000       
##  3rd Qu.: 0.488487      3rd Qu.: 0.64339       3rd Qu.: 0.7175       
##  Max.   : 2.777017      Max.   : 2.89017       Max.   : 0.7175       
##  ManufacturingProcess39 ManufacturingProcess40 ManufacturingProcess41
##  Min.   :-4.5508        Min.   :-0.4626528     Min.   :-0.440588     
##  1st Qu.: 0.1653        1st Qu.:-0.4626528     1st Qu.:-0.440588     
##  Median : 0.2317        Median :-0.4626528     Median :-0.440588     
##  Mean   : 0.0000        Mean   : 0.0003392     Mean   :-0.000392     
##  3rd Qu.: 0.2982        3rd Qu.:-0.4626528     3rd Qu.:-0.440588     
##  Max.   : 0.4310        Max.   : 2.1490969     Max.   : 3.275213     
##  ManufacturingProcess42 ManufacturingProcess43 ManufacturingProcess44
##  Min.   :-5.77163       Min.   :-1.0506        Min.   :-5.60583      
##  1st Qu.: 0.09979       1st Qu.:-0.3594        1st Qu.:-0.01588      
##  Median : 0.20280       Median :-0.1290        Median : 0.29467      
##  Mean   : 0.00000       Mean   : 0.0000        Mean   : 0.00000      
##  3rd Qu.: 0.25430       3rd Qu.: 0.1303        3rd Qu.: 0.29467      
##  Max.   : 0.46031       Max.   :11.6224        Max.   : 0.91578      
##  ManufacturingProcess45
##  Min.   :-5.25447      
##  1st Qu.:-0.09356      
##  Median : 0.15220      
##  Mean   : 0.00000      
##  3rd Qu.: 0.39796      
##  Max.   : 1.13523

C

Split the data into a training and a test set, pre-process the data, and tune a model of your choice from this chapter. What is the optimal value of the performance metric? Pre-process the data removing the zero variance variables prior to splitting the data.

We will split the data into 80% train and 20% test.

dim(CHEM_dataframe)
## [1] 176  58

Data will be split into training and test set using an 80:20 split.

CHEM_dataframe2 <- CHEM_dataframe[, -nearZeroVar(CHEM_dataframe)]
dim(CHEM_dataframe2)
## [1] 176  57
set.seed(555)

select_train <- createDataPartition(CHEM_dataframe2$Yield, times = 1, p = .80, list = FALSE)

train_x2 <- CHEM_dataframe2[select_train, ][, -c(1)] 
test_x2 <- CHEM_dataframe2[-select_train, ][, -c(1)] 
train_y2 <- CHEM_dataframe2[select_train, ]$Yield
test_y2 <- CHEM_dataframe2[-select_train, ]$Yield

(P_fit2 <- train(x = train_x2, y = train_y2,
                method = "pls",
                metric = "Rsquared",
                tuneLength = 25, 
                trControl = trainControl(method = "cv", number=10), 
                preProcess = c('center', 'scale')
          ))
## Partial Least Squares 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 129, 130, 130, 129, 130, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##    1     0.7338982  0.4317348  0.6175189
##    2     0.7722558  0.4684310  0.5812528
##    3     0.8134501  0.5194408  0.5823906
##    4     0.9540440  0.5193093  0.6360902
##    5     1.0721588  0.5241915  0.6733250
##    6     1.1436786  0.5185259  0.6912503
##    7     1.2458840  0.5085213  0.7365470
##    8     1.3875640  0.4976078  0.7926451
##    9     1.4661458  0.4849316  0.8268420
##   10     1.5879129  0.4644183  0.8643309
##   11     1.6495018  0.4598224  0.8756271
##   12     1.7627131  0.4554407  0.8976769
##   13     1.7487823  0.4650528  0.8931955
##   14     1.6777520  0.4656269  0.8684202
##   15     1.6377169  0.4705341  0.8508950
##   16     1.6194606  0.4693782  0.8400870
##   17     1.5915410  0.4743053  0.8205206
##   18     1.5932854  0.4752533  0.8122477
##   19     1.6520158  0.4751787  0.8387607
##   20     1.6905995  0.4728315  0.8552220
##   21     1.7800104  0.4698293  0.8932960
##   22     1.8218081  0.4683798  0.9101214
##   23     1.9167962  0.4630918  0.9435827
##   24     1.9756004  0.4611352  0.9652468
##   25     2.0143881  0.4592907  0.9780208
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 5.

pre-process by centering and scaling the predictors.

plot(P_fit2)

The optimal value for the performance metric yields a RMSE of 1.07 and R2 of 0.524.

D

Predict the response for the test set. What is the value of the performance metric and how does this compare with the resampled performance metric on the training set?

P_predict2 <- predict(P_fit2, newdata=test_x2)
(postResample(pred=P_predict2, obs=test_y2))
##      RMSE  Rsquared       MAE 
## 0.6344353 0.6718573 0.5241329

The predictions on the test set yield a RMSE of 0.634, which is better than the one on the training set and R2 of 0.524, which is slightly lower than the one from the training set. So the model has performed very well on the test set.

E

Which predictors are most important in the model you have trained? Do either the biological or process predictors dominate the list??

In this model, the most important predictor at the top of the list is “ManufacturingProcess32”. We have both biological and process predictors similarly as important in the list.

plot(varImp(P_fit2, scale = FALSE), top=20, scales = list(y = list(cex = 0.8)))
## Warning: package 'pls' was built under R version 4.3.2
## 
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
## 
##     R2
## The following object is masked from 'package:corrplot':
## 
##     corrplot
## The following object is masked from 'package:stats':
## 
##     loadings

The RMSE is about 0.62 and a R2 of roughly 0.68 on the test set.

F

Explore the relationships between each of the top predictors and the response. How could this information be helpful in improving yield in future runs of the manufacturing process?

corr_vals <- CHEM_dataframe2 %>% 
  dplyr::select('Yield', 'ManufacturingProcess32','ManufacturingProcess36',
         'BiologicalMaterial06','ManufacturingProcess13',
         'BiologicalMaterial03','ManufacturingProcess17',
         'BiologicalMaterial02','BiologicalMaterial12',
         'ManufacturingProcess09','ManufacturingProcess31')

corr_plot_vals <- cor(corr_vals)

corrplot.mixed(corr_plot_vals, tl.col = 'black', tl.pos = 'lt', 
         upper = "number", lower="circle")