Chemometrics for Quantitative Determination of Terpenes Using
Attenuated Total Reflectance-Fourier Transform Infrared Spectroscopy: A
Pedagogical Laboratory Exercise for Undergraduate Instrumental Analysis
Students
Gerard G. Dumancas*1, Noemi Carreto1, Oliver
Generalao2, Guoyi Ke3, Ghalib Bello4,
Arnold Lubguban5,6, and Roberto Malaluan5,6
1Department of Chemistry, Loyola Science Center, The
University of Scranton, Scranton, PA, USA 18510
2Center for Informatics, University of San Agustin,
Gen. Luna St, Iloilo City, Philippines 5000
3Department of Mathematics and Physical Sciences,
Louisiana State University at Alexandria, Alexandria, LA, USA 71302
4Department of Environmental Medicine and Public Health,
Icahn School of Medicine at Mount Sinai, New York, NY, USA 10029
5Center for Sustainable Polymers, Mindanao State
University - Iligan Institute of Technology, Iligan City, 9200
Philippines
6Department of Chemical Engineering and Technology,
Mindanao State University - Iligan Institute of Technology,
Iligan City, 9200 Philippines
AUTHOR INFORMATION
Corresponding Author
*E-mail: gerard.dumancas@scranton.edu
The R Packages
library(prospectr)
library(pls)
library(DT)
The dataset
FTIR %>%
datatable(options = list(scrollX = TRUE))
set.seed(2017) # set seed for reproducibility
Analysis using first derivative
4700-339 cm-1 (all wavenumbers)
matplot(t(FTIR[,4:2265]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20,
main = "FTIR Spectra") #plot the FTIR spectra; no x values are supplied, so the x-axis shows the column position within the selected range
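The plot above places absorbance against column position, so the axis labelled Wavenumber is really an index. A minimal sketch of a true wavenumber axis follows. It assumes that spectral columns 4 to 2265 sit on an evenly spaced grid from about 339 to 4700 cm-1, which is inferred from the section headings and not confirmed against the data file. The helper names `col_to_wavenumber` and `plot_spectra` are hypothetical.

```r
# Assumption: columns 4..2265 map linearly onto roughly 339..4700 cm^-1.
col_to_wavenumber <- function(col, first_col = 4, last_col = 2265,
                              wn_first = 339, wn_last = 4700) {
  step <- (wn_last - wn_first) / (last_col - first_col)
  wn_first + (col - first_col) * step
}

# Hypothetical plotting helper: one line per sample, wavenumber on the x-axis.
# The x-axis is reversed, following the usual convention for IR spectra.
plot_spectra <- function(dat, cols) {
  wn <- col_to_wavenumber(cols)
  matplot(wn, t(as.matrix(dat[, cols])), type = "l", lty = 1,
          xlim = rev(range(wn)),
          xlab = bquote("Wavenumber" ~ (cm^-1)), ylab = "Absorbance")
}

# Approximate wavenumber limits of the column window 200:600
round(col_to_wavenumber(c(200, 600)))
```

With the dataset loaded, `plot_spectra(FTIR, 4:2265)` would draw the full range. If the data file carries the real wavenumbers (for example as column names), those should be used instead of the linear approximation.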
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,4:2265]),differences=1)) #first derivative signal processing of training set
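To make the transpose-diff-transpose idiom concrete, here is a small sketch on a made-up 2 x 4 matrix (rows are samples, columns are wavenumber positions). The values are illustrative only and are not taken from the FTIR dataset.

```r
# Two made-up "spectra" with four points each
X <- rbind(s1 = c(1, 3, 6, 10),
           s2 = c(2, 2, 5, 5))

# diff() works down the rows of a matrix, so the spectra are transposed first,
# differenced along the wavenumber axis, and transposed back.
d1 <- t(diff(t(X), differences = 1))

# Equivalent form: each column minus its left-hand neighbour
d1_alt <- X[, -1] - X[, -ncol(X)]

dim(X)   # 2 samples x 4 points
dim(d1)  # 2 samples x 3 points: one column is lost per difference
d1
```

This is a plain first difference and is not divided by the wavenumber spacing. Because the models below use scale=TRUE, which standardizes each predictor column, a constant factor of that kind would drop out anyway. The prospectr package loaded at the top also provides smoothed derivatives such as `savitzkyGolay()`, which may be worth trying on noisy spectra.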
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2]~FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3]~FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1662 | 0.0185 |
| 2 | 0.1597 | 0.0939 |
| 3 | 0.1575 | 0.1185 |
| 4 | 0.1585 | 0.1078 |
| 5 | 0.1583 | 0.1097 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1662 | 0.0185 |
| 2 | 0.1597 | 0.0939 |
| 3 | 0.1575 | 0.1185 |
| 4 | 0.1585 | 0.1078 |
| 5 | 0.1583 | 0.1097 |
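The RMSE and R2 values in these tables come from leave-one-out (LOO) cross-validation. As a rough illustration of what those two numbers measure, the sketch below runs LOO by hand on made-up data with a simple linear model standing in for PLS. It assumes the common definitions RMSE = sqrt(PRESS/n) and R2 = 1 - PRESS/SST, where PRESS is the sum of squared LOO prediction errors. The helper `cv_metrics` is hypothetical.

```r
# Hypothetical helper: RMSE and R2 from measured values and LOO predictions
cv_metrics <- function(measured, predicted) {
  press <- sum((measured - predicted)^2)         # squared prediction errors
  sst   <- sum((measured - mean(measured))^2)    # total sum of squares
  c(RMSE = sqrt(press / length(measured)), R2 = 1 - press / sst)
}

# Made-up calibration data (not the FTIR dataset)
set.seed(2017)
x <- seq(0, 1, length.out = 10)
y <- 0.2 + 0.6 * x + rnorm(10, sd = 0.05)

# Leave-one-out: refit without sample i, then predict sample i
loo_pred <- vapply(seq_along(y), function(i) {
  fit <- lm(y ~ x, data = data.frame(x = x[-i], y = y[-i]))
  predict(fit, newdata = data.frame(x = x[i]))
}, numeric(1))

cv_metrics(y, loo_pred)
```

Under these definitions R2 and RMSE are tied together through the spread of the measured values, which is consistent with the tables: the row with the lowest RMSE is always the row with the highest R2.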
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

3225-428 cm-1
matplot(t(FTIR[,50:1500]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,50:1500]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1593 | 0.0981 |
| 2 | 0.1456 | 0.2465 |
| 3 | 0.1399 | 0.3044 |
| 4 | 0.1419 | 0.2849 |
| 5 | 0.1417 | 0.2870 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1593 | 0.0981 |
| 2 | 0.1456 | 0.2465 |
| 3 | 0.1399 | 0.3044 |
| 4 | 0.1419 | 0.2849 |
| 5 | 0.1417 | 0.2870 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

2261-428 cm-1
matplot(t(FTIR[,50:1000]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,50:1000]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1549 | 0.1472 |
| 2 | 0.1381 | 0.3230 |
| 3 | 0.1300 | 0.3994 |
| 4 | 0.1331 | 0.3711 |
| 5 | 0.1330 | 0.3716 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1549 | 0.1472 |
| 2 | 0.1381 | 0.3230 |
| 3 | 0.1300 | 0.3994 |
| 4 | 0.1331 | 0.3711 |
| 5 | 0.1330 | 0.3716 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1875-525 cm-1
matplot(t(FTIR[,100:800]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,100:800]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1491 | 0.2103 |
| 2 | 0.1196 | 0.4922 |
| 3 | 0.1024 | 0.6272 |
| 4 | 0.1063 | 0.5984 |
| 5 | 0.1067 | 0.5957 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1491 | 0.2103 |
| 2 | 0.1196 | 0.4922 |
| 3 | 0.1024 | 0.6272 |
| 4 | 0.1063 | 0.5984 |
| 5 | 0.1067 | 0.5957 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1585-525 cm-1
matplot(t(FTIR[,100:650]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,100:650]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1483 | 0.2186 |
| 2 | 0.1140 | 0.5385 |
| 3 | 0.0996 | 0.6476 |
| 4 | 0.1026 | 0.6262 |
| 5 | 0.1035 | 0.6195 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1483 | 0.2186 |
| 2 | 0.1140 | 0.5385 |
| 3 | 0.0996 | 0.6476 |
| 4 | 0.1026 | 0.6262 |
| 5 | 0.1035 | 0.6195 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1489-718 cm-1 (These are the best results so far!)
matplot(t(FTIR[,200:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,200:600]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1524 | 0.1753 |
| 2 | 0.1118 | 0.5564 |
| 3 | 0.0944 | 0.6832 |
| 4 | 0.0946 | 0.6822 |
| 5 | 0.0974 | 0.6627 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1524 | 0.1753 |
| 2 | 0.1118 | 0.5564 |
| 3 | 0.0944 | 0.6832 |
| 4 | 0.0946 | 0.6822 |
| 5 | 0.0974 | 0.6627 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1489-525 cm-1
matplot(t(FTIR[,100:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,100:600]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1499 | 0.2018 |
| 2 | 0.1147 | 0.5323 |
| 3 | 0.1026 | 0.6262 |
| 4 | 0.1027 | 0.6256 |
| 5 | 0.1045 | 0.6124 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1499 | 0.2018 |
| 2 | 0.1147 | 0.5323 |
| 3 | 0.1026 | 0.6262 |
| 4 | 0.1027 | 0.6256 |
| 5 | 0.1045 | 0.6124 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1296-718 cm-1
matplot(t(FTIR[,200:500]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
First derivative signal preprocessing techniques
FTIRspecd1 <- t(diff(t(FTIR[,200:500]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1528 | 0.1704 |
| 2 | 0.1105 | 0.5665 |
| 3 | 0.0926 | 0.6955 |
| 4 | 0.0952 | 0.6777 |
| 5 | 0.0951 | 0.6789 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1528 | 0.1704 |
| 2 | 0.1105 | 0.5665 |
| 3 | 0.0926 | 0.6955 |
| 4 | 0.0952 | 0.6777 |
| 5 | 0.0951 | 0.6789 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")
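The window-by-window comparison above repeats the same steps for each column range. A minimal sketch of the same idea as a loop follows. The function `compare_windows` is a hypothetical helper, not part of the original analysis, and the demonstration runs on synthetic spectra because the FTIR dataset is not bundled here. It assumes the layout used throughout this document: concentrations in columns 2 and 3, absorbances from column 4 onwards.

```r
# Hypothetical helper: one LOO-validated PLS model per column window,
# using the same first-difference preprocessing as the sections above.
# Requires the pls package to be attached (library(pls)).
compare_windows <- function(dat, windows, y_col = 2, ncomp = 5) {
  rows <- lapply(names(windows), function(nm) {
    spectra <- as.matrix(dat[, windows[[nm]]])
    X <- t(diff(t(spectra), differences = 1))
    d <- data.frame(y = dat[[y_col]], X = I(X))
    fit <- plsr(y ~ X, ncomp = ncomp, data = d, scale = TRUE, validation = "LOO")
    rmse <- RMSEP(fit, estimate = "CV")$val[1, 1, -1]  # drop the intercept-only model
    data.frame(window = nm, best_ncomp = unname(which.min(rmse)),
               rmse_cv = unname(min(rmse)), row.names = NULL)
  })
  do.call(rbind, rows)
}

# Synthetic stand-in for the FTIR data frame (made-up values)
set.seed(2017)
n <- 15; p <- 700
limonene <- runif(n)
grid <- seq_len(p)
band_l <- exp(-((grid - 300) / 40)^2)   # made-up limonene band
band_c <- exp(-((grid - 450) / 40)^2)   # made-up p-cymene band
spectra <- outer(limonene, band_l) + outer(1 - limonene, band_c) +
  matrix(rnorm(n * p, sd = 0.01), n)
demo <- data.frame(id = seq_len(n), limonene = limonene,
                   cymene = 1 - limonene, spectra)

windows <- list("cols 200:600" = 200:600, "cols 200:500" = 200:500)

if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  res <- compare_windows(demo, windows)
  print(res)
}
```

On the real data, `compare_windows(FTIR, windows)` with the column ranges from the sections above would give one row per window, which makes the lowest cross-validated RMSE easy to spot.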

BEST RESULTS: ANALYSIS FROM THIS SECTION ONWARDS
1489-718 cm-1 (These are the best results so far!)
matplot(t(FTIR[,200:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
The first derivative
FTIRspecd1 <- t(diff(t(FTIR[,200:600]),differences=1)) #first derivative signal processing of training set
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1,ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1524 | 0.1753 |
| 2 | 0.1118 | 0.5564 |
| 3 | 0.0944 | 0.6832 |
| 4 | 0.0946 | 0.6822 |
| 5 | 0.0974 | 0.6627 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1524 | 0.1753 |
| 2 | 0.1118 | 0.5564 |
| 3 | 0.0944 | 0.6832 |
| 4 | 0.0946 | 0.6822 |
| 5 | 0.0974 | 0.6627 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured limonene (v/v)", ylab="Predicted limonene (v/v)") #limonene

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured p-cymene (v/v)", ylab="Predicted p-cymene (v/v)") #cymene
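For a model fitted with validation="LOO", the predicted-vs-measured plot is understood here to show the cross-validated predictions (the documented default in the pls manual, worth checking against your installed version). The sketch below pulls those predictions out of the fitted object and recomputes the RMSE by hand. It uses synthetic data, and the object names are illustrative.

```r
# Synthetic calibration set (made-up values, not the FTIR dataset)
set.seed(1)
n <- 12
y <- runif(n)
X <- outer(y, sin(seq(0, pi, length.out = 40))) +
  matrix(rnorm(n * 40, sd = 0.05), n)
d <- data.frame(y = y, X = I(X))

if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  fit <- plsr(y ~ X, ncomp = 3, data = d, scale = TRUE, validation = "LOO")

  # LOO predictions are stored as samples x responses x components
  cv_pred <- fit$validation$pred[, 1, 3]

  # RMSE by hand versus the value reported by RMSEP()
  # (index 4 because the first entry is the intercept-only model)
  rmse_manual <- sqrt(mean((y - cv_pred)^2))
  rmse_pls <- RMSEP(fit, estimate = "CV")$val[1, 1, 4]
  print(c(manual = rmse_manual, pls = unname(rmse_pls)))
}
```

The same extraction on `train.model.limonene` gives the numbers behind the figure, which is handy for building a custom plot or a residual table.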

Best results without Signal Processing
Best wavenumber range (1489-718 cm-1), but without signal processing
matplot(t(FTIR[,200:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data
Partial Least Squares Regression
train.model.limonene <- plsr(FTIR[,2] ~ as.matrix(FTIR[,200:600]),ncomp=5,scale=TRUE,validation="LOO") #FTIR[,2] is used as the limonene concentration in the training set
train.model.cymene <- plsr(FTIR[,3] ~ as.matrix(FTIR[,200:600]),ncomp=5,scale=TRUE,validation="LOO") #FTIR[,3] is used as the p-cymene concentration in the training set
Plot of RMSE vs the number of components
plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene")

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2
RMSEP(train.model.limonene)
RMSEP(train.model.cymene)
R2(train.model.limonene)
R2(train.model.cymene)
RMSE and R2 for Limonene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1657 | 0.0244 |
| 2 | 0.1267 | 0.4296 |
| 3 | 0.1227 | 0.4649 |
| 4 | 0.1109 | 0.5632 |
| 5 | 0.1165 | 0.5179 |

RMSE and R2 for p-cymene
| Number of components | RMSE | R2 |
|---|---|---|
| 1 | 0.1657 | 0.0244 |
| 2 | 0.1267 | 0.4296 |
| 3 | 0.1227 | 0.4649 |
| 4 | 0.1109 | 0.5632 |
| 5 | 0.1165 | 0.5179 |
Predicted vs measured values
plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured limonene (v/v)", ylab="Predicted limonene (v/v)") #limonene

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured p-cymene (v/v)", ylab="Predicted p-cymene (v/v)") #cymene
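One plausible reason the first-derivative models do better than the raw-spectra models on the same window is that differencing cancels any constant baseline offset between samples. The sketch below shows this on made-up spectra. The band shape, concentrations, and offsets are all invented for illustration.

```r
# One synthetic absorption band over the 718-1489 cm^-1 window
wn <- seq(718, 1489, length.out = 50)
band <- exp(-((wn - 1100) / 60)^2)

conc   <- c(0.2, 0.5, 0.8)        # made-up concentrations (v/v)
offset <- c(0.00, 0.05, -0.03)    # made-up baseline shift for each sample

clean   <- outer(conc, band)      # 3 samples x 50 points, no baseline shift
shifted <- clean + offset         # adds offset[i] to every point of sample i

# First difference along the wavenumber axis, as in the sections above
d_clean   <- t(diff(t(clean), differences = 1))
d_shifted <- t(diff(t(shifted), differences = 1))

max(abs(shifted - clean))                 # raw spectra differ by the offsets
isTRUE(all.equal(d_clean, d_shifted))     # derivative spectra agree
```

Sloping or curved baselines are not removed by a single difference, so this is only part of the picture. A second derivative or a dedicated baseline correction would be the usual next step to try.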

Please read this tutorial for more details: https://cran.r-project.org/web/packages/pls/vignettes/pls-manual.pdf