Chemometrics for Quantitative Determination of Terpenes Using Attenuated Total Reflectance-Fourier Transform Infrared Spectroscopy: A Pedagogical Laboratory Exercise for Undergraduate Instrumental Analysis Students

Gerard G. Dumancas*1, Noemi Carreto1, Oliver Generalao2, Guoyi Ke3, Ghalib Bello4, Arnold Lubguban5,6, and Roberto Malaluan5,6

1Department of Chemistry, Loyola Science Center, The University of Scranton, Scranton, PA, USA 18510 

2Center for Informatics, University of San Agustin, Gen. Luna St, Iloilo City, Philippines 5000 

3Department of Mathematics and Physical Sciences, Louisiana State University at Alexandria, Alexandria, LA, USA 71302 

4Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA 10029 

5Center for Sustainable Polymers, Mindanao State University - Iligan Institute of Technology, Iligan City, 9200 Philippines 

6Department of Chemical Engineering and Technology, Mindanao State University - Iligan Institute of Technology, Iligan City, 9200 Philippines

AUTHOR INFORMATION

Corresponding Author 

*E-mail: gerard.dumancas@scranton.edu  

The R Packages

library(prospectr)
library(pls)
library(DT)

The dataset

FTIR %>%
  datatable(options = list(scrollX = TRUE)) # show the dataset as a horizontally scrollable table
set.seed(2017) # set seed for reproducibility
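The code above assumes that FTIR is already loaded; the source does not show that step. As a minimal sketch for trying the later calls without the original data, the block below builds a simulated stand-in with the layout implied by the indexing used in this document (concentrations in columns 2 and 3, spectra in columns 4:2265). The name FTIR_toy, the number of samples, the contents of column 1, and the binary-mixture relation between the two concentrations are all assumptions, not facts from the study.

```r
# Simulated stand-in for the FTIR data frame (made-up values, NOT the study's data).
set.seed(2017)
n_samples <- 20      # hypothetical number of mixtures
n_points  <- 2262    # columns 4:2265 span 2262 spectral points

limonene <- runif(n_samples, 0, 1)   # assumed v/v fraction
cymene   <- 1 - limonene             # assumed binary mixture (illustration only)

FTIR_toy <- data.frame(
  sample   = seq_len(n_samples),     # column 1: assumed sample identifier
  limonene = limonene,               # column 2: response used for limonene
  cymene   = cymene,                 # column 3: response used for p-cymene
  matrix(rnorm(n_samples * n_points, mean = 0.5, sd = 0.05),
         nrow = n_samples)           # columns 4:2265: simulated absorbances
)

dim(FTIR_toy)  # rows = samples, columns = 3 + number of spectral points
```

Because the spectra here are random noise, any model fitted to FTIR_toy is meaningless; the object only serves to check that the indexing and function calls run.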

Analysis using first derivative

4700-339 cm-1 (all wavenumbers)

matplot(t(FTIR[,4:2265]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20,
            main = "FTIR Spectra ") #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,4:2265]),differences=1)) #first derivative signal processing of training set
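To make the derivative step concrete, here is a small sketch on a made-up 2 x 5 matrix (the object names spectra and d1 are illustrative). It uses the same transpose, difference, transpose pattern as the line above and shows two consequences: the result has one fewer column than the input, and a constant baseline offset between spectra disappears. Note that diff() returns point-to-point differences in index units; it does not divide by the wavenumber spacing.

```r
# Toy matrix: 2 spectra (rows) x 5 points (columns); values are made up.
# The second spectrum is the first one shifted up by a constant 0.5.
spectra <- rbind(c(0.10, 0.20, 0.40, 0.30, 0.10),
                 c(0.60, 0.70, 0.90, 0.80, 0.60))

# diff() differences down the rows of a matrix, so transpose first,
# take first differences, then transpose back (one spectrum per row).
d1 <- t(diff(t(spectra), differences = 1))

dim(d1)                        # 2 rows, 4 columns (one fewer than the input)
all.equal(d1[1, ], d1[2, ])    # the constant offset has been removed
```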

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration
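The models above are fitted with scale=TRUE. As far as the pls documentation describes it, the predictors are always mean-centered and scale=TRUE additionally divides each predictor column by its standard deviation. The sketch below reproduces that kind of autoscaling with base R on a made-up matrix; X and Xs are illustrative names, and this is not the internal code of plsr().

```r
# Made-up predictor matrix with two columns on very different scales.
set.seed(1)
X <- cbind(a = rnorm(10, mean = 5,  sd = 0.1),
           b = rnorm(10, mean = 50, sd = 10))

# Autoscaling: subtract each column mean, divide by each column's standard deviation.
Xs <- scale(X, center = TRUE, scale = TRUE)

round(colMeans(Xs), 10)   # column means are zero after centering
apply(Xs, 2, sd)          # column standard deviations are one after scaling
```

After autoscaling every wavenumber contributes with equal variance, so weak bands are not dominated by strong ones, but noisy baseline regions are also amplified. That trade-off is one reason to compare wavenumber ranges, as is done in the following sections.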

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1662   0.0185
2                      0.1597   0.0939
3                      0.1575   0.1185
4                      0.1585   0.1078
5                      0.1583   0.1097

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1662   0.0185
2                      0.1597   0.0939
3                      0.1575   0.1185
4                      0.1585   0.1078
5                      0.1583   0.1097
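For readers new to cross-validated error measures, the following sketch shows how a leave-one-out RMSE and R2 can be computed by hand. It deliberately swaps PLS for an ordinary one-predictor linear model (lm) on made-up data so that it runs with base R only; the definitions used (RMSE from the predicted residual sum of squares, R2 as one minus PRESS over the total sum of squares) are stated here as assumptions about what RMSEP() and R2() report for a cross-validated model.

```r
# Made-up data: one predictor x and a nearly linear response y.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.1, 1.9, 3.2, 3.8, 5.1, 6.0)

# Leave-one-out: refit the model without sample i, then predict sample i.
pred <- unname(sapply(seq_along(y), function(i) {
  fit <- lm(y ~ x, data = data.frame(x = x[-i], y = y[-i]))
  predict(fit, newdata = data.frame(x = x[i]))
}))

press <- sum((y - pred)^2)                  # predicted residual sum of squares
rmse  <- sqrt(press / length(y))            # cross-validated RMSE
r2    <- 1 - press / sum((y - mean(y))^2)   # cross-validated R2

c(RMSE = rmse, R2 = r2)
```

Under these definitions the two measures are tied together: for a fixed set of concentrations, a lower RMSE always corresponds to a higher R2, which matches the pattern in the tables above.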

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")
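The predicted-versus-measured plots use ncomp=3. A simple way to see where that choice comes from is to look for the component count with the lowest cross-validated RMSE. The sketch below does this with the RMSE values reported above for the full wavenumber range; the vector name rmse and the minimum-RMSE rule are illustrative choices, not a statement of how the authors selected the number of components.

```r
# Cross-validated RMSE values reported above for the full range (1 to 5 components).
rmse <- c(0.1662, 0.1597, 0.1575, 0.1585, 0.1583)
names(rmse) <- 1:5

# Component count with the smallest RMSE.
best <- as.integer(names(which.min(rmse)))
best
```

The differences between three, four, and five components are small here, so a more conservative rule (for example, the fewest components within a tolerance of the minimum) could be substituted.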

3225-428 cm-1

matplot(t(FTIR[,50:1500]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,50:1500]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1593   0.0981
2                      0.1456   0.2465
3                      0.1399   0.3044
4                      0.1419   0.2849
5                      0.1417   0.2870

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1593   0.0981
2                      0.1456   0.2465
3                      0.1399   0.3044
4                      0.1419   0.2849
5                      0.1417   0.2870

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

2261-428 cm-1

matplot(t(FTIR[,50:1000]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,50:1000]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1549   0.1472
2                      0.1381   0.3230
3                      0.1300   0.3994
4                      0.1331   0.3711
5                      0.1330   0.3716

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1549   0.1472
2                      0.1381   0.3230
3                      0.1300   0.3994
4                      0.1331   0.3711
5                      0.1330   0.3716

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1875-525 cm-1

matplot(t(FTIR[,100:800]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,100:800]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1491   0.2103
2                      0.1196   0.4922
3                      0.1024   0.6272
4                      0.1063   0.5984
5                      0.1067   0.5957

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1491   0.2103
2                      0.1196   0.4922
3                      0.1024   0.6272
4                      0.1063   0.5984
5                      0.1067   0.5957

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1585-525 cm-1

matplot(t(FTIR[,100:650]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,100:650]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1483   0.2186
2                      0.1140   0.5385
3                      0.0996   0.6476
4                      0.1026   0.6262
5                      0.1035   0.6195

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1483   0.2186
2                      0.1140   0.5385
3                      0.0996   0.6476
4                      0.1026   0.6262
5                      0.1035   0.6195

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1489-718 cm-1 (These are the best results so far!)

matplot(t(FTIR[,200:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,200:600]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1524   0.1753
2                      0.1118   0.5564
3                      0.0944   0.6832
4                      0.0946   0.6822
5                      0.0974   0.6627

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1524   0.1753
2                      0.1118   0.5564
3                      0.0944   0.6832
4                      0.0946   0.6822
5                      0.0974   0.6627

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1489-525 cm-1

matplot(t(FTIR[,100:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,100:600]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1499   0.2018
2                      0.1147   0.5323
3                      0.1026   0.6262
4                      0.1027   0.6256
5                      0.1045   0.6124

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1499   0.2018
2                      0.1147   0.5323
3                      0.1026   0.6262
4                      0.1027   0.6256
5                      0.1045   0.6124

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")

1296-718 cm-1

matplot(t(FTIR[,200:500]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

First derivative signal preprocessing techniques

FTIRspecd1 <- t(diff(t(FTIR[,200:500]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene")

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1528   0.1704
2                      0.1105   0.5665
3                      0.0926   0.6955
4                      0.0952   0.6777
5                      0.0951   0.6789

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1528   0.1704
2                      0.1105   0.5665
3                      0.0926   0.6955
4                      0.0952   0.6777
5                      0.0951   0.6789

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="limonene")

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="p-cymene")
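Each section above pairs a wavenumber range in its heading with a column range in the code. As a cross-check, the sketch below tabulates those pairs and computes the spacing between adjacent columns that they imply. It assumes a uniform wavenumber grid that increases with column index, which the source does not state explicitly; regions, spacing, and wn_at_col are hypothetical names introduced for illustration.

```r
# Column ranges from the code and wavenumber limits (cm-1) from the headings.
regions <- data.frame(
  first_col = c(4,    50,   50,   100,  100,  200,  100,  200),
  last_col  = c(2265, 1500, 1000, 800,  650,  600,  600,  500),
  wn_low    = c(339,  428,  428,  525,  525,  718,  525,  718),
  wn_high   = c(4700, 3225, 2261, 1875, 1585, 1489, 1489, 1296)
)

# Implied spacing per column, assuming a uniform grid.
regions$spacing <- (regions$wn_high - regions$wn_low) /
                   (regions$last_col - regions$first_col)
round(regions$spacing, 3)

# Approximate wavenumber for a column index, anchored on the full range.
wn_at_col <- function(col) {
  step <- (4700 - 339) / (2265 - 4)
  339 + (col - 4) * step
}
round(wn_at_col(c(100, 200, 600)))
```

The implied spacing is close to 1.93 cm-1 for every region, so the headings and column ranges are mutually consistent. Because matplot() is called above without x values, the horizontal axis of those plots shows the position within the selected columns; a helper like wn_at_col() could supply approximate wavenumbers instead.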

BEST RESULTS: ANALYSIS FROM THIS SECTION ONWARD

1489-718 cm-1 (These are the best results so far!)

matplot(t(FTIR[,200:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

The first derivative

FTIRspecd1 <- t(diff(t(FTIR[,200:600]),differences=1)) #first derivative signal processing of training set

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration
train.model.cymene <- plsr(FTIR[,3] ~ FTIRspecd1, ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1524   0.1753
2                      0.1118   0.5564
3                      0.0944   0.6832
4                      0.0946   0.6822
5                      0.0974   0.6627

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1524   0.1753
2                      0.1118   0.5564
3                      0.0944   0.6832
4                      0.0946   0.6822
5                      0.0974   0.6627

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured limonene (v/v)", ylab="Predicted limonene (v/v)") #limonene

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured p-cymene (v/v)", ylab="Predicted p-cymene (v/v)") #cymene

Best results without Signal Processing

Best wavenumber range (1489-718 cm-1), but without signal processing

matplot(t(FTIR[,200:600]),xlab=bquote('Wavenumber'~(cm^-1)),ylab="Absorbance",lty=2,pch=20) #plotting the FTIR data

Partial Least Squares Regression

train.model.limonene <- plsr(FTIR[,2] ~ as.matrix(FTIR[,200:600]), ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,2]: used here as the limonene concentration; raw spectra as predictors
train.model.cymene <- plsr(FTIR[,3] ~ as.matrix(FTIR[,200:600]), ncomp=5, scale=TRUE, validation="LOO") # response FTIR[,3]: used here as the p-cymene concentration; raw spectra as predictors

Plot of RMSE vs the number of components

plot(RMSEP(train.model.cymene),legendpos="topright",main="p-cymene") 

plot(RMSEP(train.model.limonene),legendpos="topright",main="limonene") 

RMSE and R2

RMSEP(train.model.limonene)
RMSEP(train.model.cymene)

R2(train.model.limonene)
R2(train.model.cymene)

RMSE and R2 for limonene
Number of components   RMSE     R2
1                      0.1657   0.0244
2                      0.1267   0.4296
3                      0.1227   0.4649
4                      0.1109   0.5632
5                      0.1165   0.5179

RMSE and R2 for p-cymene
Number of components   RMSE     R2
1                      0.1657   0.0244
2                      0.1267   0.4296
3                      0.1227   0.4649
4                      0.1109   0.5632
5                      0.1165   0.5179

Predicted vs measured values

plot(train.model.limonene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured limonene (v/v)", ylab="Predicted limonene (v/v)") #limonene

plot(train.model.cymene, ncomp=3, asp=1, line=TRUE, main="",xlab="Measured p-cymene (v/v)", ylab="Predicted p-cymene (v/v)") #cymene
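To put the effect of the first derivative into numbers, the sketch below compares the RMSE values reported in this document for the 1489-718 cm-1 range with and without preprocessing. The vector names are illustrative, and the percentage reduction is simply one convenient way to summarize the difference.

```r
# Reported cross-validated RMSE, 1489-718 cm-1 range, 1 to 5 components.
rmse_d1  <- c(0.1524, 0.1118, 0.0944, 0.0946, 0.0974)  # first-derivative spectra
rmse_raw <- c(0.1657, 0.1267, 0.1227, 0.1109, 0.1165)  # raw spectra

# Relative RMSE reduction (in percent) from using the first derivative.
reduction_pct <- 100 * (rmse_raw - rmse_d1) / rmse_raw
round(reduction_pct, 1)

# Component count with the lowest RMSE in each case.
which.min(rmse_d1)
which.min(rmse_raw)
```

The derivative model has the lower RMSE at every component count. For the raw spectra the minimum falls at four components rather than three, so the ncomp=3 plots above do not show the best-performing raw-spectra model.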

Please read this tutorial for more details: https://cran.r-project.org/web/packages/pls/vignettes/pls-manual.pdf