Recursive Partitioning and Regression Trees (RPART) is a recursive partitioning method that builds classification and regression trees for predicting continuous dependent variables (regression) and categorical dependent variables (classification). The predict method for a fitted rpart object then returns the predicted responses (here, the Facies).

In R, recursive partitioning and regression trees can be fitted through the rpart package.

In this work, RPART was adopted to model lithofacies given core analysis and well log data in order to predict the discrete classes and the posterior probability distributions of the lithofacies in the Karpur dataset.

Install and load the packages required to implement the RPART algorithm.

# Load the required packages (install them first with install.packages() if needed).
require(rpart)
## Loading required package: rpart
require(ggplot2)
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## 
## The following object is masked _by_ '.GlobalEnv':
## 
##     mpg
require(MASS)
## Loading required package: MASS
require(lattice)
## Loading required package: lattice

Load the dataset and show its head:
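The data were read into a data frame named karpur (the name used in the model call further below); a minimal sketch, assuming a local CSV file named karpur.csv:

karpur <- read.csv("karpur.csv", header = TRUE)   # hypothetical file name
karpur$Facies <- as.factor(karpur$Facies)         # treat Facies as categorical
head(karpur)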

##    depth caliper ind.deep ind.med  gamma phi.N R.deep  R.med      SP
## 1 5667.0   8.685  618.005 569.781 98.823 0.410  1.618  1.755 -56.587
## 2 5667.5   8.686  497.547 419.494 90.640 0.307  2.010  2.384 -61.916
## 3 5668.0   8.686  384.935 300.155 78.087 0.203  2.598  3.332 -55.861
## 4 5668.5   8.686  278.324 205.224 66.232 0.119  3.593  4.873 -41.860
## 5 5669.0   8.686  183.743 131.155 59.807 0.069  5.442  7.625 -34.934
## 6 5669.5   8.686  109.512  75.633 57.109 0.048  9.131 13.222 -39.769
##   density.corr density phi.core   k.core Facies
## 1       -0.033   2.205  33.9000 2442.590     F1
## 2       -0.067   2.040  33.4131 3006.989     F1
## 3       -0.064   1.888  33.1000 3370.000     F1
## 4       -0.053   1.794  34.9000 2270.000     F1
## 5       -0.054   1.758  35.0644 2530.758     F1
## 6       -0.058   1.759  35.3152 2928.314     F1

Summary of the dataset.
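The summary below can be reproduced with:

summary(karpur)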

##      depth         caliper         ind.deep          ind.med       
##  Min.   :5667   Min.   :8.487   Min.   :  6.532   Min.   :  9.386  
##  1st Qu.:5769   1st Qu.:8.556   1st Qu.: 28.799   1st Qu.: 27.892  
##  Median :5872   Median :8.588   Median :217.849   Median :254.383  
##  Mean   :5873   Mean   :8.622   Mean   :275.357   Mean   :273.357  
##  3rd Qu.:5977   3rd Qu.:8.686   3rd Qu.:566.793   3rd Qu.:544.232  
##  Max.   :6083   Max.   :8.886   Max.   :769.484   Max.   :746.028  
##                                                                    
##      gamma            phi.N            R.deep            R.med        
##  Min.   : 16.74   Min.   :0.0150   Min.   :  1.300   Min.   :  1.340  
##  1st Qu.: 40.89   1st Qu.:0.2030   1st Qu.:  1.764   1st Qu.:  1.837  
##  Median : 51.37   Median :0.2450   Median :  4.590   Median :  3.931  
##  Mean   : 53.42   Mean   :0.2213   Mean   : 24.501   Mean   : 21.196  
##  3rd Qu.: 62.37   3rd Qu.:0.2640   3rd Qu.: 34.724   3rd Qu.: 35.853  
##  Max.   :112.40   Max.   :0.4100   Max.   :153.085   Max.   :106.542  
##                                                                       
##        SP          density.corr          density         phi.core    
##  Min.   :-73.95   Min.   :-0.067000   Min.   :1.758   Min.   :15.70  
##  1st Qu.:-42.01   1st Qu.:-0.016000   1st Qu.:2.023   1st Qu.:23.90  
##  Median :-32.25   Median :-0.007000   Median :2.099   Median :27.60  
##  Mean   :-30.98   Mean   :-0.008883   Mean   :2.102   Mean   :26.93  
##  3rd Qu.:-19.48   3rd Qu.: 0.002000   3rd Qu.:2.181   3rd Qu.:30.70  
##  Max.   : 25.13   Max.   : 0.089000   Max.   :2.387   Max.   :36.30  
##                                                                      
##      k.core             Facies   
##  Min.   :    0.42   F8     :184  
##  1st Qu.:  657.33   F9     :172  
##  Median : 1591.22   F10    :171  
##  Mean   : 2251.91   F1     :111  
##  3rd Qu.: 3046.82   F5     :109  
##  Max.   :15600.00   F3     : 55  
##                     (Other): 17

Visualize the dataset:
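The original figure is not reproduced here; a sketch of one possible visualization with lattice, coloring the points by facies (the chosen variables are illustrative):

# scatterplot matrix of selected well logs, grouped by facies
splom(~karpur[, c("gamma", "phi.N", "R.deep", "density")],
      groups = karpur$Facies, auto.key = list(columns = 4))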

Modeling the facies given the well log and core data with the RPART model:
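The output below corresponds to calls of the following form; the object name fit is an assumption, while the formula, data, and method are taken from the printed call:

fit <- rpart(Facies ~ ., data = karpur, method = "class")  # classification tree
names(fit)     # components of the fitted rpart object
printcp(fit)   # complexity parameter table and cross-validated error
summary(fit)   # detailed description of every split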

##  [1] "frame"               "where"               "call"               
##  [4] "terms"               "cptable"             "method"             
##  [7] "parms"               "control"             "functions"          
## [10] "numresp"             "splits"              "variable.importance"
## [13] "y"                   "ordered"
## 
## Classification tree:
## rpart(formula = Facies ~ ., data = karpur, method = "class")
## 
## Variables actually used in tree construction:
## [1] density  depth    ind.deep phi.N   
## 
## Root node error: 635/819 = 0.77534
## 
## n= 819 
## 
##         CP nsplit rel error  xerror     xstd
## 1 0.270866      0   1.00000 1.02205 0.018278
## 2 0.255118      1   0.72913 0.81102 0.021773
## 3 0.173228      2   0.47402 0.48031 0.021788
## 4 0.114961      3   0.30079 0.30709 0.019195
## 5 0.033071      4   0.18583 0.20000 0.016313
## 6 0.023622      5   0.15276 0.16063 0.014881
## 7 0.015748      6   0.12913 0.14803 0.014365
## 8 0.010000      7   0.11339 0.13543 0.013816

## Call:
## rpart(formula = Facies ~ ., data = karpur, method = "class")
##   n= 819 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.27086614      0 1.0000000 1.0220472 0.01827810
## 2 0.25511811      1 0.7291339 0.8110236 0.02177332
## 3 0.17322835      2 0.4740157 0.4803150 0.02178792
## 4 0.11496063      3 0.3007874 0.3070866 0.01919526
## 5 0.03307087      4 0.1858268 0.2000000 0.01631320
## 6 0.02362205      5 0.1527559 0.1606299 0.01488141
## 7 0.01574803      6 0.1291339 0.1480315 0.01436538
## 8 0.01000000      7 0.1133858 0.1354331 0.01381610
## 
## Variable importance
##        depth        gamma      density      ind.med        R.med 
##           17           11           11           10            9 
##      caliper        phi.N     ind.deep       R.deep     phi.core 
##            8            7            7            7            6 
##       k.core density.corr 
##            5            3 
## 
## Node number 1: 819 observations,    complexity param=0.2708661
##   predicted class=F8   expected loss=0.7753358  P(node) =1
##     class counts:   111   171     8    55   109     9   184   172
##    probabilities: 0.136 0.209 0.010 0.067 0.133 0.011 0.225 0.210 
##   left son=2 (626 obs) right son=3 (193 obs)
##   Primary splits:
##       depth    < 5983.25  to the left,  improve=146.89560, (0 missing)
##       density  < 2.1775   to the right, improve=120.07850, (0 missing)
##       phi.core < 29.4092  to the right, improve=112.02960, (0 missing)
##       ind.med  < 152.848  to the left,  improve= 97.20764, (0 missing)
##       R.med    < 6.5425   to the right, improve= 97.20764, (0 missing)
##   Surrogate splits:
##       ind.deep < 541.212  to the left,  agree=0.928, adj=0.694, (0 split)
##       R.deep   < 1.8475   to the right, agree=0.928, adj=0.694, (0 split)
##       ind.med  < 468.478  to the left,  agree=0.922, adj=0.668, (0 split)
##       R.med    < 2.1345   to the right, agree=0.922, adj=0.668, (0 split)
##       gamma    < 21.8895  to the right, agree=0.783, adj=0.078, (0 split)
## 
## Node number 2: 626 observations,    complexity param=0.2551181
##   predicted class=F8   expected loss=0.7060703  P(node) =0.7643468
##     class counts:   111   171     8    55    88     9   184     0
##    probabilities: 0.177 0.273 0.013 0.088 0.141 0.014 0.294 0.000 
##   left son=4 (219 obs) right son=5 (407 obs)
##   Primary splits:
##       density  < 2.1565   to the right, improve=120.67180, (0 missing)
##       depth    < 5898.25  to the left,  improve=117.30100, (0 missing)
##       gamma    < 49.9915  to the right, improve=108.00770, (0 missing)
##       phi.core < 26.9534  to the left,  improve=104.54910, (0 missing)
##       caliper  < 8.583    to the right, improve= 94.69796, (0 missing)
##   Surrogate splits:
##       phi.core < 26.1882  to the left,  agree=0.920, adj=0.772, (0 split)
##       gamma    < 61.814   to the right, agree=0.917, adj=0.763, (0 split)
##       k.core   < 1215.728 to the left,  agree=0.890, adj=0.685, (0 split)
##       ind.med  < 141.893  to the right, agree=0.882, adj=0.662, (0 split)
##       R.med    < 7.0575   to the left,  agree=0.882, adj=0.662, (0 split)
## 
## Node number 3: 193 observations,    complexity param=0.03307087
##   predicted class=F9   expected loss=0.1088083  P(node) =0.2356532
##     class counts:     0     0     0     0    21     0     0   172
##    probabilities: 0.000 0.000 0.000 0.000 0.109 0.000 0.000 0.891 
##   left son=6 (21 obs) right son=7 (172 obs)
##   Primary splits:
##       depth    < 6071.5   to the right, improve=37.43005, (0 missing)
##       k.core   < 3735     to the right, improve=25.83455, (0 missing)
##       ind.deep < 643.1255 to the right, improve=24.01360, (0 missing)
##       R.deep   < 1.498    to the left,  improve=23.70985, (0 missing)
##       ind.med  < 645      to the right, improve=23.24719, (0 missing)
##   Surrogate splits:
##       k.core   < 3735     to the right, agree=0.969, adj=0.714, (0 split)
##       ind.deep < 667.5805 to the right, agree=0.959, adj=0.619, (0 split)
##       R.deep   < 1.498    to the left,  agree=0.959, adj=0.619, (0 split)
##       ind.med  < 645      to the right, agree=0.953, adj=0.571, (0 split)
##       gamma    < 25.195   to the left,  agree=0.953, adj=0.571, (0 split)
## 
## Node number 4: 219 observations
##   predicted class=F10  expected loss=0.260274  P(node) =0.2673993
##     class counts:     1   162     6    40     4     6     0     0
##    probabilities: 0.005 0.740 0.027 0.183 0.018 0.027 0.000 0.000 
## 
## Node number 5: 407 observations,    complexity param=0.1732283
##   predicted class=F8   expected loss=0.5479115  P(node) =0.4969475
##     class counts:   110     9     2    15    84     3   184     0
##    probabilities: 0.270 0.022 0.005 0.037 0.206 0.007 0.452 0.000 
##   left son=10 (209 obs) right son=11 (198 obs)
##   Primary splits:
##       depth        < 5856.25  to the left,  improve=125.33350, (0 missing)
##       caliper      < 8.6255   to the right, improve=111.37570, (0 missing)
##       phi.N        < 0.227    to the left,  improve=110.46260, (0 missing)
##       density      < 1.962    to the left,  improve= 97.15700, (0 missing)
##       density.corr < -0.0245  to the left,  improve= 89.22577, (0 missing)
##   Surrogate splits:
##       phi.N        < 0.227    to the left,  agree=0.907, adj=0.808, (0 split)
##       caliper      < 8.5835   to the right, agree=0.902, adj=0.798, (0 split)
##       density.corr < -0.0195  to the left,  agree=0.823, adj=0.636, (0 split)
##       density      < 2.0055   to the left,  agree=0.781, adj=0.551, (0 split)
##       gamma        < 48.2055  to the right, agree=0.769, adj=0.525, (0 split)
## 
## Node number 6: 21 observations
##   predicted class=F5   expected loss=0  P(node) =0.02564103
##     class counts:     0     0     0     0    21     0     0     0
##    probabilities: 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 
## 
## Node number 7: 172 observations
##   predicted class=F9   expected loss=0  P(node) =0.2100122
##     class counts:     0     0     0     0     0     0     0   172
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 
## 
## Node number 10: 209 observations,    complexity param=0.1149606
##   predicted class=F1   expected loss=0.4736842  P(node) =0.2551893
##     class counts:   110     7     2    15    73     2     0     0
##    probabilities: 0.526 0.033 0.010 0.072 0.349 0.010 0.000 0.000 
##   left son=20 (113 obs) right son=21 (96 obs)
##   Primary splits:
##       depth    < 5723.75  to the left,  improve=80.52183, (0 missing)
##       caliper  < 8.625    to the right, improve=77.77824, (0 missing)
##       gamma    < 48.7505  to the right, improve=55.94746, (0 missing)
##       density  < 1.962    to the left,  improve=52.72623, (0 missing)
##       phi.core < 30.616   to the right, improve=46.28594, (0 missing)
##   Surrogate splits:
##       caliper  < 8.625    to the right, agree=0.914, adj=0.812, (0 split)
##       density  < 1.9525   to the left,  agree=0.904, adj=0.792, (0 split)
##       gamma    < 48.7505  to the right, agree=0.876, adj=0.729, (0 split)
##       phi.core < 30.23475 to the right, agree=0.861, adj=0.698, (0 split)
##       phi.N    < 0.167    to the left,  agree=0.842, adj=0.656, (0 split)
## 
## Node number 11: 198 observations,    complexity param=0.01574803
##   predicted class=F8   expected loss=0.07070707  P(node) =0.2417582
##     class counts:     0     2     0     0    11     1   184     0
##    probabilities: 0.000 0.010 0.000 0.000 0.056 0.005 0.929 0.000 
##   left son=22 (10 obs) right son=23 (188 obs)
##   Primary splits:
##       phi.N    < 0.201    to the left,  improve=18.49076, (0 missing)
##       R.med    < 101.053  to the right, improve=17.06480, (0 missing)
##       ind.med  < 9.896    to the left,  improve=17.06480, (0 missing)
##       R.deep   < 139.404  to the right, improve=14.58357, (0 missing)
##       ind.deep < 7.175    to the left,  improve=14.58357, (0 missing)
##   Surrogate splits:
##       ind.med  < 9.6435   to the left,  agree=0.995, adj=0.9, (0 split)
##       R.med    < 102.6155 to the right, agree=0.995, adj=0.9, (0 split)
##       ind.deep < 7.175    to the left,  agree=0.975, adj=0.5, (0 split)
##       gamma    < 23.1145  to the left,  agree=0.975, adj=0.5, (0 split)
##       R.deep   < 139.404  to the right, agree=0.975, adj=0.5, (0 split)
## 
## Node number 20: 113 observations
##   predicted class=F1   expected loss=0.02654867  P(node) =0.1379731
##     class counts:   110     3     0     0     0     0     0     0
##    probabilities: 0.973 0.027 0.000 0.000 0.000 0.000 0.000 0.000 
## 
## Node number 21: 96 observations,    complexity param=0.02362205
##   predicted class=F5   expected loss=0.2395833  P(node) =0.1172161
##     class counts:     0     4     2    15    73     2     0     0
##    probabilities: 0.000 0.042 0.021 0.156 0.760 0.021 0.000 0.000 
##   left son=42 (23 obs) right son=43 (73 obs)
##   Primary splits:
##       ind.deep < 81.2895  to the right, improve=25.72192, (0 missing)
##       R.deep   < 12.371   to the left,  improve=25.72192, (0 missing)
##       ind.med  < 110.7285 to the right, improve=24.54726, (0 missing)
##       R.med    < 9.0335   to the left,  improve=24.54726, (0 missing)
##       caliper  < 8.6365   to the right, improve=21.66506, (0 missing)
##   Surrogate splits:
##       R.deep  < 12.371   to the left,  agree=1.000, adj=1.000, (0 split)
##       ind.med < 88.761   to the right, agree=0.990, adj=0.957, (0 split)
##       R.med   < 11.293   to the left,  agree=0.990, adj=0.957, (0 split)
##       depth   < 5790     to the left,  agree=0.948, adj=0.783, (0 split)
##       caliper < 8.6365   to the right, agree=0.948, adj=0.783, (0 split)
## 
## Node number 22: 10 observations
##   predicted class=F5   expected loss=0  P(node) =0.01221001
##     class counts:     0     0     0     0    10     0     0     0
##    probabilities: 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 
## 
## Node number 23: 188 observations
##   predicted class=F8   expected loss=0.0212766  P(node) =0.2295482
##     class counts:     0     2     0     0     1     1   184     0
##    probabilities: 0.000 0.011 0.000 0.000 0.005 0.005 0.979 0.000 
## 
## Node number 42: 23 observations
##   predicted class=F3   expected loss=0.3478261  P(node) =0.02808303
##     class counts:     0     4     2    15     0     2     0     0
##    probabilities: 0.000 0.174 0.087 0.652 0.000 0.087 0.000 0.000 
## 
## Node number 43: 73 observations
##   predicted class=F5   expected loss=0  P(node) =0.08913309
##     class counts:     0     0     0     0    73     0     0     0
##    probabilities: 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000

Head of the predicted classes and posterior probability distributions of the facies:
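The predicted classes and class probabilities below were obtained with predict() on the fitted tree; a sketch, with assumed object names:

pred.class <- predict(fit, type = "class")  # most probable facies at each depth
pred.prob  <- predict(fit, type = "prob")   # posterior probability of each facies
head(pred.class)
head(pred.prob)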

##   1   2   3   4   5   6 
## F10  F1  F1  F1  F1  F1 
## Levels: F1 F10 F2 F3 F5 F7 F8 F9
##           F1        F10         F2        F3         F5         F7 F8 F9
## 1 0.00456621 0.73972603 0.02739726 0.1826484 0.01826484 0.02739726  0  0
## 2 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000  0  0
## 3 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000  0  0
## 4 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000  0  0
## 5 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000  0  0
## 6 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000  0  0

Means of the well log and core data given each facies:
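One way to compute these class-conditional means (a sketch, assuming the data frame is named karpur):

# mean of every log and core variable within each facies
aggregate(. ~ Facies, data = karpur, FUN = mean)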

Validation of the RPART model by computing the percent correctly classified for each facies:
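The per-facies correct-classification rates below can be reproduced from a confusion table of predicted versus observed facies (a sketch; object names are assumed):

conf.tab <- table(pred.class, karpur$Facies)  # predicted (rows) vs. observed (columns)
diag(prop.table(conf.tab, margin = 2))        # fraction of each observed facies classified correctly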

##        F1       F10        F2        F3        F5        F7        F8 
## 0.9909910 0.9473684 0.0000000 0.2727273 0.9541284 0.0000000 1.0000000 
##        F9 
## 1.0000000

Total percent correct:
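The overall accuracy is the proportion of samples on the diagonal of the same confusion table:

sum(diag(conf.tab)) / sum(conf.tab)   # total percent correctly classified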

## [1] 0.9120879

Scatterplot matrix of the lithofacies classification by RPART:
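A sketch of such a scatterplot matrix with lattice, coloring the points by the RPART-predicted facies (the variable selection here is illustrative):

splom(~karpur[, c("depth", "gamma", "phi.N", "density", "R.deep")],
      groups = pred.class, auto.key = list(columns = 4),
      main = "Lithofacies classification by RPART")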

Visualizing the predicted posterior distributions of the eight facies.
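One way to plot a predicted posterior probability along the well with lattice (shown here for F1; the remaining facies follow the same pattern; object names are assumed):

xyplot(pred.prob[, "F1"] ~ karpur$depth, type = "l",
       xlab = "Depth", ylab = "P(F1)")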

Combining the posterior distributions of the eight lithofacies into one plot.
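The warnings below indicate notched boxplots of depth; a sketch that groups depth by the predicted facies would produce similar output (this is an assumption about the original plot):

boxplot(karpur$depth ~ pred.class, notch = TRUE,   # notch = TRUE triggers the warnings below
        xlab = "Predicted facies", ylab = "Depth")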

## Warning in bxp(structure(list(stats = structure(c(5667, 5680.75, 5694.5, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
## Warning in bxp(structure(list(stats = structure(c(5667.5, 5681.5, 5695.5, :
## some notches went outside hinges ('box'): maybe set notch=FALSE

References

  1. Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.

  2. Al-Mudhafar, W. J. (2015). Integrating Component Analysis & Classification Techniques for Comparative Prediction of Continuous & Discrete Lithofacies Distributions. Offshore Technology Conference. doi:10.4043/25806-MS.

  3. Karpur, L., L. Lake, and K. Sepehrnoori. (2000). Probability Logs for Facies Classification. In Situ 24(1): 57.

  4. Al-Mudhafer, W. J. (2014). Multinomial Logistic Regression for Bayesian Estimation of Vertical Facies Modeling in Heterogeneous Sandstone Reservoirs. Offshore Technology Conference. doi:10.4043/24732-MS.

  5. Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 1. Naive Bayes Classifier for Lithofacies Modeling in a Sandstone Formation. RPubs.

  6. Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 2. Logistic Boosting Regression (LogitBoost) for Multinomial Lithofacies Classification in a Sandstone Formation. RPubs.

  7. Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 3. Linear Discriminant Analysis (LDA) for Multinomial Lithofacies Classification in a Sandstone Formation. RPubs.

  8. Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 4. Multinomial Logistic Regression (MLR) for Posterior Lithofacies Probability Prediction in a Sandstone Formation. RPubs.