Recursive Partitioning and Regression Trees (RPART) is a recursive partitioning method that builds classification and regression trees for predicting continuous response variables (regression) and categorical response variables (classification). Calling predict() on a fitted rpart object returns a vector of predicted responses (here, Facies).
In R, these trees are fitted with the rpart package.
In this work, RPART is used to model Lithofacies from core analysis and well log data, in order to predict the discrete Lithofacies classes and their posterior probability distributions in the Karpur dataset.
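As a minimal sketch of the workflow described above, the block below fits a classification tree and predicts from it. The Karpur data are not bundled with R, so the built-in iris data set is used as a stand-in (an assumption for illustration only): Species plays the role of Facies and the four measurements play the role of the logs.

```r
# Minimal sketch: classification tree with rpart on a stand-in data set.
# ASSUMPTION: iris replaces the Karpur data, which is not bundled with R.
library(rpart)

# method = "class" requests a classification tree for the factor response.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Printing the fitted object lists the splits, one node per line.
print(fit)

# predict() on the fitted object returns one predicted class per row.
pred <- predict(fit, newdata = iris, type = "class")
head(pred)
```

With the real data, the formula would be Facies ~ . and the data argument would be the Karpur data frame, as shown in the Call line of the model output further down.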
Load the packages required to implement the RPART algorithm.
# First, load the required packages.
require(rpart)
## Loading required package: rpart
require(ggplot2)
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
##
## The following object is masked _by_ '.GlobalEnv':
##
## mpg
require(MASS)
## Loading required package: MASS
require(lattice)
## Loading required package: lattice
library(rpart)
library(ggplot2)
library(MASS)
library(lattice)
Load the dataset and show its first rows:
## depth caliper ind.deep ind.med gamma phi.N R.deep R.med SP
## 1 5667.0 8.685 618.005 569.781 98.823 0.410 1.618 1.755 -56.587
## 2 5667.5 8.686 497.547 419.494 90.640 0.307 2.010 2.384 -61.916
## 3 5668.0 8.686 384.935 300.155 78.087 0.203 2.598 3.332 -55.861
## 4 5668.5 8.686 278.324 205.224 66.232 0.119 3.593 4.873 -41.860
## 5 5669.0 8.686 183.743 131.155 59.807 0.069 5.442 7.625 -34.934
## 6 5669.5 8.686 109.512 75.633 57.109 0.048 9.131 13.222 -39.769
## density.corr density phi.core k.core Facies
## 1 -0.033 2.205 33.9000 2442.590 F1
## 2 -0.067 2.040 33.4131 3006.989 F1
## 3 -0.064 1.888 33.1000 3370.000 F1
## 4 -0.053 1.794 34.9000 2270.000 F1
## 5 -0.054 1.758 35.0644 2530.758 F1
## 6 -0.058 1.759 35.3152 2928.314 F1
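The code that produced the listing above is not shown. A hedged sketch of that step follows: it rebuilds only the six printed rows from inline CSV text, so the object name karpur_head and the use of read.csv() are assumptions, not the author's actual loading code. Setting stringsAsFactors = TRUE keeps Facies as a factor, which is what the per-class counts in the summary imply.

```r
# Sketch only: rebuild the six rows printed above from inline CSV text.
# ASSUMPTION: the original data were read from a file that is not shown here.
csv_text <- c(
  "depth,caliper,ind.deep,ind.med,gamma,phi.N,R.deep,R.med,SP,density.corr,density,phi.core,k.core,Facies",
  "5667.0,8.685,618.005,569.781,98.823,0.410,1.618,1.755,-56.587,-0.033,2.205,33.9000,2442.590,F1",
  "5667.5,8.686,497.547,419.494,90.640,0.307,2.010,2.384,-61.916,-0.067,2.040,33.4131,3006.989,F1",
  "5668.0,8.686,384.935,300.155,78.087,0.203,2.598,3.332,-55.861,-0.064,1.888,33.1000,3370.000,F1",
  "5668.5,8.686,278.324,205.224,66.232,0.119,3.593,4.873,-41.860,-0.053,1.794,34.9000,2270.000,F1",
  "5669.0,8.686,183.743,131.155,59.807,0.069,5.442,7.625,-34.934,-0.054,1.758,35.0644,2530.758,F1",
  "5669.5,8.686,109.512,75.633,57.109,0.048,9.131,13.222,-39.769,-0.058,1.759,35.3152,2928.314,F1"
)

# Parse the text as CSV; character columns (Facies) become factors.
karpur_head <- read.csv(text = paste(csv_text, collapse = "\n"),
                        stringsAsFactors = TRUE)

head(karpur_head)     # first rows, as in the listing above
summary(karpur_head)  # per-column summaries; factor columns are counted by level
```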
Summary of the dataset.
## depth caliper ind.deep ind.med
## Min. :5667 Min. :8.487 Min. : 6.532 Min. : 9.386
## 1st Qu.:5769 1st Qu.:8.556 1st Qu.: 28.799 1st Qu.: 27.892
## Median :5872 Median :8.588 Median :217.849 Median :254.383
## Mean :5873 Mean :8.622 Mean :275.357 Mean :273.357
## 3rd Qu.:5977 3rd Qu.:8.686 3rd Qu.:566.793 3rd Qu.:544.232
## Max. :6083 Max. :8.886 Max. :769.484 Max. :746.028
##
## gamma phi.N R.deep R.med
## Min. : 16.74 Min. :0.0150 Min. : 1.300 Min. : 1.340
## 1st Qu.: 40.89 1st Qu.:0.2030 1st Qu.: 1.764 1st Qu.: 1.837
## Median : 51.37 Median :0.2450 Median : 4.590 Median : 3.931
## Mean : 53.42 Mean :0.2213 Mean : 24.501 Mean : 21.196
## 3rd Qu.: 62.37 3rd Qu.:0.2640 3rd Qu.: 34.724 3rd Qu.: 35.853
## Max. :112.40 Max. :0.4100 Max. :153.085 Max. :106.542
##
## SP density.corr density phi.core
## Min. :-73.95 Min. :-0.067000 Min. :1.758 Min. :15.70
## 1st Qu.:-42.01 1st Qu.:-0.016000 1st Qu.:2.023 1st Qu.:23.90
## Median :-32.25 Median :-0.007000 Median :2.099 Median :27.60
## Mean :-30.98 Mean :-0.008883 Mean :2.102 Mean :26.93
## 3rd Qu.:-19.48 3rd Qu.: 0.002000 3rd Qu.:2.181 3rd Qu.:30.70
## Max. : 25.13 Max. : 0.089000 Max. :2.387 Max. :36.30
##
## k.core Facies
## Min. : 0.42 F8 :184
## 1st Qu.: 657.33 F9 :172
## Median : 1591.22 F10 :171
## Mean : 2251.91 F1 :111
## 3rd Qu.: 3046.82 F5 :109
## Max. :15600.00 F3 : 55
## (Other): 17
Visualize the dataset:
Model the Facies as a function of the well logs and core data with rpart:
## [1] "frame" "where" "call"
## [4] "terms" "cptable" "method"
## [7] "parms" "control" "functions"
## [10] "numresp" "splits" "variable.importance"
## [13] "y" "ordered"
##
## Classification tree:
## rpart(formula = Facies ~ ., data = karpur, method = "class")
##
## Variables actually used in tree construction:
## [1] density depth ind.deep phi.N
##
## Root node error: 635/819 = 0.77534
##
## n= 819
##
## CP nsplit rel error xerror xstd
## 1 0.270866 0 1.00000 1.02205 0.018278
## 2 0.255118 1 0.72913 0.81102 0.021773
## 3 0.173228 2 0.47402 0.48031 0.021788
## 4 0.114961 3 0.30079 0.30709 0.019195
## 5 0.033071 4 0.18583 0.20000 0.016313
## 6 0.023622 5 0.15276 0.16063 0.014881
## 7 0.015748 6 0.12913 0.14803 0.014365
## 8 0.010000 7 0.11339 0.13543 0.013816
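The rel error and xerror columns of the CP table are scaled by the root node error, so multiplying them by 635/819 converts them to misclassification rates over all 819 samples. The short calculation below uses only numbers copied from the table; the variable names are illustrative. The resubstitution accuracy it gives (about 0.912) agrees with the total proportion correct reported in the validation step, while the cross-validated figure (10-fold by default in rpart) is somewhat lower.

```r
# Values copied from the last row of the CP table printed above.
n_obs      <- 819
root_wrong <- 635        # root node error: 635 / 819
rel_error  <- 0.11339    # training error relative to the root
xerror     <- 0.13543    # cross-validated error relative to the root

root_error  <- root_wrong / n_obs
resub_error <- rel_error * root_error   # misclassification rate on training data
cv_error    <- xerror * root_error      # cross-validated misclassification rate

resub_accuracy <- 1 - resub_error
cv_accuracy    <- 1 - cv_error

round(c(root_error = root_error,
        resub_accuracy = resub_accuracy,
        cv_accuracy = cv_accuracy), 4)
```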
## Call:
## rpart(formula = Facies ~ ., data = karpur, method = "class")
## n= 819
##
## CP nsplit rel error xerror xstd
## 1 0.27086614 0 1.0000000 1.0220472 0.01827810
## 2 0.25511811 1 0.7291339 0.8110236 0.02177332
## 3 0.17322835 2 0.4740157 0.4803150 0.02178792
## 4 0.11496063 3 0.3007874 0.3070866 0.01919526
## 5 0.03307087 4 0.1858268 0.2000000 0.01631320
## 6 0.02362205 5 0.1527559 0.1606299 0.01488141
## 7 0.01574803 6 0.1291339 0.1480315 0.01436538
## 8 0.01000000 7 0.1133858 0.1354331 0.01381610
##
## Variable importance
## depth gamma density ind.med R.med
## 17 11 11 10 9
## caliper phi.N ind.deep R.deep phi.core
## 8 7 7 7 6
## k.core density.corr
## 5 3
##
## Node number 1: 819 observations, complexity param=0.2708661
## predicted class=F8 expected loss=0.7753358 P(node) =1
## class counts: 111 171 8 55 109 9 184 172
## probabilities: 0.136 0.209 0.010 0.067 0.133 0.011 0.225 0.210
## left son=2 (626 obs) right son=3 (193 obs)
## Primary splits:
## depth < 5983.25 to the left, improve=146.89560, (0 missing)
## density < 2.1775 to the right, improve=120.07850, (0 missing)
## phi.core < 29.4092 to the right, improve=112.02960, (0 missing)
## ind.med < 152.848 to the left, improve= 97.20764, (0 missing)
## R.med < 6.5425 to the right, improve= 97.20764, (0 missing)
## Surrogate splits:
## ind.deep < 541.212 to the left, agree=0.928, adj=0.694, (0 split)
## R.deep < 1.8475 to the right, agree=0.928, adj=0.694, (0 split)
## ind.med < 468.478 to the left, agree=0.922, adj=0.668, (0 split)
## R.med < 2.1345 to the right, agree=0.922, adj=0.668, (0 split)
## gamma < 21.8895 to the right, agree=0.783, adj=0.078, (0 split)
##
## Node number 2: 626 observations, complexity param=0.2551181
## predicted class=F8 expected loss=0.7060703 P(node) =0.7643468
## class counts: 111 171 8 55 88 9 184 0
## probabilities: 0.177 0.273 0.013 0.088 0.141 0.014 0.294 0.000
## left son=4 (219 obs) right son=5 (407 obs)
## Primary splits:
## density < 2.1565 to the right, improve=120.67180, (0 missing)
## depth < 5898.25 to the left, improve=117.30100, (0 missing)
## gamma < 49.9915 to the right, improve=108.00770, (0 missing)
## phi.core < 26.9534 to the left, improve=104.54910, (0 missing)
## caliper < 8.583 to the right, improve= 94.69796, (0 missing)
## Surrogate splits:
## phi.core < 26.1882 to the left, agree=0.920, adj=0.772, (0 split)
## gamma < 61.814 to the right, agree=0.917, adj=0.763, (0 split)
## k.core < 1215.728 to the left, agree=0.890, adj=0.685, (0 split)
## ind.med < 141.893 to the right, agree=0.882, adj=0.662, (0 split)
## R.med < 7.0575 to the left, agree=0.882, adj=0.662, (0 split)
##
## Node number 3: 193 observations, complexity param=0.03307087
## predicted class=F9 expected loss=0.1088083 P(node) =0.2356532
## class counts: 0 0 0 0 21 0 0 172
## probabilities: 0.000 0.000 0.000 0.000 0.109 0.000 0.000 0.891
## left son=6 (21 obs) right son=7 (172 obs)
## Primary splits:
## depth < 6071.5 to the right, improve=37.43005, (0 missing)
## k.core < 3735 to the right, improve=25.83455, (0 missing)
## ind.deep < 643.1255 to the right, improve=24.01360, (0 missing)
## R.deep < 1.498 to the left, improve=23.70985, (0 missing)
## ind.med < 645 to the right, improve=23.24719, (0 missing)
## Surrogate splits:
## k.core < 3735 to the right, agree=0.969, adj=0.714, (0 split)
## ind.deep < 667.5805 to the right, agree=0.959, adj=0.619, (0 split)
## R.deep < 1.498 to the left, agree=0.959, adj=0.619, (0 split)
## ind.med < 645 to the right, agree=0.953, adj=0.571, (0 split)
## gamma < 25.195 to the left, agree=0.953, adj=0.571, (0 split)
##
## Node number 4: 219 observations
## predicted class=F10 expected loss=0.260274 P(node) =0.2673993
## class counts: 1 162 6 40 4 6 0 0
## probabilities: 0.005 0.740 0.027 0.183 0.018 0.027 0.000 0.000
##
## Node number 5: 407 observations, complexity param=0.1732283
## predicted class=F8 expected loss=0.5479115 P(node) =0.4969475
## class counts: 110 9 2 15 84 3 184 0
## probabilities: 0.270 0.022 0.005 0.037 0.206 0.007 0.452 0.000
## left son=10 (209 obs) right son=11 (198 obs)
## Primary splits:
## depth < 5856.25 to the left, improve=125.33350, (0 missing)
## caliper < 8.6255 to the right, improve=111.37570, (0 missing)
## phi.N < 0.227 to the left, improve=110.46260, (0 missing)
## density < 1.962 to the left, improve= 97.15700, (0 missing)
## density.corr < -0.0245 to the left, improve= 89.22577, (0 missing)
## Surrogate splits:
## phi.N < 0.227 to the left, agree=0.907, adj=0.808, (0 split)
## caliper < 8.5835 to the right, agree=0.902, adj=0.798, (0 split)
## density.corr < -0.0195 to the left, agree=0.823, adj=0.636, (0 split)
## density < 2.0055 to the left, agree=0.781, adj=0.551, (0 split)
## gamma < 48.2055 to the right, agree=0.769, adj=0.525, (0 split)
##
## Node number 6: 21 observations
## predicted class=F5 expected loss=0 P(node) =0.02564103
## class counts: 0 0 0 0 21 0 0 0
## probabilities: 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
##
## Node number 7: 172 observations
## predicted class=F9 expected loss=0 P(node) =0.2100122
## class counts: 0 0 0 0 0 0 0 172
## probabilities: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
##
## Node number 10: 209 observations, complexity param=0.1149606
## predicted class=F1 expected loss=0.4736842 P(node) =0.2551893
## class counts: 110 7 2 15 73 2 0 0
## probabilities: 0.526 0.033 0.010 0.072 0.349 0.010 0.000 0.000
## left son=20 (113 obs) right son=21 (96 obs)
## Primary splits:
## depth < 5723.75 to the left, improve=80.52183, (0 missing)
## caliper < 8.625 to the right, improve=77.77824, (0 missing)
## gamma < 48.7505 to the right, improve=55.94746, (0 missing)
## density < 1.962 to the left, improve=52.72623, (0 missing)
## phi.core < 30.616 to the right, improve=46.28594, (0 missing)
## Surrogate splits:
## caliper < 8.625 to the right, agree=0.914, adj=0.812, (0 split)
## density < 1.9525 to the left, agree=0.904, adj=0.792, (0 split)
## gamma < 48.7505 to the right, agree=0.876, adj=0.729, (0 split)
## phi.core < 30.23475 to the right, agree=0.861, adj=0.698, (0 split)
## phi.N < 0.167 to the left, agree=0.842, adj=0.656, (0 split)
##
## Node number 11: 198 observations, complexity param=0.01574803
## predicted class=F8 expected loss=0.07070707 P(node) =0.2417582
## class counts: 0 2 0 0 11 1 184 0
## probabilities: 0.000 0.010 0.000 0.000 0.056 0.005 0.929 0.000
## left son=22 (10 obs) right son=23 (188 obs)
## Primary splits:
## phi.N < 0.201 to the left, improve=18.49076, (0 missing)
## R.med < 101.053 to the right, improve=17.06480, (0 missing)
## ind.med < 9.896 to the left, improve=17.06480, (0 missing)
## R.deep < 139.404 to the right, improve=14.58357, (0 missing)
## ind.deep < 7.175 to the left, improve=14.58357, (0 missing)
## Surrogate splits:
## ind.med < 9.6435 to the left, agree=0.995, adj=0.9, (0 split)
## R.med < 102.6155 to the right, agree=0.995, adj=0.9, (0 split)
## ind.deep < 7.175 to the left, agree=0.975, adj=0.5, (0 split)
## gamma < 23.1145 to the left, agree=0.975, adj=0.5, (0 split)
## R.deep < 139.404 to the right, agree=0.975, adj=0.5, (0 split)
##
## Node number 20: 113 observations
## predicted class=F1 expected loss=0.02654867 P(node) =0.1379731
## class counts: 110 3 0 0 0 0 0 0
## probabilities: 0.973 0.027 0.000 0.000 0.000 0.000 0.000 0.000
##
## Node number 21: 96 observations, complexity param=0.02362205
## predicted class=F5 expected loss=0.2395833 P(node) =0.1172161
## class counts: 0 4 2 15 73 2 0 0
## probabilities: 0.000 0.042 0.021 0.156 0.760 0.021 0.000 0.000
## left son=42 (23 obs) right son=43 (73 obs)
## Primary splits:
## ind.deep < 81.2895 to the right, improve=25.72192, (0 missing)
## R.deep < 12.371 to the left, improve=25.72192, (0 missing)
## ind.med < 110.7285 to the right, improve=24.54726, (0 missing)
## R.med < 9.0335 to the left, improve=24.54726, (0 missing)
## caliper < 8.6365 to the right, improve=21.66506, (0 missing)
## Surrogate splits:
## R.deep < 12.371 to the left, agree=1.000, adj=1.000, (0 split)
## ind.med < 88.761 to the right, agree=0.990, adj=0.957, (0 split)
## R.med < 11.293 to the left, agree=0.990, adj=0.957, (0 split)
## depth < 5790 to the left, agree=0.948, adj=0.783, (0 split)
## caliper < 8.6365 to the right, agree=0.948, adj=0.783, (0 split)
##
## Node number 22: 10 observations
## predicted class=F5 expected loss=0 P(node) =0.01221001
## class counts: 0 0 0 0 10 0 0 0
## probabilities: 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
##
## Node number 23: 188 observations
## predicted class=F8 expected loss=0.0212766 P(node) =0.2295482
## class counts: 0 2 0 0 1 1 184 0
## probabilities: 0.000 0.011 0.000 0.000 0.005 0.005 0.979 0.000
##
## Node number 42: 23 observations
## predicted class=F3 expected loss=0.3478261 P(node) =0.02808303
## class counts: 0 4 2 15 0 2 0 0
## probabilities: 0.000 0.174 0.087 0.652 0.000 0.087 0.000 0.000
##
## Node number 43: 73 observations
## predicted class=F5 expected loss=0 P(node) =0.08913309
## class counts: 0 0 0 0 73 0 0 0
## probabilities: 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
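The improve values in the summary can be checked by hand. Assuming the default Gini index for method = "class", the improvement of a split is the parent's size-weighted Gini impurity minus the size-weighted impurities of its two children. The sketch below does this for the root split (depth < 5983.25) using the class counts printed for nodes 1, 2 and 3; the helper gini() is defined here for illustration and is not part of rpart.

```r
# Gini impurity of a node from its class counts (illustrative helper).
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Class counts copied from the summary above.
parent <- c(111, 171, 8, 55, 109, 9, 184, 172)  # node 1 (819 obs)
left   <- c(111, 171, 8, 55,  88, 9, 184,   0)  # node 2 (626 obs)
right  <- c(  0,   0, 0,  0,  21, 0,   0, 172)  # node 3 (193 obs)

# The two children must partition the parent.
stopifnot(all(left + right == parent))

# Size-weighted decrease in impurity produced by the split.
improve <- sum(parent) * gini(parent) -
  sum(left) * gini(left) -
  sum(right) * gini(right)
improve
```

The result is consistent with the improve=146.89560 reported for the depth split at node 1.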
Head of the predicted Facies classes and of their posterior probability distributions:
## 1 2 3 4 5 6
## F10 F1 F1 F1 F1 F1
## Levels: F1 F10 F2 F3 F5 F7 F8 F9
## F1 F10 F2 F3 F5 F7 F8 F9
## 1 0.00456621 0.73972603 0.02739726 0.1826484 0.01826484 0.02739726 0 0
## 2 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000 0 0
## 3 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000 0 0
## 4 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000 0 0
## 5 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000 0 0
## 6 0.97345133 0.02654867 0.00000000 0.0000000 0.00000000 0.00000000 0 0
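The two listings above correspond to the two prediction types of predict() for a classification tree. Each posterior probability is the class proportion in the leaf an observation falls into: for example, 0.73972603 for F10 in row 1 equals 162/219 from Node 4, and 0.97345133 for F1 equals 110/113 from Node 20. The sketch below illustrates both prediction types and this leaf-proportion property, again on the iris stand-in (an assumption, not the Karpur data).

```r
library(rpart)

# ASSUMPTION: iris is a stand-in for the Karpur data; Species stands in for Facies.
fit <- rpart(Species ~ ., data = iris, method = "class")

pred_class <- predict(fit, type = "class")  # one predicted class per observation
pred_prob  <- predict(fit, type = "prob")   # one probability column per class

head(pred_class)
head(pred_prob)

# fit$where records the leaf each training observation ends up in.
leaf <- fit$where

# Class proportions within each leaf, computed directly from the data.
leaf_prop <- prop.table(table(leaf, iris$Species), margin = 1)

# Look up each observation's leaf proportions and compare with predict().
manual_prob <- unclass(leaf_prop)[as.character(leaf), ]
matches_leaf <- isTRUE(all.equal(unname(manual_prob), unname(pred_prob)))
matches_leaf
```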
Means of the Well logs and core data given each Facies:
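The output of this step is not shown. One common way to compute per-class means is aggregate() with a formula, sketched below on the iris stand-in (an assumption). Whether the original grouped by observed or by predicted Facies is not stated, so the sketch groups by the observed class.

```r
# ASSUMPTION: iris stands in for the Karpur data; Species stands in for Facies.
# Mean of every numeric column within each class.
means_by_class <- aggregate(. ~ Species, data = iris, FUN = mean)
means_by_class
```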
Validate the RPART model by computing the proportion of correctly classified samples for each Facies:
## F1 F10 F2 F3 F5 F7 F8
## 0.9909910 0.9473684 0.0000000 0.2727273 0.9541284 0.0000000 1.0000000
## F9
## 1.0000000
Total proportion correct (about 91.2%):
## [1] 0.9120879
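Figures like those above are typically derived from a confusion matrix of observed against predicted classes. The sketch below shows one way to do this on the iris stand-in (an assumption; the author's validation code is not shown). Two caveats apply to the reported numbers. First, F2 and F7 score 0 because no terminal node of the fitted tree predicts either class. Second, the accuracy is measured on the training data, so it is optimistic compared with the cross-validated error in the CP table.

```r
library(rpart)

# ASSUMPTION: iris stands in for the Karpur data; Species stands in for Facies.
fit  <- rpart(Species ~ ., data = iris, method = "class")
pred <- predict(fit, type = "class")

# Rows are observed classes, columns are predicted classes.
conf_mat <- table(observed = iris$Species, predicted = pred)
conf_mat

# Proportion correct within each observed class (diagonal of row proportions).
per_class <- diag(prop.table(conf_mat, margin = 1))
per_class

# Overall proportion of correctly classified samples.
total_correct <- sum(diag(conf_mat)) / sum(conf_mat)
total_correct
```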
Scatterplot matrix of the Lithofacies classification by RPART:
Visualizing the predicted posterior distribution of the Eight Facies.
Combining the posterior distribution of the eight Lithofacies in one plot.
## Warning in bxp(structure(list(stats = structure(c(5667, 5680.75, 5694.5, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
## Warning in bxp(structure(list(stats = structure(c(5667.5, 5681.5, 5695.5, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
References
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
Al-Mudhafar, W. J. (2015). Integrating Component Analysis & Classification Techniques for Comparative Prediction of Continuous & Discrete Lithofacies Distributions. Offshore Technology Conference. doi:10.4043/25806-MS.
Karpur, L., L. Lake, and K. Sepehrnoori. (2000). Probability Logs for Facies Classification. In Situ 24(1): 57.
Al-Mudhafer, W. J. (2014). Multinomial Logistic Regression for Bayesian Estimation of Vertical Facies Modeling in Heterogeneous Sandstone Reservoirs. Offshore Technology Conference. doi:10.4043/24732-MS.
Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 1. Naive Bayes Classifier for Lithofacies Modeling in a Sandstone Formation. RPubs.
Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 2. Logistic Boosting Regression (LogitBoost) for Multinomial Lithofacies Classification in a Sandstone Formation. RPubs.
Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 3. Linear Discriminant Analysis (LDA) for Multinomial Lithofacies Classification in a Sandstone Formation. RPubs.
Al-Mudhafar, W. J. (2015). Applied Geostatistics in R: 4. Multinomial Logistic Regression (MLR) for Posterior Lithofacies Probability Prediction in a Sandstone Formation. RPubs.