In each scenario, compute the adjusted R² and the RMSE, and plot the measured and predicted permeability in one graph.
This step loads the dataset karpur.csv with read.csv(). The summary() function then gives a quick statistical overview of every column: minimum, quartiles, median, mean, and maximum for numeric columns, plus a count of missing values (NA's) wherever any occur.
data <- read.csv("C:/Users/amalm/OneDrive/Desktop/karpur.csv", header = TRUE)
summary(data)
## depth caliper ind.deep ind.med
## Min. :5667 Min. :8.487 Min. : 6.532 Min. : 9.386
## 1st Qu.:5769 1st Qu.:8.556 1st Qu.: 28.799 1st Qu.: 27.892
## Median :5872 Median :8.588 Median :217.849 Median :254.383
## Mean :5873 Mean :8.622 Mean :275.357 Mean :273.357
## 3rd Qu.:5977 3rd Qu.:8.686 3rd Qu.:566.793 3rd Qu.:544.232
## Max. :6083 Max. :8.886 Max. :769.484 Max. :746.028
## gamma phi.N R.deep R.med
## Min. : 16.74 Min. :0.0150 Min. : 1.300 Min. : 1.340
## 1st Qu.: 40.89 1st Qu.:0.2030 1st Qu.: 1.764 1st Qu.: 1.837
## Median : 51.37 Median :0.2450 Median : 4.590 Median : 3.931
## Mean : 53.42 Mean :0.2213 Mean : 24.501 Mean : 21.196
## 3rd Qu.: 62.37 3rd Qu.:0.2640 3rd Qu.: 34.724 3rd Qu.: 35.853
## Max. :112.40 Max. :0.4100 Max. :153.085 Max. :106.542
## SP density.corr density phi.core
## Min. :-73.95 Min. :-0.067000 Min. :1.758 Min. :15.70
## 1st Qu.:-42.01 1st Qu.:-0.016000 1st Qu.:2.023 1st Qu.:23.90
## Median :-32.25 Median :-0.007000 Median :2.099 Median :27.60
## Mean :-30.98 Mean :-0.008883 Mean :2.102 Mean :26.93
## 3rd Qu.:-19.48 3rd Qu.: 0.002000 3rd Qu.:2.181 3rd Qu.:30.70
## Max. : 25.13 Max. : 0.089000 Max. :2.387 Max. :36.30
## k.core Facies X phi.core.fraq
## Min. : 0.42 Length:819 Mode:logical Min. :0.1570
## 1st Qu.: 657.33 Class :character NA's:819 1st Qu.:0.2390
## Median : 1591.22 Mode :character Median :0.2760
## Mean : 2251.91 Mean :0.2693
## 3rd Qu.: 3046.82 3rd Qu.:0.3070
## Max. :15600.00 Max. :0.3630
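The summary shows that X is an all-NA logical column and that Facies is read as character. A minimal sketch of checking the same things programmatically, using a small made-up data frame (`toy` and its values are illustrative assumptions, not rows from karpur.csv):

```r
# Toy stand-in for the real data; values are invented for illustration.
toy <- data.frame(
  depth  = c(5667, 5668, 5669),
  Facies = c("F1", "F3", "F1"),
  X      = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of missing values per column.
na_counts <- colSums(is.na(toy))

# Storage class of each column.
col_classes <- sapply(toy, class)

print(na_counts)
print(col_classes)
```

Running the same two calls on `data` would list the missing-value count and class of every column at a glance.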
This step keeps only the columns that have more than one unique value. A column with a single value (for example, a categorical variable that is identical in every row, or the all-NA column X seen in the summary above) has no variance and therefore no predictive power, so it is removed.
data <- data[, sapply(data, function(x) length(unique(x)) > 1)]
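As a hedged illustration of what this filter does, the sketch below applies the same test to a made-up data frame (`toy`, `unit`, and the values are assumptions for the example only). An all-NA column counts as having one unique value, so it is dropped along with the constant column:

```r
toy <- data.frame(
  depth = c(5667, 5668, 5669),
  X     = c(NA, NA, NA),        # all NA: one unique value
  unit  = c("ft", "ft", "ft"),  # constant: one unique value
  gamma = c(40.1, 51.3, 62.4)
)

# TRUE for columns with more than one distinct value.
keep <- sapply(toy, function(x) length(unique(x)) > 1)

# drop = FALSE keeps a data frame even if only one column survives.
toy_clean <- toy[, keep, drop = FALSE]
print(names(toy_clean))
```

Note that the filter does not catch columns that are redundant with each other, only columns that are constant on their own.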
The column Facies is converted into a factor (categorical variable). This allows proper handling by the regression model. The caret package is loaded, which provides functions like RMSE for evaluating model performance.
data$Facies <- as.factor(data$Facies) # Convert to factor if needed
library(caret)
## Warning: package 'caret' was built under R version 4.4.2
## Loading required package: ggplot2
## Loading required package: lattice
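A multiple linear regression (MLR) model is fitted with k.core as the response and every other column except Facies as a predictor. The summary() function reports the coefficients, their significance levels, and overall fit statistics such as R² and adjusted R². The coefficient for phi.core.fraq is NA because that column is an exact linear function of phi.core (the summary statistics suggest phi.core / 100), so it adds no information. Predictions (k.predicted_1) are made on the same data used for fitting, and a scatter plot compares them with the measured values (data$k.core).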
model_1<- lm(k.core~ .-Facies,data=data)
summary(model_1)
##
## Call:
## lm(formula = k.core ~ . - Facies, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5549.5 -755.5 -178.1 578.0 11260.8
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60762.728 16605.360 3.659 0.000269 ***
## depth -7.398 1.446 -5.115 3.92e-07 ***
## caliper -3955.952 1055.105 -3.749 0.000190 ***
## ind.deep -14.183 2.345 -6.048 2.24e-09 ***
## ind.med 17.300 2.509 6.896 1.08e-11 ***
## gamma -77.487 5.475 -14.153 < 2e-16 ***
## phi.N -1784.704 1301.772 -1.371 0.170763
## R.deep -26.007 6.974 -3.729 0.000206 ***
## R.med 63.525 9.841 6.455 1.86e-10 ***
## SP -8.784 3.460 -2.539 0.011313 *
## density.corr -523.060 5358.876 -0.098 0.922269
## density 8011.106 1120.554 7.149 1.96e-12 ***
## phi.core 183.203 23.802 7.697 4.07e-14 ***
## phi.core.fraq NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1442 on 806 degrees of freedom
## Multiple R-squared: 0.5903, Adjusted R-squared: 0.5842
## F-statistic: 96.77 on 12 and 806 DF, p-value: < 2.2e-16
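The note "1 not defined because of singularities" means that one predictor is an exact linear combination of the others. A minimal sketch on synthetic data, assuming (as the summary statistics suggest but do not prove) that phi.core.fraq is simply phi.core divided by 100; the variable names `phi`, `phi_frac`, and `k` are made up for the example:

```r
set.seed(1)
n <- 50
phi      <- runif(n, 15, 36)            # porosity in percent (synthetic)
phi_frac <- phi / 100                   # same information, as a fraction
k        <- 100 * phi + rnorm(n, sd = 50)

fit <- lm(k ~ phi + phi_frac)
print(coef(fit))                        # phi_frac is NA: it is aliased with phi

# alias() reports which terms are linear combinations of others.
aliased <- rownames(alias(fit)$Complete)
print(aliased)
```

Dropping the aliased column before fitting would give the same fitted values and remove the NA row from the coefficient table.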
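k.predicted_1 <- predict(model_1)  # in-sample fitted values; predict() ignores `data=` (its argument is `newdata`)
plot(k.predicted_1, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core")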
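The task asks for measured and predicted permeability in one graph. A scatter plot is one reading of that; another is to draw both series against depth on shared axes. The sketch below shows the second option on synthetic data (all values and the names `measured` and `predicted` are assumptions for illustration; with the real data they would be data$k.core and k.predicted_1):

```r
set.seed(42)
depth     <- seq(5667, 6083, length.out = 60)
measured  <- 10^(3 + 0.5 * sin(depth / 40) + rnorm(60, sd = 0.2))
predicted <- 10^(3 + 0.5 * sin(depth / 40))

# Measured values as points, predictions as a line, on one set of axes.
# Depth increases downward; permeability is drawn on a log axis.
plot(measured, depth,
     ylim = rev(range(depth)), log = "x",
     pch = 16, col = "grey40",
     xlab = "Permeability", ylab = "Depth")
lines(predicted, depth, col = "red", lwd = 2)
legend("topright", legend = c("Measured", "Predicted"),
       col = c("grey40", "red"), pch = c(16, NA), lty = c(NA, 1))
```

The log axis is a presentation choice that only works when every plotted value is positive.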
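The root mean square error (RMSE) of the first model is calculated with caret::RMSE(). RMSE is the square root of the mean squared difference between predicted and measured values, expressed in the units of k.core; lower values indicate a closer fit. Because the predictions come from the data used to fit the model, this is an in-sample error.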
rmse_1<- RMSE(k.predicted_1,data$k.core )
rmse_1
## [1] 1430.118
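For reference, the RMSE can also be computed without caret. The sketch below uses the built-in mtcars data as a stand-in for the permeability model (the choice of dataset and predictors is an assumption for illustration) and shows how the in-sample RMSE relates to the residual standard error printed by summary():

```r
fit  <- lm(mpg ~ wt + hp, data = mtcars)  # built-in data, stand-in model
obs  <- mtcars$mpg
pred <- fitted(fit)

# Definition: square root of the mean squared error.
rmse_manual <- sqrt(mean((obs - pred)^2))

# Same quantity from the residual standard error: sigma * sqrt(df / n).
n <- nrow(mtcars)
rmse_from_sigma <- sigma(fit) * sqrt(df.residual(fit) / n)

print(c(rmse_manual, rmse_from_sigma))
```

Applied to model_1, the same relation gives roughly 1442 × sqrt(806 / 819) ≈ 1430, which agrees with rmse_1 up to the rounding of the residual standard error.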
Backward stepwise selection is applied to the first model (model_1) to simplify it. With direction = "backward", step() starts from the full model and, at each step, drops the term whose removal lowers the AIC the most; it stops when no single removal lowers the AIC further. Selection is therefore driven by AIC, not by p-values. The resulting model (model_2) is summarized, and its predictions are compared with the measured values in another scatter plot.
model_2<-step(model_1 , direction = "backward")
## Start: AIC=11926.91
## k.core ~ (depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density.corr + density + phi.core +
## Facies + phi.core.fraq) - Facies
##
##
## Step: AIC=11926.91
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density.corr + density + phi.core
##
## Df Sum of Sq RSS AIC
## - density.corr 1 19799 1675068713 11925
## - phi.N 1 3906205 1678955118 11927
## <none> 1675048914 11927
## - SP 1 13394190 1688443104 11931
## - R.deep 1 28897686 1703946599 11939
## - caliper 1 29214826 1704263740 11939
## - depth 1 54372650 1729421563 11951
## - ind.deep 1 76022788 1751071701 11961
## - R.med 1 86603706 1761652619 11966
## - ind.med 1 98823752 1773872666 11972
## - density 1 106221406 1781270319 11975
## - phi.core 1 123125117 1798174031 11983
## - gamma 1 416312526 2091361440 12107
##
## Step: AIC=11924.92
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density + phi.core
##
## Df Sum of Sq RSS AIC
## <none> 1675068713 11925
## - phi.N 1 4564880 1679633593 11925
## - SP 1 13491079 1688559792 11930
## - R.deep 1 28896144 1703964857 11937
## - caliper 1 29253869 1704322581 11937
## - depth 1 54825159 1729893872 11949
## - ind.deep 1 77573926 1752642639 11960
## - R.med 1 86772220 1761840933 11964
## - ind.med 1 100740701 1775809413 11971
## - density 1 114209586 1789278299 11977
## - phi.core 1 124694278 1799762991 11982
## - gamma 1 417015194 2092083907 12105
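The AIC values in this trace come from extractAIC(), which for a linear model is n · log(RSS / n) + 2 · edf, where edf is the number of estimated coefficients. A minimal sketch verifying that on the built-in mtcars data (a stand-in chosen for illustration, not the permeability data):

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- length(coef(fit))          # estimated coefficients, intercept included
                                  # (valid here because none are NA)

aic_manual <- n * log(rss / n) + 2 * edf
aic_step   <- extractAIC(fit)[2]  # the value step() prints

print(c(aic_manual, aic_step))
```

Plugging in the numbers from the trace above, 819 · log(1675048914 / 819) + 2 · 13 ≈ 11926.91, which matches the starting AIC. This scale differs from AIC() by a constant, so the two should not be mixed when comparing models.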
summary(model_2)
##
## Call:
## lm(formula = k.core ~ depth + caliper + ind.deep + ind.med +
## gamma + phi.N + R.deep + R.med + SP + density + phi.core,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5545.3 -753.4 -177.1 576.8 11260.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60910.619 16525.937 3.686 0.000243 ***
## depth -7.409 1.442 -5.139 3.46e-07 ***
## caliper -3957.892 1054.270 -3.754 0.000186 ***
## ind.deep -14.146 2.314 -6.113 1.52e-09 ***
## ind.med 17.263 2.478 6.967 6.74e-12 ***
## gamma -77.461 5.465 -14.174 < 2e-16 ***
## phi.N -1825.771 1231.150 -1.483 0.138470
## R.deep -25.972 6.961 -3.731 0.000204 ***
## R.med 63.466 9.816 6.466 1.75e-10 ***
## SP -8.803 3.453 -2.549 0.010974 *
## density 7980.761 1075.902 7.418 3.02e-13 ***
## phi.core 183.436 23.667 7.751 2.75e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1441 on 807 degrees of freedom
## Multiple R-squared: 0.5903, Adjusted R-squared: 0.5847
## F-statistic: 105.7 on 11 and 807 DF, p-value: < 2.2e-16
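k.predicted_2 <- predict(model_2)  # in-sample fitted values; predict() ignores `data=` (its argument is `newdata`)
plot(k.predicted_2, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core")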
The RMSE of the stepwise model (model_2) is calculated to check whether the simplified model performs as well as the full model. The value is essentially unchanged from rmse_1, which is expected because only density.corr was dropped.
rmse_2<- RMSE(k.predicted_2,data$k.core )
rmse_2
## [1] 1430.126
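Models 1 and 2 report the same multiple R² (0.5903) but slightly different adjusted R² values, because the adjustment penalizes the number of predictors. A short sketch of the formula, checked first against summary() on the built-in mtcars data and then evaluated with the rounded figures printed above (the helper name `adj_r2` is made up for this example):

```r
adj_r2 <- function(r2, n, p) {
  # r2: multiple R-squared, n: observations, p: predictors (excluding intercept)
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Check the helper against summary() on a stand-in model.
fit    <- lm(mpg ~ wt + hp, data = mtcars)
s      <- summary(fit)
manual <- adj_r2(s$r.squared, n = nrow(mtcars), p = 2)
print(c(manual, s$adj.r.squared))

# Figures reported above: R-squared 0.5903 (rounded), n = 819.
print(round(adj_r2(0.5903, 819, 12), 4))  # model_1: 12 estimated predictors
print(round(adj_r2(0.5903, 819, 11), 4))  # model_2: 11 predictors
```

Because the inputs are rounded, the last digit of the result can differ from the value summary() prints from the unrounded R².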
This model includes all available variables in the dataset as predictors, including Facies. Predictions are generated using the model, and a scatter plot is created to compare measured permeability (data$k.core) against predicted values (k.predicted_3). The purpose of this step is to observe how including Facies impacts the model.
model_3<- lm(k.core~ .,data=data)
summary(model_3)
##
## Call:
## lm(formula = k.core ~ ., data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5585.6 -568.9 49.2 476.5 8928.4
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.783e+04 1.760e+04 -3.853 0.000126 ***
## depth 8.544e+00 1.785e+00 4.786 2.02e-06 ***
## caliper 1.413e+03 1.019e+03 1.387 0.165789
## ind.deep -2.418e-01 2.354e+00 -0.103 0.918220
## ind.med 1.224e+00 2.585e+00 0.473 0.636062
## gamma -4.583e+01 6.010e+00 -7.626 6.88e-14 ***
## phi.N -2.010e+03 1.476e+03 -1.362 0.173540
## R.deep -2.344e+01 6.288e+00 -3.727 0.000207 ***
## R.med 5.643e+01 9.065e+00 6.225 7.76e-10 ***
## SP -7.125e+00 3.145e+00 -2.266 0.023736 *
## density.corr -2.567e+03 4.809e+03 -0.534 0.593602
## density 2.319e+03 1.173e+03 1.976 0.048458 *
## phi.core 1.921e+02 2.282e+01 8.418 < 2e-16 ***
## FaciesF10 8.921e+02 3.590e+02 2.485 0.013157 *
## FaciesF2 9.243e+02 5.818e+02 1.589 0.112514
## FaciesF3 4.393e+02 3.344e+02 1.313 0.189394
## FaciesF5 7.411e+02 3.428e+02 2.162 0.030908 *
## FaciesF7 -4.152e+01 5.742e+02 -0.072 0.942377
## FaciesF8 -1.179e+03 3.927e+02 -3.002 0.002770 **
## FaciesF9 -2.969e+03 4.298e+02 -6.908 1.00e-11 ***
## phi.core.fraq NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1262 on 799 degrees of freedom
## Multiple R-squared: 0.6889, Adjusted R-squared: 0.6815
## F-statistic: 93.12 on 19 and 799 DF, p-value: < 2.2e-16
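k.predicted_3 <- predict(model_3)  # in-sample fitted values; predict() ignores `data=` (its argument is `newdata`)
plot(k.predicted_3, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core")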
rmse_3<- RMSE(k.predicted_3,data$k.core )
rmse_3
## [1] 1246.201
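The coefficient table for model_3 lists seven Facies terms (FaciesF10 through FaciesF9) because lm() expands a factor into indicator columns and omits the first level as the baseline. A minimal sketch with a made-up factor (the level names mimic the output above, but which facies is the baseline in the real data is not shown there, so treat that as an assumption):

```r
facies <- factor(c("F1", "F10", "F2", "F3", "F1", "F10"))
print(levels(facies))        # sorted as strings; the first level is the baseline

# Design matrix: an intercept plus one indicator column per non-baseline level.
X <- model.matrix(~ facies)
print(colnames(X))
```

Each Facies coefficient is therefore a shift in predicted k.core relative to the baseline facies, with all other predictors held fixed.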
Backward stepwise selection is applied to the third model (model_3) to drop terms that do not improve the AIC and so make the model easier to interpret. The new predictions (k.predicted_4) are plotted against the measured values, and the RMSE is computed to evaluate the reduced model.
model_4<-step(model_3 , direction = "backward")
## Start: AIC=11715.43
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density.corr + density + phi.core +
## Facies + phi.core.fraq
##
##
## Step: AIC=11715.43
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density.corr + density + phi.core +
## Facies
##
## Df Sum of Sq RSS AIC
## - ind.deep 1 16793 1271937992 11713
## - ind.med 1 356746 1272277945 11714
## - density.corr 1 453661 1272374861 11714
## - phi.N 1 2953609 1274874809 11715
## - caliper 1 3063007 1274984206 11715
## <none> 1271921199 11715
## - density 1 6217927 1278139127 11717
## - SP 1 8171834 1280093033 11719
## - R.deep 1 22117394 1294038593 11728
## - depth 1 36466976 1308388176 11737
## - R.med 1 61690461 1333611660 11752
## - gamma 1 92579723 1364500923 11771
## - phi.core 1 112793101 1384714301 11783
## - Facies 7 403127714 1675048914 11927
##
## Step: AIC=11713.44
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep +
## R.med + SP + density.corr + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - density.corr 1 437546 1272375538 11712
## - phi.N 1 2938766 1274876758 11713
## - caliper 1 3074396 1275012389 11713
## <none> 1271937992 11713
## - density 1 6228928 1278166920 11715
## - ind.med 1 6905855 1278843848 11716
## - SP 1 8191802 1280129794 11717
## - R.deep 1 22125695 1294063687 11726
## - depth 1 39139470 1311077462 11736
## - R.med 1 61773953 1333711946 11750
## - gamma 1 92865220 1364803212 11769
## - phi.core 1 112960440 1384898432 11781
## - Facies 7 479133709 1751071701 11961
##
## Step: AIC=11711.72
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep +
## R.med + SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - caliper 1 2980713 1275356252 11712
## <none> 1272375538 11712
## - phi.N 1 3279032 1275654571 11712
## - density 1 5792837 1278168375 11713
## - ind.med 1 6813959 1279189497 11714
## - SP 1 8391302 1280766840 11715
## - R.deep 1 22009402 1294384940 11724
## - depth 1 38705776 1311081314 11734
## - R.med 1 61436819 1333812357 11748
## - gamma 1 93974329 1366349868 11768
## - phi.core 1 115336515 1387712053 11781
## - Facies 7 480267100 1752642639 11960
##
## Step: AIC=11711.64
## k.core ~ depth + ind.med + gamma + phi.N + R.deep + R.med + SP +
## density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - phi.N 1 2534906 1277891157 11711
## <none> 1275356252 11712
## - density 1 7270311 1282626562 11714
## - SP 1 8733336 1284089587 11715
## - ind.med 1 12924050 1288280301 11718
## - R.deep 1 22449117 1297805369 11724
## - depth 1 51507476 1326863728 11742
## - R.med 1 60137982 1335494234 11747
## - phi.core 1 112564835 1387921086 11779
## - gamma 1 141535555 1416891807 11796
## - Facies 7 520094756 1795451008 11978
##
## Step: AIC=11711.26
## k.core ~ depth + ind.med + gamma + R.deep + R.med + SP + density +
## phi.core + Facies
##
## Df Sum of Sq RSS AIC
## <none> 1277891157 11711
## - density 1 5155969 1283047127 11713
## - SP 1 8515796 1286406953 11715
## - ind.med 1 10944937 1288836095 11716
## - R.deep 1 23273312 1301164469 11724
## - depth 1 49725248 1327616405 11740
## - R.med 1 59454645 1337345802 11746
## - phi.core 1 110154394 1388045551 11777
## - gamma 1 219059092 1496950249 11839
## - Facies 7 526383446 1804274603 11980
summary(model_4)
##
## Call:
## lm(formula = k.core ~ depth + ind.med + gamma + R.deep + R.med +
## SP + density + phi.core + Facies, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5608.3 -567.8 35.9 500.7 8989.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.322e+04 6.625e+03 -6.523 1.22e-10 ***
## depth 6.648e+00 1.189e+00 5.590 3.11e-08 ***
## ind.med 1.078e+00 4.111e-01 2.623 0.008894 **
## gamma -5.324e+01 4.537e+00 -11.733 < 2e-16 ***
## R.deep -2.395e+01 6.264e+00 -3.824 0.000141 ***
## R.med 5.515e+01 9.022e+00 6.112 1.53e-09 ***
## SP -7.214e+00 3.118e+00 -2.313 0.020960 *
## density 1.880e+03 1.044e+03 1.800 0.072240 .
## phi.core 1.817e+02 2.184e+01 8.320 3.77e-16 ***
## FaciesF10 8.266e+02 3.533e+02 2.340 0.019553 *
## FaciesF2 7.035e+02 5.567e+02 1.264 0.206697
## FaciesF3 4.100e+02 3.228e+02 1.270 0.204443
## FaciesF5 5.913e+02 3.211e+02 1.841 0.065924 .
## FaciesF7 -3.159e+02 5.402e+02 -0.585 0.558866
## FaciesF8 -1.455e+03 3.122e+02 -4.661 3.69e-06 ***
## FaciesF9 -3.017e+03 3.764e+02 -8.017 3.82e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1262 on 803 degrees of freedom
## Multiple R-squared: 0.6874, Adjusted R-squared: 0.6816
## F-statistic: 117.7 on 15 and 803 DF, p-value: < 2.2e-16
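k.predicted_4 <- predict(model_4)  # in-sample fitted values; predict() ignores `data=` (its argument is `newdata`)
plot(k.predicted_4, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core")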
rmse_4<- RMSE(k.predicted_4,data$k.core )
rmse_4
## [1] 1249.122
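The permeability values are transformed to base-10 logarithms (log10_k.core) to reduce the strong right skew of k.core and stabilize the variance. A regression model (model_5) is fitted with log10_k.core as the response and all other columns, including Facies but excluding k.core itself, as predictors. The predicted logarithms (log_k.predicted_5) are back-transformed with 10^ to the original scale (k.predicted_5), then plotted against the measured k.core and used to compute the RMSE. Note that the R² and adjusted R² reported by summary() for this model refer to the log10 scale, so they are not directly comparable with those of models 1 to 4.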
data$log10_k.core<-log10(data$k.core)
model_5<- lm(log10_k.core~.-k.core,data=data)
summary(model_5)
##
## Call:
## lm(formula = log10_k.core ~ . - k.core, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5804 -0.1138 0.0322 0.1529 0.7384
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.3461877 4.6532000 -0.504 0.61425
## depth 0.0007425 0.0004718 1.574 0.11596
## caliper -0.4605945 0.2693103 -1.710 0.08760 .
## ind.deep -0.0007951 0.0006222 -1.278 0.20168
## ind.med 0.0007137 0.0006833 1.044 0.29659
## gamma -0.0091269 0.0015885 -5.746 1.30e-08 ***
## phi.N -1.7628155 0.3901024 -4.519 7.16e-06 ***
## R.deep -0.0025878 0.0016620 -1.557 0.11987
## R.med 0.0044073 0.0023960 1.839 0.06622 .
## SP -0.0016935 0.0008312 -2.037 0.04194 *
## density.corr 1.4462633 1.2712045 1.138 0.25558
## density 1.6148374 0.3100921 5.208 2.44e-07 ***
## phi.core 0.0948634 0.0060329 15.724 < 2e-16 ***
## FaciesF10 0.0786460 0.0948909 0.829 0.40746
## FaciesF2 -0.0184334 0.1537793 -0.120 0.90462
## FaciesF3 -0.0307548 0.0883957 -0.348 0.72799
## FaciesF5 0.1094193 0.0906034 1.208 0.22753
## FaciesF7 0.2811620 0.1517797 1.852 0.06433 .
## FaciesF8 -0.0976234 0.1038054 -0.940 0.34727
## FaciesF9 -0.3562116 0.1135966 -3.136 0.00178 **
## phi.core.fraq NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3335 on 799 degrees of freedom
## Multiple R-squared: 0.6806, Adjusted R-squared: 0.673
## F-statistic: 89.6 on 19 and 799 DF, p-value: < 2.2e-16
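log_k.predicted_5 <- predict(model_5)  # in-sample fitted values on the log10 scale; `data=` is not a predict() argument
k.predicted_5 <- 10^log_k.predicted_5  # back-transform to the original scale
plot(k.predicted_5, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core")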
rmse_5<- RMSE(k.predicted_5,data$k.core )
rmse_5
## [1] 1333.017
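The RMSE here is computed on the original scale after back-transforming. One point worth keeping in mind is that 10 raised to a mean of logarithms is a geometric mean, which never exceeds the arithmetic mean, so back-transformed predictions tend to sit below the average measured value. A minimal sketch on synthetic log-normal data (the distribution parameters are assumptions chosen for illustration):

```r
set.seed(7)
k <- 10^rnorm(1000, mean = 3, sd = 0.5)  # synthetic, log10-normal "permeability"

arithmetic_mean  <- mean(k)
back_transformed <- 10^mean(log10(k))    # geometric mean of k

print(c(arithmetic_mean, back_transformed))
```

This may be part of the reason the log model's RMSE on the original scale (rmse_5) is higher than that of model_3, even though the log model is fitted on a better-behaved response.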
Backward stepwise selection is applied to the logarithmic model (model_5). The reduced model (model_6) is used to make predictions, which are back-transformed to the original scale and plotted against the measured permeability.
model_6<-step(model_5, direction = "backward")
## Start: AIC=-1779.02
## log10_k.core ~ (depth + caliper + ind.deep + ind.med + gamma +
## phi.N + R.deep + R.med + SP + density.corr + density + phi.core +
## k.core + Facies + phi.core.fraq) - k.core
##
##
## Step: AIC=-1779.02
## log10_k.core ~ depth + caliper + ind.deep + ind.med + gamma +
## phi.N + R.deep + R.med + SP + density.corr + density + phi.core +
## Facies
##
## Df Sum of Sq RSS AIC
## - ind.med 1 0.1213 88.981 -1779.9
## - density.corr 1 0.1440 89.004 -1779.7
## - ind.deep 1 0.1816 89.042 -1779.3
## <none> 88.860 -1779.0
## - R.deep 1 0.2696 89.130 -1778.5
## - depth 1 0.2754 89.135 -1778.5
## - caliper 1 0.3253 89.185 -1778.0
## - R.med 1 0.3763 89.236 -1777.6
## - SP 1 0.4617 89.322 -1776.8
## - phi.N 1 2.2710 91.131 -1760.3
## - density 1 3.0160 91.876 -1753.7
## - gamma 1 3.6713 92.531 -1747.9
## - Facies 7 7.0758 95.936 -1730.3
## - phi.core 1 27.4982 116.358 -1560.2
##
## Step: AIC=-1779.9
## log10_k.core ~ depth + caliper + ind.deep + gamma + phi.N + R.deep +
## R.med + SP + density.corr + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - density.corr 1 0.1931 89.174 -1780.1
## <none> 88.981 -1779.9
## - ind.deep 1 0.2179 89.199 -1779.9
## - R.deep 1 0.2447 89.226 -1779.7
## - caliper 1 0.2921 89.273 -1779.2
## - R.med 1 0.3397 89.321 -1778.8
## - SP 1 0.4101 89.391 -1778.1
## - depth 1 0.4622 89.444 -1777.7
## - phi.N 1 2.2035 91.185 -1761.9
## - density 1 3.0113 91.993 -1754.6
## - gamma 1 3.5761 92.557 -1749.6
## - Facies 7 9.1242 98.106 -1714.0
## - phi.core 1 27.4190 116.400 -1561.9
##
## Step: AIC=-1780.12
## log10_k.core ~ depth + caliper + ind.deep + gamma + phi.N + R.deep +
## R.med + SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - ind.deep 1 0.2180 89.392 -1780.1
## <none> 89.174 -1780.1
## - R.deep 1 0.2526 89.427 -1779.8
## - caliper 1 0.2676 89.442 -1779.7
## - R.med 1 0.3598 89.534 -1778.8
## - SP 1 0.3832 89.558 -1778.6
## - depth 1 0.5404 89.715 -1777.2
## - phi.N 1 2.0726 91.247 -1763.3
## - gamma 1 3.4838 92.658 -1750.7
## - density 1 3.6220 92.796 -1749.5
## - Facies 7 9.3567 98.531 -1712.4
## - phi.core 1 27.2273 116.402 -1563.9
##
## Step: AIC=-1780.12
## log10_k.core ~ depth + caliper + gamma + phi.N + R.deep + R.med +
## SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## <none> 89.392 -1780.1
## - R.deep 1 0.2869 89.679 -1779.5
## - depth 1 0.3332 89.726 -1779.1
## - SP 1 0.4296 89.822 -1778.2
## - R.med 1 0.5085 89.901 -1777.5
## - caliper 1 0.5746 89.967 -1776.9
## - phi.N 1 2.3337 91.726 -1761.0
## - gamma 1 3.8214 93.214 -1747.8
## - density 1 3.8626 93.255 -1747.5
## - Facies 7 9.2100 98.602 -1713.8
## - phi.core 1 27.0935 116.486 -1565.3
summary(model_6)
##
## Call:
## lm(formula = log10_k.core ~ depth + caliper + gamma + phi.N +
## R.deep + R.med + SP + density + phi.core + Facies, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.58182 -0.12001 0.03437 0.15230 0.70317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.2671250 3.9316796 -0.322 0.74732
## depth 0.0006562 0.0003796 1.729 0.08420 .
## caliper -0.5608681 0.2470292 -2.270 0.02344 *
## gamma -0.0091497 0.0015626 -5.855 6.94e-09 ***
## phi.N -1.7463527 0.3816550 -4.576 5.50e-06 ***
## R.deep -0.0026554 0.0016551 -1.604 0.10903
## R.med 0.0049837 0.0023334 2.136 0.03300 *
## SP -0.0016140 0.0008221 -1.963 0.04996 *
## density 1.7602255 0.2990153 5.887 5.79e-09 ***
## phi.core 0.0927539 0.0059493 15.591 < 2e-16 ***
## FaciesF10 0.0896953 0.0945929 0.948 0.34330
## FaciesF2 0.0152576 0.1523676 0.100 0.92026
## FaciesF3 -0.0292379 0.0869197 -0.336 0.73667
## FaciesF5 0.1022238 0.0879087 1.163 0.24524
## FaciesF7 0.2794793 0.1462763 1.911 0.05641 .
## FaciesF8 -0.0932936 0.0927473 -1.006 0.31477
## FaciesF9 -0.3877078 0.1030388 -3.763 0.00018 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3339 on 802 degrees of freedom
## Multiple R-squared: 0.6787, Adjusted R-squared: 0.6722
## F-statistic: 105.9 on 16 and 802 DF, p-value: < 2.2e-16
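log_k.predicted_6 <- predict(model_6)  # in-sample fitted values on the log10 scale; `data=` is not a predict() argument
k.predicted_6 <- 10^log_k.predicted_6  # back-transform to the original scale
plot(k.predicted_6, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core")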
rmse_6<- RMSE(k.predicted_6,data$k.core )
rmse_6
## [1] 1330.932
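The dataset is randomly split into 80% for training and 20% for testing: sample_frac(data, .80) draws the training rows, and anti_join() returns the rows of data that have no match in training. Because anti_join() matches on all shared columns, the two sets are mutually exclusive.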
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
set.seed(12345)
training<-sample_frac(data, .80)
testing<-anti_join(data,training)
## Joining with `by = join_by(depth, caliper, ind.deep, ind.med, gamma, phi.N,
## R.deep, R.med, SP, density.corr, density, phi.core, k.core, Facies,
## phi.core.fraq, log10_k.core)`
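Since anti_join() matches rows by value on every shared column, a row that happens to duplicate a training row exactly would also be left out of the testing set. An alternative that does not depend on rows being unique is to split by row index in base R. A minimal sketch on a made-up data frame (`toy`, `training_toy`, and `testing_toy` are hypothetical names for this example):

```r
set.seed(12345)
toy <- data.frame(id = 1:10,
                  x  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))  # repeated x values

# Draw 80% of the row indices for training; the rest form the testing set.
train_idx    <- sample(nrow(toy), size = round(0.8 * nrow(toy)))
training_toy <- toy[train_idx, ]
testing_toy  <- toy[-train_idx, ]

print(c(nrow(training_toy), nrow(testing_toy)))
```

With this approach the two sets always add up to the full row count, whatever the data contain.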
A multiple linear regression model (model_7) is built on the training set, with log10_k.core as the dependent variable. The original k.core column is excluded from the predictors because it is the untransformed response and would leak the answer into the model. Predictions of log-transformed permeability (log_k.predicted_7) are made for the testing set and back-transformed to the original scale using 10^. A scatter plot compares predicted permeability (k.predicted_7) with measured permeability (testing$k.core) for a visual check, and the RMSE is computed on the testing set.
model_7<- lm(log10_k.core~.-k.core,data=training)
summary(model_7)
##
## Call:
## lm(formula = log10_k.core ~ . - k.core, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.57123 -0.11876 0.02389 0.14325 0.79914
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.4732574 5.2931806 -0.467 0.64048
## depth 0.0007823 0.0005429 1.441 0.15013
## caliper -0.4661148 0.3064028 -1.521 0.12870
## ind.deep -0.0005275 0.0007083 -0.745 0.45671
## ind.med 0.0004756 0.0007769 0.612 0.54066
## gamma -0.0084138 0.0018608 -4.522 7.32e-06 ***
## phi.N -1.7476182 0.4515742 -3.870 0.00012 ***
## R.deep -0.0026534 0.0020047 -1.324 0.18611
## R.med 0.0046611 0.0028759 1.621 0.10557
## SP -0.0022459 0.0009607 -2.338 0.01972 *
## density.corr 1.6498996 1.3916437 1.186 0.23623
## density 1.5514223 0.3493920 4.440 1.06e-05 ***
## phi.core 0.0950263 0.0068555 13.861 < 2e-16 ***
## FaciesF10 0.0891010 0.1084215 0.822 0.41150
## FaciesF2 -0.0045917 0.1652854 -0.028 0.97785
## FaciesF3 -0.0705671 0.1022238 -0.690 0.49025
## FaciesF5 0.1276889 0.1055910 1.209 0.22701
## FaciesF7 0.3135309 0.1645276 1.906 0.05715 .
## FaciesF8 -0.1040710 0.1209194 -0.861 0.38975
## FaciesF9 -0.3722565 0.1340632 -2.777 0.00565 **
## phi.core.fraq NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3399 on 635 degrees of freedom
## Multiple R-squared: 0.6689, Adjusted R-squared: 0.659
## F-statistic: 67.53 on 19 and 635 DF, p-value: < 2.2e-16
log_k.predicted_7 <-predict(model_7,newdata=testing)
k.predicted_7<-10^log_k.predicted_7
plot(k.predicted_7,testing$k.core)
rmse_7<- RMSE(k.predicted_7,testing$k.core )
rmse_7
## [1] 1619.041
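This RMSE is measured on held-out rows, so it is not directly comparable with the in-sample values reported for models 1 to 6. To judge how much a model degrades on new data, the training and testing RMSE of the same fit can be placed side by side. A minimal sketch on synthetic data (every name and value below is an assumption for illustration):

```r
set.seed(1)
n <- 200
x <- runif(n)
d <- data.frame(x = x, log10_y = 2 + 1.5 * x + rnorm(n, sd = 0.3))
d$y <- 10^d$log10_y

idx   <- sample(n, size = 0.8 * n)
train <- d[idx, ]
test  <- d[-idx, ]

# Fit on the log10 scale using the training rows only.
fit  <- lm(log10_y ~ x, data = train)
rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))

# Back-transform predictions and score on the original scale.
rmse_train <- rmse(10^predict(fit, newdata = train), train$y)
rmse_test  <- rmse(10^predict(fit, newdata = test),  test$y)
print(c(train = rmse_train, test = rmse_test))
```

A testing RMSE far above the training RMSE would point to overfitting; similar values suggest the model generalizes about as well as it fits.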
Backward stepwise selection simplifies model_7 by dropping, one at a time, the terms whose removal lowers the AIC. The reduced model (model_8) is used to predict permeability on the testing set. Predictions are back-transformed to the original scale and plotted against the measured permeability values.
model_8<-step(model_7, direction = "backward")
## Start: AIC=-1394.04
## log10_k.core ~ (depth + caliper + ind.deep + ind.med + gamma +
## phi.N + R.deep + R.med + SP + density.corr + density + phi.core +
## k.core + Facies + phi.core.fraq) - k.core
##
##
## Step: AIC=-1394.04
## log10_k.core ~ depth + caliper + ind.deep + ind.med + gamma +
## phi.N + R.deep + R.med + SP + density.corr + density + phi.core +
## Facies
##
## Df Sum of Sq RSS AIC
## - ind.med 1 0.0433 73.395 -1395.7
## - ind.deep 1 0.0641 73.416 -1395.5
## - density.corr 1 0.1624 73.514 -1394.6
## - R.deep 1 0.2024 73.554 -1394.2
## <none> 73.352 -1394.0
## - depth 1 0.2398 73.592 -1393.9
## - caliper 1 0.2673 73.619 -1393.7
## - R.med 1 0.3034 73.655 -1393.3
## - SP 1 0.6312 73.983 -1390.4
## - phi.N 1 1.7301 75.082 -1380.8
## - density 1 2.2776 75.629 -1376.0
## - gamma 1 2.3617 75.714 -1375.3
## - Facies 7 6.4874 79.839 -1352.5
## - phi.core 1 22.1943 95.546 -1222.9
##
## Step: AIC=-1395.65
## log10_k.core ~ depth + caliper + ind.deep + gamma + phi.N + R.deep +
## R.med + SP + density.corr + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - ind.deep 1 0.0759 73.471 -1397.0
## - R.deep 1 0.1888 73.584 -1396.0
## - density.corr 1 0.1974 73.593 -1395.9
## <none> 73.395 -1395.7
## - caliper 1 0.2469 73.642 -1395.5
## - R.med 1 0.2845 73.680 -1395.1
## - depth 1 0.3477 73.743 -1394.5
## - SP 1 0.6012 73.996 -1392.3
## - phi.N 1 1.7008 75.096 -1382.6
## - density 1 2.2803 75.676 -1377.6
## - gamma 1 2.3195 75.715 -1377.3
## - Facies 7 7.7978 81.193 -1343.5
## - phi.core 1 22.1551 95.550 -1224.9
##
## Step: AIC=-1396.97
## log10_k.core ~ depth + caliper + gamma + phi.N + R.deep + R.med +
## SP + density.corr + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - density.corr 1 0.1982 73.669 -1397.2
## - R.deep 1 0.2114 73.682 -1397.1
## <none> 73.471 -1397.0
## - depth 1 0.2718 73.743 -1396.5
## - R.med 1 0.3711 73.842 -1395.7
## - caliper 1 0.4138 73.885 -1395.3
## - SP 1 0.6465 74.118 -1393.2
## - phi.N 1 1.8773 75.348 -1382.5
## - density 1 2.4061 75.877 -1377.9
## - gamma 1 2.4768 75.948 -1377.2
## - Facies 7 7.7419 81.213 -1345.3
## - phi.core 1 22.2233 95.694 -1225.9
##
## Step: AIC=-1397.21
## log10_k.core ~ depth + caliper + gamma + phi.N + R.deep + R.med +
## SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - R.deep 1 0.2189 73.888 -1397.3
## <none> 73.669 -1397.2
## - depth 1 0.3370 74.006 -1396.2
## - caliper 1 0.3897 74.059 -1395.8
## - R.med 1 0.3941 74.063 -1395.7
## - SP 1 0.6070 74.276 -1393.8
## - phi.N 1 1.7504 75.420 -1383.8
## - gamma 1 2.3915 76.061 -1378.3
## - density 1 2.8893 76.559 -1374.0
## - Facies 7 7.9373 81.607 -1344.2
## - phi.core 1 22.0361 95.705 -1227.8
##
## Step: AIC=-1397.26
## log10_k.core ~ depth + caliper + gamma + phi.N + R.med + SP +
## density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## <none> 73.888 -1397.3
## - depth 1 0.2491 74.137 -1397.1
## - R.med 1 0.3477 74.236 -1396.2
## - caliper 1 0.3903 74.278 -1395.8
## - SP 1 0.5090 74.397 -1394.8
## - phi.N 1 1.8514 75.740 -1383.0
## - gamma 1 2.2934 76.182 -1379.2
## - density 1 2.7627 76.651 -1375.2
## - Facies 7 7.8761 81.764 -1344.9
## - phi.core 1 22.6856 96.574 -1223.9
summary(model_8)
##
## Call:
## lm(formula = log10_k.core ~ depth + caliper + gamma + phi.N +
## R.med + SP + density + phi.core + Facies, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.58792 -0.11365 0.02482 0.14757 0.76086
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.4288931 4.5051297 -0.317 0.7512
## depth 0.0006448 0.0004394 1.468 0.1427
## caliper -0.5199205 0.2830059 -1.837 0.0667 .
## gamma -0.0081181 0.0018229 -4.453 9.97e-06 ***
## phi.N -1.7577281 0.4392713 -4.001 7.03e-05 ***
## R.med 0.0015024 0.0008664 1.734 0.0834 .
## SP -0.0019650 0.0009366 -2.098 0.0363 *
## density 1.6440250 0.3363417 4.888 1.29e-06 ***
## phi.core 0.0940016 0.0067112 14.007 < 2e-16 ***
## FaciesF10 0.1191848 0.1069107 1.115 0.2654
## FaciesF2 0.0434273 0.1630734 0.266 0.7901
## FaciesF3 -0.0576872 0.0999261 -0.577 0.5639
## FaciesF5 0.1577376 0.0999557 1.578 0.1150
## FaciesF7 0.3390997 0.1567647 2.163 0.0309 *
## FaciesF8 -0.0573559 0.1056215 -0.543 0.5873
## FaciesF9 -0.3543834 0.1193547 -2.969 0.0031 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.34 on 639 degrees of freedom
## Multiple R-squared: 0.6665, Adjusted R-squared: 0.6587
## F-statistic: 85.14 on 15 and 639 DF, p-value: < 2.2e-16
log_k.predicted_8 <-predict(model_8,newdata=testing)
k.predicted_8<-10^log_k.predicted_8
plot(k.predicted_8,testing$k.core)
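The RMSE is recalculated for the reduced model on the testing set to compare its performance with that of model_7. Here it is slightly higher (1669.368 versus 1619.041), so dropping predictors did not improve accuracy on the held-out data.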
rmse_8<- RMSE(k.predicted_8,testing$k.core )
rmse_8
## [1] 1669.368
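To compare the eight scenarios at a glance, the adjusted R² and RMSE values printed above can be collected into one table. The sketch below simply re-enters the reported figures by hand (the data frame and column names are made up for this summary); in a live session the same values could be pulled from summary(model)$adj.r.squared and the rmse_* objects:

```r
results <- data.frame(
  model     = paste0("model_", 1:8),
  adj_r2    = c(0.5842, 0.5847, 0.6815, 0.6816, 0.673, 0.6722, 0.659, 0.6587),
  rmse      = c(1430.118, 1430.126, 1246.201, 1249.122,
                1333.017, 1330.932, 1619.041, 1669.368),
  rmse_data = c(rep("full data", 6), rep("test set", 2))
)
print(results)

# Lowest in-sample RMSE among the models evaluated on the full data.
best_in_sample <- results$model[which.min(results$rmse[1:6])]
print(best_in_sample)
```

When reading the table, keep in mind that the adjusted R² of models 5 to 8 is on the log10 scale and that models 7 and 8 were fitted on the training subset, so only like-for-like rows should be compared.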