Objective: Use multiple linear regression (MLR) on the Karpur dataset to model and predict permeability in the following scenarios:

  1. MLR of permeability given all well logs, without Facies.
  2. Apply stepwise elimination to the previous model.
  3. MLR of permeability given all well logs, with Facies.
  4. Apply stepwise elimination to the previous model.
  5. MLR of log10(permeability) given all well logs, with Facies.
  6. Apply stepwise elimination to the previous model.
  7. Random subsampling cross-validation applied to the MLR of log10(permeability) given all well logs, with Facies.
  8. Apply stepwise elimination to the previous model.

In each scenario, compute the adjusted R² and RMSE, and plot measured and predicted permeability on one graph.
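A small helper can gather both metrics for a scenario in one call. This is a sketch and is not part of the original workflow. The name `scenario_metrics` is hypothetical. Adjusted R² is read from `summary.lm()`, and RMSE is computed directly, which is the same quantity caret's `RMSE()` returns for numeric vectors. For the log10 models, adjusted R² is on the log10 scale, while RMSE is on the original permeability scale.

```r
# Hypothetical helper: collect the two metrics requested for each scenario.
# adj_r2 comes from the fitted model (on the log10 scale for log10 models);
# rmse compares (back-transformed) predictions with observed values.
scenario_metrics <- function(model, predicted, observed) {
  c(adj_r2 = summary(model)$adj.r.squared,
    rmse   = sqrt(mean((predicted - observed)^2)))
}

# Usage example on a built-in dataset (mtcars), not the Karpur data
fit <- lm(mpg ~ wt + hp, data = mtcars)
m <- scenario_metrics(fit, predict(fit, newdata = mtcars), mtcars$mpg)
print(round(m, 4))
```

On the Karpur data it would be called as, for example, `scenario_metrics(model_1, k.predicted_1, data$k.core)`.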

1- Load the Data

Description:

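This step loads karpur.csv with read.csv(). summary() then gives a quick statistical overview of every column. For numeric columns it reports the minimum, quartiles, median, mean and maximum, plus the number of NAs when any are present. For character columns it reports length, class and mode. Note that Facies is read as a character column, and X is a logical column that is NA in all 819 rows.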

data <- read.csv("C:/Users/amalm/OneDrive/Desktop/karpur.csv", header = TRUE)
summary(data)
##      depth         caliper         ind.deep          ind.med       
##  Min.   :5667   Min.   :8.487   Min.   :  6.532   Min.   :  9.386  
##  1st Qu.:5769   1st Qu.:8.556   1st Qu.: 28.799   1st Qu.: 27.892  
##  Median :5872   Median :8.588   Median :217.849   Median :254.383  
##  Mean   :5873   Mean   :8.622   Mean   :275.357   Mean   :273.357  
##  3rd Qu.:5977   3rd Qu.:8.686   3rd Qu.:566.793   3rd Qu.:544.232  
##  Max.   :6083   Max.   :8.886   Max.   :769.484   Max.   :746.028  
##      gamma            phi.N            R.deep            R.med        
##  Min.   : 16.74   Min.   :0.0150   Min.   :  1.300   Min.   :  1.340  
##  1st Qu.: 40.89   1st Qu.:0.2030   1st Qu.:  1.764   1st Qu.:  1.837  
##  Median : 51.37   Median :0.2450   Median :  4.590   Median :  3.931  
##  Mean   : 53.42   Mean   :0.2213   Mean   : 24.501   Mean   : 21.196  
##  3rd Qu.: 62.37   3rd Qu.:0.2640   3rd Qu.: 34.724   3rd Qu.: 35.853  
##  Max.   :112.40   Max.   :0.4100   Max.   :153.085   Max.   :106.542  
##        SP          density.corr          density         phi.core    
##  Min.   :-73.95   Min.   :-0.067000   Min.   :1.758   Min.   :15.70  
##  1st Qu.:-42.01   1st Qu.:-0.016000   1st Qu.:2.023   1st Qu.:23.90  
##  Median :-32.25   Median :-0.007000   Median :2.099   Median :27.60  
##  Mean   :-30.98   Mean   :-0.008883   Mean   :2.102   Mean   :26.93  
##  3rd Qu.:-19.48   3rd Qu.: 0.002000   3rd Qu.:2.181   3rd Qu.:30.70  
##  Max.   : 25.13   Max.   : 0.089000   Max.   :2.387   Max.   :36.30  
##      k.core            Facies             X           phi.core.fraq   
##  Min.   :    0.42   Length:819         Mode:logical   Min.   :0.1570  
##  1st Qu.:  657.33   Class :character   NA's:819       1st Qu.:0.2390  
##  Median : 1591.22   Mode  :character                  Median :0.2760  
##  Mean   : 2251.91                                     Mean   :0.2693  
##  3rd Qu.: 3046.82                                     3rd Qu.:0.3070  
##  Max.   :15600.00                                     Max.   :0.3630

2- Remove Constant Columns

Description:

This step keeps only the columns that have more than one unique value. A column with a single value (for example, one in which every row is identical) has no variance and cannot help the model, so it is removed. In this dataset the only such column is X, which is entirely NA (see the summary above).

data <- data[, sapply(data, function(x) length(unique(x)) > 1)]
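Printing the names of the dropped columns makes this filter easy to audit. The sketch below runs the same filter on a small toy data frame, which is an assumption and not the Karpur data. Note that `length(unique(x))` counts `NA` as a value. A column holding one value plus some NAs is therefore kept. `drop = FALSE` keeps the result a data frame even if only one column survives.

```r
# Toy data frame (assumed): X mimics an all-NA column like the one in karpur.csv
toy <- data.frame(a = c(1, 2, 3), X = c(NA, NA, NA), f = c("F1", "F1", "F2"))

keep <- sapply(toy, function(x) length(unique(x)) > 1)
dropped <- names(toy)[!keep]
print(dropped)                    # [1] "X"

toy <- toy[, keep, drop = FALSE]  # same filter as above, always returns a data frame
```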

3- Convert Facies to a Factor

Description:

The Facies column is converted to a factor (a categorical variable). lm() then encodes it as a set of dummy variables, one for each facies level except the reference level. The caret package is also loaded here for its RMSE() function, which is used to evaluate model performance.

data$Facies <- as.factor(data$Facies)  # Convert to factor if needed
library(caret)
## Warning: package 'caret' was built under R version 4.4.2
## Loading required package: ggplot2
## Loading required package: lattice
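The sketch below shows how lm() encodes Facies, using a few toy facies labels (assumed, not taken from the Karpur rows). Factor levels sort as character strings, so "F10" comes before "F2". `model.matrix()` then creates one dummy column per level except the first, which is the reference level. This is why the coefficient tables that follow list FaciesF10 through FaciesF9 but have no row for the reference facies. `relevel()` changes the reference if a different baseline is wanted.

```r
# Toy facies labels (illustrative only, not the Karpur data)
facies <- factor(c("F1", "F10", "F2", "F9", "F1"))
print(levels(facies))             # character sort order: "F1" "F10" "F2" "F9"

mm <- model.matrix(~ facies)
print(colnames(mm))               # "(Intercept)" "faciesF10" "faciesF2" "faciesF9"

# Choose a different reference level if needed
facies_f9 <- relevel(facies, ref = "F9")
print(levels(facies_f9)[1])       # "F9"
```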

4- Fit the First Model (All Logs, Without Facies)

Description:

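A multiple linear regression (MLR) model is fitted with k.core as the response and every other column except Facies as a predictor. summary() reports the coefficients, their significance levels and the overall fit statistics (R², adjusted R² and residual standard error). The note "1 not defined because of singularities" arises because phi.core.fraq is phi.core expressed as a fraction (phi.core / 100). The two columns are therefore perfectly collinear, and lm() reports phi.core.fraq as NA. Predictions (k.predicted_1) are made on the same data, and a scatter plot compares them with measured permeability (data$k.core).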

model_1<- lm(k.core~ .-Facies,data=data)
summary(model_1)
## 
## Call:
## lm(formula = k.core ~ . - Facies, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5549.5  -755.5  -178.1   578.0 11260.8 
## 
## Coefficients: (1 not defined because of singularities)
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   60762.728  16605.360   3.659 0.000269 ***
## depth            -7.398      1.446  -5.115 3.92e-07 ***
## caliper       -3955.952   1055.105  -3.749 0.000190 ***
## ind.deep        -14.183      2.345  -6.048 2.24e-09 ***
## ind.med          17.300      2.509   6.896 1.08e-11 ***
## gamma           -77.487      5.475 -14.153  < 2e-16 ***
## phi.N         -1784.704   1301.772  -1.371 0.170763    
## R.deep          -26.007      6.974  -3.729 0.000206 ***
## R.med            63.525      9.841   6.455 1.86e-10 ***
## SP               -8.784      3.460  -2.539 0.011313 *  
## density.corr   -523.060   5358.876  -0.098 0.922269    
## density        8011.106   1120.554   7.149 1.96e-12 ***
## phi.core        183.203     23.802   7.697 4.07e-14 ***
## phi.core.fraq        NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1442 on 806 degrees of freedom
## Multiple R-squared:  0.5903, Adjusted R-squared:  0.5842 
## F-statistic: 96.77 on 12 and 806 DF,  p-value: < 2.2e-16
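k.predicted_1 <- predict(model_1, newdata = data)  # predict() takes newdata; a data= argument is silently ignored
plot(k.predicted_1, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core", main = "Model 1")
abline(0, 1, col = "red")  # 1:1 line (perfect prediction)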
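The singularity reported for phi.core.fraq can be reproduced on toy data. In the sketch below, the simulated columns `phi` and `phi_frac` (hypothetical names) stand in for phi.core and phi.core.fraq. When one predictor is an exact multiple of another, lm() drops the later column and reports its coefficient as NA. `alias()` shows the dependency.

```r
# Toy data (assumed): phi_frac carries exactly the same information as phi
set.seed(1)
toy <- data.frame(phi = runif(30, min = 15, max = 36))
toy$phi_frac <- toy$phi / 100
toy$k <- 50 * toy$phi + rnorm(30, sd = 20)

fit <- lm(k ~ phi + phi_frac, data = toy)
print(coef(fit))            # phi_frac is NA: it is aliased with phi
print(alias(fit)$Complete)  # expresses phi_frac in terms of the other columns

# Dropping the redundant column gives the same fitted values
fit_clean <- lm(k ~ phi, data = toy)
stopifnot(isTRUE(all.equal(fitted(fit), fitted(fit_clean))))
```

In the main analysis, `lm(k.core ~ . - Facies - phi.core.fraq, data = data)` should give the same fit as model_1, without the NA row.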

5- Calculate RMSE for the First Model

Description:

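The root mean square error (RMSE) of the first model is calculated with caret's RMSE() function. It is the square root of the mean squared difference between predicted and measured k.core, in the same units as k.core. Lower values indicate smaller prediction errors. Because the predictions are made on the data used to fit the model, this is an in-sample error.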

rmse_1<- RMSE(k.predicted_1,data$k.core )
rmse_1
## [1] 1430.118
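The printed statistics for model_1 can be tied together by hand. Its residual sum of squares is 1675048914, as printed in the `<none>` row of the step() output for model_1. RMSE divides the RSS by n, while the residual standard error in summary() divides it by the residual degrees of freedom, so the RMSE is slightly smaller. Adjusted R² penalizes R² for the number of predictors. The sketch below uses only numbers printed in this report, with p = 12 because the aliased phi.core.fraq is not estimated.

```r
# Values copied from the model_1 output in this report
rss <- 1675048914   # residual sum of squares ("<none>" row of step())
n   <- 819          # observations
p   <- 12           # estimated slopes (phi.core.fraq is aliased)
r2  <- 0.5903       # "Multiple R-squared" as printed (rounded)

rmse   <- sqrt(rss / n)                        # in-sample RMSE, as from caret::RMSE()
rse    <- sqrt(rss / (n - p - 1))              # "Residual standard error"
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1) # "Adjusted R-squared"

print(round(c(rmse = rmse, rse = rse, adj_r2 = adj_r2), 4))
```

These values match the printed 1430.118, 1442 (on 806 degrees of freedom) and 0.5842, up to rounding.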

6- Apply Stepwise Selection to Build the Second Model

Description:

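Backward stepwise elimination is applied to the first model (model_1) with step(direction = "backward"). The selection is driven by the Akaike information criterion (AIC), not by individual p-values. At each step, step() computes the AIC obtained by dropping each term in turn and removes the term whose removal gives the lowest AIC. It stops when no removal lowers the AIC further, which is when the `<none>` row tops the table. Here only density.corr is removed. The new model (model_2) is then evaluated, and its predictions are compared with the measured values in another scatter plot.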

model_2<-step(model_1 , direction = "backward")
## Start:  AIC=11926.91
## k.core ~ (depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies + phi.core.fraq) - Facies
## 
## 
## Step:  AIC=11926.91
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density.corr + density + phi.core
## 
##                Df Sum of Sq        RSS   AIC
## - density.corr  1     19799 1675068713 11925
## - phi.N         1   3906205 1678955118 11927
## <none>                      1675048914 11927
## - SP            1  13394190 1688443104 11931
## - R.deep        1  28897686 1703946599 11939
## - caliper       1  29214826 1704263740 11939
## - depth         1  54372650 1729421563 11951
## - ind.deep      1  76022788 1751071701 11961
## - R.med         1  86603706 1761652619 11966
## - ind.med       1  98823752 1773872666 11972
## - density       1 106221406 1781270319 11975
## - phi.core      1 123125117 1798174031 11983
## - gamma         1 416312526 2091361440 12107
## 
## Step:  AIC=11924.92
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density + phi.core
## 
##            Df Sum of Sq        RSS   AIC
## <none>                  1675068713 11925
## - phi.N     1   4564880 1679633593 11925
## - SP        1  13491079 1688559792 11930
## - R.deep    1  28896144 1703964857 11937
## - caliper   1  29253869 1704322581 11937
## - depth     1  54825159 1729893872 11949
## - ind.deep  1  77573926 1752642639 11960
## - R.med     1  86772220 1761840933 11964
## - ind.med   1 100740701 1775809413 11971
## - density   1 114209586 1789278299 11977
## - phi.core  1 124694278 1799762991 11982
## - gamma     1 417015194 2092083907 12105
summary(model_2)
## 
## Call:
## lm(formula = k.core ~ depth + caliper + ind.deep + ind.med + 
##     gamma + phi.N + R.deep + R.med + SP + density + phi.core, 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5545.3  -753.4  -177.1   576.8 11260.2 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 60910.619  16525.937   3.686 0.000243 ***
## depth          -7.409      1.442  -5.139 3.46e-07 ***
## caliper     -3957.892   1054.270  -3.754 0.000186 ***
## ind.deep      -14.146      2.314  -6.113 1.52e-09 ***
## ind.med        17.263      2.478   6.967 6.74e-12 ***
## gamma         -77.461      5.465 -14.174  < 2e-16 ***
## phi.N       -1825.771   1231.150  -1.483 0.138470    
## R.deep        -25.972      6.961  -3.731 0.000204 ***
## R.med          63.466      9.816   6.466 1.75e-10 ***
## SP             -8.803      3.453  -2.549 0.010974 *  
## density      7980.761   1075.902   7.418 3.02e-13 ***
## phi.core      183.436     23.667   7.751 2.75e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1441 on 807 degrees of freedom
## Multiple R-squared:  0.5903, Adjusted R-squared:  0.5847 
## F-statistic: 105.7 on 11 and 807 DF,  p-value: < 2.2e-16
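k.predicted_2 <- predict(model_2, newdata = data)
plot(k.predicted_2, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core", main = "Model 2")
abline(0, 1, col = "red")  # 1:1 line (perfect prediction)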
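The AIC values in the step() tables above can be reproduced by hand. For an lm fit, step() ranks candidates with extractAIC(). With the default penalty k = 2, this equals `n * log(RSS / n) + 2 * edf`, where edf is the number of estimated coefficients. The sketch below plugs in the RSS values from the first step() table for model_1. `aic_lm` is a hypothetical helper name.

```r
# AIC as computed by extractAIC() for a linear model (default k = 2)
n <- 819
aic_lm <- function(rss, edf) n * log(rss / n) + 2 * edf

# RSS values copied from the first step() table for model_1
aic_full    <- aic_lm(1675048914, edf = 13)  # intercept + 12 slopes ("<none>")
aic_no_dcor <- aic_lm(1675068713, edf = 12)  # after dropping density.corr

print(round(c(full = aic_full, without_density.corr = aic_no_dcor), 2))
```

Dropping density.corr raises the RSS only marginally but saves one parameter, so the AIC falls from about 11926.91 to 11924.92. That is why step() removes it.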

7- Calculate RMSE for the Stepwise Model

Description:

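The RMSE of the stepwise model (model_2) is calculated to check whether the simplified model performs as well as the full model. It does. The RMSE is essentially unchanged (1430.126 vs. 1430.118), and adjusted R² rises slightly from 0.5842 to 0.5847 because the model uses one fewer predictor.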

rmse_2<- RMSE(k.predicted_2,data$k.core )
rmse_2
## [1] 1430.126

8- Fit the Third Model (All Logs, With Facies)

Description:

This model uses every remaining column as a predictor, now including Facies. Because Facies is a factor, lm() adds one dummy variable per facies level except the reference level (FaciesF10, FaciesF2 and so on in the output). Each Facies coefficient is therefore a shift relative to the reference facies. Predictions (k.predicted_3) are plotted against measured permeability (data$k.core) to show how including Facies affects the fit. Compared with model_1, adjusted R² rises from 0.5842 to 0.6815.

model_3<- lm(k.core~ .,data=data)
summary(model_3)
## 
## Call:
## lm(formula = k.core ~ ., data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5585.6  -568.9    49.2   476.5  8928.4 
## 
## Coefficients: (1 not defined because of singularities)
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -6.783e+04  1.760e+04  -3.853 0.000126 ***
## depth          8.544e+00  1.785e+00   4.786 2.02e-06 ***
## caliper        1.413e+03  1.019e+03   1.387 0.165789    
## ind.deep      -2.418e-01  2.354e+00  -0.103 0.918220    
## ind.med        1.224e+00  2.585e+00   0.473 0.636062    
## gamma         -4.583e+01  6.010e+00  -7.626 6.88e-14 ***
## phi.N         -2.010e+03  1.476e+03  -1.362 0.173540    
## R.deep        -2.344e+01  6.288e+00  -3.727 0.000207 ***
## R.med          5.643e+01  9.065e+00   6.225 7.76e-10 ***
## SP            -7.125e+00  3.145e+00  -2.266 0.023736 *  
## density.corr  -2.567e+03  4.809e+03  -0.534 0.593602    
## density        2.319e+03  1.173e+03   1.976 0.048458 *  
## phi.core       1.921e+02  2.282e+01   8.418  < 2e-16 ***
## FaciesF10      8.921e+02  3.590e+02   2.485 0.013157 *  
## FaciesF2       9.243e+02  5.818e+02   1.589 0.112514    
## FaciesF3       4.393e+02  3.344e+02   1.313 0.189394    
## FaciesF5       7.411e+02  3.428e+02   2.162 0.030908 *  
## FaciesF7      -4.152e+01  5.742e+02  -0.072 0.942377    
## FaciesF8      -1.179e+03  3.927e+02  -3.002 0.002770 ** 
## FaciesF9      -2.969e+03  4.298e+02  -6.908 1.00e-11 ***
## phi.core.fraq         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1262 on 799 degrees of freedom
## Multiple R-squared:  0.6889, Adjusted R-squared:  0.6815 
## F-statistic: 93.12 on 19 and 799 DF,  p-value: < 2.2e-16
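k.predicted_3 <- predict(model_3, newdata = data)
plot(k.predicted_3, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core", main = "Model 3")
abline(0, 1, col = "red")  # 1:1 line (perfect prediction)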

9- Calculate RMSE for the Third Model

rmse_3<- RMSE(k.predicted_3,data$k.core )
rmse_3
## [1] 1246.201

10- Apply Stepwise Selection to Build the Fourth Model

Description:

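Backward stepwise elimination (AIC-based, as before) is applied to the third model (model_3). step() treats Facies as a single term, so its seven dummy variables are kept or dropped together (hence the Df of 7 in the Facies rows). Four predictors are removed in turn: ind.deep, density.corr, caliper and phi.N. The new predictions (k.predicted_4) are plotted against measured values to evaluate the reduced model.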

model_4<-step(model_3 , direction = "backward")
## Start:  AIC=11715.43
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies + phi.core.fraq
## 
## 
## Step:  AIC=11715.43
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies
## 
##                Df Sum of Sq        RSS   AIC
## - ind.deep      1     16793 1271937992 11713
## - ind.med       1    356746 1272277945 11714
## - density.corr  1    453661 1272374861 11714
## - phi.N         1   2953609 1274874809 11715
## - caliper       1   3063007 1274984206 11715
## <none>                      1271921199 11715
## - density       1   6217927 1278139127 11717
## - SP            1   8171834 1280093033 11719
## - R.deep        1  22117394 1294038593 11728
## - depth         1  36466976 1308388176 11737
## - R.med         1  61690461 1333611660 11752
## - gamma         1  92579723 1364500923 11771
## - phi.core      1 112793101 1384714301 11783
## - Facies        7 403127714 1675048914 11927
## 
## Step:  AIC=11713.44
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep + 
##     R.med + SP + density.corr + density + phi.core + Facies
## 
##                Df Sum of Sq        RSS   AIC
## - density.corr  1    437546 1272375538 11712
## - phi.N         1   2938766 1274876758 11713
## - caliper       1   3074396 1275012389 11713
## <none>                      1271937992 11713
## - density       1   6228928 1278166920 11715
## - ind.med       1   6905855 1278843848 11716
## - SP            1   8191802 1280129794 11717
## - R.deep        1  22125695 1294063687 11726
## - depth         1  39139470 1311077462 11736
## - R.med         1  61773953 1333711946 11750
## - gamma         1  92865220 1364803212 11769
## - phi.core      1 112960440 1384898432 11781
## - Facies        7 479133709 1751071701 11961
## 
## Step:  AIC=11711.72
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep + 
##     R.med + SP + density + phi.core + Facies
## 
##            Df Sum of Sq        RSS   AIC
## - caliper   1   2980713 1275356252 11712
## <none>                  1272375538 11712
## - phi.N     1   3279032 1275654571 11712
## - density   1   5792837 1278168375 11713
## - ind.med   1   6813959 1279189497 11714
## - SP        1   8391302 1280766840 11715
## - R.deep    1  22009402 1294384940 11724
## - depth     1  38705776 1311081314 11734
## - R.med     1  61436819 1333812357 11748
## - gamma     1  93974329 1366349868 11768
## - phi.core  1 115336515 1387712053 11781
## - Facies    7 480267100 1752642639 11960
## 
## Step:  AIC=11711.64
## k.core ~ depth + ind.med + gamma + phi.N + R.deep + R.med + SP + 
##     density + phi.core + Facies
## 
##            Df Sum of Sq        RSS   AIC
## - phi.N     1   2534906 1277891157 11711
## <none>                  1275356252 11712
## - density   1   7270311 1282626562 11714
## - SP        1   8733336 1284089587 11715
## - ind.med   1  12924050 1288280301 11718
## - R.deep    1  22449117 1297805369 11724
## - depth     1  51507476 1326863728 11742
## - R.med     1  60137982 1335494234 11747
## - phi.core  1 112564835 1387921086 11779
## - gamma     1 141535555 1416891807 11796
## - Facies    7 520094756 1795451008 11978
## 
## Step:  AIC=11711.26
## k.core ~ depth + ind.med + gamma + R.deep + R.med + SP + density + 
##     phi.core + Facies
## 
##            Df Sum of Sq        RSS   AIC
## <none>                  1277891157 11711
## - density   1   5155969 1283047127 11713
## - SP        1   8515796 1286406953 11715
## - ind.med   1  10944937 1288836095 11716
## - R.deep    1  23273312 1301164469 11724
## - depth     1  49725248 1327616405 11740
## - R.med     1  59454645 1337345802 11746
## - phi.core  1 110154394 1388045551 11777
## - gamma     1 219059092 1496950249 11839
## - Facies    7 526383446 1804274603 11980
summary(model_4)
## 
## Call:
## lm(formula = k.core ~ depth + ind.med + gamma + R.deep + R.med + 
##     SP + density + phi.core + Facies, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5608.3  -567.8    35.9   500.7  8989.7 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.322e+04  6.625e+03  -6.523 1.22e-10 ***
## depth        6.648e+00  1.189e+00   5.590 3.11e-08 ***
## ind.med      1.078e+00  4.111e-01   2.623 0.008894 ** 
## gamma       -5.324e+01  4.537e+00 -11.733  < 2e-16 ***
## R.deep      -2.395e+01  6.264e+00  -3.824 0.000141 ***
## R.med        5.515e+01  9.022e+00   6.112 1.53e-09 ***
## SP          -7.214e+00  3.118e+00  -2.313 0.020960 *  
## density      1.880e+03  1.044e+03   1.800 0.072240 .  
## phi.core     1.817e+02  2.184e+01   8.320 3.77e-16 ***
## FaciesF10    8.266e+02  3.533e+02   2.340 0.019553 *  
## FaciesF2     7.035e+02  5.567e+02   1.264 0.206697    
## FaciesF3     4.100e+02  3.228e+02   1.270 0.204443    
## FaciesF5     5.913e+02  3.211e+02   1.841 0.065924 .  
## FaciesF7    -3.159e+02  5.402e+02  -0.585 0.558866    
## FaciesF8    -1.455e+03  3.122e+02  -4.661 3.69e-06 ***
## FaciesF9    -3.017e+03  3.764e+02  -8.017 3.82e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1262 on 803 degrees of freedom
## Multiple R-squared:  0.6874, Adjusted R-squared:  0.6816 
## F-statistic: 117.7 on 15 and 803 DF,  p-value: < 2.2e-16
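k.predicted_4 <- predict(model_4, newdata = data)
plot(k.predicted_4, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core", main = "Model 4")
abline(0, 1, col = "red")  # 1:1 line (perfect prediction)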

11- Calculate RMSE for the Fourth Model

rmse_4<- RMSE(k.predicted_4,data$k.core )
rmse_4
## [1] 1249.122

12- Logarithmic Transformation of Permeability

Description:

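Permeability is strongly right-skewed. k.core ranges from 0.42 to 15,600, and its mean (2252) exceeds its median (1591). It is therefore transformed to base-10 logarithms to reduce the skew and stabilize the residual variance. A regression model (model_5) is fitted to log10_k.core using all predictors, including Facies. k.core itself is excluded because log10_k.core is computed from it. The predicted log10 values (log_k.predicted_5) are back-transformed with 10^ to give k.predicted_5. These are plotted against measured permeability on the original scale, which keeps the RMSE comparable with the earlier models. Note that the adjusted R² in summary(model_5) refers to the log10 scale.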

data$log10_k.core<-log10(data$k.core)
model_5<- lm(log10_k.core~.-k.core,data=data)
summary(model_5)
## 
## Call:
## lm(formula = log10_k.core ~ . - k.core, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5804 -0.1138  0.0322  0.1529  0.7384 
## 
## Coefficients: (1 not defined because of singularities)
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -2.3461877  4.6532000  -0.504  0.61425    
## depth          0.0007425  0.0004718   1.574  0.11596    
## caliper       -0.4605945  0.2693103  -1.710  0.08760 .  
## ind.deep      -0.0007951  0.0006222  -1.278  0.20168    
## ind.med        0.0007137  0.0006833   1.044  0.29659    
## gamma         -0.0091269  0.0015885  -5.746 1.30e-08 ***
## phi.N         -1.7628155  0.3901024  -4.519 7.16e-06 ***
## R.deep        -0.0025878  0.0016620  -1.557  0.11987    
## R.med          0.0044073  0.0023960   1.839  0.06622 .  
## SP            -0.0016935  0.0008312  -2.037  0.04194 *  
## density.corr   1.4462633  1.2712045   1.138  0.25558    
## density        1.6148374  0.3100921   5.208 2.44e-07 ***
## phi.core       0.0948634  0.0060329  15.724  < 2e-16 ***
## FaciesF10      0.0786460  0.0948909   0.829  0.40746    
## FaciesF2      -0.0184334  0.1537793  -0.120  0.90462    
## FaciesF3      -0.0307548  0.0883957  -0.348  0.72799    
## FaciesF5       0.1094193  0.0906034   1.208  0.22753    
## FaciesF7       0.2811620  0.1517797   1.852  0.06433 .  
## FaciesF8      -0.0976234  0.1038054  -0.940  0.34727    
## FaciesF9      -0.3562116  0.1135966  -3.136  0.00178 ** 
## phi.core.fraq         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3335 on 799 degrees of freedom
## Multiple R-squared:  0.6806, Adjusted R-squared:  0.673 
## F-statistic:  89.6 on 19 and 799 DF,  p-value: < 2.2e-16
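log_k.predicted_5 <- predict(model_5, newdata = data)
k.predicted_5 <- 10^log_k.predicted_5  # back-transform to the original scale
plot(k.predicted_5, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core", main = "Model 5")
abline(0, 1, col = "red")  # 1:1 line (perfect prediction)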
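Back-transforming with 10^ returns a geometric-mean (median-type) prediction. When the log-scale errors are roughly symmetric, this tends to sit below the conditional mean of permeability. One common correction, not used in the original analysis, is Duan's smearing estimator: multiply 10^prediction by the mean of 10^residuals. The minimal sketch below uses simulated data. Whether the correction improves the RMSE on the Karpur data is untested here.

```r
# Simulated data (assumed): a porosity-like predictor and a log10-normal response
set.seed(42)
x <- runif(200, 15, 36)
k <- 10^(0.1 * x + rnorm(200, sd = 0.3))
fit <- lm(log10(k) ~ x)

naive    <- 10^fitted(fit)             # what 10^predict() gives
smear    <- mean(10^residuals(fit))    # Duan's smearing factor (>= 1)
adjusted <- naive * smear              # smearing-corrected predictions

print(c(mean_obs = mean(k), mean_naive = mean(naive),
        mean_adjusted = mean(adjusted), smear = smear))
```

Because the residuals of a model with an intercept average zero, the smearing factor is never below 1. The correction therefore always scales the naive predictions up.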

13- Calculate RMSE for the Fifth Model

rmse_5<- RMSE(k.predicted_5,data$k.core )
rmse_5
## [1] 1333.017

14- Apply Stepwise Selection to the Log-Transformed Model

Description:

Backward stepwise elimination is applied to the log-transformed model (model_5). It removes ind.med, density.corr and ind.deep. The reduced model (model_6) is used to make predictions, which are back-transformed with 10^ and plotted against measured permeability on the original scale.

model_6<-step(model_5, direction = "backward")
## Start:  AIC=-1779.02
## log10_k.core ~ (depth + caliper + ind.deep + ind.med + gamma + 
##     phi.N + R.deep + R.med + SP + density.corr + density + phi.core + 
##     k.core + Facies + phi.core.fraq) - k.core
## 
## 
## Step:  AIC=-1779.02
## log10_k.core ~ depth + caliper + ind.deep + ind.med + gamma + 
##     phi.N + R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies
## 
##                Df Sum of Sq     RSS     AIC
## - ind.med       1    0.1213  88.981 -1779.9
## - density.corr  1    0.1440  89.004 -1779.7
## - ind.deep      1    0.1816  89.042 -1779.3
## <none>                       88.860 -1779.0
## - R.deep        1    0.2696  89.130 -1778.5
## - depth         1    0.2754  89.135 -1778.5
## - caliper       1    0.3253  89.185 -1778.0
## - R.med         1    0.3763  89.236 -1777.6
## - SP            1    0.4617  89.322 -1776.8
## - phi.N         1    2.2710  91.131 -1760.3
## - density       1    3.0160  91.876 -1753.7
## - gamma         1    3.6713  92.531 -1747.9
## - Facies        7    7.0758  95.936 -1730.3
## - phi.core      1   27.4982 116.358 -1560.2
## 
## Step:  AIC=-1779.9
## log10_k.core ~ depth + caliper + ind.deep + gamma + phi.N + R.deep + 
##     R.med + SP + density.corr + density + phi.core + Facies
## 
##                Df Sum of Sq     RSS     AIC
## - density.corr  1    0.1931  89.174 -1780.1
## <none>                       88.981 -1779.9
## - ind.deep      1    0.2179  89.199 -1779.9
## - R.deep        1    0.2447  89.226 -1779.7
## - caliper       1    0.2921  89.273 -1779.2
## - R.med         1    0.3397  89.321 -1778.8
## - SP            1    0.4101  89.391 -1778.1
## - depth         1    0.4622  89.444 -1777.7
## - phi.N         1    2.2035  91.185 -1761.9
## - density       1    3.0113  91.993 -1754.6
## - gamma         1    3.5761  92.557 -1749.6
## - Facies        7    9.1242  98.106 -1714.0
## - phi.core      1   27.4190 116.400 -1561.9
## 
## Step:  AIC=-1780.12
## log10_k.core ~ depth + caliper + ind.deep + gamma + phi.N + R.deep + 
##     R.med + SP + density + phi.core + Facies
## 
##            Df Sum of Sq     RSS     AIC
## - ind.deep  1    0.2180  89.392 -1780.1
## <none>                   89.174 -1780.1
## - R.deep    1    0.2526  89.427 -1779.8
## - caliper   1    0.2676  89.442 -1779.7
## - R.med     1    0.3598  89.534 -1778.8
## - SP        1    0.3832  89.558 -1778.6
## - depth     1    0.5404  89.715 -1777.2
## - phi.N     1    2.0726  91.247 -1763.3
## - gamma     1    3.4838  92.658 -1750.7
## - density   1    3.6220  92.796 -1749.5
## - Facies    7    9.3567  98.531 -1712.4
## - phi.core  1   27.2273 116.402 -1563.9
## 
## Step:  AIC=-1780.12
## log10_k.core ~ depth + caliper + gamma + phi.N + R.deep + R.med + 
##     SP + density + phi.core + Facies
## 
##            Df Sum of Sq     RSS     AIC
## <none>                   89.392 -1780.1
## - R.deep    1    0.2869  89.679 -1779.5
## - depth     1    0.3332  89.726 -1779.1
## - SP        1    0.4296  89.822 -1778.2
## - R.med     1    0.5085  89.901 -1777.5
## - caliper   1    0.5746  89.967 -1776.9
## - phi.N     1    2.3337  91.726 -1761.0
## - gamma     1    3.8214  93.214 -1747.8
## - density   1    3.8626  93.255 -1747.5
## - Facies    7    9.2100  98.602 -1713.8
## - phi.core  1   27.0935 116.486 -1565.3
summary(model_6)
## 
## Call:
## lm(formula = log10_k.core ~ depth + caliper + gamma + phi.N + 
##     R.deep + R.med + SP + density + phi.core + Facies, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58182 -0.12001  0.03437  0.15230  0.70317 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.2671250  3.9316796  -0.322  0.74732    
## depth        0.0006562  0.0003796   1.729  0.08420 .  
## caliper     -0.5608681  0.2470292  -2.270  0.02344 *  
## gamma       -0.0091497  0.0015626  -5.855 6.94e-09 ***
## phi.N       -1.7463527  0.3816550  -4.576 5.50e-06 ***
## R.deep      -0.0026554  0.0016551  -1.604  0.10903    
## R.med        0.0049837  0.0023334   2.136  0.03300 *  
## SP          -0.0016140  0.0008221  -1.963  0.04996 *  
## density      1.7602255  0.2990153   5.887 5.79e-09 ***
## phi.core     0.0927539  0.0059493  15.591  < 2e-16 ***
## FaciesF10    0.0896953  0.0945929   0.948  0.34330    
## FaciesF2     0.0152576  0.1523676   0.100  0.92026    
## FaciesF3    -0.0292379  0.0869197  -0.336  0.73667    
## FaciesF5     0.1022238  0.0879087   1.163  0.24524    
## FaciesF7     0.2794793  0.1462763   1.911  0.05641 .  
## FaciesF8    -0.0932936  0.0927473  -1.006  0.31477    
## FaciesF9    -0.3877078  0.1030388  -3.763  0.00018 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3339 on 802 degrees of freedom
## Multiple R-squared:  0.6787, Adjusted R-squared:  0.6722 
## F-statistic: 105.9 on 16 and 802 DF,  p-value: < 2.2e-16
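log_k.predicted_6 <- predict(model_6, newdata = data)
k.predicted_6 <- 10^log_k.predicted_6  # back-transform to the original scale
plot(k.predicted_6, data$k.core, xlab = "Predicted k.core", ylab = "Measured k.core", main = "Model 6")
abline(0, 1, col = "red")  # 1:1 line (perfect prediction)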

15- Calculate RMSE for the Sixth Model

rmse_6<- RMSE(k.predicted_6,data$k.core )
rmse_6
## [1] 1330.932

16- Random Subsampling Cross-Validation on Log-Transformed Permeability

Description:

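The dataset is randomly split into 80% training rows and 20% testing rows. sample_frac(data, .80) draws 80% of the rows without replacement. anti_join() then keeps the rows of data that have no exact match in training, matching on all columns as the message below shows, so the two sets do not overlap. set.seed(12345) makes the split reproducible. Note that this is a single random split (a holdout). Random subsampling cross-validation proper repeats the split several times and averages the test error.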

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
set.seed(12345)
training<-sample_frac(data, .80)
testing<-anti_join(data,training)
## Joining with `by = join_by(depth, caliper, ind.deep, ind.med, gamma, phi.N,
## R.deep, R.med, SP, density.corr, density, phi.core, k.core, Facies,
## phi.core.fraq, log10_k.core)`
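anti_join() matches on every column. If data contained fully duplicated rows and one copy landed in training, every copy would be excluded from testing, so some rows would end up in neither set. Splitting on row indices avoids that. The base R sketch below uses mtcars in place of the Karpur data; `train_idx`, `training_alt` and `testing_alt` are hypothetical names. It keeps the same 80/20 ratio.

```r
# Index-based 80/20 split; mtcars stands in for the Karpur data
set.seed(12345)
df <- mtcars
train_idx <- sample(seq_len(nrow(df)), size = round(0.8 * nrow(df)))

training_alt <- df[train_idx, ]
testing_alt  <- df[-train_idx, ]

# Every row lands in exactly one of the two sets
stopifnot(nrow(training_alt) + nrow(testing_alt) == nrow(df))
```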

17- Building and Evaluating the Seventh Model

Description:

A multiple linear regression model (model_7) is fitted on the training set with log10_k.core as the response. k.core is excluded from the predictors because log10_k.core is computed directly from it, and including it would leak the response into the model. Predictions of log10 permeability (log_k.predicted_7) are made for the testing set and back-transformed to the original scale with 10^. A scatter plot compares predicted permeability (k.predicted_7) with measured permeability (testing$k.core).

model_7<- lm(log10_k.core~.-k.core,data=training)
summary(model_7)
## 
## Call:
## lm(formula = log10_k.core ~ . - k.core, data = training)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.57123 -0.11876  0.02389  0.14325  0.79914 
## 
## Coefficients: (1 not defined because of singularities)
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -2.4732574  5.2931806  -0.467  0.64048    
## depth          0.0007823  0.0005429   1.441  0.15013    
## caliper       -0.4661148  0.3064028  -1.521  0.12870    
## ind.deep      -0.0005275  0.0007083  -0.745  0.45671    
## ind.med        0.0004756  0.0007769   0.612  0.54066    
## gamma         -0.0084138  0.0018608  -4.522 7.32e-06 ***
## phi.N         -1.7476182  0.4515742  -3.870  0.00012 ***
## R.deep        -0.0026534  0.0020047  -1.324  0.18611    
## R.med          0.0046611  0.0028759   1.621  0.10557    
## SP            -0.0022459  0.0009607  -2.338  0.01972 *  
## density.corr   1.6498996  1.3916437   1.186  0.23623    
## density        1.5514223  0.3493920   4.440 1.06e-05 ***
## phi.core       0.0950263  0.0068555  13.861  < 2e-16 ***
## FaciesF10      0.0891010  0.1084215   0.822  0.41150    
## FaciesF2      -0.0045917  0.1652854  -0.028  0.97785    
## FaciesF3      -0.0705671  0.1022238  -0.690  0.49025    
## FaciesF5       0.1276889  0.1055910   1.209  0.22701    
## FaciesF7       0.3135309  0.1645276   1.906  0.05715 .  
## FaciesF8      -0.1040710  0.1209194  -0.861  0.38975    
## FaciesF9      -0.3722565  0.1340632  -2.777  0.00565 ** 
## phi.core.fraq         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3399 on 635 degrees of freedom
## Multiple R-squared:  0.6689, Adjusted R-squared:  0.659 
## F-statistic: 67.53 on 19 and 635 DF,  p-value: < 2.2e-16
log_k.predicted_7 <-predict(model_7,newdata=testing)
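k.predicted_7 <- 10^log_k.predicted_7  # back-transform to the original scale
plot(k.predicted_7, testing$k.core, xlab = "Predicted k.core", ylab = "Measured k.core", main = "Model 7 (testing set)")
abline(0, 1, col = "red")  # 1:1 line (perfect prediction)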

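18- Calculate RMSE for the Seventh Model

Description:

Unlike the RMSEs of models 1 to 6, which are computed on the data used for fitting, rmse_7 is computed on the held-out testing set. It therefore estimates out-of-sample error, which is usually larger than in-sample error, and should not be compared one-to-one with the earlier values.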

rmse_7<- RMSE(k.predicted_7,testing$k.core )
rmse_7
## [1] 1619.041
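The single split above gives one test RMSE, which depends on the seed. Repeated random subsampling repeats the split several times and averages the test error. The base R sketch below is not the original analysis. The function name `repeated_subsampling_rmse`, the 20 repetitions and the mtcars placeholder data are all assumptions. One caveat for the Karpur data: if a rare facies level is missing from a training subset, predict() will fail on testing rows that contain it.

```r
# Repeated random subsampling CV for a log10-response MLR (hypothetical helper)
repeated_subsampling_rmse <- function(df, formula, response, n_rep = 20,
                                      train_frac = 0.8, seed = 12345) {
  set.seed(seed)
  rmse <- numeric(n_rep)
  for (i in seq_len(n_rep)) {
    idx  <- sample(seq_len(nrow(df)), size = round(train_frac * nrow(df)))
    fit  <- lm(formula, data = df[idx, ])
    pred <- 10^predict(fit, newdata = df[-idx, ])   # back-transform to original scale
    obs  <- df[-idx, response]
    rmse[i] <- sqrt(mean((pred - obs)^2))            # test RMSE for this split
  }
  rmse
}

# mtcars stands in for the Karpur data; the formula is a placeholder
scores <- repeated_subsampling_rmse(mtcars, log10(mpg) ~ wt + hp, "mpg")
print(c(mean = mean(scores), sd = sd(scores)))
```

On the Karpur data the call would look like `repeated_subsampling_rmse(data, log10_k.core ~ . - k.core, "k.core")`. The spread (`sd`) of the scores shows how much a single-split RMSE such as rmse_7 can vary with the seed.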

19- Stepwise Selection and Evaluation for the Last Model

Description:

Backward stepwise elimination (AIC-based, as before) is applied to model_7 to drop predictors whose removal lowers the AIC. The reduced model (model_8) is used to predict permeability on the testing set. The predictions are back-transformed to the original scale and plotted against measured permeability.

model_8<-step(model_7, direction = "backward")
## Start:  AIC=-1394.04
## log10_k.core ~ (depth + caliper + ind.deep + ind.med + gamma + 
##     phi.N + R.deep + R.med + SP + density.corr + density + phi.core + 
##     k.core + Facies + phi.core.fraq) - k.core
## 
## 
## Step:  AIC=-1394.04
## log10_k.core ~ depth + caliper + ind.deep + ind.med + gamma + 
##     phi.N + R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies
## 
##                Df Sum of Sq    RSS     AIC
## - ind.med       1    0.0433 73.395 -1395.7
## - ind.deep      1    0.0641 73.416 -1395.5
## - density.corr  1    0.1624 73.514 -1394.6
## - R.deep        1    0.2024 73.554 -1394.2
## <none>                      73.352 -1394.0
## - depth         1    0.2398 73.592 -1393.9
## - caliper       1    0.2673 73.619 -1393.7
## - R.med         1    0.3034 73.655 -1393.3
## - SP            1    0.6312 73.983 -1390.4
## - phi.N         1    1.7301 75.082 -1380.8
## - density       1    2.2776 75.629 -1376.0
## - gamma         1    2.3617 75.714 -1375.3
## - Facies        7    6.4874 79.839 -1352.5
## - phi.core      1   22.1943 95.546 -1222.9
## 
## Step:  AIC=-1395.65
## log10_k.core ~ depth + caliper + ind.deep + gamma + phi.N + R.deep + 
##     R.med + SP + density.corr + density + phi.core + Facies
## 
##                Df Sum of Sq    RSS     AIC
## - ind.deep      1    0.0759 73.471 -1397.0
## - R.deep        1    0.1888 73.584 -1396.0
## - density.corr  1    0.1974 73.593 -1395.9
## <none>                      73.395 -1395.7
## - caliper       1    0.2469 73.642 -1395.5
## - R.med         1    0.2845 73.680 -1395.1
## - depth         1    0.3477 73.743 -1394.5
## - SP            1    0.6012 73.996 -1392.3
## - phi.N         1    1.7008 75.096 -1382.6
## - density       1    2.2803 75.676 -1377.6
## - gamma         1    2.3195 75.715 -1377.3
## - Facies        7    7.7978 81.193 -1343.5
## - phi.core      1   22.1551 95.550 -1224.9
## 
## Step:  AIC=-1396.97
## log10_k.core ~ depth + caliper + gamma + phi.N + R.deep + R.med + 
##     SP + density.corr + density + phi.core + Facies
## 
##                Df Sum of Sq    RSS     AIC
## - density.corr  1    0.1982 73.669 -1397.2
## - R.deep        1    0.2114 73.682 -1397.1
## <none>                      73.471 -1397.0
## - depth         1    0.2718 73.743 -1396.5
## - R.med         1    0.3711 73.842 -1395.7
## - caliper       1    0.4138 73.885 -1395.3
## - SP            1    0.6465 74.118 -1393.2
## - phi.N         1    1.8773 75.348 -1382.5
## - density       1    2.4061 75.877 -1377.9
## - gamma         1    2.4768 75.948 -1377.2
## - Facies        7    7.7419 81.213 -1345.3
## - phi.core      1   22.2233 95.694 -1225.9
## 
## Step:  AIC=-1397.21
## log10_k.core ~ depth + caliper + gamma + phi.N + R.deep + R.med + 
##     SP + density + phi.core + Facies
## 
##            Df Sum of Sq    RSS     AIC
## - R.deep    1    0.2189 73.888 -1397.3
## <none>                  73.669 -1397.2
## - depth     1    0.3370 74.006 -1396.2
## - caliper   1    0.3897 74.059 -1395.8
## - R.med     1    0.3941 74.063 -1395.7
## - SP        1    0.6070 74.276 -1393.8
## - phi.N     1    1.7504 75.420 -1383.8
## - gamma     1    2.3915 76.061 -1378.3
## - density   1    2.8893 76.559 -1374.0
## - Facies    7    7.9373 81.607 -1344.2
## - phi.core  1   22.0361 95.705 -1227.8
## 
## Step:  AIC=-1397.26
## log10_k.core ~ depth + caliper + gamma + phi.N + R.med + SP + 
##     density + phi.core + Facies
## 
##            Df Sum of Sq    RSS     AIC
## <none>                  73.888 -1397.3
## - depth     1    0.2491 74.137 -1397.1
## - R.med     1    0.3477 74.236 -1396.2
## - caliper   1    0.3903 74.278 -1395.8
## - SP        1    0.5090 74.397 -1394.8
## - phi.N     1    1.8514 75.740 -1383.0
## - gamma     1    2.2934 76.182 -1379.2
## - density   1    2.7627 76.651 -1375.2
## - Facies    7    7.8761 81.764 -1344.9
## - phi.core  1   22.6856 96.574 -1223.9
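The AIC column can be checked by hand. For an lm fit, step() scores each candidate with extractAIC(), which is n ln(RSS/n) + 2k, where k is the number of estimated coefficients. Here n = 655 training rows, taken from the 639 residual degrees of freedom plus 16 coefficients reported for model_8. The final model has k = 16 (intercept, 8 numeric logs, 7 Facies dummies). The starting model has k = 20 (intercept, 12 numeric logs, 7 Facies dummies; the aliased phi.core.fraq adds no coefficient).

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k, \qquad n = 655
\text{Start } (k = 20):\quad 655 \ln\!\left(\frac{73.352}{655}\right) + 2(20) \approx -1434.03 + 40 \approx -1394.0
\text{Final } (k = 16):\quad 655 \ln\!\left(\frac{73.888}{655}\right) + 2(16) \approx -1429.27 + 32 \approx -1397.3
```

Both values agree with the printed AIC (-1394.04 and -1397.26) up to the rounding of RSS in the output.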
summary(model_8)
## 
## Call:
## lm(formula = log10_k.core ~ depth + caliper + gamma + phi.N + 
##     R.med + SP + density + phi.core + Facies, data = training)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58792 -0.11365  0.02482  0.14757  0.76086 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.4288931  4.5051297  -0.317   0.7512    
## depth        0.0006448  0.0004394   1.468   0.1427    
## caliper     -0.5199205  0.2830059  -1.837   0.0667 .  
## gamma       -0.0081181  0.0018229  -4.453 9.97e-06 ***
## phi.N       -1.7577281  0.4392713  -4.001 7.03e-05 ***
## R.med        0.0015024  0.0008664   1.734   0.0834 .  
## SP          -0.0019650  0.0009366  -2.098   0.0363 *  
## density      1.6440250  0.3363417   4.888 1.29e-06 ***
## phi.core     0.0940016  0.0067112  14.007  < 2e-16 ***
## FaciesF10    0.1191848  0.1069107   1.115   0.2654    
## FaciesF2     0.0434273  0.1630734   0.266   0.7901    
## FaciesF3    -0.0576872  0.0999261  -0.577   0.5639    
## FaciesF5     0.1577376  0.0999557   1.578   0.1150    
## FaciesF7     0.3390997  0.1567647   2.163   0.0309 *  
## FaciesF8    -0.0573559  0.1056215  -0.543   0.5873    
## FaciesF9    -0.3543834  0.1193547  -2.969   0.0031 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.34 on 639 degrees of freedom
## Multiple R-squared:  0.6665, Adjusted R-squared:  0.6587 
## F-statistic: 85.14 on 15 and 639 DF,  p-value: < 2.2e-16
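The Adjusted R-squared that the objective asks for can be reproduced from the summary. With n = 655 observations (639 residual df + 16 coefficients) and p = 15 predictors (the numerator df of the F-statistic), the multiple R-squared of 0.6665 is penalised as follows:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.6665)\,\frac{654}{639}
          \approx 1 - 0.3335 \times 1.0235
          \approx 0.6587
```

This matches the reported value. Note that this Adjusted R-squared is measured on the training set and on the log10 scale, whereas the RMSE below is measured on the testing set in original permeability units.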
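log_k.predicted_8 <- predict(model_8, newdata = testing)  # predicted log10(k) on the testing set
k.predicted_8 <- 10^log_k.predicted_8                     # back-transform to permeability
plot(k.predicted_8, testing$k.core,                       # predicted vs. measured permeability
     xlab = "Predicted permeability", ylab = "Measured permeability (k.core)")
abline(0, 1, col = "red")                                 # 1:1 line = perfect prediction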

20- Calculating RMSE for the Reduced Model

Description:

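RMSE is recalculated for the reduced model (model_8) on the same testing set so that it can be compared directly with model_7. The reduced model gives an RMSE of 1669.368, compared with 1619.041 for model_7. Dropping predictors therefore made the model simpler but slightly increased the prediction error in the original permeability units.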

rmse_8<- RMSE(k.predicted_8,testing$k.core )
rmse_8
## [1] 1669.368
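For reference, the RMSE used above is the square root of the mean squared difference between predictions and measurements. The sketch below assumes that RMSE() comes from the caret package, whose RMSE(pred, obs) computes this quantity, and writes it in base R as a hypothetical helper rmse(). The toy vectors are made up for illustration and are not Karpur data.

```r
# Minimal sketch: RMSE in base R, assumed equivalent to caret::RMSE(pred, obs)
rmse <- function(pred, obs, na.rm = FALSE) {
  sqrt(mean((pred - obs)^2, na.rm = na.rm))
}

# Toy vectors (hypothetical values, not Karpur data)
obs  <- c(10, 100, 1000)
pred <- c(12, 90, 1100)

# Error in original units: dominated by the largest value (error of 100)
rmse(pred, obs)

# Because the models were fitted on log10(k), the same predictions can also be
# scored on the log scale, where high-permeability samples weigh less
rmse(log10(pred), log10(obs))
```

Because the errors are squared, a few high-permeability samples can dominate the RMSE in original units. This is one reason why RMSE values in the thousands (such as 1619.041 and 1669.368) should be read alongside the measured-versus-predicted plot.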