Importing the Data

Set the working directory and import the data

library(ggplot2)
setwd("C:/Users/n/Desktop/ResCharHomework")
data <- read.csv("karpur.csv", header = TRUE)
head(data)
##    depth caliper ind.deep ind.med  gamma phi.N R.deep  R.med      SP
## 1 5667.0   8.685  618.005 569.781 98.823 0.410  1.618  1.755 -56.587
## 2 5667.5   8.686  497.547 419.494 90.640 0.307  2.010  2.384 -61.916
## 3 5668.0   8.686  384.935 300.155 78.087 0.203  2.598  3.332 -55.861
## 4 5668.5   8.686  278.324 205.224 66.232 0.119  3.593  4.873 -41.860
## 5 5669.0   8.686  183.743 131.155 59.807 0.069  5.442  7.625 -34.934
## 6 5669.5   8.686  109.512  75.633 57.109 0.048  9.131 13.222 -39.769
##   density.corr density phi.core   k.core Facies
## 1       -0.033   2.205  33.9000 2442.590     F1
## 2       -0.067   2.040  33.4131 3006.989     F1
## 3       -0.064   1.888  33.1000 3370.000     F1
## 4       -0.053   1.794  34.9000 2270.000     F1
## 5       -0.054   1.758  35.0644 2530.758     F1
## 6       -0.058   1.759  35.3152 2928.314     F1
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}
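
As a quick sanity check of the helper above, the sketch below applies rmse() to three made-up values (they are not taken from the karpur data). Only the last prediction is off, by 2, so the result should equal sqrt(4/3).

```r
# RMSE helper, repeated here so the example runs on its own
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy values for illustration only (not from karpur.csv)
actual    <- c(1, 2, 3)
predicted <- c(1, 2, 5)

toy_rmse <- rmse(actual, predicted)
print(toy_rmse)  # sqrt(4/3), about 1.1547
```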

Multiple Linear Regression Models

A regression model describes the relationship between variables by fitting a line to the observed data. It lets us estimate how a dependent variable changes as the independent variable(s) change.
In this section, multiple linear regression (MLR) models are developed under several scenarios to predict core permeability from the karpur dataset.
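
Written out, a multiple linear regression for permeability has the general form below. The symbols are introduced here for illustration only and are not used elsewhere in this report: x_1 to x_p are the predictors (the log and core measurements), the beta terms are the fitted coefficients, and epsilon is the error term. lm() estimates the coefficients by least squares, which is the second expression.

```latex
k_{\mathrm{core}} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}
  \left( k_{\mathrm{core},i} - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
```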

1. MLR using all CPI data without Facies

In this section, Facies was not included with the independent features.

model.1 <- lm(k.core ~ . -Facies  , data= data)

Compute Adjusted R-square and RMSE values

model.1_summary <- summary(model.1)
prediction_model.1<- predict(model.1)
rmse_model.1 <-  rmse(data$k.core,prediction_model.1)
cat("Adjusted R-Square value:", model.1_summary$adj.r.squared, "| RMSE:", rmse_model.1, "\n")
## Adjusted R-Square value: 0.5841845 | RMSE: 1430.118
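
The adjusted R-squared printed above can be reproduced by hand from the plain R-squared. The sketch below is only a check of the arithmetic, not a replacement for summary(). It assumes the R-squared of 0.5903 quoted in this report for model.1, n = 819 observations (the training and testing set sizes used later, 614 + 205), and p = 12 predictors (every column except k.core and Facies).

```r
r2 <- 0.5903   # R-squared quoted in this report for model.1 (rounded)
n  <- 819      # number of rows, assumed from 614 + 205
p  <- 12       # predictors in model.1 (all columns except k.core and Facies)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))  # should land close to the 0.5842 reported above
```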

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = prediction_model.1,  # Create data for ggplot2
                       Observed = data$k.core)
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)


The R-squared value of 0.5903 and the adjusted R-squared value of 0.5842 are too low to be acceptable. Some features in the dataset do not have a strong relationship with the predicted value, so removing them from the set of independent variables may raise the adjusted R-squared. This is done using stepwise regression.
Stepwise regression is a procedure for building a regression model from a set of predictor variables by entering and removing predictors one step at a time until there is no statistically valid reason to enter or remove any more.
There are three types of stepwise regression:

  1. Forward Selection: Starts with no predictors and adds them one by one.
  2. Backward Elimination: Starts with all predictors and removes them one by one.
  3. Stepwise Selection: Combines forward selection and backward elimination, adding or removing predictors as needed. This document uses stepwise selection.
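
The three types map onto the direction argument of step() in base R. The sketch below uses the built-in mtcars dataset and an arbitrary set of predictors purely for illustration; it is not the karpur model, and the object names are hypothetical.

```r
# Illustration on the built-in mtcars data (not the karpur dataset)
full_fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # all candidate predictors
null_fit <- lm(mpg ~ 1, data = mtcars)                      # intercept only

# 1. Forward selection: start from the null model, add terms up to the full formula
forward_fit <- step(null_fit, scope = formula(full_fit),
                    direction = "forward", trace = 0)

# 2. Backward elimination: start from the full model and only drop terms
backward_fit <- step(full_fit, direction = "backward", trace = 0)

# 3. Stepwise selection: start from the full model, drop or re-add terms
both_fit <- step(full_fit, direction = "both", trace = 0)

# Compare the terms each search kept
print(formula(forward_fit))
print(formula(backward_fit))
print(formula(both_fit))
```

Setting trace = 0 suppresses the step-by-step table; leaving it at the default prints a trace like the ones shown in this report.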

Stepwise Regression

reduced_model.1 <- step(model.1, direction = "both")  # "both" selects stepwise selection (type 3 above)
## Start:  AIC=11926.91
## k.core ~ (depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies) - Facies
## 
##                Df Sum of Sq        RSS   AIC
## - density.corr  1     19799 1675068713 11925
## - phi.N         1   3906205 1678955118 11927
## <none>                      1675048914 11927
## - SP            1  13394190 1688443104 11931
## - R.deep        1  28897686 1703946599 11939
## - caliper       1  29214826 1704263740 11939
## - depth         1  54372650 1729421563 11951
## - ind.deep      1  76022788 1751071701 11961
## - R.med         1  86603706 1761652619 11966
## - ind.med       1  98823752 1773872666 11972
## - density       1 106221406 1781270319 11975
## - phi.core      1 123125117 1798174031 11983
## - gamma         1 416312526 2091361440 12107
## 
## Step:  AIC=11924.92
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density + phi.core
## 
##                Df Sum of Sq        RSS   AIC
## <none>                      1675068713 11925
## - phi.N         1   4564880 1679633593 11925
## + density.corr  1     19799 1675048914 11927
## - SP            1  13491079 1688559792 11930
## - R.deep        1  28896144 1703964857 11937
## - caliper       1  29253869 1704322581 11937
## - depth         1  54825159 1729893872 11949
## - ind.deep      1  77573926 1752642639 11960
## - R.med         1  86772220 1761840933 11964
## - ind.med       1 100740701 1775809413 11971
## - density       1 114209586 1789278299 11977
## - phi.core      1 124694278 1799762991 11982
## - gamma         1 417015194 2092083907 12105


Stepwise regression removed density.corr because dropping it gives the lowest AIC (11924.92, down from 11926.91). No further removal or addition lowers the AIC, so the procedure stops after one step.
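
For a linear model, the AIC values in the trace above follow n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients; this is the quantity extractAIC() computes for lm fits. The sketch below redoes that arithmetic with the RSS values from the trace. It assumes n = 819 rows (614 + 205 from the later split) and 13 coefficients in the starting model (12 predictors plus the intercept); the helper name step_aic is hypothetical.

```r
n <- 819  # number of observations, assumed from 614 + 205

# AIC as used by step() for lm fits: n * log(RSS / n) + 2 * edf
step_aic <- function(rss, edf, n) n * log(rss / n) + 2 * edf

aic_start   <- step_aic(rss = 1675048914, edf = 13, n = n)  # all 12 predictors
aic_reduced <- step_aic(rss = 1675068713, edf = 12, n = n)  # density.corr dropped

# Should land close to the 11926.91 and 11924.92 shown in the trace
print(round(c(start = aic_start, reduced = aic_reduced), 2))
```

The RSS barely changes when density.corr is dropped, so the saving of one coefficient (2 AIC units) outweighs the loss of fit.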

Compute Adjusted R-square and RMSE values

reduced_model.1_summary <- summary(reduced_model.1)
prediction_reduced_model.1 <- predict(reduced_model.1)  # predict from the reduced model, not model.1
rmse_reduced_model.1 <- rmse(data$k.core,prediction_reduced_model.1)
cat("Adjusted R-Square value:", reduced_model.1_summary$adj.r.squared, "| RMSE:",rmse_reduced_model.1, "\n")
## Adjusted R-Square value: 0.5846949 | RMSE: 1430.126

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = prediction_reduced_model.1,  # Create data for ggplot2
                       Observed = data$k.core)
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)

2. MLR using all CPI data with Facies


Permeability depends strongly on facies. Plotting permeability against porosity, colored by facies, shows the points grouping into cloud-like clusters.
These clusters show how the facies distribution affects permeability, so adding Facies to the model is expected to raise the R-squared value.

# Create a scatterplot of core permeability versus core porosity, colored by facies
ggplot(data, aes(x = phi.core, y = k.core, color = Facies)) +
  geom_point() +
  theme_minimal() +
  labs(title = "Core Permeability versus Core Porosity", 
       x = "Core Porosity", 
       y = "Core Permeability")

model.2 <- lm(k.core ~. , data= data)

Compute Adjusted R-square and RMSE values

model.2_summary <- summary(model.2)
prediction_model.2= predict(model.2)
rmse_model.2 <- rmse(data$k.core,prediction_model.2)
cat("Adjusted R-Square value:", model.2_summary$adj.r.squared, "| RMSE:", rmse_model.2, "\n")
## Adjusted R-Square value: 0.6814911 | RMSE: 1246.201


Adding Facies as a feature raises the adjusted R-squared from 0.584 to 0.681 and lowers the RMSE from 1430.118 to 1246.201.
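
lm() handles a categorical predictor such as Facies by expanding it into indicator (dummy) columns, one per level except a reference level; this is why the step() traces in this report list Facies with 7 degrees of freedom. The sketch below shows the encoding on a made-up table with three facies; the values and the names toy and toy_fit are hypothetical.

```r
# Made-up data for illustration only (not from karpur.csv)
toy <- data.frame(
  k      = c(10, 12, 50, 55, 200, 210),
  Facies = factor(c("F1", "F1", "F2", "F2", "F3", "F3"))
)

# Design matrix that lm() builds: intercept plus one indicator per non-reference level
X <- model.matrix(k ~ Facies, data = toy)
print(colnames(X))

# Each Facies coefficient is the offset of that facies mean from the reference (F1) mean
toy_fit <- lm(k ~ Facies, data = toy)
print(coef(toy_fit))
```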

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = prediction_model.2,  # Create data for ggplot2
                       Observed = data$k.core)
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)

Stepwise Regression

reduced_model.2 <- step(model.2, direction="both") 
## Start:  AIC=11715.43
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N + 
##     R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies
## 
##                Df Sum of Sq        RSS   AIC
## - ind.deep      1     16793 1271937992 11713
## - ind.med       1    356746 1272277945 11714
## - density.corr  1    453661 1272374861 11714
## - phi.N         1   2953609 1274874809 11715
## - caliper       1   3063007 1274984206 11715
## <none>                      1271921199 11715
## - density       1   6217927 1278139127 11717
## - SP            1   8171834 1280093033 11719
## - R.deep        1  22117394 1294038593 11728
## - depth         1  36466976 1308388176 11737
## - R.med         1  61690461 1333611660 11752
## - gamma         1  92579723 1364500923 11771
## - phi.core      1 112793101 1384714301 11783
## - Facies        7 403127714 1675048914 11927
## 
## Step:  AIC=11713.44
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep + 
##     R.med + SP + density.corr + density + phi.core + Facies
## 
##                Df Sum of Sq        RSS   AIC
## - density.corr  1    437546 1272375538 11712
## - phi.N         1   2938766 1274876758 11713
## - caliper       1   3074396 1275012389 11713
## <none>                      1271937992 11713
## + ind.deep      1     16793 1271921199 11715
## - density       1   6228928 1278166920 11715
## - ind.med       1   6905855 1278843848 11716
## - SP            1   8191802 1280129794 11717
## - R.deep        1  22125695 1294063687 11726
## - depth         1  39139470 1311077462 11736
## - R.med         1  61773953 1333711946 11750
## - gamma         1  92865220 1364803212 11769
## - phi.core      1 112960440 1384898432 11781
## - Facies        7 479133709 1751071701 11961
## 
## Step:  AIC=11711.72
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep + 
##     R.med + SP + density + phi.core + Facies
## 
##                Df Sum of Sq        RSS   AIC
## - caliper       1   2980713 1275356252 11712
## <none>                      1272375538 11712
## - phi.N         1   3279032 1275654571 11712
## + density.corr  1    437546 1271937992 11713
## - density       1   5792837 1278168375 11713
## + ind.deep      1       677 1272374861 11714
## - ind.med       1   6813959 1279189497 11714
## - SP            1   8391302 1280766840 11715
## - R.deep        1  22009402 1294384940 11724
## - depth         1  38705776 1311081314 11734
## - R.med         1  61436819 1333812357 11748
## - gamma         1  93974329 1366349868 11768
## - phi.core      1 115336515 1387712053 11781
## - Facies        7 480267100 1752642639 11960
## 
## Step:  AIC=11711.64
## k.core ~ depth + ind.med + gamma + phi.N + R.deep + R.med + SP + 
##     density + phi.core + Facies
## 
##                Df Sum of Sq        RSS   AIC
## - phi.N         1   2534906 1277891157 11711
## <none>                      1275356252 11712
## + caliper       1   2980713 1272375538 11712
## + density.corr  1    343863 1275012389 11713
## + ind.deep      1      5597 1275350654 11714
## - density       1   7270311 1282626562 11714
## - SP            1   8733336 1284089587 11715
## - ind.med       1  12924050 1288280301 11718
## - R.deep        1  22449117 1297805369 11724
## - depth         1  51507476 1326863728 11742
## - R.med         1  60137982 1335494234 11747
## - phi.core      1 112564835 1387921086 11779
## - gamma         1 141535555 1416891807 11796
## - Facies        7 520094756 1795451008 11978
## 
## Step:  AIC=11711.26
## k.core ~ depth + ind.med + gamma + R.deep + R.med + SP + density + 
##     phi.core + Facies
## 
##                Df Sum of Sq        RSS   AIC
## <none>                      1277891157 11711
## + phi.N         1   2534906 1275356252 11712
## + caliper       1   2236587 1275654571 11712
## - density       1   5155969 1283047127 11713
## + density.corr  1    624807 1277266351 11713
## + ind.deep      1      1762 1277889395 11713
## - SP            1   8515796 1286406953 11715
## - ind.med       1  10944937 1288836095 11716
## - R.deep        1  23273312 1301164469 11724
## - depth         1  49725248 1327616405 11740
## - R.med         1  59454645 1337345802 11746
## - phi.core      1 110154394 1388045551 11777
## - gamma         1 219059092 1496950249 11839
## - Facies        7 526383446 1804274603 11980


The following variables were removed:
1. ind.deep: removed in the first step.
2. density.corr: removed in the second step.
3. caliper: removed in the third step.
4. phi.N: removed in the fourth step.
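
Rather than reading the removed terms off the trace by eye, they can be recovered from the fitted objects: compare the term labels of the full and reduced models, or inspect the anova component that step() attaches to its result. The sketch below uses mtcars so that it runs on its own; with the objects in this report, the same calls would take model.2 and reduced_model.2.

```r
# Illustration on built-in data (not the karpur dataset)
full_fit    <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)
reduced_fit <- step(full_fit, direction = "both", trace = 0)

# Terms present in the full model but absent from the reduced one
dropped <- setdiff(attr(terms(full_fit), "term.labels"),
                   attr(terms(reduced_fit), "term.labels"))
print(dropped)

# Step-by-step record of what was removed (or re-added) and the AIC after each step
print(reduced_fit$anova)
```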


The adjusted R-squared changes only slightly, from 0.6814911 to 0.6815901.

Compute Adjusted R-square and RMSE values

reduced_model.2_summary <- summary(reduced_model.2)
prediction_reduced_model.2= predict(reduced_model.2)
rmse_reduced_model.2 <-rmse(data$k.core,prediction_reduced_model.2)
cat("Adjusted R-Square value:", reduced_model.2_summary$adj.r.squared, "| RMSE:", rmse_reduced_model.2, "\n")
## Adjusted R-Square value: 0.6815901 | RMSE: 1249.122

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = prediction_reduced_model.2,  # Create data for ggplot2
                       Observed = data$k.core)
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)

3. MLR of Log10(Permeability) using all CPI data


A log10 transformation is one way to bring a skewed variable closer to a normal distribution. The normal distribution is a symmetric, bell-shaped probability distribution defined by its mean and standard deviation: most observations fall near the mean, and they become less frequent as the distance from the mean increases. It is treated as the ‘standard’ distribution in statistics because it arises so often in the natural and social sciences.
Three transformations are commonly used to bring data closer to a normal distribution:
1. Log Transformation.
2. Box-Cox Transformation.
3. Normal Score Transformation.
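
To make the first and third options concrete, the sketch below applies a log10 transformation and a rank-based normal score transformation to a synthetic right-skewed sample and compares the skewness before and after. The sample is simulated (it is not the karpur data), the skewness helper is hypothetical, and the Box-Cox transformation is left out to keep the example in base R.

```r
set.seed(1)
# Synthetic right-skewed sample standing in for permeability (not karpur data)
k <- rlnorm(500, meanlog = 5, sdlog = 1.5)

log_k  <- log10(k)                             # 1. log transformation
nscore <- qnorm((rank(k) - 0.5) / length(k))   # 3. normal score transformation

# Sample skewness: near 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skews <- c(raw = skewness(k), log10 = skewness(log_k), nscore = skewness(nscore))
print(round(skews, 3))
```

The normal score transformation maps ranks to standard normal quantiles, so its result is symmetric by construction, whereas the log transformation only removes the skew if the data are close to log-normal.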

The distribution of core permeability before any transformation:

# Create a histogram
hist((data$k.core), 
     probability = FALSE,  # Plot frequency (counts) rather than density
     main = "Histogram with Frequency Distribution", 
     xlab = "Data", 
     ylab = "Frequency", 
     col = "lightblue", 
     border = "black")


After applying a log10 transformation to permeability, the distribution moves closer to a normal distribution:

hist(log10(data$k.core), 
     probability = FALSE, 
     main = "Frequency Distribution", 
     xlab = "log10(Permeability)", 
     ylab = "Frequency", 
     col = "lightblue", 
     border = "black")

log10data <-data
log10data$log10_Permeability <-log10(log10data$k.core)
log10data$k.core <- NULL
model.3<- lm(log10_Permeability~. ,data=log10data)

Compute Adjusted R-square and RMSE values

model.3_summary <- summary(model.3)
prediction_model.3=10^(predict(model.3))
rmse_model.3 <-rmse(10^(log10data$log10_Permeability),prediction_model.3)
cat("Adjusted R-Square value:", model.3_summary$adj.r.squared, "| RMSE:", rmse_model.3, "\n")
## Adjusted R-Square value: 0.6729783 | RMSE: 1333.017
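
Because model.3 is fitted to log10 permeability, predict() returns log10 values, and they have to be raised to the power of 10 before being compared with measured permeability. The sketch below uses synthetic data (the names x, k and toy_fit are hypothetical) to show how large the RMSE becomes if that step is skipped.

```r
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

set.seed(42)
# Synthetic data: log10(k) is linear in x plus noise (not karpur data)
x <- runif(200)
k <- 10^(2 + 1.5 * x + rnorm(200, sd = 0.1))

toy_fit  <- lm(log10(k) ~ x)
log_pred <- predict(toy_fit)         # predictions on the log10 scale

rmse_right <- rmse(k, 10^log_pred)   # both arguments in permeability units
rmse_wrong <- rmse(k, log_pred)      # mixes raw k with log10 predictions

print(c(back_transformed = rmse_right, not_back_transformed = rmse_wrong))
```

An RMSE that is far larger than those of comparable models is a useful warning sign that the two arguments are on different scales.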

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = (prediction_model.3),  # Create data for ggplot2
                       Observed = 10^(log10data$log10_Permeability))
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)

Stepwise Regression

reduced_model.3 <- step(model.3, direction="both") 
## Start:  AIC=-1779.02
## log10_Permeability ~ depth + caliper + ind.deep + ind.med + gamma + 
##     phi.N + R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies
## 
##                Df Sum of Sq     RSS     AIC
## - ind.med       1    0.1213  88.981 -1779.9
## - density.corr  1    0.1440  89.004 -1779.7
## - ind.deep      1    0.1816  89.042 -1779.3
## <none>                       88.860 -1779.0
## - R.deep        1    0.2696  89.130 -1778.5
## - depth         1    0.2754  89.135 -1778.5
## - caliper       1    0.3253  89.185 -1778.0
## - R.med         1    0.3763  89.236 -1777.6
## - SP            1    0.4617  89.322 -1776.8
## - phi.N         1    2.2710  91.131 -1760.3
## - density       1    3.0160  91.876 -1753.7
## - gamma         1    3.6713  92.531 -1747.9
## - Facies        7    7.0758  95.936 -1730.3
## - phi.core      1   27.4982 116.358 -1560.2
## 
## Step:  AIC=-1779.9
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N + 
##     R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies
## 
##                Df Sum of Sq     RSS     AIC
## - density.corr  1    0.1931  89.174 -1780.1
## <none>                       88.981 -1779.9
## - ind.deep      1    0.2179  89.199 -1779.9
## - R.deep        1    0.2447  89.226 -1779.7
## - caliper       1    0.2921  89.273 -1779.2
## + ind.med       1    0.1213  88.860 -1779.0
## - R.med         1    0.3397  89.321 -1778.8
## - SP            1    0.4101  89.391 -1778.1
## - depth         1    0.4622  89.444 -1777.7
## - phi.N         1    2.2035  91.185 -1761.9
## - density       1    3.0113  91.993 -1754.6
## - gamma         1    3.5761  92.557 -1749.6
## - Facies        7    9.1242  98.106 -1714.0
## - phi.core      1   27.4190 116.400 -1561.9
## 
## Step:  AIC=-1780.12
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N + 
##     R.deep + R.med + SP + density + phi.core + Facies
## 
##                Df Sum of Sq     RSS     AIC
## - ind.deep      1    0.2180  89.392 -1780.1
## <none>                       89.174 -1780.1
## + density.corr  1    0.1931  88.981 -1779.9
## - R.deep        1    0.2526  89.427 -1779.8
## + ind.med       1    0.1705  89.004 -1779.7
## - caliper       1    0.2676  89.442 -1779.7
## - R.med         1    0.3598  89.534 -1778.8
## - SP            1    0.3832  89.558 -1778.6
## - depth         1    0.5404  89.715 -1777.2
## - phi.N         1    2.0726  91.247 -1763.3
## - gamma         1    3.4838  92.658 -1750.7
## - density       1    3.6220  92.796 -1749.5
## - Facies        7    9.3567  98.531 -1712.4
## - phi.core      1   27.2273 116.402 -1563.9
## 
## Step:  AIC=-1780.12
## log10_Permeability ~ depth + caliper + gamma + phi.N + R.deep + 
##     R.med + SP + density + phi.core + Facies
## 
##                Df Sum of Sq     RSS     AIC
## <none>                       89.392 -1780.1
## + ind.deep      1    0.2180  89.174 -1780.1
## + density.corr  1    0.1932  89.199 -1779.9
## - R.deep        1    0.2869  89.679 -1779.5
## + ind.med       1    0.1478  89.245 -1779.5
## - depth         1    0.3332  89.726 -1779.1
## - SP            1    0.4296  89.822 -1778.2
## - R.med         1    0.5085  89.901 -1777.5
## - caliper       1    0.5746  89.967 -1776.9
## - phi.N         1    2.3337  91.726 -1761.0
## - gamma         1    3.8214  93.214 -1747.8
## - density       1    3.8626  93.255 -1747.5
## - Facies        7    9.2100  98.602 -1713.8
## - phi.core      1   27.0935 116.486 -1565.3

Compute Adjusted R-square and RMSE values

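reduced_model.3_summary <- summary(reduced_model.3)
prediction_reduced_model.3 <- predict(reduced_model.3)  # predictions on the log10 scale
# Back-transform to permeability units before computing the RMSE
rmse_reduced_model.3 <- rmse(data$k.core, 10^(prediction_reduced_model.3))
cat("Adjusted R-Square value:", reduced_model.3_summary$adj.r.squared, "| RMSE:", rmse_reduced_model.3, "\n")
## Adjusted R-Square value: 0.6722495 | RMSE: 3169.763

The RMSE of 3169.763 recorded here was computed from log10-scale predictions against raw permeability, so it is not on the same scale as the other models. With the back-transformation applied as above, the chunk has to be rerun to obtain a comparable value.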

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = 10^(prediction_reduced_model.3),  # Create data for ggplot2
                       Observed = 10^(log10data$log10_Permeability))
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)

4. MLR using all CPI data with log10 transformation while applying Cross Validation


Cross-validation is used to train a model whose permeability predictions can be used in actual reservoir characterization while avoiding overfitting.
There are three types of cross-validation:
1. Leave-one-out.
2. k-fold cross-validation.
3. Random subsampling: this type is used in this study.

Random subsampling splits the data into training and testing subsets so that the model is evaluated on data it was not fitted to. In this study, 75% of the data is used as the training set and 25% as the testing set.
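
For comparison, the second option in the list, k-fold cross-validation, can be sketched in a few lines of base R. It is not used in this study; the example below runs on the built-in mtcars data with an arbitrary two-predictor model, and the names k_folds, fold_id and fold_rmse are hypothetical.

```r
set.seed(123)
k_folds <- 5

# Assign every row to one of k folds at random
fold_id <- sample(rep(seq_len(k_folds), length.out = nrow(mtcars)))

# Fit on k - 1 folds, score on the held-out fold, repeat for each fold
fold_rmse <- sapply(seq_len(k_folds), function(i) {
  train <- mtcars[fold_id != i, ]
  test  <- mtcars[fold_id == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

print(fold_rmse)        # one out-of-sample RMSE per fold
print(mean(fold_rmse))  # cross-validated RMSE
```

Unlike a single random split, every observation is used for testing exactly once, which makes the error estimate less dependent on one particular draw.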

set.seed(123)

train_index <- sample(seq_len(nrow(log10data)), size = floor(0.75 * nrow(log10data)))  # floor() makes the sample size an explicit integer

training_set <- log10data[train_index, ]
testing_set <- log10data[-train_index, ]

cat("Training set size:", nrow(training_set), "| Testing set size:", nrow(testing_set), "\n")
## Training set size: 614 | Testing set size: 205
model.4<- lm(log10_Permeability~. ,data=training_set)


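Predictions from this model are evaluated on the testing set. The adjusted R-squared below comes from the fit to the training set, while the RMSE is computed on the testing set.

Compute Adjusted R-square and RMSE values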

model.4_summary <- summary(model.4)
prediction_model.4<-10^(predict(model.4, newdata=testing_set))
rmse_model.4 <-rmse(10^(testing_set$log10_Permeability),prediction_model.4)
cat("Adjusted R-Square value:", model.4_summary$adj.r.squared, "| RMSE:", rmse_model.4, "\n")
## Adjusted R-Square value: 0.6848339 | RMSE: 1487.062

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = (prediction_model.4),  # Create data for ggplot2
                       Observed = 10^(testing_set$log10_Permeability))
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)

Stepwise Regression


reduced_model.4 <- step(model.4, direction="both") 
## Start:  AIC=-1379.62
## log10_Permeability ~ depth + caliper + ind.deep + ind.med + gamma + 
##     phi.N + R.deep + R.med + SP + density.corr + density + phi.core + 
##     Facies
## 
##                Df Sum of Sq    RSS     AIC
## - density.corr  1    0.0060 60.826 -1381.6
## - ind.med       1    0.1198 60.940 -1380.4
## <none>                      60.820 -1379.6
## - SP            1    0.1988 61.019 -1379.6
## - ind.deep      1    0.2014 61.021 -1379.6
## - R.deep        1    0.2115 61.031 -1379.5
## - depth         1    0.2471 61.067 -1379.1
## - R.med         1    0.2689 61.089 -1378.9
## - caliper       1    0.2991 61.119 -1378.6
## - phi.N         1    0.7967 61.617 -1373.6
## - density       1    1.4060 62.226 -1367.6
## - gamma         1    3.4375 64.257 -1347.9
## - Facies        7    4.8399 65.660 -1346.6
## - phi.core      1   17.4926 78.312 -1226.4
## 
## Step:  AIC=-1381.56
## log10_Permeability ~ depth + caliper + ind.deep + ind.med + gamma + 
##     phi.N + R.deep + R.med + SP + density + phi.core + Facies
## 
##                Df Sum of Sq    RSS     AIC
## - ind.med       1    0.1290 60.955 -1382.2
## - SP            1    0.1963 61.022 -1381.6
## <none>                      60.826 -1381.6
## - R.deep        1    0.2141 61.040 -1381.4
## - ind.deep      1    0.2142 61.040 -1381.4
## - depth         1    0.2520 61.078 -1381.0
## - R.med         1    0.2730 61.099 -1380.8
## - caliper       1    0.2950 61.121 -1380.6
## + density.corr  1    0.0060 60.820 -1379.6
## - phi.N         1    0.7939 61.620 -1375.6
## - density       1    1.5931 62.419 -1367.7
## - gamma         1    3.4363 64.262 -1349.8
## - Facies        7    4.8624 65.688 -1348.3
## - phi.core      1   17.4926 78.318 -1228.4
## 
## Step:  AIC=-1382.25
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N + 
##     R.deep + R.med + SP + density + phi.core + Facies
## 
##                Df Sum of Sq    RSS     AIC
## - SP            1    0.1477 61.103 -1382.8
## - R.deep        1    0.1821 61.137 -1382.4
## <none>                      60.955 -1382.2
## - R.med         1    0.2320 61.187 -1381.9
## - caliper       1    0.2620 61.217 -1381.6
## + ind.med       1    0.1290 60.826 -1381.6
## - ind.deep      1    0.3818 61.337 -1380.4
## + density.corr  1    0.0153 60.940 -1380.4
## - depth         1    0.4391 61.394 -1379.8
## - phi.N         1    0.7385 61.693 -1376.9
## - density       1    1.6335 62.588 -1368.0
## - gamma         1    3.3311 64.286 -1351.6
## - Facies        7    6.3071 67.262 -1335.8
## - phi.core      1   17.4590 78.414 -1229.6
## 
## Step:  AIC=-1382.77
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N + 
##     R.deep + R.med + density + phi.core + Facies
## 
##                Df Sum of Sq    RSS     AIC
## - R.deep        1    0.1305 61.233 -1383.5
## - R.med         1    0.1896 61.292 -1382.9
## <none>                      61.103 -1382.8
## - caliper       1    0.2473 61.350 -1382.3
## + SP            1    0.1477 60.955 -1382.2
## + ind.med       1    0.0805 61.022 -1381.6
## + density.corr  1    0.0096 61.093 -1380.9
## - ind.deep      1    0.4221 61.525 -1380.5
## - depth         1    0.4537 61.556 -1380.2
## - phi.N         1    0.7397 61.842 -1377.4
## - density       1    1.5831 62.686 -1369.1
## - gamma         1    3.2995 64.402 -1352.5
## - Facies        7    6.2443 67.347 -1337.0
## - phi.core      1   17.4779 78.580 -1230.3
## 
## Step:  AIC=-1383.46
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N + 
##     R.med + density + phi.core + Facies
## 
##                Df Sum of Sq    RSS     AIC
## - R.med         1    0.0732 61.306 -1384.7
## <none>                      61.233 -1383.5
## - caliper       1    0.2401 61.473 -1383.1
## + R.deep        1    0.1305 61.103 -1382.8
## + SP            1    0.0961 61.137 -1382.4
## + ind.med       1    0.0658 61.167 -1382.1
## + density.corr  1    0.0120 61.221 -1381.6
## - depth         1    0.4014 61.635 -1381.5
## - ind.deep      1    0.4498 61.683 -1381.0
## - phi.N         1    0.7811 62.014 -1377.7
## - density       1    1.5011 62.734 -1370.6
## - gamma         1    3.2068 64.440 -1354.1
## - Facies        7    6.1755 67.409 -1338.5
## - phi.core      1   17.9168 79.150 -1227.9
## 
## Step:  AIC=-1384.72
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N + 
##     density + phi.core + Facies
## 
##                Df Sum of Sq    RSS     AIC
## <none>                      61.306 -1384.7
## + SP            1    0.1160 61.190 -1383.9
## - caliper       1    0.3214 61.628 -1383.5
## + R.med         1    0.0732 61.233 -1383.5
## + ind.med       1    0.0539 61.252 -1383.3
## - depth         1    0.3480 61.654 -1383.2
## + R.deep        1    0.0141 61.292 -1382.9
## + density.corr  1    0.0133 61.293 -1382.9
## - phi.N         1    0.7107 62.017 -1379.7
## - ind.deep      1    0.7651 62.071 -1379.1
## - density       1    1.4952 62.802 -1371.9
## - gamma         1    3.9412 65.248 -1348.5
## - Facies        7    6.1780 67.484 -1339.8
## - phi.core      1   18.2530 79.559 -1226.7

Compute Adjusted R-square and RMSE values

reduced_model.4_summary <- summary(reduced_model.4)
prediction_reduced_model.4=10^(predict(reduced_model.4,newdata= testing_set))
rmse_reduced_model.4 <- rmse(10^(testing_set$log10_Permeability),prediction_reduced_model.4)
cat("Adjusted R-Square value:", reduced_model.4_summary$adj.r.squared, "| RMSE:", rmse_reduced_model.4, "\n")
## Adjusted R-Square value: 0.6849646 | RMSE: 1515.47

Plotting measured values versus predicted values

data_mod <- data.frame(Predicted = (prediction_reduced_model.4),  # Create data for ggplot2
                       Observed = 10^(testing_set$log10_Permeability))
ggplot(data_mod,                                     # Draw plot using ggplot2 package
       aes(x = Predicted,
           y = Observed)) +
  geom_point() +
  geom_abline(intercept = 0,
              slope = 1,
              color = "red",
              linewidth = 2)

Conclusion


model_comparison <- data.frame(
  Model = paste("Model", 1:8),
  Adjusted_R2 = c(model.1_summary$adj.r.squared,          # Model 1: all CPI data, no Facies
                  reduced_model.1_summary$adj.r.squared,  # Model 2: Model 1 after stepwise
                  model.2_summary$adj.r.squared,          # Model 3: all CPI data with Facies
                  reduced_model.2_summary$adj.r.squared,  # Model 4: Model 3 after stepwise
                  model.3_summary$adj.r.squared,          # Model 5: log10(permeability)
                  reduced_model.3_summary$adj.r.squared,  # Model 6: Model 5 after stepwise
                  model.4_summary$adj.r.squared,          # Model 7: log10, 75/25 split
                  reduced_model.4_summary$adj.r.squared), # Model 8: Model 7 after stepwise
  RMSE = c(rmse_model.1,
           rmse_reduced_model.1,
           rmse_model.2,
           rmse_reduced_model.2,
           rmse_model.3,
           rmse_reduced_model.3,
           rmse_model.4,
           rmse_reduced_model.4)  # Same model order as Adjusted_R2
)
model_comparison
##     Model Adjusted_R2     RMSE
## 1 Model 1   0.5841845 1430.118
## 2 Model 2   0.5846949 1430.126
## 3 Model 3   0.6814911 1246.201
## 4 Model 4   0.6815901 1249.122
## 5 Model 5   0.6729783 1333.017
## 6 Model 6   0.6722495 3169.763
## 7 Model 7   0.6848339 1487.062
## 8 Model 8   0.6849646 1515.470


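Using statistical techniques, the permeability model’s performance was improved. The initial model had an adjusted R-squared of 0.584. Adding Facies, applying a log10 transformation, and refining the features with stepwise regression raised it to 0.685 for the final model (Model 8).

Adding facies as a categorical variable produced the largest gain: the adjusted R-squared rose from 0.584 to 0.681 and the RMSE fell from 1430 to 1246. The log10 transformation of permeability showed the importance of normalizing the data, although the adjusted R-squared of Models 5 to 8 is measured on log10 permeability and is therefore not directly comparable with that of Models 1 to 4. The RMSE listed for Model 6 (3169.763) compares log10-scale predictions with raw permeability, so it is not on the same scale as the other rows. Random subsampling gave test-set RMSE values of 1487.062 and 1515.470, somewhat higher than the in-sample RMSE of Model 5 (1333.017), as expected for data the model has not seen.

This demonstrates how combining domain knowledge with statistical methods can create better models for permeability prediction.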