Setting the working directory and importing the data
library(ggplot2)
setwd("C:/Users/n/Desktop/ResCharHomework")
data <- read.csv("karpur.csv", header = TRUE)
head(data)
## depth caliper ind.deep ind.med gamma phi.N R.deep R.med SP
## 1 5667.0 8.685 618.005 569.781 98.823 0.410 1.618 1.755 -56.587
## 2 5667.5 8.686 497.547 419.494 90.640 0.307 2.010 2.384 -61.916
## 3 5668.0 8.686 384.935 300.155 78.087 0.203 2.598 3.332 -55.861
## 4 5668.5 8.686 278.324 205.224 66.232 0.119 3.593 4.873 -41.860
## 5 5669.0 8.686 183.743 131.155 59.807 0.069 5.442 7.625 -34.934
## 6 5669.5 8.686 109.512 75.633 57.109 0.048 9.131 13.222 -39.769
## density.corr density phi.core k.core Facies
## 1 -0.033 2.205 33.9000 2442.590 F1
## 2 -0.067 2.040 33.4131 3006.989 F1
## 3 -0.064 1.888 33.1000 3370.000 F1
## 4 -0.053 1.794 34.9000 2270.000 F1
## 5 -0.054 1.758 35.0644 2530.758 F1
## 6 -0.058 1.759 35.3152 2928.314 F1
rmse <- function(actual, predicted) {
sqrt(mean((actual - predicted)^2))
}
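As a quick check of the helper above, the following sketch applies it to three made-up values. The numbers are illustrative only and are not taken from the Karpur data; the function is repeated so the snippet runs on its own.

```r
# RMSE helper, repeated so the example is self-contained
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Illustrative values only (not from karpur.csv)
actual    <- c(100, 200, 300)
predicted <- c(110, 190, 330)

# The errors are -10, 10 and -30, so RMSE = sqrt((100 + 100 + 900) / 3)
example_rmse <- rmse(actual, predicted)
print(example_rmse)
```

Because the errors are squared before averaging, the single error of 30 dominates the result, which is why RMSE is sensitive to a few badly predicted high-permeability samples.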
A regression model describes the relationship between variables by
fitting a line to the observed data. It allows you to estimate how a
dependent variable changes as the independent variable(s) change.
In this section, multiple linear regression models are developed under
several scenarios to predict permeability from the Karpur dataset.
In the first scenario, Facies is not included among the independent features.
model.1 <- lm(k.core ~ . -Facies , data= data)
model.1_summary <- summary(model.1)
prediction_model.1<- predict(model.1)
rmse_model.1 <- rmse(data$k.core,prediction_model.1)
cat("Adjusted R-Square value:", model.1_summary$adj.r.squared, "| RMSE:", rmse_model.1, "\n")
## Adjusted R-Square value: 0.5841845 | RMSE: 1430.118
data_mod <- data.frame(Predicted = prediction_model.1, # Create data for ggplot2
Observed = data$k.core)
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
The R-squared value of 0.5903 and the adjusted R-squared value of
0.5842 are too low to be acceptable. Some features in the dataset do not
have a strong relationship with the predicted value, so removing them
from the set of independent variables may raise the adjusted R-squared.
This is done using stepwise regression.
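The two values quoted above are linked by the usual adjustment for the number of predictors. The sketch below recomputes the adjusted R-squared from the reported R-squared, assuming n = 819 observations (the sum of the 614 training and 205 testing rows reported in the cross-validation part) and p = 12 predictors (every column except k.core and Facies).

```r
r2 <- 0.5903  # R-squared reported in the text
n  <- 819     # assumed number of observations (614 + 205 rows)
p  <- 12      # predictors in model.1, excluding the intercept

# Adjusted R-squared penalizes R-squared for each extra predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

With so many observations relative to predictors the penalty is small, so dropping weak features can only move the adjusted value slightly.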
Stepwise regression is a procedure for building a regression model from
a set of predictor variables by entering and removing predictors in a
stepwise manner until there is no statistically valid reason to enter or
remove any more.
There are three types of stepwise regression:
1. Forward selection.
2. Backward elimination.
3. Bidirectional (both directions).
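As a rough illustration of the search directions that `step()` accepts, the sketch below runs all three on the built-in `mtcars` data, which stands in for the Karpur data here. The object names are hypothetical and `trace = 0` only suppresses the step-by-step printout.

```r
# mtcars ships with base R and is used here only as stand-in data
full_fit <- lm(mpg ~ ., data = mtcars)
null_fit <- lm(mpg ~ 1, data = mtcars)

# Upper scope for the forward search: all columns except the response
upper_scope <- reformulate(setdiff(names(mtcars), "mpg"))

forward_fit  <- step(null_fit,
                     scope = list(lower = ~ 1, upper = upper_scope),
                     direction = "forward", trace = 0)
backward_fit <- step(full_fit, direction = "backward", trace = 0)
both_fit     <- step(full_fit, direction = "both", trace = 0)

# step() compares candidate models with extractAIC()
aic_of <- function(fit) extractAIC(fit)[2]
print(c(null = aic_of(null_fit), full = aic_of(full_fit),
        forward = aic_of(forward_fit), backward = aic_of(backward_fit),
        both = aic_of(both_fit)))
```

The forward search starts from the intercept-only model and adds terms, whereas the backward and bidirectional searches start from the full model; the three do not have to end at the same formula.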
reduced_model.1 <- step(model.1, direction = "both") # "both" selects the third type of stepwise regression
## Start: AIC=11926.91
## k.core ~ (depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density.corr + density + phi.core +
## Facies) - Facies
##
## Df Sum of Sq RSS AIC
## - density.corr 1 19799 1675068713 11925
## - phi.N 1 3906205 1678955118 11927
## <none> 1675048914 11927
## - SP 1 13394190 1688443104 11931
## - R.deep 1 28897686 1703946599 11939
## - caliper 1 29214826 1704263740 11939
## - depth 1 54372650 1729421563 11951
## - ind.deep 1 76022788 1751071701 11961
## - R.med 1 86603706 1761652619 11966
## - ind.med 1 98823752 1773872666 11972
## - density 1 106221406 1781270319 11975
## - phi.core 1 123125117 1798174031 11983
## - gamma 1 416312526 2091361440 12107
##
## Step: AIC=11924.92
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density + phi.core
##
## Df Sum of Sq RSS AIC
## <none> 1675068713 11925
## - phi.N 1 4564880 1679633593 11925
## + density.corr 1 19799 1675048914 11927
## - SP 1 13491079 1688559792 11930
## - R.deep 1 28896144 1703964857 11937
## - caliper 1 29253869 1704322581 11937
## - depth 1 54825159 1729893872 11949
## - ind.deep 1 77573926 1752642639 11960
## - R.med 1 86772220 1761840933 11964
## - ind.med 1 100740701 1775809413 11971
## - density 1 114209586 1789278299 11977
## - phi.core 1 124694278 1799762991 11982
## - gamma 1 417015194 2092083907 12105
Stepwise regression removed density.corr because dropping it gives the
lowest AIC (11924.92, down from 11926.91).
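The AIC values in the table can be reproduced by hand. For linear models, `step()` relies on `extractAIC()`, which computes n * log(RSS / n) + 2 * edf, where edf counts the estimated coefficients including the intercept. The sketch below plugs in the RSS values printed above, assuming n = 819 observations (the 614 + 205 rows reported in the cross-validation part).

```r
# AIC as used by step() for lm fits: n * log(RSS / n) + 2 * edf
aic_lm <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n <- 819  # assumed number of observations

# Full model: 12 predictors + intercept = 13 coefficients
aic_full <- aic_lm(rss = 1675048914, n = n, edf = 13)

# After dropping density.corr: 11 predictors + intercept = 12 coefficients
aic_reduced <- aic_lm(rss = 1675068713, n = n, edf = 12)

print(round(c(full = aic_full, reduced = aic_reduced), 2))
```

Dropping density.corr barely changes the RSS but saves one coefficient, so the penalty term falls by 2 and the AIC improves.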
reduced_model.1_summary <- summary(reduced_model.1)
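prediction_reduced_model.1 <- predict(reduced_model.1)
rmse_reduced_model.1 <- rmse(data$k.core, prediction_reduced_model.1)
cat("Adjusted R-Square value:", reduced_model.1_summary$adj.r.squared, "| RMSE:", rmse_reduced_model.1, "\n")
## Adjusted R-Square value: 0.5846949 | RMSE: 1430.118
The RMSE printed above was produced with predictions from model.1, which is why it equals the full model's value. For the reduced model, the final RSS in the stepwise table with 819 samples gives sqrt(1675068713 / 819) ≈ 1430.126.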
data_mod <- data.frame(Predicted = prediction_reduced_model.1, # Create data for ggplot2
Observed = data$k.core)
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
Permeability is a strong function of facies. Plotting permeability
against porosity, colored by facies, shows the points grouping into
cloud-like clusters.
These clouds show how permeability is affected by the facies
distribution, so adding Facies to the model is expected to raise the
R-squared value.
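To make concrete what adding a categorical feature means for `lm()`, the sketch below uses a small made-up data frame (the values and the three facies labels are hypothetical). A factor with k levels is expanded into k - 1 indicator columns, with the first level absorbed into the intercept.

```r
# Toy data: hypothetical values, not taken from karpur.csv
toy <- data.frame(
  k.core = c(10, 12, 50, 55, 200, 210),
  Facies = factor(c("F1", "F1", "F2", "F2", "F3", "F3"))
)

# The design matrix shows the indicator (dummy) columns lm() builds
design <- model.matrix(k.core ~ Facies, data = toy)
print(colnames(design))

# Intercept = mean of the baseline facies; the other coefficients are
# offsets of each facies mean from that baseline
fit <- lm(k.core ~ Facies, data = toy)
print(coef(fit))
```

This is also why Facies appears with 7 degrees of freedom in the stepwise tables: one coefficient per facies beyond the baseline.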
# Create a scatterplot of core permeability versus core porosity, colored by facies
ggplot(data, aes(x = phi.core, y = k.core, color = Facies)) +
geom_point() +
theme_minimal() +
labs(title = "Core Permeability versus Core Porosity",
x = "Core Porosity",
y = "Core Permeability")
model.2 <- lm(k.core ~. , data= data)
model.2_summary <- summary(model.2)
prediction_model.2= predict(model.2)
rmse_model.2 <- rmse(data$k.core,prediction_model.2)
cat("Adjusted R-Square value:", model.2_summary$adj.r.squared, "| RMSE:", rmse_model.2, "\n")
## Adjusted R-Square value: 0.6814911 | RMSE: 1246.201
Adding Facies as a feature raises the adjusted R-squared noticeably,
from 0.584 to 0.681, and lowers the RMSE from 1430.118 to 1246.201.
data_mod <- data.frame(Predicted = prediction_model.2, # Create data for ggplot2
Observed = data$k.core)
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
reduced_model.2 <- step(model.2, direction="both")
## Start: AIC=11715.43
## k.core ~ depth + caliper + ind.deep + ind.med + gamma + phi.N +
## R.deep + R.med + SP + density.corr + density + phi.core +
## Facies
##
## Df Sum of Sq RSS AIC
## - ind.deep 1 16793 1271937992 11713
## - ind.med 1 356746 1272277945 11714
## - density.corr 1 453661 1272374861 11714
## - phi.N 1 2953609 1274874809 11715
## - caliper 1 3063007 1274984206 11715
## <none> 1271921199 11715
## - density 1 6217927 1278139127 11717
## - SP 1 8171834 1280093033 11719
## - R.deep 1 22117394 1294038593 11728
## - depth 1 36466976 1308388176 11737
## - R.med 1 61690461 1333611660 11752
## - gamma 1 92579723 1364500923 11771
## - phi.core 1 112793101 1384714301 11783
## - Facies 7 403127714 1675048914 11927
##
## Step: AIC=11713.44
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep +
## R.med + SP + density.corr + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - density.corr 1 437546 1272375538 11712
## - phi.N 1 2938766 1274876758 11713
## - caliper 1 3074396 1275012389 11713
## <none> 1271937992 11713
## + ind.deep 1 16793 1271921199 11715
## - density 1 6228928 1278166920 11715
## - ind.med 1 6905855 1278843848 11716
## - SP 1 8191802 1280129794 11717
## - R.deep 1 22125695 1294063687 11726
## - depth 1 39139470 1311077462 11736
## - R.med 1 61773953 1333711946 11750
## - gamma 1 92865220 1364803212 11769
## - phi.core 1 112960440 1384898432 11781
## - Facies 7 479133709 1751071701 11961
##
## Step: AIC=11711.72
## k.core ~ depth + caliper + ind.med + gamma + phi.N + R.deep +
## R.med + SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - caliper 1 2980713 1275356252 11712
## <none> 1272375538 11712
## - phi.N 1 3279032 1275654571 11712
## + density.corr 1 437546 1271937992 11713
## - density 1 5792837 1278168375 11713
## + ind.deep 1 677 1272374861 11714
## - ind.med 1 6813959 1279189497 11714
## - SP 1 8391302 1280766840 11715
## - R.deep 1 22009402 1294384940 11724
## - depth 1 38705776 1311081314 11734
## - R.med 1 61436819 1333812357 11748
## - gamma 1 93974329 1366349868 11768
## - phi.core 1 115336515 1387712053 11781
## - Facies 7 480267100 1752642639 11960
##
## Step: AIC=11711.64
## k.core ~ depth + ind.med + gamma + phi.N + R.deep + R.med + SP +
## density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - phi.N 1 2534906 1277891157 11711
## <none> 1275356252 11712
## + caliper 1 2980713 1272375538 11712
## + density.corr 1 343863 1275012389 11713
## + ind.deep 1 5597 1275350654 11714
## - density 1 7270311 1282626562 11714
## - SP 1 8733336 1284089587 11715
## - ind.med 1 12924050 1288280301 11718
## - R.deep 1 22449117 1297805369 11724
## - depth 1 51507476 1326863728 11742
## - R.med 1 60137982 1335494234 11747
## - phi.core 1 112564835 1387921086 11779
## - gamma 1 141535555 1416891807 11796
## - Facies 7 520094756 1795451008 11978
##
## Step: AIC=11711.26
## k.core ~ depth + ind.med + gamma + R.deep + R.med + SP + density +
## phi.core + Facies
##
## Df Sum of Sq RSS AIC
## <none> 1277891157 11711
## + phi.N 1 2534906 1275356252 11712
## + caliper 1 2236587 1275654571 11712
## - density 1 5155969 1283047127 11713
## + density.corr 1 624807 1277266351 11713
## + ind.deep 1 1762 1277889395 11713
## - SP 1 8515796 1286406953 11715
## - ind.med 1 10944937 1288836095 11716
## - R.deep 1 23273312 1301164469 11724
## - depth 1 49725248 1327616405 11740
## - R.med 1 59454645 1337345802 11746
## - phi.core 1 110154394 1388045551 11777
## - gamma 1 219059092 1496950249 11839
## - Facies 7 526383446 1804274603 11980
The following variables were removed:
1. ind.deep: removed in the first step.
2. density.corr: removed in the second step.
3. caliper: removed in the third step.
4. phi.N: removed in the fourth step.
ind.med stays in the final model. The adjusted R-squared changes only slightly, from 0.6815 to 0.6816.
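Reading the removed variables off a long trace is error-prone, so it can help to extract them from the fitted objects instead. The sketch below does this on the built-in `mtcars` data as a stand-in; the object names are hypothetical. It compares the term labels of the two fits and prints the `anova` component that `step()` attaches to its result.

```r
# Stand-in data: mtcars instead of the Karpur data
full_fit    <- lm(mpg ~ ., data = mtcars)
reduced_fit <- step(full_fit, direction = "both", trace = 0)

# Terms present in the full model but absent from the selected one
full_terms    <- attr(terms(full_fit), "term.labels")
reduced_terms <- attr(terms(reduced_fit), "term.labels")
dropped <- setdiff(full_terms, reduced_terms)
print(dropped)

# step() records the path it took, one row per step
print(reduced_fit$anova)
```

Applied to the models in this report, the same two calls would take model.2 and reduced_model.2 in place of the stand-in fits.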
#### Compute Adjusted R-square and RMSE values
reduced_model.2_summary <- summary(reduced_model.2)
prediction_reduced_model.2= predict(reduced_model.2)
rmse_reduced_model.2 <-rmse(data$k.core,prediction_reduced_model.2)
cat("Adjusted R-Square value:", reduced_model.2_summary$adj.r.squared, "| RMSE:", rmse_reduced_model.2, "\n")
## Adjusted R-Square value: 0.6815901 | RMSE: 1249.122
data_mod <- data.frame(Predicted = prediction_reduced_model.2, # Create data for ggplot2
Observed = data$k.core)
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
A log10 transformation is one way to bring a skewed variable closer to
a normal distribution. The normal distribution can be described as follows:
The normal distribution is a very important class of probability
distribution that describes how values are dispersed around the mean of
a dataset. It rests on the assumption that most observations occur near
the mean, with fewer as the distance from the mean increases. It is
often treated as the ‘standard’ distribution in statistics because it is
common in many natural and social sciences.
Three transformations are commonly used to bring data closer to a normal distribution:
1. Log transformation.
2. Box-Cox transformation.
3. Normal score transformation.
The histogram below shows the distribution of the raw core permeability:
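The effect of the first option can be sketched on synthetic data. The example below draws a log-normally distributed sample as a stand-in for permeability (the parameters are arbitrary assumptions) and compares the sample skewness before and after the log10 transformation; `skewness()` is a small helper defined here, not a base R function.

```r
set.seed(42)

# Synthetic, permeability-like values: log-normal by construction
perm_like <- 10^rnorm(1000, mean = 2, sd = 0.8)

# Sample skewness: third standardized moment (0 for a symmetric sample)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness(perm_like)
skew_log <- skewness(log10(perm_like))
print(c(raw = skew_raw, log10 = skew_log))
```

A skewness near zero after the transformation suggests a roughly symmetric distribution, which is what a linear model on the log10 scale benefits from.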
# Create a histogram
hist((data$k.core),
probability = FALSE, # Plot frequency (counts) rather than density
main = "Histogram with Frequency Distribution",
xlab = "Core Permeability",
ylab = "Frequency",
col = "lightblue",
border = "black")
After applying a log10 transformation to the permeability, the
distribution moves closer to normal:
hist(log10(data$k.core),
probability = FALSE,
main = "Frequency Distribution",
xlab = "log10(Permeability)",
ylab = "Frequency",
col = "lightblue",
border = "black")
log10data <-data
log10data$log10_Permeability <-log10(log10data$k.core)
log10data$k.core <- NULL
model.3<- lm(log10_Permeability~. ,data=log10data)
model.3_summary <- summary(model.3)
prediction_model.3=10^(predict(model.3))
rmse_model.3 <-rmse(10^(log10data$log10_Permeability),prediction_model.3)
cat("Adjusted R-Square value:", model.3_summary$adj.r.squared, "| RMSE:", rmse_model.3, "\n")
## Adjusted R-Square value: 0.6729783 | RMSE: 1333.017
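Because the model is fitted on log10 permeability, its predictions are in log10 units and have to be raised to the power of 10 before they are compared with observed permeability. The sketch below uses three made-up values to show how far off the RMSE is when that step is skipped; the numbers are illustrative only.

```r
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Illustrative values: observed permeability and log10-scale predictions
actual_perm  <- c(10, 100, 1000)
log10_pred   <- c(1.1, 1.9, 3.05)

# Correct: back-transform first, then compare in permeability units
rmse_same_scale <- rmse(actual_perm, 10^log10_pred)

# Incorrect: log10 predictions compared directly with raw permeability
rmse_mixed_scale <- rmse(actual_perm, log10_pred)

print(c(same_scale = rmse_same_scale, mixed_scale = rmse_mixed_scale))
```

A mixed-scale RMSE is close to the magnitude of the observations themselves, so an unusually large value in a comparison table is a hint that a back-transformation was missed.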
data_mod <- data.frame(Predicted = (prediction_model.3), # Create data for ggplot2
Observed = 10^(log10data$log10_Permeability))
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
reduced_model.3 <- step(model.3, direction="both")
## Start: AIC=-1779.02
## log10_Permeability ~ depth + caliper + ind.deep + ind.med + gamma +
## phi.N + R.deep + R.med + SP + density.corr + density + phi.core +
## Facies
##
## Df Sum of Sq RSS AIC
## - ind.med 1 0.1213 88.981 -1779.9
## - density.corr 1 0.1440 89.004 -1779.7
## - ind.deep 1 0.1816 89.042 -1779.3
## <none> 88.860 -1779.0
## - R.deep 1 0.2696 89.130 -1778.5
## - depth 1 0.2754 89.135 -1778.5
## - caliper 1 0.3253 89.185 -1778.0
## - R.med 1 0.3763 89.236 -1777.6
## - SP 1 0.4617 89.322 -1776.8
## - phi.N 1 2.2710 91.131 -1760.3
## - density 1 3.0160 91.876 -1753.7
## - gamma 1 3.6713 92.531 -1747.9
## - Facies 7 7.0758 95.936 -1730.3
## - phi.core 1 27.4982 116.358 -1560.2
##
## Step: AIC=-1779.9
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N +
## R.deep + R.med + SP + density.corr + density + phi.core +
## Facies
##
## Df Sum of Sq RSS AIC
## - density.corr 1 0.1931 89.174 -1780.1
## <none> 88.981 -1779.9
## - ind.deep 1 0.2179 89.199 -1779.9
## - R.deep 1 0.2447 89.226 -1779.7
## - caliper 1 0.2921 89.273 -1779.2
## + ind.med 1 0.1213 88.860 -1779.0
## - R.med 1 0.3397 89.321 -1778.8
## - SP 1 0.4101 89.391 -1778.1
## - depth 1 0.4622 89.444 -1777.7
## - phi.N 1 2.2035 91.185 -1761.9
## - density 1 3.0113 91.993 -1754.6
## - gamma 1 3.5761 92.557 -1749.6
## - Facies 7 9.1242 98.106 -1714.0
## - phi.core 1 27.4190 116.400 -1561.9
##
## Step: AIC=-1780.12
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N +
## R.deep + R.med + SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - ind.deep 1 0.2180 89.392 -1780.1
## <none> 89.174 -1780.1
## + density.corr 1 0.1931 88.981 -1779.9
## - R.deep 1 0.2526 89.427 -1779.8
## + ind.med 1 0.1705 89.004 -1779.7
## - caliper 1 0.2676 89.442 -1779.7
## - R.med 1 0.3598 89.534 -1778.8
## - SP 1 0.3832 89.558 -1778.6
## - depth 1 0.5404 89.715 -1777.2
## - phi.N 1 2.0726 91.247 -1763.3
## - gamma 1 3.4838 92.658 -1750.7
## - density 1 3.6220 92.796 -1749.5
## - Facies 7 9.3567 98.531 -1712.4
## - phi.core 1 27.2273 116.402 -1563.9
##
## Step: AIC=-1780.12
## log10_Permeability ~ depth + caliper + gamma + phi.N + R.deep +
## R.med + SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## <none> 89.392 -1780.1
## + ind.deep 1 0.2180 89.174 -1780.1
## + density.corr 1 0.1932 89.199 -1779.9
## - R.deep 1 0.2869 89.679 -1779.5
## + ind.med 1 0.1478 89.245 -1779.5
## - depth 1 0.3332 89.726 -1779.1
## - SP 1 0.4296 89.822 -1778.2
## - R.med 1 0.5085 89.901 -1777.5
## - caliper 1 0.5746 89.967 -1776.9
## - phi.N 1 2.3337 91.726 -1761.0
## - gamma 1 3.8214 93.214 -1747.8
## - density 1 3.8626 93.255 -1747.5
## - Facies 7 9.2100 98.602 -1713.8
## - phi.core 1 27.0935 116.486 -1565.3
reduced_model.3_summary <- summary(reduced_model.3)
prediction_reduced_model.3= predict(reduced_model.3)
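rmse_reduced_model.3 <- rmse(data$k.core, 10^(prediction_reduced_model.3)) # back-transform the log10 predictions first
cat("Adjusted R-Square value:", reduced_model.3_summary$adj.r.squared, "| RMSE:", rmse_reduced_model.3, "\n")
## Adjusted R-Square value: 0.6722495 | RMSE: 3169.763
The RMSE of 3169.763 recorded above came from comparing log10-scale predictions directly with k.core in original units, so it is not comparable with the other models; it has to be regenerated with the back-transformed predictions.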
data_mod <- data.frame(Predicted = 10^(prediction_reduced_model.3), # Create data for ggplot2
Observed = 10^(log10data$log10_Permeability))
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
To train a model whose permeability predictions can be used in actual
characterization while avoiding overfitting, cross-validation is used.
There are three different types of cross-validation:
1. Leave-one-out.
2. K-fold cross-validation.
3. Random subsampling: this type is used in this study.
Random subsampling involves splitting the data into training and testing
subsets so that the model is evaluated on data it has not seen. In this
study, 75% of the data is used as the training set and 25% as the
testing set.
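For comparison, the sketch below shows two of the schemes listed above on the built-in `mtcars` data: random subsampling repeated 20 times, and 4-fold cross-validation. The model formula, the number of repeats and the number of folds are arbitrary assumptions chosen for illustration; the study itself uses a single 75/25 split.

```r
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# One random 75/25 split; returns the RMSE on the held-out rows
split_rmse <- function(df, train_frac = 0.75) {
  train_index <- sample(seq_len(nrow(df)), size = floor(train_frac * nrow(df)))
  fit <- lm(mpg ~ wt + hp, data = df[train_index, ])
  held_out <- df[-train_index, ]
  rmse(held_out$mpg, predict(fit, newdata = held_out))
}

set.seed(123)

# Random subsampling: repeat the split and look at the spread of test RMSE
subsample_rmse <- replicate(20, split_rmse(mtcars))
print(summary(subsample_rmse))

# K-fold: every row is held out exactly once
k <- 4
fold_id <- sample(rep(seq_len(k), length.out = nrow(mtcars)))
kfold_rmse <- sapply(seq_len(k), function(i) {
  fit <- lm(mpg ~ wt + hp, data = mtcars[fold_id != i, ])
  held_out <- mtcars[fold_id == i, ]
  rmse(held_out$mpg, predict(fit, newdata = held_out))
})
print(mean(kfold_rmse))
```

Repeating the split gives a sense of how much a single test RMSE depends on which rows happen to land in the testing set.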
set.seed(123)
train_index <- sample(1:nrow(data), size = 0.75 * nrow(data))
training_set <- log10data[train_index, ]
testing_set <- log10data[-train_index, ]
cat("Training set size:", nrow(training_set), "| Testing set size:", nrow(testing_set), "\n")
## Training set size: 614 | Testing set size: 205
model.4<- lm(log10_Permeability~. ,data=training_set)
The predictions of this model should be made on the testing set.
#### Compute Adjusted R-square and RMSE values
model.4_summary <- summary(model.4)
prediction_model.4<-10^(predict(model.4, newdata=testing_set))
rmse_model.4 <-rmse(10^(testing_set$log10_Permeability),prediction_model.4)
cat("Adjusted R-Square value:", model.4_summary$adj.r.squared, "| RMSE:", rmse_model.4, "\n")
## Adjusted R-Square value: 0.6848339 | RMSE: 1487.062
data_mod <- data.frame(Predicted = (prediction_model.4), # Create data for ggplot2
Observed = 10^(testing_set$log10_Permeability))
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
reduced_model.4 <- step(model.4, direction="both")
## Start: AIC=-1379.62
## log10_Permeability ~ depth + caliper + ind.deep + ind.med + gamma +
## phi.N + R.deep + R.med + SP + density.corr + density + phi.core +
## Facies
##
## Df Sum of Sq RSS AIC
## - density.corr 1 0.0060 60.826 -1381.6
## - ind.med 1 0.1198 60.940 -1380.4
## <none> 60.820 -1379.6
## - SP 1 0.1988 61.019 -1379.6
## - ind.deep 1 0.2014 61.021 -1379.6
## - R.deep 1 0.2115 61.031 -1379.5
## - depth 1 0.2471 61.067 -1379.1
## - R.med 1 0.2689 61.089 -1378.9
## - caliper 1 0.2991 61.119 -1378.6
## - phi.N 1 0.7967 61.617 -1373.6
## - density 1 1.4060 62.226 -1367.6
## - gamma 1 3.4375 64.257 -1347.9
## - Facies 7 4.8399 65.660 -1346.6
## - phi.core 1 17.4926 78.312 -1226.4
##
## Step: AIC=-1381.56
## log10_Permeability ~ depth + caliper + ind.deep + ind.med + gamma +
## phi.N + R.deep + R.med + SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - ind.med 1 0.1290 60.955 -1382.2
## - SP 1 0.1963 61.022 -1381.6
## <none> 60.826 -1381.6
## - R.deep 1 0.2141 61.040 -1381.4
## - ind.deep 1 0.2142 61.040 -1381.4
## - depth 1 0.2520 61.078 -1381.0
## - R.med 1 0.2730 61.099 -1380.8
## - caliper 1 0.2950 61.121 -1380.6
## + density.corr 1 0.0060 60.820 -1379.6
## - phi.N 1 0.7939 61.620 -1375.6
## - density 1 1.5931 62.419 -1367.7
## - gamma 1 3.4363 64.262 -1349.8
## - Facies 7 4.8624 65.688 -1348.3
## - phi.core 1 17.4926 78.318 -1228.4
##
## Step: AIC=-1382.25
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N +
## R.deep + R.med + SP + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - SP 1 0.1477 61.103 -1382.8
## - R.deep 1 0.1821 61.137 -1382.4
## <none> 60.955 -1382.2
## - R.med 1 0.2320 61.187 -1381.9
## - caliper 1 0.2620 61.217 -1381.6
## + ind.med 1 0.1290 60.826 -1381.6
## - ind.deep 1 0.3818 61.337 -1380.4
## + density.corr 1 0.0153 60.940 -1380.4
## - depth 1 0.4391 61.394 -1379.8
## - phi.N 1 0.7385 61.693 -1376.9
## - density 1 1.6335 62.588 -1368.0
## - gamma 1 3.3311 64.286 -1351.6
## - Facies 7 6.3071 67.262 -1335.8
## - phi.core 1 17.4590 78.414 -1229.6
##
## Step: AIC=-1382.77
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N +
## R.deep + R.med + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - R.deep 1 0.1305 61.233 -1383.5
## - R.med 1 0.1896 61.292 -1382.9
## <none> 61.103 -1382.8
## - caliper 1 0.2473 61.350 -1382.3
## + SP 1 0.1477 60.955 -1382.2
## + ind.med 1 0.0805 61.022 -1381.6
## + density.corr 1 0.0096 61.093 -1380.9
## - ind.deep 1 0.4221 61.525 -1380.5
## - depth 1 0.4537 61.556 -1380.2
## - phi.N 1 0.7397 61.842 -1377.4
## - density 1 1.5831 62.686 -1369.1
## - gamma 1 3.2995 64.402 -1352.5
## - Facies 7 6.2443 67.347 -1337.0
## - phi.core 1 17.4779 78.580 -1230.3
##
## Step: AIC=-1383.46
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N +
## R.med + density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## - R.med 1 0.0732 61.306 -1384.7
## <none> 61.233 -1383.5
## - caliper 1 0.2401 61.473 -1383.1
## + R.deep 1 0.1305 61.103 -1382.8
## + SP 1 0.0961 61.137 -1382.4
## + ind.med 1 0.0658 61.167 -1382.1
## + density.corr 1 0.0120 61.221 -1381.6
## - depth 1 0.4014 61.635 -1381.5
## - ind.deep 1 0.4498 61.683 -1381.0
## - phi.N 1 0.7811 62.014 -1377.7
## - density 1 1.5011 62.734 -1370.6
## - gamma 1 3.2068 64.440 -1354.1
## - Facies 7 6.1755 67.409 -1338.5
## - phi.core 1 17.9168 79.150 -1227.9
##
## Step: AIC=-1384.72
## log10_Permeability ~ depth + caliper + ind.deep + gamma + phi.N +
## density + phi.core + Facies
##
## Df Sum of Sq RSS AIC
## <none> 61.306 -1384.7
## + SP 1 0.1160 61.190 -1383.9
## - caliper 1 0.3214 61.628 -1383.5
## + R.med 1 0.0732 61.233 -1383.5
## + ind.med 1 0.0539 61.252 -1383.3
## - depth 1 0.3480 61.654 -1383.2
## + R.deep 1 0.0141 61.292 -1382.9
## + density.corr 1 0.0133 61.293 -1382.9
## - phi.N 1 0.7107 62.017 -1379.7
## - ind.deep 1 0.7651 62.071 -1379.1
## - density 1 1.4952 62.802 -1371.9
## - gamma 1 3.9412 65.248 -1348.5
## - Facies 7 6.1780 67.484 -1339.8
## - phi.core 1 18.2530 79.559 -1226.7
reduced_model.4_summary <- summary(reduced_model.4)
prediction_reduced_model.4=10^(predict(reduced_model.4,newdata= testing_set))
rmse_reduced_model.4 <- rmse(10^(testing_set$log10_Permeability),prediction_reduced_model.4)
cat("Adjusted R-Square value:", reduced_model.4_summary$adj.r.squared, "| RMSE:", rmse_reduced_model.4, "\n")
## Adjusted R-Square value: 0.6849646 | RMSE: 1515.47
data_mod <- data.frame(Predicted = (prediction_reduced_model.4), # Create data for ggplot2
Observed = 10^(testing_set$log10_Permeability))
ggplot(data_mod, # Draw plot using ggplot2 package
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2)
model_comparison <- data.frame(
Model = paste("Model", 1:8),
Adjusted_R2 = c(model.1_summary$adj.r.squared,
reduced_model.1_summary$adj.r.squared,
model.2_summary$adj.r.squared,
reduced_model.2_summary$adj.r.squared,
model.3_summary$adj.r.squared,
reduced_model.3_summary$adj.r.squared,
model.4_summary$adj.r.squared,
reduced_model.4_summary$adj.r.squared), # Adjusted R-squared of each model
RMSE = c(rmse_model.1,
rmse_reduced_model.1,
rmse_model.2,
rmse_reduced_model.2,
rmse_model.3,
rmse_reduced_model.3,
rmse_model.4,
rmse_reduced_model.4) # RMSE of each model
)
model_comparison
## Model Adjusted_R2 RMSE
## 1 Model 1 0.5841845 1430.118
## 2 Model 2 0.5846949 1430.118
## 3 Model 3 0.6814911 1246.201
## 4 Model 4 0.6815901 1249.122
## 5 Model 5 0.6729783 1333.017
## 6 Model 6 0.6722495 3169.763
## 7 Model 7 0.6848339 1487.062
## 8 Model 8 0.6849646 1515.470
Models 1 to 8 are, in order: model.1, reduced_model.1, model.2, reduced_model.2, model.3, reduced_model.3, model.4 and reduced_model.4. The RMSE of Models 1 to 6 is computed on the full dataset, while the RMSE of Models 7 and 8 is computed on the 25% testing set.
Using statistical techniques, the permeability model’s
performance was improved. The initial model had an Adjusted R-squared
value of 0.584. By adding new features, applying transformations, and
refining the features using stepwise regression, the model became more
accurate.
Adding facies as a categorical variable and applying a log10 transformation on permeability showed the importance of selecting the right features and normalizing the data. On the held-out testing set (25% of the data), the RMSE was 1487.062 for the full model and 1515.47 for the reduced model, which indicates how the model performs on new data.
As a result, the Adjusted R-squared increased to 0.685, showing a clear improvement. This demonstrates how combining domain knowledge with statistical methods can create better models for permeability prediction.