Landslide susceptibility refers to the spatial probability of landslides occurring over a specific area. While it can predict where a landslide may occur, it cannot determine the timing of the event, thus it remains a spatial probability. For landslide susceptibility mapping accurate locations of past landslides are necessary. This task is a classic classification problem where landslide locations serve as presence data, and non-landslide locations, randomly selected across the study area, serve as absence data. The selection of non-landslide locations requires extensive data exploration this link provides example of some of the techniques of absence data sampling. Predictors for the classification algorithm will be derived from a digital elevation model (DEM). Specifically, we will use eight DEM-based variables, referred to as causal factors in landslide susceptibility mapping. The accuracy of the landslide susceptibility map depends not only on the accurate identification of landslide locations but also on the quality of the causal factors. Since these causal factors are derived from the DEM, utilizing a high-quality DEM can significantly enhance the map’s accuracy. This study will compare the effectiveness of a 1-meter resolution DEM with that of the Shuttle Radar Topography Mission (SRTM) DEM, which has a 30-meter resolution.
Eight DEM based variables will be used
Elevation
Slope
library(sf)
library(tidyverse)
library(randomForest)
library(caret)
library(pROC)
library(e1071)
We have two shape files both of them are point files one show the locations of landslides and another one is the locations of randomly generated non-landslide locations. Our study area is Watauga County of North Carolina. We will upload these two files in R and convert them into two separate dataframes.
# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
# Convert to Data Frames
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
# Print the heads of the Data Frames
#print(head(landslides))
#print(head(Non_landslides))
#
library(kableExtra)
my_data <- as_tibble(Landslides_Points)
my_data
## # A tibble: 2,257 × 68
## OBJECTID Process_ID Project County Quad Col_Date Mvmnt_Date SM_State
## <int> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 59592 3256 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 2 59593 3257 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 3 59594 3258 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 4 59595 3259 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 5 59596 3260 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 6 59598 3262 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 7 59599 3263 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 8 59600 3264 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 9 59601 3265 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## 10 59602 3266 Landslide Haza… Watau… Gran… <NA> 1940/08/1… <NA>
## # ℹ 2,247 more rows
## # ℹ 60 more variables: SM_Rate_1 <chr>, SM_Mat_1 <chr>, SM_Type_1 <chr>,
## # SM_Rate_2 <chr>, SM_Mat_2 <chr>, SM_Type_2 <chr>, Slp_Config <chr>,
## # Fatalities <int>, Confidence <int>, MapMethod <chr>, GlobalID <chr>,
## # SHAPE_1 <chr>, Slope_DEM_ <dbl>, Slope <dbl>, Slope_1 <dbl>, DEM_30 <dbl>,
## # Slope_30 <dbl>, Elevation <dbl>, Aspect <dbl>, Plan <dbl>, Pr <dbl>,
## # TWI <dbl>, SPI <dbl>, Slope_12 <dbl>, LS <dbl>, TRI <dbl>, V <chr>, …
my_data1 <- as_tibble(Non_Landslides_Points)
my_data1
## # A tibble: 2,257 × 36
## OID_ CID Elevation Plan Pr Aspect Slope_12 SPI TRI TWI Soil
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0 3565. 3.11 -1.08 333. 9.13 0.154 0.517 1.83 6
## 2 0 0 3562. 14.5 1.33 91.2 38.3 6.72 0.578 2.18 6
## 3 0 0 3660. -4.13 2.22 91.4 20.4 1.85 0.524 2.60 6
## 4 0 0 3646. 3.82 -4.87 333. 8.97 0.152 0.434 1.85 6
## 5 0 0 4565. 4.01 -0.951 76.8 13.5 0.232 0.536 1.43 6
## 6 0 0 4137. -1.35 0.179 343. 17.6 34.0 0.496 5.84 6
## 7 0 0 4260. 0.194 -0.921 80.4 5.78 0.0967 0.351 2.29 6
## 8 0 0 4126. 1.62 4.13 154. 12.8 0.439 0.397 2.18 6
## 9 0 0 4447. 2.04 -6.32 177. 17.7 0.315 0.516 1.14 3
## 10 0 0 3995. 2.06 -1.03 30.9 11.0 0.188 0.535 1.64 4
## # ℹ 2,247 more rows
## # ℹ 25 more variables: Aspect_1 <dbl>, PL <dbl>, PR_1 <dbl>, Geology <dbl>,
## # NDVI <dbl>, Rainfall <dbl>, Slope_1 <dbl>, SPI_1 <dbl>, TRI_1 <dbl>,
## # TWI_1 <dbl>, Landuse <dbl>, LULC_chang <dbl>, Imprevious <dbl>, Rain <dbl>,
## # Distance_R <dbl>, Aspect_DEM <dbl>, TRI_12 <dbl>, DEM_Projec <dbl>,
## # Hillshade <dbl>, Plan_curva <dbl>, Profile_cu <dbl>, Slope <dbl>,
## # SPI_3 <dbl>, TWI__3 <dbl>, geometry <POINT [°]>
These two data frames have several unnecessary columns or variables. I used ArcGIS software to derive the causal factors using DEM. Then extract multi-values to points tool was used to extract cell values of causal factors for each landslide and non-landslide points. We need one target variable (V) and eight causal factors or variables (Elevation, Slope, Aspect, Plan Curvature, Profile Curvature, TWI, TRI, SPI). We will select these variables and since these variables have different names or labels we will change them accordingly.
library(tidyverse)
library(dplyr)
landslides <- landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, V, TWI, SPI, TRI)
Non_landslides <- Non_landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, CID, TWI, SPI, TRI)
# Rename columns for consistency
landslides <- rename(landslides, Slope = Slope_12, PL = Plan, PR = Pr)
Non_landslides <- rename(Non_landslides, Slope = Slope_12, PL = Plan, PR = Pr, V = CID)
##Lets show them using tables
library(DT)
dat1 <- landslides
dat2 <- Non_landslides
datatable(
data = landslides,
caption = "Landslides",
filter = "top",
class = 'cell-border stripe'
)
datatable(
data = Non_landslides,
caption = "Non_Landslides",
filter = "top",
class = 'cell-border stripe'
)
# library(kableExtra)
# dat1
# kable(dat1)%>%kable_styling(latex_options = 'sriped')
# dat <- landslides[1:10,]
# knitr::kable(dat, caption= "Landslides")
# dat1 <- Non_Landslides_Points[1:10,]
# knitr::kable(dat1, caption= "Non_Landslides")
# library(kableExtra)
# my_data <- as_tibble(Landslides_Points)
# my_data
# kable(dat1)%>%kable_styling(latex_options = 'sriped')
# dat <- landslides[1:10,]
# knitr::kable(dat, caption= "Landslides")
# dat1 <- Non_Landslides_Points[1:10,]
# knitr::kable(dat1, caption= "Non_Landslides")
Now we will combine two dataframes into a single dataframe. But before that we need to check the type of variables.
tab1 <- summary(landslides)
tab2 <- summary(Non_landslides)
kable(tab1, caption="Summary Landslides")
Slope | Elevation | Aspect | PL | PR | V | TWI | SPI | TRI | |
---|---|---|---|---|---|---|---|---|---|
Min. :-9999.00 | Min. :-9999 | Min. :-9999.00 | Min. :-9999.000 | Min. :-9999.000 | Length:2257 | Min. :-9999.000 | Min. : -9999.00 | Min. :-9999.000 | |
1st Qu.: 25.61 | 1st Qu.: 2830 | 1st Qu.: 90.24 | 1st Qu.: -4.176 | 1st Qu.: -3.606 | Class :character | 1st Qu.: 1.172 | 1st Qu.: 1.15 | 1st Qu.: 0.476 | |
Median : 30.57 | Median : 3134 | Median : 126.64 | Median : -0.528 | Median : 0.261 | Mode :character | Median : 2.541 | Median : 4.37 | Median : 0.500 | |
Mean : -18.56 | Mean : 3012 | Mean : 85.47 | Mean : -49.398 | Mean : -48.531 | NA | Mean : -45.738 | Mean : 186.77 | Mean : -12.789 | |
3rd Qu.: 35.32 | 3rd Qu.: 3437 | 3rd Qu.: 172.47 | 3rd Qu.: 2.988 | 3rd Qu.: 3.975 | NA | 3rd Qu.: 4.526 | 3rd Qu.: 34.64 | 3rd Qu.: 0.524 | |
Max. : 72.29 | Max. : 5162 | Max. : 357.53 | Max. : 44.670 | Max. : 81.796 | NA | Max. : 14.860 | Max. :232744.00 | Max. : 0.842 |
kable(tab2, caption="Summary Non_Landslides")
Slope | Elevation | Aspect | PL | PR | V | TWI | SPI | TRI | |
---|---|---|---|---|---|---|---|---|---|
Min. :-9999.000 | Min. :-9999 | Min. :-9999.00 | Min. :-9999.000 | Min. :-9999.000 | Min. :0 | Min. :-9999.000 | Min. : -9999.00 | Min. :-9999.000 | |
1st Qu.: 7.986 | 1st Qu.: 3056 | 1st Qu.: 91.27 | 1st Qu.: -1.134 | 1st Qu.: -1.390 | 1st Qu.:0 | 1st Qu.: 1.946 | 1st Qu.: 0.30 | 1st Qu.: 0.469 | |
Median : 14.085 | Median : 3335 | Median : 185.17 | Median : 0.012 | Median : 0.000 | Median :0 | Median : 3.073 | Median : 0.97 | Median : 0.502 | |
Mean : 5.951 | Mean : 3345 | Mean : 171.33 | Mean : -8.904 | Mean : -8.792 | Mean :0 | Mean : -18.668 | Mean : 221.66 | Mean : -65.956 | |
3rd Qu.: 20.366 | 3rd Qu.: 3665 | 3rd Qu.: 268.38 | 3rd Qu.: 1.302 | 3rd Qu.: 1.411 | 3rd Qu.:0 | 3rd Qu.: 4.592 | 3rd Qu.: 5.02 | 3rd Qu.: 0.532 | |
Max. : 59.009 | Max. : 5295 | Max. : 359.78 | Max. : 21.310 | Max. : 68.018 | Max. :0 | Max. : 19.732 | Max. :270285.00 | Max. : 0.884 |
Summary shows that the target variable V is in character format and we will convert it into factor format. All varibales have missing data (-9999), we will drop them after joining the two dataframes.
landslides$V <- as.character(landslides$V)
Non_landslides$V <- as.character(Non_landslides$V)
Data <- rbind(landslides, Non_landslides)
Data_m <- Data
Data_m <- Data_m %>% filter(across(-V, ~ . != -9999))
Now lets see the summary again.
##Lets show them using tables
library(DT)
dat3 <- summary(Data_m)
kable(dat3, caption="Summary of Data")
Slope | Elevation | Aspect | PL | PR | V | TWI | SPI | TRI | |
---|---|---|---|---|---|---|---|---|---|
Min. : 0.01862 | Min. :1413 | Min. : 0.00 | Min. :-53.1819 | Min. :-71.61270 | Length:4486 | Min. :-1.033 | Min. : -66.25 | Min. :0.0186 | |
1st Qu.:13.36940 | 1st Qu.:2933 | 1st Qu.: 91.27 | 1st Qu.: -2.3481 | 1st Qu.: -2.20290 | Class :character | 1st Qu.: 1.592 | 1st Qu.: 0.61 | 1st Qu.:0.4736 | |
Median :23.40960 | Median :3246 | Median :144.91 | Median : -0.1317 | Median : 0.08716 | Mode :character | Median : 2.854 | Median : 2.08 | Median :0.5013 | |
Mean :22.62100 | Mean :3217 | Mean :157.74 | Mean : -0.3573 | Mean : 0.13419 | NA | Mean : 3.239 | Mean : 234.45 | Mean :0.5012 | |
3rd Qu.:31.42370 | 3rd Qu.:3556 | 3rd Qu.:217.14 | 3rd Qu.: 1.8725 | 3rd Qu.: 2.36965 | NA | 3rd Qu.: 4.548 | 3rd Qu.: 13.41 | 3rd Qu.:0.5280 | |
Max. :72.29000 | Max. :5295 | Max. :359.78 | Max. : 44.6705 | Max. : 81.79600 | NA | Max. :19.732 | Max. :270285.00 | Max. :0.8844 |
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]
# Check the distribution of target variable in training and test sets
dat4 <- table(trainData$V)
dat5 <- table(testData$V)
kable(dat4)
Var1 | Freq |
---|---|
0 | 1792 |
1 | 1797 |
kable(dat5)
Var1 | Freq |
---|---|
0 | 448 |
1 | 449 |
So in our training dataset there are 1792, absence and 1797 presence data where as in test dataset there are 448 absence and 449 presence data. Now we will train our model. ## Logistic Regression Model
# Load necessary libraries
library(pROC)
library(ggplot2)
library(gt)
# Train a logistic regression model
logistic_model_1m <- glm(V ~ ., data = trainData, family = binomial)
# Summarize the model
summary(logistic_model_1m)
##
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.03158 0.04773 -0.662 0.5082
## Slope 2.21441 0.07486 29.580 < 2e-16 ***
## Elevation -0.31600 0.05157 -6.128 8.89e-10 ***
## Aspect -0.38257 0.04935 -7.753 9.00e-15 ***
## PL -0.13833 0.07261 -1.905 0.0568 .
## PR -0.17099 0.07159 -2.388 0.0169 *
## TWI 0.28014 0.05302 5.284 1.27e-07 ***
## SPI 0.03856 0.06065 0.636 0.5249
## TRI 0.15603 0.06062 2.574 0.0101 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4975.4 on 3588 degrees of freedom
## Residual deviance: 2831.4 on 3580 degrees of freedom
## AIC: 2849.4
##
## Number of Fisher Scoring iterations: 5
exp(coef(logistic_model_1m))
## (Intercept) Slope Elevation Aspect PL PR
## 0.9689135 9.1559973 0.7290574 0.6821078 0.8708116 0.8428334
## TWI SPI TRI
## 1.3233170 1.0393155 1.1688608
#kable(l$coefficients, caption="g")
Our goal is not to check the result of the logistic regression rather we are concerned about the prediction. Lets see the prediction accuracy.
# Make predictions on the training set
library(dplyr)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(caret)
pred_prob_train <- predict(logistic_model_1m , newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)
# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)
##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
## pred_class_train
## 0 1
## 0 1487 305
## 1 251 1546
print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.84508219559766"
#kable(t)
# Make predictions on the test set
pred_prob_test <- predict(logistic_model_1m, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)
# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
## pred_class_test
## 0 1
## 0 382 66
## 1 71 378
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.84726867335563"
accuracy_test_1m <- as.data.frame(accuracy_test)
# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)
auc_1m <- auc_value
print(paste("AUC:", auc_value))
## [1] "AUC: 0.898628897550111"
# Plot ROC curve using ggplot2
roc_data <- data.frame(
specificity = rev(roc_curve$specificities),
sensitivity = rev(roc_curve$sensitivities)
)
ggplot(roc_data, aes(x = specificity, y = sensitivity)) +
geom_line(color = "blue") +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Logistic Regression_1m", x = "Specificity", y = "Sensitivity") +
theme_minimal()
write.csv(Data_m, "Data_1m.csv", row.names = FALSE)
Logistic regression model gives satisfactory accuracy for 1m DEM based causal factors. Now we will work with SRTM (30 m) DEM based causal factors and check the accuracy of prediction. I will help us to understand the role of the resolution and accuracy of DEM on landslide prediction over an area. We will repeat the whole process again only the data-set will be different.
# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS: NAD83
library(tidyverse)
library(dplyr)
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
landslides_30 <- landslides %>% select(Slope_1_14, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, V, TWI__3, SPI_3, TRI_12)
Non_landslides_30 <- Non_landslides %>% select(Slope, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, CID, TWI__3, SPI_3, TRI_12)
# Rename columns for consistency
landslides_30 <- rename(landslides_30, Slope = Slope_1_14, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12)
Non_landslides_30 <- rename(Non_landslides_30, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, V = CID)
##Lets show them using tables
library(DT)
dat1 <- landslides_30
dat2 <- Non_landslides_30
datatable(
data = landslides_30,
caption = "Landslides",
filter = "top",
class = 'cell-border stripe'
)
datatable(
data = Non_landslides_30,
caption = "Non_Landslides",
filter = "top",
class = 'cell-border stripe'
)
tab1 <- summary(landslides_30)
tab2 <- summary(Non_landslides_30)
kable(tab1, caption="Summary Landslides_30m")
Slope | Elevation | Aspect | PL | PR | V | TWI | SPI | TRI | |
---|---|---|---|---|---|---|---|---|---|
Min. : 2.07 | Min. : 449.0 | Min. :-9999.00 | Min. :-3.11276 | Min. :-3.21878 | Length:2257 | Min. :-3.4113 | Min. : 0.0 | Min. :0.2922 | |
1st Qu.:15.21 | 1st Qu.: 876.0 | 1st Qu.: 88.23 | 1st Qu.:-0.31934 | 1st Qu.:-0.46307 | Class :character | 1st Qu.:-2.0951 | 1st Qu.: 0.0 | 1st Qu.:0.4547 | |
Median :20.25 | Median : 971.0 | Median : 130.91 | Median : 0.03593 | Median :-0.07091 | Mode :character | Median :-1.5829 | Median : 15.8 | Median :0.5048 | |
Mean :20.08 | Mean : 950.7 | Mean : 126.98 | Mean : 0.03297 | Mean :-0.08110 | NA | Mean :-1.4046 | Mean : 1148.7 | Mean :0.5088 | |
3rd Qu.:24.74 | 3rd Qu.:1063.0 | 3rd Qu.: 176.63 | 3rd Qu.: 0.37393 | 3rd Qu.: 0.29913 | NA | 3rd Qu.:-0.9447 | 3rd Qu.: 44.1 | 3rd Qu.:0.5644 | |
Max. :44.21 | Max. :1586.0 | Max. : 359.49 | Max. : 3.60871 | Max. : 2.86714 | NA | Max. :10.0097 | Max. :2402323.8 | Max. :0.7527 |
kable(tab2, caption="Summary Non_Landslides")
Slope | Elevation | Aspect | PL | PR | V | TWI | SPI | TRI | |
---|---|---|---|---|---|---|---|---|---|
Min. : 0.3552 | Min. : 434 | Min. :-9999.0 | Min. :-2.04006 | Min. :-2.39026 | Min. :0 | Min. :-2.9974 | Min. : 0.0 | Min. :0.1510 | |
1st Qu.: 6.7917 | 1st Qu.: 936 | 1st Qu.: 85.6 | 1st Qu.:-0.22708 | 1st Qu.:-0.21649 | 1st Qu.:0 | 1st Qu.:-1.5306 | 1st Qu.: 0.0 | 1st Qu.:0.4013 | |
Median :11.1627 | Median :1022 | Median : 175.2 | Median : 0.00000 | Median : 0.02460 | Median :0 | Median :-0.8667 | Median : 12.1 | Median :0.4701 | |
Mean :11.7711 | Mean :1030 | Mean : 160.2 | Mean : 0.01086 | Mean : 0.05174 | Mean :0 | Mean :-0.1922 | Mean : 2444.6 | Mean :0.4697 | |
3rd Qu.:15.9215 | 3rd Qu.:1126 | 3rd Qu.: 257.0 | 3rd Qu.: 0.24601 | 3rd Qu.: 0.32753 | 3rd Qu.:0 | 3rd Qu.: 0.4310 | 3rd Qu.: 48.6 | 3rd Qu.:0.5400 | |
Max. :37.2150 | Max. :1618 | Max. : 359.5 | Max. : 1.72111 | Max. : 2.73880 | Max. :0 | Max. :12.0588 | Max. :2588779.8 | Max. :0.7704 |
landslides_30$V <- as.character(landslides_30$V)
Non_landslides_30$V <- as.character(Non_landslides_30$V)
Data_30 <- rbind(landslides_30, Non_landslides_30)
Data_m_30 <- Data_30
Data_m_30 <- Data_m_30 %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
We will store the clean 1m DEM and 30m DEM data as csv files so that we can use it later and we do not have to clean it again.
write.csv(Data_m, "Data_1m.csv", row.names = FALSE)
write.csv(Data_m_30, "Data_30m.csv", row.names = FALSE)
# Scale variables to z-scores
scale_to_z <- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
Data_m_30 <- Data_m_30 %>% mutate(across(.cols = -V, .fns = scale_to_z))
Data_m_30$V <- as.factor(Data_m_30$V)
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m_30[trainIndex, ]
testData <- Data_m_30[-trainIndex, ]
# Check the distribution of target variable in training and test sets
dat4 <- table(trainData$V)
dat5 <- table(testData$V)
kable(dat4)
Var1 | Freq |
---|---|
0 | 1804 |
1 | 1804 |
kable(dat5)
Var1 | Freq |
---|---|
0 | 450 |
1 | 451 |
# Load necessary libraries
library(pROC)
library(ggplot2)
library(gt)
# Train a logistic regression model
logistic_model_30m <- glm(V ~ ., data = trainData, family = binomial)
# Summarize the model
summary(logistic_model_30m)
##
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.04696 0.04473 -1.050 0.294
## Slope 1.35156 0.06116 22.099 < 2e-16 ***
## Elevation -0.57739 0.04683 -12.329 < 2e-16 ***
## Aspect -0.37151 0.04437 -8.373 < 2e-16 ***
## PL -0.25238 0.05986 -4.216 2.48e-05 ***
## PR -0.27709 0.05670 -4.887 1.02e-06 ***
## TWI -0.44548 0.08233 -5.411 6.26e-08 ***
## SPI 0.12954 0.10531 1.230 0.219
## TRI 0.56899 0.04986 11.411 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5001.8 on 3607 degrees of freedom
## Residual deviance: 3304.0 on 3599 degrees of freedom
## AIC: 3322
##
## Number of Fisher Scoring iterations: 5
exp(coef(logistic_model_30m))
## (Intercept) Slope Elevation Aspect PL PR
## 0.9541291 3.8634647 0.5613606 0.6896913 0.7769459 0.7579835
## TWI SPI TRI
## 0.6405158 1.1383086 1.7664896
#summary(logistic_model_1m)
Lets see the logistic regression model results of two models side by side.
a <- summary(logistic_model_30m)
b <- summary(logistic_model_1m)
re1 <- as.data.frame(a$coefficients)
re2 <- as.data.frame(b$coefficients)
re3 <- cbind(re1,re2)
colnames(re3) <- c("Estimate_30m", "Std.Error_30m", "Z Value_30m", "Pr(>|Z|_30m)",
"Estimate_1m", "Std.Error_1m", "Z Value_1m", "Pr(>|Z|_1m)")
kable(re3, caption = "Comarison")
Estimate_30m | Std.Error_30m | Z Value_30m | Pr(>|Z|_30m) | Estimate_1m | Std.Error_1m | Z Value_1m | Pr(>|Z|_1m) | |
---|---|---|---|---|---|---|---|---|
(Intercept) | -0.0469563 | 0.0447307 | -1.049754 | 0.2938313 | -0.0315799 | 0.0477302 | -0.6616344 | 0.5082055 |
Slope | 1.3515644 | 0.0611590 | 22.099196 | 0.0000000 | 2.2144091 | 0.0748610 | 29.5802686 | 0.0000000 |
Elevation | -0.5773918 | 0.0468332 | -12.328697 | 0.0000000 | -0.3160028 | 0.0515664 | -6.1280777 | 0.0000000 |
Aspect | -0.3715111 | 0.0443703 | -8.372967 | 0.0000000 | -0.3825676 | 0.0493470 | -7.7525966 | 0.0000000 |
PL | -0.2523846 | 0.0598575 | -4.216422 | 0.0000248 | -0.1383296 | 0.0726106 | -1.9050876 | 0.0567687 |
PR | -0.2770936 | 0.0566990 | -4.887102 | 0.0000010 | -0.1709860 | 0.0715890 | -2.3884399 | 0.0169201 |
TWI | -0.4454815 | 0.0823265 | -5.411154 | 0.0000001 | 0.2801415 | 0.0530191 | 5.2837881 | 0.0000001 |
SPI | 0.1295435 | 0.1053086 | 1.230131 | 0.2186479 | 0.0385623 | 0.0606510 | 0.6358070 | 0.5249023 |
TRI | 0.5689943 | 0.0498644 | 11.410838 | 0.0000000 | 0.1560296 | 0.0606154 | 2.5740911 | 0.0100504 |
# datatable(
# data = re3,
# caption = "Comparison 30m vs 1m",
# filter = "top",
# class = 'cell-border stripe'
# )
Our main goal is prediction so we are not concerned about the coefficients. But when we will produce the landslide susceptibility map we will only use the significant casual factors. For both DEMs Slope elevation, aspect, profile curvature, TWI and TRI are the significant factors.
library(caret)
pred_prob_train <- predict(logistic_model_30m , newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)
# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)
##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
## pred_class_train
## 0 1
## 0 1410 394
## 1 370 1434
print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.788248337028825"
# Make predictions on the test set
pred_prob_test <- predict(logistic_model_30m, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)
# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
## pred_class_test
## 0 1
## 0 345 105
## 1 91 360
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.782463928967814"
accuracy_test_30m <- as.data.frame(accuracy_test)
# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)
auc_30m <- as.data.frame(auc_value)
print(paste("AUC:", auc_value))
## [1] "AUC: 0.857031288494703"
# Plot ROC curve using ggplot2
roc_data <- data.frame(
specificity = rev(roc_curve$specificities),
sensitivity = rev(roc_curve$sensitivities)
)
ggplot(roc_data, aes(x = specificity, y = sensitivity)) +
geom_line(color = "green") +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Logistic Regression_30m", x = "Specificity", y = "Sensitivity") +
theme_minimal()
## Combine Four Dataframes that we created before and we will relabel
library(flextable)
##
## Attaching package: 'flextable'
## The following objects are masked from 'package:kableExtra':
##
## as_image, footnote
## The following object is masked from 'package:purrr':
##
## compose
Acc_Comp_LR <- cbind(accuracy_test_1m, accuracy_test_30m, auc_1m, auc_30m)
colnames(Acc_Comp_LR) <- c('Accuracy_1m', 'Accuracy_30m', 'AUC_1m', 'AUC_30m')
#kable(Acc_Comp_LR, caption= "Comparison of Logistic Regression Models" )
Acc_Comp_LR%>%
kbl(caption = "Comparison of Logistic Regression Models")%>%
kable_styling()
Accuracy_1m | Accuracy_30m | AUC_1m | AUC_30m |
---|---|---|---|
0.8472687 | 0.7824639 | 0.8986289 | 0.8570313 |
#flextable(Acc_Comp_LR, )
Logistic regression model shows that if variables are derived using 1m DEM it gives better accuracy than the variables derived from 30 m DEM. We will check another model here.
We will not carry out any hyper parameter tuning right now. Here we will just see the differences. Later we will work on hyper parameter tuning.
# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS: NAD83
# Convert to Data Frames
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
library(tidyverse)
library(dplyr)
landslides <- landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, V, TWI, SPI, TRI)
Non_landslides <- Non_landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, CID, TWI, SPI, TRI)
# Rename columns for consistency
landslides <- rename(landslides, Slope = Slope_12, PL = Plan, PR = Pr)
Non_landslides <- rename(Non_landslides, Slope = Slope_12, PL = Plan, PR = Pr, V = CID)
# Combine Data frames
landslides$V <- as.character(landslides$V)
Non_landslides$V <- as.character(Non_landslides$V)
Data <- rbind(landslides, Non_landslides)
Data_m <- Data
Data_m <- Data_m %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#Train and Test Datasets
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
Data_m$V <- as.factor(Data_m$V)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]
# Train a Random Forest model
rf_model_1m <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
# Summarize the model
print(rf_model_1m)
##
## Call:
## randomForest(formula = V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 15.44%
## Confusion matrix:
## 0 1 class.error
## 0 1494 298 0.1662946
## 1 256 1541 0.1424597
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_1m, newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_1m, newdata = testData, type = "response")
# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 380 81
## 1 68 368
##
## Accuracy : 0.8339
## 95% CI : (0.8079, 0.8577)
## No Information Rate : 0.5006
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.6678
##
## Mcnemar's Test P-Value : 0.3256
##
## Sensitivity : 0.8482
## Specificity : 0.8196
## Pos Pred Value : 0.8243
## Neg Pred Value : 0.8440
## Prevalence : 0.4994
## Detection Rate : 0.4236
## Detection Prevalence : 0.5139
## Balanced Accuracy : 0.8339
##
## 'Positive' Class : 0
##
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.915461939230035"
auc_1m <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
specificity = rev(roc_curve_rf$specificities),
sensitivity = rev(roc_curve_rf$sensitivities)
)
ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
geom_line(color = "blue") +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Random Forest", x = "Specificity", y = "Sensitivity") +
theme_minimal()
# Get variable importance
importance_rf <- importance(rf_model_1m )
importance_df <- data.frame(Variable = rownames(importance_rf),
MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])
# Plot variable importance using ggplot2
p1 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
geom_bar(stat = "identity", fill = "blue") +
coord_flip() +
labs(title = "Variable Importance Plot_1m", x = "Variable", y = "Mean Decrease in Gini") +
theme_minimal()
p1
library(tidyverse)
library(dplyr)
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
landslides_30 <- landslides %>% select(Slope_1_14, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, V, TWI__3, SPI_3, TRI_12)
Non_landslides_30 <- Non_landslides %>% select(Slope, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, CID, TWI__3, SPI_3, TRI_12)
# Rename columns for consistency
landslides_30 <- rename(landslides_30, Slope = Slope_1_14, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12)
Non_landslides_30 <- rename(Non_landslides_30, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, V = CID)
landslides_30$V <- as.factor(landslides_30$V)
Non_landslides_30$V <- as.factor(Non_landslides_30$V)
Data_30 <- rbind(landslides_30, Non_landslides_30)
Data_m_30 <- Data_30
Data_m_30 <- Data_m_30 %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Scale variables to z-scores
scale_to_z <- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
Data_m_30 <- Data_m_30 %>% mutate(across(.cols = -V, .fns = scale_to_z))
Data_m_30$V <- as.factor(Data_m_30$V)
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m_30[trainIndex, ]
testData <- Data_m_30[-trainIndex, ]
# Train a Random Forest model
rf_model_30m <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
# Summarize the model
print(rf_model_30m)
##
## Call:
## randomForest(formula = V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 19.96%
## Confusion matrix:
## 1 0 class.error
## 1 1453 351 0.1945676
## 0 369 1435 0.2045455
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_30m, newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_30m, newdata = testData, type = "response")
# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 366 90
## 0 85 360
##
## Accuracy : 0.8058
## 95% CI : (0.7784, 0.8311)
## No Information Rate : 0.5006
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.6115
##
## Mcnemar's Test P-Value : 0.7624
##
## Sensitivity : 0.8115
## Specificity : 0.8000
## Pos Pred Value : 0.8026
## Neg Pred Value : 0.8090
## Prevalence : 0.5006
## Detection Rate : 0.4062
## Detection Prevalence : 0.5061
## Balanced Accuracy : 0.8058
##
## 'Positive' Class : 1
##
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 1, case = 0
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.885826558265583"
auc_30m <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
specificity = rev(roc_curve_rf$specificities),
sensitivity = rev(roc_curve_rf$sensitivities)
)
ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
geom_line(color = "blue") +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Random Forest_30m", x = "Specificity", y = "Sensitivity") +
theme_minimal()
# Get variable importance
importance_rf <- importance(rf_model_30m )
importance_df <- data.frame(Variable = rownames(importance_rf),
MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])
# Plot variable importance using ggplot2
p2 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
geom_bar(stat = "identity", fill = "blue") +
coord_flip() +
labs(title = "Variable Importance Plot_30m", x = "Variable", y = "Mean Decrease in Gini") +
theme_minimal()
p2
library(patchwork)
p1 + p2
auc_comparison <- cbind(auc_1m, auc_30m)
colnames(auc_comparison) <- c("auc_1m", "auc_30m")
auc_comparison%>%
kbl()%>%
kable_styling()
auc_1m | auc_30m |
---|---|
0.9154619 | 0.8858266 |
Both the model shows that 1m DEM based model gives better accuracy.
Now we will take a little different approach. Previously we used DEM based factors. Now we will add some non-DEM based factors
Distance from Road (DR)
Normalized Difference Vegetation Index (NDVI)
Since in previous modeling we did not include these variables we will create two sperate data frames with these variables
# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS: NAD83
# Convert to Data Frames
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
library(tidyverse)
library(dplyr)
landslides <- landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, V, TWI, SPI, TRI, Imprevious , Change, Geology, Rain, Distance_R, NDVI, Soil, Landuse)
Non_landslides <- Non_landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, CID, TWI, SPI, TRI, Imprevious, LULC_chang, Geology, Rain, Distance_R, NDVI, Soil, Landuse)
# Rename columns for consistency
landslides <- rename(landslides, Slope = Slope_12, PL = Plan, PR = Pr, Imp= Imprevious, Rainfall= Rain, DR= Distance_R )
Non_landslides <- rename(Non_landslides, Slope = Slope_12, PL = Plan, PR = Pr, V = CID, Imp= Imprevious, Rainfall= Rain, DR= Distance_R, Change=LULC_chang)
##Combine Datasets
landslides$V <- as.character(landslides$V)
Non_landslides$V <- as.character(Non_landslides$V)
Data <- rbind(landslides, Non_landslides)
Data_m <- Data
## Drop the Missing values
Data_m <- Data_m %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#write.csv(Data_m, "Data_All_1m.csv", row.names = FALSE)
#Scale the Variables
# Load necessary library
library(dplyr)
# Function to scale variables to z-scores
scale_to_z <- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
Data_m <- Data_m %>%
mutate(across(.cols = -c(V, Soil, Landuse, Geology, Change), .fns = scale_to_z))
Data_m$Soil <- as.factor(Data_m$Soil )
Data_m$Geology <- as.factor(Data_m$Geology )
Data_m$Landuse <- as.factor(Data_m$Landuse )
Data_m$Change <- as.factor(Data_m$Change )
write.csv(Data_m, "Data_All_1m.csv", row.names = FALSE)
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
Data_m$V <- as.factor(Data_m$V)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]
# Load necessary libraries
library(pROC)
library(ggplot2)
library(gt)
Data_m$V <- as.factor(Data_m$V)
# Train a logistic regression model
logistic_model_1m_all <- glm(V ~ ., data = trainData, family = binomial)
# Summarize the model
summary(logistic_model_1m_all )
##
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.932e-01 3.371e-01 -1.759 0.078501 .
## Slope 2.129e+00 8.098e-02 26.292 < 2e-16 ***
## Elevation -8.920e-02 8.369e-02 -1.066 0.286519
## Aspect -3.406e-01 5.262e-02 -6.473 9.60e-11 ***
## PL -1.137e-01 7.597e-02 -1.497 0.134510
## PR -1.504e-01 7.318e-02 -2.055 0.039904 *
## TWI 3.265e-01 5.636e-02 5.793 6.91e-09 ***
## SPI 8.032e-03 3.486e-02 0.230 0.817786
## TRI 2.174e-01 6.272e-02 3.466 0.000528 ***
## Imp -3.075e-01 1.480e-01 -2.078 0.037736 *
## Change3 4.280e-01 5.332e-01 0.803 0.422228
## Change8 -2.899e-01 4.121e-01 -0.703 0.481759
## Change10 -2.911e+00 1.045e+03 -0.003 0.997778
## Change11 3.253e-01 2.392e-01 1.360 0.173909
## Geology2 -9.832e-01 2.137e-01 -4.600 4.22e-06 ***
## Geology3 5.798e-01 2.181e-01 2.659 0.007839 **
## Geology4 -1.397e+00 3.065e-01 -4.557 5.18e-06 ***
## Geology5 -1.570e+01 4.112e+02 -0.038 0.969552
## Geology6 -1.884e-01 1.467e-01 -1.284 0.198968
## Geology7 -2.578e-01 2.012e-01 -1.281 0.200215
## Rainfall -2.884e-01 6.605e-02 -4.367 1.26e-05 ***
## DR 8.459e-02 5.625e-02 1.504 0.132640
## NDVI 2.590e-01 6.456e-02 4.012 6.02e-05 ***
## Soil2 -7.975e-01 1.577e+00 -0.506 0.613104
## Soil3 -3.432e-01 7.770e-01 -0.442 0.658678
## Soil4 -1.097e-01 3.020e-01 -0.363 0.716556
## Soil5 -3.098e+00 1.166e+00 -2.656 0.007913 **
## Soil6 4.570e-01 3.011e-01 1.518 0.129079
## Soil7 6.673e-02 2.999e-01 0.223 0.823903
## Landuse22 1.239e+00 5.853e-01 2.117 0.034281 *
## Landuse23 2.139e+00 1.142e+00 1.873 0.061136 .
## Landuse24 3.612e+00 1.958e+00 1.844 0.065145 .
## Landuse31 -1.072e+01 6.076e+02 -0.018 0.985928
## Landuse41 9.666e-01 2.016e-01 4.795 1.63e-06 ***
## Landuse42 -3.524e-01 4.419e-01 -0.798 0.425153
## Landuse43 -1.202e-01 2.305e-01 -0.522 0.601931
## Landuse52 2.268e-01 6.032e-01 0.376 0.706971
## Landuse71 -7.449e-01 7.888e-01 -0.944 0.345002
## Landuse81 -7.090e-01 2.728e-01 -2.599 0.009362 **
## Landuse90 -1.161e+01 1.455e+03 -0.008 0.993633
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4968.5 on 3583 degrees of freedom
## Residual deviance: 2551.2 on 3544 degrees of freedom
## AIC: 2631.2
##
## Number of Fisher Scoring iterations: 14
exp(coef(logistic_model_1m))
## (Intercept) Slope Elevation Aspect PL PR
## 0.9689135 9.1559973 0.7290574 0.6821078 0.8708116 0.8428334
## TWI SPI TRI
## 1.3233170 1.0393155 1.1688608
library(caret)
pred_prob_train <- predict(logistic_model_1m , newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)
# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)
##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
## pred_class_train
## 0 1
## 0 1488 300
## 1 256 1540
print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.844866071428571"
# Make predictions on the test set
pred_prob_test <- predict(logistic_model_1m, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)
# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
## pred_class_test
## 0 1
## 0 375 71
## 1 65 384
# Make predictions on the test set
pred_prob_test <- predict(logistic_model_1m, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)
# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
## pred_class_test
## 0 1
## 0 375 71
## 1 65 384
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.84804469273743"
accuracy_test_1m_all <- as.data.frame(accuracy_test)
# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)
auc_1m <- auc_value
auc_1m_All <- as.data.frame(auc_value)
print(paste("AUC:", auc_value))
## [1] "AUC: 0.899138094619833"
# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source
## `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS: NAD83
library(tidyverse)
library(dplyr)
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
landslides_30 <- landslides %>% select(Slope_1_14, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, V, TWI__3, SPI_3, TRI_12, Imprevious , Change, Geology, Rain, Distance_R, NDVI, Soil, Landuse)
Non_landslides_30 <- Non_landslides %>% select(Slope, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, CID, TWI__3, SPI_3, TRI_12, Imprevious, LULC_chang, Geology, Rain, Distance_R, NDVI, Soil, Landuse)
# Rename columns for consistency
landslides_30 <- rename(landslides_30, Slope = Slope_1_14, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, Imp= Imprevious, Rainfall= Rain, DR= Distance_R)
Non_landslides_30 <- rename(Non_landslides_30, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, V = CID, Imp= Imprevious, Rainfall= Rain, DR= Distance_R, Change=LULC_chang)
landslides_30$V <- as.character(landslides_30$V)
Non_landslides_30$V <- as.character(Non_landslides_30$V)
Data_30 <- rbind(landslides_30, Non_landslides_30)
Data_m_30_All <- Data_30
Data_m_30_All <- Data_m_30_All %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Scale variables to z-scores
scale_to_z <- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
Data_m_30_All <- Data_m_30_All %>% mutate(across(.cols = -V, .fns = scale_to_z))
Data_m_30_All$V <- as.factor(Data_m_30_All$V)
Data_m_30_All$Soil <- as.factor(Data_m_30_All$Soil )
Data_m_30_All$Geology <- as.factor(Data_m_30_All$Geology )
Data_m_30_All$Landuse <- as.factor(Data_m_30_All$Landuse )
Data_m_30_All$Change <- as.factor(Data_m_30_All$Change )
write.csv(Data_m_30_All, "Data_All_30m.csv", row.names = FALSE)
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30_All$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m_30_All[trainIndex, ]
testData <- Data_m_30_All[-trainIndex, ]
# Load necessary libraries
library(pROC)
library(ggplot2)
library(gt)
# Train a logistic regression model
logistic_model_30m_All <- glm(V ~ ., data = trainData, family = binomial)
# Summarize the model
summary(logistic_model_30m_All)
##
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.78817 2399.54468 -0.004 0.99708
## Slope 1.32015 0.06714 19.663 < 2e-16 ***
## Elevation -0.35077 0.07901 -4.439 9.03e-06 ***
## Aspect -0.38827 0.04792 -8.102 5.40e-16 ***
## PL -0.17453 0.06316 -2.763 0.00572 **
## PR -0.23243 0.05904 -3.937 8.25e-05 ***
## TWI -0.35262 0.08759 -4.026 5.68e-05 ***
## SPI 0.05814 0.03468 1.676 0.09365 .
## TRI 0.43022 0.05471 7.864 3.73e-15 ***
## Imp -0.15058 0.12003 -1.255 0.20965
## Change0.132491109316288 -2.39375 3393.46863 -0.001 0.99944
## Change0.547077872218507 0.77116 0.46275 1.666 0.09562 .
## Change2.6200116867296 -0.21569 0.36976 -0.583 0.55968
## Change3.44918521253404 -1.42871 2118.78452 -0.001 0.99946
## Change3.86377197543626 0.25914 0.21203 1.222 0.22165
## Geology-0.572883768189359 -1.59029 0.19637 -8.099 5.56e-16 ***
## Geology-0.141785862467707 -0.16160 0.19010 -0.850 0.39528
## Geology0.289312043253945 -1.69809 0.30485 -5.570 2.54e-08 ***
## Geology0.720409948975597 -17.63793 797.19173 -0.022 0.98235
## Geology1.15150785469725 -0.63433 0.13410 -4.730 2.24e-06 ***
## Geology1.5826057604189 -0.14733 0.18648 -0.790 0.42950
## Rainfall -0.31314 0.06118 -5.119 3.08e-07 ***
## DR 0.15756 0.05118 3.079 0.00208 **
## NDVI 0.18499 0.06035 3.065 0.00218 **
## Soil-2.17319987284062 -2.74623 1.73956 -1.579 0.11441
## Soil-1.54568057601141 -1.12567 0.75167 -1.498 0.13424
## Soil-0.918161279182197 -0.39696 0.27291 -1.455 0.14580
## Soil-0.290641982352987 -16.05199 287.60091 -0.056 0.95549
## Soil0.336877314476222 0.04945 0.27072 0.183 0.85507
## Soil0.964396611305432 -0.34253 0.27022 -1.268 0.20493
## Landuse-1.48123028866035 8.88355 2399.54467 0.004 0.99705
## Landuse-1.41404193684783 9.43539 2399.54473 0.004 0.99686
## Landuse-1.34685358503532 10.41828 2399.54486 0.004 0.99654
## Landuse-1.2796652332228 10.61164 2399.54525 0.004 0.99647
## Landuse-0.809346770535189 -4.42004 2743.17837 -0.002 0.99871
## Landuse-0.137463252410028 9.73041 2399.54467 0.004 0.99676
## Landuse-0.0702749005975118 8.75122 2399.54471 0.004 0.99709
## Landuse-0.00308654878499566 8.83675 2399.54467 0.004 0.99706
## Landuse0.60160861752765 9.36249 2399.54471 0.004 0.99689
## Landuse1.87818730196546 7.95211 2399.54480 0.003 0.99736
## Landuse2.55007082009062 8.44723 2399.54468 0.004 0.99719
## Landuse3.15476598640326 -2.44852 3393.46863 -0.001 0.99942
## Landuse3.49070774546584 -2.94774 3393.46869 -0.001 0.99931
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4997.6 on 3604 degrees of freedom
## Residual deviance: 2982.0 on 3562 degrees of freedom
## AIC: 3068
##
## Number of Fisher Scoring iterations: 15
exp(coef(logistic_model_30m))
## (Intercept) Slope Elevation Aspect PL PR
## 0.9541291 3.8634647 0.5613606 0.6896913 0.7769459 0.7579835
## TWI SPI TRI
## 0.6405158 1.1383086 1.7664896
library(caret)
pred_prob_train <- predict(logistic_model_30m_All , newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)
# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)
##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
## pred_class_train
## 0 1
## 0 1451 350
## 1 311 1493
print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.816643550624133"
# Make predictions on the test set
pred_prob_test <- predict(logistic_model_30m_All, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)
# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
## pred_class_test
## 0 1
## 0 366 84
## 1 89 362
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.807991120976693"
accuracy_test_30m_All <- as.data.frame(accuracy_test)
# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)
auc_30m_All <- as.data.frame(auc_value)
print(paste("AUC:", auc_value))
## [1] "AUC: 0.897161862527716"
Data_m <- read.csv("Data_All_1m.csv")
Data_m$V <- as.factor(Data_m$V )
Data_m$Soil <- as.factor(Data_m$Soil )
Data_m$Geology <- as.factor(Data_m$Geology )
Data_m$Landuse <- as.factor(Data_m$Landuse )
Data_m$Change <- as.factor(Data_m$Change )
#Train and Test Datasets
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
Data_m$V <- as.factor(Data_m$V)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]
# Train a Random Forest model
rf_model_1m_All <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
# Summarize the model
print(rf_model_1m_All)
##
## Call:
## randomForest(formula = V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 12.83%
## Confusion matrix:
## 0 1 class.error
## 0 1550 238 0.1331096
## 1 222 1574 0.1236080
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_1m_All, newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_1m_All, newdata = testData, type = "response")
# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 393 54
## 1 53 395
##
## Accuracy : 0.8804
## 95% CI : (0.8574, 0.901)
## No Information Rate : 0.5017
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.7609
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.8812
## Specificity : 0.8797
## Pos Pred Value : 0.8792
## Neg Pred Value : 0.8817
## Prevalence : 0.4983
## Detection Rate : 0.4391
## Detection Prevalence : 0.4994
## Balanced Accuracy : 0.8804
##
## 'Positive' Class : 0
##
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.945938657904461"
auc_1m_All <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
specificity = rev(roc_curve_rf$specificities),
sensitivity = rev(roc_curve_rf$sensitivities)
)
ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
geom_line(color = "blue") +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Random Forest", x = "Specificity", y = "Sensitivity") +
theme_minimal()
# Get variable importance
importance_rf <- importance(rf_model_1m_All )
importance_df <- data.frame(Variable = rownames(importance_rf),
MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])
# Plot variable importance using ggplot2
p3 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
geom_bar(stat = "identity", fill = "blue") +
coord_flip() +
labs(title = "Variable Importance Plot_1m", x = "Variable", y = "Mean Decrease in Gini") +
theme_minimal()
p3
Data_m_30 <- read.csv("Data_All_30m")
Data_m_30$V <- as.factor(Data_m_30$V )
Data_m_30$Soil <- as.factor(Data_m_30$Soil )
Data_m_30$Geology <- as.factor(Data_m_30$Geology )
Data_m_30$Landuse <- as.factor(Data_m_30$Landuse )
Data_m_30Change <- as.factor(Data_m_30$Change )
# Load necessary libraries
library(caret)
library(dplyr)
# Set a seed for reproducibility
set.seed(123)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30$V, p = 0.8, list = FALSE)
# Split the data into training and test sets
trainData <- Data_m_30[trainIndex, ]
testData <- Data_m_30[-trainIndex, ]
# Train a Random Forest model
rf_model_30m_All <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
# Summarize the model
print(rf_model_30m_All )
##
## Call:
## randomForest(formula = V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 14.51%
## Confusion matrix:
## 0 1 class.error
## 0 1529 272 0.1510272
## 1 251 1553 0.1391353
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_30m_All , newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_30m_All , newdata = testData, type = "response")
# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 393 59
## 1 57 392
##
## Accuracy : 0.8713
## 95% CI : (0.8476, 0.8924)
## No Information Rate : 0.5006
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.7425
##
## Mcnemar's Test P-Value : 0.926
##
## Sensitivity : 0.8733
## Specificity : 0.8692
## Pos Pred Value : 0.8695
## Neg Pred Value : 0.8731
## Prevalence : 0.4994
## Detection Rate : 0.4362
## Detection Prevalence : 0.5017
## Balanced Accuracy : 0.8713
##
## 'Positive' Class : 0
##
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.943254496181325"
auc_30m_All <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
specificity = rev(roc_curve_rf$specificities),
sensitivity = rev(roc_curve_rf$sensitivities)
)
ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
geom_line(color = "blue") +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
labs(title = "ROC Curve for Random Forest_30m", x = "Specificity", y = "Sensitivity") +
theme_minimal()
# Get variable importance
importance_rf <- importance(rf_model_30m_All )
importance_df <- data.frame(Variable = rownames(importance_rf),
MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])
# Plot variable importance using ggplot2
p4 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
geom_bar(stat = "identity", fill = "blue") +
coord_flip() +
labs(title = "Variable Importance Plot_30m", x = "Variable", y = "Mean Decrease in Gini") +
theme_minimal()
p4
library(patchwork)
p1 + p2+p3+p4
Lets compare all models.
LR_1m_DEM Only | LR_1m_All Factors | LR_30m _DEM Only | LR_30m All Factors | RF_1m_DEM Only | RF_1m_All Factors | RF_30m DEM Only | RF_30m All Factors | |
---|---|---|---|---|---|---|---|---|
Accuracy | 0.845 | 0.833 | 0.783 | 0.808 | 0.834 | 0.880 | 0.806 | 0.877 |
Area Under the Curve (AUC) | 0.899 | 0.915 | 0.857 | 0.897 | 0.915 | 0.946 | 0.888 | 0.943 |