Goal

Landslide susceptibility refers to the spatial probability of landslides occurring over a specific area. It can predict where a landslide may occur, but not when, so it remains a spatial probability. Landslide susceptibility mapping requires accurate locations of past landslides. The task is a classic classification problem in which landslide locations serve as presence data and non-landslide locations, randomly selected across the study area, serve as absence data. Selecting non-landslide locations requires extensive data exploration; this link provides examples of some absence-data sampling techniques. Predictors for the classification algorithm will be derived from a digital elevation model (DEM). Specifically, we will use eight DEM-based variables, referred to as causal factors in landslide susceptibility mapping. The accuracy of the landslide susceptibility map depends not only on the accurate identification of landslide locations but also on the quality of the causal factors. Since these causal factors are derived from the DEM, using a high-quality DEM can significantly enhance the map's accuracy. This study will compare the effectiveness of a 1-meter resolution DEM with that of the Shuttle Radar Topography Mission (SRTM) DEM, which has a 30-meter resolution.
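A minimal sketch of one absence-sampling strategy (random points across the study area, kept a minimum distance from any mapped landslide) is shown below; the study-area polygon Watauga_County.shp, the 200 m exclusion distance, and the candidate sample size are illustrative assumptions, not part of the original workflow.

library(sf)

county    <- st_read("Watauga_County.shp")        # hypothetical study-area polygon
slide_pts <- st_read("Landslides_Points.shp")     # mapped landslide points

set.seed(123)
# Draw candidate points, drop any within 200 m of a mapped landslide,
# then keep as many as there are landslide points so the classes stay balanced.
candidates <- st_sample(county, size = 5000)
too_close  <- lengths(st_is_within_distance(candidates, slide_pts, dist = 200)) > 0
absence    <- st_sf(geometry = candidates[!too_close][seq_len(nrow(slide_pts))])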

Variables or Causal Factors

Eight DEM-based variables will be used (a sketch of deriving several of them in R follows the list):

  1. Elevation

  2. Slope

  3. Aspect

  4. Plan Curvature

  5. Profile Curvature

  6. Topographic Wetness Index (TWI)

  7. Stream Power Index (SPI)

  8. Terrain Ruggedness Index (TRI)
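A minimal sketch of deriving several of these factors in R with the terra package, assuming a projected 1 m DEM (dem_1m.tif) and a flow-accumulation raster (flowacc_1m.tif) prepared beforehand; the file names are illustrative, and plan and profile curvature are left to a GIS tool, as was done in ArcGIS for this study.

library(terra)

dem  <- rast("dem_1m.tif")                 # hypothetical projected DEM (meters)
facc <- rast("flowacc_1m.tif")             # hypothetical flow-accumulation raster

slope_deg <- terrain(dem, v = "slope",  unit = "degrees")
aspect    <- terrain(dem, v = "aspect", unit = "degrees")
tri       <- terrain(dem, v = "TRI")       # terrain ruggedness index

# TWI and SPI from the specific catchment area and slope (in radians)
cell <- res(dem)[1]
sca  <- (facc + 1) * cell                  # specific catchment area
beta <- terrain(dem, v = "slope", unit = "radians")
twi  <- log(sca / tan(beta))               # topographic wetness index
spi  <- sca * tan(beta)                    # stream power index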

Install and Load Required Packages

library(sf)
library(tidyverse)
library(randomForest)

library(caret)
library(pROC)
library(e1071)

Upload and Convert Shape Files

We have two shapefiles, both point layers: one shows the locations of landslides and the other the locations of randomly generated non-landslide points. Our study area is Watauga County, North Carolina. We will read these two files into R and convert them into two separate data frames.

# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")

# Convert to Data Frames
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)

Check the Dataframes

# Print the heads of the Data Frames
#print(head(landslides))
#print(head(Non_landslides))
#
library(kableExtra)
my_data <- as_tibble(Landslides_Points)
my_data
## # A tibble: 2,257 × 68
##    OBJECTID Process_ID Project         County Quad  Col_Date Mvmnt_Date SM_State
##       <int>      <dbl> <chr>           <chr>  <chr> <chr>    <chr>      <chr>   
##  1    59592       3256 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  2    59593       3257 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  3    59594       3258 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  4    59595       3259 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  5    59596       3260 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  6    59598       3262 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  7    59599       3263 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  8    59600       3264 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
##  9    59601       3265 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
## 10    59602       3266 Landslide Haza… Watau… Gran… <NA>     1940/08/1… <NA>    
## # ℹ 2,247 more rows
## # ℹ 60 more variables: SM_Rate_1 <chr>, SM_Mat_1 <chr>, SM_Type_1 <chr>,
## #   SM_Rate_2 <chr>, SM_Mat_2 <chr>, SM_Type_2 <chr>, Slp_Config <chr>,
## #   Fatalities <int>, Confidence <int>, MapMethod <chr>, GlobalID <chr>,
## #   SHAPE_1 <chr>, Slope_DEM_ <dbl>, Slope <dbl>, Slope_1 <dbl>, DEM_30 <dbl>,
## #   Slope_30 <dbl>, Elevation <dbl>, Aspect <dbl>, Plan <dbl>, Pr <dbl>,
## #   TWI <dbl>, SPI <dbl>, Slope_12 <dbl>, LS <dbl>, TRI <dbl>, V <chr>, …
my_data1 <- as_tibble(Non_Landslides_Points)
my_data1
## # A tibble: 2,257 × 36
##     OID_   CID Elevation   Plan     Pr Aspect Slope_12     SPI   TRI   TWI  Soil
##    <dbl> <dbl>     <dbl>  <dbl>  <dbl>  <dbl>    <dbl>   <dbl> <dbl> <dbl> <dbl>
##  1     0     0     3565.  3.11  -1.08   333.      9.13  0.154  0.517  1.83     6
##  2     0     0     3562. 14.5    1.33    91.2    38.3   6.72   0.578  2.18     6
##  3     0     0     3660. -4.13   2.22    91.4    20.4   1.85   0.524  2.60     6
##  4     0     0     3646.  3.82  -4.87   333.      8.97  0.152  0.434  1.85     6
##  5     0     0     4565.  4.01  -0.951   76.8    13.5   0.232  0.536  1.43     6
##  6     0     0     4137. -1.35   0.179  343.     17.6  34.0    0.496  5.84     6
##  7     0     0     4260.  0.194 -0.921   80.4     5.78  0.0967 0.351  2.29     6
##  8     0     0     4126.  1.62   4.13   154.     12.8   0.439  0.397  2.18     6
##  9     0     0     4447.  2.04  -6.32   177.     17.7   0.315  0.516  1.14     3
## 10     0     0     3995.  2.06  -1.03    30.9    11.0   0.188  0.535  1.64     4
## # ℹ 2,247 more rows
## # ℹ 25 more variables: Aspect_1 <dbl>, PL <dbl>, PR_1 <dbl>, Geology <dbl>,
## #   NDVI <dbl>, Rainfall <dbl>, Slope_1 <dbl>, SPI_1 <dbl>, TRI_1 <dbl>,
## #   TWI_1 <dbl>, Landuse <dbl>, LULC_chang <dbl>, Imprevious <dbl>, Rain <dbl>,
## #   Distance_R <dbl>, Aspect_DEM <dbl>, TRI_12 <dbl>, DEM_Projec <dbl>,
## #   Hillshade <dbl>, Plan_curva <dbl>, Profile_cu <dbl>, Slope <dbl>,
## #   SPI_3 <dbl>, TWI__3 <dbl>, geometry <POINT [°]>

These two data frames contain several unnecessary columns. I used ArcGIS to derive the causal factors from the DEM, and then the Extract Multi Values to Points tool was used to extract the cell values of the causal factors at each landslide and non-landslide point. We need one target variable (V) and eight causal factors (Elevation, Slope, Aspect, Plan Curvature, Profile Curvature, TWI, TRI, SPI). We will select these variables and, since they carry different names in the two data frames, rename them for consistency. An R alternative to the ArcGIS extraction step is sketched below.
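A minimal sketch of that extraction step in R with terra::extract(), assuming the causal factors are available as rasters (the file names below are illustrative):

library(terra)
library(sf)

# Hypothetical single-band rasters of the causal factors (add the remaining layers as needed)
factors <- c(rast("elevation_1m.tif"), rast("slope_1m.tif"), rast("twi_1m.tif"))
pts     <- st_read("Landslides_Points.shp")

vals <- terra::extract(factors, vect(pts))                  # one row per point; first column is an ID
landslides_vals <- cbind(st_drop_geometry(pts), vals[, -1])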

Select Required Columns for 1m DEM

library(tidyverse)
library(dplyr)
landslides <- landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, V, TWI, SPI, TRI)
Non_landslides <- Non_landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, CID, TWI, SPI, TRI)

# Rename columns for consistency
landslides <- rename(landslides, Slope = Slope_12, PL = Plan, PR = Pr)
Non_landslides <- rename(Non_landslides, Slope = Slope_12, PL = Plan, PR = Pr, V = CID)
##Lets show them using tables
library(DT)
dat1 <- landslides
dat2 <- Non_landslides
datatable(
data = landslides,
caption = "Landslides",
filter = "top",
class = 'cell-border stripe'
)
datatable(
data = Non_landslides,
caption = "Non_Landslides",
filter = "top",
class = 'cell-border stripe'
)
# library(kableExtra)
# dat1
# kable(dat1)%>%kable_styling(latex_options = 'sriped')
# dat <- landslides[1:10,]
# knitr::kable(dat, caption= "Landslides")
# dat1 <- Non_Landslides_Points[1:10,]
# knitr::kable(dat1, caption= "Non_Landslides")

Combine Data Frames and Clean Data

Now we will combine the two data frames into a single data frame. But before that, we need to check the variable types.

tab1 <- summary(landslides)
tab2 <- summary(Non_landslides)
kable(tab1, caption="Summary Landslides")
Summary Landslides
Slope Elevation Aspect PL PR V TWI SPI TRI
Min. :-9999.00 Min. :-9999 Min. :-9999.00 Min. :-9999.000 Min. :-9999.000 Length:2257 Min. :-9999.000 Min. : -9999.00 Min. :-9999.000
1st Qu.: 25.61 1st Qu.: 2830 1st Qu.: 90.24 1st Qu.: -4.176 1st Qu.: -3.606 Class :character 1st Qu.: 1.172 1st Qu.: 1.15 1st Qu.: 0.476
Median : 30.57 Median : 3134 Median : 126.64 Median : -0.528 Median : 0.261 Mode :character Median : 2.541 Median : 4.37 Median : 0.500
Mean : -18.56 Mean : 3012 Mean : 85.47 Mean : -49.398 Mean : -48.531 NA Mean : -45.738 Mean : 186.77 Mean : -12.789
3rd Qu.: 35.32 3rd Qu.: 3437 3rd Qu.: 172.47 3rd Qu.: 2.988 3rd Qu.: 3.975 NA 3rd Qu.: 4.526 3rd Qu.: 34.64 3rd Qu.: 0.524
Max. : 72.29 Max. : 5162 Max. : 357.53 Max. : 44.670 Max. : 81.796 NA Max. : 14.860 Max. :232744.00 Max. : 0.842
kable(tab2, caption="Summary Non_Landslides")
Summary Non_Landslides
Slope Elevation Aspect PL PR V TWI SPI TRI
Min. :-9999.000 Min. :-9999 Min. :-9999.00 Min. :-9999.000 Min. :-9999.000 Min. :0 Min. :-9999.000 Min. : -9999.00 Min. :-9999.000
1st Qu.: 7.986 1st Qu.: 3056 1st Qu.: 91.27 1st Qu.: -1.134 1st Qu.: -1.390 1st Qu.:0 1st Qu.: 1.946 1st Qu.: 0.30 1st Qu.: 0.469
Median : 14.085 Median : 3335 Median : 185.17 Median : 0.012 Median : 0.000 Median :0 Median : 3.073 Median : 0.97 Median : 0.502
Mean : 5.951 Mean : 3345 Mean : 171.33 Mean : -8.904 Mean : -8.792 Mean :0 Mean : -18.668 Mean : 221.66 Mean : -65.956
3rd Qu.: 20.366 3rd Qu.: 3665 3rd Qu.: 268.38 3rd Qu.: 1.302 3rd Qu.: 1.411 3rd Qu.:0 3rd Qu.: 4.592 3rd Qu.: 5.02 3rd Qu.: 0.532
Max. : 59.009 Max. : 5295 Max. : 359.78 Max. : 21.310 Max. : 68.018 Max. :0 Max. : 19.732 Max. :270285.00 Max. : 0.884

The summaries show that the target variable V is in character format, and we will convert it to a factor. All variables have missing data coded as -9999, which we will drop after joining the two data frames.

Combine the Dataframes

landslides$V <- as.character(landslides$V)
Non_landslides$V <- as.character(Non_landslides$V)
Data <- rbind(landslides, Non_landslides)
Data_m <- Data

Drop the missing values labeled as -9999

Data_m <- Data_m %>% filter(across(-V, ~ . != -9999))
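dplyr now recommends if_any() or if_all() in place of across() inside filter() (that is what the deprecation warning further down refers to); an equivalent, non-deprecated version of this filter keeps only rows where every causal factor differs from -9999:

Data_m <- Data_m %>% filter(if_all(-V, ~ . != -9999))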

Now let's see the summary again.

##Lets show them using tables
library(DT)
dat3 <- summary(Data_m)

kable(dat3, caption="Summary of Data")
Summary of Data
Slope Elevation Aspect PL PR V TWI SPI TRI
Min. : 0.01862 Min. :1413 Min. : 0.00 Min. :-53.1819 Min. :-71.61270 Length:4486 Min. :-1.033 Min. : -66.25 Min. :0.0186
1st Qu.:13.36940 1st Qu.:2933 1st Qu.: 91.27 1st Qu.: -2.3481 1st Qu.: -2.20290 Class :character 1st Qu.: 1.592 1st Qu.: 0.61 1st Qu.:0.4736
Median :23.40960 Median :3246 Median :144.91 Median : -0.1317 Median : 0.08716 Mode :character Median : 2.854 Median : 2.08 Median :0.5013
Mean :22.62100 Mean :3217 Mean :157.74 Mean : -0.3573 Mean : 0.13419 NA Mean : 3.239 Mean : 234.45 Mean :0.5012
3rd Qu.:31.42370 3rd Qu.:3556 3rd Qu.:217.14 3rd Qu.: 1.8725 3rd Qu.: 2.36965 NA 3rd Qu.: 4.548 3rd Qu.: 13.41 3rd Qu.:0.5280
Max. :72.29000 Max. :5295 Max. :359.78 Max. : 44.6705 Max. : 81.79600 NA Max. :19.732 Max. :270285.00 Max. :0.8844

Scale the Variables to Z-Score

## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
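The scaling chunk itself is not echoed above; only the warning from the earlier filter() call appears. Presumably it mirrored the 30 m workflow shown later in this document, scaling the eight causal factors and converting V to a factor, roughly:

# Scale variables to z-scores (sketch mirroring the 30 m code below)
scale_to_z <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

Data_m <- Data_m %>% mutate(across(.cols = -V, .fns = scale_to_z))
Data_m$V <- as.factor(Data_m$V)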

Split Data into Train and Test Sets

# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)

# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]

# Check the distribution of target variable in training and test sets
dat4 <- table(trainData$V)
dat5 <- table(testData$V)
kable(dat4)
Var1 Freq
0 1792
1 1797
kable(dat5)
Var1 Freq
0 448
1 449

So in our training dataset there are 1792 absence and 1797 presence points, whereas in the test dataset there are 448 absence and 449 presence points. Now we will train our model.

Logistic Regression Model

# Load necessary libraries
library(pROC)
library(ggplot2)

library(gt)

# Train a logistic regression model
logistic_model_1m <- glm(V ~ ., data = trainData, family = binomial)

# Summarize the model
summary(logistic_model_1m)
## 
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.03158    0.04773  -0.662   0.5082    
## Slope        2.21441    0.07486  29.580  < 2e-16 ***
## Elevation   -0.31600    0.05157  -6.128 8.89e-10 ***
## Aspect      -0.38257    0.04935  -7.753 9.00e-15 ***
## PL          -0.13833    0.07261  -1.905   0.0568 .  
## PR          -0.17099    0.07159  -2.388   0.0169 *  
## TWI          0.28014    0.05302   5.284 1.27e-07 ***
## SPI          0.03856    0.06065   0.636   0.5249    
## TRI          0.15603    0.06062   2.574   0.0101 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4975.4  on 3588  degrees of freedom
## Residual deviance: 2831.4  on 3580  degrees of freedom
## AIC: 2849.4
## 
## Number of Fisher Scoring iterations: 5
exp(coef(logistic_model_1m))
## (Intercept)       Slope   Elevation      Aspect          PL          PR 
##   0.9689135   9.1559973   0.7290574   0.6821078   0.8708116   0.8428334 
##         TWI         SPI         TRI 
##   1.3233170   1.0393155   1.1688608
#kable(l$coefficients, caption="g")

Our goal is not to interpret the logistic regression results; rather, we are concerned with prediction. Let's check the prediction accuracy.
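Because this assessment is repeated for several models below, a small helper function could wrap it. The function is hypothetical (not part of the original workflow) and is shown only as a sketch:

# Hypothetical helper: confusion matrix and accuracy for a fitted glm at a 0.5 cutoff
assess_glm <- function(model, data, threshold = 0.5) {
  prob <- predict(model, newdata = data, type = "response")
  pred <- ifelse(prob > threshold, 1, 0)
  cm   <- table(observed = data$V, predicted = pred)
  list(confusion = cm, accuracy = sum(diag(cm)) / sum(cm))
}
# Example use: assess_glm(logistic_model_1m, testData)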

Accuracy Assessment for 1 meter DEM

# Make predictions on the training set

library(dplyr)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(caret)

pred_prob_train <- predict(logistic_model_1m , newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)

# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)

##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
##    pred_class_train
##        0    1
##   0 1487  305
##   1  251 1546
 print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.84508219559766"
#kable(t)
# Make predictions on the test set
pred_prob_test <- predict(logistic_model_1m, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)

# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
##    pred_class_test
##       0   1
##   0 382  66
##   1  71 378
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.84726867335563"
accuracy_test_1m <- as.data.frame(accuracy_test)

# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)
auc_1m <- auc_value 
print(paste("AUC:", auc_value))
## [1] "AUC: 0.898628897550111"
# Plot ROC curve using ggplot2
roc_data <- data.frame(
  specificity = rev(roc_curve$specificities),
  sensitivity = rev(roc_curve$sensitivities)
)

ggplot(roc_data, aes(x = specificity, y = sensitivity)) +
  geom_line(color = "blue") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Logistic Regression_1m", x = "Specificity", y = "Sensitivity") +
  theme_minimal()

write.csv(Data_m, "Data_1m.csv", row.names = FALSE)

The logistic regression model gives satisfactory accuracy with the 1m DEM-based causal factors. Now we will work with the SRTM (30 m) DEM-based causal factors and check the prediction accuracy. This will help us understand the role of DEM resolution and accuracy in landslide prediction over an area. We will repeat the whole process; only the dataset will be different.

Select Required Columns for SRTM (30m) DEM

# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS:  NAD83
library(tidyverse)
library(dplyr)
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
landslides_30 <- landslides %>% select(Slope_1_14, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, V, TWI__3, SPI_3, TRI_12)
Non_landslides_30 <- Non_landslides %>% select(Slope, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, CID, TWI__3, SPI_3, TRI_12)

# Rename columns for consistency
landslides_30 <- rename(landslides_30, Slope = Slope_1_14, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12)
Non_landslides_30 <- rename(Non_landslides_30, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, V = CID)
##Lets show them using tables
library(DT)
dat1 <- landslides_30
dat2 <- Non_landslides_30
datatable(
data = landslides_30,
caption = "Landslides",
filter = "top",
class = 'cell-border stripe'
)
datatable(
data = Non_landslides_30,
caption = "Non_Landslides",
filter = "top",
class = 'cell-border stripe'
)

Combine the Dataframes and Clean Data

tab1 <- summary(landslides_30)
tab2 <- summary(Non_landslides_30)
kable(tab1, caption="Summary Landslides_30m")
Summary Landslides_30m
Slope Elevation Aspect PL PR V TWI SPI TRI
Min. : 2.07 Min. : 449.0 Min. :-9999.00 Min. :-3.11276 Min. :-3.21878 Length:2257 Min. :-3.4113 Min. : 0.0 Min. :0.2922
1st Qu.:15.21 1st Qu.: 876.0 1st Qu.: 88.23 1st Qu.:-0.31934 1st Qu.:-0.46307 Class :character 1st Qu.:-2.0951 1st Qu.: 0.0 1st Qu.:0.4547
Median :20.25 Median : 971.0 Median : 130.91 Median : 0.03593 Median :-0.07091 Mode :character Median :-1.5829 Median : 15.8 Median :0.5048
Mean :20.08 Mean : 950.7 Mean : 126.98 Mean : 0.03297 Mean :-0.08110 NA Mean :-1.4046 Mean : 1148.7 Mean :0.5088
3rd Qu.:24.74 3rd Qu.:1063.0 3rd Qu.: 176.63 3rd Qu.: 0.37393 3rd Qu.: 0.29913 NA 3rd Qu.:-0.9447 3rd Qu.: 44.1 3rd Qu.:0.5644
Max. :44.21 Max. :1586.0 Max. : 359.49 Max. : 3.60871 Max. : 2.86714 NA Max. :10.0097 Max. :2402323.8 Max. :0.7527
kable(tab2, caption="Summary Non_Landslides")
Summary Non_Landslides
Slope Elevation Aspect PL PR V TWI SPI TRI
Min. : 0.3552 Min. : 434 Min. :-9999.0 Min. :-2.04006 Min. :-2.39026 Min. :0 Min. :-2.9974 Min. : 0.0 Min. :0.1510
1st Qu.: 6.7917 1st Qu.: 936 1st Qu.: 85.6 1st Qu.:-0.22708 1st Qu.:-0.21649 1st Qu.:0 1st Qu.:-1.5306 1st Qu.: 0.0 1st Qu.:0.4013
Median :11.1627 Median :1022 Median : 175.2 Median : 0.00000 Median : 0.02460 Median :0 Median :-0.8667 Median : 12.1 Median :0.4701
Mean :11.7711 Mean :1030 Mean : 160.2 Mean : 0.01086 Mean : 0.05174 Mean :0 Mean :-0.1922 Mean : 2444.6 Mean :0.4697
3rd Qu.:15.9215 3rd Qu.:1126 3rd Qu.: 257.0 3rd Qu.: 0.24601 3rd Qu.: 0.32753 3rd Qu.:0 3rd Qu.: 0.4310 3rd Qu.: 48.6 3rd Qu.:0.5400
Max. :37.2150 Max. :1618 Max. : 359.5 Max. : 1.72111 Max. : 2.73880 Max. :0 Max. :12.0588 Max. :2588779.8 Max. :0.7704
landslides_30$V <- as.character(landslides_30$V)
Non_landslides_30$V <- as.character(Non_landslides_30$V)
Data_30 <- rbind(landslides_30, Non_landslides_30)
Data_m_30 <- Data_30

Drop the Missing Values Labeled as -9999

Data_m_30 <- Data_m_30 %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Store the Clean Dataset for 1m DEM and 30m DEM as CSV Files

We will store the clean 1m DEM and 30m DEM data as CSV files so that we can use them later without having to clean them again.

write.csv(Data_m, "Data_1m.csv", row.names = FALSE)
write.csv(Data_m_30, "Data_30m.csv", row.names = FALSE)

Scale the Variables to Z-Score

# Scale variables to z-scores
scale_to_z <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

Data_m_30 <- Data_m_30 %>% mutate(across(.cols = -V, .fns = scale_to_z))
Data_m_30$V <- as.factor(Data_m_30$V)

Split Data into Train and Test Sets

# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)

# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m_30[trainIndex, ]
testData <- Data_m_30[-trainIndex, ]

# Check the distribution of target variable in training and test sets
dat4 <- table(trainData$V)
dat5 <- table(testData$V)
kable(dat4)
Var1 Freq
0 1804
1 1804
kable(dat5)
Var1 Freq
0 450
1 451

Logistic Regression Model

# Load necessary libraries
library(pROC)
library(ggplot2)

library(gt)

# Train a logistic regression model
logistic_model_30m <- glm(V ~ ., data = trainData, family = binomial)

# Summarize the model
summary(logistic_model_30m)
## 
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.04696    0.04473  -1.050    0.294    
## Slope        1.35156    0.06116  22.099  < 2e-16 ***
## Elevation   -0.57739    0.04683 -12.329  < 2e-16 ***
## Aspect      -0.37151    0.04437  -8.373  < 2e-16 ***
## PL          -0.25238    0.05986  -4.216 2.48e-05 ***
## PR          -0.27709    0.05670  -4.887 1.02e-06 ***
## TWI         -0.44548    0.08233  -5.411 6.26e-08 ***
## SPI          0.12954    0.10531   1.230    0.219    
## TRI          0.56899    0.04986  11.411  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5001.8  on 3607  degrees of freedom
## Residual deviance: 3304.0  on 3599  degrees of freedom
## AIC: 3322
## 
## Number of Fisher Scoring iterations: 5
exp(coef(logistic_model_30m))
## (Intercept)       Slope   Elevation      Aspect          PL          PR 
##   0.9541291   3.8634647   0.5613606   0.6896913   0.7769459   0.7579835 
##         TWI         SPI         TRI 
##   0.6405158   1.1383086   1.7664896
#summary(logistic_model_1m)

Let's see the results of the two logistic regression models side by side.

a <- summary(logistic_model_30m)
b <- summary(logistic_model_1m)
re1 <- as.data.frame(a$coefficients)
re2 <- as.data.frame(b$coefficients)
re3 <- cbind(re1,re2)
colnames(re3) <- c("Estimate_30m", "Std.Error_30m", "Z Value_30m", "Pr(>|Z|_30m)",
                   "Estimate_1m", "Std.Error_1m", "Z Value_1m", "Pr(>|Z|_1m)")


kable(re3, caption = "Comparison")
Comparison
Estimate_30m Std.Error_30m Z Value_30m Pr(>|Z|_30m) Estimate_1m Std.Error_1m Z Value_1m Pr(>|Z|_1m)
(Intercept) -0.0469563 0.0447307 -1.049754 0.2938313 -0.0315799 0.0477302 -0.6616344 0.5082055
Slope 1.3515644 0.0611590 22.099196 0.0000000 2.2144091 0.0748610 29.5802686 0.0000000
Elevation -0.5773918 0.0468332 -12.328697 0.0000000 -0.3160028 0.0515664 -6.1280777 0.0000000
Aspect -0.3715111 0.0443703 -8.372967 0.0000000 -0.3825676 0.0493470 -7.7525966 0.0000000
PL -0.2523846 0.0598575 -4.216422 0.0000248 -0.1383296 0.0726106 -1.9050876 0.0567687
PR -0.2770936 0.0566990 -4.887102 0.0000010 -0.1709860 0.0715890 -2.3884399 0.0169201
TWI -0.4454815 0.0823265 -5.411154 0.0000001 0.2801415 0.0530191 5.2837881 0.0000001
SPI 0.1295435 0.1053086 1.230131 0.2186479 0.0385623 0.0606510 0.6358070 0.5249023
TRI 0.5689943 0.0498644 11.410838 0.0000000 0.1560296 0.0606154 2.5740911 0.0100504
# datatable(
# data = re3,
# caption = "Comparison 30m vs 1m",
# filter = "top",
# class = 'cell-border stripe'
# )

Our main goal is prediction, so we are not concerned with the coefficients here. But when we produce the landslide susceptibility map we will use only the significant causal factors; for both DEMs, slope, elevation, aspect, profile curvature, TWI, and TRI are significant. A sketch of producing the map from the fitted model follows.
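A minimal sketch of producing the susceptibility map itself with terra::predict(), assuming a multi-layer raster whose layer names match the model's predictors (Slope, Elevation, Aspect, PL, PR, TWI, SPI, TRI) and whose values were scaled the same way as the training data; the file names are illustrative:

library(terra)

factors_1m <- rast("factors_1m_scaled.tif")   # hypothetical stack of scaled causal-factor layers
susceptibility_1m <- terra::predict(factors_1m, logistic_model_1m, type = "response")
writeRaster(susceptibility_1m, "susceptibility_1m.tif", overwrite = TRUE)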

Accuracy Assessment for 30 meter DEM

library(caret)

pred_prob_train <- predict(logistic_model_30m , newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)

# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)

##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
##    pred_class_train
##        0    1
##   0 1410  394
##   1  370 1434
 print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.788248337028825"
 # Make predictions on the test set
pred_prob_test <- predict(logistic_model_30m, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)

# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
##    pred_class_test
##       0   1
##   0 345 105
##   1  91 360
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.782463928967814"
accuracy_test_30m <- as.data.frame(accuracy_test)

# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)

auc_30m <- as.data.frame(auc_value)
print(paste("AUC:", auc_value))
## [1] "AUC: 0.857031288494703"
# Plot ROC curve using ggplot2
roc_data <- data.frame(
  specificity = rev(roc_curve$specificities),
  sensitivity = rev(roc_curve$sensitivities)
)

ggplot(roc_data, aes(x = specificity, y = sensitivity)) +
  geom_line(color = "green") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Logistic Regression_30m", x = "Specificity", y = "Sensitivity") +
  theme_minimal()

Comparison of Accuracy Between 1m and 30m DEM

## Combine the four data frames created earlier and relabel the columns
library(flextable)
## 
## Attaching package: 'flextable'
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
## The following object is masked from 'package:purrr':
## 
##     compose
Acc_Comp_LR <- cbind(accuracy_test_1m, accuracy_test_30m, auc_1m, auc_30m)
colnames(Acc_Comp_LR) <- c('Accuracy_1m', 'Accuracy_30m', 'AUC_1m', 'AUC_30m')
#kable(Acc_Comp_LR, caption= "Comparison of Logistic Regression Models" )
Acc_Comp_LR%>%
  kbl(caption = "Comparison of Logistic Regression Models")%>%
  kable_styling()
Comparison of Logistic Regression Models
Accuracy_1m Accuracy_30m AUC_1m AUC_30m
0.8472687 0.7824639 0.8986289 0.8570313
#flextable(Acc_Comp_LR, )

The logistic regression models show that variables derived from the 1m DEM give better prediction accuracy than variables derived from the 30 m DEM. We will check another model next.
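The two ROC curves can also be overlaid with pROC::ggroc() to visualize this difference. The sketch below assumes the roc objects from the two accuracy assessments were saved under separate names (roc_1m and roc_30m are hypothetical; in the code above the object roc_curve is simply overwritten):

library(pROC)
library(ggplot2)

ggroc(list(`1m DEM` = roc_1m, `30m DEM` = roc_30m)) +
  labs(title = "ROC Curves for Logistic Regression: 1m vs 30m DEM", colour = "DEM") +
  theme_minimal()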

Random Forest Model

We will not carry out any hyperparameter tuning right now; here we just want to see the differences between the two resolutions. Later we will work on hyperparameter tuning; a hedged preview of what that could look like with caret is sketched below.
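As a preview of that tuning, a sketch with caret::train() grid-searches mtry using 5-fold cross-validation; the grid and fold count are illustrative choices, and trainData is whichever training set is being modeled:

library(caret)
library(randomForest)

set.seed(123)
ctrl <- trainControl(method = "cv", number = 5)
rf_tuned <- train(V ~ ., data = trainData, method = "rf",
                  tuneGrid = expand.grid(mtry = 2:5),
                  trControl = ctrl, ntree = 500)
rf_tuned$bestTune   # mtry with the best cross-validated accuracy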

1m DEM

Data Processing

# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS:  NAD83
# Convert to Data Frames
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
library(tidyverse)
library(dplyr)
landslides <- landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, V, TWI, SPI, TRI)
Non_landslides <- Non_landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, CID, TWI, SPI, TRI)

# Rename columns for consistency
landslides <- rename(landslides, Slope = Slope_12, PL = Plan, PR = Pr)
Non_landslides <- rename(Non_landslides, Slope = Slope_12, PL = Plan, PR = Pr, V = CID)

#  Combine Data frames
landslides$V <- as.character(landslides$V)
Non_landslides$V <- as.character(Non_landslides$V)
Data <- rbind(landslides, Non_landslides)
Data_m <- Data
Data_m <- Data_m %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#Train and Test Datasets
# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)
Data_m$V <- as.factor(Data_m$V)

# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]
# Train a Random Forest model
rf_model_1m <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)

# Summarize the model
print(rf_model_1m)
## 
## Call:
##  randomForest(formula = V ~ ., data = trainData, ntree = 500,      mtry = 3, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 15.44%
## Confusion matrix:
##      0    1 class.error
## 0 1494  298   0.1662946
## 1  256 1541   0.1424597
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_1m, newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_1m, newdata = testData, type = "response")

# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 380  81
##          1  68 368
##                                           
##                Accuracy : 0.8339          
##                  95% CI : (0.8079, 0.8577)
##     No Information Rate : 0.5006          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6678          
##                                           
##  Mcnemar's Test P-Value : 0.3256          
##                                           
##             Sensitivity : 0.8482          
##             Specificity : 0.8196          
##          Pos Pred Value : 0.8243          
##          Neg Pred Value : 0.8440          
##              Prevalence : 0.4994          
##          Detection Rate : 0.4236          
##    Detection Prevalence : 0.5139          
##       Balanced Accuracy : 0.8339          
##                                           
##        'Positive' Class : 0               
## 
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.915461939230035"
auc_1m <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
  specificity = rev(roc_curve_rf$specificities),
  sensitivity = rev(roc_curve_rf$sensitivities)
)

ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
  geom_line(color = "blue") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Random Forest", x = "Specificity", y = "Sensitivity") +
  theme_minimal()

# Get variable importance
importance_rf <- importance(rf_model_1m )
importance_df <- data.frame(Variable = rownames(importance_rf), 
                            MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])

# Plot variable importance using ggplot2
p1 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_bar(stat = "identity", fill = "blue") +
  coord_flip() +
  labs(title = "Variable Importance Plot_1m", x = "Variable", y = "Mean Decrease in Gini") +
  theme_minimal()
p1

30 m DEM

library(tidyverse)
library(dplyr)
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
landslides_30 <- landslides %>% select(Slope_1_14, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, V, TWI__3, SPI_3, TRI_12)
Non_landslides_30 <- Non_landslides %>% select(Slope, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, CID, TWI__3, SPI_3, TRI_12)

# Rename columns for consistency
landslides_30 <- rename(landslides_30, Slope = Slope_1_14, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12)
Non_landslides_30 <- rename(Non_landslides_30, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, V = CID)
landslides_30$V <- as.factor(landslides_30$V)
Non_landslides_30$V <- as.factor(Non_landslides_30$V)
Data_30 <- rbind(landslides_30, Non_landslides_30)
Data_m_30 <- Data_30

Data_m_30 <- Data_m_30 %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Scale variables to z-scores
scale_to_z <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

Data_m_30 <- Data_m_30 %>% mutate(across(.cols = -V, .fns = scale_to_z))
Data_m_30$V <- as.factor(Data_m_30$V)

# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)

# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m_30[trainIndex, ]
testData <- Data_m_30[-trainIndex, ]
# Train a Random Forest model
rf_model_30m <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)

# Summarize the model
print(rf_model_30m)
## 
## Call:
##  randomForest(formula = V ~ ., data = trainData, ntree = 500,      mtry = 3, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 19.96%
## Confusion matrix:
##      1    0 class.error
## 1 1453  351   0.1945676
## 0  369 1435   0.2045455
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_30m, newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_30m, newdata = testData, type = "response")

# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   1   0
##          1 366  90
##          0  85 360
##                                           
##                Accuracy : 0.8058          
##                  95% CI : (0.7784, 0.8311)
##     No Information Rate : 0.5006          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6115          
##                                           
##  Mcnemar's Test P-Value : 0.7624          
##                                           
##             Sensitivity : 0.8115          
##             Specificity : 0.8000          
##          Pos Pred Value : 0.8026          
##          Neg Pred Value : 0.8090          
##              Prevalence : 0.5006          
##          Detection Rate : 0.4062          
##    Detection Prevalence : 0.5061          
##       Balanced Accuracy : 0.8058          
##                                           
##        'Positive' Class : 1               
## 
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 1, case = 0
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.885826558265583"
auc_30m <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
  specificity = rev(roc_curve_rf$specificities),
  sensitivity = rev(roc_curve_rf$sensitivities)
)

ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
  geom_line(color = "blue") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Random Forest_30m", x = "Specificity", y = "Sensitivity") +
  theme_minimal()

# Get variable importance
importance_rf <- importance(rf_model_30m )
importance_df <- data.frame(Variable = rownames(importance_rf), 
                            MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])

# Plot variable importance using ggplot2
p2 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_bar(stat = "identity", fill = "blue") +
  coord_flip() +
  labs(title = "Variable Importance Plot_30m", x = "Variable", y = "Mean Decrease in Gini") +
  theme_minimal()
p2

Comparison 1m vs 30m

library(patchwork)

p1 + p2

auc_comparison <- cbind(auc_1m, auc_30m)
colnames(auc_comparison) <- c("auc_1m", "auc_30m")

auc_comparison%>%
  kbl()%>%
  kable_styling()
auc_1m auc_30m
0.9154619 0.8858266

Both models show that the 1m DEM-based variables give better accuracy.

Now we will take a slightly different approach. Previously we used only DEM-based factors; now we will add some non-DEM-based factors:

  1. Land use/Land Cover

  2. Land use/Land Cover Change Index

  3. Imperviousness

  4. Rainfall

  5. Distance from Road (DR)

  6. Geology

  7. Normalized Difference Vegetation Index (NDVI)

  8. Soil

Since we did not include these variables in the previous modeling, we will create two separate data frames that include them.

Data Cleaning and Preparation

# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS:  NAD83
# Convert to Data Frames
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
library(tidyverse)
library(dplyr)
landslides <- landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, V, TWI, SPI, TRI, Imprevious , Change, Geology, Rain, Distance_R, NDVI, Soil, Landuse)
Non_landslides <- Non_landslides %>% select(Slope_12, Elevation, Aspect, Plan, Pr, CID, TWI, SPI, TRI, Imprevious, LULC_chang, Geology, Rain, Distance_R, NDVI, Soil, Landuse)

#  Rename columns for consistency
 landslides <- rename(landslides, Slope = Slope_12, PL = Plan, PR = Pr, Imp= Imprevious, Rainfall= Rain, DR= Distance_R )
 Non_landslides <- rename(Non_landslides, Slope = Slope_12, PL = Plan, PR = Pr, V = CID, Imp= Imprevious, Rainfall= Rain, DR= Distance_R, Change=LULC_chang)

##Combine Datasets
 landslides$V <- as.character(landslides$V)
Non_landslides$V <- as.character(Non_landslides$V)
Data <- rbind(landslides, Non_landslides)
Data_m <- Data
## Drop the Missing values
Data_m <- Data_m %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#write.csv(Data_m, "Data_All_1m.csv", row.names = FALSE)

#Scale the Variables
# Load necessary library
library(dplyr)

# Function to scale variables to z-scores
scale_to_z <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

Data_m <- Data_m %>%
  mutate(across(.cols = -c(V, Soil, Landuse, Geology, Change), .fns = scale_to_z))
Data_m$Soil <- as.factor(Data_m$Soil )
Data_m$Geology <- as.factor(Data_m$Geology )
Data_m$Landuse <- as.factor(Data_m$Landuse )
Data_m$Change <- as.factor(Data_m$Change )
write.csv(Data_m, "Data_All_1m.csv", row.names = FALSE)

Split Data into Train and Test Sets

# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)
Data_m$V <- as.factor(Data_m$V)
# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]

Logistic Regression Model

# Load necessary libraries
library(pROC)
library(ggplot2)

library(gt)
Data_m$V <- as.factor(Data_m$V)
# Train a logistic regression model
logistic_model_1m_all <- glm(V ~ ., data = trainData, family = binomial)

# Summarize the model
summary(logistic_model_1m_all )
## 
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.932e-01  3.371e-01  -1.759 0.078501 .  
## Slope        2.129e+00  8.098e-02  26.292  < 2e-16 ***
## Elevation   -8.920e-02  8.369e-02  -1.066 0.286519    
## Aspect      -3.406e-01  5.262e-02  -6.473 9.60e-11 ***
## PL          -1.137e-01  7.597e-02  -1.497 0.134510    
## PR          -1.504e-01  7.318e-02  -2.055 0.039904 *  
## TWI          3.265e-01  5.636e-02   5.793 6.91e-09 ***
## SPI          8.032e-03  3.486e-02   0.230 0.817786    
## TRI          2.174e-01  6.272e-02   3.466 0.000528 ***
## Imp         -3.075e-01  1.480e-01  -2.078 0.037736 *  
## Change3      4.280e-01  5.332e-01   0.803 0.422228    
## Change8     -2.899e-01  4.121e-01  -0.703 0.481759    
## Change10    -2.911e+00  1.045e+03  -0.003 0.997778    
## Change11     3.253e-01  2.392e-01   1.360 0.173909    
## Geology2    -9.832e-01  2.137e-01  -4.600 4.22e-06 ***
## Geology3     5.798e-01  2.181e-01   2.659 0.007839 ** 
## Geology4    -1.397e+00  3.065e-01  -4.557 5.18e-06 ***
## Geology5    -1.570e+01  4.112e+02  -0.038 0.969552    
## Geology6    -1.884e-01  1.467e-01  -1.284 0.198968    
## Geology7    -2.578e-01  2.012e-01  -1.281 0.200215    
## Rainfall    -2.884e-01  6.605e-02  -4.367 1.26e-05 ***
## DR           8.459e-02  5.625e-02   1.504 0.132640    
## NDVI         2.590e-01  6.456e-02   4.012 6.02e-05 ***
## Soil2       -7.975e-01  1.577e+00  -0.506 0.613104    
## Soil3       -3.432e-01  7.770e-01  -0.442 0.658678    
## Soil4       -1.097e-01  3.020e-01  -0.363 0.716556    
## Soil5       -3.098e+00  1.166e+00  -2.656 0.007913 ** 
## Soil6        4.570e-01  3.011e-01   1.518 0.129079    
## Soil7        6.673e-02  2.999e-01   0.223 0.823903    
## Landuse22    1.239e+00  5.853e-01   2.117 0.034281 *  
## Landuse23    2.139e+00  1.142e+00   1.873 0.061136 .  
## Landuse24    3.612e+00  1.958e+00   1.844 0.065145 .  
## Landuse31   -1.072e+01  6.076e+02  -0.018 0.985928    
## Landuse41    9.666e-01  2.016e-01   4.795 1.63e-06 ***
## Landuse42   -3.524e-01  4.419e-01  -0.798 0.425153    
## Landuse43   -1.202e-01  2.305e-01  -0.522 0.601931    
## Landuse52    2.268e-01  6.032e-01   0.376 0.706971    
## Landuse71   -7.449e-01  7.888e-01  -0.944 0.345002    
## Landuse81   -7.090e-01  2.728e-01  -2.599 0.009362 ** 
## Landuse90   -1.161e+01  1.455e+03  -0.008 0.993633    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4968.5  on 3583  degrees of freedom
## Residual deviance: 2551.2  on 3544  degrees of freedom
## AIC: 2631.2
## 
## Number of Fisher Scoring iterations: 14
exp(coef(logistic_model_1m))
## (Intercept)       Slope   Elevation      Aspect          PL          PR 
##   0.9689135   9.1559973   0.7290574   0.6821078   0.8708116   0.8428334 
##         TWI         SPI         TRI 
##   1.3233170   1.0393155   1.1688608

Accuracy Assessment

library(caret)

pred_prob_train <- predict(logistic_model_1m_all, newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)

# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)

##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
##    pred_class_train
##        0    1
##   0 1488  300
##   1  256 1540
 print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.844866071428571"
 # Make predictions on the test set
pred_prob_test <- predict(logistic_model_1m_all, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)

# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
##    pred_class_test
##       0   1
##   0 375  71
##   1  65 384
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.84804469273743"
accuracy_test_1m_all <- as.data.frame(accuracy_test)

# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)
auc_1m <- auc_value 
auc_1m_All <- as.data.frame(auc_value)
print(paste("AUC:", auc_value))
## [1] "AUC: 0.899138094619833"

SRTM (30m)

# Upload the Shape Files
library(sf)
Landslides_Points <- st_read("Landslides_Points.shp")
## Reading layer `Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 67 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 349401.4 ymin: 265880.2 xmax: 387724.9 ymax: 293335.3
## Projected CRS: NAD83 / North Carolina
Non_Landslides_Points <- st_read("Non_Landslides_Points.shp")
## Reading layer `Non_Landslides_Points' from data source 
##   `C:\R\Shape\Shape Files\Data\Non_Landslides_Points.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2257 features and 35 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -81.9147 ymin: 36.11658 xmax: -81.46408 ymax: 36.38499
## Geodetic CRS:  NAD83
library(tidyverse)
library(dplyr)
landslides <- as.data.frame(Landslides_Points)
Non_landslides <- as.data.frame(Non_Landslides_Points)
landslides_30 <- landslides %>% select(Slope_1_14, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, V, TWI__3, SPI_3, TRI_12, Imprevious , Change, Geology, Rain, Distance_R, NDVI, Soil, Landuse)
Non_landslides_30 <- Non_landslides %>% select(Slope, DEM_Projec, Aspect_DEM, Plan_curva, Profile_cu, CID, TWI__3, SPI_3, TRI_12, Imprevious, LULC_chang, Geology, Rain, Distance_R, NDVI, Soil, Landuse)

# Rename columns for consistency
landslides_30 <- rename(landslides_30, Slope = Slope_1_14, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, Imp= Imprevious, Rainfall= Rain, DR= Distance_R)
Non_landslides_30 <- rename(Non_landslides_30, PL = Plan_curva, PR = Profile_cu, Elevation= DEM_Projec, Aspect=Aspect_DEM, TWI= TWI__3, SPI= SPI_3, TRI=TRI_12, V = CID, Imp= Imprevious, Rainfall= Rain, DR= Distance_R, Change=LULC_chang)


landslides_30$V <- as.character(landslides_30$V)
Non_landslides_30$V <- as.character(Non_landslides_30$V)
Data_30 <- rbind(landslides_30, Non_landslides_30)
Data_m_30_All <- Data_30

Data_m_30_All <- Data_m_30_All %>% filter(across(-V, ~ . != -9999))
## Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
## ℹ Please use `if_any()` or `if_all()` instead.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Scale variables to z-scores
scale_to_z <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

Data_m_30_All <- Data_m_30_All %>%
  mutate(across(.cols = -c(V, Soil, Landuse, Geology, Change), .fns = scale_to_z))
Data_m_30_All$V <- as.factor(Data_m_30_All$V)

Data_m_30_All$Soil <- as.factor(Data_m_30_All$Soil )
Data_m_30_All$Geology <- as.factor(Data_m_30_All$Geology )
Data_m_30_All$Landuse <- as.factor(Data_m_30_All$Landuse )
Data_m_30_All$Change <- as.factor(Data_m_30_All$Change )
write.csv(Data_m_30_All, "Data_All_30m.csv", row.names = FALSE)
# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)

# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30_All$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m_30_All[trainIndex, ]
testData <- Data_m_30_All[-trainIndex, ]

Logistic Regression Model (30m)

# Load necessary libraries
library(pROC)
library(ggplot2)

library(gt)

# Train a logistic regression model
logistic_model_30m_All <- glm(V ~ ., data = trainData, family = binomial)

# Summarize the model
summary(logistic_model_30m_All)
## 
## Call:
## glm(formula = V ~ ., family = binomial, data = trainData)
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -8.78817 2399.54468  -0.004  0.99708    
## Slope                          1.32015    0.06714  19.663  < 2e-16 ***
## Elevation                     -0.35077    0.07901  -4.439 9.03e-06 ***
## Aspect                        -0.38827    0.04792  -8.102 5.40e-16 ***
## PL                            -0.17453    0.06316  -2.763  0.00572 ** 
## PR                            -0.23243    0.05904  -3.937 8.25e-05 ***
## TWI                           -0.35262    0.08759  -4.026 5.68e-05 ***
## SPI                            0.05814    0.03468   1.676  0.09365 .  
## TRI                            0.43022    0.05471   7.864 3.73e-15 ***
## Imp                           -0.15058    0.12003  -1.255  0.20965    
## Change0.132491109316288       -2.39375 3393.46863  -0.001  0.99944    
## Change0.547077872218507        0.77116    0.46275   1.666  0.09562 .  
## Change2.6200116867296         -0.21569    0.36976  -0.583  0.55968    
## Change3.44918521253404        -1.42871 2118.78452  -0.001  0.99946    
## Change3.86377197543626         0.25914    0.21203   1.222  0.22165    
## Geology-0.572883768189359     -1.59029    0.19637  -8.099 5.56e-16 ***
## Geology-0.141785862467707     -0.16160    0.19010  -0.850  0.39528    
## Geology0.289312043253945      -1.69809    0.30485  -5.570 2.54e-08 ***
## Geology0.720409948975597     -17.63793  797.19173  -0.022  0.98235    
## Geology1.15150785469725       -0.63433    0.13410  -4.730 2.24e-06 ***
## Geology1.5826057604189        -0.14733    0.18648  -0.790  0.42950    
## Rainfall                      -0.31314    0.06118  -5.119 3.08e-07 ***
## DR                             0.15756    0.05118   3.079  0.00208 ** 
## NDVI                           0.18499    0.06035   3.065  0.00218 ** 
## Soil-2.17319987284062         -2.74623    1.73956  -1.579  0.11441    
## Soil-1.54568057601141         -1.12567    0.75167  -1.498  0.13424    
## Soil-0.918161279182197        -0.39696    0.27291  -1.455  0.14580    
## Soil-0.290641982352987       -16.05199  287.60091  -0.056  0.95549    
## Soil0.336877314476222          0.04945    0.27072   0.183  0.85507    
## Soil0.964396611305432         -0.34253    0.27022  -1.268  0.20493    
## Landuse-1.48123028866035       8.88355 2399.54467   0.004  0.99705    
## Landuse-1.41404193684783       9.43539 2399.54473   0.004  0.99686    
## Landuse-1.34685358503532      10.41828 2399.54486   0.004  0.99654    
## Landuse-1.2796652332228       10.61164 2399.54525   0.004  0.99647    
## Landuse-0.809346770535189     -4.42004 2743.17837  -0.002  0.99871    
## Landuse-0.137463252410028      9.73041 2399.54467   0.004  0.99676    
## Landuse-0.0702749005975118     8.75122 2399.54471   0.004  0.99709    
## Landuse-0.00308654878499566    8.83675 2399.54467   0.004  0.99706    
## Landuse0.60160861752765        9.36249 2399.54471   0.004  0.99689    
## Landuse1.87818730196546        7.95211 2399.54480   0.003  0.99736    
## Landuse2.55007082009062        8.44723 2399.54468   0.004  0.99719    
## Landuse3.15476598640326       -2.44852 3393.46863  -0.001  0.99942    
## Landuse3.49070774546584       -2.94774 3393.46869  -0.001  0.99931    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4997.6  on 3604  degrees of freedom
## Residual deviance: 2982.0  on 3562  degrees of freedom
## AIC: 3068
## 
## Number of Fisher Scoring iterations: 15
exp(coef(logistic_model_30m))
## (Intercept)       Slope   Elevation      Aspect          PL          PR 
##   0.9541291   3.8634647   0.5613606   0.6896913   0.7769459   0.7579835 
##         TWI         SPI         TRI 
##   0.6405158   1.1383086   1.7664896

Accuracy Assessment 30 m

library(caret)

pred_prob_train <- predict(logistic_model_30m_All , newdata = trainData, type = "response")
pred_class_train <- ifelse(pred_prob_train > 0.5, 1, 0)

# Evaluate the model on training set
confusion_matrix_train <- table(trainData$V, pred_class_train)

##knitr::kable(confusion_matrix_train)
accuracy_train <- sum(diag(confusion_matrix_train)) / sum(confusion_matrix_train)
print(confusion_matrix_train)
##    pred_class_train
##        0    1
##   0 1451  350
##   1  311 1493
print(paste("Training Accuracy:", accuracy_train))
## [1] "Training Accuracy: 0.816643550624133"
 # Make predictions on the test set
pred_prob_test <- predict(logistic_model_30m_All, newdata = testData, type = "response")
pred_class_test <- ifelse(pred_prob_test > 0.5, 1, 0)

# Evaluate the model on test set
confusion_matrix_test <- table(testData$V, pred_class_test)
accuracy_test <- sum(diag(confusion_matrix_test)) / sum(confusion_matrix_test)
print(confusion_matrix_test)
##    pred_class_test
##       0   1
##   0 366  84
##   1  89 362
print(paste("Test Accuracy:", accuracy_test))
## [1] "Test Accuracy: 0.807991120976693"
accuracy_test_30m_All <- as.data.frame(accuracy_test)

# Calculate ROC curve and AUC
roc_curve <- roc(testData$V, pred_prob_test)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_curve)

auc_30m_All <- as.data.frame(auc_value)
print(paste("AUC:", auc_value))
## [1] "AUC: 0.897161862527716"

Random Forest Model

1m DEM With All Data

Data_m <- read.csv("Data_All_1m.csv")
Data_m$V <- as.factor(Data_m$V )
Data_m$Soil <- as.factor(Data_m$Soil )
Data_m$Geology <- as.factor(Data_m$Geology )
Data_m$Landuse <- as.factor(Data_m$Landuse )
Data_m$Change <- as.factor(Data_m$Change )

#Train and Test Datasets
# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)
Data_m$V <- as.factor(Data_m$V)

# Perform stratified sampling
trainIndex <- createDataPartition(Data_m$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m[trainIndex, ]
testData <- Data_m[-trainIndex, ]

# Train a Random Forest model
rf_model_1m_All <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)

# Summarize the model
print(rf_model_1m_All)
## 
## Call:
##  randomForest(formula = V ~ ., data = trainData, ntree = 500,      mtry = 3, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 12.83%
## Confusion matrix:
##      0    1 class.error
## 0 1550  238   0.1331096
## 1  222 1574   0.1236080
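# Optional sketch (not part of the original workflow): mtry = 3 is close to the
# default floor(sqrt(p)) for classification. tuneRF() can check this choice by
# searching over mtry with the out-of-bag error on the same training data.
# tune_res <- tuneRF(x = trainData[, setdiff(names(trainData), "V")],
#                    y = trainData$V,
#                    mtryStart = 3, ntreeTry = 200, stepFactor = 2,
#                    improve = 0.01, trace = TRUE, plot = FALSE)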
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_1m_All, newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_1m_All, newdata = testData, type = "response")

# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 393  54
##          1  53 395
##                                          
##                Accuracy : 0.8804         
##                  95% CI : (0.8574, 0.901)
##     No Information Rate : 0.5017         
##     P-Value [Acc > NIR] : <2e-16         
##                                          
##                   Kappa : 0.7609         
##                                          
##  Mcnemar's Test P-Value : 1              
##                                          
##             Sensitivity : 0.8812         
##             Specificity : 0.8797         
##          Pos Pred Value : 0.8792         
##          Neg Pred Value : 0.8817         
##              Prevalence : 0.4983         
##          Detection Rate : 0.4391         
##    Detection Prevalence : 0.4994         
##       Balanced Accuracy : 0.8804         
##                                          
##        'Positive' Class : 0              
## 
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.945938657904461"
auc_1m_All <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
  specificity = rev(roc_curve_rf$specificities),
  sensitivity = rev(roc_curve_rf$sensitivities)
)

ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
  geom_line(color = "blue") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Random Forest", x = "Specificity", y = "Sensitivity") +
  theme_minimal()

# Get variable importance
importance_rf <- importance(rf_model_1m_All )
importance_df <- data.frame(Variable = rownames(importance_rf), 
                            MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])

# Plot variable importance using ggplot2
p3 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_bar(stat = "identity", fill = "blue") +
  coord_flip() +
  labs(title = "Variable Importance Plot_1m", x = "Variable", y = "Mean Decrease in Gini") +
  theme_minimal()
p3
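
Besides the Gini measure, the importance() matrix also stores MeanDecreaseAccuracy when the model is trained with importance = TRUE. A quick sketch (not original output) for comparing the two measures side by side:

# Sketch: compare both importance measures for the 1 m random forest model
imp_both <- as.data.frame(importance(rf_model_1m_All))
imp_both$Variable <- rownames(imp_both)
imp_both[order(-imp_both$MeanDecreaseAccuracy),
         c("Variable", "MeanDecreaseAccuracy", "MeanDecreaseGini")]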

30 m DEM With All Data

# Read the 30 m dataset (all factors) and convert categorical variables to factors
Data_m_30 <- read.csv("Data_All_30m.csv")

Data_m_30$V <- as.factor(Data_m_30$V)
Data_m_30$Soil <- as.factor(Data_m_30$Soil)
Data_m_30$Geology <- as.factor(Data_m_30$Geology)
Data_m_30$Landuse <- as.factor(Data_m_30$Landuse)
Data_m_30$Change <- as.factor(Data_m_30$Change)


# Load necessary libraries
library(caret)
library(dplyr)

# Set a seed for reproducibility
set.seed(123)

# Perform stratified sampling
trainIndex <- createDataPartition(Data_m_30$V, p = 0.8, list = FALSE)

# Split the data into training and test sets
trainData <- Data_m_30[trainIndex, ]
testData <- Data_m_30[-trainIndex, ]

# Train a Random Forest model
rf_model_30m_All <- randomForest(V ~ ., data = trainData, ntree = 500, mtry = 3, importance = TRUE)

# Summarize the model
print(rf_model_30m_All )
## 
## Call:
##  randomForest(formula = V ~ ., data = trainData, ntree = 500,      mtry = 3, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 14.51%
## Confusion matrix:
##      0    1 class.error
## 0 1529  272   0.1510272
## 1  251 1553   0.1391353
# Make predictions on the test set
rf_pred_prob <- predict(rf_model_30m_All , newdata = testData, type = "prob")[,2]
rf_pred_class <- predict(rf_model_30m_All , newdata = testData, type = "response")

# Evaluate the model
confusion_matrix_rf <- confusionMatrix(as.factor(rf_pred_class), as.factor(testData$V))
print(confusion_matrix_rf)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 393  59
##          1  57 392
##                                           
##                Accuracy : 0.8713          
##                  95% CI : (0.8476, 0.8924)
##     No Information Rate : 0.5006          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.7425          
##                                           
##  Mcnemar's Test P-Value : 0.926           
##                                           
##             Sensitivity : 0.8733          
##             Specificity : 0.8692          
##          Pos Pred Value : 0.8695          
##          Neg Pred Value : 0.8731          
##              Prevalence : 0.4994          
##          Detection Rate : 0.4362          
##    Detection Prevalence : 0.5017          
##       Balanced Accuracy : 0.8713          
##                                           
##        'Positive' Class : 0               
## 
# Calculate ROC curve and AUC
roc_curve_rf <- roc(testData$V, rf_pred_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value_rf <- auc(roc_curve_rf)
print(paste("AUC for Random Forest:", auc_value_rf))
## [1] "AUC for Random Forest: 0.943254496181325"
auc_30m_All <- as.data.frame(auc_value_rf )
# Plot ROC curve using ggplot2
roc_data_rf <- data.frame(
  specificity = rev(roc_curve_rf$specificities),
  sensitivity = rev(roc_curve_rf$sensitivities)
)

ggplot(roc_data_rf, aes(x = specificity, y = sensitivity)) +
  geom_line(color = "blue") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  labs(title = "ROC Curve for Random Forest_30m", x = "Specificity", y = "Sensitivity") +
  theme_minimal()

# Get variable importance
importance_rf <- importance(rf_model_30m_All )
importance_df <- data.frame(Variable = rownames(importance_rf), 
                            MeanDecreaseGini = importance_rf[, "MeanDecreaseGini"])

# Plot variable importance using ggplot2
p4 <- ggplot(importance_df, aes(x = reorder(Variable, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_bar(stat = "identity", fill = "blue") +
  coord_flip() +
  labs(title = "Variable Importance Plot_30m", x = "Variable", y = "Mean Decrease in Gini") +
  theme_minimal()
p4

library(patchwork)

# Arrange plots p1-p4 in a single panel using patchwork
p1 + p2 + p3 + p4
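
If the combined panel needs to be exported, ggsave() also works on a patchwork object; the file name and dimensions below are only illustrative:

# Sketch: save the combined panel to disk (file name is illustrative)
combined_panel <- p1 + p2 + p3 + p4
ggsave("importance_panels.png", combined_panel, width = 10, height = 8, dpi = 300)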

Let's compare all models.

The random forest model clearly achieves higher accuracy than the logistic regression model. When the DEM-based factors are complemented with non-DEM-based factors, the accuracy increases for both models. We used 8 DEM-based factors and 8 non-DEM-based factors. In all models, the factors derived from the 1 m DEM give the highest accuracy. One interesting finding is that complementing the SRTM-based (30 m) factors with non-DEM-based factors improved the accuracy by 6.1%.
Model                     Accuracy   Area Under the Curve (AUC)
LR, 1 m DEM only          0.845      0.899
LR, 1 m all factors       0.833      0.915
LR, 30 m DEM only         0.783      0.857
LR, 30 m all factors      0.808      0.897
RF, 1 m DEM only          0.834      0.915
RF, 1 m all factors       0.880      0.946
RF, 30 m DEM only         0.806      0.888
RF, 30 m all factors      0.877      0.943
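
The comparison table above appears to have been assembled manually from the individual model runs. As a sketch, the same summary could be built programmatically; the values below are simply copied from the table, and in practice they would come from the accuracy and AUC objects stored after each model fit:

# Sketch: collect the reported accuracy and AUC values into one comparison table
model_comparison <- data.frame(
  Model    = c("LR, 1 m DEM only", "LR, 1 m all factors",
               "LR, 30 m DEM only", "LR, 30 m all factors",
               "RF, 1 m DEM only", "RF, 1 m all factors",
               "RF, 30 m DEM only", "RF, 30 m all factors"),
  Accuracy = c(0.845, 0.833, 0.783, 0.808, 0.834, 0.880, 0.806, 0.877),
  AUC      = c(0.899, 0.915, 0.857, 0.897, 0.915, 0.946, 0.888, 0.943)
)
knitr::kable(model_comparison, digits = 3)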