We at CivicDataLab are running a data experiment to prepare an intelligent data model for flood response and management. As part of the experiment, we intend to build a toolkit for the interoperability of Assam’s Flood Reporting and Information Management System (FRIMS). FRIMS is a digital, real-time flood reporting system launched by Assam in 2021, which makes Assam the first Indian state to have such a system. It was jointly developed by the Assam State Disaster Management Authority (ASDMA) and the United Nations Children’s Fund (UNICEF). Detailed documentation of flood impacts is necessary not only for measuring the effects of the disaster but also for formulating risk reduction interventions.
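Here we cluster district-year observations using the flood risk (calculated from flood inundation maps, socio-economic variables and the damages recorded in FRIMS) together with government preparedness and response to floods. The consolidated table starts in 2016, but the clustering itself covers 2018 to 2021, the years for which the government response variables are available.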
For this work, we load the following libraries:
packages <- c("factoextra", "stats", "dplyr", "tidyverse", "NbClust")
invisible(lapply(packages, library, character.only = TRUE))
A consolidated table holding the values for flood, damages, socio-economic variables and government response (preparedness and response) is prepared using the steps recorded in the notes of the following three slides:
risk_response <- read.csv("~/r-codes/risk_district_5_2016_2022_consolidated_kmean_2.csv")
head(risk_response, n = 3)
## id district year district_yr infr_damages population_livestock_damages flood
## 1 1 cachar 2016 cachar2016 0.01964133 0.005352578 2.29
## 2 2 darrang 2016 darrang2016 1.35148940 1.919197455 2.03
## 3 3 morigaon 2016 morigaon2016 1.36069132 2.884362257 3.03
## demo infr risk govtresponse_prep govtresponse_monsoon
## 1 1 1 0.05723605 NA NA
## 2 1 1 6.63949431 NA NA
## 3 1 1 12.86251234 NA NA
## impact_after.considering.govt_prep
## 1 NA
## 2 NA
## 3 NA
## impact_after.considering.govt_prep.and.govt_response
## 1 NA
## 2 NA
## 3 NA
#removing the first 10 rows (2016 and 2017), which have NA in the government response columns
risk_response_2018_2021 <- risk_response[-c(1:10),]
head(risk_response_2018_2021)
## id district year district_yr infr_damages population_livestock_damages
## 11 11 cachar 2018 cachar2018 0.2808788 1.622470978
## 12 12 darrang 2018 darrang2018 0.8412435 0.349436740
## 13 13 morigaon 2018 morigaon2018 0.4942782 1.000000000
## 14 14 nalbari 2018 nalbari2018 0.4459268 0.227189907
## 15 15 tinsukia 2018 tinsukia2018 0.0000000 0.154576246
## 16 16 cachar 2019 cachar2019 0.1201699 0.009796465
## flood demo infr risk govtresponse_prep govtresponse_monsoon
## 11 3.683811 1.2130681 1.533431 8.84 0.05500084 1.58988958
## 12 1.353333 1.1154202 1.140474 1.83 0.02963139 0.05586386
## 13 2.881861 0.9352285 1.540477 4.89 0.02454346 0.01697237
## 14 1.503980 1.1660283 1.611596 1.48 0.04198684 0.07089531
## 15 1.368241 1.2529894 1.576436 0.27 0.00000000 0.01964286
## 16 2.178046 0.6790244 1.478639 0.40 0.08097631 0.04320325
## impact_after.considering.govt_prep
## 11 1419.845414
## 12 112.512309
## 13 974.083400
## 14 52.116564
## 15 702.272290
## 16 1.990741
## impact_after.considering.govt_prep.and.govt_response
## 11 47.475920
## 12 38.995101
## 13 575.861731
## 14 19.384906
## 15 3.557096
## 16 1.298143
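Dropping rows by position works only while the incomplete rows sit at the top of the file. A minimal sketch of a value-based alternative follows. It uses six rows printed above (three from 2016, three from 2018) with only three of the columns; the object names `toy` and `toy_complete` are hypothetical.

```r
# Toy table built from rows shown above; only three columns are kept
toy <- data.frame(
  district_yr       = c("cachar2016", "darrang2016", "morigaon2016",
                        "cachar2018", "darrang2018", "morigaon2018"),
  flood             = c(2.29, 2.03, 3.03, 3.683811, 1.353333, 2.881861),
  govtresponse_prep = c(NA, NA, NA, 0.05500084, 0.02963139, 0.02454346),
  stringsAsFactors  = FALSE
)

# Keep only the rows that have no missing value in any column
toy_complete <- toy[complete.cases(toy), ]
toy_complete$district_yr
```

On the full table, `risk_response[complete.cases(risk_response), ]` would select the same rows as the positional drop only if the 2016 and 2017 rows are the only ones containing NA, which is an assumption worth checking before swapping the two.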
#number of columns in the final sheet
ncol(risk_response_2018_2021)
## [1] 14
#keeping the numeric variables (columns 5 to 12) and scaling them
risk_response_fin <- risk_response_2018_2021[,5:(ncol(risk_response_2018_2021)-2)]
df_risk <- scale(risk_response_fin)
#dropping the sixth scaled column (risk) before clustering
df_risk <- df_risk[,-6]
head(df_risk, n = 3)
## infr_damages population_livestock_damages flood demo infr
## 11 -0.5789851 0.77462633 1.3163452 0.8153172 0.6569625
## 12 0.1278465 -0.67070146 -0.9168659 0.4829864 -1.5089778
## 13 -0.3098079 0.06790959 0.5478658 -0.1302704 0.6958030
## govtresponse_prep govtresponse_monsoon
## 11 -0.06226224 4.1832025
## 12 -0.41954297 -0.2645589
## 13 -0.49119685 -0.3773210
#converting to a data frame
df_risk <- as.data.frame(df_risk)
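By default, `scale()` standardises each column by subtracting its mean and dividing by its sample standard deviation. A short check of that rule on the six `flood` values printed earlier is sketched below. The real run standardises over all 20 rows, so these z-scores are not expected to match the table above; the names `z_manual` and `z_scale` are illustrative.

```r
# Six flood values taken from the head() output above
flood <- c(3.683811, 1.353333, 2.881861, 1.503980, 1.368241, 2.178046)

# Standardise by hand: (x - mean) / sample standard deviation
z_manual <- (flood - mean(flood)) / sd(flood)

# Standardise with scale(); as.numeric() strips the matrix attributes
z_scale <- as.numeric(scale(flood))

# The two approaches should agree
all.equal(z_manual, z_scale)
```

After standardising, every variable has mean 0 and standard deviation 1, so no single variable dominates the Euclidean distances used by k-means.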
Next, the optimal number of clusters is estimated.
For more details: https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/#:~:text=The%20optimal%20number%20of%20clusters%20can%20be%20defined%20as%20follow,sum%20of%20square%20(wss).
#estimating the optimal number of clusters in the data using wss method
fviz_nbclust(df_risk, kmeans, method = "wss")
#estimating the optimal number of clusters in the data using silhouette method
fviz_nbclust(df_risk, kmeans, method = "silhouette")
#estimating the optimal number of clusters in the data using gap statistic method
set.seed(123)
fviz_nbclust(df_risk, kmeans, nstart = 25, method = "gap_stat", nboot = 50)+
labs(subtitle = "Gap statistic method")
The optimal number of clusters is found to be 2.
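The quantity behind the wss (elbow) plot is the total within-cluster sum of squares for each candidate k. The sketch below computes it by hand with base R. It uses the built-in `USArrests` data as a stand-in because the consolidated table is not bundled here, so the numbers are not those of this analysis; `k_values` and `wss` are illustrative names.

```r
# Stand-in data: built-in USArrests, standardised in the same way as df_risk
x <- scale(USArrests)

set.seed(123)
k_values <- 1:8

# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(k_values, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the decrease flattens out
data.frame(k = k_values, wss = round(wss, 2))
```

For k = 1 the value equals the total sum of squares of the standardised data, and it falls as k grows, so the choice rests on where the curve bends rather than on its minimum.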
For more details on k-means clustering: https://www.datanovia.com/en/lessons/k-means-clustering-in-r-algorith-and-practical-examples/#:~:text=K%2Dmeans%20clustering%20(MacQueen%201967,pre%2Dspecified%20by%20the%20analyst.
# Compute k-means with k = 2
set.seed(123)
km.res <- kmeans(df_risk, 2, iter.max = 10, nstart = 50)
#print the results
print(km.res)
## K-means clustering with 2 clusters of sizes 14, 6
##
## Cluster means:
## infr_damages population_livestock_damages flood demo infr
## 1 -0.4527263 -0.3872362 -0.2993360 0.2057772 0.2840139
## 2 1.0563614 0.9035510 0.6984507 -0.4801467 -0.6626992
## govtresponse_prep govtresponse_monsoon
## 1 0.1168927 0.1263735
## 2 -0.2727497 -0.2948715
##
## Clustering vector:
## 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
## 1 1 1 1 1 1 2 2 1 1 1 2 2 1 1 1 2 2 1 1
##
## Within cluster sum of squares by cluster:
## [1] 62.87873 42.25393
## (between_SS / total_SS = 21.0 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
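The `between_SS / total_SS` share in the printout can be reproduced from the reported within-cluster sums of squares. This is a sketch that assumes all 20 rows and 7 standardised columns entered the clustering, in which case each column contributes n - 1 = 19 to the total sum of squares.

```r
withinss <- c(62.87873, 42.25393)  # within-cluster sums of squares from the printout
n_rows <- 20                       # length of the clustering vector
n_cols <- 7                        # columns left after dropping risk

# Each standardised column has a sum of squares of n - 1
totss <- (n_rows - 1) * n_cols

# Between-cluster part is what the clusters account for
betweenss <- totss - sum(withinss)
pct_between <- round(100 * betweenss / totss, 1)
pct_between
```

On the fitted object the same ratio is available directly as `km.res$betweenss / km.res$totss`.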
#size of each cluster
km.res$size
## [1] 14 6
#means of each variable by clusters using original data
print(aggregate(risk_response_fin, by=list(cluster=km.res$cluster), mean))
## cluster infr_damages population_livestock_damages flood demo infr
## 1 1 0.3809747 0.5991109 1.997759 1.0339685 1.465768
## 2 2 1.5773551 1.7360269 3.039004 0.8324252 1.294010
## risk govtresponse_prep govtresponse_monsoon
## 1 2.607857 0.06772210 0.19069580
## 2 10.943333 0.04005475 0.04540909
km.res$centers
## infr_damages population_livestock_damages flood demo infr
## 1 -0.4527263 -0.3872362 -0.2993360 0.2057772 0.2840139
## 2 1.0563614 0.9035510 0.6984507 -0.4801467 -0.6626992
## govtresponse_prep govtresponse_monsoon
## 1 0.1168927 0.1263735
## 2 -0.2727497 -0.2948715
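The centres above are on the standardised scale, while the `aggregate()` call reports cluster means in the original units. The two are linked through the column means and standard deviations that `scale()` stores as attributes. A sketch of the back-transformation follows, again on the built-in `USArrests` data as a stand-in; all object names are illustrative. In the analysis above these attributes would need to be saved before the matrix is subset, since subsetting drops them.

```r
# Stand-in data, standardised as in the analysis
x <- USArrests
x_scaled <- scale(x)

set.seed(123)
km <- kmeans(x_scaled, centers = 2, nstart = 25)

# Column means and standard deviations used by scale()
mu <- attr(x_scaled, "scaled:center")
sigma <- attr(x_scaled, "scaled:scale")

# Undo the standardisation: centre * sd + mean, column by column
centers_orig <- sweep(sweep(km$centers, 2, sigma, "*"), 2, mu, "+")

# Cluster means computed directly on the original data, for comparison
means_orig <- aggregate(x, by = list(cluster = km$cluster), mean)

all.equal(unname(as.matrix(means_orig[, -1])), unname(centers_orig))
```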
#adding point classifications to original data
dd <- cbind(risk_response_2018_2021, cluster = km.res$cluster)
#cluster in which a district falls in each year
dd_final <- dd %>% select(district_yr, cluster)
dd_final[with(dd_final, order(cluster)),]
## district_yr cluster
## 11 cachar2018 1
## 12 darrang2018 1
## 13 morigaon2018 1
## 14 nalbari2018 1
## 15 tinsukia2018 1
## 16 cachar2019 1
## 19 nalbari2019 1
## 20 tinsukia2019 1
## 21 cachar2020 1
## 24 nalbari2020 1
## 25 tinsukia2020 1
## 26 cachar2021 1
## 29 nalbari2021 1
## 30 tinsukia2021 1
## 17 darrang2019 2
## 18 morigaon2019 2
## 22 darrang2020 2
## 23 morigaon2020 2
## 27 darrang2021 2
## 28 morigaon2021 2
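A cross-tabulation of district against cluster makes the pattern in this listing easier to read. The sketch below rebuilds the assignments from the output above instead of reading the original file; `assignments` and `xt` are hypothetical names.

```r
# Assignments copied from the listing above (ids 11 to 30, five districts per year)
assignments <- data.frame(
  district = rep(c("cachar", "darrang", "morigaon", "nalbari", "tinsukia"), times = 4),
  year     = rep(2018:2021, each = 5),
  cluster  = c(1, 1, 1, 1, 1,
               1, 2, 2, 1, 1,
               1, 2, 2, 1, 1,
               1, 2, 2, 1, 1)
)

# Number of years each district spends in each cluster
xt <- table(assignments$district, assignments$cluster)
xt
```

Only darrang and morigaon ever fall in cluster 2, and they do so in each year from 2019 to 2021.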
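Here we find that cluster 2, which consists of Darrang and Morigaon in 2019, 2020 and 2021, groups the district-years with higher flood risk: the mean risk score is 10.94, against 2.61 in cluster 1. Vulnerability to floods is lower in this cluster, but flood proneness is higher (mean flood value of 3.04 against 2.00) and government preparedness for floods is relatively lower (mean govtresponse_prep of 0.040 against 0.068).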