clustering_risk_govtreponse

Method

A consolidated table of flood values, damages, socio-economic variables, and government response (preparedness and response) is prepared by following the steps recorded in the notes of these three slides:

https://docs.google.com/presentation/d/1hpySuuHM1ntmDY3OiosnovcrqDh7uANsNh4tKt35I0I/edit#slide=id.g168debe8b1b_1_10

https://docs.google.com/presentation/d/1hpySuuHM1ntmDY3OiosnovcrqDh7uANsNh4tKt35I0I/edit#slide=id.g168debe8b1b_1_15

https://docs.google.com/presentation/d/1hpySuuHM1ntmDY3OiosnovcrqDh7uANsNh4tKt35I0I/edit#slide=id.g168debe8b1b_1_20

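#packages used below: factoextra for fviz_nbclust(), ggplot2 for labs(), dplyr for %>% and select()
library(factoextra)
library(ggplot2)
library(dplyr)
risk_response <- read.csv("~/r-codes/risk_district_5_2016_2022_consolidated_kmean_2.csv")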
head(risk_response, n = 3)

##   id district year  district_yr infr_damages population_livestock_damages flood
## 1  1   cachar 2016   cachar2016   0.01964133                  0.005352578  2.29
## 2  2  darrang 2016  darrang2016   1.35148940                  1.919197455  2.03
## 3  3 morigaon 2016 morigaon2016   1.36069132                  2.884362257  3.03
##   demo infr        risk govtresponse_prep govtresponse_monsoon
## 1    1    1  0.05723605                NA                   NA
## 2    1    1  6.63949431                NA                   NA
## 3    1    1 12.86251234                NA                   NA
##   impact_after.considering.govt_prep
## 1                                 NA
## 2                                 NA
## 3                                 NA
##   impact_after.considering.govt_prep.and.govt_response
## 1                                                   NA
## 2                                                   NA
## 3                                                   NA

#removing the first 10 rows (2016 and 2017), where the government response columns are NA
risk_response_2018_2021 <- risk_response[-c(1:10),]
head(risk_response_2018_2021)

##    id district year  district_yr infr_damages population_livestock_damages
## 11 11   cachar 2018   cachar2018    0.2808788                  1.622470978
## 12 12  darrang 2018  darrang2018    0.8412435                  0.349436740
## 13 13 morigaon 2018 morigaon2018    0.4942782                  1.000000000
## 14 14  nalbari 2018  nalbari2018    0.4459268                  0.227189907
## 15 15 tinsukia 2018 tinsukia2018    0.0000000                  0.154576246
## 16 16   cachar 2019   cachar2019    0.1201699                  0.009796465
##       flood      demo     infr risk govtresponse_prep govtresponse_monsoon
## 11 3.683811 1.2130681 1.533431 8.84        0.05500084           1.58988958
## 12 1.353333 1.1154202 1.140474 1.83        0.02963139           0.05586386
## 13 2.881861 0.9352285 1.540477 4.89        0.02454346           0.01697237
## 14 1.503980 1.1660283 1.611596 1.48        0.04198684           0.07089531
## 15 1.368241 1.2529894 1.576436 0.27        0.00000000           0.01964286
## 16 2.178046 0.6790244 1.478639 0.40        0.08097631           0.04320325
##    impact_after.considering.govt_prep
## 11                        1419.845414
## 12                         112.512309
## 13                         974.083400
## 14                          52.116564
## 15                         702.272290
## 16                           1.990741
##    impact_after.considering.govt_prep.and.govt_response
## 11                                            47.475920
## 12                                            38.995101
## 13                                           575.861731
## 14                                            19.384906
## 15                                             3.557096
## 16                                             1.298143
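
The index-based subset above depends on the NA rows being exactly the first ten rows of the file. A minimal sketch of filtering on the missing values themselves is given here; the `toy` table and its values are made up for illustration and are not the real data, and only the column name `govtresponse_prep` is taken from the sheet.

```r
# Toy stand-in for risk_response; the values are hypothetical
toy <- data.frame(
  district_yr       = c("a2016", "b2016", "a2018", "b2018"),
  flood             = c(2.29, 2.03, 3.68, 1.35),
  govtresponse_prep = c(NA, NA, 0.055, 0.030)
)

# Keep only the rows where the government response column is observed
toy_complete <- toy[!is.na(toy$govtresponse_prep), ]
print(toy_complete)
```

If other columns could also contain NA, `complete.cases()` applied to the relevant columns would be the stricter variant of the same idea.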

#number of columns in the final sheet
ncol(risk_response_2018_2021)

## [1] 14

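#selecting the numeric columns (5 to 12), leaving out the identifier columns and the two impact columns
risk_response_fin <- risk_response_2018_2021[,5:(ncol(risk_response_2018_2021)-2)]

#scaling each column to mean 0 and standard deviation 1
df_risk <- scale(risk_response_fin)
#dropping the 6th column (risk) from the clustering variables
df_risk <- df_risk[,-6]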
head(df_risk, n = 3)

##    infr_damages population_livestock_damages      flood       demo       infr
## 11   -0.5789851                   0.77462633  1.3163452  0.8153172  0.6569625
## 12    0.1278465                  -0.67070146 -0.9168659  0.4829864 -1.5089778
## 13   -0.3098079                   0.06790959  0.5478658 -0.1302704  0.6958030
##    govtresponse_prep govtresponse_monsoon
## 11       -0.06226224            4.1832025
## 12       -0.41954297           -0.2645589
## 13       -0.49119685           -0.3773210

#converting to a data frame
df_risk <- as.data.frame(df_risk)
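
For reference, `scale()` with its default arguments standardizes each column separately: it subtracts the column mean and divides by the sample standard deviation. A small sketch on made-up numbers (the data frame `x` is hypothetical) that checks this against a manual calculation:

```r
# Made-up values for illustration; not taken from the flood dataset
x <- data.frame(flood = c(1, 2, 3, 4), demo = c(10, 20, 30, 40))

# scale() centres each column on its mean and divides by its sample sd
z_scale <- scale(x)

# The same z-scores computed by hand, column by column
z_manual <- sapply(x, function(col) (col - mean(col)) / sd(col))

# Both approaches agree up to floating-point tolerance
all_equal <- isTRUE(all.equal(as.numeric(z_scale), as.numeric(z_manual)))
print(all_equal)
```

Standardizing matters for k-means because the algorithm uses Euclidean distances, so columns measured on larger scales would otherwise dominate the clustering.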

Next, the optimal number of clusters is estimated.

For more details: https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/#:~:text=The%20optimal%20number%20of%20clusters%20can%20be%20defined%20as%20follow,sum%20of%20square%20(wss).

#estimating the optimal number of clusters in the data using wss method
fviz_nbclust(df_risk, kmeans, method = "wss")

#estimating the optimal number of clusters in the data using silhouette method
fviz_nbclust(df_risk, kmeans, method = "silhouette")

#estimating the optimal number of clusters in the data using gap statistic method
set.seed(123)
fviz_nbclust(df_risk, kmeans, nstart = 25,  method = "gap_stat", nboot = 50)+
  labs(subtitle = "Gap statistic method")

The optimal number of clusters is found to be 2.
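
To make the wss ("elbow") criterion concrete, the sketch below computes the total within-cluster sum of squares for several values of k using only base R. It runs on simulated data with two well-separated groups (an assumption for illustration, not the district data), so the drop in wss from k = 1 to k = 2 is large and later drops are small.

```r
set.seed(123)
# Simulated data with two well-separated groups (hypothetical)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which wss stops falling sharply
print(round(wss, 2))
```

On real data the elbow is often less sharp than in this simulation, which is one reason to compare it with the silhouette and gap statistic plots.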

For more details on k-means clustering: https://www.datanovia.com/en/lessons/k-means-clustering-in-r-algorith-and-practical-examples/#:~:text=K%2Dmeans%20clustering%20(MacQueen%201967,pre%2Dspecified%20by%20the%20analyst.

# Compute k-means with k = 2
set.seed(123)
km.res <- kmeans(df_risk, 2, iter.max = 10, nstart = 50)
#print the results
print(km.res)

## K-means clustering with 2 clusters of sizes 14, 6
## 
## Cluster means:
##   infr_damages population_livestock_damages      flood       demo       infr
## 1   -0.4527263                   -0.3872362 -0.2993360  0.2057772  0.2840139
## 2    1.0563614                    0.9035510  0.6984507 -0.4801467 -0.6626992
##   govtresponse_prep govtresponse_monsoon
## 1         0.1168927            0.1263735
## 2        -0.2727497           -0.2948715
## 
## Clustering vector:
## 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
##  1  1  1  1  1  1  2  2  1  1  1  2  2  1  1  1  2  2  1  1 
## 
## Within cluster sum of squares by cluster:
## [1] 62.87873 42.25393
##  (between_SS / total_SS =  21.0 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
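
The `between_SS / total_SS` figure in the printout can be reproduced from the listed components. Here the within-cluster sums of squares add up to 62.87873 + 42.25393 = 105.13266, and with 20 rows and 7 standardized columns the total sum of squares is 7 × 19 = 133, so the ratio is (133 − 105.13266) / 133 ≈ 0.21. The sketch below checks the same decomposition on simulated data (hypothetical, not the district data):

```r
set.seed(123)
# Simulated data with two well-separated groups (hypothetical)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2)
)
km_toy <- kmeans(toy, centers = 2, nstart = 25)

# Share of the total variation explained by the cluster assignment
ratio <- km_toy$betweenss / km_toy$totss

# The total sum of squares splits into within-cluster and between-cluster parts
decomposes <- isTRUE(all.equal(km_toy$totss,
                               km_toy$tot.withinss + km_toy$betweenss))
print(c(ratio = ratio, decomposes = decomposes))
```

A ratio close to 1 indicates compact, well-separated clusters; a value around 0.21 suggests that the two clusters overlap considerably, so the grouping is better read as indicative than as sharp.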

#size of each cluster
km.res$size

## [1] 14  6

#means of each variable by clusters using original data
print(aggregate(risk_response_fin, by=list(cluster=km.res$cluster), mean))

##   cluster infr_damages population_livestock_damages    flood      demo     infr
## 1       1    0.3809747                    0.5991109 1.997759 1.0339685 1.465768
## 2       2    1.5773551                    1.7360269 3.039004 0.8324252 1.294010
##        risk govtresponse_prep govtresponse_monsoon
## 1  2.607857        0.06772210           0.19069580
## 2 10.943333        0.04005475           0.04540909

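#cluster centres on the standardized scale (same values as the cluster means printed by print(km.res))
km.res$centers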

##   infr_damages population_livestock_damages      flood       demo       infr
## 1   -0.4527263                   -0.3872362 -0.2993360  0.2057772  0.2840139
## 2    1.0563614                    0.9035510  0.6984507 -0.4801467 -0.6626992
##   govtresponse_prep govtresponse_monsoon
## 1         0.1168927            0.1263735
## 2        -0.2727497           -0.2948715
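
The centres in `km.res$centers` are in standardized units, while the `aggregate()` call reports cluster means in the original units. The two are linked by the centring and scaling values that `scale()` stores as attributes on the matrix it returns. A sketch on made-up numbers (the `raw` table is hypothetical) showing that back-transforming the centres recovers the per-cluster means:

```r
set.seed(123)
# Made-up values for illustration; not the district data
raw <- data.frame(flood = c(1.4, 1.5, 3.7, 2.9, 1.3, 3.1),
                  risk  = c(1.8, 1.5, 8.8, 4.9, 0.3, 9.0))
scaled <- scale(raw)
km_toy <- kmeans(scaled, centers = 2, nstart = 25)

# Column means and standard deviations used by scale()
centre <- attr(scaled, "scaled:center")
spread <- attr(scaled, "scaled:scale")

# Undo the standardization: multiply by the sd, then add the mean
centers_raw <- sweep(sweep(km_toy$centers, 2, spread, "*"), 2, centre, "+")

# Per-cluster means computed directly on the original values
means_raw <- aggregate(raw, by = list(cluster = km_toy$cluster), mean)
print(centers_raw)
print(means_raw)
```

Note that the attributes are lost once the scaled matrix is converted with `as.data.frame()`, so they would need to be saved before that step.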

#adding point classifications to original data
dd <- cbind(risk_response_2018_2021, cluster = km.res$cluster)
#cluster in which a district falls in each year 
dd_final <- dd %>% select(district_yr, cluster)
dd_final[with(dd_final, order(cluster)),]

##     district_yr cluster
## 11   cachar2018       1
## 12  darrang2018       1
## 13 morigaon2018       1
## 14  nalbari2018       1
## 15 tinsukia2018       1
## 16   cachar2019       1
## 19  nalbari2019       1
## 20 tinsukia2019       1
## 21   cachar2020       1
## 24  nalbari2020       1
## 25 tinsukia2020       1
## 26   cachar2021       1
## 29  nalbari2021       1
## 30 tinsukia2021       1
## 17  darrang2019       2
## 18 morigaon2019       2
## 22  darrang2020       2
## 23 morigaon2020       2
## 27  darrang2021       2
## 28 morigaon2021       2
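
A cross-tabulation of district against cluster makes the pattern in the listing above easier to read. The sketch below re-enters the district, year and cluster values from that listing by hand; the object names `membership_demo` and `membership` are illustrative and not part of the original script.

```r
# District-year cluster assignments copied from the listing above
membership_demo <- data.frame(
  district = rep(c("cachar", "darrang", "morigaon", "nalbari", "tinsukia"),
                 times = 4),
  year     = rep(2018:2021, each = 5),
  cluster  = c(1, 1, 1, 1, 1,   # 2018
               1, 2, 2, 1, 1,   # 2019
               1, 2, 2, 1, 1,   # 2020
               1, 2, 2, 1, 1)   # 2021
)

# Number of years each district spends in each cluster
membership <- table(district = membership_demo$district,
                    cluster  = membership_demo$cluster)
print(membership)
```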

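Cluster 2 contains only darrang and morigaon, in each year from 2019 to 2021. These district-years carry a higher flood risk (mean risk of 10.94 against 2.61 in cluster 1) even though their vulnerability to floods is lower: flood proneness is high (mean flood value of 3.04 against 2.00), while government preparedness (0.040 against 0.068) and response during the monsoon (0.045 against 0.191) are relatively lower.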

jeeno_cdl

2022-10-18
