We at CivicDataLab are engaged in a data experiment for preparing an intelligent data model for flood response and management. As part of the experiment, we have to assign weights to multiple socio-economic variables that represent the physical infrastructure by assessing the relationship of the selected variable with the flood-proneness of the region. We aim to use a data-driven approach, borrowing concepts from Bayesian statistics, to calculate the weight of different classes of a variable to the flood-proneness of the region.
Bayesian statistics calculates probability based on conditions, hence also known as conditional probability. Here, we use bayesian statistics to estimate the probability at which different categories of a given socio-economic variable experiences flooding in any sub-district. For a given variable, it is the proportion of the affect flood has on a category to the affect of the entire category in a study area. The proportion is recalculated for different years of floods and averaged to get a probability, which is taken as the weight of each category of a given variable.
Here, we calculate the yearly probability as VarWt = (E/F)/(M/L), E is the number of pixels with flood for each factor, F is the number of total floods in the study area, M is the number of pixels in the class area of the factor and L is the number of total pixels in the study area. The yearly probabilities are later averaged to get the final value, which is useful for future calculations.
For the work, we have to load the following libraries:
packages <- c("terra")
#read the district shapefile
state_4326 <- vect("assam_State.shp")
plot(state_4326)
The data for the analysis are (1) the conditioning factor, which in this case is ‘distance from road’, and (2) the yearly flooding probability map.
These maps form the base to calculate E, F, M, and L of VarWt = (E/F)/(M/L).
Pre-conditioning for conditioning factor - The map (downloaded from https://hub.worldpop.org/project/categories?id=14) is cropped to the study area, which is the Assam state boundary. The map is classified into five categories using quantiles of 0.2.
Pre-conditioning for yearly ‘flooding probability’ map - The 0’s in the yearly flooding probability map are masked as NAs. The map is resampled to resemble the dimension and extent of the socio-economic variable. The cell values greater than or equal to 0.1 is filtered out in the resampled map to get a map with only 0s and 1s.
E - The newly created map of the physical infrastructure variable with five classes (values ranges from 1 to 5) is added up with the filtered resampled yearly ‘flooding probability’ map, resulting a new map with values ranging from 2 to 6. The area of each category or class in the map is calculated. This forms the E. A user-defined function called ‘rastr_area’ calculates the area of different categories or classes.
F - The total flooded area is calculated from the yearly flooding probability map (F). The value is pre-calculated and we read it from a saved table.
M - The area of each category or class in the conditioning factor is calculated. This forms the M. A user-defined function called ‘rastr_area’ calculates the area of different categories or classes.
L - The study area (L) is calculated from this map.
The above four variables are used to get the frequency ratio or conditional probability for each year. FR <- (overlapClassArea or E/flooded_area or F)/(classArea or M/total_area or L) The ‘F’ and ‘M’ are repeated five times to resemble the number of categories the socio-economic variable is classified.
The weight of the classes (FR) for each year is saved in separate tables.
#read the conditioning factor - distance from road
rd_dist <- rast("state/ind_osm_dst_road_100m_2016.tif")
readStop(rd_dist)
plot(rd_dist)
#reproject to 4326 #reqiredd for few conditioning factors
#riv_dist <- terra::project(riv_dist , crs(state_4326), overwrite = T)
rdDist_dar <- terra::crop(rd_dist , state_4326)
rdDist_dar <- terra::mask(rdDist_dar , state_4326, overwrite = T)
plot(rdDist_dar, main = "Distance from road")
#total area
a <- (terra::cellSize(rdDist_dar, unit = "m"))
#flooded area
area_dar <- sum(a[],na.rm=T)
# alternate
# area_dar <- expanse(rdDist_dar)
area_dar
## [1] 78528390615
#reading the overall flooding proabbaility - consolidated from 2015 to 2022
flood_inv_10 <- rast("conditioningfactors/floodedareas_grtr1in5_Aug.tif")
plot(flood_inv_10)
#calc the flooded area
area_flood <- read.table('area_flood_Aug')
area_flood$V1
## [1] 3330796349
#classify the map using quantiles
tt <- cut(rdDist_dar[], breaks = c(quantile(rdDist_dar[],probs = seq(0, 1, by = 0.20),na.rm=T)),
labels=c("1","2","3","4","5"),
include.lowest =T, na.rm = T)
rddist <- rdDist_dar
values(rddist) <- tt
rddist <- (rddist + 1)
plot(rddist)
#function for calc the area
rastr_area <- function(img){
img <- terra::cellSize(img, unit = "m")*img
imgArea <- sum(img[],na.rm=T)
return(imgArea)
}
#creating a list the classes which are raster images
dist <- c(rddist == 1, rddist == 2,
rddist == 3, rddist == 4,
rddist == 5)
plot(dist[[5]], man = "Reclassified map")
#class Area using the user-defined function
classArea <- sapply(dist, rastr_area)
classArea
## [1] 15699244418 15707657717 15699230291 15710450238 15711807962
#overlap of classes with the flooded area
#since the resolution and extent is different, dummy raster is used
dummy <- rast(ext=ext(rddist),crs=crs(rddist, proj=T, describe=FALSE, parse=FALSE),
nrow=dim(rddist)[1],ncol=dim(rddist)[2])
#values(dummy) <- 1
#flood_inv_10 <- flood_inv_10 * dummy
plot(flood_inv_10)
flood_inv_11 <-resample(flood_inv_10, dummy)
plot(flood_inv_11)
flood_inv_11 <- (flood_inv_11 >= 0.1)*1
plot(flood_inv_11)
#masking all the non-flooded areas by converting it to NA
flood_inv_11[flood_inv_11 == 0] <- NA
plot(flood_inv_11)
#sum to find the overlaps
overlap <- flood_inv_11 + rddist
plot(overlap, main = 'Overalps')
overlap_class <- c(overlap == 2, overlap == 3,
overlap == 4, overlap == 5,
overlap == 6)
plot(overlap_class)
overlapClassArea <- sapply(overlap_class, rastr_area)
overlapClassArea
## [1] 987445242 1085096489 998566257 748627147 453869541
#frequency ratio
flooded_area <- rep(area_flood$V1,5)
flooded_area
## [1] 3330796349 3330796349 3330796349 3330796349 3330796349
total_area <- rep(area_dar,5)
total_area
## [1] 78528390615 78528390615 78528390615 78528390615 78528390615
FR <- (overlapClassArea/flooded_area)/(classArea/total_area)
FR
## [1] 1.482903 1.628679 1.499606 1.123455 0.681057
#prepare a data frame of the freqeuncy ratio
df <- data.frame(factor = rep("distfromRoad",5),
Class = levels(tt),
FR = FR)
df
## factor Class FR
## 1 distfromRoad 1 1.482903
## 2 distfromRoad 2 1.628679
## 3 distfromRoad 3 1.499606
## 4 distfromRoad 4 1.123455
## 5 distfromRoad 5 0.681057
#save it in a table
The same logic is followed other conditioning factors such as : Soil Lithology Drainage Density Slope Elevation Surface runoff Distance from major rivers Distance from road Distance from road intersections Distance from large natural reserves