We at CivicDataLab are engaged in a data experiment for preparing an intelligent data model for flood response and management. As part of the experiment, we have to assign weights to multiple socio-economic variables by assessing the relationship of the selected variable with the flood-proneness of the region. We aim to use a data-driven approach, borrowing concepts from Bayesian statistics, to calculate the weight of different classes of a variable to the flood-proneness of the region.
Bayesian statistics calculates probability based on conditions, hence also known as conditional probability. Here, we use bayesian statistics to estimate the probability at which different categories of a given socio-economic variable experiences flooding in any sub-district. For a given variable, it is the proportion of the affect flood has on a category to the affect of the entire category in a study area. The proportion is recalculated for different years of floods and averaged to get a probability, which is taken as the weight of each category of a given variable.
Here, we calculate the yearly probability as VarWt = (E/F)/(M/L), E is the number of pixels with flood for each factor, F is the number of total floods in the study area, M is the number of pixels in the class area of the factor and L is the number of total pixels in the study area. The yearly probabilities are later averaged to get the final value, which is useful for future calculations.
For the work, we have to load the following libraries:
packages <- c("terra")
#read the district shapefile
state_4326 <- vect("assam_State.shp")
The data for the analysis are (1) the socio-economic variable, which in this case is the yearly population, and (2) the yearly flooding probability map.
These maps form the base to calculate E, F, M, and L of VarWt = (E/F)/(M/L).
Pre-conditioning for socio-economic variable - The yearly population map is cropped to the study area, which is the Assam state boundary. The map is classified into five categories using quantiles of 0.2.
Pre-conditioning for yearly ‘flooding probability’ map - The 0’s in the yearly flooding probability map are masked as NAs. The map is resampled to resemble the dimension and extent of the socio-economic variable. The cell values greater than or equal to 0.1 is filtered out in the resampled map to get a map with only 0s and 1s.
E - The newly created map of the socio-economic variable with five classes (values ranges from 1 to 5) is added up with the filtered resampled yearly ‘flooding probability’ map, resulting a new map with values ranging from 2 to 6. The area of each category or class in the map is calculated. This forms the E. A user-defined function called ‘rastr_area’ calculates the area of different categories or classes.
F - The total flooded area is calculated from the yearly flooding probability map (F). The value is pre-calculated and we read it from a saved table.
M - The area of each category or class in the map of the socio-economic variable is calculated. This forms the M. A user-defined function called ‘rastr_area’ calculates the area of different categories or classes.
L - The study area (L) is calculated from this map.
The above four variables are used to get the frequency ratio or conditional prbability for each year. FR <- (overlapClassArea or E/flooded_area or F)/(classArea or M/total_area or L) The ‘F’ and ‘M’ are repeated five times to resemble the number of categories the socio-economic variable is classified.
The weight of the classes (FR) for each year is saved in separate tables.
#RUN THE LOOP FOR ALL THE YEARS
for (yr in 2016:2016) {
state_4326 <- vect("assam_State.shp")
layer_type <- 'pop'
#reading the socio-economic variable
factr <- rast(sprintf('state/pop_ind_ppp_%s_UNadj.tif',yr))
factr_dar <- factr
factr_dar <- terra::crop(factr , state_4326)
factr_dar <- terra::mask(factr_dar , state_4326, overwrite = T)
plot(factr_dar , main = sprintf("Socio-economic layer %s",yr))
#calculating the extent of the study area
a <- (terra::cellSize(factr_dar, unit = "m"))
#flooded area
area_dar <- sum(a[],na.rm=T)
#reading the FLOODED AREAS
flood_inv_10 <- rast(sprintf("conditioningfactors/floodedareas_grtr1in5_%sAug.tif",yr))
#masking the non-flooded areas
flood_inv_10[flood_inv_10 == 0] <- NA
#reading the pre-calculated flooded area
area_flood <- read.table(sprintf('conditioningfactors/area_flood_%s__Aug',yr))
area_flood$V1
#using quantiles to cut into 5 levels
tt <- cut(factr_dar[], breaks = c(quantile(factr_dar[],probs = seq(0, 1, by = 0.20),na.rm=T)),
labels=c("1","2","3","4","5"),
include.lowest =T, na.rm = T)
factr_l <- factr_dar
values(factr_l) <- tt
factr_l <- (factr_l + 1)
plot(factr_l,main = sprintf("Socio-economic layer %s",yr))
factr_l <- rast(sprintf("conditioningfactors/classes_%s_Aug_%s.tif",layer_type, yr))
#user-defined function for calculate the area of different classes
rastr_area <- function(img){
img <- terra::cellSize(img, unit = "m")*img
imgArea <- sum(img[],na.rm=T)
return(imgArea)
}
#creating a list the classes which are raster images
dist <- c(factr_l == 1, factr_l == 2,
factr_l == 3, factr_l == 4,
factr_l == 5)
#finding area of each class using the user-defined function
classArea <- sapply(dist, rastr_area)
classArea
#since the resolution and extent is different, dummy raster is used to resample the flood map
dummy <- rast(ext=ext(factr_l),crs=crs(factr_l, proj=T, describe=FALSE, parse=FALSE),
nrow=dim(factr_l)[1],ncol=dim(factr_l)[2])
flood_inv_11 <-resample(flood_inv_10, dummy)
flood_inv_11 <- (flood_inv_11 >= 0.1)*1
#masking all non-flooded areas
flood_inv_11[flood_inv_11 == 0] <- NA
#prepare a map of the areas of socio-economic variables that overlap with the flooded area
overlap <- flood_inv_11 + factr_l
plot(overlap, main = sprintf("Overlap of Socio-economic layer with flood for %s",yr))
overlap_class <- c(overlap == 2, overlap == 3,
overlap == 4, overlap == 5,
overlap == 6)
overlapClassArea <- sapply(overlap_class, rastr_area)
#frequency ratio
flooded_area <- rep(area_flood$V1,5)
total_area <- rep(area_dar,5)
FR <- (overlapClassArea/flooded_area)/(classArea/total_area)
#prepare a data frame of the frequency ratio
df <- data.frame(factor = rep(sprintf('%s',layer_type),5),
Class = c(1:5),
FR = FR)
#save it in a table
}
The same logic is followed for the following socio-economic layers:
population density, AAY, Aged population (>65 years), deprived
households, households with no drinking water and sanitation, households
with no landline or mobile phone, sex ratio, young population (0-5
years).