library(tidyverse)
library(lubridate)
library(plotly)
library(data.table)
library(patchwork) 
library(scales) 
library(viridis)
library(scales)
library(geometry)
library(MASS)
library(gridExtra)
library(utilities)
library(mapview)
library(forcats)
library(rnaturalearth)
library(rnaturalearthdata)
library(rgdal)
library(sp)
library(sf)
library(leaflet)
library(utils)
library(base)
theme_set(theme_bw())
theme_set(theme_minimal())
library(readxl)
Data6 <- read_excel("emdat_public_2021_03_02_query_uid-0CR386.xlsx")

In 1988, the Centre for Research on the Epidemiology of Disasters (CRED) launched the Emergency Events Database (EM-DAT). EM-DAT was created with the initial support of the World Health Organisation (WHO).

EM-DAT contains essential core data on the occurrence and effects of over 22,000 mass disasters in the world from 1900 to the present day. The database is compiled from various sources, including UN agencies, non-governmental organisations, insurance companies, research institutes and press agencies.

Link (EM-DAT) = https://www.emdat.be

Geographical informations :

Continent: The Continent in which the disaster has occurred or had an impact;

Region: The region to which the country belongs.

Country: The Country in which the disaster has occurred or had an impact;

ISO Code: The International Organization for Standardization attributes a 3-letter code to each country. CRED uses the ISO 3166 . This field is automatically linked to the country.

Disaster informations:

Disaster: The disaster category is divided into 7 Groups: Geophysical, Meteorological, Hydrological, Climatological, Biological, Extra-terrestrial and Technological disasters.

Each Disaster Group includes the following Disaster Types:

Geophysical = Earthquake ; Volcanic activity ; Mass Movement

Meteorological = Storm ; Extreme Temperature ; Fog ;

Hydrological = Flood ; Landslide ; Wave action ;

Climatological = Drought ; Glacial lake outburst ; Wildfire ;

Biological = Epidemic ; Insect Infestation ; Animal accident ;

Extra-terrestrial = Impact; Space weather;

Technological = Industrial accident ; Miscellaneous accident ;
Transport accident ;

Human impact informations :

Deaths: Number of people who lost their life because the event happened.

Total affected: The total affected is the sum of injured, affected and homeless

Entry criteria: The reason for recording the disaster event into EM-DAT. At least one of the following criteria must be fulfilled in order for an event to be entered into the database:

Deaths: 10 or more people deaths

Affected: 100 or more people affected/injured/homeless.

Our Dataset is made up of 9 columns and 24,829 rows [24,829 x 9]:

library(dplyr)

library(raster)

Data0=Data6 %>%rename( Affected= `Total Affected`)%>%
  rename( Disaster= `Disaster Subgroup`) %>%
  rename( Deaths= `Total Deaths`)%>%
  rename( Type= `Disaster Type`)

Data=Data0

Data$Affected[is.na(Data$Affected)] <- 0
Data$Deaths[is.na(Data$Deaths)] <- 0


Data<-Data %>% dplyr::select(Year,  Disaster, Type, Country,ISO,  Deaths,  Affected,Region,Continent)

Data1=Data %>%
  filter(Disaster!= "Complex Disasters")%>%
  filter(Disaster!= "Extra-terrestrial")



Data2= Data1 %>% dplyr::select(Continent, Disaster, Country, ISO)

sum=Data2 %>%
  group_by(ISO) %>%
  summarise(num=n())


 s <- shapefile("C:/Users/User/Desktop/UNI/2 Anno/ADV/PROJECT/Proj. Claudio Olivari/Proj/TM_WORLD_BORDERS_SIMPL-0.3.shp")
sum=sum %>% dplyr::rename( ISO3= ISO)
Data2=Data2 %>% dplyr::rename( ISO3= ISO)


s %>% merge(sum,by="ISO3", duplicateGeoms = TRUE) -> sum1
sum$num <- as.numeric(sum$num)

str(Data1)
## tibble [24,829 x 9] (S3: tbl_df/tbl/data.frame)
##  $ Year     : chr [1:24829] "1900" "1900" "1901" "1902" ...
##  $ Disaster : chr [1:24829] "Climatological" "Climatological" "Technological" "Geophysical" ...
##  $ Type     : chr [1:24829] "Drought" "Drought" "Industrial accident" "Earthquake" ...
##  $ Country  : chr [1:24829] "Cabo Verde" "India" "Belgium" "Guatemala" ...
##  $ ISO      : chr [1:24829] "CPV" "IND" "BEL" "GTM" ...
##  $ Deaths   : num [1:24829] 11000 1250000 18 2000 1000 6000 76 17 84 15 ...
##  $ Affected : num [1:24829] 0 0 0 0 0 0 23 0 0 2 ...
##  $ Region   : chr [1:24829] "Western Africa" "Southern Asia" "Western Europe" "Central America" ...
##  $ Continent: chr [1:24829] "Africa" "Asia" "Europe" "Americas" ...

HISTORICAL SERIES

This graphs shows the behavior of N. Disaster / Affected / Deaths over the time period from 1900 to 2020. The series are divided into different colors with respect to the Continent each Country belongs.

DISASTERS PER CONTINENT

The Continent with the greatest number of disasters is Asia .

dat=Data1 %>%
  group_by(Year, Continent) %>%
  summarise(num=n())

  
dat=as.data.frame(dat)

dat$Disaster=as.factor(dat$Continent)

dat %>%
  streamgraph("Disaster", "num", "Year", offset="zero", interpolate="cardinal") %>%
  sg_axis_x(2, "Year", "%Y") %>%
  sg_fill_brewer("Set1") %>%
 sg_legend(TRUE, "Continent: ")%>%

  sg_annotate(label="from 1900 to 2020", 
              x=as.Date("1900-01-01"), y=800, color="#000000", size=18)

AFFECTED PER CONTINENT

The Continentt with the greatest number of Affected is Asia .

sum6 <- Data1 %>%
  dplyr::select(Year,Continent) %>%
   group_by(Year,Continent) %>%
  summarise(num=n())

Affected=aggregate(Data1$Affected, by=list(Year=Data1$Year,Continent=Data1$Continent), FUN=sum)

sum6 <- subset( sum6, select = -num )


df3 = merge(sum6, Affected, by.x=c("Year", "Continent"), by.y=c("Year", "Continent"))



df3$x=as.integer(df3$x)




df3$Year=as.Date(ISOdate(df3$Year, 1, 1))  # beginning of year
df3$Year=as.Date(ISOdate(df3$Year, 12, 31))  # end of year

df3$Disaster=as.factor(df3$Continent)


# save th


df3 %>%
  streamgraph("Continent", "x", "Year", offset="zero", interpolate="cardinal") %>%
  sg_axis_x(2, "Year", "%Y") %>%
  sg_fill_brewer("Set1") %>%
 sg_legend(TRUE, "Continent: ")%>%

  sg_annotate(label="from 1900 to 2020", 
              x=as.Date("1900-01-01"), y=600000000, color="#000000", size=18)

DEATHS PER CONTINENT

The Continent with the greatest number of Deaths is Asia .

sum6 <- Data1 %>%
  dplyr::select(Year,Continent) %>%
   group_by(Year,Continent) %>%
  summarise(num=n())

Deaths=aggregate(Data1$Deaths, by=list(Year=Data1$Year,Continent=Data1$Continent), FUN=sum)

sum6 <- subset( sum6, select = -num )


df3 = merge(sum6, Deaths, by.x=c("Year", "Continent"), by.y=c("Year", "Continent"))



df3$x=as.integer(df3$x)




df3$Year=as.Date(ISOdate(df3$Year, 1, 1))  # beginning of year
df3$Year=as.Date(ISOdate(df3$Year, 12, 31))  # end of year

df3$Disaster=as.factor(df3$Continent)


# save th


df3 %>%
  streamgraph("Continent", "x", "Year", offset="zero", interpolate="cardinal") %>%
  sg_axis_x(2, "Year", "%Y") %>%
  sg_fill_brewer("Set1") %>%
 sg_legend(TRUE, "Continent: ")%>%

  sg_annotate(label="from 1900 to 2020", 
              x=as.Date("1900-01-01"), y=3400000, color="#000000", size=18)

Findings :

Density distributions

As we have seen in previous distributions, the number of disasters and the number and Deaths changes dramatically between 1900-1960 and 1960-2020.

To better visualize this difference and in order to measure the severity of each Disaster Group, I decided to use the Disaster Groups Density Plot with respect to the number of Deaths and the number of Disasters that occurred in each Country during these two time periods

Since the two periods have very different ranges and to avoid the “flattening” of the plots of the different Types of Disasters, I decided to show two different graphs, one for the first period and one for the second and to remove outliers for both the periods.

library(ggplot2)
library(hrbrthemes)
library(dplyr)
library(viridis)
library(ggridges)
library(extrafont)




Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>% dplyr::filter(Year<1960, Deaths>1)      %>%
   group_by(Disaster,Country,Continent) %>%
  summarise(num=n())

Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country) %>% dplyr::filter(Year<1960 , Deaths>1)


Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)




Deaths2=Deaths2 %>%rename( Disaster= Group.2)%>%
  rename( Continent= Category)%>%rename( Country= Group.3)%>%rename( Deaths= x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))



Density <- Density[Density$num > quantile(Density$num, .25) - 1.5*IQR(Density$num) & 
        Density$num < quantile(Density$num, .75) + 1.5*IQR(Density$num), ] 

Density <- Density[Density$Deaths > quantile(Density$Deaths, .25) - 1.5*IQR(Density$Deaths) & 
        Density$Deaths < quantile(Density$Deaths, .75) + 1.5*IQR(Density$Deaths), ] 



Density <- transform(Density, ratio = Deaths / num)





Density <- Density[Density$ratio > quantile(Density$ratio, .25) - 1.5*IQR(Density$ratio) & 
        Density$ratio < quantile(Density$ratio, .75) + 1.5*IQR(Density$ratio), ] 




Density=Density %>% dplyr::filter(ratio<1000)


Density1=Density %>%
  mutate(text = fct_reorder(Disaster, ratio)) %>%
  ggplot( aes(y=Disaster, x=ratio,  fill=Disaster)) +
    geom_density_ridges(alpha=0.8, bandwidth=2) +
    scale_fill_viridis(discrete=TRUE) +
    scale_color_viridis(discrete=TRUE) +
    theme_ipsum() +
    theme(
      legend.position="none",
      panel.spacing = unit(0.1, "lines"),
      strip.text.x = element_text(size = 10)
    ) +
    xlab("") +
    ylab("Disaster Group 1900-1960")








Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>% dplyr::filter(Year>1960, Deaths>1)      %>%
   group_by(Disaster,Country,Continent) %>%
  summarise(num=n())


Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country) %>% dplyr::filter(Year>1960 , Deaths>1)


Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)



Deaths2=Deaths2 %>%rename( Disaster= Group.2)%>%
  rename( Continent= Category)%>%rename( Country= Group.3)%>%rename( Deaths= x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))



Density <- Density[Density$num > quantile(Density$num, .25) - 1.5*IQR(Density$num) & 
        Density$num < quantile(Density$num, .75) + 1.5*IQR(Density$num), ] 

Density <- Density[Density$Deaths > quantile(Density$Deaths, .25) - 1.5*IQR(Density$Deaths) & 
        Density$Deaths < quantile(Density$Deaths, .75) + 1.5*IQR(Density$Deaths), ] 





Density <- transform(Density, ratio = Deaths / num)




Density <- Density[Density$ratio > quantile(Density$ratio, .25) - 1.5*IQR(Density$ratio) & 
        Density$ratio < quantile(Density$ratio, .75) + 1.5*IQR(Density$ratio), ] 


Density2=Density %>%
  mutate(text = fct_reorder(Disaster, ratio)) %>%
  ggplot( aes(y=Disaster, x=ratio,  fill=Disaster)) +
    geom_density_ridges(alpha=0.8, bandwidth=2) +
    scale_fill_viridis(discrete=TRUE) +
    scale_color_viridis(discrete=TRUE) +
    theme_ipsum() +
    theme(
      legend.position="none",
      panel.spacing = unit(0.1, "lines"),
      strip.text.x = element_text(size = 10)
    ) +
    xlab("") +
    ylab("Disaster Group 1960-2020")



library(scater)

multiplot(Density1, Density2)

Findings 1900-1960 :

Findings 1960-2020 :

Treemap for hierarchical representation of the world with respect to the number of disasters

Using the Treemap we have a hierarchical view of the continents and their respective states with respect to the total number of disasters.

Each Continent is represented with a different color and contains the states that are part of it, their size is proportional to the total number of disasters it has presented in the period of time from 1900 to 2020.

The interactivity of the Treemap also allows us to see how the Disasters are distributed among the Countries of the same Continent and to check the exact number of Disasters that occurred from 1900 to 2020 in each Country.

inter

Geographic representation of the number of disasters around the world from 1900 to 2020 divided by the type of disaster

Using the Choropleth map we have a geographic view of the total number of disasters that occurred in the period 1900-2020 in each state of the world where each state is colored against this number.

In this section I have decided to represent the world with respect to each Disaster Group by making it possible to compare the different Continents (or states) with each other but not between the different Types of Disasters because, since we have very different ranges for each Type of Disaster, we are using different color scales in each type of choroplet plot.

With the frequency distributionit is possible to observe the distribution of each Group of disasters by single Continet over the years and associate the peaks with the actual Disasters that occurred in that period .

Technological

Results :

  • Asia is the continent with the largest number of technological disasters ;

  • Europe is the continent with the least number of technological disasters ;

  • India and China are the states with the greatest number of technological disasters ;

library(dplyr)



tech <- subset(Data2, Disaster == "Technological") 

tech=tech %>%
  group_by(Country,ISO3) %>%
  summarise(num=n())

s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2



library(RColorBrewer)
mybins <- c(0,100,200,300,400,500,600,700, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
 

# Prepare the text for tooltips:
mytext <- paste(
    "Country: ", sum_2$NAME,"<br/>", 
    "num: ", round(sum_2$num, 2), 
    sep="") %>%
  lapply(htmltools::HTML)
 
# Final Map
l=leaflet(tech) %>% 
  addTiles()  %>% 
  setView( lat=10, lng=0 , zoom=2) %>%
  addPolygons(data = sum_2,
    fillColor = ~mypalette(sum_2$num), 
    stroke=TRUE, 
    fillOpacity = 0.9, 
    color="white", 
    weight=0.3,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
  group_by(Year,Continent, Disaster) %>%
  filter(Disaster== "Technological")%>%
  summarise(num=n())
 
library(dygraphs)
library(xts) # To make the convertion 


States1=States %>%
  pivot_wider(names_from = Continent, values_from = num)



States1[is.na(States1)] <- 0

States1$Year=as.Date(ISOdate(States1$Year, 1, 1))  # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31))  # end of year


# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)

# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library(hrbrthemes)
library(plyr)



Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
  dplyr::group_by(Disaster,Country,Continent) %>%
 dplyr::summarise(num=n())


Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)



Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)



Deaths2=Deaths2 %>% dplyr::rename( Disaster = Group.2) %>% 
dplyr::rename( Continent = Category) %>% dplyr::rename( Country = Group.3) %>% dplyr::rename( Deaths = x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))






Density <- transform(Density, ratio = Deaths / num)




op4 <- Density %>%
        filter(Disaster == "Technological") 

op4<-op4[order(-op4[,6]),]

op4<-op4[1:10,]



op4 %>%
dplyr::arrange(ratio) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=ratio) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
    geom_point(size=5, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Top Country - Technological)")

Biological

Results :

  • Africa is the continent with the largest number of Biological disasters ;

  • Europe and Americas are the continents with the least numbers of Biological disasters ;

  • Nigeria, Congo and India are the states with the greatest number of technological disasters ;

library(dplyr)

library(xts)

tech <- subset(Data2, Disaster == "Biological") 
tech=tech %>%
 dplyr::group_by(Country,ISO3) %>%
  dplyr::summarise(num=n())

s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2



library(RColorBrewer)
mybins <- c(0,10,20,30,40,50,60, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
 

# Prepare the text for tooltips:
mytext <- paste(
    "Country: ", sum_2$NAME,"<br/>", 
    "num: ", round(sum_2$num, 2), 
    sep="") %>%
  lapply(htmltools::HTML)
 
# Final Map
l=leaflet(tech) %>% 
  addTiles()  %>% 
  setView( lat=10, lng=0 , zoom=2) %>%
  addPolygons(data = sum_2,
    fillColor = ~mypalette(sum_2$num), 
    stroke=TRUE, 
    fillOpacity = 0.9, 
    color="white", 
    weight=0.3,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
  dplyr::group_by(Year,Continent, Disaster) %>%
  dplyr::filter(Disaster== "Biological")%>%
  dplyr::summarise(num=n())
 
library(dygraphs)
library(xts) # To make the convertion 


States1=States %>%
  pivot_wider(names_from = Continent, values_from = num)



States1[is.na(States1)] <- 0

States1$Year=as.Date(ISOdate(States1$Year, 1, 1))  # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31))  # end of year


# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)

# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)




Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
   dplyr::group_by(Disaster,Country,Continent) %>%
  dplyr::summarise(num=n())


Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)



Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)



Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
  dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))






Density <- transform(Density, ratio = Deaths / num)




op4 <- Density %>%
        filter(Disaster == "Biological") 

op4<-op4[order(-op4[,6]),]

op4<-op4[1:10,]



op4 %>%
dplyr::arrange(ratio) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=ratio) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
    geom_point(size=5, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Top Country - Biological")

Geophysical

Results :

  • Asia is the continent with the largest number of Geophysical disasters ;

  • Europe and Africa are the continents with the least numbers of Geophysical disasters ;

  • Cina and Indonesia are the states with the greatest number of Geophysical disasters ;

library(dplyr)



tech <- subset(Data2, Disaster == "Geophysical") 

tech=tech %>%
  dplyr::group_by(Country,ISO3) %>%
  dplyr::summarise(num=n())

s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2



library(RColorBrewer)
mybins <- c(0,30,60,90,120,150,180, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
 

# Prepare the text for tooltips:
mytext <- paste(
    "Country: ", sum_2$NAME,"<br/>", 
    "num: ", round(sum_2$num, 2), 
    sep="") %>%
  lapply(htmltools::HTML)
 
# Final Map
l=leaflet(tech) %>% 
  addTiles()  %>% 
  setView( lat=10, lng=0 , zoom=2) %>%
  addPolygons(data = sum_2,
    fillColor = ~mypalette(sum_2$num), 
    stroke=TRUE, 
    fillOpacity = 0.9, 
    color="white", 
    weight=0.3,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
  dplyr::group_by(Year,Continent, Disaster) %>%
  dplyr::filter(Disaster== "Geophysical")%>%
  dplyr::summarise(num=n())
 
library(dygraphs)
library(xts) # To make the convertion 


States1=States %>%
  pivot_wider(names_from = Continent, values_from = num)



States1[is.na(States1)] <- 0

States1$Year=as.Date(ISOdate(States1$Year, 1, 1))  # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31))  # end of year


# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)

# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)




Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
   dplyr::group_by(Disaster,Country,Continent) %>%
  dplyr::summarise(num=n())


Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)



Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)



Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
  dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))






Density <- transform(Density, ratio = Deaths / num)




op4 <- Density %>%
        filter(Disaster == "Geophysical") 

op4<-op4[order(-op4[,6]),]

op4<-op4[1:10,]



op4 %>%
dplyr::arrange(ratio) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=ratio) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
    geom_point(size=5, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Top Country - Geophysical")

Meteorological

Results :

  • Americas and Asia are the continent with the largest number of Meteorological disasters ;

  • Europe and Africa are the continents with the least numbers of Meteorological disasters ;

  • Usa, Cina and India are the states with the greatest number of Meteorological disasters ;

library(dplyr)



tech <- subset(Data2, Disaster == "Meteorological") 

tech=tech %>%
  dplyr::group_by(Country,ISO3) %>%
  dplyr::summarise(num=n())

s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2



library(RColorBrewer)
mybins <- c(0,50,100,150,200,250,300,400,450, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
 

# Prepare the text for tooltips:
mytext <- paste(
    "Country: ", sum_2$NAME,"<br/>", 
    "num: ", round(sum_2$num, 2), 
    sep="") %>%
  lapply(htmltools::HTML)
 
# Final Map
l=leaflet(tech) %>% 
  addTiles()  %>% 
  setView( lat=10, lng=0 , zoom=2) %>%
  addPolygons(data = sum_2,
    fillColor = ~mypalette(sum_2$num), 
    stroke=TRUE, 
    fillOpacity = 0.9, 
    color="white", 
    weight=0.3,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
 leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
  dplyr::group_by(Year,Continent, Disaster) %>%
  dplyr::filter(Disaster== "Meteorological")%>%
  dplyr::summarise(num=n())
 
library(dygraphs)
library(xts) # To make the convertion 


States1=States %>%
  pivot_wider(names_from = Continent, values_from = num)



States1[is.na(States1)] <- 0

States1$Year=as.Date(ISOdate(States1$Year, 1, 1))  # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31))  # end of year


# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)

# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)




Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
   dplyr::group_by(Disaster,Country,Continent) %>%
  dplyr::summarise(num=n())


Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)



Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)



Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
  dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))






Density <- transform(Density, ratio = Deaths / num)




op4 <- Density %>%
        filter(Disaster == "Meteorological") 

op4<-op4[order(-op4[,6]),]

op4<-op4[1:10,]



op4 %>%
dplyr::arrange(ratio) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=ratio) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
    geom_point(size=5, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Top Country - Meteorological")

Hydrological

Results :

  • Asia is the continent with the largest number of Hydrological disasters ;

  • Europe and Africa are the continents with the least numbers of Hydrological disasters ;

  • Cina, India and Indonesia are the states with the greatest number of Hydrological disasters ;

library(dplyr)



tech <- subset(Data2, Disaster == "Hydrological") 

tech=tech %>%
  dplyr::group_by(Country,ISO3) %>%
  dplyr::summarise(num=n())

s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2



library(RColorBrewer)
mybins <- c(0,40,80,120,160,200,240, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
 

# Prepare the text for tooltips:
mytext <- paste(
    "Country: ", sum_2$NAME,"<br/>", 
    "num: ", round(sum_2$num, 2), 
    sep="") %>%
  lapply(htmltools::HTML)
 
# Final Map
l=leaflet(tech) %>% 
  addTiles()  %>% 
  setView( lat=10, lng=0 , zoom=2) %>%
  addPolygons(data = sum_2,
    fillColor = ~mypalette(sum_2$num), 
    stroke=TRUE, 
    fillOpacity = 0.9, 
    color="white", 
    weight=0.3,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
  dplyr::group_by(Year,Continent, Disaster) %>%
  dplyr::filter(Disaster== "Hydrological")%>%
  dplyr::summarise(num=n())
 
library(dygraphs)
library(xts) # To make the convertion 


States1=States %>%
  pivot_wider(names_from = Continent, values_from = num)



States1[is.na(States1)] <- 0

States1$Year=as.Date(ISOdate(States1$Year, 1, 1))  # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31))  # end of year


# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)

# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)




Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
   dplyr::group_by(Disaster,Country,Continent) %>%
  dplyr::summarise(num=n())


Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)



Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)



Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
  dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))






Density <- transform(Density, ratio = Deaths / num)




op4 <- Density %>%
        filter(Disaster == "Hydrological") 

op4<-op4[order(-op4[,6]),]

op4<-op4[1:10,]



op4 %>%
dplyr::arrange(ratio) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=ratio) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
    geom_point(size=5, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Top Country - Hydrological")

Climatological

Results :

  • Americas is the continent with the largest number of Climatological disasters ;

  • Europe and Africa are the continents with the least numbers of Climatological disasters ;

  • Usa is the states with the greatest number of Climatological disasters ;

library(dplyr)



tech <- subset(Data2, Disaster == "Climatological") 

tech=tech %>%
  dplyr::group_by(Country,ISO3) %>%
  dplyr::summarise(num=n())

s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2



library(RColorBrewer)
mybins <- c(0,20,40,60,80, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
 

# Prepare the text for tooltips:
mytext <- paste(
    "Country: ", sum_2$NAME,"<br/>", 
    "num: ", round(sum_2$num, 2), 
    sep="") %>%
  lapply(htmltools::HTML)
 
# Final Map
l=leaflet(tech) %>% 
  addTiles()  %>% 
  setView( lat=10, lng=0 , zoom=2) %>%
  addPolygons(data = sum_2,
    fillColor = ~mypalette(sum_2$num), 
    stroke=TRUE, 
    fillOpacity = 0.9, 
    color="white", 
    weight=0.3,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
  dplyr::group_by(Year,Continent, Disaster) %>%
  dplyr::filter(Disaster== "Climatological")%>%
  dplyr::summarise(num=n())
 
library(dygraphs)
library(xts) # To make the convertion 


States1=States %>%
  pivot_wider(names_from = Continent, values_from = num)



States1[is.na(States1)] <- 0

States1$Year=as.Date(ISOdate(States1$Year, 1, 1))  # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31))  # end of year


# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)

# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)




Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
   dplyr::group_by(Disaster,Country,Continent) %>%
  dplyr::summarise(num=n())


Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)



Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)



Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
  dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)


Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))






Density <- transform(Density, ratio = Deaths / num)




op4 <- Density %>%
        filter(Disaster == "Climatological") 

op4<-op4[order(-op4[,6]),]

op4<-op4[1:10,]



op4 %>%
dplyr::arrange(ratio) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=ratio) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
    geom_point(size=5, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Top Country - Climatological")

RADAR CHARTS

Using the Radar we can see :

Using the same color for each Continent and the same measurement scale for each aspect we want to analyze, makes it easier to compare the different Continents.

Disaster per Continent

## Results {.tabset}
if(!require("V8")) install.packages("V8")
if(!require("devtools")) install.packages("devtools")

library(tidyverse)
library(viridis)
library(patchwork)
library(hrbrthemes)
library(fmsb)
library(colormap)

library(oce)
set.seed(1)


set.seed(1)




sum6 <- Data1 %>%
  dplyr::select(Disaster,Country,Continent,Deaths)

sum6=sum6 %>%
  group_by(Disaster, Continent) %>%
dplyr::summarise(Tot_Deaths=n())
sum6<-na.omit(sum6)



sum6=sum6 %>%
  pivot_wider(names_from = Disaster, values_from = Tot_Deaths)




sum6=cbind(sum6[!sapply(sum6, is.list)], 
      (t(apply(sum6[sapply(sum6, is.list)], 1, unlist))))



sum6 = sum6[-c(6),]

sum6 <- rbind(c("0","0","0","0","0","0"),sum6)

sum6 <- rbind(c("3975 ","3975 ","3975 "," 3975 ","3975 ","3975 "),sum6)
set.seed(1)


# plot with default options:

# Prepare color



library(colormap)
# Prepare color
# Prepare title
mytitle <- c("Africa", "America", "Asia", "Europe", "Oceania")

# Split the screen in 6 parts
par(mfrow=c(2,3))
par(mar=rep(0.8,4))

sum6 <- subset( sum6, select = -Continent )

sum6$Biological=as.numeric(sum6$Biological)
sum6$Hydrological=as.numeric(sum6$Hydrological)
sum6$Meteorological=as.numeric(sum6$Meteorological)
sum6$Technological=as.numeric(sum6$Technological)
sum6$Geophysical=as.numeric(sum6$Geophysical)
sum6$Climatological=as.numeric(sum6$Climatological)
# Loop for each plot

colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) )
for(i in 1:5){

  # Custom the radarChart !
radarchart( sum6[c(1,2,i+2),], axistype=1, 
    pcol=colors_border , pfcol=colors_in , plwd=4 , plty=2,
              
    #custom the grid
    cglcol="black", cglty=1, axislabcol="black", caxislabels=seq(0,4000,1000), cglwd=1,
  
    #custom labels
    vlcex=1.2,
    
    #title
    title=mytitle[i]
    )
}

We can therefore declare that:

  • The continent with the greatest number of Disasters is Asia and that most of them are due to Disasters belonging to the Technological and Hydrological Group;

  • In the Americas there are many Disasters belonging to the Technological, Hydrological and Metereological Group.

  • In Africa there are many Disasters belonging to the Technological, Biological and Hydrological Groups.

  • Europe and Oceania do not seem to have significant Disaster numbers compared to the number of Disasters in the other Continents.

Affected per Continent

sum6 <- Data1 %>%
  dplyr::select(Disaster,Continent, Affected)



sum6=aggregate(sum6$Affected, by=list(Category= sum6$Continent, sum6$Disaster), FUN=sum)

sum6=sum6 %>%
  pivot_wider(names_from = Group.2, values_from = x)


sum6=cbind(sum6[!sapply(sum6, is.list)], 
      (t(apply(sum6[sapply(sum6, is.list)], 1, unlist))))



sum6 = sum6[-c(6),]




#(sum6,2,min)
#apply(sum6,2,max)



sum6 <- rbind(c("0","0","0","0","0","0"),sum6)

sum6 <- rbind(c("3677294127 ","3677294127 ","3677294127 "," 3677294127 ","3677294127 ","3677294127 "),sum6)




# Asia = Hydrological

# maximum = 367 million
set.seed(1)


# plot with default options:

# Prepare color



library(colormap)
# Prepare color
# Prepare title
mytitle <- c("Africa", "America", "Asia", "Europe", "Oceania")

# Split the screen in 6 parts
par(mfrow=c(2,3))
par(mar=rep(0.8,4))

sum6 <- subset( sum6, select = -Category )

sum6$Biological=as.numeric(sum6$Biological)
sum6$Hydrological=as.numeric(sum6$Hydrological)
sum6$Meteorological=as.numeric(sum6$Meteorological)
sum6$Technological=as.numeric(sum6$Technological)
sum6$Geophysical=as.numeric(sum6$Geophysical)
sum6$Climatological=as.numeric(sum6$Climatological)
# Loop for each plot

colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) )
for(i in 1:5){

  # Custom the radarChart !
radarchart( sum6[c(1,2,i+2),], axistype=1, 
    pcol=colors_border , pfcol=colors_in , plwd=4 , plty=2,
              
    #custom the grid
    cglcol="black", cglty=1, axislabcol="black", caxislabels=seq(0,400000000,100000000), cglwd=1,
  
    #custom labels
    vlcex=1.2,
    
    #title
    title=mytitle[i]
    )
}


# Asia = Hydrological

# maximum = 367 million

We can therefore declare that:

  • The Continent with the greatest number of Deaths is Asia and most of them are due to Disasters belonging to the Climatological, Biological and Hydrological Groups;

  • The other Continents do not seem to have significant Disaster numbers compared to the number of Disasters in Asia.

Deaths per Continent

sum6 <- Data1 %>%
  dplyr::select(Disaster,Continent, Deaths)



sum6=aggregate(sum6$Deaths, by=list(Category= sum6$Continent, sum6$Disaster), FUN=sum)

sum6=sum6 %>%
  pivot_wider(names_from = Group.2, values_from = x)


sum6=cbind(sum6[!sapply(sum6, is.list)], 
      (t(apply(sum6[sapply(sum6, is.list)], 1, unlist))))



sum6 = sum6[-c(6),]




#apply(sum6,2,max)



sum6 <- rbind(c("0","0","0","0","0","0"),sum6)

sum6 <- rbind(c("9664574 ","9664574 ","9664574 "," 9664574 ","9664574 ","9664574 "),sum6)


# Asia = Climatological

# maximum = 9.664.574 = 9 million Deaths
set.seed(1)


# plot with default options:

# Prepare color



library(colormap)
# Prepare color
# Prepare title
mytitle <- c("Africa", "America", "Asia", "Europe", "Oceania")

# Split the screen in 6 parts
par(mfrow=c(2,3))
par(mar=rep(0.8,4))

sum6 <- subset( sum6, select = -Category )

sum6$Biological=as.numeric(sum6$Biological)
sum6$Hydrological=as.numeric(sum6$Hydrological)
sum6$Meteorological=as.numeric(sum6$Meteorological)
sum6$Technological=as.numeric(sum6$Technological)
sum6$Geophysical=as.numeric(sum6$Geophysical)
sum6$Climatological=as.numeric(sum6$Climatological)
# Loop for each plot

colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) )
for(i in 1:5){

  # Custom the radarChart !
radarchart( sum6[c(1,2,i+2),], axistype=1, 
    pcol=colors_border , pfcol=colors_in , plwd=4 , plty=2,
              
    #custom the grid
    cglcol="black", cglty=1, axislabcol="black", caxislabels=seq(0,10000000,2500000), cglwd=1,
  
    #custom labels
    vlcex=1.2,
    
    #title
    title=mytitle[i]
    )
}


# Asia = Climatological

# maximum = 9.664.574 = 9 million Deaths

We can therefore declare that:

  • The continent with the largest number of Affected is Asia and that most of them are due to Disasters belonging to the Hydrological, Biological and Climatological Group;

  • The other Continents do not seem to have significant Affected numbers compared to the number of Disasters in Asia.

ALLUVIAL PLOT FOR NETWORK DATA

With an Alluvial Plot it is possible to observe how different Continents are connected with Disaster Groups and in turn with Disaster Types.

In this type of graph, the width of the line is directly proportional to the number of disasters that occurred in each Continent with respect to each Disaster Group and in turn with respect to each Type of Disaster.

I have decided to analyze the Continents to see how the different geographical areas differ in terms of the number of Disasters that have occurred in them and respect to the Groups and Types of Disasters that characterize them.

To highlight this difference between Continents I assigned to each continent a different color.

# Library


library(alluvial)
library(ggalluvial)

library(networkD3)
library(dplyr)


Data1 <-filter(Data1,Type!= "Animal accident")
Data1 <-filter(Data1,Type!= "   Drought")
Data1 <-filter(Data1,Type!= "Mass movement (dry)")


Link.lode=Data1 %>%
  group_by(Continent,Disaster,Type, Deaths) %>%
dplyr::summarise(Freq=n())





Link.lode$Type= factor(Link.lode$Type, levels  =  c("Transport accident"  , "Miscellaneous accident", "Industrial accident" , "Fog" , "Flood" ,"Landslide" , "Storm", "Mass Movement "  ,"Extreme temperature","Earthquake", "Volcanic activity" , "Epidemic" ,  "Wildfire" ,  "Insect infestation", "Drought"))



Link.lode$Continent= factor(Link.lode$Continent, levels =  c( "Asia", "Americas", "Africa", "Europe","Oceania"))

Link.lode$Disaster= factor(Link.lode$Disaster, levels =  c("Technological", "Hydrological", "Meteorological", "Geophysical","Biological", "Climatological"))





#breaks <- seq(min(Link$Deaths), max(Link$Deaths), 20)
#breaks=cut(Link$Deaths, c(breaks, Inf), labels = breaks, include.lowest = TRUE)


ggplot(as.data.frame(Link.lode),
       aes(y = Freq,
           axis1 = Continent,
           axis2 = Disaster, 
           axis3 = Type))+
  geom_alluvium(aes(fill = Continent),
                width = 1/12) +
  geom_stratum(width = 1/12, fill="grey30", color= "grey") + 
  geom_label( stat="stratum", aes(label=after_stat(stratum)))+
  scale_x_discrete(limits = c("Continent", "Disaster ", " Type"),  expand = c(.05, .05))  +
  scale_fill_brewer(type = "qual", palette = "Set1")+
  labs(title = "Disasters Network")+ theme_void()

List of the main Network within Continent :

Africa :

=> Biological => Epidemic

=> Climatological => Drought

=> Hydrological => Flood

=> Technological => Transport Accident

America :

=> Metereological=> Storm ;

=> Hydrological => Flood ;

=> Technological => Transport Accident ;

Asia :

=> Geophysical=> Earthquake ;

=> Hydrological => Flood ;

=> Metereological => Storm ;

=> Technological => Transport Accident ;

Europe :

=> Geophysical=> Earthquake ;

=> Hydrological => Flood ;

=> Metereological => Storm ;

=> Technological => Transport Accident ;

Oceania :

=>Geophysical => Earthquake ;

=> Hydrological => Flood ;

=> Metereological => Storm ;

Since Oceania has a low number of Disasters compared to Asia, America and Africa, I decided to not consider it in the rest of the analysis.

I have therefore set an entry criteria of 3000 cases of Disasters per Continent.

Also, to avoid the overplot, I have decided to consider only the Countries with a total number of Disasters higher than 50 units.

At the end of these filters we are left with 101 Countries divided between Asia, America, Africa and Europe in this way :

Data4= Data1 %>% dplyr::filter(Continent != "Oceania")
Data4= Data4 %>% dplyr::group_by(Country) %>%
  dplyr::summarise(num=n())
Data4= Data4 %>% dplyr::filter(num>50 )



Data5 = merge(Data1, Data4, by= "Country")

Data1 <- subset( Data5, select = -num )


Data3=Data1 %>% dplyr::select(Continent, Country) %>% 
  dplyr::group_by(Continent,Country)%>%
dplyr::summarise(num=n())

Data3=Data3 %>% dplyr::group_by(Continent)%>%
dplyr::summarise(num=n())

head(Data3)

Since the distribution of Disasters, and also the Type of Disasters, has changed over the years, I have decided to consider only the second period, from 1961 to 2020, for the rest of the analysis.

The mean and median of the number of Disasters in the first period (1900 - 1960) are respectively:

PCA_data_1=Data1 %>% dplyr::select(Year, Disaster) %>% 
  dplyr::filter(Year<=1960) %>%
  dplyr::group_by(Year)%>%
dplyr::summarise(num=n())




mean(PCA_data_1$num)
## [1] 18.62295
median(PCA_data_1$num)
## [1] 17

The mean and median of the number of Disasters in the second period (1960 - 2020) are respectively:

PCA_data_1=Data1 %>% dplyr::select(Year, Disaster) %>% 
  dplyr::filter(Year>1960) %>%
  dplyr::group_by(Year)%>%
dplyr::summarise(num=n())


mean(PCA_data_1$num)
## [1] 341.5246
median(PCA_data_1$num)
## [1] 360

In the next part of the analysis we will go deeper into the relationship that exists between our variables, for this reason I have decided to use, instead of the Disaster Group, the different Types of which are part of it, we therefore obtain 17 variables …

PCA_data_1=Data1 %>%
  group_by(Continent,Country,Type) %>%
dplyr::summarise(num=n())


PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)
variable.names(PCA_data_1)
##  [1] "Continent"              "Country"                "Drought"               
##  [4] "Earthquake"             "Epidemic"               "Extreme temperature"   
##  [7] "Flood"                  "Industrial accident"    "Insect infestation"    
## [10] "Landslide"              "Miscellaneous accident" "Storm"                 
## [13] "Transport accident"     "Wildfire"               "Volcanic activity"     
## [16] "Glacial lake outburst"  "Fog"

To do this I used the “pivot_wider” function assigning to each Disaster Type the number of times it has occurred.

Moreover, to get a more accurate analysis, I decided to produce the Correlation Plot and the PCA for each Continent.

Using the PCA we reduce the size of our original data and build a two-dimensional model. With the PCA on numerical variables, in this case between the number of times that each state in the world has a Type of Disaster in a given period of time, we can see which of these indices are correlated and also to analyze the similarity between the different Nations or Continents.

  • The coordinates of each arrow measure how much the variables are associated with respect to the two dimensions.

  • The length of each arrow measures how well the variables are represented in this plane.

  • When interpreting the graph, only the longer arrows must be considered.

  • The closer the arrows are (the more the angle between variables is reduced) the more the variables are correlated with each other.

  • If the arrows have the same direction but opposite the variables are strongly negatively correlated.

  • If two arrows form an angle of 90 degrees to each other, it means that they are not related to each other

  • If the individuals are close to the center of the plane it means they are close to the average, if they are far it means they are far from the average.

CORRELATION / PCA => ASIA

CORRELATION : ASIA

library(ggcorrplot)
library(FactoMineR) #for PCA
library(factoextra) #for better PCA visualizations

library(GGally) #for parallel coordinates
library(d3r)
library(parcoords)
library(dplyr)

PCA_data_1=Data1 %>% filter(Continent=="Asia")

PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())


PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)


PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )




ggcorr(PCA_data_1,method = c("everything", "pearson")) 

Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:

Glacial lake outburst <-> Extreme temperature

Transport Accident <-> Flood

Transport Accident <-> Epidemic

Transport Accident <-> Extreme temperature

miscellaneous Accident <-> Flood

Landslide <-> Flood

Industrial Accident <-> Drought

PCA : ASIA 1960-2020

Scree plot represent the percentage of explained variance / information for each principal components.

the first principal components will explain : 51 %

the second principal components will explain : 16 %

Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.

In this case the first two dimensions are able to explain 67% of our total data.

India-China-Indonesia

PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Asia")




PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())


PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )

PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable



PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data



#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))

fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)

In this case the most important variables for represent the First Dimension are :

Flood ;

Miscellaneous accident ;

In this case the most important variables for represent the Second Dimension are :

Volcanic activity ;

Wildfire ;

We can see three main correlation groups :

-1. Glacial lake outburst - Epidemic - Extreme temperature -Transport Accident

-2. Insect infestation - Flood - Miscellanous Accident - Storm - Landslide

-3. Industrial Accident - Volcanic Activity - Earthquake - Drought - Wildfire

RESULT :

  • China and Indonesia are correlated with Industrial Accident, Earthquake and Drought;

  • Philippines is correlated with Storm, Landslide and Drought;

  • Pakistan, Bangladesh and India are correlated with Glacial lake outburst, Epidemic, Extreme temperature and Transport Accident

PCA 2

PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Asia")




PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())


PCA_data_1=PCA_data_1 %>% dplyr::filter(Country !="India", Country!="China", Country!="Philippines (the)", Country!="Indonesia", Country!="Pakistan")

PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )

PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable



PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data



#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))

fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)

In this case the most important variables for represent the First Dimension are :

Transport Accident ;

Extreme Temperature ;

In this case the most important variables for represent the Second Dimension are :

Drought ;

We can see four main correlation groups :

-1. Miscellanous Accident - Volcanic Activity - Industrial Accident - Earthquake- Extreme temperature -Transport Accident

-2. Insect infestation - Flood - Landslide - Epidemic

-3. Drought

-4. Wildfire

RESULT :

  • Japan, Turkey and Iran, are correlated with the first Group ;

  • Thailand, Nepal, Vietnam, Afghanistan ;

  • Sri Lanka and Napal are correlated with third ;

  • Yemen, Irak, Hong Kong and Taiwan are correlated with the fourth Group ;

CORRELATION / PCA => AMERICA

CORRELATION: AMERICA

PCA_data_1=Data1 %>%
  filter(Continent=="Americas")

PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())

PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)



PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )



ggcorr(PCA_data_1,method = c("everything", "pearson")) 

Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:

Industrial Accident <-> Storm

Transport Accident <-> Flood

miscellaneous Accident <-> Flood

Wildfire <-> Storm

Wildfire <-> Extreme temperature

Insect infestation <-> Landslide

Extreme temperature <-> Storm

Extreme temperaturefire <-> Industrial accident

PCA : AMERICA 1960-2020

Scree plot represent the percentage of explained variance / information for each principal components .

the first principal components will explain : 51 %

the second principal components will explain : 22 %

Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.

In this case the first two dimensions are able to explain 73 % of our total data.

PCA_data_1=Data1 %>% filter(Year>1960)%>%
  filter(Continent=="Americas")

PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())




PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)


PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )

PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data



#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))

`

fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)

In this case the most important variables for represent the First Dimension are :

Drought ;

Flood ;

Miscellaneous accident ;

In this case the most important variables for represent the Second Dimension are :

Epidemic ;

Insect infestation ;

We can see four main correlation groups :

-1. Wildfire - Industrial Accident - Miscellanous Accident - Extreme temperature - Storm

-2. Flood - Drought

-3. Transport Accident - Earthquake

-4. Epidemic - Volcanic activity - Insect infestation - Landslide

RESULT :

  • United States is correlated with the first group;

  • Mexico is correlated with the second group;

  • Brazil is between the third goup and the fourth;

  • Colombia and Peru are correlated with the fourth group;

PCA 2

PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Americas")




PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())


PCA_data_1=PCA_data_1 %>% dplyr::filter(Country !="United States of America (the)", Country!="Canada", Country!="Brazil", Country!="Colombia", Country!="Peru")

PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )

PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable



PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data



#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))

fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)

In this case the most important variables for represent the First Dimension are :

Extreme Temperature ;

Transport Accident ;

In this case the most important variables for represent the Second Dimension are :

Drought ;

We can see four main correlation groups :

-1. Wildfire - Volcanic activity - Landslide - Earthquake

-2. Industrial Accident - Miscellanous Accident - Extreme temperature - Storm -Flood

-3. Drought

-4. Epidemic

RESULT :

  • Ecuador, Guatemala and Chile are correlated with the first group;

  • Mexico and Argentina are correlated with the second group;

  • Haiti and Bolivia are correlated with the third Group ;

  • Nicaragua and Costa Rica are correlated with the fourth group;

CORRELATION / PCA => AFRICA

CORRELATION : AFRICA

PCA_data_1=Data1 %>% 
  filter(Continent=="Africa")

PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())


PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)


PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
PCA_data_1 <- PCA_data_1[,-1] #remove country variable

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )


PCA_data_1[is.na(PCA_data_1)] <- 0


PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data


ggcorr(PCA_data_1,method = c("everything", "pearson")) 

Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:

Transport Accident <-> Industrial accident

Transport Accident <-> Miscellaneous accident

PCA : AFRICA 1960-2020

Scree plot represent the percentage of explained variance / information for each principal components.

the first principal components will explain : 25 %

the second principal components will explain : 14 %

Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.

In this case the first two dimensions are able to explain 39% of our total data.

PCA_data_1=Data1 %>% filter(Year>1960)%>%
  filter(Continent=="Africa")

PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())


PCA_data_1 <- PCA_data_1[PCA_data_1$num > quantile(PCA_data_1$num, .25) - 1.5*IQR(PCA_data_1$num) & 
        PCA_data_1$num < quantile(PCA_data_1$num, .75) + 1.5*IQR(PCA_data_1$num), ] 




PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)


PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )

PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data




#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))

fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)

In this case the most important variables for represent the First Dimension are :

Miscellaneous accident ;

Industrial accident ;

In this case the most important variables for represent the Second Dimension are :

Drought ;

Insect Infestation ;

We can see four main correlation groups :

-1. Extreme temperature - Storm - Wildfire

-2. Earthquake - Transport Accident - Industrial accident - Miscellanous Accident - Earthquake

-3. Landslide - Epidemic - Volcanic activity - Flood

-4. Drought - Insect infestation

RESULT :

  • South Africa, Algeria, Morocco, Egypt are correlated with the first group;

  • Nigeria is correlated with the second group;

  • Tanzania, Uganda, Kenya and Sudan are correlated with the third group;

  • Niger, Somalia, Cameroon and Angola are correlated with the fourth group;

CORRELATION / PCA => EUROPE

CORRELATION : Europe

PCA_data_1=Data1 %>% 
  filter(Continent=="Europe")

PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())


PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)


PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
PCA_data_1 <- PCA_data_1[,-1] #remove country variable

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )


PCA_data_1[is.na(PCA_data_1)] <- 0


PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data


ggcorr(PCA_data_1,method = c("everything", "pearson")) 

Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:

Insect infestation <-> Miscellaneous accident

Epidemic <-> Miscellaneous accident

Insect infestation <-> Transport accident

Insect infestation <-> Epidemic

Wildfire <-> Drought

Transport Accident <-> Miscellaneous accident

PCA : EUROPE 1960-2020

Scree plot represent the percentage of explained variance / information for each principal components.

the first principal components will explain : 35 %

the second principal components will explain : 19 %

Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.

In this case the first two dimensions are able to explain 54 % of our total data.

PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Europe")

PCA_data_1=PCA_data_1 %>%
  group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())




PCA_data_1 <- PCA_data_1[PCA_data_1$num > quantile(PCA_data_1$num, .25) - 1.5*IQR(PCA_data_1$num) & 
        PCA_data_1$num < quantile(PCA_data_1$num, .75) + 1.5*IQR(PCA_data_1$num), ] 

PCA_data_1=PCA_data_1 %>%
  pivot_wider(names_from = Type , values_from = num)



PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable

PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )



PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data



#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))

fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)

In this case the most important variables for represent the First Dimension are :

Wildfire ;

Transport accident ;

In this case the most important variables for represent the Second Dimension are :

Volcanic activity ;

Earthquake ;

We can see three main correlation groups :

-1. Extreme temperature - Insect infestation - Industrial Accident - Miscellanous Accident

-2. Drought - Flood - Transport Accident - Wildfire

-3. Earthquake - Volcanic activity

RESULT :

  • Russia and Spain are correlated with the first group;

  • France is correlated with the second group;

  • Italy is correlated with the fourth group;

TEXTUAL DATA

Regarding the textual data I have chosen a Communication from the European Commission to the European Parliament of 24.2.2021. Forging a climate-resilient Europe - the new EU Strategy on Adaptation to Climate Changes.

Link = https://ec.europa.eu/clima/sites/clima/files/adaptation/what/docs/eu_strategy_2021.pdf

A word cloud is a visual representation of text data. Tags are single words, and the importance of each tag is shown with font size. This format is useful for quickly perceiving the most important terms of the paper, bigger are the words, more often the words are used..

To build this Word Cloud I used Voyant (https://voyant-tools.org), an application that by default removes stop-words to show only the most important words.

word clouds

word clouds

Result :

As we can see the most often used words in the document are Climate, Adaptation and Eu.

In any case, the Word Cloud does not show the context and therefore the real meaning in which these words have been used, but only their frequency into the document.

To understand the semantic similarity and meaning of the words used in the text I chose to use a simple block diagram where the line width refers to the frequency with each link between words is used and the number of links between a word and another measure the similarity between words .

Links

Links

The most similar words are :

Climate = change, related, resilience, impacts;

Eu = support, level, action;

Adaption = action, solution;

The link most often used in the document is :

Climate - Change.

END