library(tidyverse)
library(lubridate)
library(plotly)
library(data.table)
library(patchwork)
library(scales)
library(viridis)
library(scales)
library(geometry)
library(MASS)
library(gridExtra)
library(utilities)
library(mapview)
library(forcats)
library(rnaturalearth)
library(rnaturalearthdata)
library(rgdal)
library(sp)
library(sf)
library(leaflet)
library(utils)
library(base)
theme_set(theme_bw())
theme_set(theme_minimal())
library(readxl)
Data6 <- read_excel("emdat_public_2021_03_02_query_uid-0CR386.xlsx")
In 1988, the Centre for Research on the Epidemiology of Disasters (CRED) launched the Emergency Events Database (EM-DAT). EM-DAT was created with the initial support of the World Health Organisation (WHO).
EM-DAT contains essential core data on the occurrence and effects of over 22,000 mass disasters in the world from 1900 to the present day. The database is compiled from various sources, including UN agencies, non-governmental organisations, insurance companies, research institutes and press agencies.
Link (EM-DAT) = https://www.emdat.be
Continent: The Continent in which the disaster has occurred or had an impact;
Region: The region to which the country belongs.
Country: The Country in which the disaster has occurred or had an impact;
ISO Code: The International Organization for Standardization attributes a 3-letter code to each country. CRED uses the ISO 3166 . This field is automatically linked to the country.
Disaster: The disaster category is divided into 7 Groups: Geophysical, Meteorological, Hydrological, Climatological, Biological, Extra-terrestrial and Technological disasters.
Each Disaster Group includes the following Disaster Types:
Geophysical = Earthquake ; Volcanic activity ; Mass Movement
Meteorological = Storm ; Extreme Temperature ; Fog ;
Hydrological = Flood ; Landslide ; Wave action ;
Climatological = Drought ; Glacial lake outburst ; Wildfire ;
Biological = Epidemic ; Insect Infestation ; Animal accident ;
Extra-terrestrial = Impact; Space weather;
Technological = Industrial accident ; Miscellaneous accident ;
Transport accident ;
Deaths: Number of people who lost their life because the event happened.
Total affected: The total affected is the sum of injured, affected and homeless
Entry criteria: The reason for recording the disaster event into EM-DAT. At least one of the following criteria must be fulfilled in order for an event to be entered into the database:
Deaths: 10 or more people deaths
Affected: 100 or more people affected/injured/homeless.
Our Dataset is made up of 9 columns and 24,829 rows [24,829 x 9]:
library(dplyr)
library(raster)
Data0=Data6 %>%rename( Affected= `Total Affected`)%>%
rename( Disaster= `Disaster Subgroup`) %>%
rename( Deaths= `Total Deaths`)%>%
rename( Type= `Disaster Type`)
Data=Data0
Data$Affected[is.na(Data$Affected)] <- 0
Data$Deaths[is.na(Data$Deaths)] <- 0
Data<-Data %>% dplyr::select(Year, Disaster, Type, Country,ISO, Deaths, Affected,Region,Continent)
Data1=Data %>%
filter(Disaster!= "Complex Disasters")%>%
filter(Disaster!= "Extra-terrestrial")
Data2= Data1 %>% dplyr::select(Continent, Disaster, Country, ISO)
sum=Data2 %>%
group_by(ISO) %>%
summarise(num=n())
s <- shapefile("C:/Users/User/Desktop/UNI/2 Anno/ADV/PROJECT/Proj. Claudio Olivari/Proj/TM_WORLD_BORDERS_SIMPL-0.3.shp")
sum=sum %>% dplyr::rename( ISO3= ISO)
Data2=Data2 %>% dplyr::rename( ISO3= ISO)
s %>% merge(sum,by="ISO3", duplicateGeoms = TRUE) -> sum1
sum$num <- as.numeric(sum$num)
str(Data1)
## tibble [24,829 x 9] (S3: tbl_df/tbl/data.frame)
## $ Year : chr [1:24829] "1900" "1900" "1901" "1902" ...
## $ Disaster : chr [1:24829] "Climatological" "Climatological" "Technological" "Geophysical" ...
## $ Type : chr [1:24829] "Drought" "Drought" "Industrial accident" "Earthquake" ...
## $ Country : chr [1:24829] "Cabo Verde" "India" "Belgium" "Guatemala" ...
## $ ISO : chr [1:24829] "CPV" "IND" "BEL" "GTM" ...
## $ Deaths : num [1:24829] 11000 1250000 18 2000 1000 6000 76 17 84 15 ...
## $ Affected : num [1:24829] 0 0 0 0 0 0 23 0 0 2 ...
## $ Region : chr [1:24829] "Western Africa" "Southern Asia" "Western Europe" "Central America" ...
## $ Continent: chr [1:24829] "Africa" "Asia" "Europe" "Americas" ...
This graphs shows the behavior of N. Disaster / Affected / Deaths over the time period from 1900 to 2020. The series are divided into different colors with respect to the Continent each Country belongs.
The Continent with the greatest number of disasters is Asia .
dat=Data1 %>%
group_by(Year, Continent) %>%
summarise(num=n())
dat=as.data.frame(dat)
dat$Disaster=as.factor(dat$Continent)
dat %>%
streamgraph("Disaster", "num", "Year", offset="zero", interpolate="cardinal") %>%
sg_axis_x(2, "Year", "%Y") %>%
sg_fill_brewer("Set1") %>%
sg_legend(TRUE, "Continent: ")%>%
sg_annotate(label="from 1900 to 2020",
x=as.Date("1900-01-01"), y=800, color="#000000", size=18)
The Continentt with the greatest number of Affected is Asia .
sum6 <- Data1 %>%
dplyr::select(Year,Continent) %>%
group_by(Year,Continent) %>%
summarise(num=n())
Affected=aggregate(Data1$Affected, by=list(Year=Data1$Year,Continent=Data1$Continent), FUN=sum)
sum6 <- subset( sum6, select = -num )
df3 = merge(sum6, Affected, by.x=c("Year", "Continent"), by.y=c("Year", "Continent"))
df3$x=as.integer(df3$x)
df3$Year=as.Date(ISOdate(df3$Year, 1, 1)) # beginning of year
df3$Year=as.Date(ISOdate(df3$Year, 12, 31)) # end of year
df3$Disaster=as.factor(df3$Continent)
# save th
df3 %>%
streamgraph("Continent", "x", "Year", offset="zero", interpolate="cardinal") %>%
sg_axis_x(2, "Year", "%Y") %>%
sg_fill_brewer("Set1") %>%
sg_legend(TRUE, "Continent: ")%>%
sg_annotate(label="from 1900 to 2020",
x=as.Date("1900-01-01"), y=600000000, color="#000000", size=18)
The Continent with the greatest number of Deaths is Asia .
sum6 <- Data1 %>%
dplyr::select(Year,Continent) %>%
group_by(Year,Continent) %>%
summarise(num=n())
Deaths=aggregate(Data1$Deaths, by=list(Year=Data1$Year,Continent=Data1$Continent), FUN=sum)
sum6 <- subset( sum6, select = -num )
df3 = merge(sum6, Deaths, by.x=c("Year", "Continent"), by.y=c("Year", "Continent"))
df3$x=as.integer(df3$x)
df3$Year=as.Date(ISOdate(df3$Year, 1, 1)) # beginning of year
df3$Year=as.Date(ISOdate(df3$Year, 12, 31)) # end of year
df3$Disaster=as.factor(df3$Continent)
# save th
df3 %>%
streamgraph("Continent", "x", "Year", offset="zero", interpolate="cardinal") %>%
sg_axis_x(2, "Year", "%Y") %>%
sg_fill_brewer("Set1") %>%
sg_legend(TRUE, "Continent: ")%>%
sg_annotate(label="from 1900 to 2020",
x=as.Date("1900-01-01"), y=3400000, color="#000000", size=18)
Findings :
In the first two distributions we can easily see that from 1900 to 1960 the distribution presents very low values, while starting from about 1960 we begin to see an increase which will peak around the 2000s.
In the Deaths, instead, from 1900 to 1965 the distribution presents very high values, while starting from about 1966 we can see the number of deaths from disasters started to decline .
The N. Deaths / N. Disasters ratio has drastically decreased over the past 120 years.
As we have seen in previous distributions, the number of disasters and the number and Deaths changes dramatically between 1900-1960 and 1960-2020.
To better visualize this difference and in order to measure the severity of each Disaster Group, I decided to use the Disaster Groups Density Plot with respect to the number of Deaths and the number of Disasters that occurred in each Country during these two time periods
Since the two periods have very different ranges and to avoid the “flattening” of the plots of the different Types of Disasters, I decided to show two different graphs, one for the first period and one for the second and to remove outliers for both the periods.
library(ggplot2)
library(hrbrthemes)
library(dplyr)
library(viridis)
library(ggridges)
library(extrafont)
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>% dplyr::filter(Year<1960, Deaths>1) %>%
group_by(Disaster,Country,Continent) %>%
summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country) %>% dplyr::filter(Year<1960 , Deaths>1)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>%rename( Disaster= Group.2)%>%
rename( Continent= Category)%>%rename( Country= Group.3)%>%rename( Deaths= x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- Density[Density$num > quantile(Density$num, .25) - 1.5*IQR(Density$num) &
Density$num < quantile(Density$num, .75) + 1.5*IQR(Density$num), ]
Density <- Density[Density$Deaths > quantile(Density$Deaths, .25) - 1.5*IQR(Density$Deaths) &
Density$Deaths < quantile(Density$Deaths, .75) + 1.5*IQR(Density$Deaths), ]
Density <- transform(Density, ratio = Deaths / num)
Density <- Density[Density$ratio > quantile(Density$ratio, .25) - 1.5*IQR(Density$ratio) &
Density$ratio < quantile(Density$ratio, .75) + 1.5*IQR(Density$ratio), ]
Density=Density %>% dplyr::filter(ratio<1000)
Density1=Density %>%
mutate(text = fct_reorder(Disaster, ratio)) %>%
ggplot( aes(y=Disaster, x=ratio, fill=Disaster)) +
geom_density_ridges(alpha=0.8, bandwidth=2) +
scale_fill_viridis(discrete=TRUE) +
scale_color_viridis(discrete=TRUE) +
theme_ipsum() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 10)
) +
xlab("") +
ylab("Disaster Group 1900-1960")
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>% dplyr::filter(Year>1960, Deaths>1) %>%
group_by(Disaster,Country,Continent) %>%
summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country) %>% dplyr::filter(Year>1960 , Deaths>1)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>%rename( Disaster= Group.2)%>%
rename( Continent= Category)%>%rename( Country= Group.3)%>%rename( Deaths= x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- Density[Density$num > quantile(Density$num, .25) - 1.5*IQR(Density$num) &
Density$num < quantile(Density$num, .75) + 1.5*IQR(Density$num), ]
Density <- Density[Density$Deaths > quantile(Density$Deaths, .25) - 1.5*IQR(Density$Deaths) &
Density$Deaths < quantile(Density$Deaths, .75) + 1.5*IQR(Density$Deaths), ]
Density <- transform(Density, ratio = Deaths / num)
Density <- Density[Density$ratio > quantile(Density$ratio, .25) - 1.5*IQR(Density$ratio) &
Density$ratio < quantile(Density$ratio, .75) + 1.5*IQR(Density$ratio), ]
Density2=Density %>%
mutate(text = fct_reorder(Disaster, ratio)) %>%
ggplot( aes(y=Disaster, x=ratio, fill=Disaster)) +
geom_density_ridges(alpha=0.8, bandwidth=2) +
scale_fill_viridis(discrete=TRUE) +
scale_color_viridis(discrete=TRUE) +
theme_ipsum() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 10)
) +
xlab("") +
ylab("Disaster Group 1960-2020")
library(scater)
multiplot(Density1, Density2)
Findings 1900-1960 :
In the first period there is a very high variance with a range of N. deaths / N. disasters between 0 and 1000
Deaths due to Biological causes are considered outliers for this reason they are not represented.
Disasters always cause many deaths.
Climatological, Geophysical, Meteorological and Hydrological Disasters have very similar distributions.
Findings 1960-2020 :
In the second period there is a variance with a range of N. deaths / N. disasters between 0 and 100.
Technological and Biological Disasters have very similar distributions and are the Groups that cause the most Disaster deaths.
Geophysical, Meteorological and Hydrological Disasters have very similar distributions and tend to cause few Disaster deaths.
Using the Treemap we have a hierarchical view of the continents and their respective states with respect to the total number of disasters.
Each Continent is represented with a different color and contains the states that are part of it, their size is proportional to the total number of disasters it has presented in the period of time from 1900 to 2020.
The interactivity of the Treemap also allows us to see how the Disasters are distributed among the Countries of the same Continent and to check the exact number of Disasters that occurred from 1900 to 2020 in each Country.
inter
Using the Choropleth map we have a geographic view of the total number of disasters that occurred in the period 1900-2020 in each state of the world where each state is colored against this number.
In this section I have decided to represent the world with respect to each Disaster Group by making it possible to compare the different Continents (or states) with each other but not between the different Types of Disasters because, since we have very different ranges for each Type of Disaster, we are using different color scales in each type of choroplet plot.
With the frequency distributionit is possible to observe the distribution of each Group of disasters by single Continet over the years and associate the peaks with the actual Disasters that occurred in that period .
Results :
Asia is the continent with the largest number of technological disasters ;
Europe is the continent with the least number of technological disasters ;
India and China are the states with the greatest number of technological disasters ;
library(dplyr)
tech <- subset(Data2, Disaster == "Technological")
tech=tech %>%
group_by(Country,ISO3) %>%
summarise(num=n())
s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2
library(RColorBrewer)
mybins <- c(0,100,200,300,400,500,600,700, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
# Prepare the text for tooltips:
mytext <- paste(
"Country: ", sum_2$NAME,"<br/>",
"num: ", round(sum_2$num, 2),
sep="") %>%
lapply(htmltools::HTML)
# Final Map
l=leaflet(tech) %>%
addTiles() %>%
setView( lat=10, lng=0 , zoom=2) %>%
addPolygons(data = sum_2,
fillColor = ~mypalette(sum_2$num),
stroke=TRUE,
fillOpacity = 0.9,
color="white",
weight=0.3,
label = mytext,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "13px",
direction = "auto"
)
) %>%
leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
group_by(Year,Continent, Disaster) %>%
filter(Disaster== "Technological")%>%
summarise(num=n())
library(dygraphs)
library(xts) # To make the convertion
States1=States %>%
pivot_wider(names_from = Continent, values_from = num)
States1[is.na(States1)] <- 0
States1$Year=as.Date(ISOdate(States1$Year, 1, 1)) # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31)) # end of year
# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)
# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library(hrbrthemes)
library(plyr)
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
dplyr::group_by(Disaster,Country,Continent) %>%
dplyr::summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>% dplyr::rename( Disaster = Group.2) %>%
dplyr::rename( Continent = Category) %>% dplyr::rename( Country = Group.3) %>% dplyr::rename( Deaths = x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- transform(Density, ratio = Deaths / num)
op4 <- Density %>%
filter(Disaster == "Technological")
op4<-op4[order(-op4[,6]),]
op4<-op4[1:10,]
op4 %>%
dplyr::arrange(ratio) %>%
mutate(Country=factor(Country, Country)) %>%
ggplot( aes(x=Country, y=ratio) ) +
geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
geom_point(size=5, color="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("") +
ylab("Top Country - Technological)")
Results :
Africa is the continent with the largest number of Biological disasters ;
Europe and Americas are the continents with the least numbers of Biological disasters ;
Nigeria, Congo and India are the states with the greatest number of technological disasters ;
library(dplyr)
library(xts)
tech <- subset(Data2, Disaster == "Biological")
tech=tech %>%
dplyr::group_by(Country,ISO3) %>%
dplyr::summarise(num=n())
s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2
library(RColorBrewer)
mybins <- c(0,10,20,30,40,50,60, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
# Prepare the text for tooltips:
mytext <- paste(
"Country: ", sum_2$NAME,"<br/>",
"num: ", round(sum_2$num, 2),
sep="") %>%
lapply(htmltools::HTML)
# Final Map
l=leaflet(tech) %>%
addTiles() %>%
setView( lat=10, lng=0 , zoom=2) %>%
addPolygons(data = sum_2,
fillColor = ~mypalette(sum_2$num),
stroke=TRUE,
fillOpacity = 0.9,
color="white",
weight=0.3,
label = mytext,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "13px",
direction = "auto"
)
) %>%
leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
dplyr::group_by(Year,Continent, Disaster) %>%
dplyr::filter(Disaster== "Biological")%>%
dplyr::summarise(num=n())
library(dygraphs)
library(xts) # To make the convertion
States1=States %>%
pivot_wider(names_from = Continent, values_from = num)
States1[is.na(States1)] <- 0
States1$Year=as.Date(ISOdate(States1$Year, 1, 1)) # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31)) # end of year
# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)
# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
dplyr::group_by(Disaster,Country,Continent) %>%
dplyr::summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- transform(Density, ratio = Deaths / num)
op4 <- Density %>%
filter(Disaster == "Biological")
op4<-op4[order(-op4[,6]),]
op4<-op4[1:10,]
op4 %>%
dplyr::arrange(ratio) %>%
mutate(Country=factor(Country, Country)) %>%
ggplot( aes(x=Country, y=ratio) ) +
geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
geom_point(size=5, color="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("") +
ylab("Top Country - Biological")
Results :
Asia is the continent with the largest number of Geophysical disasters ;
Europe and Africa are the continents with the least numbers of Geophysical disasters ;
Cina and Indonesia are the states with the greatest number of Geophysical disasters ;
library(dplyr)
tech <- subset(Data2, Disaster == "Geophysical")
tech=tech %>%
dplyr::group_by(Country,ISO3) %>%
dplyr::summarise(num=n())
s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2
library(RColorBrewer)
mybins <- c(0,30,60,90,120,150,180, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
# Prepare the text for tooltips:
mytext <- paste(
"Country: ", sum_2$NAME,"<br/>",
"num: ", round(sum_2$num, 2),
sep="") %>%
lapply(htmltools::HTML)
# Final Map
l=leaflet(tech) %>%
addTiles() %>%
setView( lat=10, lng=0 , zoom=2) %>%
addPolygons(data = sum_2,
fillColor = ~mypalette(sum_2$num),
stroke=TRUE,
fillOpacity = 0.9,
color="white",
weight=0.3,
label = mytext,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "13px",
direction = "auto"
)
) %>%
leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
dplyr::group_by(Year,Continent, Disaster) %>%
dplyr::filter(Disaster== "Geophysical")%>%
dplyr::summarise(num=n())
library(dygraphs)
library(xts) # To make the convertion
States1=States %>%
pivot_wider(names_from = Continent, values_from = num)
States1[is.na(States1)] <- 0
States1$Year=as.Date(ISOdate(States1$Year, 1, 1)) # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31)) # end of year
# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)
# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
dplyr::group_by(Disaster,Country,Continent) %>%
dplyr::summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- transform(Density, ratio = Deaths / num)
op4 <- Density %>%
filter(Disaster == "Geophysical")
op4<-op4[order(-op4[,6]),]
op4<-op4[1:10,]
op4 %>%
dplyr::arrange(ratio) %>%
mutate(Country=factor(Country, Country)) %>%
ggplot( aes(x=Country, y=ratio) ) +
geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
geom_point(size=5, color="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("") +
ylab("Top Country - Geophysical")
Results :
Americas and Asia are the continent with the largest number of Meteorological disasters ;
Europe and Africa are the continents with the least numbers of Meteorological disasters ;
Usa, Cina and India are the states with the greatest number of Meteorological disasters ;
library(dplyr)
tech <- subset(Data2, Disaster == "Meteorological")
tech=tech %>%
dplyr::group_by(Country,ISO3) %>%
dplyr::summarise(num=n())
s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2
library(RColorBrewer)
mybins <- c(0,50,100,150,200,250,300,400,450, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
# Prepare the text for tooltips:
mytext <- paste(
"Country: ", sum_2$NAME,"<br/>",
"num: ", round(sum_2$num, 2),
sep="") %>%
lapply(htmltools::HTML)
# Final Map
l=leaflet(tech) %>%
addTiles() %>%
setView( lat=10, lng=0 , zoom=2) %>%
addPolygons(data = sum_2,
fillColor = ~mypalette(sum_2$num),
stroke=TRUE,
fillOpacity = 0.9,
color="white",
weight=0.3,
label = mytext,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "13px",
direction = "auto"
)
) %>%
leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
dplyr::group_by(Year,Continent, Disaster) %>%
dplyr::filter(Disaster== "Meteorological")%>%
dplyr::summarise(num=n())
library(dygraphs)
library(xts) # To make the convertion
States1=States %>%
pivot_wider(names_from = Continent, values_from = num)
States1[is.na(States1)] <- 0
States1$Year=as.Date(ISOdate(States1$Year, 1, 1)) # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31)) # end of year
# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)
# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
dplyr::group_by(Disaster,Country,Continent) %>%
dplyr::summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- transform(Density, ratio = Deaths / num)
op4 <- Density %>%
filter(Disaster == "Meteorological")
op4<-op4[order(-op4[,6]),]
op4<-op4[1:10,]
op4 %>%
dplyr::arrange(ratio) %>%
mutate(Country=factor(Country, Country)) %>%
ggplot( aes(x=Country, y=ratio) ) +
geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
geom_point(size=5, color="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("") +
ylab("Top Country - Meteorological")
Results :
Asia is the continent with the largest number of Hydrological disasters ;
Europe and Africa are the continents with the least numbers of Hydrological disasters ;
Cina, India and Indonesia are the states with the greatest number of Hydrological disasters ;
library(dplyr)
tech <- subset(Data2, Disaster == "Hydrological")
tech=tech %>%
dplyr::group_by(Country,ISO3) %>%
dplyr::summarise(num=n())
s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2
library(RColorBrewer)
mybins <- c(0,40,80,120,160,200,240, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
# Prepare the text for tooltips:
mytext <- paste(
"Country: ", sum_2$NAME,"<br/>",
"num: ", round(sum_2$num, 2),
sep="") %>%
lapply(htmltools::HTML)
# Final Map
l=leaflet(tech) %>%
addTiles() %>%
setView( lat=10, lng=0 , zoom=2) %>%
addPolygons(data = sum_2,
fillColor = ~mypalette(sum_2$num),
stroke=TRUE,
fillOpacity = 0.9,
color="white",
weight=0.3,
label = mytext,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "13px",
direction = "auto"
)
) %>%
leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
dplyr::group_by(Year,Continent, Disaster) %>%
dplyr::filter(Disaster== "Hydrological")%>%
dplyr::summarise(num=n())
library(dygraphs)
library(xts) # To make the convertion
States1=States %>%
pivot_wider(names_from = Continent, values_from = num)
States1[is.na(States1)] <- 0
States1$Year=as.Date(ISOdate(States1$Year, 1, 1)) # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31)) # end of year
# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)
# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
dplyr::group_by(Disaster,Country,Continent) %>%
dplyr::summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- transform(Density, ratio = Deaths / num)
op4 <- Density %>%
filter(Disaster == "Hydrological")
op4<-op4[order(-op4[,6]),]
op4<-op4[1:10,]
op4 %>%
dplyr::arrange(ratio) %>%
mutate(Country=factor(Country, Country)) %>%
ggplot( aes(x=Country, y=ratio) ) +
geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
geom_point(size=5, color="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("") +
ylab("Top Country - Hydrological")
Results :
Americas is the continent with the largest number of Climatological disasters ;
Europe and Africa are the continents with the least numbers of Climatological disasters ;
Usa is the states with the greatest number of Climatological disasters ;
library(dplyr)
tech <- subset(Data2, Disaster == "Climatological")
tech=tech %>%
dplyr::group_by(Country,ISO3) %>%
dplyr::summarise(num=n())
s %>% merge(tech,by="ISO3", duplicateGeoms = TRUE) -> sum_2
library(RColorBrewer)
mybins <- c(0,20,40,60,80, Inf)
mypalette <- colorBin( palette="YlOrBr", domain=sum_2$num, na.color="transparent", bins=mybins)
# Prepare the text for tooltips:
mytext <- paste(
"Country: ", sum_2$NAME,"<br/>",
"num: ", round(sum_2$num, 2),
sep="") %>%
lapply(htmltools::HTML)
# Final Map
l=leaflet(tech) %>%
addTiles() %>%
setView( lat=10, lng=0 , zoom=2) %>%
addPolygons(data = sum_2,
fillColor = ~mypalette(sum_2$num),
stroke=TRUE,
fillOpacity = 0.9,
color="white",
weight=0.3,
label = mytext,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "13px",
direction = "auto"
)
) %>%
leaflet::addLegend( pal=mypalette, values=~sum_2$num, opacity=0.9, title = "Tot cases", position = "bottomleft" )
l
States=Data1 %>%
dplyr::group_by(Year,Continent, Disaster) %>%
dplyr::filter(Disaster== "Climatological")%>%
dplyr::summarise(num=n())
library(dygraphs)
library(xts) # To make the convertion
States1=States %>%
pivot_wider(names_from = Continent, values_from = num)
States1[is.na(States1)] <- 0
States1$Year=as.Date(ISOdate(States1$Year, 1, 1)) # beginning of year
States1$Year=as.Date(ISOdate(States1$Year, 12, 31)) # end of year
# Then you can create the xts format:
don=xts( x=States1[,-1], order.by=States1$Year)
# Chart
p <- dygraph(don)
p
# save th
library(kableExtra)
library(patchwork)
library( hrbrthemes)
Disastrer_num= Data %>% dplyr::select(Continent,Year, Disaster,Deaths, Country) %>%
dplyr::group_by(Disaster,Country,Continent) %>%
dplyr::summarise(num=n())
Deaths= Data %>% dplyr::select(Continent, Disaster,Deaths ,Year, Country)
Deaths2=aggregate(Deaths$Deaths, by=list(Category= Deaths$Continent, Deaths$Disaster, Deaths$Country), FUN=sum)
Deaths2=Deaths2 %>%dplyr::rename( Disaster= Group.2)%>%
dplyr::rename( Continent= Category)%>%dplyr::rename( Country= Group.3)%>%dplyr::rename( Deaths= x)
Density = merge(Deaths2, Disastrer_num, by.x=c("Disaster", "Continent","Country"), by.y=c("Disaster", "Continent", "Country"))
Density <- transform(Density, ratio = Deaths / num)
op4 <- Density %>%
filter(Disaster == "Climatological")
op4<-op4[order(-op4[,6]),]
op4<-op4[1:10,]
op4 %>%
dplyr::arrange(ratio) %>%
mutate(Country=factor(Country, Country)) %>%
ggplot( aes(x=Country, y=ratio) ) +
geom_segment( aes(x=Country ,xend=Country, y=0, yend=ratio), color="black") +
geom_point(size=5, color="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("") +
ylab("Top Country - Climatological")
Using the Radar we can see :
Using the same color for each Continent and the same measurement scale for each aspect we want to analyze, makes it easier to compare the different Continents.
## Results {.tabset}
if(!require("V8")) install.packages("V8")
if(!require("devtools")) install.packages("devtools")
library(tidyverse)
library(viridis)
library(patchwork)
library(hrbrthemes)
library(fmsb)
library(colormap)
library(oce)
set.seed(1)
set.seed(1)
sum6 <- Data1 %>%
dplyr::select(Disaster,Country,Continent,Deaths)
sum6=sum6 %>%
group_by(Disaster, Continent) %>%
dplyr::summarise(Tot_Deaths=n())
sum6<-na.omit(sum6)
sum6=sum6 %>%
pivot_wider(names_from = Disaster, values_from = Tot_Deaths)
sum6=cbind(sum6[!sapply(sum6, is.list)],
(t(apply(sum6[sapply(sum6, is.list)], 1, unlist))))
sum6 = sum6[-c(6),]
sum6 <- rbind(c("0","0","0","0","0","0"),sum6)
sum6 <- rbind(c("3975 ","3975 ","3975 "," 3975 ","3975 ","3975 "),sum6)
set.seed(1)
# plot with default options:
# Prepare color
library(colormap)
# Prepare color
# Prepare title
mytitle <- c("Africa", "America", "Asia", "Europe", "Oceania")
# Split the screen in 6 parts
par(mfrow=c(2,3))
par(mar=rep(0.8,4))
sum6 <- subset( sum6, select = -Continent )
sum6$Biological=as.numeric(sum6$Biological)
sum6$Hydrological=as.numeric(sum6$Hydrological)
sum6$Meteorological=as.numeric(sum6$Meteorological)
sum6$Technological=as.numeric(sum6$Technological)
sum6$Geophysical=as.numeric(sum6$Geophysical)
sum6$Climatological=as.numeric(sum6$Climatological)
# Loop for each plot
colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) )
for(i in 1:5){
# Custom the radarChart !
radarchart( sum6[c(1,2,i+2),], axistype=1,
pcol=colors_border , pfcol=colors_in , plwd=4 , plty=2,
#custom the grid
cglcol="black", cglty=1, axislabcol="black", caxislabels=seq(0,4000,1000), cglwd=1,
#custom labels
vlcex=1.2,
#title
title=mytitle[i]
)
}
We can therefore declare that:
The continent with the greatest number of Disasters is Asia and that most of them are due to Disasters belonging to the Technological and Hydrological Group;
In the Americas there are many Disasters belonging to the Technological, Hydrological and Metereological Group.
In Africa there are many Disasters belonging to the Technological, Biological and Hydrological Groups.
Europe and Oceania do not seem to have significant Disaster numbers compared to the number of Disasters in the other Continents.
sum6 <- Data1 %>%
dplyr::select(Disaster,Continent, Affected)
sum6=aggregate(sum6$Affected, by=list(Category= sum6$Continent, sum6$Disaster), FUN=sum)
sum6=sum6 %>%
pivot_wider(names_from = Group.2, values_from = x)
sum6=cbind(sum6[!sapply(sum6, is.list)],
(t(apply(sum6[sapply(sum6, is.list)], 1, unlist))))
sum6 = sum6[-c(6),]
#(sum6,2,min)
#apply(sum6,2,max)
sum6 <- rbind(c("0","0","0","0","0","0"),sum6)
sum6 <- rbind(c("3677294127 ","3677294127 ","3677294127 "," 3677294127 ","3677294127 ","3677294127 "),sum6)
# Asia = Hydrological
# maximum = 367 million
set.seed(1)
# plot with default options:
# Prepare color
library(colormap)
# Prepare color
# Prepare title
mytitle <- c("Africa", "America", "Asia", "Europe", "Oceania")
# Split the screen in 6 parts
par(mfrow=c(2,3))
par(mar=rep(0.8,4))
sum6 <- subset( sum6, select = -Category )
sum6$Biological=as.numeric(sum6$Biological)
sum6$Hydrological=as.numeric(sum6$Hydrological)
sum6$Meteorological=as.numeric(sum6$Meteorological)
sum6$Technological=as.numeric(sum6$Technological)
sum6$Geophysical=as.numeric(sum6$Geophysical)
sum6$Climatological=as.numeric(sum6$Climatological)
# Loop for each plot
colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) )
for(i in 1:5){
# Custom the radarChart !
radarchart( sum6[c(1,2,i+2),], axistype=1,
pcol=colors_border , pfcol=colors_in , plwd=4 , plty=2,
#custom the grid
cglcol="black", cglty=1, axislabcol="black", caxislabels=seq(0,400000000,100000000), cglwd=1,
#custom labels
vlcex=1.2,
#title
title=mytitle[i]
)
}
# Asia = Hydrological
# maximum = 367 million
We can therefore declare that:
The Continent with the greatest number of Deaths is Asia and most of them are due to Disasters belonging to the Climatological, Biological and Hydrological Groups;
The other Continents do not seem to have significant Disaster numbers compared to the number of Disasters in Asia.
sum6 <- Data1 %>%
dplyr::select(Disaster,Continent, Deaths)
sum6=aggregate(sum6$Deaths, by=list(Category= sum6$Continent, sum6$Disaster), FUN=sum)
sum6=sum6 %>%
pivot_wider(names_from = Group.2, values_from = x)
sum6=cbind(sum6[!sapply(sum6, is.list)],
(t(apply(sum6[sapply(sum6, is.list)], 1, unlist))))
sum6 = sum6[-c(6),]
#apply(sum6,2,max)
sum6 <- rbind(c("0","0","0","0","0","0"),sum6)
sum6 <- rbind(c("9664574 ","9664574 ","9664574 "," 9664574 ","9664574 ","9664574 "),sum6)
# Asia = Climatological
# maximum = 9.664.574 = 9 million Deaths
set.seed(1)
# plot with default options:
# Prepare color
library(colormap)
# Prepare color
# Prepare title
mytitle <- c("Africa", "America", "Asia", "Europe", "Oceania")
# Split the screen in 6 parts
par(mfrow=c(2,3))
par(mar=rep(0.8,4))
sum6 <- subset( sum6, select = -Category )
sum6$Biological=as.numeric(sum6$Biological)
sum6$Hydrological=as.numeric(sum6$Hydrological)
sum6$Meteorological=as.numeric(sum6$Meteorological)
sum6$Technological=as.numeric(sum6$Technological)
sum6$Geophysical=as.numeric(sum6$Geophysical)
sum6$Climatological=as.numeric(sum6$Climatological)
# Loop for each plot
colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) )
for(i in 1:5){
# Custom the radarChart !
radarchart( sum6[c(1,2,i+2),], axistype=1,
pcol=colors_border , pfcol=colors_in , plwd=4 , plty=2,
#custom the grid
cglcol="black", cglty=1, axislabcol="black", caxislabels=seq(0,10000000,2500000), cglwd=1,
#custom labels
vlcex=1.2,
#title
title=mytitle[i]
)
}
# Asia = Climatological
# maximum = 9.664.574 = 9 million Deaths
We can therefore declare that:
The continent with the largest number of Affected is Asia and that most of them are due to Disasters belonging to the Hydrological, Biological and Climatological Group;
The other Continents do not seem to have significant Affected numbers compared to the number of Disasters in Asia.
With an Alluvial Plot it is possible to observe how different Continents are connected with Disaster Groups and in turn with Disaster Types.
In this type of graph, the width of the line is directly proportional to the number of disasters that occurred in each Continent with respect to each Disaster Group and in turn with respect to each Type of Disaster.
I have decided to analyze the Continents to see how the different geographical areas differ in terms of the number of Disasters that have occurred in them and respect to the Groups and Types of Disasters that characterize them.
To highlight this difference between Continents I assigned to each continent a different color.
# Library
library(alluvial)
library(ggalluvial)
library(networkD3)
library(dplyr)
Data1 <-filter(Data1,Type!= "Animal accident")
Data1 <-filter(Data1,Type!= " Drought")
Data1 <-filter(Data1,Type!= "Mass movement (dry)")
Link.lode=Data1 %>%
group_by(Continent,Disaster,Type, Deaths) %>%
dplyr::summarise(Freq=n())
Link.lode$Type= factor(Link.lode$Type, levels = c("Transport accident" , "Miscellaneous accident", "Industrial accident" , "Fog" , "Flood" ,"Landslide" , "Storm", "Mass Movement " ,"Extreme temperature","Earthquake", "Volcanic activity" , "Epidemic" , "Wildfire" , "Insect infestation", "Drought"))
Link.lode$Continent= factor(Link.lode$Continent, levels = c( "Asia", "Americas", "Africa", "Europe","Oceania"))
Link.lode$Disaster= factor(Link.lode$Disaster, levels = c("Technological", "Hydrological", "Meteorological", "Geophysical","Biological", "Climatological"))
#breaks <- seq(min(Link$Deaths), max(Link$Deaths), 20)
#breaks=cut(Link$Deaths, c(breaks, Inf), labels = breaks, include.lowest = TRUE)
ggplot(as.data.frame(Link.lode),
aes(y = Freq,
axis1 = Continent,
axis2 = Disaster,
axis3 = Type))+
geom_alluvium(aes(fill = Continent),
width = 1/12) +
geom_stratum(width = 1/12, fill="grey30", color= "grey") +
geom_label( stat="stratum", aes(label=after_stat(stratum)))+
scale_x_discrete(limits = c("Continent", "Disaster ", " Type"), expand = c(.05, .05)) +
scale_fill_brewer(type = "qual", palette = "Set1")+
labs(title = "Disasters Network")+ theme_void()
Africa :
=> Biological => Epidemic
=> Climatological => Drought
=> Hydrological => Flood
=> Technological => Transport Accident
America :
=> Metereological=> Storm ;
=> Hydrological => Flood ;
=> Technological => Transport Accident ;
Asia :
=> Geophysical=> Earthquake ;
=> Hydrological => Flood ;
=> Metereological => Storm ;
=> Technological => Transport Accident ;
Europe :
=> Geophysical=> Earthquake ;
=> Hydrological => Flood ;
=> Metereological => Storm ;
=> Technological => Transport Accident ;
Oceania :
=>Geophysical => Earthquake ;
=> Hydrological => Flood ;
=> Metereological => Storm ;
Since Oceania has a low number of Disasters compared to Asia, America and Africa, I decided to not consider it in the rest of the analysis.
I have therefore set an entry criteria of 3000 cases of Disasters per Continent.
Also, to avoid the overplot, I have decided to consider only the Countries with a total number of Disasters higher than 50 units.
At the end of these filters we are left with 101 Countries divided between Asia, America, Africa and Europe in this way :
Data4= Data1 %>% dplyr::filter(Continent != "Oceania")
Data4= Data4 %>% dplyr::group_by(Country) %>%
dplyr::summarise(num=n())
Data4= Data4 %>% dplyr::filter(num>50 )
Data5 = merge(Data1, Data4, by= "Country")
Data1 <- subset( Data5, select = -num )
Data3=Data1 %>% dplyr::select(Continent, Country) %>%
dplyr::group_by(Continent,Country)%>%
dplyr::summarise(num=n())
Data3=Data3 %>% dplyr::group_by(Continent)%>%
dplyr::summarise(num=n())
head(Data3)
Since the distribution of Disasters, and also the Type of Disasters, has changed over the years, I have decided to consider only the second period, from 1961 to 2020, for the rest of the analysis.
The mean and median of the number of Disasters in the first period (1900 - 1960) are respectively:
PCA_data_1=Data1 %>% dplyr::select(Year, Disaster) %>%
dplyr::filter(Year<=1960) %>%
dplyr::group_by(Year)%>%
dplyr::summarise(num=n())
mean(PCA_data_1$num)
## [1] 18.62295
median(PCA_data_1$num)
## [1] 17
The mean and median of the number of Disasters in the second period (1960 - 2020) are respectively:
PCA_data_1=Data1 %>% dplyr::select(Year, Disaster) %>%
dplyr::filter(Year>1960) %>%
dplyr::group_by(Year)%>%
dplyr::summarise(num=n())
mean(PCA_data_1$num)
## [1] 341.5246
median(PCA_data_1$num)
## [1] 360
In the next part of the analysis we will go deeper into the relationship that exists between our variables, for this reason I have decided to use, instead of the Disaster Group, the different Types of which are part of it, we therefore obtain 17 variables …
PCA_data_1=Data1 %>%
group_by(Continent,Country,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
variable.names(PCA_data_1)
## [1] "Continent" "Country" "Drought"
## [4] "Earthquake" "Epidemic" "Extreme temperature"
## [7] "Flood" "Industrial accident" "Insect infestation"
## [10] "Landslide" "Miscellaneous accident" "Storm"
## [13] "Transport accident" "Wildfire" "Volcanic activity"
## [16] "Glacial lake outburst" "Fog"
To do this I used the “pivot_wider” function assigning to each Disaster Type the number of times it has occurred.
Moreover, to get a more accurate analysis, I decided to produce the Correlation Plot and the PCA for each Continent.
Using the PCA we reduce the size of our original data and build a two-dimensional model. With the PCA on numerical variables, in this case between the number of times that each state in the world has a Type of Disaster in a given period of time, we can see which of these indices are correlated and also to analyze the similarity between the different Nations or Continents.
The coordinates of each arrow measure how much the variables are associated with respect to the two dimensions.
The length of each arrow measures how well the variables are represented in this plane.
When interpreting the graph, only the longer arrows must be considered.
The closer the arrows are (the more the angle between variables is reduced) the more the variables are correlated with each other.
If the arrows have the same direction but opposite the variables are strongly negatively correlated.
If two arrows form an angle of 90 degrees to each other, it means that they are not related to each other
If the individuals are close to the center of the plane it means they are close to the average, if they are far it means they are far from the average.
library(ggcorrplot)
library(FactoMineR) #for PCA
library(factoextra) #for better PCA visualizations
library(GGally) #for parallel coordinates
library(d3r)
library(parcoords)
library(dplyr)
PCA_data_1=Data1 %>% filter(Continent=="Asia")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
ggcorr(PCA_data_1,method = c("everything", "pearson"))
Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:
Glacial lake outburst <-> Extreme temperature
Transport Accident <-> Flood
Transport Accident <-> Epidemic
Transport Accident <-> Extreme temperature
miscellaneous Accident <-> Flood
Landslide <-> Flood
Industrial Accident <-> Drought
Scree plot represent the percentage of explained variance / information for each principal components.
the first principal components will explain : 51 %
the second principal components will explain : 16 %
Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.
In this case the first two dimensions are able to explain 67% of our total data.
India-China-Indonesia
PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Asia")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))
fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)
In this case the most important variables for represent the First Dimension are :
Flood ;
Miscellaneous accident ;
In this case the most important variables for represent the Second Dimension are :
Volcanic activity ;
Wildfire ;
We can see three main correlation groups :
-1. Glacial lake outburst - Epidemic - Extreme temperature -Transport Accident
-2. Insect infestation - Flood - Miscellanous Accident - Storm - Landslide
-3. Industrial Accident - Volcanic Activity - Earthquake - Drought - Wildfire
RESULT :
China and Indonesia are correlated with Industrial Accident, Earthquake and Drought;
Philippines is correlated with Storm, Landslide and Drought;
Pakistan, Bangladesh and India are correlated with Glacial lake outburst, Epidemic, Extreme temperature and Transport Accident
PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Asia")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>% dplyr::filter(Country !="India", Country!="China", Country!="Philippines (the)", Country!="Indonesia", Country!="Pakistan")
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))
fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)
In this case the most important variables for represent the First Dimension are :
Transport Accident ;
Extreme Temperature ;
In this case the most important variables for represent the Second Dimension are :
Drought ;
We can see four main correlation groups :
-1. Miscellanous Accident - Volcanic Activity - Industrial Accident - Earthquake- Extreme temperature -Transport Accident
-2. Insect infestation - Flood - Landslide - Epidemic
-3. Drought
-4. Wildfire
RESULT :
Japan, Turkey and Iran, are correlated with the first Group ;
Thailand, Nepal, Vietnam, Afghanistan ;
Sri Lanka and Napal are correlated with third ;
Yemen, Irak, Hong Kong and Taiwan are correlated with the fourth Group ;
PCA_data_1=Data1 %>%
filter(Continent=="Americas")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
ggcorr(PCA_data_1,method = c("everything", "pearson"))
Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:
Industrial Accident <-> Storm
Transport Accident <-> Flood
miscellaneous Accident <-> Flood
Wildfire <-> Storm
Wildfire <-> Extreme temperature
Insect infestation <-> Landslide
Extreme temperature <-> Storm
Extreme temperaturefire <-> Industrial accident
Scree plot represent the percentage of explained variance / information for each principal components .
the first principal components will explain : 51 %
the second principal components will explain : 22 %
Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.
In this case the first two dimensions are able to explain 73 % of our total data.
PCA_data_1=Data1 %>% filter(Year>1960)%>%
filter(Continent=="Americas")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))
`
fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)
In this case the most important variables for represent the First Dimension are :
Drought ;
Flood ;
Miscellaneous accident ;
In this case the most important variables for represent the Second Dimension are :
Epidemic ;
Insect infestation ;
We can see four main correlation groups :
-1. Wildfire - Industrial Accident - Miscellanous Accident - Extreme temperature - Storm
-2. Flood - Drought
-3. Transport Accident - Earthquake
-4. Epidemic - Volcanic activity - Insect infestation - Landslide
RESULT :
United States is correlated with the first group;
Mexico is correlated with the second group;
Brazil is between the third goup and the fourth;
Colombia and Peru are correlated with the fourth group;
PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Americas")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>% dplyr::filter(Country !="United States of America (the)", Country!="Canada", Country!="Brazil", Country!="Colombia", Country!="Peru")
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))
fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)
In this case the most important variables for represent the First Dimension are :
Extreme Temperature ;
Transport Accident ;
In this case the most important variables for represent the Second Dimension are :
Drought ;
We can see four main correlation groups :
-1. Wildfire - Volcanic activity - Landslide - Earthquake
-2. Industrial Accident - Miscellanous Accident - Extreme temperature - Storm -Flood
-3. Drought
-4. Epidemic
RESULT :
Ecuador, Guatemala and Chile are correlated with the first group;
Mexico and Argentina are correlated with the second group;
Haiti and Bolivia are correlated with the third Group ;
Nicaragua and Costa Rica are correlated with the fourth group;
PCA_data_1=Data1 %>%
filter(Continent=="Africa")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
ggcorr(PCA_data_1,method = c("everything", "pearson"))
Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:
Transport Accident <-> Industrial accident
Transport Accident <-> Miscellaneous accident
Scree plot represent the percentage of explained variance / information for each principal components.
the first principal components will explain : 25 %
the second principal components will explain : 14 %
Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.
In this case the first two dimensions are able to explain 39% of our total data.
PCA_data_1=Data1 %>% filter(Year>1960)%>%
filter(Continent=="Africa")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1 <- PCA_data_1[PCA_data_1$num > quantile(PCA_data_1$num, .25) - 1.5*IQR(PCA_data_1$num) &
PCA_data_1$num < quantile(PCA_data_1$num, .75) + 1.5*IQR(PCA_data_1$num), ]
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))
fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)
In this case the most important variables for represent the First Dimension are :
Miscellaneous accident ;
Industrial accident ;
In this case the most important variables for represent the Second Dimension are :
Drought ;
Insect Infestation ;
We can see four main correlation groups :
-1. Extreme temperature - Storm - Wildfire
-2. Earthquake - Transport Accident - Industrial accident - Miscellanous Accident - Earthquake
-3. Landslide - Epidemic - Volcanic activity - Flood
-4. Drought - Insect infestation
RESULT :
South Africa, Algeria, Morocco, Egypt are correlated with the first group;
Nigeria is correlated with the second group;
Tanzania, Uganda, Kenya and Sudan are correlated with the third group;
Niger, Somalia, Cameroon and Angola are correlated with the fourth group;
PCA_data_1=Data1 %>%
filter(Continent=="Europe")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
ggcorr(PCA_data_1,method = c("everything", "pearson"))
Thanks to the correlation matrix it is possible to discover some interesting correlations between the different Types of Disasters:
Insect infestation <-> Miscellaneous accident
Epidemic <-> Miscellaneous accident
Insect infestation <-> Transport accident
Insect infestation <-> Epidemic
Wildfire <-> Drought
Transport Accident <-> Miscellaneous accident
Scree plot represent the percentage of explained variance / information for each principal components.
the first principal components will explain : 35 %
the second principal components will explain : 19 %
Using the PCA method, instead of considering all the variables of our dataset, we build two variables (Dimension 1 and Dimension 2) that are able to explain information equal to the sum of the percentages that the first two principal components have.
In this case the first two dimensions are able to explain 54 % of our total data.
PCA_data_1=Data1 %>% filter(Year>1960,Continent=="Europe")
PCA_data_1=PCA_data_1 %>%
group_by(Country,Continent,Type) %>%
dplyr::summarise(num=n())
PCA_data_1 <- PCA_data_1[PCA_data_1$num > quantile(PCA_data_1$num, .25) - 1.5*IQR(PCA_data_1$num) &
PCA_data_1$num < quantile(PCA_data_1$num, .75) + 1.5*IQR(PCA_data_1$num), ]
PCA_data_1=PCA_data_1 %>%
pivot_wider(names_from = Type , values_from = num)
PCA_data_1[is.na(PCA_data_1)] <- 0
PCA_data_1 <- as.data.frame(PCA_data_1) #change from tibble to dataframe since tibble does not have row names, and we need it for the plots of PCA
row.names(PCA_data_1) <- PCA_data_1$Country #write the row names according to the country variable
PCA_data_1 <- PCA_data_1[,-1] #remove country variable
PCA_data_1 <- subset( PCA_data_1, select = -c( Continent ) )
PCA_res<-PCA(PCA_data_1,graph=FALSE)# do a PCA of data
#PCA_res[[1]]
fviz_eig(PCA_res, addlabels = TRUE, ylim = c(0, 60))
fit <- princomp(PCA_data_1, cor=TRUE)
fviz_pca_biplot(fit)
In this case the most important variables for represent the First Dimension are :
Wildfire ;
Transport accident ;
In this case the most important variables for represent the Second Dimension are :
Volcanic activity ;
Earthquake ;
We can see three main correlation groups :
-1. Extreme temperature - Insect infestation - Industrial Accident - Miscellanous Accident
-2. Drought - Flood - Transport Accident - Wildfire
-3. Earthquake - Volcanic activity
RESULT :
Russia and Spain are correlated with the first group;
France is correlated with the second group;
Italy is correlated with the fourth group;
Regarding the textual data I have chosen a Communication from the European Commission to the European Parliament of 24.2.2021. Forging a climate-resilient Europe - the new EU Strategy on Adaptation to Climate Changes.
Link = https://ec.europa.eu/clima/sites/clima/files/adaptation/what/docs/eu_strategy_2021.pdf
A word cloud is a visual representation of text data. Tags are single words, and the importance of each tag is shown with font size. This format is useful for quickly perceiving the most important terms of the paper, bigger are the words, more often the words are used..
To build this Word Cloud I used Voyant (https://voyant-tools.org), an application that by default removes stop-words to show only the most important words.
word clouds
Result :
As we can see the most often used words in the document are Climate, Adaptation and Eu.
In any case, the Word Cloud does not show the context and therefore the real meaning in which these words have been used, but only their frequency into the document.
To understand the semantic similarity and meaning of the words used in the text I chose to use a simple block diagram where the line width refers to the frequency with each link between words is used and the number of links between a word and another measure the similarity between words .
Links
The most similar words are :
Climate = change, related, resilience, impacts;
Eu = support, level, action;
Adaption = action, solution;
The link most often used in the document is :
Climate - Change.