The damages due to natural disasters in the United States of America are summarised in this project. The data of the damages is collected from the NOAA dataset of natural disasters. This study focusses on the fatalities, injuries and the damages to property and crop due to the storms from the NOAA dataset. It is then summarised for each year to help view the broader perspective on the topic.
To help perform this study, various function and libraries are used. ‘dplyr’ and ‘lubridate’ libraries are used for the preprocessing of the data while the other libraries are used for the visulisation of the data. Furthermore, a function ‘extract_legend’ is needed to extract the legend and use it to create better plots. This function is from a tutorial on statisticsglobe.
library(dplyr)
library(lubridate)
library(ggplot2)
library(randomcoloR)
library(gridExtra)
# Create user-defined function, which extracts legends from ggplots
extract_legend <- function(my_ggp) {
step1 <- ggplot_gtable(ggplot_build(my_ggp))
step2 <- which(sapply(step1$grobs, function(x) x$name) == "guide-box")
step3 <- step1$grobs[[step2]]
return(step3)
}
The data is downloaded from the coursera website as a ‘.csv.bz2’ file. It is downloaded in a folder called Data which is created by the ‘dir.create’ function. From there it will be imported to the environment using the read.csv function. However, this step will take some time due to the large file size.
if (!dir.exists('./Data')){
dir.create('./Data')
}
url = 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
download.file(url, './Data/FStorm.csv.bz2')
stormData = read.csv('./Data/FStorm.csv.bz2', stringsAsFactors = F)
Based on the documentation from NOAA, the natural clamities are classified into 48 different classes. However, ‘EVTYPE’ column of the dataset consists of 979 different classes. The various names of the calamities needed to be reclassified into the appropriate classes to generate reliable plots.
This was done by providing patterns for the grep function to look for within the EVTYPE column. These patterns are written in a dataframe ‘EvType’ within the exp column and the appropriate class name is included in the name column. ‘mapply’ then uses the grep function to find the patterns within the stormData$EVTYPE column and loops over the different patterns in the EvType data frame. We then add the correct class back in stormData within the EvtName column.
EvType = data.frame(exp = c('astr(?!.*high)','Ava','Bliz','(Coast)|(cstl)|(beach)','^(?!.*(extreme)|(record)|(bliz)|(fog)).*((Cold)|(Wind.*Chill))(?!.*((tornado)|(funnel)))','^(?!.*flood).*slide(?!.*flood).*$','^(?!.*ice)(?!.*freez).*fog','smoke','drou','dev','^(?!.*dev).*dus','^(?!.*dro).*^(?!.*dro).*(?=(ex)|(re)).*heat(?!.*dro)','((extreme)|(record)).*((Cold)|(Wind.*Chill))', '^.*Flash.*Flood','^(?!.*((Flash)|(Coast)|(cstl)|(tidal)|(lake)|(thunder))).*Flood','^(?!.*sleet)(?!.*bliz)(?!.*snow).*(black)?.*(frost)|(freez)(?!.*((sleet)|(fog)))','^(?!.*spout)funnel','^(?!.*den)((.+fog)|(.*fog.))(?!.*cold)','^(?!.*thunder)(?!.*funnel)(?!.*tstm)(?!.*marine).*hail','^(?!.*drou)(?!.*ex)(?!.*re)heat','^(?!.*((flood)|(thunder)|(tstm)|(snow)))(?=(.*(heavy)|(record)|(hvy)|(exc))).*rain(?!flood)','^(?=(.*(heavy)|(record)|(hvy)|(exc))).*snow','surf(?!.*coast)','^(?!.*((tstm)|(thunde)|(tunde)|(dust)|(snow)|(bliz)|(hurr)|(marine)|(strong)|(winter))).*wind(?!.*chill)','hur', '^(?!.*((sleet)|(heavy)))(?=.*ice).*storm(?!.*((flood)|(snow)))','lake.*snow','lake.*flood','^(?!.*((win)|(heavy))).*lightning','marine.*hail','MARINE.*high','marine.*strong','marine(?=.*((thunder)|(tstm)))','rip(?!.*surf)','seiche','^(?!.*snow).*sleet','^(?!.*((low)|(coast))).*((surge)|(tide))','^(?!.*((snow)|(marine)|(flood)|(ice))).*strong wind','^(?!.*((marine)|(non))).*((thund)|(tstm)|(thuderst)|(tund))','^(?!.*(water)).*torn','trop.*dep','trop.*sto', 'tsu', 'volc.*ash','^(?!.*(dust)).*er.*spo','wild','^(?!.*((snow)|(bliz))).*winter(?!.*((wea)|(mix)))','wint.*((wea)|(mix))'),name = c('Astronomical Low Tide', 'Avalanche', 'Blizzard','Coastal Flood ','Cold/Wind Chill', 'Debris Flow','Dense Fog','Dense Smoke','Drought', 'Dust Devil','Dust Storm','Excessive Heat','Extreme Cold/Wind Chill','Flash Flood','Flood','Frost/Freeze','Funnel Cloud','Freezing Fog','Hail', 'Heat', 'Heavy Rain','Heavy Snow','High Surf','High Wind','Hurricane (Typhoon)','Ice Storm','Lake-Effect Snow','Lakeshore Flood','Lightning','Marine Hail','Marine High Wind','Marine Strong Wind', 'Marine Thunderstorm Wind','Rip Current','Seiche','Sleet', 'Storm Surge/Tide','Strong Wind','Thunderstorm Wind', 'Tornado','Tropical Depression','Tropical Storm','Tsunami', 'Volcanic Ash','Waterspout','Wildfire','Winter Storm','Winter Weather'), stringsAsFactors = F)
a = mapply(grep, pattern = EvType$exp, MoreArgs = list(x = stormData$EVTYPE, ignore.case = T, perl = T))
stormData$EvtName = NA
for (i in 1:48){
stormData$EvtName[a[[i]]] = EvType$name[i]
}
This method reclassifies 99.04% of the observed events and the rest are labelled as NA.
The damages to properties and crops in the NOAA dataset is provided in terms of hundereds, thousands, Millions or Billions based on the expression in the ‘PROPDMGEXP’ column. It is recalculated to dollar be multiplying it with a column called ‘multiplier’ which is created based on the expressions.
exp = c(1000, 10^6, 1, 10^9, 10^6, 1, 10, 10, 10, 1, 10, 10, 10, 100, 10, 100, 1, 10, 10)
exp = data.frame(unique(stormData$PROPDMGEXP), exp)
names(exp) = c('EXP', 'multiplier')
stormData = inner_join(stormData, exp, by=c('PROPDMGEXP'='EXP'))
stormData = rename(stormData, PropDmgMultiplier = multiplier)%>%
inner_join(exp, by= c("CROPDMGEXP"= "EXP"))
stormData = rename(stormData, CropDmgMultiplier = multiplier)%>%
mutate(PropDmg = PROPDMG*PropDmgMultiplier,CropDmg = CROPDMG*CropDmgMultiplier)
The economic damages (property and crop damages) are subsetted from the stormData along with the class of the event and the date when it began. The year when the event began is then extracted from the ‘BGN_DATE’ column and is grouped based on it. The crop damage and the property damage for each event class and the year when it began is calculated from the summarised data. The data after 1992 is considered for this study as only tornado and hail storm data is available for the earlier years.
The same steps are used to calculate the total fatalities and total injuries for each year and for each event class.
economicDmg = stormData[,c('BGN_DATE', 'EvtName', 'PropDmg', 'CropDmg')]%>%
mutate(BGN_DATE = year(mdy_hms(BGN_DATE)))%>%
rename(Year = BGN_DATE)
economicDmgYrType = group_by(economicDmg, EvtName, Year)%>%
summarize(TotalCropDmg = sum(CropDmg), TotalPropDmg = sum(PropDmg))%>%
filter(Year>1992)
economicDmgType = filter(economicDmg, Year>1992)%>%
group_by(EvtName)%>%
summarize(TotalCropDmg = sum(CropDmg), TotalPropDmg = sum(PropDmg))
PopulationDmg = stormData[,c('BGN_DATE', 'EvtName', 'FATALITIES', 'INJURIES')]%>%
mutate(BGN_DATE = year(mdy_hms(BGN_DATE)))%>%
rename(Year = BGN_DATE)
PopDmgYrType = group_by(PopulationDmg, EvtName, Year)%>%
summarize(TotalFatalities = sum(FATALITIES), TotalInjuries = sum(INJURIES))%>%
filter(Year>1992)
PopDmgType = filter(PopulationDmg, Year>1992)%>%
group_by(EvtName)%>%
summarize(TotalFatalities = sum(FATALITIES), TotalInjuries = sum(INJURIES))
The damage to crops and properties are plotted in the the figure below. Between 1993 and 2012, $39.42 Billion worth of crops were damaged due to natural calamities. Drought caused the highest amount of damages, accounting for 28.34% of the total. It is followed by Floods, Hurricanes, Ice Storms and Hail. Along with floods, these storm classes caused 79.58% of the damages to crops.
Natural calamities cause a higher damage to properties than crops in US, costing $334.1 Billion in damages. Flood caused the most amount of damages, accounting for 40.42% of the damages. It is followed by Hurricanes, Storm Surges, Tornados and Flash Floods. The worst five events together caused 84.76% of the total damages during the period. It must be noted that the hurricane in 2005 and floods in 2006 caused large damages, way higher than the previous years.
Colors = unname(distinctColorPalette(48))
theme_set(theme_bw())
p1 = ggplot(economicDmgYrType, aes(Year,TotalCropDmg/10^9, fill = EvtName))+
geom_area()+
scale_fill_manual(values = Colors)+
labs(y = 'Total Damage (billion $)', title = 'Damage to crops due to storms (USA)',
subtitle = 'Data from NOAA')+
theme(legend.position = 'none')
p2 = ggplot(economicDmgYrType, aes(Year,TotalPropDmg/10^9, fill = EvtName))+
geom_area()+
scale_fill_manual(values = Colors)+
labs(y = 'Total Damage (billion $)', title = 'Damage to properties due to storms (USA)',
subtitle = 'Data from NOAA')+
theme(legend.position = 'none')
pLegend = ggplot(economicDmgYrType, aes(Year,TotalCropDmg/10^9, fill = EvtName))+
geom_area()+
scale_fill_manual(values = Colors)+
theme(legend.position = 'bottom', legend.text = element_text(size = 5), legend.key.size = unit(1,"line"))
sharedLegend = extract_legend(pLegend)
grid.arrange(arrangeGrob(p1, p2, ncol = 2),
sharedLegend, nrow = 2, heights = c(1.5,1))
The number of fatalities and injuries have been more or less costant over the period. Exceptions include heat in 1995 and floods in 1998 which caused a large number of fatalities and injuries. In total, 50.1 thousand injuries have been recorded. A total of 12.9 thousand people have been injured due to Tornado which is the leading cause of injuries during the period. A total of 7816 people have died during the 19 year period. Excessive Heat was the leading cause of death killing a total of 1801 people.
p1 = ggplot(PopDmgYrType, aes(Year, TotalFatalities/1000, fill = EvtName))+
geom_area()+
scale_fill_manual(values = Colors)+
theme(legend.position = 'none')+
labs(y = 'Number of Fatalities (thousand)', title = 'Fatalities due to storms (USA)', subtitle = 'Data from NOAA')
p2 = ggplot(PopDmgYrType, aes(Year, TotalInjuries/1000, fill = EvtName))+
geom_area()+
scale_fill_manual(values = Colors)+
theme(legend.position = 'none')+
labs(y = 'Number of injuries (thousand)', title = 'Injuries due to storms (USA)', subtitle = 'Data from NOAA')
pLegend = ggplot(PopDmgYrType)+
geom_area(aes(Year, TotalInjuries, fill = EvtName))+
scale_fill_manual(values = Colors)+
theme(legend.position = 'bottom', legend.text = element_text(size = 5), legend.key.size = unit(1,"line"))
sharedLegend = extract_legend(pLegend)
grid.arrange(arrangeGrob(p1, p2, ncol = 2),
sharedLegend, nrow = 2, heights = c(1.5,1))