Synopsis

NOAA’s storm database records the damage caused by natural events across the United States since the 1950s. Using this database, my goal was to examine which natural events cause the most harm to human health and the economy. By looking at each event type's share of total fatalities and injuries, I was able to show that tornadoes cause by far the most deaths and injuries. From an economic perspective, the results are different. In terms of total property damage, floods are the biggest problem, while from an agricultural point of view, drought is the main issue. In the future, it may be worth allocating more resources to the prevention of these natural events and disasters.

Data processing

The data were analyzed with the following R code:

#load packages
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
#read data
data<-read.csv(file="repdata_data_StormData.csv.bz2")

#calculate the fatalities per event type
data_evtype<-data %>%
        group_by(EVTYPE)%>%
        summarise(Total_fat=sum(FATALITIES))
data_evtype$Ratio=data_evtype$Total_fat/sum(data_evtype$Total_fat)*100
#choose the first 15 events with the highest fatality
fat<-arrange(data_evtype,desc(Total_fat))
max_fat<-fat[1:15,]


#calculate the injuries per event type
data_evtype_inj<-data %>%
        group_by(EVTYPE)%>%
        summarise(Total_inj=sum(INJURIES))
data_evtype_inj$Ratio=data_evtype_inj$Total_inj/sum(data_evtype_inj$Total_inj)*100

#choose the first 15 events with the highest injuries
inj<-arrange(data_evtype_inj,desc(Total_inj))
max_inj<-inj[1:15,]


#calculate the property damage
data_prop<-data[data$PROPDMGEXP %in% c("B","K","M"),]
data_prop<-data_prop%>%
        mutate(money= case_when(
                PROPDMGEXP=="K"~PROPDMG*1000,
                PROPDMGEXP=="M"~PROPDMG*1000000,
                PROPDMGEXP=="B"~PROPDMG*1000000000
        ))

#calculate the propdmg per event type
data_prop_dmg<-data_prop %>%
        group_by(EVTYPE)%>%
        summarise(Total_propdmg=sum(money))
data_prop_dmg$Ratio=data_prop_dmg$Total_propdmg/sum(data_prop_dmg$Total_propdmg)*100

#choose the first 15 events with the highest propdmg
data_prop_dmg<-arrange(data_prop_dmg,desc(Total_propdmg))
max_prop_dmg<-data_prop_dmg[1:15,]


#calculate the crop damage
data_crop<-data[data$CROPDMGEXP %in% c("B","K","M"),]
data_crop<-data_crop%>%
        mutate(money= case_when(
                CROPDMGEXP=="K"~CROPDMG*1000,
                CROPDMGEXP=="M"~CROPDMG*1000000,
                CROPDMGEXP=="B"~CROPDMG*1000000000
        ))

#calculate the cropdmg per event type
data_crop_dmg<-data_crop %>%
        group_by(EVTYPE)%>%
        summarise(Total_cropdmg=sum(money))
data_crop_dmg$Ratio=data_crop_dmg$Total_cropdmg/sum(data_crop_dmg$Total_cropdmg)*100

#choose the first 15 events with the highest cropdmg
data_crop_dmg<-arrange(data_crop_dmg,desc(Total_cropdmg))
max_crop_dmg<-data_crop_dmg[1:15,]
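
The exponent filter above keeps only the upper-case codes "B", "K" and "M". If the raw file also contains lower-case codes such as "k" or "m" (an assumption worth checking with table(data$PROPDMGEXP)), those rows are dropped. The sketch below is one possible alternative, not the method used in this report: it normalises the case and uses a lookup vector instead of case_when. The data frame toy and all of its values are hypothetical.

```r
# Toy records; the values are hypothetical, not taken from the NOAA file.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 10, 1.5, 3, 7),
  PROPDMGEXP = c("M", "k", "B", "m", "?"),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> multiplier.
# Assumption: lower-case codes mean the same as their upper-case forms.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Normalise the case, then look up; unknown codes such as "?" become NA.
toy$money <- toy$PROPDMG * unname(multiplier[toupper(toy$PROPDMGEXP)])

# Drop rows whose code could not be interpreted, then total per event type.
known <- toy[!is.na(toy$money), ]
totals <- aggregate(money ~ EVTYPE, data = known, FUN = sum)
totals$Ratio <- totals$money / sum(totals$money) * 100
totals <- totals[order(-totals$money), ]
print(totals)
```

The same lookup could be applied to CROPDMGEXP. Codes outside the lookup are excluded here, as they are in the original filter.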

Results

Human health

The first question of the analysis was: across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question, I generated a two-panel figure showing the 15 event types with the largest shares of total fatalities and total injuries.
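
As a small worked example of how such a share is obtained, the sketch below sums a count per event type, converts each sum to a percentage of the grand total, and keeps the top rows. It uses base R and a hypothetical data frame toy with made-up counts, so it only illustrates the arithmetic and is not the report's actual pipeline.

```r
# Hypothetical counts, not taken from the NOAA file.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(6, 1, 2, 3, 0, 0),
  stringsAsFactors = FALSE
)

# Sum per event type, then express each sum as a percentage of the grand total.
per_type <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
per_type$Ratio <- per_type$FATALITIES / sum(per_type$FATALITIES) * 100

# Sort in descending order and keep the top n rows. head() returns fewer rows
# when the table is short, whereas indexing a base data frame with 1:n would
# pad it with NA rows.
per_type <- per_type[order(-per_type$FATALITIES), ]
top <- head(per_type, 2)
print(top)
```

With these toy numbers, TORNADO accounts for 8 of 12 fatalities (about 66.7%) and HEAT for 3 of 12 (25%).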

#plot for the most harmful events with respect to population health

p1<-ggplot(max_fat,aes(x=Ratio,y=EVTYPE, fill=Ratio)) +
        geom_bar(stat="identity") +
        labs(title = "Fatalities", y="Events", x="Ratio(%)")+
        theme(legend.position = "none")
p2<-ggplot(max_inj,aes(x=Ratio,y=EVTYPE, fill=Ratio)) +
        geom_bar(stat="identity") +
        labs(title = "Injuries", y="Events", x="Ratio(%)")+
        theme(legend.position = "none")
grid.arrange(p1, p2, ncol = 2)

Based on this figure, tornadoes are the most dangerous natural events for human health: they account for the largest share of both fatalities and injuries.
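
Because EVTYPE is mapped to the axis as it is, ggplot2 would normally draw the bars in alphabetical order rather than by size. One possible refinement, shown here as a sketch on a hypothetical table toy_fat with made-up shares, is to convert EVTYPE to a factor whose levels follow Ratio before plotting. The plotting call is guarded so that the example still runs where ggplot2 is not installed.

```r
# Hypothetical shares, standing in for the max_fat table.
toy_fat <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT", "LIGHTNING"),
  Ratio  = c(3.1, 37.2, 6.2, 5.4),
  stringsAsFactors = FALSE
)

# Levels sorted by increasing Ratio: the first level is drawn at the bottom
# of the y axis, so the largest share ends up at the top.
toy_fat$EVTYPE <- factor(toy_fat$EVTYPE,
                         levels = toy_fat$EVTYPE[order(toy_fat$Ratio)])
print(levels(toy_fat$EVTYPE))

# Same plotting call as in the report, applied to the reordered factor.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy_fat,
                       ggplot2::aes(x = Ratio, y = EVTYPE, fill = Ratio)) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::labs(title = "Fatalities", y = "Events", x = "Ratio(%)") +
    ggplot2::theme(legend.position = "none")
  # print(p) would draw the figure.
}
```

Sorting the bars makes the ranking readable directly from the figure without comparing bar lengths by eye.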

Economy

The second question of the analysis was: across the United States, which types of events have the greatest economic consequences?

To answer this question, I generated a two-panel figure showing the 15 event types with the largest shares of total property damage and total crop damage.

#plot for the events with the greatest economic consequences

p1<-ggplot(max_prop_dmg,aes(x=Ratio,y=EVTYPE, fill=Ratio)) +
        geom_bar(stat="identity") +
        labs(title = "Property damage", y="Events", x="Ratio(%)")+
        theme(legend.position = "none")
p2<-ggplot(max_crop_dmg,aes(x=Ratio,y=EVTYPE, fill=Ratio)) +
        geom_bar(stat="identity") +
        labs(title = "Crop damage", y="Events", x="Ratio(%)")+
        theme(legend.position = "none")
grid.arrange(p1, p2, ncol = 2)

Based on this figure, floods cause the most property damage and droughts cause the most crop damage.
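
Each panel shows shares within its own category, so a bar in the property panel and a bar in the crop panel are not directly comparable in dollars. A possible follow-up, sketched below under stated assumptions, is to add the two dollar totals per event type. The tables prop and crop and all of their figures are hypothetical stand-ins for data_prop_dmg and data_crop_dmg.

```r
# Hypothetical totals in US dollars; the figures are made up for illustration.
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "DROUGHT"),
                   Total_propdmg = c(9e9, 4e9, 1e9),
                   stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE = c("DROUGHT", "FLOOD", "HAIL"),
                   Total_cropdmg = c(5e9, 2e9, 1e9),
                   stringsAsFactors = FALSE)

# Full outer join so that events present in only one table are kept.
both <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# Assumption: an event missing from one table has no recorded damage there.
both$Total_propdmg[is.na(both$Total_propdmg)] <- 0
both$Total_cropdmg[is.na(both$Total_cropdmg)] <- 0

# Combined damage per event type, largest first.
both$Total_dmg <- both$Total_propdmg + both$Total_cropdmg
both <- both[order(-both$Total_dmg), ]
print(both)
```

With these toy figures the combined ranking is FLOOD, DROUGHT, TORNADO, HAIL; the real ranking depends on the actual totals.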

Summary

Based on this analysis, tornadoes, floods and droughts cause the most harm to human health and the economy. In the future, it may be worth allocating more resources to the prevention of these natural events and disasters.