Introduction

Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) that documents the occurrence of:

- significant, intense weather phenomena (e.g., storms) that cause damaging disruption to lifestyle and commerce
- rare and unusual weather phenomena
- other significant meteorological events

Such phenomena can threaten both health and business, so it is important to know which of them have the potential to cause the most damage. This knowledge allows prevention and remediation efforts to be focused where they matter most. Identifying the most destructive events can also direct attention to the climate changes that foster them and help provide more appropriate assistance to citizens in protecting their property and their health. The aim of this brief report was therefore to find which weather phenomena cause the most damage according to NOAA’s data, with damage defined in terms of deaths, injuries and material damage (property and crops).

Synopsis of analyses

The analyses in this brief report addressed two research questions: which phenomena cause the most damage to health, and which cause the most material damage. Before answering these questions, the data were downloaded and processed. The database was first reduced by date, as this research focused on phenomena that occurred from 1996 onward (measurements from the earlier period are incomplete). A new dataset was then formed that contained only the variables relevant to our analyses. To create the variable representing total material damage, the property and crop damage exponents were recoded into numeric multipliers, and each damage figure was multiplied by its multiplier to obtain an absolute amount. These two amounts were then summed into a new variable representing material damage, while injuries and fatalities were treated separately as measures of health damage. The data were then grouped by event type and sorted in descending order by material damage, injuries and fatalities to answer the research questions. Floods and hurricanes were found to be the biggest threats to material wealth, while tornados, floods and excessive heat had the most detrimental effects in terms of injuries and fatalities.

Data processing

First, the required libraries were loaded and the data file was downloaded to a local directory.

lapply(c("data.table", "dplyr", "ggplot2"), library, character.only = TRUE)
## [[1]]
## [1] "data.table" "stats"      "graphics"   "grDevices"  "utils"     
## [6] "datasets"   "methods"    "base"      
## 
## [[2]]
## [1] "dplyr"      "data.table" "stats"      "graphics"   "grDevices" 
## [6] "utils"      "datasets"   "methods"    "base"      
## 
## [[3]]
##  [1] "ggplot2"    "dplyr"      "data.table" "stats"      "graphics"  
##  [6] "grDevices"  "utils"      "datasets"   "methods"    "base"
fileurl = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileurl, "../stormdata/StormData.bz2", mode = "wb")
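
The download step above assumes that the ../stormdata directory already exists, and it fetches the file again on every run. A minimal sketch of a more defensive variant is given below. The helper name fetch_once and its fetch argument are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download a file only if it is not already on disk.
# `fetch` defaults to download.file but can be swapped out (e.g. for testing).
fetch_once <- function(url, dest, fetch = download.file) {
  dest_dir <- dirname(dest)
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)  # create missing parent folders
  }
  if (!file.exists(dest)) {
    fetch(url, dest, mode = "wb")           # binary mode keeps the .bz2 intact
  }
  invisible(dest)
}

# Possible usage with the URL and path from this report:
# fetch_once(fileurl, "../stormdata/StormData.bz2")
```

Skipping the download when the file is already present makes repeated runs of the report faster, at the cost of not noticing if the remote file changes.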

Afterwards, the data were read into R.

StormX <- read.csv("../stormdata/StormData.bz2")

After the entire dataset was read, it was reduced step by step. First, only events with a begin date after 1996-01-01 were kept (the strict comparison in the code below excludes events that began on that day itself). Then a new dataset was formed by keeping only the variables relevant to the upcoming analyses.

StormX$DATE2 <- as.Date(as.character(StormX$BGN_DATE), "%m/%d/%Y")
StormY <- subset(StormX, DATE2 > as.Date("1996-01-01"))
Storm <- StormY[, c("DATE2", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
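
The effect of the date parsing and of the strict comparison can be checked on a few hand-written strings. The sketch below assumes that BGN_DATE values look like "1/6/1996 0:00:00" (month/day/year followed by a time). The three strings are illustrative and not copied from the data.

```r
# Example strings in the assumed BGN_DATE layout (month/day/year plus a time)
bgn <- c("12/31/1995 0:00:00", "1/1/1996 0:00:00", "1/6/1996 0:00:00")

# as.Date() parses the leading date part and ignores the trailing time
d <- as.Date(bgn, format = "%m/%d/%Y")

cutoff <- as.Date("1996-01-01")
d > cutoff    # strict: drops an event that began on the cutoff day
d >= cutoff   # inclusive: keeps it
```

If events from 1 January 1996 should be included, the inclusive form is the one to use in the subset() call. Note that this would change the row counts shown later in the report.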

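Next, the damage exponents were inspected and recoded into numeric multipliers: K stands for thousands, M for millions and B for billions, while blank and 0 entries were assigned a multiplier of 0.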

o1 <- Storm %>% group_by(PROPDMGEXP) %>% summarise(count = n())
o1
## # A tibble: 5 x 2
##   PROPDMGEXP  count
##       <fctr>  <int>
## 1            276166
## 2          0      1
## 3          B     32
## 4          K 369934
## 5          M   7374
# Multipliers in the same row order as o1: blank, 0, B, K, M
propexprec <- c(0, 0, 1000000000, 1000, 1000000)
propexp1rec <- cbind(o1[,1], propexprec)
propexp1rec
##   PROPDMGEXP propexprec
## 1                 0e+00
## 2          0      0e+00
## 3          B      1e+09
## 4          K      1e+03
## 5          M      1e+06
o2 <- Storm %>% group_by(CROPDMGEXP) %>% summarise(count = n())
o2
## # A tibble: 4 x 2
##   CROPDMGEXP  count
##       <fctr>  <int>
## 1            373047
## 2          B      4
## 3          K 278685
## 4          M   1771
# Multipliers in the same row order as o2: blank, B, K, M
cropexprec <- c(0, 1000000000, 1000, 1000000)
cropexp1rec <- cbind(o2[,1], cropexprec)
cropexp1rec
##   CROPDMGEXP cropexprec
## 1                 0e+00
## 2          B      1e+09
## 3          K      1e+03
## 4          M      1e+06
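
The recoding above depends on the multipliers being typed in the same order as the rows of the grouped tables. A sketch of an order-independent alternative uses a named lookup vector. The names exp_mult and recode_exp are hypothetical, and mapping every unlisted code to 0 is an assumption that mirrors how blank and 0 are treated in this report.

```r
# Multipliers keyed by exponent code (K = thousand, M = million, B = billion)
exp_mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: translate exponent codes into numeric multipliers
recode_exp <- function(code) {
  mult <- exp_mult[as.character(code)]  # as.character() also handles factors
  mult[is.na(mult)] <- 0                # blank, "0", or any unlisted code
  unname(mult)
}

recode_exp(c("K", "", "M", "0", "B"))
```

Because the lookup is by name, the result does not change if the levels of the exponent column appear in a different order.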

The recoded multipliers were then merged back into the dataset with left_join.

Storm1 <- left_join(Storm, propexp1rec)
## Joining, by = "PROPDMGEXP"
Storm2 <- left_join(Storm1, cropexp1rec)
## Joining, by = "CROPDMGEXP"
head(Storm2)
##        DATE2       EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 1996-01-06 WINTER STORM          0        0     380          K      38
## 2 1996-01-11      TORNADO          0        0     100          K       0
## 3 1996-01-11    TSTM WIND          0        0       3          K       0
## 4 1996-01-11    TSTM WIND          0        0       5          K       0
## 5 1996-01-11    TSTM WIND          0        0       2          K       0
## 6 1996-01-18         HAIL          0        0       0                  0
##   CROPDMGEXP propexprec cropexprec
## 1          K       1000       1000
## 2                  1000          0
## 3                  1000          0
## 4                  1000          0
## 5                  1000          0
## 6                     0          0

Finally, the variable representing total damage was computed, which completed the data preparation.

Storm2$Total_Damage <- Storm2$PROPDMG*Storm2$propexprec + Storm2$CROPDMG*Storm2$cropexprec
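
As a worked check of this formula, the sketch below recomputes the total for the first two rows shown by head(Storm2) above. The values are copied from that output.

```r
# Values copied from the first two rows of head(Storm2)
PROPDMG    <- c(380, 100)
propexprec <- c(1000, 1000)
CROPDMG    <- c(38, 0)
cropexprec <- c(1000, 0)

# Same arithmetic as in the report: property part plus crop part
Total_Damage <- PROPDMG * propexprec + CROPDMG * cropexprec
Total_Damage   # 418000 100000
```

The winter storm in the first row therefore contributes 380,000 in property damage and 38,000 in crop damage, and the tornado in the second row contributes property damage only.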

Results

In this section, answers to the research questions are provided, backed up by simple statistical analyses.

What weather phenomena cause the most physical damage to human health in the USA?

peop <- Storm2 %>% group_by(EVTYPE) %>% summarise(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES))
peop_g <- head(arrange(peop, desc(Injuries)), 10)
## Warning: package 'bindrcpp' was built under R version 3.4.4
ggplot(peop_g, aes(x = EVTYPE, y = Injuries))+ geom_point(aes(size = Fatalities), col = "orange") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + xlab("Type of event")

The graph above shows the 10 weather phenomena that caused the most injuries in the USA, with point size indicating the number of fatalities. It suggests that the most injuries are caused by tornados, floods and excessive heat, with tornados and excessive heat also causing the most fatalities during the observed period.
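
ggplot2 lays out a discrete x axis in factor-level order, which is typically alphabetical, so the ranking is not directly visible in the plot. One possible way to sort the axis by the plotted value is base R's reorder(), sketched here on made-up numbers that are not taken from the NOAA data.

```r
# Made-up counts for illustration only; not taken from the NOAA data
toy <- data.frame(
  EVTYPE   = c("FLOOD", "TORNADO", "EXCESSIVE HEAT"),
  Injuries = c(50, 200, 80),
  stringsAsFactors = FALSE
)

# reorder() sorts factor levels by a statistic of the second argument;
# negating Injuries puts the event with the largest count first
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$Injuries)
levels(toy$EVTYPE)
```

In the plotting call, the same idea would be written as aes(x = reorder(EVTYPE, -Injuries), y = Injuries).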

What weather phenomena cause the most material damage in the USA?

dam <- Storm2 %>% group_by(EVTYPE) %>% summarise(Damage = sum(Total_Damage))
dam_g <- head(arrange(dam, desc(Damage)), 10)
ggplot(dam_g, aes(x = as.factor(EVTYPE), y = Damage))+ geom_point(size = 8, col = "steelblue") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + xlab("Type of event")

As the graph above suggests, floods and hurricanes cause the most material damage in the USA.

Thank you for your time! :)