The purpose of this analysis is to examine severe weather events and their consequences. Specifically, it addresses two main questions: which types of events are most harmful to population health (fatalities and injuries), and which cause the greatest economic losses?
To answer these questions, we use data from the U.S. National Oceanic and Atmospheric Administration (NOAA). NOAA tracks major storms and weather events in the United States, recording their dates and types as well as estimates of any fatalities, injuries, and property damage.
First, the data were downloaded from the NOAA website and loaded into R. Given the size of the dataset, we used the readr package, which provides a faster way to read tabular data. We also used the dplyr and tidyr packages to manipulate the data, and ggplot2 to visualize the results.
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
storm <- read_csv('repdata-data-StormData.csv.bz2')
storm$BGN_DATE <- as.POSIXct(strptime(storm$BGN_DATE, "%m/%d/%Y %H:%M:%S"))
dim(storm)
## [1] 902297 37
min(storm$BGN_DATE);max(storm$BGN_DATE)
## [1] "1950-01-03 EST"
## [1] "2011-11-30 EST"
As can be seen, the file contains over 900,000 records and 37 variables. The data set covers almost 62 years, from January 1950 to November 2011.
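The "almost 62 years" figure can be checked with a little date arithmetic. The following is a minimal sketch that uses only the two dates printed above; the variable names `first_date`, `last_date`, and `span_years` are illustrative and not part of the original analysis.

```r
# Dates copied from the min()/max() output above
first_date <- as.Date("1950-01-03")
last_date  <- as.Date("2011-11-30")

# Length of the span in days, converted to years
# (365.25 days per year is an approximation that averages in leap years)
span_days  <- as.numeric(difftime(last_date, first_date, units = "days"))
span_years <- span_days / 365.25

round(span_years, 1)
```

The result falls just short of 62, which matches the description in the text.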
This section presents key findings for the two main questions. Please note that while individual estimates can vary by year depending on the severity of weather conditions, this analysis provides findings across all years combined.
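Let’s look at the events that cause the most deaths and injuries. We grouped the data by event type and summed the number of fatalities and injuries. Note that top_n(5) is called without a ranking variable, so it selects by the last column (Injuries, as the message in the output confirms); the table below therefore shows the five event types with the most injuries, sorted by number of deaths.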
p <- storm %>%
group_by(EVTYPE) %>%
summarise(Deaths=sum(FATALITIES),Injuries=sum(INJURIES)) %>%
top_n(5) %>%
arrange(desc(Deaths))
## Selecting by Injuries
knitr::kable(p, caption="Table 1: Most Harmful Weather Events 1950-2011")
| EVTYPE | Deaths | Injuries |
|---|---|---|
| TORNADO | 5633 | 91346 |
| EXCESSIVE HEAT | 1903 | 6525 |
| LIGHTNING | 816 | 5230 |
| TSTM WIND | 504 | 6957 |
| FLOOD | 470 | 6789 |
As can be seen, tornadoes are the deadliest type of event, causing by far the largest number of both deaths and injuries. The chart below shows this information graphically.
p %>% gather(Type,N, 2:3) %>%
ggplot(aes(x=EVTYPE,y=N))+geom_bar(stat='identity')+facet_wrap(~Type,scales = "free_y")+
labs(title='Figure 1: Most Harmful Weather Events 1950-2011')
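Because top_n(5) was called without a ranking variable, Table 1 and Figure 1 are ranked by injuries rather than deaths. The sketch below shows how the ranking column can be named explicitly so that each measure gets its own top list. It runs on made-up toy totals (the `toy` data frame and its values are hypothetical, not NOAA figures) and is meant as an illustration, not as a replacement for the analysis above.

```r
library(dplyr)

# Made-up toy totals, NOT taken from the NOAA data
toy <- data.frame(
  EVTYPE   = c("A", "B", "C", "D"),
  Deaths   = c(50, 5, 40, 1),
  Injuries = c(10, 300, 20, 200),
  stringsAsFactors = FALSE
)

# Name the ranking variable explicitly instead of relying on the last column
by_deaths <- toy %>%
  top_n(2, Deaths) %>%
  arrange(desc(Deaths))

by_injuries <- toy %>%
  top_n(2, Injuries) %>%
  arrange(desc(Injuries))

by_deaths$EVTYPE
by_injuries$EVTYPE
```

In this toy example the two rankings contain different event types, which is why the choice of ranking column matters when the table is described as showing the "most deadly" events.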
To measure economic losses, we created a new variable called total_dam, which is the sum of two variables: property damage and crop damage. As before, we examined the five event types with the greatest economic losses. The variable PROPDMGEXP contains the units (hundreds, thousands, millions, billions) in which property losses are measured. (CROPDMGEXP also contains units, but it is empty in the loaded data, so crop damage is added as is.) Before performing the analysis, PROPDMGEXP was recoded into a numeric multiplier to put all losses on the same scale; unrecognized codes were given a multiplier of 1.
f<-function(x){switch(as.character(x), H=100, K=1000, M=1000000,B=1000000000,1)}
storm$PROPDMGEXP<- sapply(storm$PROPDMGEXP, f)
storm <- storm %>%
filter(PROPDMG > 0) %>%
mutate(total_dam = PROPDMG*PROPDMGEXP+CROPDMG)
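The same recoding can also be expressed as a vectorized lookup, which avoids calling a function once per row on a 900,000-row column. This is only a sketch of an alternative, not the method used above: the names `multiplier`, `exp_to_multiplier`, and `toy` are hypothetical, and the toy values are made up. The codes and multipliers mirror the switch() call, including the fallback of 1 for any other code.

```r
# Hypothetical lookup table mirroring the codes handled by switch() above
multiplier <- c(H = 100, K = 1e3, M = 1e6, B = 1e9)

exp_to_multiplier <- function(code) {
  # Index the named vector by code; unknown, empty, or missing codes give NA
  m <- unname(multiplier[as.character(code)])
  m[is.na(m)] <- 1  # fall back to 1, as in the switch() default
  m
}

# Made-up toy records, NOT taken from the NOAA data
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3),
  PROPDMGEXP = c("K", "M", "", NA),
  CROPDMG    = c(0, 0, 5, 0),
  stringsAsFactors = FALSE
)

toy$total_dam <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) + toy$CROPDMG
toy$total_dam
```

As in the original function, lowercase codes such as "k" or "m" are not matched and fall back to a multiplier of 1; whether they should be treated as thousands and millions is a decision for the analyst.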
p <- storm %>%
group_by(EVTYPE) %>%
summarise(Losses=sum(total_dam)) %>%
top_n(5) %>%
arrange(desc(Losses))
## Selecting by Losses
p$Losses <- prettyNum(p$Losses,big.mark = ",")
knitr::kable(p,caption="Table 2: Most Economically Devastating Weather Events 1950-2011")
| EVTYPE | Losses |
|---|---|
| FLOOD | 144,657,858,149 |
| HURRICANE/TYPHOON | 69,305,844,748 |
| TORNADO | 56,925,751,854 |
| STORM SURGE | 43,323,536,005 |
| FLASH FLOOD | 16,140,985,677 |
Floods are the single most economically devastating type of weather event over the roughly 62 years covered by the data. Total flood losses are almost $145 billion, well above those from hurricanes, tornadoes, and any other event type.