Synopsis

This analysis studies US weather events from 1950 to November 2011 as recorded by NOAA. It investigates which event types cause the most injuries and deaths, and which cause the most economic damage in terms of property and crop losses. The analysis is carried out in R and can be reproduced by running the blocks of code below.

Data Processing

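The raw data come as a bzip2-compressed .csv file. The file is downloaded and then read into a data frame with read.csv, which can read the compressed file directly. Setting the chunk option cache=TRUE for this pre-processing step avoids re-reading the file on every run.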

# download the compressed storm data
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest <- "data.csv.bz2"
download.file(url, dest, method = "curl")
# read the compressed file directly (read.csv handles .bz2 transparently)
data <- read.csv(dest)
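Since the download is the slowest step to repeat, one option is to skip it when the file is already on disk. The following is a minimal sketch, not part of the original analysis; the helper name `fetch_once` is hypothetical, and `mode = "wb"` is an assumption made so that the binary archive is not altered on Windows.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the file in binary mode
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}
```

Calling `fetch_once(url, dest)` in place of `download.file(...)` would leave the rest of the processing unchanged.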

Results

Tornados cause almost 3 times the deaths and 14 times the injuries of the next most dangerous event type, excessive heat. Anyone concerned with public and economic health should focus on keeping citizens and property safe during tornados, floods, and excessive heat.

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Let us define “harmful” as the sum of deaths and injuries caused by a given event type. Tornados cause the most deaths and the most injuries by a wide margin. The sum of injuries and deaths from tornados is 96,979, which is over 10 times greater than that of the next event type, excessive heat, at 8,428.
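The ratios quoted in this section can be checked with a few lines of arithmetic. This sketch simply re-enters the tornado and excessive heat counts from the summary table in this section; the variable names are illustrative only.

```r
# Counts copied from the TORNADO and EXCESSIVE HEAT rows of the summary table.
tornado <- c(death = 5633, injury = 91346)
heat    <- c(death = 1903, injury = 6525)

# Ratio of tornado to excessive heat, separately for deaths and injuries.
round(tornado / heat, 1)

# Combined harm (deaths + injuries) for each event type, and their ratio.
sum(tornado)
sum(heat)
round(sum(tornado) / sum(heat), 1)
```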

Below is the code that generates the result. It uses library(dplyr).

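library(dplyr)
# keep only the event type and the two casualty columns
eventharm <- select(data, EVTYPE, FATALITIES, INJURIES)
# total deaths and injuries per event type
summaryharm <- eventharm %>% group_by(EVTYPE) %>% summarise(death = sum(FATALITIES), injury = sum(INJURIES))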
# subset that further to create table where death or injuries are >0
dangerevents<- summaryharm[summaryharm$death>0 | summaryharm$injury>0, ]
# create a third column which is the sum of deaths and injuries
sumdangerevents<- mutate(dangerevents,sum = death+injury)
# arranges that new table by the sum of injuries and death
arrange(sumdangerevents,desc(sum))
## # A tibble: 220 x 4
##    EVTYPE            death injury   sum
##    <fct>             <dbl>  <dbl> <dbl>
##  1 TORNADO            5633  91346 96979
##  2 EXCESSIVE HEAT     1903   6525  8428
##  3 TSTM WIND           504   6957  7461
##  4 FLOOD               470   6789  7259
##  5 LIGHTNING           816   5230  6046
##  6 HEAT                937   2100  3037
##  7 FLASH FLOOD         978   1777  2755
##  8 ICE STORM            89   1975  2064
##  9 THUNDERSTORM WIND   133   1488  1621
## 10 WINTER STORM        206   1321  1527
## # ... with 210 more rows
fig1 <- select(head(arrange(sumdangerevents,desc(sum)),10),EVTYPE,death,injury)
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.5.3
fig1<- gather(fig1,"damage","number", -EVTYPE)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.3
# Stacked barplot with multiple groups
p <- ggplot(data=fig1, aes(x=reorder(EVTYPE,number), y=number, fill=damage)) +
  geom_bar(stat="identity") + coord_flip() +
  labs(title="injuries and deaths of most damaging storm types")
p

Across the United States, which types of events have the greatest economic consequences?

There are two main kinds of economic damage recorded: property damage and crop damage. We shall address each individually. Before going forward with either analysis, however, it is important to note that there are columns ending in EXP for both property and crop damage. EXP stands for exponent, and each value is a letter h, k, m, or b (in mixed case) corresponding to hundreds, thousands, millions, or billions respectively.
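The exponent letters can also be turned into numeric multipliers with a small lookup, shown here as a sketch rather than as the method used in this report. The names `exp_multiplier` and `to_dollars` are hypothetical, and treating any other code as a multiplier of 1 is an assumption made only for this illustration.

```r
# Multiplier for each exponent letter described above (case-insensitive).
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its exponent code to dollars.
to_dollars <- function(amount, exp_code) {
  mult <- exp_multiplier[toupper(as.character(exp_code))]
  # Assumption: codes other than h, k, m, b leave the amount unscaled.
  mult[is.na(mult)] <- 1
  unname(amount * mult)
}

to_dollars(c(2.5, 10, 3), c("B", "m", "k"))
```

Applying a helper like this to each row before grouping by EVTYPE would handle upper and lower case codes in one step.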

Property

# Select only the event type, amount of property damage, and the corresponding "exponent"
propcost <- select(data,EVTYPE,PROPDMG,PROPDMGEXP)
# group by event and "exponent", sum each group by the property damage within that exponent
propdamage <- propcost %>% group_by(EVTYPE,PROPDMGEXP) %>% summarise(total=sum(PROPDMG))
# Select only those rows which contain nonzero sums
propdamage <- propdamage[propdamage$total>0, ]
# create four tables, one per exponent code used here (B, M, m, K); each keeps the 10 largest totals for that code and converts them to billions
PD1 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "B",],desc(total)),10) %>% mutate(billions=total/1)
PD2 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "M",],desc(total)),10) %>% mutate(billions=total/1000)
PD3 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "m",],desc(total)),10) %>% mutate(billions=total/1000)
PD4 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "K",],desc(total)),10) %>% mutate(billions=total/1000000)
propdamage<- bind_rows(PD1,PD2,PD3,PD4) %>% group_by(EVTYPE) %>% summarise(sum_billions=sum(billions))
head(arrange(propdamage,desc(sum_billions)),5)
## # A tibble: 5 x 2
##   EVTYPE            sum_billions
##   <fct>                    <dbl>
## 1 FLOOD                    145. 
## 2 HURRICANE/TYPHOON         69.3
## 3 TORNADO                   56.9
## 4 STORM SURGE               42.6
## 5 FLASH FLOOD               15.1

Flooding is by far the most costly event type for property, at 144.66 billion dollars.

Now the same analysis for crops:

cropcost <- select(data,EVTYPE,CROPDMG,CROPDMGEXP)
cropdamage <- cropcost %>% group_by(EVTYPE,CROPDMGEXP) %>% summarise(total=sum(CROPDMG))
cropdamage <- cropdamage[cropdamage$total>0, ]
CD1 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "B",],desc(total)),10) %>% mutate(billions=total/1)
CD2 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "M",],desc(total)),10) %>% mutate(billions=total/1000)
CD3 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "m",],desc(total)),10) %>% mutate(billions=total/1000)
CD4 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "K",],desc(total)),10) %>% mutate(billions=total/1000000)
CD5 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "k",],desc(total)),10) %>% mutate(billions=total/1000000)
cropdamage<- bind_rows(CD1,CD2,CD3,CD4,CD5) %>% group_by(EVTYPE) %>% summarise(sum_billions=sum(billions))
head(arrange(cropdamage,desc(sum_billions)),5)
## # A tibble: 5 x 2
##   EVTYPE      sum_billions
##   <fct>              <dbl>
## 1 DROUGHT            14.0 
## 2 FLOOD               5.66
## 3 ICE STORM           5   
## 4 RIVER FLOOD         5   
## 5 HAIL                3.03

For crops, droughts cause the greatest loss of economic value, at 13.97 billion dollars.

Now combine the tables for crop damage and property damage.

totaldamage <- bind_rows(propdamage,cropdamage) %>% group_by(EVTYPE)  %>% summarise(billions_of_dollars=sum(sum_billions))
head(arrange(totaldamage,desc(billions_of_dollars)),5)
## # A tibble: 5 x 2
##   EVTYPE            billions_of_dollars
##   <fct>                           <dbl>
## 1 FLOOD                           150. 
## 2 HURRICANE/TYPHOON                71.9
## 3 TORNADO                          57.0
## 4 STORM SURGE                      42.6
## 5 HAIL                             17.0

Again, flooding is by far the costliest type of weather event in economic terms, at 150.32 billion dollars.
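The combined flood figure can be reproduced from the two totals reported earlier (144.66 for property and 5.66 for crops). The sketch below uses base R and only the rounded values quoted or printed in this report, so it is an illustration of the stack-and-sum step, not a rerun of the analysis. Only FLOOD appears in both of these partial tables, so it is the only row whose total is comparable to the full result.

```r
# Rounded values taken from the property and crop results above (billions of dollars).
prop_top <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE"),
  sum_billions = c(144.66, 69.3, 56.9, 42.6)
)
crop_top <- data.frame(
  EVTYPE = c("DROUGHT", "FLOOD", "HAIL"),
  sum_billions = c(13.97, 5.66, 3.03)
)

# Stack both tables, then add up the rows that share an event type.
combined <- aggregate(sum_billions ~ EVTYPE,
                      data = rbind(prop_top, crop_top), FUN = sum)

# Show the result with the largest total first.
combined[order(-combined$sum_billions), ]
```

Event types present in only one table, such as DROUGHT, are carried through unchanged.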

fig2 <- head(arrange(totaldamage,desc(billions_of_dollars)),10)
p <- ggplot(data=fig2, aes(x=reorder(EVTYPE,billions_of_dollars), y=billions_of_dollars)) +
  geom_bar(stat="identity", color="blue", fill="white")+
  coord_flip() +
  labs(title="most expensive weather events",x="event type")
p