This analysis studies US weather events recorded in the NOAA storm database from 1950 to November 2011. It investigates which event types cause the most injuries and deaths, and which cause the most economic damage in terms of property loss and lost crop value. The analysis is conducted in R, and every result can be reproduced by running the blocks of code below.
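Data processing starts from the raw .csv file, which is downloaded in bzip2-compressed form and read directly into R. The pre-processing step is meant to be run with knitr's cache=TRUE chunk option, so that the large file is downloaded and parsed only once.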
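# download the compressed storm data (skipped if the file is already present)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest <- "data.csv.bz2"
if (!file.exists(dest)) download.file(url, dest, method = "curl")
# read.csv decompresses .bz2 files transparently, so no separate unzip step is needed
data <- read.csv("data.csv.bz2")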
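As a quick, self-contained check that read.csv can read a .bz2 file without a separate unzip step, the sketch below writes a tiny two-row table to a temporary bzip2-compressed file using base R's bzfile() connection and reads it back with read.csv. The column values are made up for illustration and are not taken from the storm data.

```r
# write a tiny made-up table to a temporary bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# read.csv opens the path through file(), which detects the bzip2 compression
toy <- read.csv(tmp)
print(toy)
```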
Tornadoes cause almost 3 times the deaths and 14 times the injuries of the next most harmful event type, excessive heat. It seems that anyone concerned with public health and economic loss should focus on keeping citizens and property safe during tornadoes, floods, and excessive heat.
Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Let us define “harmful” as the sum of deaths and injuries caused by a given event type. Tornadoes are the deadliest event type and also cause the most injuries, by a wide margin: the sum of injuries and deaths from tornadoes is 96,979, more than 11 times that of the next event type, excessive heat, at 8,428.
Below is the code that generates the result; it uses the dplyr package.
library(dplyr)
# keep only the event type and the two health-related columns
eventharm <- select(data, EVTYPE, FATALITIES, INJURIES)
# total deaths and injuries for each event type
summaryharm <- eventharm %>% group_by(EVTYPE) %>% summarise(death = sum(FATALITIES), injury = sum(INJURIES))
# keep only event types that caused at least one death or injury
dangerevents <- summaryharm[summaryharm$death > 0 | summaryharm$injury > 0, ]
# add a third column holding the sum of deaths and injuries
sumdangerevents <- mutate(dangerevents, sum = death + injury)
# sort the new table by that sum, largest first
arrange(sumdangerevents, desc(sum))
## # A tibble: 220 x 4
## EVTYPE death injury sum
## <fct> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
## # ... with 210 more rows
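The ratios quoted in the text can be checked against the table above. The sketch below copies the tornado and excessive-heat rows by hand (the numbers come from the printed output, not from a fresh pass over the raw data) and divides them element by element.

```r
# figures copied from the first two rows of the table above
tornado        <- c(deaths = 5633, injuries = 91346, sum = 96979)
excessive_heat <- c(deaths = 1903, injuries = 6525,  sum = 8428)

# how many times larger each tornado figure is, rounded to one decimal place
ratios <- round(tornado / excessive_heat, 1)
print(ratios)
```

The result is roughly 3 for deaths, 14 for injuries, and 11.5 for the combined total.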
# keep the ten most harmful event types, dropping the sum column
fig1 <- select(head(arrange(sumdangerevents, desc(sum)), 10), EVTYPE, death, injury)
library(tidyr)
# reshape to long format: one row per event type and kind of harm (death or injury)
fig1 <- gather(fig1, "damage", "number", -EVTYPE)
library(ggplot2)
# stacked horizontal bar chart of deaths and injuries per event type
p <- ggplot(data = fig1, aes(x = reorder(EVTYPE, number), y = number, fill = damage)) +
  geom_bar(stat = "identity") + coord_flip() +
  labs(title = "Injuries and deaths of the most harmful storm types",
       x = "event type", y = "number of people")
p
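There are two main kinds of economic damage recorded: property damage and crop damage. We shall address each individually. Before going forward with either analysis, however, it is important to note that each damage column has a companion column ending in EXP (PROPDMGEXP and CROPDMGEXP). This stands for "exponent": most entries are one of the letters h, k, m, or b (in mixed case), corresponding to hundreds, thousands, millions, or billions of dollars respectively. The EXP columns also contain other codes (digits, symbols such as + and ?, and blanks). The code below converts only the B, M/m, and K/k codes and ignores the rest, including h.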
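One way to apply these codes is a lookup table that turns each code into a dollar multiplier, so every row can be converted before any summing. The sketch below is a hypothetical helper (exp_multiplier is not part of the original analysis) run on made-up rows; it assumes that only h/k/m/b in either case should be interpreted and maps every other code to NA.

```r
# Hypothetical helper: map an EXP code to a dollar multiplier.
# Only h/k/m/b (either case) are interpreted; digits, symbols and blanks give NA.
exp_multiplier <- function(code) {
  lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
  unname(lookup[tolower(as.character(code))])
}

# made-up example rows, not taken from the storm data
toy <- data.frame(EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "WIND"),
                  PROPDMG    = c(2.5, 300, 40, 7),
                  PROPDMGEXP = c("B", "m", "K", "+"))
toy$dollars <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)  # the "+" row gets NA dollars
```

Normalising with tolower() removes the need for separate M and m (or K and k) tables. Unlike the code below, it would also pick up the h code, so totals computed this way could differ slightly from the ones reported here.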
# select only the event type, the property damage amount, and its "exponent"
propcost <- select(data, EVTYPE, PROPDMG, PROPDMGEXP)
# group by event and exponent, and sum the property damage within each group
propdamage <- propcost %>% group_by(EVTYPE, PROPDMGEXP) %>% summarise(total = sum(PROPDMG))
# keep only the groups with a nonzero total
propdamage <- propdamage[propdamage$total > 0, ]
# one table per exponent code (B, M, m, K), each converted to billions of dollars.
# Note: head(..., 10) keeps only the ten largest totals per code, so the sums below are lower bounds.
PD1 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "B",],desc(total)),10) %>% mutate(billions=total/1)
PD2 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "M",],desc(total)),10) %>% mutate(billions=total/1000)
PD3 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "m",],desc(total)),10) %>% mutate(billions=total/1000)
PD4 <- head(arrange(propdamage[propdamage$PROPDMGEXP == "K",],desc(total)),10) %>% mutate(billions=total/1000000)
# stack the four tables and total each event type's property damage in billions
propdamage <- bind_rows(PD1, PD2, PD3, PD4) %>% group_by(EVTYPE) %>% summarise(sum_billions = sum(billions))
head(arrange(propdamage,desc(sum_billions)),5)
## # A tibble: 5 x 2
## EVTYPE sum_billions
## <fct> <dbl>
## 1 FLOOD 145.
## 2 HURRICANE/TYPHOON 69.3
## 3 TORNADO 56.9
## 4 STORM SURGE 42.6
## 5 FLASH FLOOD 15.1
Flooding is by far the most costly event type for property, at 144.66 billion dollars, roughly twice the next event type, hurricane/typhoon, at 69.3 billion.
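One caution about the code above: head(..., 10) is applied within each exponent code before the per-event sums are taken. An event type that falls outside the top ten for a given code therefore loses that code's damage entirely. The toy sketch below uses made-up numbers, with n = 1 in place of 10 so the effect shows up on four rows, to contrast truncating first with summing everything. It uses base R only, and top_n_per_exp is a hypothetical helper written for this illustration.

```r
# made-up per-(event, exponent code) totals, already converted to billions
grouped <- data.frame(EVTYPE   = c("FLOOD", "HAIL", "HAIL", "WIND"),
                      EXP      = c("B",     "B",    "M",    "M"),
                      billions = c(2.0,     0.5,    0.3,    0.4))

# keep the n largest rows within each exponent code, mimicking head(arrange(...), n)
top_n_per_exp <- function(df, n) {
  pieces <- lapply(split(df, df$EXP), function(g) head(g[order(-g$billions), ], n))
  do.call(rbind, pieces)
}

truncated <- aggregate(billions ~ EVTYPE, data = top_n_per_exp(grouped, 1), FUN = sum)
full      <- aggregate(billions ~ EVTYPE, data = grouped, FUN = sum)
print(truncated)  # HAIL is missing: it is never the largest row within a code
print(full)       # HAIL's 0.5 + 0.3 = 0.8 billion is counted
```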
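# repeat the same steps for crop damage
cropcost <- select(data, EVTYPE, CROPDMG, CROPDMGEXP)
cropdamage <- cropcost %>% group_by(EVTYPE, CROPDMGEXP) %>% summarise(total = sum(CROPDMG))
cropdamage <- cropdamage[cropdamage$total > 0, ]
# one table per exponent code; the crop data also use a lowercase k (CD5)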
CD1 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "B",],desc(total)),10) %>% mutate(billions=total/1)
CD2 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "M",],desc(total)),10) %>% mutate(billions=total/1000)
CD3 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "m",],desc(total)),10) %>% mutate(billions=total/1000)
CD4 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "K",],desc(total)),10) %>% mutate(billions=total/1000000)
CD5 <- head(arrange(cropdamage[cropdamage$CROPDMGEXP == "k",],desc(total)),10) %>% mutate(billions=total/1000000)
# stack the five tables and total each event type's crop damage in billions
cropdamage <- bind_rows(CD1, CD2, CD3, CD4, CD5) %>% group_by(EVTYPE) %>% summarise(sum_billions = sum(billions))
head(arrange(cropdamage,desc(sum_billions)),5)
## # A tibble: 5 x 2
## EVTYPE sum_billions
## <fct> <dbl>
## 1 DROUGHT 14.0
## 2 FLOOD 5.66
## 3 ICE STORM 5
## 4 RIVER FLOOD 5
## 5 HAIL 3.03
For crops, the most economic value is lost to drought, at 13.97 billion dollars.
# combine property and crop damage and total them for each event type
totaldamage <- bind_rows(propdamage, cropdamage) %>% group_by(EVTYPE) %>% summarise(billions_of_dollars = sum(sum_billions))
head(arrange(totaldamage,desc(billions_of_dollars)),5)
## # A tibble: 5 x 2
## EVTYPE billions_of_dollars
## <fct> <dbl>
## 1 FLOOD 150.
## 2 HURRICANE/TYPHOON 71.9
## 3 TORNADO 57.0
## 4 STORM SURGE 42.6
## 5 HAIL 17.0
Again, flooding is by far the costliest type of disaster, at 150.32 billion dollars in combined property and crop damage.
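The combined figure is simply property plus crop damage per event type: for flooding, 144.66 + 5.66 = 150.32 billion. The bind_rows() and group_by() approach above automatically handles event types that appear in only one of the two tables. A join-based alternative has to handle that case explicitly, as in this base R sketch. The FLOOD values are copied from the tables above, while leaving DROUGHT out of the property table is a simplification made only for illustration.

```r
# FLOOD values copied from the tables above; DROUGHT is deliberately left out
# of the property table to show an event present in only one table
prop <- data.frame(EVTYPE = "FLOOD", sum_billions = 144.66)
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), sum_billions = c(5.66, 13.97))

# full outer join on event type; an event missing from one table gets NA there
both <- merge(prop, crop, by = "EVTYPE", all = TRUE, suffixes = c("_prop", "_crop"))
both[is.na(both)] <- 0  # treat missing damage as zero

both$total_billions <- both$sum_billions_prop + both$sum_billions_crop
both <- both[order(-both$total_billions), ]
print(both)
```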
fig2 <- head(arrange(totaldamage,desc(billions_of_dollars)),10)
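# bar chart of the ten costliest event types (property plus crop damage)
p <- ggplot(data = fig2, aes(x = reorder(EVTYPE, billions_of_dollars), y = billions_of_dollars)) +
  geom_bar(stat = "identity", color = "blue", fill = "white") +
  coord_flip() +
  labs(title = "Most expensive weather events", x = "event type", y = "damage (billions of US dollars)")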
p