This analysis examines weather event types across the United States to determine which have the greatest economic impact and which are most harmful to population health. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
The data were first read from the original bz2 file. The function exp_factor converts the codes in the CROPDMGEXP and PROPDMGEXP columns into numeric multipliers, which are then multiplied by CROPDMG and PROPDMG to recover the damage in dollars. A digit from 0 to 8 is treated as a power of ten, so the damage is multiplied by ten to the power of that digit. For letter codes, I assumed that “B” stands for billion (10^9), “M” for million (10^6), “K” for kilo (1,000), and “H” for hecto (100), in upper or lower case. Any other code leaves the damage value unchanged.
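The conversion rule described above can be sketched as a lookup on a character vector. This is a minimal illustration rather than the code used to produce the tables below: `exp_multiplier` is a hypothetical name, and the `as.character()` call is an added assumption so that a factor column (the `read.csv` default before R 4.0) yields its digit rather than its internal level code.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
exp_multiplier <- function(x) {
  x <- toupper(as.character(x))            # factor-safe and case-insensitive
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- rep(1, length(x))                 # blank or unknown codes leave the value unchanged
  is_digit <- x %in% as.character(0:8)
  out[is_digit] <- 10^as.numeric(x[is_digit])          # digit n -> 10^n
  is_letter <- x %in% names(letter_map)
  out[is_letter] <- unname(letter_map[x[is_letter]])   # H, K, M, B
  out
}

# Made-up codes, stored as a factor on purpose
codes <- factor(c("K", "m", "B", "2", "", "?"))
multipliers <- exp_multiplier(codes)
```

With these made-up codes, a recorded damage of 25 with code "K" would be scaled as `25 * exp_multiplier("K")`.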
A table called populationHealth sums the injuries, fatalities, and casualties (injuries plus fatalities) for each EVTYPE. Another table called economicHealth sums the property damage, crop damage, and total damage (property plus crop) for each EVTYPE. Both tables are sorted in descending order: populationHealth by casualties and economicHealth by total damage. The top five rows of each are kept, and the variant labels “TSTM WIND” and “THUNDERSTORM WINDS” are relabelled as “THUNDERSTORM WIND”.
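To make the aggregation step concrete, here is a small sketch on made-up rows. It uses base R's `aggregate` instead of the `ddply` call used in this analysis, and `toy_health` and its values are invented for illustration only.

```r
# Made-up rows standing in for the storm data
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Sum injuries and fatalities within each event type
toy_totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE,
                        data = toy_health, FUN = sum)

# Casualties are injuries plus fatalities; sort largest first
toy_totals$CASUALTIES <- toy_totals$INJURIES + toy_totals$FATALITIES
toy_totals <- toy_totals[order(-toy_totals$CASUALTIES), ]
```

The economic table follows the same pattern, with the scaled property and crop damage columns in place of injuries and fatalities.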
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(data.table)
## -------------------------------------------------------------------------
## data.table + dplyr code now lives in dtplyr.
## Please library(dtplyr)!
## -------------------------------------------------------------------------
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
library(ggplot2)
library(knitr)
# Read the compressed CSV; read.csv opens and closes the bzfile connection itself
stormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
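# Convert an exponent code (PROPDMGEXP / CROPDMGEXP) into a numeric multiplier.
# Note: as.numeric(x) returns the digit only when x is a character vector;
# for a factor it returns the internal level code instead.
exp_factor <- function(x) {
  suppressWarnings(
    ifelse(x %in% as.character(0:8), 10^as.numeric(x),  # digit n -> 10^n
    ifelse(x %in% c("b", "B"), 10^9,                     # billion
    ifelse(x %in% c("m", "M"), 10^6,                     # million/mega
    ifelse(x %in% c("k", "K"), 10^3,                     # kilo
    ifelse(x %in% c("h", "H"), 10^2,                     # hecto
           1)))))                                        # anything else: unchanged
  )
}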
# Scale the raw damage figures by their multipliers
stormData$PROPDMG <- exp_factor(stormData$PROPDMGEXP) * stormData$PROPDMG
stormData$CROPDMG <- exp_factor(stormData$CROPDMGEXP) * stormData$CROPDMG
# Injuries, fatalities, and casualties per event type, largest first
populationHealth <- ddply(stormData, .(EVTYPE), summarise,
                          TOTALINJURIES = sum(INJURIES),
                          TOTALFATALITIES = sum(FATALITIES),
                          CASUALTIES = sum(FATALITIES) + sum(INJURIES))
populationHealth <- arrange(populationHealth, desc(CASUALTIES))
# Property, crop, and total damage per event type, largest first
economicHealth <- ddply(stormData, .(EVTYPE), summarise,
                        TOTALPROPDMG = sum(PROPDMG),
                        TOTALCROPDMG = sum(CROPDMG),
                        TOTAL = sum(CROPDMG) + sum(PROPDMG))
economicHealth <- arrange(economicHealth, desc(TOTAL))
# Keep the top five rows of each table and relabel variant spellings
fiveEconomicHealth <- head(economicHealth, 5)
fiveEconomicHealth[grep("THUNDERSTORM WINDS", fiveEconomicHealth$EVTYPE), ]$EVTYPE <- "THUNDERSTORM WIND"
fivePopulationHealth <- head(populationHealth, 5)
fivePopulationHealth[grep("TSTM WIND", fivePopulationHealth$EVTYPE), ]$EVTYPE <- "THUNDERSTORM WIND"
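The relabelling above is applied after the totals are computed, so rows recorded under variant spellings are still summed separately. As a sketch of one possible alternative, and not what this analysis does, the labels could be normalised before aggregating. The rows in `toy_wind` are made up, and the anchored pattern is an assumption chosen so that longer labels such as "TSTM WIND/HAIL" are left alone.

```r
# Made-up rows with three spellings of the same event type
toy_wind <- data.frame(
  EVTYPE   = c("TSTM WIND", "THUNDERSTORM WINDS", "THUNDERSTORM WIND", "HAIL"),
  INJURIES = c(3, 4, 5, 1),
  stringsAsFactors = FALSE
)

# Assumed synonym rule: exact matches only, mapped to one canonical label
toy_wind$EVTYPE <- sub("^(TSTM WIND|THUNDERSTORM WINDS)$",
                       "THUNDERSTORM WIND", toy_wind$EVTYPE)

# Aggregating now combines all three spellings into one row
toy_merged <- aggregate(INJURIES ~ EVTYPE, data = toy_wind, FUN = sum)
```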
Here are the five event types most harmful to population health. Tornadoes cause by far the most casualties: 96,979, more than eleven times the total for excessive heat, the next event type.
kable(fivePopulationHealth)
EVTYPE | TOTALINJURIES | TOTALFATALITIES | CASUALTIES |
---|---|---|---|
TORNADO | 91346 | 5633 | 96979 |
EXCESSIVE HEAT | 6525 | 1903 | 8428 |
THUNDERSTORM WIND | 6957 | 504 | 7461 |
FLOOD | 6789 | 470 | 7259 |
LIGHTNING | 5230 | 816 | 6046 |
ggplot(fivePopulationHealth, aes(x = EVTYPE, y = CASUALTIES, fill = factor(EVTYPE))) +
  geom_bar(stat = "identity") +
  xlab("Event") +
  ylab("Casualties") +
  labs(fill = "Event Type")
Here are the five event types with the greatest economic damage. Flash floods rank first in this tabulation, with about 6.8e13 US dollars in total damage, almost all of it property damage.
kable(fiveEconomicHealth)
EVTYPE | TOTALPROPDMG | TOTALCROPDMG | TOTAL |
---|---|---|---|
FLASH FLOOD | 6.820237e+13 | 1421317100 | 6.820379e+13 |
THUNDERSTORM WIND | 2.086532e+13 | 190734708 | 2.086551e+13 |
TORNADO | 1.078951e+12 | 415113110 | 1.079366e+12 |
HAIL | 3.157558e+11 | 3025974453 | 3.187818e+11 |
LIGHTNING | 1.729433e+11 | 12092090 | 1.729554e+11 |
ggplot(fiveEconomicHealth, aes(x = EVTYPE, y = TOTAL, fill = factor(EVTYPE))) +
  geom_bar(stat = "identity") +
  xlab("Event") +
  ylab("Total Damage (US Dollars)") +
  labs(fill = "Event Type")