This analysis uses the NOAA (U.S. National Oceanic and Atmospheric Administration) storm database to answer the following questions:
* Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
* Across the United States, which types of events have the greatest economic consequences?

Answering these questions will help a government or municipal manager who is responsible for preparing for severe weather events to prioritize resources across different types of events.
In this analysis, harm to population health is measured simply as the total number of fatalities and injuries.
Economic consequences are measured as the total monetary damage, combining property damage and crop damage.
library(dplyr)
library(ggplot2)
stormData <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
#Note: dates and times are not converted here because the questions do not concern time; we only want the aggregated impact of each event type
#Get health metric: total number of fatalities and injuries
stormData$healthDamage <- stormData$FATALITIES + stormData$INJURIES
#Get economic metric: total monetary damage from property and crop damage
#Convert the exponent codes to multipliers for property and crop damage:
#K = 1e3, M = 1e6, B = 1e9 (matched case-insensitively, so "k" and "m" also count); any other code is treated as 0
propExp <- toupper(stormData$PROPDMGEXP)
cropExp <- toupper(stormData$CROPDMGEXP)
stormData$propUnit <- ifelse(propExp == "K", 1e3,
                      ifelse(propExp == "M", 1e6,
                      ifelse(propExp == "B", 1e9, 0)))
stormData$cropUnit <- ifelse(cropExp == "K", 1e3,
                      ifelse(cropExp == "M", 1e6,
                      ifelse(cropExp == "B", 1e9, 0)))
stormData$econDamage <- stormData$PROPDMG*stormData$propUnit + stormData$CROPDMG*stormData$cropUnit
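The exponent-to-multiplier conversion can also be expressed as a lookup in a named vector, which keeps the mapping in one place instead of spreading it across nested ifelse() calls. The following is a minimal sketch on a small made-up data frame, not the real storm data. The names expMultiplier, toMultiplier, and toy are hypothetical, and reading "H" as hundreds is an assumption. Unrecognized codes map to 0, the same choice this analysis makes.

```r
# Hypothetical lookup table: exponent code -> multiplier.
# "H" = 100 is an assumption; K, M, B follow the conversion used in this analysis.
expMultiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map exponent codes to multipliers, case-insensitively.
# Codes that are not in the table (including "" and NA) become 0.
toMultiplier <- function(codes) {
  mult <- expMultiplier[toupper(as.character(codes))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Made-up rows for illustration only.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", "?"),
  stringsAsFactors = FALSE
)
toy$propDamage <- toy$PROPDMG * toMultiplier(toy$PROPDMGEXP)
print(toy)
```

Adding or changing a code then means editing one entry in expMultiplier rather than another level of nesting.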
#summarize health metrics and economic metrics per event type across all time
summaryData <- stormData %>% group_by(EVTYPE) %>% summarize(totalHealthDamage = sum(healthDamage), totalEconDamage = sum(econDamage))
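To make explicit what the grouping step computes, here is a small sketch that runs the same group_by() and summarize() pattern on made-up rows and then sorts the result with arrange(). The data frame toy and its values are invented for illustration; only the column names mirror the ones used in this analysis.

```r
library(dplyr)

# Made-up rows for illustration only; not taken from the storm database.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  healthDamage = c(10, 2, 5, 0, 1),
  econDamage = c(1e6, 5e6, 2e6, 1e5, 4e6),
  stringsAsFactors = FALSE
)

# One row per event type, with totals summed over all of its records,
# sorted so the most harmful event type (by health) comes first.
toySummary <- toy %>%
  group_by(EVTYPE) %>%
  summarize(totalHealthDamage = sum(healthDamage),
            totalEconDamage = sum(econDamage)) %>%
  arrange(desc(totalHealthDamage))

print(toySummary)
```

Sorting the summary this way gives a quick ranked view of all event types, which can be useful before narrowing down to the top 5.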
#First, plot the health damage chart; because there are many event types, only the top 5 are plotted
top5HealthDamage <- top_n(summaryData, 5, totalHealthDamage)
ggplot(top5HealthDamage, aes(x=EVTYPE, y=totalHealthDamage)) + geom_point()
#Now plot the top 5 event types by economic damage
top5EconDamage <- top_n(summaryData, 5, totalEconDamage)
ggplot(top5EconDamage, aes(x=EVTYPE, y=totalEconDamage)) + geom_point()
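The point charts above place event types in alphabetical order on the x axis. As one possible alternative, the sketch below draws a bar chart with the bars sorted from largest to smallest, using reorder() on the x aesthetic. The data frame toyTop and its values are made up for illustration, and the axis labels are only suggestions.

```r
library(ggplot2)

# Made-up totals for illustration only; not taken from the storm database.
toyTop <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
  totalHealthDamage = c(3, 15, 1),
  stringsAsFactors = FALSE
)

# reorder() sorts the event types by the negated total,
# so the largest bar is drawn first (leftmost).
p <- ggplot(toyTop, aes(x = reorder(EVTYPE, -totalHealthDamage),
                        y = totalHealthDamage)) +
  geom_col() +
  labs(x = "Event type", y = "Fatalities + injuries")

print(p)
```

The same pattern would apply to the economic damage chart by swapping in totalEconDamage.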
#Event type with greatest health damage
eventMaxHealthDamage <- summaryData[which.max(summaryData$totalHealthDamage),]$EVTYPE
#Event type with greatest economic damage
eventMaxEconDamage <- summaryData[which.max(summaryData$totalEconDamage),]$EVTYPE
The event type with the greatest health damage is TORNADO.
The event type with the greatest economic damage is FLOOD.