This analysis uses the Storm data to answer two questions. First, which types of events are most harmful with respect to population health? Second, across the United States, which types of events have the greatest economic consequences? The results for question 1 show that the most harmful event type for public health is tornado. The results for question 2 show that the most harmful event type for property damage is also tornado, whereas the most harmful event type for crop damage is hail.
Load the raw data and check some basic properties of the loaded dataset.
# language setting
Sys.setlocale("LC_TIME", "en_US.UTF-8")
[1] "en_US.UTF-8"
# load the data first anyway
raw.data <- read.csv("StormData.csv")
dim(raw.data)
[1] 902297 37
The raw data is large, so select only the variables related to 1) public health and 2) economic consequences.
For 1) public health, the most straightforward variables are FATALITIES and INJURIES. So pick EVTYPE, FATALITIES and INJURIES (together with STATE__) from the raw data to form a new dataset.
# keep only the columns needed for the health question
health.dataset <- raw.data[, c("STATE__", "EVTYPE", "FATALITIES", "INJURIES")]
# assign column names
colnames(health.dataset) <- c("state", "EVTYPE", "FATALITIES", "INJURIES")
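The subsetting above happens after the whole file is already in memory. A minimal sketch of an alternative, assuming the goal is only to avoid loading unused columns: `read.csv` skips any column whose `colClasses` entry is the string `"NULL"`. The temporary CSV, its values, and the names `tmp`, `keep`, `col.classes` and `small` are hypothetical stand-ins, not part of the original analysis.

```r
# Toy CSV standing in for StormData.csv (hypothetical values).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(STATE__    = c(1, 1, 2),
                     EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
                     FATALITIES = c(2, 0, 1),
                     INJURIES   = c(10, 0, 4),
                     REMARKS    = c("a", "b", "c")),
          tmp, row.names = FALSE)

# Read a single row just to learn the column names.
header <- names(read.csv(tmp, nrows = 1))

# Columns to keep; every other column is marked "NULL" so read.csv skips it.
keep <- c("EVTYPE", "FATALITIES", "INJURIES")
col.classes <- rep("NULL", length(header))
col.classes[header %in% keep] <- NA   # NA lets read.csv guess the type

small <- read.csv(tmp, colClasses = col.classes)
print(dim(small))
```

With the real file, the same pattern would be applied to "StormData.csv" in place of `tmp`.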
# total fatalities per event type
fatalities <- tapply(health.dataset$FATALITIES, health.dataset$EVTYPE, sum)
# find the event types that cause the most fatalities (tail() keeps the 6 largest)
sort.fatalities <- sort(fatalities)
most6.event <- tail(sort.fatalities)
TORNADO clearly causes the most fatalities across the states. Next, find out which event type causes the most injuries.
injuries <- tapply(health.dataset$INJURIES, health.dataset$EVTYPE, sum)
sort.injuries <- sort(injuries)
most6.injuries <- tail(sort.injuries)
TORNADO also causes the most injuries across the states. Therefore, the event type most harmful to public health is tornado.
To visualize the results, use bar plots to show which event types cause the most fatalities and injuries across the states.
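The two `tapply` passes above can also be expressed as a single grouped sum. The following is a sketch of that alternative using `aggregate`, not the approach used in this analysis; the data frame `toy` and its values are hypothetical and only illustrate the mechanics.

```r
# Toy data standing in for health.dataset (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(5, 3, 7, 2, 1, 0),
  INJURIES   = c(40, 0, 60, 3, 2, 1)
)

# Sum both measures per event type in one call.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Order by fatalities, largest first, and keep the top 3 rows.
top3 <- head(totals[order(-totals$FATALITIES), ], 3)
print(top3)
```

Keeping both totals in one table makes it easy to check whether the ranking by fatalities and the ranking by injuries agree.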
# show results of fatalities
par(mfrow=c(2,1), cex.axis=0.6)
barplot(most6.event, ylab = "number of fatalities", ylim = c(0,6000), main = "Figure 1: top 6 event types most harmful to public health")
# show results of injuries
barplot(most6.injuries, xlab = "event type", ylab = "number of injuries", ylim = c(0, 100000))
In conclusion, tornado is the most harmful event type across the states, followed by heat, flood, etc. However, none of the others is at the same level as tornado.
As in question 1, the raw data is huge, so first pick the variables related to economic consequences. Because the meaning of PROPDMGEXP and CROPDMGEXP is unclear, I use only the property damage (PROPDMG) and crop damage (CROPDMG) values to estimate the economic impact. The totals below are therefore in the raw units of those columns.
# keep only the columns needed for the economic question
economic.dataset <- raw.data[, c("STATE__", "EVTYPE", "PROPDMG", "CROPDMG")]
# assign column names
colnames(economic.dataset) <- c("STATE", "EVTYPE", "PROPDMG", "CROPDMG")
# total property damage per event type
prop.damage <- tapply(economic.dataset$PROPDMG, economic.dataset$EVTYPE, sum)
# find the event types that cause the most property damage
sort.propdamage <- sort(prop.damage)
mostprop.damage <- tail(sort.propdamage)
mostprop.damage
             HAIL THUNDERSTORM WIND             FLOOD         TSTM WIND
         688693.4          876844.2          899938.5         1335965.6
      FLASH FLOOD           TORNADO
        1420124.6         3212258.2
The event types that cause the most property damage are tornado, flash flood, tstm wind, flood, thunderstorm wind, and hail.
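The list above contains both "TSTM WIND" and "THUNDERSTORM WIND", which are two spellings of the same kind of event, so their totals are counted separately. A minimal sketch of how such labels could be merged before summing, assuming that trimming whitespace, upper-casing, and expanding the TSTM abbreviation is enough for the case at hand; the vectors `raw.labels` and `damage` hold hypothetical values, and this cleanup is not part of the original analysis.

```r
# Toy totals keyed by raw EVTYPE labels (hypothetical values).
raw.labels <- c("TSTM WIND", "THUNDERSTORM WIND", "tstm wind", " HAIL", "HAIL")
damage     <- c(10, 5, 2, 4, 6)

# Normalise: trim whitespace, upper-case, and expand the TSTM abbreviation.
clean.labels <- toupper(trimws(raw.labels))
clean.labels <- sub("^TSTM", "THUNDERSTORM", clean.labels)

# Sum per cleaned label, as the analysis does per EVTYPE.
merged <- tapply(damage, clean.labels, sum)
print(sort(merged))
```

In the real data, `clean.labels` would be computed from `economic.dataset$EVTYPE` and passed to `tapply` in place of the raw column.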
# total crop damage per event type
crop.damage <- tapply(economic.dataset$CROPDMG, economic.dataset$EVTYPE, sum)
# find the event types that cause the most crop damage
sort.cropdamage <- sort(crop.damage)
mostcrop.damage <- tail(sort.cropdamage)
mostcrop.damage
THUNDERSTORM WIND           TORNADO         TSTM WIND             FLOOD
         66791.45         100018.52         109202.60         168037.88
      FLASH FLOOD              HAIL
        179200.46         579596.28
The event types that cause the most crop damage are hail, flash flood, flood, tstm wind, tornado, and thunderstorm wind.
This section visualizes the results of question 2 with bar plots, as was done for question 1.
par(mfrow=c(1,1))
par(cex.axis=0.6)
# show results of property damage
barplot(mostprop.damage, xlab = "event type", ylab = "total property damage (PROPDMG)", main = "Figure 2: top 6 event types by property damage")
# show results of crop damage
barplot(mostcrop.damage, xlab = "event type", ylab = "total crop damage (CROPDMG)", main = "Figure 3: top 6 event types by crop damage")
As shown above, the most harmful event type for property damage is tornado, and the most harmful event type for crop damage is hail.
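These rankings rest on the raw PROPDMG and CROPDMG values, because the exponent columns were left out. A minimal sketch of how the exponent codes could be applied, assuming the letters K, M and B stand for thousands, millions and billions and that any other code is treated as a multiplier of 1 (a simplification). The data frame `toy`, its values, and the names `multiplier`, `mult` and `dollars` are hypothetical.

```r
# Toy rows standing in for PROPDMG / PROPDMGEXP (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(25, 1.5, 300, 10),
  PROPDMGEXP = c("K", "B", "K", "")
)

# Assumed mapping: K = 1e3, M = 1e6, B = 1e9.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier; unknown or empty codes come back as NA.
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1   # assumption: treat unrecognised codes as 1

# Scale the raw values, then sum per event type as before.
toy$dollars <- toy$PROPDMG * mult
dollars <- tapply(toy$dollars, toy$EVTYPE, sum)
print(sort(dollars))
```

In this toy data the scaled totals rank the event types differently from the raw sums, which is why the exponent columns are worth checking before drawing conclusions from the real data.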