This analysis identifies the type of severe weather event in the United States that (i) is the most harmful to population health and (ii) has the greatest economic consequences. For the former, we total the fatalities and injuries attributed to each event type; for the latter, we total the cost of property and crop damage. These findings can help with preparing for severe weather events and prioritising resources across event types. We find that tornadoes have the greatest impact on population health, while floods cause the most economic damage.
This project explores the US National Oceanic and Atmospheric Administration’s storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries and property damage. This section details the steps by which we clean and process the data for further analysis.
original_data <- read.csv("repdata-data-StormData.csv.bz2")
head(original_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
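The `read.csv()` call at the top of this section is given the `.bz2` file directly, which works because R's file connections detect bzip2 compression when reading. As a minimal sketch of that behaviour, the snippet below writes a tiny, made-up CSV through a compressing connection and reads it back; `tmp_path` and `toy_storms` are hypothetical names, and the rows are invented rather than taken from the database.

```r
# Write a tiny, made-up CSV through a bzip2-compressing connection.
tmp_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp_path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "FLOOD,1"), con)
close(con)

# read.csv() is given the compressed path directly; no manual unzip step.
toy_storms <- read.csv(tmp_path, stringsAsFactors = FALSE)
toy_storms

# Remove the temporary file once it is no longer needed.
unlink(tmp_path)
```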
subset_data <- original_data[, c(2, 8, 23, 24, 25, 26, 27, 28)]
head(subset_data)
## BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1 4/18/1950 0:00:00 TORNADO 0 15 25.0 K
## 2 4/18/1950 0:00:00 TORNADO 0 0 2.5 K
## 3 2/20/1951 0:00:00 TORNADO 0 2 25.0 K
## 4 6/8/1951 0:00:00 TORNADO 0 2 2.5 K
## 5 11/15/1951 0:00:00 TORNADO 0 2 2.5 K
## 6 11/15/1951 0:00:00 TORNADO 0 6 2.5 K
## CROPDMG CROPDMGEXP
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
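Selecting columns by position, as above, works only while the file's column order stays fixed. As a minimal sketch, the same eight columns could instead be picked by name. The `toy_data` frame below is a cut-down stand-in for `original_data` (two rows copied from the `head()` output, most columns omitted), and `keep_cols` is a hypothetical helper variable.

```r
# Cut-down stand-in for original_data: two rows, only a few columns.
toy_data <- data.frame(
  STATE__    = c(1, 1),
  BGN_DATE   = c("4/18/1950 0:00:00", "2/20/1951 0:00:00"),
  EVTYPE     = c("TORNADO", "TORNADO"),
  FATALITIES = c(0, 0),
  INJURIES   = c(15, 2),
  PROPDMG    = c(25.0, 25.0),
  PROPDMGEXP = c("K", "K"),
  CROPDMG    = c(0, 0),
  CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the eight columns the analysis keeps, listed by name.
keep_cols <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
               "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if the file layout changes and a column goes missing.
stopifnot(all(keep_cols %in% names(toy_data)))

toy_subset <- toy_data[, keep_cols]
names(toy_subset)
```

Name-based selection also makes the later steps easier to read, since the column list documents which fields feed the health and economy totals.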
# Convert exponent codes to multipliers: K = 1e3, M = 1e6, B = 1e9.
# Any other code (blank, lowercase, digits, symbols) becomes 0, so those records add no damage.
subset_data$PROPDMGEXP <- ifelse(subset_data$PROPDMGEXP == "K", 1000, ifelse(subset_data$PROPDMGEXP == "M", 1000000, ifelse(subset_data$PROPDMGEXP == "B", 1000000000, 0)))
subset_data$CROPDMGEXP <- ifelse(subset_data$CROPDMGEXP == "K", 1000, ifelse(subset_data$CROPDMGEXP == "M", 1000000, ifelse(subset_data$CROPDMGEXP == "B", 1000000000, 0)))
subset_data$propdamage <- subset_data$PROPDMG*subset_data$PROPDMGEXP
subset_data$cropdamage <- subset_data$CROPDMG*subset_data$CROPDMGEXP
subset_data$HEALTH <- subset_data$FATALITIES + subset_data$INJURIES
subset_data$ECONOMY <- subset_data$propdamage + subset_data$cropdamage
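The nested `ifelse()` calls map the exponent codes K, M and B to multipliers and send every other code to 0. A named lookup vector can express the same kind of mapping more compactly. The sketch below is one possible variant, not the method used in this analysis: `exp_to_multiplier` is a hypothetical helper, and upper-casing the codes first (so that "m" counts as "M") is an assumption that differs from the code above, which treats lowercase codes as 0.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
exp_to_multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Assumption: treat lowercase codes like their uppercase form.
  codes <- toupper(as.character(codes))
  multiplier <- unname(lookup[codes])
  # Codes outside K/M/B (including empty strings) contribute nothing.
  multiplier[is.na(multiplier)] <- 0
  multiplier
}

# Example call on a handful of codes, including a blank and a symbol.
exp_to_multiplier(c("K", "m", "B", "", "+"))
```

Whether lowercase codes should be counted is a judgement call about the raw data; dropping the `toupper()` line makes the helper match the stricter behaviour of the nested `ifelse()` version.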
final_data <- subset_data[, c(1, 2, 11, 12)]
head(final_data)
## BGN_DATE EVTYPE HEALTH ECONOMY
## 1 4/18/1950 0:00:00 TORNADO 15 25000
## 2 4/18/1950 0:00:00 TORNADO 0 2500
## 3 2/20/1951 0:00:00 TORNADO 2 25000
## 4 6/8/1951 0:00:00 TORNADO 2 2500
## 5 11/15/1951 0:00:00 TORNADO 2 2500
## 6 11/15/1951 0:00:00 TORNADO 6 2500
We sum fatalities and injuries by weather event type to measure each type's impact on population health. The figure below shows the 10 event types with the highest combined fatalities and injuries across the US.
health <- tapply(final_data$HEALTH, final_data$EVTYPE, FUN=sum)
descending_health <- sort(health, decreasing = TRUE)
top10_health <- head(descending_health, n=10)
par(mar=c(4,8,2,1))
barplot(top10_health, horiz = TRUE, xlab = "Sum of Fatalities and Injuries", main = "Top 10 Weather Events Affecting Health", cex.names=0.7, las=1)
The weather event with the greatest impact on population health is the Tornado.
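The ranking above follows a sum-by-group, sort, take-the-head pattern. As a small worked illustration of that pattern, the sketch below applies it to made-up numbers; the `toy` data frame and its values are invented for this example and are not taken from the storm database.

```r
# Invented toy data: three event types with made-up casualty counts.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  HEALTH = c(15, 3, 2, 7, 1),
  stringsAsFactors = FALSE
)

# Sum HEALTH within each event type; the result is named by event type.
totals <- tapply(toy$HEALTH, toy$EVTYPE, FUN = sum)

# Order from largest to smallest and keep the two largest totals.
top2 <- head(sort(totals, decreasing = TRUE), n = 2)
top2
```

Here the per-type totals are 17 for TORNADO, 7 for HEAT and 4 for FLOOD, so `top2` keeps TORNADO and HEAT, in that order.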
We sum property and crop damage by weather event type to measure each type's impact on the economy. The figure below shows the 10 event types with the highest combined property and crop damage across the US.
damage <- tapply(final_data$ECONOMY, final_data$EVTYPE, FUN=sum)
descending_damage <- sort(damage, decreasing = TRUE)
top10_damage <- head(descending_damage, n=10)
par(mar=c(4,8,2,1))
barplot(top10_damage, horiz = TRUE, xlab = "Sum of Property and Crop Damage ($)", main = "Top 10 Weather Events Affecting Economy", cex.names=0.7, las=1)
The weather event with the greatest impact on the economy is the Flood.