Severe weather events harm both public health and the economy. For public health, tornadoes cause the most casualties (fatalities and injuries). For the economy, floods cause the most property damage and droughts cause the most crop damage. Hail and thunderstorm wind, however, are the most frequent weather events.
Severe weather events such as storms have public health and economic impacts. They result in fatalities, injuries, and property damage. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. This Coursera project answers two questions. Which types of events are most harmful with respect to population health? Which types of events have the greatest economic consequences?
The data come as a comma-separated-value file compressed with bzip2. It can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The events in the database start in 1950 and end in November 2011. In the earlier years there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
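A minimal sketch of fetching the file so that the analysis can be rerun from a clean directory. The helper name `fetch_storm_data()` and the default file name `StormData.csv.bz2` are hypothetical and do not appear in the original analysis. The helper downloads only when the local copy is missing; `read.csv()` can then read the bzip2-compressed file directly.

```r
# Hypothetical helper: download the archive once, then reuse the local copy.
fetch_storm_data <- function(url, dest = "StormData.csv.bz2") {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage (not run here, because it needs network access):
# path <- fetch_storm_data("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
# raw_data <- read.csv(path)
```

Keeping the download inside a function avoids repeating a large transfer every time the report is knitted.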
## Read the csv file and save it in raw_data. (This may take some time.)
setwd("~/Dropbox/@Next/AI/JH_repro/HW2")
raw_data <- read.csv("repdata_data_StormData.csv.bz2")
# Sample view of the data
head(raw_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
# Sample size:
nrow(raw_data)
## [1] 902297
# Variables measured:
names(raw_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
# Number of different weather event types:
length(table(raw_data$EVTYPE))
## [1] 985
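The count of distinct labels can overstate the number of distinct kinds of event if some labels differ only in letter case or surrounding whitespace. Whether that applies here is an assumption to check against the data. The sketch below uses made-up labels, not values from the database, to show how such variants can be collapsed before counting.

```r
# Toy labels standing in for raw_data$EVTYPE (illustrative values only)
evtype <- c("TORNADO", "Tornado", " TORNADO", "HAIL", "hail", "FLOOD")

# Normalize: strip surrounding whitespace, then upper-case
evtype_clean <- toupper(trimws(evtype))

length(unique(evtype))        # distinct raw labels
length(unique(evtype_clean))  # distinct labels after normalization
```

The analysis in this report works with the raw labels, so its counts and rankings refer to `EVTYPE` exactly as recorded.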
In this project, we are interested in the consequences of the different event types (EVTYPE) for population health (FATALITIES, INJURIES) and for the economy (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP).
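To make the aggregation step concrete, here is a small sketch on made-up records (the values are illustrative, not NOAA figures). `aggregate()` with a formula sums the left-hand variable within each level of `EVTYPE` and returns one row per event type.

```r
# Toy records (illustrative values, not taken from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 2, 1, 0),
  INJURIES   = c(10, 1, 5, 0, 0)
)

# Sum each outcome within each event type
toy_fatality <- aggregate(FATALITIES ~ EVTYPE, toy, sum)
toy_injury   <- aggregate(INJURIES ~ EVTYPE, toy, sum)

toy_fatality
toy_injury
```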
# Impact on population health
# Aggregate the data by event type
fatality <- aggregate(FATALITIES ~ EVTYPE, raw_data, sum)
injury <- aggregate(INJURIES ~ EVTYPE, raw_data, sum)
# Impact on the economy
economy <- raw_data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Check the property damage exponent
unique(raw_data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
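# Convert the property damage exponent and calculate the property damage value.
# The multipliers are listed in the order returned by unique(raw_data$PROPDMGEXP),
# which also contains an empty string in third position (not visible in the printout above).
# "h" and "H" both map to 100, digits map to powers of ten, and other symbols map to 1.
economy$PROPEXP <- c(1000, 1e+06, 1, 1e+09, 1e+06, 1, 1, 1e+05, 1e+06, 1, 10000, 100, 1000, 100, 1e+07, 100, 1, 10, 1e+08)[match(economy$PROPDMGEXP, as.character(unique(raw_data$PROPDMGEXP)))]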
economy$PROPDMGVAL <- economy$PROPDMG*economy$PROPEXP
# Check the crop damage exponent
unique(raw_data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
# Convert the crop damage exponent and calculate the crop damage value
economy$CROPEXP <- c(1, 1e+06, 1000, 1e+06, 1e+09, 1, 1, 1000, 100)[match(economy$CROPDMGEXP, as.character(unique(raw_data$CROPDMGEXP)))]
economy$CROPDMGVAL <- economy$CROPDMG*economy$CROPEXP
# Aggregate the data by event type
property <- aggregate(PROPDMGVAL ~ EVTYPE, economy, sum)
crop <- aggregate(CROPDMGVAL ~ EVTYPE, economy, sum)
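The positional lookups above depend on the order in which `unique()` happens to return the exponent codes. A sketch of an alternative keyed by the code itself follows. The helper name `exp_multiplier()` is hypothetical, and reading a digit `d` as `10^d` is an assumption about the coding, not something the database documents in this report.

```r
# Hypothetical helper: map a damage-exponent code to its multiplier.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))   # treat "k"/"K", "m"/"M", "h"/"H" alike
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- unname(lookup[code])           # NA for codes not in the lookup
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])  # assumption: digit d means 10^d
  out[is.na(out)] <- 1                  # "", "+", "-", "?" leave the value unscaled
  out
}

exp_multiplier(c("K", "m", "B", "h", "5", "", "?"))
```

With such a helper, the damage value could be computed as `economy$PROPDMG * exp_multiplier(economy$PROPDMGEXP)`, and the same call would serve `CROPDMGEXP`.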
# Rank the top 10 events for fatalities
fatality10 <- fatality[order(-fatality$FATALITIES), ][1:10, ]
# Rank the top 10 events for injuries
injury10 <- injury[order(-injury$INJURIES), ][1:10, ]
# Plot the data
barplot(fatality10$FATALITIES, las = 3, names.arg = fatality10$EVTYPE, main = "Top 10 Events for Fatalities", ylab = "Number of Fatalities", col = "lightblue")
barplot(injury10$INJURIES, las = 3, names.arg = injury10$EVTYPE, main = "Top 10 Events for Injuries", ylab = "Number of Injuries", col = "lightblue")
The plots show that tornado and excessive heat cause the most fatalities, and that tornado, with 5,633 fatalities, is far ahead of every other event type. Tornado also leads in injuries, with 91,346, far more than any other event type. Among the 985 event types, tornado therefore has the greatest impact on population health in the United States.
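Fatalities and injuries can also be combined into a single casualty count per event type. The sketch below shows one way to do that on made-up totals; the numbers and the `CASUALTIES` column are illustrative and are not part of the original analysis.

```r
# Toy per-event totals (illustrative values only)
toy_fatality <- data.frame(EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
                           FATALITIES = c(50, 20, 5))
toy_injury   <- data.frame(EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
                           INJURIES = c(900, 60, 70))

# Join the two tables on event type and add the counts
health <- merge(toy_fatality, toy_injury, by = "EVTYPE")
health$CASUALTIES <- health$FATALITIES + health$INJURIES

# Rank by combined casualties, largest first
health <- health[order(-health$CASUALTIES), ]
health
```

Adding the two counts weights a fatality and an injury equally, which is a simplification; the separate plots above avoid that choice.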
# Rank the top 10 events for property damage
property10 <- property[order(-property$PROPDMGVAL), ][1:10, ]
# Rank the top 10 events for crop damage
crop10 <- crop[order(-crop$CROPDMGVAL), ][1:10, ]
barplot(property10$PROPDMGVAL/(10^9), las = 3, names.arg = property10$EVTYPE, main = "Top 10 Events for Property Damages", ylab = "Property Damage ($ billions)", col = "moccasin")
barplot(crop10$CROPDMGVAL/(10^9), las = 3, names.arg = crop10$EVTYPE, main = "Top 10 Events for Crop Damages", ylab = "Crop Damage ($ billions)", col = "moccasin")
The plots show that flood, hurricane/typhoon, and tornado cause the most property damage. Flood damage, at $144.66 billion, is far higher than that of any other event type. Drought, flood, river flood, and ice flood are the top four event types for crop damage, and drought, at $13.97 billion, is far ahead of the others. Among the 985 event types, flood therefore has the greatest economic consequences in the United States.
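The overall economic ranking rests on adding property and crop damage for each event type. A sketch of that calculation on made-up dollar amounts follows; the data frame `toy_economy` and the columns `TOTALVAL` and `TOTAL_BILLIONS` are illustrative names, not part of the original analysis.

```r
# Toy damage values in dollars (illustrative, not NOAA figures)
toy_economy <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "FLOOD", "HAIL", "DROUGHT"),
  PROPDMGVAL = c(5e9, 1e8, 2e9, 3e8, 0),
  CROPDMGVAL = c(1e8, 4e9, 2e8, 1e8, 3e9)
)

# Sum both damage columns per event type in one call
damage <- aggregate(cbind(PROPDMGVAL, CROPDMGVAL) ~ EVTYPE, toy_economy, sum)
damage$TOTALVAL <- damage$PROPDMGVAL + damage$CROPDMGVAL

# Largest combined damage first, expressed in billions of dollars
damage <- damage[order(-damage$TOTALVAL), ]
damage$TOTAL_BILLIONS <- damage$TOTALVAL / 1e9
damage
```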
# Plotting the frequency of events
freq10 <- sort(table(raw_data$EVTYPE), decreasing = TRUE)[1:10]
nm <- names(freq10)
barplot(freq10, las = 3, names.arg = gsub(" ", "\n", nm), main = "Top 10 Most Frequent Weather Events", ylab = "Frequency Count", col = "gray")
Although hail and thunderstorm wind are the most frequent weather events, they do not cause the most severe health or economic consequences. Tornado, flood, and drought occur less often, yet the damage they cause is comparatively high.
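One way to express the contrast between frequency and severity is harm per recorded event. The sketch below computes fatalities per event on made-up records; the values are illustrative and the per-event measure is not part of the original analysis.

```r
# Toy records: a frequent, low-harm event and a rarer, high-harm event
toy <- data.frame(
  EVTYPE     = c("HAIL", "HAIL", "HAIL", "HAIL", "TORNADO", "TORNADO"),
  FATALITIES = c(0, 0, 1, 0, 4, 6)
)

events <- tapply(toy$FATALITIES, toy$EVTYPE, length)  # number of records per event type
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)     # total fatalities per event type

# Average fatalities per recorded event
per_event <- totals / events
per_event
```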