Severe storms cause both public health and economic damage. This analysis uses the storm database from the National Oceanic and Atmospheric Administration to answer two questions. First, what type of storm event causes the greatest health damage? Second, what type of storm event causes the greatest economic damage.
I downloaded the source data file then imported it into R.
if (!file.exists("StormData.csv.bz2")) {
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "StormData.csv.bz2", method = "curl")
}
storm <- read.csv("StormData.csv.bz2", header = T)
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The dataset consists of 902297 observations in 37 columns. I reformatted several columns, including the BGN_DATE into a date format, and STATE, COUNTYNAME, and ENVTYPE into factors to ensure correct treatment of those variables in statistical analysis and graphics. I then extracted the year variable, selected for events with known damages and fatalities, and created a bar plot of the storm events causing health and/or property damage per year.
storm$BGN_DATE <- as.Date(storm$BGN_DATE, format = "%m/%d/%Y")
storm$STATE <- as.factor(storm$STATE)
storm$COUNTYNAME <- as.factor(storm$COUNTYNAME)
storm$EVTYPE <- as.factor(storm$EVTYPE)
storm$year <- year(storm$BGN_DATE)
storm_sub <- subset(storm, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
storms_per_year <- storm_sub %>% count(year)
ggplot(data = storms_per_year, aes(x = year, y = n)) +
geom_bar(stat = "identity", fill = "blue") +
labs(x = "Year",
y = "Number of known storms")
Figure 1. Number of known storms causing casualties and/or property damage in the USA 1950 - 2011. The number of storms jumped in 1993 and again in 1994, likely due to better reporting of such storms.
storms_1993 <- subset(storm_sub, year >= 1993)
Given the increase in known storms in 1993 due to better reporting, I limited my analysis to storms since 1993 when doing comparisons between types of storms in terms of casualties and property damage. This reduced the number of observations from 902297 in the original data set to 227257.
I then checked for NAs in the data set.
sum(is.na(storms_1993$FATALITIES))
## [1] 0
sum(is.na(storms_1993$INJURIES))
## [1] 0
sum(is.na(storms_1993$PROPDMG))
## [1] 0
sum(is.na(storms_1993$CROPDMG))
## [1] 0
sum(is.na(storms_1993$PROPDMGEXP))
## [1] 0
sum(is.na(storms_1993$CROPDMGEXP))
## [1] 0
Finally, there are 985 event types listed in the data set.
sort(table(storms_1993$EVTYPE), decreasing = TRUE)[1:30]
##
## TSTM WIND THUNDERSTORM WIND HAIL
## 61934 43655 26052
## FLASH FLOOD TORNADO LIGHTNING
## 20967 13946 13293
## THUNDERSTORM WINDS FLOOD HIGH WIND
## 12086 10175 5522
## STRONG WIND WINTER STORM HEAVY SNOW
## 3370 1508 1342
## HEAVY RAIN WILDFIRE ICE STORM
## 1105 857 708
## URBAN/SML STREAM FLD EXCESSIVE HEAT HIGH WINDS
## 702 698 657
## TSTM WIND/HAIL TROPICAL STORM WINTER WEATHER
## 441 416 407
## RIP CURRENT WILD/FOREST FIRE FLASH FLOODING
## 400 388 302
## FLOOD/FLASH FLOOD AVALANCHE DROUGHT
## 279 268 266
## BLIZZARD RIP CURRENTS HEAT
## 253 241 215
I grouped several similar types together. For example, TSTM WIND, THUNDERSTORM WIND, HIGH WIND, THUNDERSTORM WINDS and other wind events were grouped together as “Wind”.
storms_1993$type <- "other"
storms_1993$type[grep("WIND", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Wind"
storms_1993$type[grep("FLOOD", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Flood"
storms_1993$type[grep("HURRICANE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Tropical Cyclone"
storms_1993$type[grep("SNOW", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storms_1993$type[grep("HEAT", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Heat"
storms_1993$type[grep("HAIL", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Hail"
storms_1993$type[grep("TORNADO", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Tornado"
storms_1993$type[grep("RAIN", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Rain"
storms_1993$type[grep("WIND", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Wind"
storms_1993$type[grep("TROPICAL", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Tropical Cyclone"
storms_1993$type[grep("WINTER", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storms_1993$type[grep("FIRE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Wildfire"
storms_1993$type[grep("LIGHTNING", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Lightning"
storms_1993$type[grep("AVALANCHE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Avalanche"
storms_1993$type[grep("ICE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storms_1993$type[grep("BLIZZARD", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
sort(table(storms_1993$type), decreasing = TRUE)[1:12]
##
## Wind Flood Hail Tornado
## 128583 32445 26083 13971
## Lightning Winter Storm other Wildfire
## 13310 4939 3504 1258
## Rain Heat Tropical Cyclone Avalanche
## 1237 980 679 268
Finally, I converted PROPDMGEXP and CROPDMGEXP to units of dollars using the metadata in National Weather Service Storm Data Documentation.
sort(table(storms_1993$PROPDMGEXP), decreasing = TRUE)
##
## K M 0 B 5 m H + 4 6
## 208203 10207 8547 210 40 18 7 6 5 4 3
## 7 - 2 3 h
## 3 1 1 1 1
sort(table(storms_1993$CROPDMGEXP), decreasing = TRUE)
##
## K M k 0 B ? m
## 125288 99932 1985 21 17 7 6 1
Conversions are as follow: * K or k = 1,000 dollars * M or m = 1,000,000 dollars * B or b = 1,000,000,000 dollars * Anything else is 1 dollar
storms_1993$PROPDMGEXP[is.na(storms_1993$PROPDMGEXP)] <- 1
storms_1993$PROPDMGEXP[!grepl("K|M|B", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1
storms_1993$PROPDMGEXP[grep("K", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1000
storms_1993$PROPDMGEXP[grep("M", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1000000
storms_1993$PROPDMGEXP[grep("B", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1000000000
storms_1993$PROPDMGEXP <- as.numeric(storms_1993$PROPDMGEXP)
storms_1993$property <- storms_1993$PROPDMG * storms_1993$PROPDMGEXP
storms_1993$CROPDMGEXP[is.na(storms_1993$PROPDMGEXP)] <- 1
storms_1993$CROPDMGEXP[!grepl("K|M|B", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1
storms_1993$CROPDMGEXP[grep("K", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1000
storms_1993$CROPDMGEXP[grep("M", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1000000
storms_1993$CROPDMGEXP[grep("B", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1000000000
storms_1993$CROPDMGEXP <- as.numeric(storms_1993$CROPDMGEXP)
storms_1993$crop <- storms_1993$CROPDMG * storms_1993$CROPDMGEXP
storms_1993$casualties <- storms_1993$INJURIES + storms_1993$FATALITIES
Casualties_per_type <- storms_1993 %>% group_by(type) %>% summarize(sum = sum(casualties))
ggplot(Casualties_per_type, aes(x = type, y = sum)) +
geom_bar(stat = "identity", fill = "red") +
coord_flip() +
labs(x = "Storm Event",
y = "Casualties")
Figure 2. Total deaths and injuries caused by different types of storms 1993 - 2011.
The data show that tornadoes (1910 casualties) have caused over twice as many injuries and fatalities between 1993 and 2011 as the next highest category (heatwaves at 1.2362^{4})
storms_1993$economic <- storms_1993$property + storms_1993$crop
Economic_damage <- storms_1993 %>% group_by(type) %>% summarize(sum = sum(economic))
ggplot(Economic_damage, aes(x = type, y = sum)) +
geom_bar(stat = "identity", fill = "green") +
coord_flip() +
labs(x = "Storm type",
y = "Damages (dollars)")
Figure 3. Total property and crop losses in dollars of different storm types 1993 - 2011.
The data shows that flood damage, at 1.7975883^{11}, caused more economic losses than tropical cyclones (9.8572496^{10}) between 1993 and 2001.