Synopsis

Severe storms cause both public health and economic damage. This analysis uses the storm database from the National Oceanic and Atmospheric Administration to answer two questions. First, what type of storm event causes the greatest health damage? Second, what type of storm event causes the greatest economic damage.

Data Processing

I downloaded the source data file then imported it into R.

if (!file.exists("StormData.csv.bz2")) {
        fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        download.file(fileUrl, destfile = "StormData.csv.bz2", method = "curl")
}
storm <- read.csv("StormData.csv.bz2", header = T)
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The dataset consists of 902297 observations in 37 columns. I reformatted several columns, including the BGN_DATE into a date format, and STATE, COUNTYNAME, and ENVTYPE into factors to ensure correct treatment of those variables in statistical analysis and graphics. I then extracted the year variable, selected for events with known damages and fatalities, and created a bar plot of the storm events causing health and/or property damage per year.

storm$BGN_DATE <- as.Date(storm$BGN_DATE, format = "%m/%d/%Y")
storm$STATE <- as.factor(storm$STATE)
storm$COUNTYNAME <- as.factor(storm$COUNTYNAME)
storm$EVTYPE <- as.factor(storm$EVTYPE)
storm$year <- year(storm$BGN_DATE)
storm_sub <- subset(storm, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
storms_per_year <- storm_sub %>% count(year)
ggplot(data = storms_per_year, aes(x = year, y = n)) +
        geom_bar(stat = "identity", fill = "blue") +
        labs(x = "Year",
             y = "Number of known storms")
Figure 1. Number of known storms causing casualties and/or property damage in the USA 1950 - 2011. The number of storms jumped in 1993 and again in 1994, likely due to better reporting of such storms.

Figure 1. Number of known storms causing casualties and/or property damage in the USA 1950 - 2011. The number of storms jumped in 1993 and again in 1994, likely due to better reporting of such storms.

storms_1993 <- subset(storm_sub, year >= 1993)

Given the increase in known storms in 1993 due to better reporting, I limited my analysis to storms since 1993 when doing comparisons between types of storms in terms of casualties and property damage. This reduced the number of observations from 902297 in the original data set to 227257.

I then checked for NAs in the data set.

sum(is.na(storms_1993$FATALITIES))
## [1] 0
sum(is.na(storms_1993$INJURIES))
## [1] 0
sum(is.na(storms_1993$PROPDMG))
## [1] 0
sum(is.na(storms_1993$CROPDMG))
## [1] 0
sum(is.na(storms_1993$PROPDMGEXP))
## [1] 0
sum(is.na(storms_1993$CROPDMGEXP))
## [1] 0

Finally, there are 985 event types listed in the data set.

sort(table(storms_1993$EVTYPE), decreasing = TRUE)[1:30]
## 
##            TSTM WIND    THUNDERSTORM WIND                 HAIL 
##                61934                43655                26052 
##          FLASH FLOOD              TORNADO            LIGHTNING 
##                20967                13946                13293 
##   THUNDERSTORM WINDS                FLOOD            HIGH WIND 
##                12086                10175                 5522 
##          STRONG WIND         WINTER STORM           HEAVY SNOW 
##                 3370                 1508                 1342 
##           HEAVY RAIN             WILDFIRE            ICE STORM 
##                 1105                  857                  708 
## URBAN/SML STREAM FLD       EXCESSIVE HEAT           HIGH WINDS 
##                  702                  698                  657 
##       TSTM WIND/HAIL       TROPICAL STORM       WINTER WEATHER 
##                  441                  416                  407 
##          RIP CURRENT     WILD/FOREST FIRE       FLASH FLOODING 
##                  400                  388                  302 
##    FLOOD/FLASH FLOOD            AVALANCHE              DROUGHT 
##                  279                  268                  266 
##             BLIZZARD         RIP CURRENTS                 HEAT 
##                  253                  241                  215

I grouped several similar types together. For example, TSTM WIND, THUNDERSTORM WIND, HIGH WIND, THUNDERSTORM WINDS and other wind events were grouped together as “Wind”.

storms_1993$type <- "other"
storms_1993$type[grep("WIND", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Wind"
storms_1993$type[grep("FLOOD", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Flood"
storms_1993$type[grep("HURRICANE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Tropical Cyclone"
storms_1993$type[grep("SNOW", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storms_1993$type[grep("HEAT", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Heat"
storms_1993$type[grep("HAIL", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Hail"
storms_1993$type[grep("TORNADO", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Tornado"
storms_1993$type[grep("RAIN", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Rain"
storms_1993$type[grep("WIND", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Wind"
storms_1993$type[grep("TROPICAL", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Tropical Cyclone"
storms_1993$type[grep("WINTER", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storms_1993$type[grep("FIRE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Wildfire"
storms_1993$type[grep("LIGHTNING", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Lightning"
storms_1993$type[grep("AVALANCHE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Avalanche"
storms_1993$type[grep("ICE", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storms_1993$type[grep("BLIZZARD", storms_1993$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
sort(table(storms_1993$type), decreasing = TRUE)[1:12]
## 
##             Wind            Flood             Hail          Tornado 
##           128583            32445            26083            13971 
##        Lightning     Winter Storm            other         Wildfire 
##            13310             4939             3504             1258 
##             Rain             Heat Tropical Cyclone        Avalanche 
##             1237              980              679              268

Finally, I converted PROPDMGEXP and CROPDMGEXP to units of dollars using the metadata in National Weather Service Storm Data Documentation.

sort(table(storms_1993$PROPDMGEXP), decreasing = TRUE)
## 
##      K             M      0      B      5      m      H      +      4      6 
## 208203  10207   8547    210     40     18      7      6      5      4      3 
##      7      -      2      3      h 
##      3      1      1      1      1
sort(table(storms_1993$CROPDMGEXP), decreasing = TRUE)
## 
##             K      M      k      0      B      ?      m 
## 125288  99932   1985     21     17      7      6      1

Conversions are as follow: * K or k = 1,000 dollars * M or m = 1,000,000 dollars * B or b = 1,000,000,000 dollars * Anything else is 1 dollar

storms_1993$PROPDMGEXP[is.na(storms_1993$PROPDMGEXP)] <- 1
storms_1993$PROPDMGEXP[!grepl("K|M|B", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1
storms_1993$PROPDMGEXP[grep("K", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1000
storms_1993$PROPDMGEXP[grep("M", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1000000
storms_1993$PROPDMGEXP[grep("B", storms_1993$PROPDMGEXP, ignore.case = TRUE)] <- 1000000000
storms_1993$PROPDMGEXP <- as.numeric(storms_1993$PROPDMGEXP)
storms_1993$property <- storms_1993$PROPDMG * storms_1993$PROPDMGEXP

storms_1993$CROPDMGEXP[is.na(storms_1993$PROPDMGEXP)] <- 1
storms_1993$CROPDMGEXP[!grepl("K|M|B", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1
storms_1993$CROPDMGEXP[grep("K", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1000
storms_1993$CROPDMGEXP[grep("M", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1000000
storms_1993$CROPDMGEXP[grep("B", storms_1993$CROPDMGEXP, ignore.case = TRUE)] <- 1000000000
storms_1993$CROPDMGEXP <- as.numeric(storms_1993$CROPDMGEXP)
storms_1993$crop <- storms_1993$CROPDMG * storms_1993$CROPDMGEXP

Results

storms_1993$casualties <- storms_1993$INJURIES + storms_1993$FATALITIES
Casualties_per_type <- storms_1993 %>% group_by(type) %>% summarize(sum = sum(casualties))
ggplot(Casualties_per_type, aes(x = type, y = sum)) +
        geom_bar(stat = "identity", fill = "red") +
        coord_flip() +
        labs(x = "Storm Event",
             y = "Casualties")
Figure 2. Total deaths and injuries caused by different types of storms 1993 - 2011.

Figure 2. Total deaths and injuries caused by different types of storms 1993 - 2011.

The data show that tornadoes (1910 casualties) have caused over twice as many injuries and fatalities between 1993 and 2011 as the next highest category (heatwaves at 1.2362^{4})

storms_1993$economic <- storms_1993$property + storms_1993$crop
Economic_damage <- storms_1993 %>% group_by(type) %>% summarize(sum = sum(economic))
ggplot(Economic_damage, aes(x = type, y = sum)) +
        geom_bar(stat = "identity", fill = "green") +
        coord_flip() +
        labs(x = "Storm type",
             y = "Damages (dollars)")
Figure 3. Total property and crop losses in dollars of different storm types 1993 - 2011.

Figure 3. Total property and crop losses in dollars of different storm types 1993 - 2011.

The data shows that flood damage, at 1.7975883^{11}, caused more economic losses than tropical cyclones (9.8572496^{10}) between 1993 and 2001.