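Synopsis: This report uses catastrophic weather data from the National Weather Service to identify the types of catastrophic weather that are most hazardous to population health and most economically damaging. The code and accompanying narrative below load the required packages, retrieve and format the data, apply the damage multipliers, aggregate by event type, and report the top ten event types by casualties and by economic cost.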

Data Processing

Packages: The following code detects, installs, and loads required packages for analysis:

if(!require(pacman)){install.packages("pacman")}; library(pacman)
## Loading required package: pacman
## Warning: package 'pacman' was built under R version 3.5.1
packages <- c("readr", "dplyr", "scales", "stringr", "ggplot2", "lubridate")
p_load(packages, character.only = TRUE); rm(packages) 

Data Retrieval & Reading: The directory, storm_data, is created if undetected, and the weather data are downloaded and read into R:

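url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
path <- "./storm_data"
file <- file.path(path, "StormData.csv.bz2")

# Create the directory and download the archive only when they are missing
if(!dir.exists(path)){dir.create(path)}
if(!file.exists(file)){download.file(url, file, mode = "wb")}
# read.csv() reads the bzip2-compressed file directly
storms <- read.csv(file, stringsAsFactors = FALSE)
rm(url, path, file)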
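
The archive does not need to be decompressed by hand, because base R file connections detect bzip2 compression when reading. A minimal sketch on a throwaway file, where the data frame toy and the temporary path are illustrative and not part of the storm data:

```r
# Toy data frame standing in for the storm data (illustrative only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects the compression
back <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)
```

The same behavior is what allows the report to pass the downloaded .bz2 file straight to read.csv().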

Formatting: Variable names are converted to lowercase, and bgn_date, which timestamps the beginning of each weather event, is reduced to the text before its first space and parsed as a date. Lastly, the dataset is trimmed to the key variables of interest:

names(storms) <- tolower(names(storms))
storms <- storms %>%
    mutate(bgn_date = str_split(bgn_date, " ", simplify = TRUE)[, 1],
           bgn_date = mdy(bgn_date)) %>%
    select(bgn_date, countyname:evtype, fatalities:cropdmgexp)
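
To make the date handling concrete, here is a small sketch of the same two steps on hand-written strings. The sample values are an assumption about what the raw bgn_date entries look like (a month/day/year date followed by a clock time); they are not taken from the dataset:

```r
library(stringr)
library(lubridate)

# Hypothetical raw timestamps: month/day/year followed by a clock time
raw <- c("4/18/1950 0:00:00", "11/15/2011 0:00:00")

# Keep the text before the first space, then parse it as month-day-year
date_part <- str_split(raw, " ", simplify = TRUE)[, 1]
parsed <- mdy(date_part)
```

Here parsed is a Date vector, which sorts and filters chronologically, unlike the original character strings.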

Filtering: Variables propdmg and cropdmg record the economic impact, in USD, of property damage and crop damage, respectively, while propdmgexp and cropdmgexp hold the multiplier codes that scale them. The unique values and total occurrences of the two multiplier variables are:

table(storms$propdmgexp)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(storms$cropdmgexp)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

Because many of the unique values seen in propdmgexp and cropdmgexp have no documented meaning, filtering keeps only the values mentioned specifically in the dataset documentation: H, K, M, and B, indicating hundreds, thousands, millions, and billions of USD, respectively. The two variables are handled differently. For propdmgexp, only those four uppercase values are kept, which drops the 465,934 blank entries along with the lowercase h and m entries. For cropdmgexp, values are first capitalized for uniformity, and blank entries (618,413 before any filtering) are preserved alongside the documented values, since removing them may discard valid entries for other variables of interest:

prop_exp <- c("H", "K", "M", "B")
storms <- storms %>% filter(propdmgexp %in% prop_exp)

storms$cropdmgexp <- toupper(storms$cropdmgexp)
crop_exp <- c("", "H", "B", "K", "M")
storms <- storms %>% filter(cropdmgexp %in% crop_exp)
rm(crop_exp, prop_exp)
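
Matching with %in% is case-sensitive, so whether toupper() runs before the filter changes which rows survive. A minimal sketch on made-up codes, where the tibble toy and its values are illustrative only:

```r
library(dplyr)

# Toy exponent codes, including a lowercase and a blank entry
toy <- tibble(dmg = c(5, 2, 7, 1, 3),
              exp = c("K", "m", "", "M", "?"))

keep <- c("H", "K", "M", "B")

# Strict matching: "m", "" and "?" are all dropped
strict <- toy %>% filter(exp %in% keep)

# Capitalizing first and allowing blanks keeps "m" (as "M") and ""
lenient <- toy %>%
    mutate(exp = toupper(exp)) %>%
    filter(exp %in% c("", keep))
```

The strict variant mirrors how propdmgexp is filtered above, and the lenient variant mirrors cropdmgexp.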

Post-filtering, observe the following counts for preserved values among variables propdmgexp and cropdmgexp:

table(storms$propdmgexp)
## 
##      B      H      K      M 
##     40      6 424645  11329
table(storms$cropdmgexp)
## 
##             B      K      M 
## 156484      5 277982   1549

Transformations: Though minor transformations occur later in Processing, the most notable and time-consuming one detects the multiplier codes in propdmgexp and cropdmgexp and scales the corresponding cost variables, propdmg and cropdmg, so that they reflect weather event costs in USD. Rows with a blank cropdmgexp are left unscaled. The now-obsolete multiplier variables are then removed from the dataset and scientific notation is disabled:

# Scale property damage by its multiplier code
for(i in seq_len(nrow(storms))){
    if(storms$propdmgexp[i] == "H"){
        storms$propdmg[i] <- storms$propdmg[i] * 100
    } else if (storms$propdmgexp[i] == "K"){
        storms$propdmg[i] <- storms$propdmg[i] * 1000
    } else if (storms$propdmgexp[i] == "M"){
        storms$propdmg[i] <- storms$propdmg[i] * 1000000
    } else if (storms$propdmgexp[i] == "B"){
        storms$propdmg[i] <- storms$propdmg[i] * 1000000000
    }
    # Any other code leaves propdmg unchanged
}

# Scale crop damage by its multiplier code
for(i in seq_len(nrow(storms))){
    if(storms$cropdmgexp[i] == "H"){
        storms$cropdmg[i] <- storms$cropdmg[i] * 100
    } else if (storms$cropdmgexp[i] == "K"){
        storms$cropdmg[i] <- storms$cropdmg[i] * 1000
    } else if (storms$cropdmgexp[i] == "M"){
        storms$cropdmg[i] <- storms$cropdmg[i] * 1000000
    } else if (storms$cropdmgexp[i] == "B"){
        storms$cropdmg[i] <- storms$cropdmg[i] * 1000000000
    }
    # A blank code means no multiplier, so cropdmg is left unchanged
}

storms <- storms %>% select(-propdmgexp, -cropdmgexp)
options(scipen = 999); rm(i)

The results are cached to speed up knitting.
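
A row-by-row loop is slow on a dataset of this size. One possible alternative, offered as a sketch and not as the method used in this report, is a named lookup vector applied to the whole column at once. The helper scale_damage and the toy inputs are hypothetical:

```r
# Multipliers for the documented exponent codes
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale damage values by their exponent code
scale_damage <- function(dmg, exp) {
    mult <- unname(multiplier[exp])  # NA for blank or unlisted codes
    mult[is.na(mult)] <- 1           # treat those as "no multiplier"
    dmg * mult
}

# Toy values (illustrative only)
scaled <- scale_damage(c(25, 2.5, 1, 10, 4), c("K", "M", "B", "", "H"))
```

Applied to the storm data, the same helper could be called once per damage column, for example inside mutate(), which would avoid per-row assignment entirely.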

Aggregation, Arrangement, and Additional Transformations: The following creates a subset of the storm data, health, aggregated by event type. Variable total is the sum of fatalities and injuries, that is, total casualties, and per_cap is the same sum divided by 1,000. Despite its name, per_cap measures casualties in thousands, not a per capita rate. The aggregate data are arranged in descending order, only the top ten event types are kept, and evtype is converted to a factor whose levels follow that order for ease of visualization:

health <- storms %>%
    group_by(evtype) %>%
    summarize(per_cap = sum(fatalities, injuries)/1000,
              total = sum(fatalities, injuries)) %>%
    arrange(desc(per_cap)) %>%
    head(10)

health$evtype <- factor(health$evtype, 
                        levels = health$evtype[order(health$per_cap, 
                                                     decreasing = T)])
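
The aggregation pattern above can be checked on a handful of made-up rows. In this sketch the tibble toy and the column name thousands are illustrative assumptions, chosen only to show how sum() over two columns and the factor levels behave:

```r
library(dplyr)

# Toy event records (illustrative only)
toy <- tibble(evtype     = c("FLOOD", "TORNADO", "FLOOD", "TORNADO", "HAIL"),
              fatalities = c(1, 10, 2, 5, 0),
              injuries   = c(4, 100, 3, 50, 1))

toy_health <- toy %>%
    group_by(evtype) %>%
    # sum() over both columns adds every fatality and injury in the group
    summarize(total = sum(fatalities, injuries),
              thousands = total / 1000) %>%
    arrange(desc(total))

# Fix the factor levels in sorted order so plots keep the ranking
toy_health$evtype <- factor(toy_health$evtype, levels = toy_health$evtype)
```

Without the factor step, ggplot2 would order the x-axis alphabetically and not by casualty count.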

The last Processing step prepares another subset of the storm data, economy, again aggregated by event type. Variable total is the sum of propdmg and cropdmg in USD, and billions is the same sum expressed in billions of USD. As before, the aggregate data are arranged in descending order, only the top ten event types are kept, and evtype is converted to a factor whose levels follow that order for ease of visualization:

economy <- storms %>%
    group_by(evtype) %>%
    summarize(billions = sum(propdmg, cropdmg)/1000000000,
              total = sum(propdmg, cropdmg)) %>%
    arrange(desc(billions)) %>%
    head(10)

economy$evtype <- factor(economy$evtype, 
                         levels = economy$evtype[order(economy$billions, 
                                                       decreasing = T)])

Results

Effects on Population Health: Per the above processing, the top ten most catastrophic weather events in terms of public health, measured in casualties (variable total, the sum of fatalities and injuries), are as follows:

print(health)
## # A tibble: 10 x 3
##    evtype            per_cap total
##    <fct>               <dbl> <dbl>
##  1 TORNADO             96.0  95981
##  2 FLOOD                7.15  7151
##  3 TSTM WIND            2.91  2910
##  4 FLASH FLOOD          2.27  2271
##  5 ICE STORM            1.89  1891
##  6 LIGHTNING            1.82  1817
##  7 THUNDERSTORM WIND    1.60  1599
##  8 HEAT                 1.38  1384
##  9 HURRICANE/TYPHOON    1.34  1337
## 10 EXCESSIVE HEAT       1.14  1137

Visualized, Tornado events dominate casualties compared with the runners-up:

ggplot(health, aes(x = evtype, y = per_cap)) +
    geom_col(fill = "tomato") +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous(labels = comma) +
    ylab("Casualties (In Thousands)") +
    xlab("Event Type") +
    ggtitle("Total casualties by event type",
            "Including injuries and fatalities")

Figure 1: “Total casualties by event type” shows the overwhelmingly devastating effect of “Tornado” events on public health, at roughly 96,000 casualties. See Figure 2 for a more informative look at the remaining weather events.

For a more informative view of the remaining catastrophic weather events, the following is the same visualization as the one above, but with the y-axis truncated at 10,000 casualties:

ggplot(health, aes(x = evtype, y = per_cap)) +
    geom_col(fill = "tomato") +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous(labels = comma) +
    ylab("Casualties (In Thousands)") +
    xlab("Event Type") +
    ggtitle("Total casualties by event type, truncated",
            "Truncated at 10,000 casualties") +
    coord_cartesian(ylim = c(0, 10))

Figure 2: “Total casualties by event type, truncated” shows the remaining contenders for catastrophic weather events in terms of public health impacts, with the y-axis now truncated at 10,000 casualties. In second place, “Flood” events cause markedly more casualties than the events that follow: roughly 2.5 times those of “TSTM Wind” and more than triple those of “Flash Flood” and “Lightning”.
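
The truncation relies on coord_cartesian(), which zooms the view without discarding data. Setting limits on the y scale instead would replace out-of-range values with NA, so the tallest bar would vanish and not merely be clipped. A minimal sketch of the difference, using made-up values and not the report's data:

```r
library(ggplot2)

# Toy totals in thousands (illustrative values only)
toy <- data.frame(evtype = c("A", "B", "C"), thousands = c(96, 7, 3))

base <- ggplot(toy, aes(x = evtype, y = thousands)) + geom_col()

# Zoom: every bar is still drawn, the tall one is clipped at the top
zoomed <- base + coord_cartesian(ylim = c(0, 10))

# Scale limits: values outside 0-10 become NA, so the tall bar is dropped
censored <- base + scale_y_continuous(limits = c(0, 10))
```

Printing zoomed and censored side by side shows bar A filling the panel in the first plot and missing from the second.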

Economic Consequences: Lastly, per the above processing, the following are the top ten most catastrophic weather events in terms of economic consequences, measured in USD by summing the multiplier-modified values of propdmg and cropdmg. Variable billions measures the total cost of each event type in billions of USD, while variable total measures the same cost in USD:

print(economy)
## # A tibble: 10 x 3
##    evtype            billions        total
##    <fct>                <dbl>        <dbl>
##  1 FLOOD               150.   149828665250
##  2 HURRICANE/TYPHOON    71.9   71913712800
##  3 TORNADO              57.3   57278861940
##  4 STORM SURGE          43.3   43323541000
##  5 HAIL                 17.8   17756170123
##  6 FLASH FLOOD          17.5   17528250560
##  7 HURRICANE            14.6   14557229010
##  8 RIVER FLOOD          10.1   10147679500
##  9 ICE STORM             8.97   8967037810
## 10 TROPICAL STORM        8.16   8155601550

These costs are illustrated in the following visualization. Note that the y-axis is measured in billions of USD:

ggplot(economy, aes(x = evtype, y = billions)) +
    geom_col(fill = "limegreen") +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    ylab("USD Costs (In Billions)") +
    xlab("Event Type") +
    ggtitle("Total economic damage by event type",
            "Including property and crop damage")

Figure 3: “Total economic damage by event type” shows “Flood” events dominating economic consequences among catastrophic weather events at nearly $150 B. “Hurricane/Typhoon” events rank second at $72 B, less than half the cost of “Flood” events. “Tornado” and “Storm Surge” events follow at $57 B and $43 B, while the remaining events, though not insignificant, pale in comparison to the leading four.