This document is the final assessment for the Reproducible Research course offered by Johns Hopkins University through Coursera.

The goal of this assessment is to process NOAA storm data to explore two things for different types of storms: the risks they pose to human life and health, and their economic consequences.

A setup chunk, not shown in this output, sets default options, loads the libraries used below (dplyr for the data summaries and ggplot2 for the plots), and resets one default parameter in RStudio.

Data Processing

Data Acquisition

Storm data collected by the National Weather Service, beginning in 1950 and ending in November 2011, were obtained from the following site:

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Documentation describing the data is available from:

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf)

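  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  destfile <- "repdata_data_StormData.csv.bz2"
  # download only once; mode = "wb" keeps the compressed file intact on Windows
  if (!file.exists(destfile)) download.file(url, destfile, mode = "wb")
  # read.csv() reads the bzip2-compressed file directly
  weather <- read.csv(destfile)
  summary(weather)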
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES           INJURIES        
##  Min.   :0.00     Min.   :    0.0   Min.   :  0.00000   Min.   :   0.0000  
##  1st Qu.:0.00     1st Qu.:    0.0   1st Qu.:  0.00000   1st Qu.:   0.0000  
##  Median :1.00     Median :   50.0   Median :  0.00000   Median :   0.0000  
##  Mean   :0.91     Mean   :   46.9   Mean   :  0.01678   Mean   :   0.1557  
##  3rd Qu.:1.00     3rd Qu.:   75.0   3rd Qu.:  0.00000   3rd Qu.:   0.0000  
##  Max.   :5.00     Max.   :22000.0   Max.   :583.00000   Max.   :1700.0000  
##  NA's   :843563                                                            
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
   head(weather)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
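  # parse BGN_DATE (stored as text, e.g. "4/18/1950 0:00:00") so min/max compare dates, not strings
  bgn <- as.Date(weather$BGN_DATE, format = "%m/%d/%Y")
  start <- min(bgn, na.rm = TRUE)
  end <- max(bgn, na.rm = TRUE)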

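The records span 1950 through November 2011, as described by the data source. BGN_DATE is stored as text, so it must be parsed (for example with as.Date()) before taking min() and max(); comparing the raw strings sorts them character by character and yields the misleading range 1/1/1966 to 9/9/2011.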
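The date pitfall can be reproduced on a few hand-written strings in the same "month/day/year 0:00:00" layout as BGN_DATE. The values below are made up for illustration and are not taken from the storm data.

```r
# Made-up BGN_DATE-style strings (not taken from the storm data)
toy_dates <- c("4/18/1950 0:00:00", "1/1/1966 0:00:00",
               "11/30/2011 0:00:00", "9/9/2011 0:00:00")

# Comparing raw strings sorts them character by character, so
# "1/1/1966" sorts before "4/18/1950" and "9/9/2011" after "11/30/2011"
string_range <- c(min(toy_dates), max(toy_dates))

# Parsing first gives a true chronological range; as.Date() ignores the
# trailing " 0:00:00" because it is not part of the format
parsed <- as.Date(toy_dates, format = "%m/%d/%Y")
date_range <- range(parsed)

print(string_range)
print(date_range)
```

Applied to weather$BGN_DATE, the parsed version should recover the 1950 to November 2011 span described by the data source.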

Because the types and amount of data collected changed across the sampling period, this analysis does not compare individual years; instead, all years of data are pooled into a single summary.
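To check how the volume of records changes across the sampling period, one option is to count records per year from the parsed begin dates. The helper records_per_year() below is hypothetical (it is not part of the original analysis) and is demonstrated on made-up dates rather than the storm data.

```r
# Hypothetical helper: count records per calendar year from
# BGN_DATE-style strings such as "4/18/1950 0:00:00"
records_per_year <- function(bgn_date) {
  parsed <- as.Date(bgn_date, format = "%m/%d/%Y")
  table(format(parsed, "%Y"))
}

# Made-up example dates (not from the storm data)
example_dates <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
                   "1/5/2010 0:00:00", "3/3/2010 0:00:00",
                   "11/30/2011 0:00:00")
counts <- records_per_year(example_dates)
print(counts)
```

Calling records_per_year(weather$BGN_DATE) would give the yearly counts for the full dataset, and passing the result to barplot() makes any change in recording volume visible.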

Data Analysis

Explore human health consequences for different event types.

Examine the EVTYPE (event type) variable to determine which event types present the greatest hazard to human health.

Note that hundreds of event types are documented, many of which have no associated deaths or injuries. Here, we total the fatalities for each event type and list the 20 event types with the most deaths:

  # calculate total fatalities for each EVTYPE and keep the 20 highest
  sumfatality <- weather %>%
    group_by(EVTYPE) %>%
    summarize(sumdeaths = sum(FATALITIES, na.rm = TRUE)) %>%
    slice_max(order_by = sumdeaths, n = 20)

  print(sumfatality)
## # A tibble: 20 × 2
##    EVTYPE                  sumdeaths
##    <chr>                       <dbl>
##  1 TORNADO                      5633
##  2 EXCESSIVE HEAT               1903
##  3 FLASH FLOOD                   978
##  4 HEAT                          937
##  5 LIGHTNING                     816
##  6 TSTM WIND                     504
##  7 FLOOD                         470
##  8 RIP CURRENT                   368
##  9 HIGH WIND                     248
## 10 AVALANCHE                     224
## 11 WINTER STORM                  206
## 12 RIP CURRENTS                  204
## 13 HEAT WAVE                     172
## 14 EXTREME COLD                  160
## 15 THUNDERSTORM WIND             133
## 16 HEAVY SNOW                    127
## 17 EXTREME COLD/WIND CHILL       125
## 18 STRONG WIND                   103
## 19 BLIZZARD                      101
## 20 HIGH SURF                     101
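The table above lists closely related labels separately (RIP CURRENT and RIP CURRENTS, TSTM WIND and THUNDERSTORM WIND), so the total for a single phenomenon is split across rows. A minimal sketch of merging such labels before summing follows; normalize_evtype() is a hypothetical helper whose rules cover only the few variants named here, not the full list of event types, and the records are made up.

```r
# Hypothetical helper: merge a few EVTYPE spelling variants.
# Assumption: only case, surrounding whitespace, the "TSTM" abbreviation,
# and the two plurals handled below need to be unified.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM ", "THUNDERSTORM ", x)
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

# Made-up records (not from the storm data)
events <- data.frame(
  EVTYPE = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND",
             "Thunderstorm Winds", " HEAT "),
  FATALITIES = c(3, 2, 1, 4, 6)
)

# Total fatalities per cleaned label, largest first
cleaned <- normalize_evtype(events$EVTYPE)
totals <- sort(sapply(split(events$FATALITIES, cleaned), sum),
               decreasing = TRUE)
print(totals)
```

In the dplyr pipeline above, the same helper could be applied with mutate(EVTYPE = normalize_evtype(EVTYPE)) before group_by(); because the rules are incomplete, merged totals for the full dataset would still need checking.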

Here, we explore the mean number of injuries per recorded event for each event type:

  # calculate mean injuries per recorded event for each EVTYPE
  # (note: sumhurt holds a per-event mean, not a total)
  suminjury <- weather %>%
    group_by(EVTYPE) %>%
    summarize(sumhurt = mean(INJURIES, na.rm = TRUE)) %>%
    arrange(desc(sumhurt)) %>%
    slice(1:20)

  print(suminjury)
## # A tibble: 20 × 2
##    EVTYPE                  sumhurt
##    <chr>                     <dbl>
##  1 Heat Wave                 70   
##  2 TROPICAL STORM GORDON     43   
##  3 WILD FIRES                37.5 
##  4 THUNDERSTORMW             27   
##  5 HIGH WIND AND SEAS        20   
##  6 SNOW/HIGH WINDS           18   
##  7 GLAZE/ICE STORM           15   
##  8 HEAT WAVE DROUGHT         15   
##  9 WINTER STORM HIGH WINDS   15   
## 10 HURRICANE/TYPHOON         14.5 
## 11 WINTER WEATHER MIX        11.3 
## 12 EXTREME HEAT               7.05
## 13 NON-SEVERE WIND DAMAGE     7   
## 14 GLAZE                      6.75
## 15 TSUNAMI                    6.45
## 16 WINTER STORMS              5.67
## 17 TORNADO F2                 5.33
## 18 EXCESSIVE RAINFALL         5.25
## 19 WATERSPOUT/TORNADO         5.25
## 20 HEAT WAVE                  4.18
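The injury table ranks event types by the mean number of injuries per recorded event, which answers a different question from total injuries. The toy example below, with made-up values, shows how the two summaries can disagree: a type with many moderate events leads on the total, while a type with a single large event leads on the mean.

```r
# Made-up records (not from the storm data)
injuries <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "HEAT WAVE"),
  INJURIES = c(30, 0, 25, 5, 40)
)

by_type <- split(injuries$INJURIES, injuries$EVTYPE)
total_injuries <- sapply(by_type, sum)    # TORNADO 60, HEAT WAVE 40
mean_injuries  <- sapply(by_type, mean)   # TORNADO 15, HEAT WAVE 40

# The event type that ranks first differs between the two summaries
print(names(which.max(total_injuries)))
print(names(which.max(mean_injuries)))
```

In the dplyr code above, replacing mean(INJURIES, na.rm = TRUE) with sum(INJURIES, na.rm = TRUE) would rank event types by total injuries instead, matching how fatalities are summarized.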

Explore the economic consequences of different storm types.

Examine the EVTYPE (event type) variable to determine which event types present the greatest economic hazard.

Here, I add the crop-damage value to the property-damage value for each event, then list the 20 event types with the largest combined totals.

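Note that PROPDMG and CROPDMG are not in a single unit: each value is scaled by its exponent code in PROPDMGEXP or CROPDMGEXP ("K" for thousands, "M" for millions, "B" for billions of dollars). The sums below add the raw values without applying these multipliers, so they are not dollar amounts and may not rank event types correctly by actual loss.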
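PROPDMG and CROPDMG are each paired with an exponent code (PROPDMGEXP, CROPDMGEXP) that gives their magnitude, so a raw value becomes a dollar amount only after scaling. A minimal sketch of that conversion follows. It assumes that only the letter codes K, M, and B (in either case) carry a multiplier and treats every other code, including blanks, as a multiplier of 1; the raw data also contain other codes, which would need their own decision. exp_to_multiplier() is a hypothetical helper, and the records are made up.

```r
# Hypothetical helper: map exponent codes to dollar multipliers.
# Assumption: only K, M, and B (any case) are meaningful; anything else,
# including blank codes, is treated as a multiplier of 1.
exp_to_multiplier <- function(code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(multipliers[toupper(trimws(code))])
  out[is.na(out)] <- 1
  out
}

# Made-up records (not from the storm data)
damage <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  PROPDMG    = c(250, 50, 1.5),
  PROPDMGEXP = c("K", "k", "B"),
  CROPDMG    = c(0, 0, 2),
  CROPDMGEXP = c("", "", "M")
)

# Raw sums (no multipliers) versus sums converted to dollars
damage$raw     <- damage$PROPDMG + damage$CROPDMG
damage$dollars <- damage$PROPDMG * exp_to_multiplier(damage$PROPDMGEXP) +
                  damage$CROPDMG * exp_to_multiplier(damage$CROPDMGEXP)

raw_total    <- sapply(split(damage$raw, damage$EVTYPE), sum)      # TORNADO 300, FLOOD 3.5
dollar_total <- sapply(split(damage$dollars, damage$EVTYPE), sum)  # TORNADO 3e5, FLOOD 1.502e9
print(raw_total)
print(dollar_total)
```

With the real data, the same conversion could be done inside the existing pipeline, for example mutate(costs = PROPDMG * exp_to_multiplier(PROPDMGEXP) + CROPDMG * exp_to_multiplier(CROPDMGEXP)), once a rule for the remaining codes has been chosen.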

  # calculate total (unscaled) damage for each EVTYPE; the PROPDMGEXP and
  # CROPDMGEXP multipliers are NOT applied, so these are not dollar totals
  sumcosts <- weather %>%
    mutate(costs = PROPDMG + CROPDMG) %>%
    group_by(EVTYPE) %>%
    summarize(sumcost = sum(costs, na.rm = TRUE)) %>%
    slice_max(order_by = sumcost, n = 20)
  print(sumcosts)
## # A tibble: 20 × 2
##    EVTYPE              sumcost
##    <chr>                 <dbl>
##  1 TORNADO            3312277.
##  2 FLASH FLOOD        1599325.
##  3 TSTM WIND          1445168.
##  4 HAIL               1268290.
##  5 FLOOD              1067976.
##  6 THUNDERSTORM WIND   943636.
##  7 LIGHTNING           606932.
##  8 THUNDERSTORM WINDS  464978.
##  9 HIGH WIND           342015.
## 10 WINTER STORM        134700.
## 11 HEAVY SNOW          124418.
## 12 WILDFIRE             88824.
## 13 ICE STORM            67690.
## 14 STRONG WIND          64611.
## 15 HEAVY RAIN           61965.
## 16 HIGH WINDS           57385.
## 17 TROPICAL STORM       54323.
## 18 WILD/FOREST FIRE     43534.
## 19 DROUGHT              37998.
## 20 FLASH FLOODING       33623.

Results

Graphically demonstrate the effects of weather events on human health over the data collection period, 1950 through November 2011.

  ggplot(sumfatality, aes(x = reorder(EVTYPE, -sumdeaths),y = sumdeaths)) +
    geom_bar(stat="identity",fill = "#f68060", width = 0.5) +
    coord_flip() +
    ylab("Total Deaths per Event Type") +
    xlab("Event Type")

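Here, we can see that tornadoes are, by far, the deadliest event type: they account for 5,633 deaths over the data period, nearly three times the 1,903 attributed to the next event type, EXCESSIVE HEAT. Even combining the heat-related labels in the table (EXCESSIVE HEAT, HEAT, and HEAT WAVE, 3,012 deaths in total) leaves tornadoes well ahead.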

  ggplot(suminjury, aes(x = reorder(EVTYPE, - sumhurt), y = sumhurt)) +
    geom_bar(stat="identity",fill = "#60d6f6", width = 0.4) +
    coord_flip() +
    ylab("Mean Injuries per Recorded Event")  +
    xlab("Event Type")

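Ranked by mean injuries per recorded event, heat-related event types (Heat Wave, HEAT WAVE DROUGHT, EXTREME HEAT, HEAT WAVE) feature prominently, while the TORNADO label itself does not appear in the top 20. Because this is a per-event average, labels recorded only a handful of times can rank highly, so this chart does not show which event types cause the most injuries in total.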

  ggplot(sumcosts, aes(x = reorder(EVTYPE, -sumcost), y = sumcost)) +
    geom_bar(stat="identity",fill = "forestgreen", width = 0.5) +
    coord_flip() +
    ylab("Total Damage per Event Type (raw values, exponents not applied)") +
    xlab("Event Type")

Finally, we consider the economic costs of different types of weather events. Property damage and crop damage were summed because both are significant economic indicators.

The combined raw damage values are largest for tornadoes and flash floods. Because the PROPDMGEXP and CROPDMGEXP multipliers were not applied, however, these sums cannot be read as dollar amounts, and the ranking may change once each value is scaled to dollars.

To limit economic and human consequences from extreme weather, investment in stronger predictive forecasting should be a priority.