Impact of Weather Events on Public Health and Property between 1950 and 2011

Synopsis

In this report we aim to identify the weather events that have caused the most harm to public health and the greatest economic damage. To investigate the events, we obtained the data collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) in its storm database. NOAA has published the data for severe weather events from 1950 to November 2011.

Based on our analysis, we found that Wild Fires have had the highest impact on public health, with the highest mean number of fatalities and injuries per event. Storm Surge, on the other hand, has resulted in the highest mean economic damage to property and crops per event.

Data Processing

From the NOAA website we acquired the Storm Data along with the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ. After downloading the data file “repdata-data-StormData.csv.bz2” to the current working directory, we use read.csv() with a bzfile() connection to read the compressed file directly.

        raw_data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))  

Steps for analyzing data for population harm are summarized below.
- We subset on the Event type, Fatalities and Injuries from the raw data.
- Using mutate() from the dplyr package, we create a new variable (IMPACT_PUBLIC_HEALTH) that adds up the number of reported fatalities and injuries.
- We drop the records that have no fatalities or injuries reported for the event.
- Using aggregate(), we calculate the mean per event type.
- Lastly, we sort the data frame in descending order of population harm and subset the top 20 events.

        require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
        require(reshape2)
## Loading required package: reshape2
        NOAA_data1 <- raw_data[,c("EVTYPE","FATALITIES","INJURIES")]       
        # create a new variable to sum the public harm from fatalities and injuries
        NOAA_data1 <- mutate(NOAA_data1, IMPACT_PUBLIC_HEALTH = round(FATALITIES + INJURIES))
        # drop any event not giving any Fatalities or injuries data for this part of the analysis
        NOAA_data1 <- NOAA_data1[NOAA_data1$IMPACT_PUBLIC_HEALTH > 0,] 
        # Find the means per event
        NOAA_agg1 <- aggregate(IMPACT_PUBLIC_HEALTH ~ EVTYPE, data = NOAA_data1, FUN = mean, na.rm= TRUE)
        #Get top 20 events in decreasing order
        NOAA_plot1 <- head(NOAA_agg1[order((NOAA_agg1$IMPACT_PUBLIC_HEALTH), decreasing = TRUE),],n=20)
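
Because zero-harm records are dropped before aggregating, the mean computed above is a mean over events that reported at least one fatality or injury, not over all recorded events. The following minimal sketch illustrates the difference on a small invented data frame (`toy` and its values are assumptions made purely for illustration; they are not taken from the NOAA data).

```r
# Toy data standing in for raw_data; the values are invented for illustration
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "FLOOD", "HEAT", "HEAT"),
  FATALITIES = c(0, 2, 0, 1, 3),
  INJURIES   = c(0, 4, 0, 0, 6)
)
toy$IMPACT_PUBLIC_HEALTH <- toy$FATALITIES + toy$INJURIES

# Mean over all recorded events, including those with no reported harm
mean_all <- aggregate(IMPACT_PUBLIC_HEALTH ~ EVTYPE, data = toy, FUN = mean)

# Mean over events with at least one fatality or injury (the approach used above)
toy_harm  <- toy[toy$IMPACT_PUBLIC_HEALTH > 0, ]
mean_harm <- aggregate(IMPACT_PUBLIC_HEALTH ~ EVTYPE, data = toy_harm, FUN = mean)

print(mean_all)
print(mean_harm)
```

In this toy example FLOOD averages 2 over all events but 6 over harmful events, while HEAT averages 5 either way, so the ranking of the two event types depends on which records are kept.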

Steps for analyzing the property damage are similar. We create a separate data frame because the events causing public harm can differ from the events causing economic damage.
- We create a new data frame with event type, property damage, property damage exponent, crop damage and crop damage exponent.
- A special processing step is needed to calculate the dollar impact: we multiply each property and crop damage value by the multiplier encoded in its exponent (H, K, M or B), then divide by 1 billion to express the damage in billions of dollars.
- Using mutate() from the dplyr package, we create a new variable (PROP_VALUE) for the value of property damage and a second variable (CROP_VALUE) for the value of crop damage. We add PROP_VALUE and CROP_VALUE to find the total impact in a third variable (IMPACT_PROP_DAMAGE).
- We drop the rows that do not have any reported economic damage.
- Using aggregate(), we calculate the mean per event type.
- Lastly, we sort the aggregated data in descending order and subset the top 20 events for the results.

        # create a table for highest impact on property damage
        NOAA_data2 <- raw_data[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
        #Create a new variable and multiply the exponent with the property damage value
        NOAA_data2 <- mutate(NOAA_data2, PROP_VALUE = 
                                     ifelse(PROPDMGEXP %in% c("h","H"), PROPDMG*100,
                                        ifelse(PROPDMGEXP %in% c("k","K"), PROPDMG*1000,
                                               ifelse(PROPDMGEXP %in% c("m","M"), PROPDMG*1000000,
                                                      ifelse(PROPDMGEXP %in% c("b","B"), PROPDMG*1000000000,0)))))
        
        #Create a new variable and multiply the exponent with the crop damage value
        NOAA_data2 <- mutate(NOAA_data2, CROP_VALUE = 
                                     ifelse(CROPDMGEXP %in% c("h","H"), CROPDMG*100,
                                        ifelse(CROPDMGEXP %in% c("k","K"), CROPDMG*1000,
                                               ifelse(CROPDMGEXP %in% c("m","M"), CROPDMG*1000000,
                                                      ifelse(CROPDMGEXP %in% c("b","B"), CROPDMG*1000000000,0)))))
        
        #Add the property damage value and the crop damage value to create a new variable for the total economic impact.
        #Normalize the damage to billions of dollars.
        #round() with its default of 0 digits gives whole billions, so damage below $0.5 billion rounds to 0.
        NOAA_data2 <- mutate(NOAA_data2, 
                             IMPACT_PROP_DAMAGE = round((PROP_VALUE+CROP_VALUE)/1000000000))
        #drop any event whose rounded economic impact is 0 for this section of the analysis.
        NOAA_data2 <- NOAA_data2[NOAA_data2$IMPACT_PROP_DAMAGE > 0,] 
        #Calculate mean per event before presenting the results
        NOAA_agg2 <- aggregate(IMPACT_PROP_DAMAGE ~ EVTYPE, data = NOAA_data2, FUN = mean, na.rm= TRUE)
        #Get top 20 event types causing the economic damage
        NOAA_plot2 <- head(NOAA_agg2[order((NOAA_agg2$IMPACT_PROP_DAMAGE), decreasing = TRUE),],n=20)
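
The nested ifelse() calls above map each exponent code to a multiplier. As an alternative sketch, the same mapping can be expressed with a named lookup vector. The names `exp_multiplier` and `damage_in_dollars` are hypothetical helpers introduced here for illustration and are not part of the original analysis; like the code above, the sketch treats any code other than H, K, M or B (in either case) as contributing 0.

```r
# Multipliers for the exponent codes handled in the analysis above
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage figure and its exponent code to US dollars
damage_in_dollars <- function(value, exponent) {
  # Look up the multiplier by upper-cased code; unknown codes give NA
  mult <- exp_multiplier[toupper(as.character(exponent))]
  # Unrecognized or blank codes contribute nothing, matching the ifelse() chain
  mult[is.na(mult)] <- 0
  value * unname(mult)
}

# Example with invented values: 25 K, 2.5 m, 10 with a blank code, 3 B
example_dollars <- damage_in_dollars(c(25, 2.5, 10, 3), c("K", "m", "", "B"))
print(example_dollars)
```

With this helper, PROP_VALUE could be computed as damage_in_dollars(PROPDMG, PROPDMGEXP) inside mutate(), and CROP_VALUE likewise from the crop columns.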

The figures below show the top 20 events resulting in population harm and economic damage. Instead of plotting the aggregated data from the analysis above, we show the average and the variability in the data using box plots.

        require(ggplot2)
## Loading required package: ggplot2
        #subset the unaggregated data for top 20 events identified with highest mean
        plot1<-ggplot(NOAA_data1[NOAA_data1$EVTYPE %in% NOAA_plot1$EVTYPE,], aes(factor(EVTYPE),IMPACT_PUBLIC_HEALTH))
        # outlier.colour is a geom_boxplot() argument, so it is set there
        plot1 + geom_boxplot(outlier.colour = "red") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
                labs(x = "Event Types", 
                        title="Weather Events Responsible for Highest Fatalities & Injuries (1950-2011)",
                        y="Population Impact") 

        #subset the unaggregated data for top 20 events identified with highest mean  
        plot2<-ggplot(NOAA_data2[NOAA_data2$EVTYPE %in% NOAA_plot2$EVTYPE,], aes(factor(EVTYPE),IMPACT_PROP_DAMAGE))
        # outlier.colour is a geom_boxplot() argument, so it is set there
        plot2 + geom_boxplot(outlier.colour = "red") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
                labs(x = "Event Types", 
                        title="Weather Events Responsible for Economic Damage (1950-2011)",
                        y="Economic Impact in Billions of US Dollars")

Results

Q1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Based on the above analysis, WILD FIRES are the most harmful with respect to population health, with a mean of 153 fatalities and injuries per event among events that reported at least one fatality or injury. The table below shows the top 20 severe weather events by mean harm, in decreasing order.

        NOAA_plot1$IMPACT_PUBLIC_HEALTH <- round(NOAA_plot1$IMPACT_PUBLIC_HEALTH, 2)
        colnames(NOAA_plot1) <- c("Event Type", "Fatalities & Injuries")
        NOAA_plot1
##                     Event Type Fatalities & Injuries
## 208                 WILD FIRES                153.00
## 196                    TSUNAMI                 81.00
## 70                   Heat Wave                 70.00
## 109          HURRICANE/TYPHOON                 51.50
## 190      TROPICAL STORM GORDON                 51.00
## 218         WINTER WEATHER MIX                 34.00
## 200  UNSEASONABLY WARM AND DRY                 29.00
## 181              THUNDERSTORMW                 27.00
## 216              WINTER STORMS                 27.00
## 145                RECORD HEAT                 26.00
## 220                 WINTRY MIX                 26.00
## 3                    BLACK ICE                 25.00
## 187 TORNADOES, TSTM WIND, HAIL                 25.00
## 33          EXCESSIVE RAINFALL                 23.00
## 95          HIGH WIND AND SEAS                 23.00
## 206         WATERSPOUT/TORNADO                 22.50
## 117                  ICE STORM                 21.73
## 72           HEAT WAVE DROUGHT                 19.00
## 164            SNOW/HIGH WINDS                 18.00
## 39                EXTREME HEAT                 17.93

Q2. Across the United States, which types of events have the greatest economic consequences?
Based on the above analysis, STORM SURGE has resulted in the greatest economic consequences, with a mean of 21 billion dollars in damage per event. The table below shows the top 20 severe weather events by mean economic damage, in decreasing order.

        NOAA_plot2$IMPACT_PROP_DAMAGE <- round(NOAA_plot2$IMPACT_PROP_DAMAGE,2)
        colnames(NOAA_plot2) <- c("Event Type", "Economic Impact (Billions US$)")

        NOAA_plot2
##                    Event Type Economic Impact (Billions US$)
## 14                STORM SURGE                          21.00
## 4                       FLOOD                          10.83
## 12                RIVER FLOOD                          10.00
## 11                  ICE STORM                           5.00
## 10          HURRICANE/TYPHOON                           4.73
## 15           STORM SURGE/TIDE                           4.00
## 19             TROPICAL STORM                           3.00
## 23               WINTER STORM                           3.00
## 6   HEAVY RAIN/SEVERE WEATHER                           2.00
## 18 TORNADOES, TSTM WIND, HAIL                           2.00
## 17                    TORNADO                           1.75
## 8                   HURRICANE                           1.50
## 9              HURRICANE OPAL                           1.50
## 21           WILD/FOREST FIRE                           1.50
## 5                        HAIL                           1.25
## 1                     DROUGHT                           1.00
## 2                EXTREME COLD                           1.00
## 3                 FLASH FLOOD                           1.00
## 7                   HIGH WIND                           1.00
## 13        SEVERE THUNDERSTORM                           1.00