Reproducible Research: Peer Assessment 2

Top Severe Weather Events That Impacted United States Public Health and Economy (1950 - 2011)


Synopsis


Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This report discusses the impact of various weather events on the public health and economy of the United States from 1950 to 2011.


Data Processing


This analysis is based on a data set from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The following code snippet downloads the NOAA storm data set and loads it into a data frame:

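# knitr chunk option: cache = TRUE
library(R.utils)   # bunzip2()
library(sqldf)     # SQL queries on data frames
library(dplyr)     # group_by(), summarise()
library(reshape2)  # melt()
library(ggplot2)   # plotting
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(file_url, "dataset.csv.bz2", method = "libcurl")
bunzip2("dataset.csv.bz2", overwrite = TRUE, remove = FALSE)  # writes dataset.csv
orig_df <- read.csv("dataset.csv")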
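
Downloading a file of this size on every run is slow. A minimal sketch of a guard that downloads only when no local copy exists is shown below. The helper name `fetch_once` and its `downloader` argument are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (not part of the original report): download a file
# only when no local copy exists yet.
fetch_once <- function(url, dest, downloader = download.file) {
  if (!file.exists(dest)) {
    downloader(url, dest)
  }
  invisible(dest)
}

# Possible usage with the URL and file name from the report:
# fetch_once(file_url, "dataset.csv.bz2")
# read.csv() can also read the bzip2-compressed file directly:
# orig_df <- read.csv("dataset.csv.bz2")
```

The `downloader` argument exists only so the helper can be exercised without network access; in normal use the default `download.file` would apply.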

The structure of the NOAA storm data set is shown below:

str(orig_df)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The dimensions of the data set are as follows:

dim(orig_df)
## [1] 902297     37

The data set contains 902297 records with 37 attributes.

Based on the National Weather Service Storm Data Documentation, the following attributes were chosen for further analysis:

Area of Concern   Attribute    Description
---------------   ----------   --------------------------------------------------------
Event             EVTYPE       Event type
Public Health     FATALITIES   Number of fatality cases
Public Health     INJURIES     Number of injury cases
Economy           PROPDMG      Dollar amount of property damage
Economy           PROPDMGEXP   Unit/multiplier of the dollar amount for property damage
Economy           CROPDMG      Dollar amount of crop damage
Economy           CROPDMGEXP   Unit/multiplier of the dollar amount for crop damage

Based on Appendix B of the National Weather Service Storm Data Documentation, the following rules have been derived to interpret the PROPDMGEXP and CROPDMGEXP column values:

  • If PROPDMGEXP or CROPDMGEXP is “K” or “k”, the corresponding PROPDMG or CROPDMG value is in thousands.
  • If PROPDMGEXP or CROPDMGEXP is “M” or “m”, the corresponding PROPDMG or CROPDMG value is in millions.
  • If PROPDMGEXP or CROPDMGEXP is “B” or “b”, the corresponding PROPDMG or CROPDMG value is in billions.
  • The documentation does not say how to interpret “H” or “h”. It is assumed to mean hundreds, so the corresponding PROPDMG or CROPDMG value is multiplied by 100.
  • If PROPDMGEXP or CROPDMGEXP is a digit from 1 to 9, it is assumed that the corresponding PROPDMG or CROPDMG value should be multiplied by that digit.
  • For any other value of PROPDMGEXP or CROPDMGEXP (including “0” and blank entries), the damage amount is treated as 0 and therefore does not contribute to the dollar totals.
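
The rules above can be sketched as a small vectorised R function. This is only an illustration of the stated assumptions; the function name `exp_multiplier` is hypothetical and is not used elsewhere in this report.

```r
# Sketch of the multiplier rules listed above (function name is hypothetical).
exp_multiplier <- function(exp) {
  exp <- toupper(as.character(exp))             # treat "k" and "K" alike
  mult <- rep(0, length(exp))                   # any other value counts as 0
  mult[exp %in% "H"] <- 1e2                     # assumed: hundreds
  mult[exp %in% "K"] <- 1e3                     # thousands
  mult[exp %in% "M"] <- 1e6                     # millions
  mult[exp %in% "B"] <- 1e9                     # billions
  is_digit <- exp %in% as.character(1:9)
  mult[is_digit] <- as.numeric(exp[is_digit])   # assumed: digit is a direct factor
  mult
}

# Multipliers for a few representative codes
exp_multiplier(c("K", "m", "B", "h", "5", "", "?", "0"))
```

Multiplying a damage column by the result of such a function would give dollar amounts under the same assumptions as the rules above.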

To enable this analysis, the following subset of the data has been prepared as the base data set:

# Convert damage amounts to dollars; unrecognised exponent codes count as 0
storm_df <- sqldf("select EVTYPE, FATALITIES, INJURIES,
                  CASE WHEN PROPDMGEXP IN ('H', 'h')
                       THEN PROPDMG * 100
                  WHEN PROPDMGEXP IN ('K', 'k')
                       THEN PROPDMG * 1000
                  WHEN PROPDMGEXP IN ('M', 'm')
                       THEN PROPDMG * 1000000
                  WHEN PROPDMGEXP IN ('B', 'b')
                       THEN PROPDMG * 1000000000
                  WHEN PROPDMGEXP IN ('1','2','3','4','5','6','7','8','9')
                       THEN PROPDMG * CAST(PROPDMGEXP AS INT)
                  ELSE 0
                  END PROPDMG_VALUE,
                  CASE WHEN CROPDMGEXP IN ('H', 'h')
                       THEN CROPDMG * 100
                  WHEN CROPDMGEXP IN ('K', 'k')
                       THEN CROPDMG * 1000
                  WHEN CROPDMGEXP IN ('M', 'm')
                       THEN CROPDMG * 1000000
                  WHEN CROPDMGEXP IN ('B', 'b')
                       THEN CROPDMG * 1000000000
                  WHEN CROPDMGEXP IN ('1','2','3','4','5','6','7','8','9')
                       THEN CROPDMG * CAST(CROPDMGEXP AS INT)
                  ELSE 0
                  END CROPDMG_VALUE
                  from orig_df")
## Loading required package: tcltk

This subset has also been grouped by event type to simplify the analysis, and the damage value columns have been converted to numeric.

group_event_df <- group_by(storm_df, EVTYPE)
group_event_df$PROPDMG_VALUE <- as.numeric(group_event_df$PROPDMG_VALUE)
group_event_df$CROPDMG_VALUE <- as.numeric(group_event_df$CROPDMG_VALUE)

For more information on the NOAA storm data preparation process and measurement metrics, please refer to the National Weather Service Storm Data Documentation.


Results



Data Analysis: Population Health


The total numbers of injuries and fatalities are aggregated by event type below to determine which type of severe weather event has the greatest impact on population health:

injuries_df_tmp <- summarise(group_event_df,
                             Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
                             Total_Injuries = sum(INJURIES, na.rm = TRUE),
                             Total = sum(sum(INJURIES, na.rm = TRUE), sum(FATALITIES, na.rm = TRUE)))

injuries_df <- sqldf('select * from injuries_df_tmp order by Total DESC')
## Loading required package: tcltk

We assume that an injury and a fatality have the same impact on population health, so no weighting is applied to either in this analysis.
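
The equal-weight total can be illustrated on a small, made-up data frame using only base R. The records below are invented for illustration and are not NOAA data; `toy` and `toy_sum` are hypothetical names.

```r
# Made-up toy records (not NOAA data) to illustrate the equal-weight total.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 5, 4, 0)
)

# Sum fatalities and injuries per event type.
toy_sum <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Equal weighting: one fatality and one injury each count as one case.
toy_sum$Total <- toy_sum$FATALITIES + toy_sum$INJURIES

# Rank event types by combined total, largest first.
toy_sum[order(-toy_sum$Total), ]
```

Under a different assumption, for example weighting fatalities more heavily than injuries, the ranking could change; this report does not apply such a weighting.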

The following steps derive the top 10 weather events with the greatest impact on population health:

top_injuries_df <- head(injuries_df, 10)
top_injuries_df <- melt(top_injuries_df[,1:3], id="EVTYPE")
colnames(top_injuries_df)[2] <- "Type"
colnames(top_injuries_df)[3] <- "Total"
top_injuries_df
##               EVTYPE             Type Total
## 1            TORNADO Total_Fatalities  5633
## 2     EXCESSIVE HEAT Total_Fatalities  1903
## 3          TSTM WIND Total_Fatalities   504
## 4              FLOOD Total_Fatalities   470
## 5          LIGHTNING Total_Fatalities   816
## 6               HEAT Total_Fatalities   937
## 7        FLASH FLOOD Total_Fatalities   978
## 8          ICE STORM Total_Fatalities    89
## 9  THUNDERSTORM WIND Total_Fatalities   133
## 10      WINTER STORM Total_Fatalities   206
## 11           TORNADO   Total_Injuries 91346
## 12    EXCESSIVE HEAT   Total_Injuries  6525
## 13         TSTM WIND   Total_Injuries  6957
## 14             FLOOD   Total_Injuries  6789
## 15         LIGHTNING   Total_Injuries  5230
## 16              HEAT   Total_Injuries  2100
## 17       FLASH FLOOD   Total_Injuries  1777
## 18         ICE STORM   Total_Injuries  1975
## 19 THUNDERSTORM WIND   Total_Injuries  1488
## 20      WINTER STORM   Total_Injuries  1321

The figure below shows the top 10 severe weather event types with the greatest impact on United States population health from 1950 to 2011:

ggplot(top_injuries_df, aes(x = reorder(EVTYPE, Total), y = Total, fill = Type)) +
  geom_bar(stat = "identity") + coord_flip() + theme_bw() +
  scale_fill_manual(values = c("orange", "blue")) +
  ylab("Total Injuries and Fatalities") + xlab("Severe Weather Event Type") +
  ggtitle("Top 10 Severe Weather Events by Impact on Population Health\n")


Data Analysis: Economy


The dollar amounts of property damage and crop damage are aggregated by event type below, in millions of dollars, to determine which type of severe weather event had the greatest impact on the United States economy from 1950 to 2011:

economy_df_tmp <- summarise(group_event_df,
                            Property_Damage = sum(PROPDMG_VALUE, na.rm = TRUE) / 1000000,
                            Crop_Damage = sum(CROPDMG_VALUE, na.rm = TRUE) / 1000000,
                            Total = sum(sum(PROPDMG_VALUE, na.rm = TRUE), sum(CROPDMG_VALUE, na.rm = TRUE)) / 1000000)

economy_df <- sqldf('select * from economy_df_tmp order by Total DESC')

The following steps derive the top 10 weather events with the greatest impact on the economy:

top_economy_df <- head(economy_df, 10)
top_economy_df <- melt(top_economy_df[,1:3], id="EVTYPE")
colnames(top_economy_df)[2] <- "Type"
colnames(top_economy_df)[3] <- "Total"
top_economy_df
##               EVTYPE            Type       Total
## 1              FLOOD Property_Damage 144657.7098
## 2  HURRICANE/TYPHOON Property_Damage  69305.8400
## 3            TORNADO Property_Damage  56937.1610
## 4        STORM SURGE Property_Damage  43323.5360
## 5               HAIL Property_Damage  15732.2674
## 6        FLASH FLOOD Property_Damage  16140.8121
## 7            DROUGHT Property_Damage   1046.1060
## 8          HURRICANE Property_Damage  11868.3190
## 9     TROPICAL STORM Property_Damage   7703.8906
## 10      WINTER STORM Property_Damage   6688.4973
## 11             FLOOD     Crop_Damage   5661.9685
## 12 HURRICANE/TYPHOON     Crop_Damage   2607.8728
## 13           TORNADO     Crop_Damage    414.9531
## 14       STORM SURGE     Crop_Damage      0.0050
## 15              HAIL     Crop_Damage   3025.9545
## 16       FLASH FLOOD     Crop_Damage   1421.3171
## 17           DROUGHT     Crop_Damage  13972.5660
## 18         HURRICANE     Crop_Damage   2741.9100
## 19    TROPICAL STORM     Crop_Damage    678.3460
## 20      WINTER STORM     Crop_Damage     26.9440
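
As a quick arithmetic check on the table above, the combined damage for a few event types can be recomputed from the printed values. The figures are copied from the output; the variable names are hypothetical.

```r
# Values copied from the printed table above (million dollars).
property <- c("FLOOD" = 144657.7098, "HURRICANE/TYPHOON" = 69305.8400,
              "TORNADO" = 56937.1610, "DROUGHT" = 1046.1060)
crop     <- c("FLOOD" = 5661.9685, "HURRICANE/TYPHOON" = 2607.8728,
              "TORNADO" = 414.9531, "DROUGHT" = 13972.5660)

# Combined property and crop damage per event type (million dollars).
total_million <- property + crop

# The same totals expressed in billion dollars, rounded to one decimal.
round(total_million / 1000, 1)
```

Among these four event types, flood has the largest combined total, while drought is the only one whose crop damage exceeds its property damage.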

The figure below shows the top 10 severe weather event types with the greatest impact on the United States economy from 1950 to 2011:

ggplot(top_economy_df, aes(x = reorder(EVTYPE, Total), y = Total, fill = Type)) +
  geom_bar(stat = "identity") + coord_flip() + theme_bw() +
  scale_fill_manual(values = c("orange", "blue")) +
  ylab("Total Damages (Million Dollars)") + xlab("Severe Weather Event Type") +
  ggtitle("Top 10 Severe Weather Events by Economic Consequences\n")


Conclusion


The following conclusions have been derived from the above analysis:

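  • Tornadoes (TORNADO) are the most harmful to population health, with 5633 fatalities and 91346 injuries recorded.
  • Floods (FLOOD) have the greatest economic consequences, with about 150,320 million dollars in combined property and crop damage.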