Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This report examines the impact of various weather events on public health and the economy of the United States from 1950 to 2011.
This analysis is based on a data set from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
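The following code downloads the NOAA storm data set and loads it into a data frame:

```r
library(R.utils)  # provides bunzip2()

file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(file_url, "dataset.csv.bz2", method = "libcurl")

# Decompress the archive, keeping the original .bz2 file
bunzip2("dataset.csv.bz2", overwrite = TRUE, remove = FALSE)

# stringsAsFactors = TRUE reproduces the factor columns shown below
# (the default changed to FALSE in R 4.0.0)
orig_df <- read.csv("dataset.csv", stringsAsFactors = TRUE)
```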
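The decompression step is optional: `read.csv()` opens its input through `file()`, which recognises bzip2-compressed files. A minimal sketch, using a tiny temporary file as a stand-in for the real download (the file contents are invented for illustration):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (a stand-in for
# the real "dataset.csv.bz2" download; contents are invented).
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "HAIL,0,0"), con)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression,
# so no separate bunzip2() call is needed.
sample_df <- read.csv(bz_path)
str(sample_df)
```

Applied to the downloaded archive, this would simply be `read.csv("dataset.csv.bz2")`.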
The structure of the data set, with the first few values of each column, is shown below:
```r
str(orig_df)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
```
The dimensions of the data set are:

```r
dim(orig_df)
## [1] 902297 37
```

The data set therefore contains 902,297 records with 37 attributes each.
Based on the National Weather Service Storm Data Documentation, we selected the following attributes for further analysis:
| Area of Concern | Attribute | Description |
|---|---|---|
| Event | EVTYPE | Event type |
| Public Health | FATALITIES | Number of fatalities |
| Public Health | INJURIES | Number of injuries |
| Economy | PROPDMG | Property damage amount in dollars, before applying PROPDMGEXP |
| Economy | PROPDMGEXP | Multiplier (magnitude) for PROPDMG |
| Economy | CROPDMG | Crop damage amount in dollars, before applying CROPDMGEXP |
| Economy | CROPDMGEXP | Multiplier (magnitude) for CROPDMG |
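Before interpreting the multiplier columns, it helps to see which codes actually occur. A minimal sketch on a small made-up data frame (the values are illustrative only, not taken from the NOAA data) that keeps the selected attributes and tabulates the exponent codes:

```r
# Illustrative stand-in for orig_df; all values are invented.
toy_df <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "FLOOD"),
  FATALITIES = c(1, 0, 0, 2),
  INJURIES   = c(15, 0, 3, 0),
  PROPDMG    = c(25, 2.5, 0, 10),
  PROPDMGEXP = c("K", "K", "", "M"),
  CROPDMG    = c(0, 0, 5, 0),
  CROPDMGEXP = c("", "", "K", "?"),
  REFNUM     = 1:4,
  stringsAsFactors = FALSE
)

# Keep only the attributes chosen in the table above.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
selected_df <- toy_df[, keep]

# Count how often each multiplier code appears, blank codes included.
table(selected_df$PROPDMGEXP, useNA = "ifany")
table(selected_df$CROPDMGEXP, useNA = "ifany")
```

On the full data set, the same `table()` calls would reveal unusual codes such as `-`, `?`, `+` and digits, which the interpretation rules must account for.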
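Based on Appendix B of the National Weather Service Storm Data Documentation, the following rules were derived to interpret the values of the PROPDMGEXP and CROPDMGEXP columns (they are applied in the CASE expressions below):

- `H` or `h`: hundreds (multiply by 100)
- `K` or `k`: thousands (multiply by 1,000)
- `M` or `m`: millions (multiply by 1,000,000)
- `B` or `b`: billions (multiply by 1,000,000,000)
- a digit from `1` to `9`: multiply by that digit
- any other value (e.g. blank, `0`, `-`, `?`, `+`): the damage value is set to 0

The following subset of the data serves as the base data set for the subsequent analysis: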
```r
library(sqldf)

# Convert each damage amount to dollars using the multiplier rules above.
# SQL string literals use single quotes, so the R string uses double quotes.
storm_df <- sqldf("select EVTYPE, FATALITIES, INJURIES,
  CASE WHEN PROPDMGEXP IN ('H', 'h') THEN PROPDMG * 100
       WHEN PROPDMGEXP IN ('K', 'k') THEN PROPDMG * 1000
       WHEN PROPDMGEXP IN ('M', 'm') THEN PROPDMG * 1000000
       WHEN PROPDMGEXP IN ('B', 'b') THEN PROPDMG * 1000000000
       WHEN PROPDMGEXP IN ('1','2','3','4','5','6','7','8','9')
            THEN PROPDMG * CAST(PROPDMGEXP AS INT)
       ELSE 0
  END PROPDMG_VALUE,
  CASE WHEN CROPDMGEXP IN ('H', 'h') THEN CROPDMG * 100
       WHEN CROPDMGEXP IN ('K', 'k') THEN CROPDMG * 1000
       WHEN CROPDMGEXP IN ('M', 'm') THEN CROPDMG * 1000000
       WHEN CROPDMGEXP IN ('B', 'b') THEN CROPDMG * 1000000000
       WHEN CROPDMGEXP IN ('1','2','3','4','5','6','7','8','9')
            THEN CROPDMG * CAST(CROPDMGEXP AS INT)
       ELSE 0
  END CROPDMG_VALUE
  from orig_df")
```
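For readers less familiar with SQL, the same rules can be written as a small vectorised R function. This is a sketch, not the code used for the results in this report; the helper name `exp_to_multiplier` is hypothetical, and, like the CASE expressions, it multiplies by the digit itself for codes `1` to `9`:

```r
# Hypothetical helper mirroring the CASE expressions:
# H/K/M/B (any case) map to 1e2, 1e3, 1e6, 1e9; a digit 1-9 is used as the
# multiplier itself; every other code (blank, "0", "-", "?", "+", NA) maps to 0.
exp_to_multiplier <- function(exp) {
  code <- toupper(as.character(exp))        # also accepts factor input
  letter_mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(letter_mult[code])         # NA where code is not H/K/M/B
  is_digit <- code %in% as.character(1:9)
  mult[is_digit] <- as.numeric(code[is_digit])
  mult[is.na(mult)] <- 0
  mult
}

# Usage: damage in dollars = amount * multiplier
exp_to_multiplier(c("K", "m", "5", "", "?"))
25 * exp_to_multiplier("K")
```

Applied to the subset, `PROPDMG * exp_to_multiplier(PROPDMGEXP)` should give the same values as `PROPDMG_VALUE`.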
The subset is then grouped by event type to simplify the analysis, and the computed damage columns are converted to numeric:
```r
library(dplyr)

# Convert the damage columns before grouping, then group by event type
storm_df$PROPDMG_VALUE <- as.numeric(storm_df$PROPDMG_VALUE)
storm_df$CROPDMG_VALUE <- as.numeric(storm_df$CROPDMG_VALUE)
group_event_df <- group_by(storm_df, EVTYPE)
```
For more information on the NOAA storm data preparation process and measurement metrics, refer to the National Weather Service Storm Data Documentation.
The code below totals the fatalities and injuries for each event type to determine which types of severe weather event have the greatest impact on population health:
```r
# Per-event totals; Total counts fatalities and injuries with equal weight.
# Column names match the printed output further below.
injuries_df_tmp <- summarise(group_event_df,
                             Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
                             Total_Injuries = sum(INJURIES, na.rm = TRUE),
                             Total = Total_Fatalities + Total_Injuries)
injuries_df <- sqldf("select * from injuries_df_tmp order by Total DESC")
```
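The same per-event totals can also be computed in base R with `aggregate()`. A minimal sketch on a made-up data frame (values are illustrative only); `na.action = na.pass` together with `sum(..., na.rm = TRUE)` mirrors the `na.rm = TRUE` handling in the `summarise()` call:

```r
# Invented example rows; not NOAA data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 0, 5, NA),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type, ignoring missing values
# instead of dropping whole rows.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy,
                    FUN = function(x) sum(x, na.rm = TRUE),
                    na.action = na.pass)

# Equal weighting: the ranking metric is simply fatalities + injuries.
health$Total <- health$FATALITIES + health$INJURIES
health <- health[order(health$Total, decreasing = TRUE), ]
health
```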
We assume that an injury and a fatality have the same impact on population health, so no weighting is applied to either in this analysis.
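Written out, the health impact of an event type $e$, summed over all records $i$ of that type with equal weights, is:

$$\text{Total}_e = \sum_{i \in e} \text{FATALITIES}_i + \sum_{i \in e} \text{INJURIES}_i$$

For example, TORNADO has 5,633 fatalities and 91,346 injuries, for a total of 96,979.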
The following steps derive the 10 weather events with the greatest impact on population health:
```r
library(reshape2)

# Keep the 10 highest-ranked events and reshape to long format for plotting
top_injuries_df <- head(injuries_df, 10)
top_injuries_df <- melt(top_injuries_df[, 1:3], id.vars = "EVTYPE")
colnames(top_injuries_df)[2] <- "Type"
colnames(top_injuries_df)[3] <- "Total"
top_injuries_df
## EVTYPE Type Total
## 1 TORNADO Total_Fatalities 5633
## 2 EXCESSIVE HEAT Total_Fatalities 1903
## 3 TSTM WIND Total_Fatalities 504
## 4 FLOOD Total_Fatalities 470
## 5 LIGHTNING Total_Fatalities 816
## 6 HEAT Total_Fatalities 937
## 7 FLASH FLOOD Total_Fatalities 978
## 8 ICE STORM Total_Fatalities 89
## 9 THUNDERSTORM WIND Total_Fatalities 133
## 10 WINTER STORM Total_Fatalities 206
## 11 TORNADO Total_Injuries 91346
## 12 EXCESSIVE HEAT Total_Injuries 6525
## 13 TSTM WIND Total_Injuries 6957
## 14 FLOOD Total_Injuries 6789
## 15 LIGHTNING Total_Injuries 5230
## 16 HEAT Total_Injuries 2100
## 17 FLASH FLOOD Total_Injuries 1777
## 18 ICE STORM Total_Injuries 1975
## 19 THUNDERSTORM WIND Total_Injuries 1488
## 20 WINTER STORM Total_Injuries 1321
```
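`melt()` stacks the measure columns: all `Total_Fatalities` rows come first, followed by all `Total_Injuries` rows, which is why each event appears twice in the output. A base-R sketch of the same wide-to-long reshape, using a hypothetical helper `to_long()` and invented values:

```r
# Hypothetical helper: stack every non-id column into Type/Total pairs,
# in the same row order that melt() produces (column by column).
to_long <- function(wide, id = "EVTYPE") {
  measures <- setdiff(names(wide), id)
  long <- data.frame(
    id_col = rep(wide[[id]], times = length(measures)),
    Type   = rep(measures, each = nrow(wide)),
    Total  = unlist(wide[measures], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(long)[1] <- id
  long
}

# Usage with invented numbers
wide <- data.frame(EVTYPE = c("A", "B"),
                   Total_Fatalities = c(1, 2),
                   Total_Injuries = c(3, 4),
                   stringsAsFactors = FALSE)
to_long(wide)
```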
The figure below shows the 10 severe weather event types with the greatest impact on United States population health from 1950 to 2011:
```r
library(ggplot2)
ggplot(top_injuries_df, aes(x = reorder(EVTYPE, Total), y = Total, fill = Type)) +
  geom_bar(stat = "identity") + coord_flip() + theme_bw() +
  scale_fill_manual(values = c("orange", "blue")) +
  ylab("Total Injuries and Fatalities") + xlab("Severe Weather Event Type") +
  ggtitle("Top 10 Severe Weather Events by Impact on Population Health\n")
```
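The bar order in the plot comes from `reorder(EVTYPE, Total)`, which sorts the event types by the mean of `Total` over each type's rows (its default `FUN` is `mean`). Because every event type contributes exactly two rows, ordering by the mean is the same as ordering by the combined total. A small sketch with invented numbers:

```r
# Two rows per event type, as in the melted data (values invented).
ev  <- c("HAIL", "FLOOD", "HAIL", "FLOOD")
tot <- c(1, 10, 2, 20)

# reorder() returns a factor whose levels are sorted by mean(tot) per event,
# in increasing order; after coord_flip() the last level is drawn at the top.
f <- reorder(ev, tot)
levels(f)
```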
The code below totals the property and crop damage for each event type, in millions of dollars, to determine which types of severe weather event had the greatest impact on the United States economy from 1950 to 2011:
```r
# Per-event damage totals, converted from dollars to millions of dollars
economy_df_tmp <- summarise(group_event_df,
                            Property_Damage = sum(PROPDMG_VALUE, na.rm = TRUE) / 1e6,
                            Crop_Damage = sum(CROPDMG_VALUE, na.rm = TRUE) / 1e6,
                            Total = Property_Damage + Crop_Damage)
economy_df <- sqldf("select * from economy_df_tmp order by Total DESC")
```
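Written out, the economic impact of an event type $e$, in millions of dollars, is the formula below, where $m(\cdot)$ is the multiplier assigned by the CASE expressions (for example, $m(\text{K}) = 10^3$):

$$\text{Total}_e = \frac{1}{10^6}\left(\sum_{i \in e} \text{PROPDMG}_i \, m(\text{PROPDMGEXP}_i) + \sum_{i \in e} \text{CROPDMG}_i \, m(\text{CROPDMGEXP}_i)\right)$$

A record with PROPDMG = 25 and PROPDMGEXP = K therefore contributes $25 \times 10^3 = 25{,}000$ dollars, or 0.025 million.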
The following steps derive the 10 weather events with the greatest impact on the economy:
```r
# Keep the 10 highest-ranked events and reshape to long format for plotting
top_economy_df <- head(economy_df, 10)
top_economy_df <- melt(top_economy_df[, 1:3], id.vars = "EVTYPE")
colnames(top_economy_df)[2] <- "Type"
colnames(top_economy_df)[3] <- "Total"
top_economy_df
## EVTYPE Type Total
## 1 FLOOD Property_Damage 144657.7098
## 2 HURRICANE/TYPHOON Property_Damage 69305.8400
## 3 TORNADO Property_Damage 56937.1610
## 4 STORM SURGE Property_Damage 43323.5360
## 5 HAIL Property_Damage 15732.2674
## 6 FLASH FLOOD Property_Damage 16140.8121
## 7 DROUGHT Property_Damage 1046.1060
## 8 HURRICANE Property_Damage 11868.3190
## 9 TROPICAL STORM Property_Damage 7703.8906
## 10 WINTER STORM Property_Damage 6688.4973
## 11 FLOOD Crop_Damage 5661.9685
## 12 HURRICANE/TYPHOON Crop_Damage 2607.8728
## 13 TORNADO Crop_Damage 414.9531
## 14 STORM SURGE Crop_Damage 0.0050
## 15 HAIL Crop_Damage 3025.9545
## 16 FLASH FLOOD Crop_Damage 1421.3171
## 17 DROUGHT Crop_Damage 13972.5660
## 18 HURRICANE Crop_Damage 2741.9100
## 19 TROPICAL STORM Crop_Damage 678.3460
## 20 WINTER STORM Crop_Damage 26.9440
```
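The long table lists property and crop damage separately, while the ranking uses their sum. A minimal sketch that recovers the combined total per event from the long format, using three events copied from the printed output above (printed values are rounded to four decimals):

```r
# Three events copied from the printed output (millions of dollars).
long <- data.frame(
  EVTYPE = rep(c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"), times = 2),
  Type   = rep(c("Property_Damage", "Crop_Damage"), each = 3),
  Total  = c(144657.7098, 69305.8400, 56937.1610,
             5661.9685, 2607.8728, 414.9531),
  stringsAsFactors = FALSE
)

# Sum property and crop damage for each event type.
combined <- vapply(split(long$Total, long$EVTYPE), sum, numeric(1))
sort(combined, decreasing = TRUE)
```

For FLOOD this gives 144,657.7098 + 5,661.9685 = 150,319.6783 million dollars, consistent with its first place in the ranking.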
The figure below shows the 10 severe weather event types with the greatest impact on the United States economy from 1950 to 2011:
```r
ggplot(top_economy_df, aes(x = reorder(EVTYPE, Total), y = Total, fill = Type)) +
  geom_bar(stat = "identity") + coord_flip() + theme_bw() +
  scale_fill_manual(values = c("orange", "blue")) +
  ylab("Total Damages (Million Dollars)") + xlab("Severe Weather Event Type") +
  ggtitle("Top 10 Severe Weather Events by Economic Impact\n")
```
The following conclusions were drawn from the analysis above: