Storm data recorded by the U.S. National Oceanic and Atmospheric Administration (NOAA) was explored in order to identify the most harmful storm events across the United States. Observations between January 1993 and November 2011 were retained after processing the dataset. The total number of casualties (deaths and injuries) was calculated for each type of storm event as a measure of harm to population health. The total damage to property and crops (in thousands of dollars) was calculated for each type of storm event as a measure of its economic consequences. The events causing the greatest harm to population health were found to be tornadoes and floods. The events with the largest economic consequences were found to be floods, hurricanes and tornadoes.
Data was downloaded as a compressed file from this link. The data was unzipped and read into R as a data frame.
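As a rough illustration of the reading step, the sketch below writes a tiny stand-in file and reads it back. It assumes the download is a bzip2-compressed CSV, which the text does not state; the file name and its three rows are hypothetical and are not the NOAA data.

```r
# Build a small bzip2-compressed CSV as a stand-in for the real download
# (hypothetical contents; the actual file and its URL are not reproduced here).
csv_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(csv_path, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "HAIL,0,0"), con)
close(con)

# read.csv() opens the path via file(), which detects the compression itself,
# so no separate decompression step is needed in this sketch.
storm <- read.csv(csv_path, stringsAsFactors = FALSE)
dim(storm)
```

With the real file, `dim()` would be expected to report the row and column counts quoted in the text.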
Size of data: 902297 rows (observations) and 37 columns (variables).
The first few rows of the data look like this:
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
There are 48 storm event types defined in NWSI 10-1605 part 2.1.1.
There are 985 types of events recorded in the data frame.
For consistency, the event types had to be edited and organised into the 48 types. Firstly, any events that caused neither harm to population health nor economic damage are irrelevant to this study, and so were removed. Next, any observations recording summary or monthly data are not events, and were also removed. Thereafter, unnecessary adjectives and whitespace in the event names were removed, allowing some events of the same type to be merged. The processed data was copied into a new data frame named ‘tidy’.
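The three tidying steps described above might look roughly like the following. This is a minimal sketch on made-up rows, not the code actually used: the search patterns and the adjective list ("severe", "gusty") are assumptions chosen for illustration.

```r
# Toy data with the relevant columns (values are illustrative only)
raw <- data.frame(
  EVTYPE     = c("TORNADO", " Severe  Thunderstorm Wind ", "Summary of May 22",
                 "HAIL", "MONTHLY RAINFALL"),
  FATALITIES = c(0, 1, 0, 0, 0),
  INJURIES   = c(15, 0, 0, 0, 0),
  PROPDMG    = c(25, 0, 0, 0, 5),
  CROPDMG    = c(0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Step 1: keep only events that caused some harm or damage
harmful <- raw$FATALITIES > 0 | raw$INJURIES > 0 |
           raw$PROPDMG > 0 | raw$CROPDMG > 0
tidy <- raw[harmful, ]

# Step 2: drop summary and monthly records, which are not events
tidy <- tidy[!grepl("summary|monthly", tidy$EVTYPE, ignore.case = TRUE), ]

# Step 3: normalise names: lower case, trim, collapse repeated spaces,
# then strip leading adjectives (assumed list)
tidy$EVTYPE <- tolower(trimws(tidy$EVTYPE))
tidy$EVTYPE <- gsub("\\s+", " ", tidy$EVTYPE)
tidy$EVTYPE <- gsub("^(severe|gusty) ", "", tidy$EVTYPE)

unique(tidy$EVTYPE)
```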
After this tidying process, the number of types of events was still much larger than 48.
length(unique(tidy$EVTYPE))
## [1] 404
In order to unify more types of events, it was necessary to do string searches for synonyms and spelling mistakes and to rename the matches with an event name from the list of the 48 defined types. Whilst searching through the names of the events, some were found to be too ambiguous; these were labelled “bad” and subsequently removed.
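One way such a string search could be done is with `grepl()` and logical indexing, as in the sketch below. The patterns, the misspelling, and the choice of which name counts as ambiguous are all hypothetical examples, not the ones used in the analysis.

```r
# Illustrative event names, including an abbreviation, a plural and a typo
events <- c("tstm wind", "thunderstorm winds", "torndao",
            "flash flooding", "other")

# Rename every match of a pattern to the official event name.
# Order matters in practice: a broad pattern such as "thunderstorm" would
# also catch "marine thunderstorm wind" unless that is handled first.
events[grepl("tstm|thunderstorm", events)] <- "thunderstorm wind"
events[grepl("torn", events)]              <- "tornado"
events[grepl("flash", events)]             <- "flash flood"

# Mark names too ambiguous to classify as "bad", then drop them
events[events %in% c("other")] <- "bad"
events <- events[events != "bad"]

events
```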
After this step, every remaining event type belonged to one of the defined categories. However, not all of the 48 categories appear in the final data set. This is presumably because a few of the categories were not reported to have caused damage, and so were removed at the beginning of the process.
The variable EVTYPE was reclassed as a factor variable using the event types remaining after the above process as the levels.
## [1] "tornado" "thunderstorm wind"
## [3] "hail" "ice storm"
## [5] "winter storm" "hurricane(typhoon)"
## [7] "heavy rain" "lightning"
## [9] "freezing fog" "rip current"
## [11] "flash flood" "heat"
## [13] "high wind" "cold/wind chill"
## [15] "flood" "waterspout"
## [17] "extreme cold/wind chill" "frost/freeze"
## [19] "avalanche" "high surf"
## [21] "heavy snow" "dust storm"
## [23] "sleet" "dust devil"
## [25] "excessive heat" "wildfire"
## [27] "debris flow" "funnel cloud"
## [29] "strong wind" "blizzard"
## [31] "storm surge/tide" "tropical storm"
## [33] "winter weather" "lake-effect snow"
## [35] "coastal flood" "seiche"
## [37] "volcanic ash" "marine thunderstorm wind"
## [39] "tropical depression" "tsunami"
## [41] "marine strong wind" "dense smoke"
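A reclassification of this kind can be sketched as follows, assuming the levels are simply taken in order of first appearance; the text does not say how the level order was chosen, and the rows here are made up.

```r
# Toy character column standing in for the cleaned EVTYPE variable
tidy <- data.frame(EVTYPE = c("tornado", "hail", "tornado", "flood"),
                   stringsAsFactors = FALSE)

# Convert to a factor whose levels are the event types that remain
tidy$EVTYPE <- factor(tidy$EVTYPE, levels = unique(tidy$EVTYPE))

levels(tidy$EVTYPE)
```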
In the original dataset there are two variables relating to population health: FATALITIES and INJURIES. Both are recorded as counts. In order to measure overall harm to population health, a new variable “health” was created as their sum.
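For illustration, the sum can be computed column-wise as below. The values are the FATALITIES and INJURIES entries of the six rows printed earlier; the data frame is rebuilt here only so that the snippet runs on its own.

```r
# FATALITIES and INJURIES for the first six observations shown above
tidy <- data.frame(FATALITIES = c(0, 0, 0, 0, 0, 0),
                   INJURIES   = c(15, 0, 2, 2, 2, 6))

# Casualties per observation: deaths plus injuries
tidy$health <- tidy$FATALITIES + tidy$INJURIES

tidy$health
```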
Economic damage is recorded in four variables in the original dataset: PROPDMG (property damage, numeric), PROPDMGEXP (units of property damage), CROPDMG (damage to crops, numeric), and CROPDMGEXP (units of damage to crops). In order to measure overall economic damage, the PROPDMG and CROPDMG values all needed to be converted into thousands of dollars. PROPDMGEXP and CROPDMGEXP values of k or K denote thousands of dollars, m or M denote millions, and B denotes billions. No information is given as to the meaning of other values of these variables but, as the following tables show, there are relatively few occurrences of other values, and so they were removed from the data set.
table(tidy$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 11534 1 0 5 210 0 1 1 4 18
## 6 7 8 B h H K m M
## 3 3 0 40 1 6 231237 7 11315
table(tidy$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 152451 6 17 0 7 21 99901 1 1982
Now the economic damage can be converted to thousands of dollars and a new variable “econ” is introduced as the sum of PROPDMG and CROPDMG in thousands of dollars.
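The conversion could be sketched with a named lookup vector of multipliers, as below. The rows are invented, and two points are assumptions rather than the analysis's actual choices: lower-case letters are folded to upper case, and blank exponents (whose handling the text does not spell out) are left out of the toy data altogether.

```r
# Toy damage records; the last row has an undocumented exponent ("5")
tidy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.5, 3, 10),
  PROPDMGEXP = c("K", "M", "B", "k", "5"),
  CROPDMG    = c(0, 1, 0, 500, 0),
  CROPDMGEXP = c("K", "K", "K", "K", "K"),
  stringsAsFactors = FALSE
)

# Multipliers that express each unit in thousands of dollars
mult <- c(K = 1, M = 1e3, B = 1e6)

# Keep only rows where both exponents are among the documented units
keep <- toupper(tidy$PROPDMGEXP) %in% names(mult) &
        toupper(tidy$CROPDMGEXP) %in% names(mult)
tidy <- tidy[keep, ]

# Total economic damage per observation, in thousands of dollars
tidy$econ <- unname(tidy$PROPDMG * mult[toupper(tidy$PROPDMGEXP)] +
                    tidy$CROPDMG * mult[toupper(tidy$CROPDMGEXP)])

tidy$econ
```

A lookup vector keeps the unit definitions in one place, so adding or correcting a multiplier does not require touching the arithmetic.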
Next, the dates variable was processed into an appropriate date format.
The observations remaining in the processed dataset range from 1993-01-04 to 2011-11-30.
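A possible form of the date conversion is shown below. It assumes the variable in question is BGN_DATE and that only the calendar date is needed; the strings are taken from the rows printed earlier, where the time part is always 0:00:00.

```r
# BGN_DATE strings as they appear in the raw data
bgn <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "11/15/1951 0:00:00")

# Parse month/day/year; characters after the matched format are ignored
dates <- as.Date(bgn, format = "%m/%d/%Y")

# Earliest and latest dates in this sample
range(dates)
```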
The following figure shows the total number of deaths and injuries caused by each event type between 1993-01-04 and 2011-11-30.
Event number 34 and event number 12 clearly stand out as the most harmful types of events.
These events and total numbers of casualties are:
## tornado
## 13049
## flood
## 6763
Other events that caused less harm are:
## thunderstorm wind ice storm heat
## 2018 1629 1379
## lightning flash flood excessive heat
## 1183 1081 1070
## hurricane(typhoon) wildfire
## 1019 642
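Totals per event type such as those listed above can be obtained by summing within groups and sorting. The sketch below uses `tapply()` on invented numbers; whether the analysis used `tapply()`, `aggregate()` or something else is not stated.

```r
# Toy data: one row per observation, with its event type and casualty count
tidy <- data.frame(
  EVTYPE = factor(c("tornado", "flood", "tornado", "hail", "flood")),
  health = c(15, 3, 2, 0, 4)
)

# Sum casualties within each event type
totals <- tapply(tidy$health, tidy$EVTYPE, sum)

# Rank event types from most to least harmful
top <- sort(totals, decreasing = TRUE)

head(top, 2)
```

The same pattern would apply to the "econ" variable, and passing `totals` to `barplot()` gives a figure with one bar per event type.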
The following figure shows the total damage to property and crops caused by each event type (in thousands of dollars).
Event type 12 clearly stands out as the event causing the greatest economic damage, followed by event 22, and then event 34. Events 16 and 11, and also 33, 31 and 23, are non-trivial. The events and the total damage in thousands of dollars are:
## flood hurricane(typhoon) tornado
## 148545617 44330001 18122666
## hail flash flood ice storm
## 10021701 9223297 5925147
## thunderstorm wind storm surge/tide
## 5512876 4644413