Synopsis

Storm data recorded by the U.S. National Oceanic and Atmospheric Administration (NOAA) was explored to identify the most harmful storm events across the US. Observations between January 1993 and November 2011 were retained after processing the dataset. The total number of casualties (deaths plus injuries) was calculated for each type of storm event as a measure of harm to population health. The total damage to property and crops (in thousands of dollars) was calculated for each type of storm event as a measure of the economic consequences of the events. The events causing the greatest harm to population health were found to be tornadoes and floods. The events with the largest economic consequences were found to be floods, hurricanes and tornadoes.

Data Processing

Data was downloaded as a compressed file from this link. The data was unzipped and read into R as a data frame.
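As a minimal sketch of the reading step: the file name, the bzip2 compression format and the tiny demo contents below are assumptions for illustration, since the report gives neither the URL nor the file name. Here the compressed file is read through a `bzfile()` connection rather than being unzipped first.

```r
# Build a tiny bzip2-compressed CSV in a temporary location (demo data only).
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "TORNADO,0,0"), con)
close(con)

# Read the compressed file straight into a data frame.
storm_demo <- read.csv(bzfile(demo_path), stringsAsFactors = FALSE)

# Rows (observations) and columns (variables).
dim(storm_demo)
```

For the real download, `download.file()` would be called first with the actual URL and a local destination path.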

Size of data: 902297 rows (observations) and 37 columns (variables).

The first few rows of the data look like this:

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

There are 48 storm event types defined in NWSI 10-1605 part 2.1.1.
There are 985 types of events recorded in the data frame.
For consistency, the event types had to be edited and organised into the 48 types. Firstly, any events which caused neither harm to population health nor economic damage are irrelevant to this study, and so were removed. Next, observations recording summary or monthly data are not events, and were also removed. Thereafter, unnecessary adjectives and whitespace in the event names were removed, allowing some events of the same type to be merged. The processed data was copied into a new data frame named ‘tidy’.
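The three tidying steps above might look roughly like the sketch below. The demo rows, the `summary|monthly` pattern and the adjective list are illustrative assumptions, not the patterns used in the report.

```r
# Toy stand-in for the raw data (values invented for illustration).
raw <- data.frame(
  EVTYPE     = c(" TORNADO", "Summary of May 10", "severe thunderstorm wind",
                 "HAIL", "tornado "),
  FATALITIES = c(1, 0, 0, 0, 0),
  INJURIES   = c(3, 0, 0, 0, 2),
  PROPDMG    = c(25, 0, 10, 0, 0),
  CROPDMG    = c(0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# 1. Keep only observations with some casualty or some damage.
harmful   <- with(raw, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
tidy_demo <- raw[harmful, ]

# 2. Drop summary or monthly records, which are not events.
is_summary <- grepl("summary|monthly", tidy_demo$EVTYPE, ignore.case = TRUE)
tidy_demo  <- tidy_demo[!is_summary, ]

# 3. Normalise names: lower case, strip adjectives (hypothetical list),
#    collapse repeated whitespace and trim the ends.
ev <- tolower(tidy_demo$EVTYPE)
ev <- gsub("\\b(severe|major|minor)\\b", "", ev, perl = TRUE)
ev <- trimws(gsub("\\s+", " ", ev, perl = TRUE))
tidy_demo$EVTYPE <- ev

unique(tidy_demo$EVTYPE)
```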

After this tidying process, the number of types of events was still much larger than 48.

length(unique(tidy$EVTYPE))
## [1] 404

To unify more event types, string searches were used to find synonyms and spelling mistakes, which were then renamed with an event name from the list of the 48 defined types. During this search, some event names were found to be too ambiguous; these were labelled “bad” and subsequently removed.
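One way to organise such string searches is a small table of patterns mapped to official names. The patterns, the sample names and the helper `recode_events()` below are hypothetical; the report does not list the searches actually used.

```r
# Sample event names after the first tidying pass (invented for illustration).
ev <- c("tstm wind", "thunderstorm winds", "torndao",
        "hurricane erin", "typhoon", "other", "flood")

# Hypothetical pattern -> official name table. Order matters: first match wins.
patterns <- c(
  "^tstm|thunderstorm" = "thunderstorm wind",
  "torn"               = "tornado",
  "hurricane|typhoon"  = "hurricane(typhoon)",
  "^other$|^\\?$"      = "bad"
)

# Replace each name by the official name of the first pattern it matches;
# names matching no pattern are left unchanged.
recode_events <- function(x, patterns) {
  out  <- x
  done <- rep(FALSE, length(x))
  for (p in names(patterns)) {
    hit      <- !done & grepl(p, x)
    out[hit] <- patterns[[p]]
    done     <- done | hit
  }
  out
}

recoded <- recode_events(ev, patterns)

# Drop the ambiguous names flagged as "bad".
recoded <- recoded[recoded != "bad"]
recoded
```

Because the first matching pattern wins, more specific patterns (for example one for marine thunderstorm wind) would need to appear before more general ones.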

After this step, every remaining event type belonged to one of the defined categories. However, not all of the 48 categories appear in the final data set. This is presumably because a few of the categories were not reported to have caused damage, and so were removed at the beginning of the process.
The variable EVTYPE was converted to a factor, using the event types remaining after the above process as the levels.

##  [1] "tornado"                  "thunderstorm wind"       
##  [3] "hail"                     "ice storm"               
##  [5] "winter storm"             "hurricane(typhoon)"      
##  [7] "heavy rain"               "lightning"               
##  [9] "freezing fog"             "rip current"             
## [11] "flash flood"              "heat"                    
## [13] "high wind"                "cold/wind chill"         
## [15] "flood"                    "waterspout"              
## [17] "extreme cold/wind chill"  "frost/freeze"            
## [19] "avalanche"                "high surf"               
## [21] "heavy snow"               "dust storm"              
## [23] "sleet"                    "dust devil"              
## [25] "excessive heat"           "wildfire"                
## [27] "debris flow"              "funnel cloud"            
## [29] "strong wind"              "blizzard"                
## [31] "storm surge/tide"         "tropical storm"          
## [33] "winter weather"           "lake-effect snow"        
## [35] "coastal flood"            "seiche"                  
## [37] "volcanic ash"             "marine thunderstorm wind"
## [39] "tropical depression"      "tsunami"                 
## [41] "marine strong wind"       "dense smoke"

In the original dataset there are two variables relating to population health: FATALITIES and INJURIES. Both are recorded as numbers of incidents. To measure overall harm to population health, a new variable “health” was created as their sum.
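As a small illustration of the “health” variable, the sketch below uses the FATALITIES and INJURIES values from the six rows printed earlier. The data frame name `first_rows` is an assumption for the example.

```r
# FATALITIES and INJURIES from the first six rows shown above.
first_rows <- data.frame(
  FATALITIES = c(0, 0, 0, 0, 0, 0),
  INJURIES   = c(15, 0, 2, 2, 2, 6)
)

# Casualties per observation: deaths plus injuries.
first_rows$health <- first_rows$FATALITIES + first_rows$INJURIES
first_rows$health
```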

Economic damage is recorded in four variables in the original dataset: PROPDMG (property damage, numeric), PROPDMGEXP (units of property damage), CROPDMG (damage to crops, numeric) and CROPDMGEXP (units of damage to crops). To measure overall economic damage, the PROPDMG and CROPDMG values all needed to be converted into thousands of dollars. PROPDMGEXP and CROPDMGEXP values of k or K denote thousands of dollars, m or M denote millions, and B denotes billions. No information is given as to the meaning of other values of these variables, but as the following tables show, there are relatively few occurrences of other values, and so they were removed from the data set.

table(tidy$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
##  11534      1      0      5    210      0      1      1      4     18 
##      6      7      8      B      h      H      K      m      M 
##      3      3      0     40      1      6 231237      7  11315
table(tidy$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 152451      6     17      0      7     21  99901      1   1982

Now the economic damage can be converted to thousands of dollars and a new variable “econ” is introduced as the sum of PROPDMG and CROPDMG in thousands of dollars.
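A rough sketch of the conversion is given below, with invented demo values. The helper `scale_damage()` and the lookup vector `to_thousands` are hypothetical names. The text does not say how a blank unit code was treated, so the sketch assumes that a zero amount contributes zero whatever its unit code, and that any non-zero amount with an unrecognised code is dropped.

```r
# Demo damage records (values invented for illustration).
dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1.5, 10),
  PROPDMGEXP = c("K", "M", "B", "5"),
  CROPDMG    = c(0, 3, 0, 1),
  CROPDMGEXP = c("", "k", "", "K"),
  stringsAsFactors = FALSE
)

# Multipliers that express each unit in thousands of dollars.
to_thousands <- c(K = 1, M = 1e3, B = 1e6)

# Convert an amount and its unit code to thousands of dollars.
# Assumption: a zero amount is 0 regardless of unit; otherwise an
# unrecognised unit code yields NA.
scale_damage <- function(value, unit) {
  mult <- unname(to_thousands[toupper(unit)])
  ifelse(value == 0, 0, value * mult)
}

dmg$prop_k <- scale_damage(dmg$PROPDMG, dmg$PROPDMGEXP)
dmg$crop_k <- scale_damage(dmg$CROPDMG, dmg$CROPDMGEXP)

# Remove observations whose units could not be interpreted, then sum.
dmg_ok      <- dmg[!is.na(dmg$prop_k) & !is.na(dmg$crop_k), ]
dmg_ok$econ <- dmg_ok$prop_k + dmg_ok$crop_k
dmg_ok$econ
```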

Next, the dates variable was processed into an appropriate date format.

The observations remaining in the processed dataset range from 1993-01-04 to 2011-11-30.
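The date conversion could be done along these lines, using BGN_DATE strings from the rows printed earlier. The format string is an assumption based on how those values appear.

```r
# BGN_DATE values as they appear in the raw data.
bgn <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
         "6/8/1951 0:00:00", "11/15/1951 0:00:00")

# Parse month/day/year; the time part is always midnight and is discarded.
bgn_date <- as.Date(bgn, format = "%m/%d/%Y %H:%M:%S")

# Earliest and latest dates in the sample.
range(bgn_date)
```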

Results

Total number of casualties (fatalities plus injuries) by event type

The following figure shows the total number of deaths and injuries caused by each event type between 1993-01-04 and 2011-11-30.

Event number 34 and event number 12 clearly stand out as the most harmful types of events.
These events and total numbers of casualties are:

## tornado 
##   13049
## flood 
##  6763

Other events, which caused less harm, are:

##  thunderstorm wind          ice storm               heat 
##               2018               1629               1379 
##          lightning        flash flood     excessive heat 
##               1183               1081               1070 
## hurricane(typhoon)           wildfire 
##               1019                642
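Totals per event type such as those above can be obtained by summing within each factor level and sorting. The sketch below uses invented numbers and a hypothetical data frame `demo`; it is not the report's actual code.

```r
# Toy data: one row per observation (values invented for illustration).
demo <- data.frame(
  EVTYPE = factor(c("tornado", "flood", "tornado", "hail", "flood")),
  health = c(10, 4, 5, 0, 3)
)

# Sum casualties within each event type, largest first.
totals <- sort(tapply(demo$health, demo$EVTYPE, sum), decreasing = TRUE)

# The two most harmful event types in the toy data.
head(totals, 2)
```

The same pattern applies to the economic totals by replacing `health` with the “econ” variable.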

Total economic damage by event type

The following figure shows the total damage to property and crops caused by each event type (in thousands of dollars).

Event type 12 clearly stands out as the event causing the greatest economic damage, followed by event 22, and then event 34. Events 16 and 11, and also 33, 31 and 23, are non-trivial. The events and the total damage in thousands of dollars are:

##              flood hurricane(typhoon)            tornado 
##          148545617           44330001           18122666 
##               hail        flash flood          ice storm 
##           10021701            9223297            5925147 
##  thunderstorm wind   storm surge/tide 
##            5512876            4644413