In this study, we used data published in the NOAA Storm Database to identify the types of events that are most harmful to population health and that have the greatest economic consequences.

Data Processing

setwd('~/Documents/')
dat <- read.table('repdata.data.StormData.csv.bz2', header = TRUE, sep = ',', as.is = TRUE)
head(dat, 2)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
## 2         NA         0                         2   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2

The loaded data contain 902297 records covering 985 distinct event types.
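
As a minimal sketch of how such counts can be obtained, the snippet below uses a small invented data frame named toy in place of dat; only the column name EVTYPE comes from the data above, and the values are made up for illustration.

```r
# Toy stand-in for the storm data; the rows are invented for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  stringsAsFactors = FALSE
)

n.records <- nrow(toy)                   # number of records loaded
n.types   <- length(unique(toy$EVTYPE))  # number of distinct event types

n.records
n.types
```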

event <- dat$EVTYPE
event.table <- sort(table(event), decreasing = TRUE)
sum(event.table > 1e4)
## [1] 12
sum(event.table[event.table > 1e4])/sum(event.table)
## [1] 0.9166516

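Only 12 event types have more than 10,000 records each, yet together they account for about 92% of all observations. The pie chart shows the share of each of these 12 types, with all remaining types grouped as 'Others'.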

top.events <- event.table[event.table > 1e4]/sum(event.table)
top.events <- c(1 - sum(top.events), top.events)
names(top.events)[1] <- 'Others'
top.events <- sort(top.events, decreasing = TRUE)
pie(top.events, main = 'Major Events Occurred Frequently')

Results

There are three variables of interest in this dataset: FATALITIES and INJURIES measure public health risk, while PROPDMG measures economic consequences. Both FATALITIES and INJURIES are counts, and in most records their values are zero.

mean(dat$FATALITIES == 0)
## [1] 0.9922708
mean(dat$INJURIES == 0)
## [1] 0.9804898
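
The two calls above rely on the fact that taking the mean of a logical vector gives the proportion of TRUE values. A minimal sketch with an invented vector (the name fatalities.toy and its values are assumptions for illustration only):

```r
# Invented fatality counts for eight hypothetical records.
fatalities.toy <- c(0, 0, 3, 0, 1, 0, 0, 0)

is.zero   <- fatalities.toy == 0  # logical vector: TRUE where the count is zero
prop.zero <- mean(is.zero)        # TRUE counts as 1, FALSE as 0, so this is 6/8

prop.zero
```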

To find the event types most harmful to public health, we calculate the mean of FATALITIES and of INJURIES for each event type. The 10 event types with the highest mean fatalities and the highest mean injuries are listed below.

mean.fat <- sort(tapply(dat$FATALITIES, dat$EVTYPE, mean), decreasing = TRUE)
mean.inj <- sort(tapply(dat$INJURIES, dat$EVTYPE, mean), decreasing = TRUE)
head(mean.fat, 10)
## TORNADOES, TSTM WIND, HAIL              COLD AND SNOW 
##                  25.000000                  14.000000 
##      TROPICAL STORM GORDON      RECORD/EXCESSIVE HEAT 
##                   8.000000                   5.666667 
##               EXTREME HEAT          HEAT WAVE DROUGHT 
##                   4.363636                   4.000000 
##             HIGH WIND/SEAS              MARINE MISHAP 
##                   4.000000                   3.500000 
##              WINTER STORMS        Heavy surf and wind 
##                   3.333333                   3.000000
head(mean.inj, 10)
##               Heat Wave   TROPICAL STORM GORDON              WILD FIRES 
##                70.00000                43.00000                37.50000 
##           THUNDERSTORMW      HIGH WIND AND SEAS         SNOW/HIGH WINDS 
##                27.00000                20.00000                18.00000 
##         GLAZE/ICE STORM       HEAT WAVE DROUGHT WINTER STORM HIGH WINDS 
##                15.00000                15.00000                15.00000 
##       HURRICANE/TYPHOON 
##                14.48864

By mean count per record, the most harmful event types are 'TORNADOES, TSTM WIND, HAIL' for fatalities and 'Heat Wave' for injuries.
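
To make the per-type mean concrete, here is a small sketch on invented data. It also reports the number of records and the total per type, since a type with very few records can reach a high mean. The data frame toy, the type name RARE EVENT, and all values are hypothetical.

```r
# Invented records: one frequent type and one type with a single record.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "RARE EVENT"),
  FATALITIES = c(0, 2, 0, 2, 5),
  stringsAsFactors = FALSE
)

# Per-type mean, total, and record count; all three are ordered by type name.
mean.fat.toy  <- tapply(toy$FATALITIES, toy$EVTYPE, mean)
total.fat.toy <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
n.toy         <- table(toy$EVTYPE)

summary.toy <- data.frame(
  n     = as.vector(n.toy[names(mean.fat.toy)]),
  mean  = as.vector(mean.fat.toy),
  total = as.vector(total.fat.toy),
  row.names = names(mean.fat.toy)
)

# Rank by mean, as in the analysis above.
summary.toy[order(summary.toy$mean, decreasing = TRUE), ]
```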

Similarly, the economic consequences can be measured by the mean of PROPDMG.

mean.eco <- sort(tapply(dat$PROPDMG, dat$EVTYPE, mean), decreasing = TRUE)
head(mean.eco)
##        COASTAL EROSION   HEAVY RAIN AND FLOOD RIVER AND STREAM FLOOD 
##                    766                    600                    600 
##              Landslump  BLIZZARD/WINTER STORM           FLASH FLOOD/ 
##                    570                    500                    500

Thus, by mean PROPDMG, the event type with the greatest economic consequences is COASTAL EROSION.
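
The sample rows at the top show a PROPDMGEXP column next to PROPDMG, with the value K in both rows. Assuming this column is a magnitude code (K for thousands, M for millions, B for billions), the sketch below shows one way the damage could be converted to a common unit before averaging. The data frame toy, the lookup vector exp.mult, and the choice to treat blank or unrecognized codes as a multiplier of 1 are all assumptions for illustration, not part of the analysis above.

```r
# Invented records; only the column names come from the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.5, 500, 10),
  PROPDMGEXP = c("K", "K", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Assumed meaning of the magnitude codes.
exp.mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier for each record; codes not in exp.mult give NA.
mult <- unname(exp.mult[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1  # assumption: blank or unrecognized code means no scaling

toy$prop.usd <- toy$PROPDMG * mult

# Mean damage per event type on the common scale.
mean.usd <- sort(tapply(toy$prop.usd, toy$EVTYPE, mean), decreasing = TRUE)
mean.usd
```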