In this study, we use data from the NOAA Storm Database to identify the weather events that are most harmful to population health and those that have the greatest economic consequences.
setwd('~/Documents/')
dat <- read.table('repdata.data.StormData.csv.bz2', header = TRUE, sep = ',', as.is = TRUE)
head(dat, 2)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14 100 3 0 0
## 2 NA 0 2 150 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
The loaded data contain 902297 records, covering 985 distinct event types.
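The two counts above can be read directly off the loaded data frame. The sketch below uses a small made-up data frame, `toy`, in place of `dat` (its rows are invented for illustration and are not taken from the storm data). It also shows that `EVTYPE` labels differing only in letter case are counted as separate types, which may inflate the number of distinct types.

```r
# Toy stand-in for `dat`; the rows are invented for illustration only.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "Heat Wave", "HEAT WAVE", "TORNADO"),
  FATALITIES = c(0, 0, 1, 2, 0),
  stringsAsFactors = FALSE
)

n.records <- nrow(toy)                                 # number of records
n.types <- length(unique(toy$EVTYPE))                  # distinct labels as recorded
n.types.upper <- length(unique(toupper(toy$EVTYPE)))   # distinct labels after upper-casing

n.records
n.types
n.types.upper
```

On the real data, the same two calls with `dat` in place of `toy` would give the record and event-type counts.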
event <- dat$EVTYPE
event.table <- sort(table(event), decreasing = TRUE)
sum(event.table > 1e4)
## [1] 12
sum(event.table[event.table > 1e4])/sum(event.table)
## [1] 0.9166516
Only 12 event types have more than 10,000 records each, yet together they account for about 92% of all observations. The pie chart shows their shares, with all remaining types grouped as Others.
top.events <- event.table[event.table > 1e4]/sum(event.table)
top.events <- c(1 - sum(top.events), top.events)
names(top.events)[1] <- 'Others'
top.events <- sort(top.events, decreasing = TRUE)
pie(top.events, main = 'Most Frequently Occurring Events')
Three variables in this dataset are of interest: FATALITIES and INJURIES measure the public health risk, while PROPDMG measures the economic consequences. Both FATALITIES and INJURIES are counts, and in most records their values are zero.
mean(dat$FATALITIES == 0)
## [1] 0.9922708
mean(dat$INJURIES == 0)
## [1] 0.9804898
To find the events most harmful to public health, we calculate the mean of FATALITIES and of INJURIES for each event type. The 10 event types with the highest mean fatalities and the highest mean injuries are listed below.
mean.fat <- sort(tapply(dat$FATALITIES, dat$EVTYPE, mean), decreasing = TRUE)
mean.inj <- sort(tapply(dat$INJURIES, dat$EVTYPE, mean), decreasing = TRUE)
head(mean.fat, 10)
## TORNADOES, TSTM WIND, HAIL    25.000000
## COLD AND SNOW                 14.000000
## TROPICAL STORM GORDON          8.000000
## RECORD/EXCESSIVE HEAT          5.666667
## EXTREME HEAT                   4.363636
## HEAT WAVE DROUGHT              4.000000
## HIGH WIND/SEAS                 4.000000
## MARINE MISHAP                  3.500000
## WINTER STORMS                  3.333333
## Heavy surf and wind            3.000000
head(mean.inj, 10)
## Heat Wave                     70.00000
## TROPICAL STORM GORDON         43.00000
## WILD FIRES                    37.50000
## THUNDERSTORMW                 27.00000
## HIGH WIND AND SEAS            20.00000
## SNOW/HIGH WINDS               18.00000
## GLAZE/ICE STORM               15.00000
## HEAT WAVE DROUGHT             15.00000
## WINTER STORM HIGH WINDS       15.00000
## HURRICANE/TYPHOON             14.48864
By mean count per record, the most harmful event type is TORNADOES, TSTM WIND, HAIL for fatalities and Heat Wave for injuries.
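Because these are per-record means, an event type with very few records can rank first on the strength of a single severe event. A minimal sketch on invented data (the data frame `toy` and its values are hypothetical) shows one way to put the record count and the total next to each mean, so that a ranking can be read with its sample size in view:

```r
# Invented records: one rare type with a single severe event,
# one common type with several events of varying severity.
toy <- data.frame(
  EVTYPE = c("RARE EVENT", rep("COMMON EVENT", 4)),
  FATALITIES = c(25, 10, 0, 20, 10),
  stringsAsFactors = FALSE
)

# Per-type mean, total, and number of records.
fat.mean  <- tapply(toy$FATALITIES, toy$EVTYPE, mean)
fat.total <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
fat.n     <- tapply(toy$FATALITIES, toy$EVTYPE, length)

# Collect the three summaries in one table, one row per event type.
summary.tab <- data.frame(
  mean  = as.vector(fat.mean),
  total = as.vector(fat.total),
  n     = as.vector(fat.n),
  row.names = names(fat.mean)
)

# Rank by mean, as in the analysis above, but keep total and n visible.
summary.tab[order(summary.tab$mean, decreasing = TRUE), ]
```

In this toy case the rare type has the higher mean even though the common type has the larger total, which is the pattern to watch for when reading the lists above.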
Similarly, the economic consequences can be measured by the mean of PROPDMG.
mean.eco <- sort(tapply(dat$PROPDMG, dat$EVTYPE, mean), decreasing = TRUE)
head(mean.eco)
## COASTAL EROSION               766
## HEAVY RAIN AND FLOOD          600
## RIVER AND STREAM FLOOD        600
## Landslump                     570
## BLIZZARD/WINTER STORM         500
## FLASH FLOOD/                  500
Thus, the event that has the greatest economic consequences is COASTAL EROSION.
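This ranking uses the raw PROPDMG figure. The data also carry a PROPDMGEXP column (K in the two rows printed by `head(dat, 2)`), which appears to be a magnitude code. The sketch below assumes that K, M, and B stand for thousands, millions, and billions of dollars. That assumption should be checked against the database documentation. The data frame `toy` and the helper `exp.to.multiplier` are hypothetical, and the helper treats any other code as a multiplier of 1.

```r
# Invented rows; PROPDMG is the recorded figure, PROPDMGEXP the magnitude code.
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG = c(2.5, 1.0, 500, 250),
  PROPDMGEXP = c("M", "B", "K", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumes K = 1e3, M = 1e6, B = 1e9; any other code is treated as 1.
exp.to.multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Damage in dollars under the assumed coding.
toy$damage <- toy$PROPDMG * exp.to.multiplier(toy$PROPDMGEXP)

# Mean per event type, with and without the multiplier.
mean.raw <- sort(tapply(toy$PROPDMG, toy$EVTYPE, mean), decreasing = TRUE)
mean.scaled <- sort(tapply(toy$damage, toy$EVTYPE, mean), decreasing = TRUE)

mean.raw
mean.scaled
```

In the toy data the two rankings disagree, so it may be worth repeating the comparison on `dat` before relying on the raw means.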