Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Using the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), this report analyses which event types cause the most economic damage and the most harm to public health.

Most fatalities are caused by tornados and excessive heat, while most injuries are caused by tornados. Economic damage, defined as the sum of crop and property damage, is caused mostly by floods, followed by hurricanes/typhoons, tornados and storm surges.

Data Processing

First, the required libraries are loaded.

library(knitr)
library(ggplot2)
library(data.table)
library(dplyr)
library(gridExtra)

Then, the storm data file is read as a CSV file and only the columns required for the analysis are kept.

stormdata <- read.csv("repdata-data-StormData.csv.bz2")
stormdata <- stormdata[,c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG',
                          'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
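
As a small illustration of this step, the sketch below writes a toy table to a temporary bzip2-compressed CSV file, reads it back with read.csv (which decompresses such files transparently) and keeps only the needed columns. The toy rows and the extra STATE column are made up for the example and do not come from the storm data.

```r
# Toy data; the values and the STATE column are illustrative only.
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO"),
                  STATE = c("TX", "OK"),
                  FATALITIES = c(1, 2),
                  stringsAsFactors = FALSE)

# Write the toy table to a temporary bzip2-compressed CSV file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv detects the compression itself; then keep the needed columns.
readBack <- read.csv(tmp)
readBack <- readBack[, c("EVTYPE", "FATALITIES")]
unlink(tmp)

print(readBack)
```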

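The EVTYPE column, which stands for event type, is converted to a factor. The magnitudes of the PROPDMG and CROPDMG columns are determined by PROPDMGEXP and CROPDMGEXP respectively. The two EXP columns hold a multiplier code: K stands for thousands, M for millions and B for billions. Any other code is ignored in this analysis and the amount is left as recorded.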

The columns PROPDMG and CROPDMG are properly transformed using their EXP columns as described above.

stormdata$EVTYPE <- as.factor(stormdata$EVTYPE)

stormdata$PROPDMG <- ifelse(stormdata$PROPDMGEXP == 'K',
                            stormdata$PROPDMG*1000,
                            ifelse(stormdata$PROPDMGEXP == 'M',
                                   stormdata$PROPDMG*1000000,
                                   ifelse(stormdata$PROPDMGEXP == 'B',
                                          stormdata$PROPDMG*1000000000,
                                          stormdata$PROPDMG)))

stormdata$CROPDMG <- ifelse(stormdata$CROPDMGEXP == 'K',
                            stormdata$CROPDMG*1000,
                            ifelse(stormdata$CROPDMGEXP == 'M',
                                   stormdata$CROPDMG*1000000,
                                   ifelse(stormdata$CROPDMGEXP == 'B',
                                          stormdata$CROPDMG*1000000000,
                                          stormdata$CROPDMG)))
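
The nested ifelse calls above can also be written with a named lookup vector. The sketch below is an alternative under stated assumptions, not the code used for this report: the helper apply_exponent and the toy data are hypothetical. Like the ifelse version, it matches the upper-case codes K, M and B only and leaves every other row unchanged.

```r
# Multipliers for the exponent codes handled in this analysis.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale an amount by the multiplier of its code.
apply_exponent <- function(amount, exp_code) {
  # Codes that are not names of `multipliers` give NA in the lookup ...
  mult <- unname(multipliers[as.character(exp_code)])
  # ... and are treated as "no multiplier".
  mult[is.na(mult)] <- 1
  amount * mult
}

# Toy data (illustrative only); "" and lower-case "k" stay unscaled.
toy <- data.frame(DMG = c(25, 2.5, 1, 10, 5),
                  EXP = c("K", "M", "B", "", "k"),
                  stringsAsFactors = FALSE)
toy$USD <- apply_exponent(toy$DMG, toy$EXP)
print(toy)
```

If lower-case codes should be scaled as well, wrapping exp_code in toupper() would be one option, but that would change the totals reported below.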

The table and plot in the Results section show that tornados and excessive heat cause the most fatalities, and that tornados cause the most injuries.

Results

The data are analysed to show which types of events are most harmful to population health and which have the greatest economic consequences.

Most harmful for population health

First, fatalities and injuries are summed by event type. Next, the two sums are added into a combined total per event type, and the result is sorted in descending order and cut to the top ten to give a quick overview of the harm caused by each event type.

agg <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data=stormdata, sum)

totalAgg <- agg
totalAgg$TOTAL <- totalAgg$FATALITIES+totalAgg$INJURIES

totalAgg <- totalAgg[order(-totalAgg$TOTAL),]
totalAgg <- head(totalAgg, 10)
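
To make the aggregation step concrete, the sketch below applies the same pattern to a small made-up data frame. The toy values are illustrative only; the calls (aggregate with cbind on the left-hand side of the formula, then order) mirror the code above.

```r
# Toy data; the counts are invented for the example.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both columns per event type in one call.
toyAgg <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, sum)

# Combined total, sorted from most to least harmful.
toyAgg$TOTAL <- toyAgg$FATALITIES + toyAgg$INJURIES
toyAgg <- toyAgg[order(-toyAgg$TOTAL), ]
print(toyAgg)
```

In the toy result TORNADO comes first with a total of 20, followed by HEAT with 5 and FLOOD with 3.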

The contents of totalAgg can be seen below.

kable(totalAgg)
     EVTYPE             FATALITIES  INJURIES  TOTAL
830  TORNADO                  5633     91346  96979
123  EXCESSIVE HEAT           1903      6525   8428
854  TSTM WIND                 504      6957   7461
164  FLOOD                     470      6789   7259
452  LIGHTNING                 816      5230   6046
269  HEAT                      937      2100   3037
147  FLASH FLOOD               978      1777   2755
424  ICE STORM                  89      1975   2064
759  THUNDERSTORM WIND         133      1488   1621
972  WINTER STORM              206      1321   1527

fatalitiesOrdered <- agg[order(-agg$FATALITIES),]
fatalitiesOrdered <- head(fatalitiesOrdered, 10)
injuriesOrdered <- agg[order(-agg$INJURIES),]
injuriesOrdered <- head(injuriesOrdered, 10)

To get an ordered bar plot, the EVTYPE factor levels of both the fatalities and the injuries tables are reordered by their counts.

fatalitiesOrdered$EVTYPE <- factor(fatalitiesOrdered$EVTYPE,
                                   levels = fatalitiesOrdered[
                                     order(fatalitiesOrdered$FATALITIES), "EVTYPE"])
injuriesOrdered$EVTYPE <- factor(injuriesOrdered$EVTYPE,
                                 levels = injuriesOrdered[
                                   order(injuriesOrdered$INJURIES), "EVTYPE"])
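
The reordering matters because ggplot2 draws bars in the order of the factor levels, which is alphabetical by default. The sketch below shows the effect on a made-up three-row table and, as an assumption about what a reader may prefer, the shorter base R alternative reorder(), which is not used in this report.

```r
# Toy data; the counts are invented for the example.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HEAT"),
                  FATALITIES = c(20, 9, 3),
                  stringsAsFactors = FALSE)

# Default factor levels are alphabetical.
defaultLevels <- levels(factor(toy$EVTYPE))

# Same pattern as above: levels sorted by ascending count.
toy$EVTYPE <- factor(toy$EVTYPE,
                     levels = toy[order(toy$FATALITIES), "EVTYPE"])
sortedLevels <- levels(toy$EVTYPE)

# Alternative: reorder() sorts the levels by a second variable.
reorderedLevels <- levels(reorder(as.character(toy$EVTYPE), toy$FATALITIES))

print(defaultLevels)
print(sortedLevels)
print(reorderedLevels)
```

With the levels in ascending order and coord_flip() applied, the largest bar ends up at the top of the plot.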

Next, bar plots of the top ten event types by fatalities and by injuries are created.

fatalitiesPlot <- ggplot(fatalitiesOrdered, aes(x=EVTYPE, y=FATALITIES)) + 
  geom_bar(stat='identity', position='dodge') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  ggtitle('Top total fatalities by event') +
  coord_flip()

injuriesPlot <- ggplot(injuriesOrdered, aes(x=EVTYPE, y=INJURIES)) + 
  geom_bar(stat='identity') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  ggtitle('Top total injuries by event') +
  coord_flip()

grid.arrange(fatalitiesPlot, injuriesPlot, ncol=2)

The left plot shows the top total fatalities by event type and the right plot shows the top total injuries by event type. Because the coordinates are flipped, the horizontal axis shows the number of fatalities or injuries and the vertical axis shows the event type that caused them.

Greatest economic consequences

The damage is aggregated by event type and then the property damage and crop damage are summed to get the total economic damage by event type.

economic <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data=stormdata, sum)

economic$TOTAL <- economic$PROPDMG + economic$CROPDMG

economic <- economic[order(-economic$TOTAL),]
economic <- head(economic, 10)

The contents of the variable economic are shown in the table below.

kable(economic)
     EVTYPE                  PROPDMG      CROPDMG         TOTAL
164  FLOOD              144657709807   5661968450  150319678257
406  HURRICANE/TYPHOON   69305840000   2607872800   71913712800
830  TORNADO             56925660790    414953270   57340614060
666  STORM SURGE         43323536000         5000   43323541000
238  HAIL                15727367053   3025537890   18752904943
147  FLASH FLOOD         16140812067   1421317100   17562129167
88   DROUGHT              1046106000  13972566000   15018672000
397  HURRICANE           11868319010   2741910000   14610229010
586  RIVER FLOOD          5118945500   5029459000   10148404500
424  ICE STORM            3944927860   5022113500    8967041360
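
As a quick check on the TOTAL column, the FLOOD row can be recomputed by hand. The sketch below copies the two damage figures from the table and also expresses the total in billions of USD, which is easier to read than the raw number. The variable names are chosen for the example only.

```r
# FLOOD row, values copied from the table above (in USD).
flood <- c(PROPDMG = 144657709807, CROPDMG = 5661968450)

# Total economic damage is property damage plus crop damage.
floodTotal <- sum(flood)

# Same figure in billions of USD, rounded to one decimal.
floodBillions <- round(floodTotal / 1e9, 1)

print(floodTotal == 150319678257)  # matches the TOTAL column
print(floodBillions)               # about 150.3 billion USD
```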

Next, the plot of top economic damage by event type is created.

economic$EVTYPE <- factor(economic$EVTYPE,
                          levels = economic[order(economic$TOTAL), "EVTYPE"])

ggplot(economic, aes(x=EVTYPE, y=TOTAL)) + 
  geom_bar(stat='identity', position='dodge') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  ggtitle('Top economic damage by event type') +
  coord_flip()

This plot shows the total economic damage per event type. Because the coordinates are flipped, the horizontal axis shows the total damage in USD and the vertical axis (EVTYPE) shows the event type that caused the damage.

As seen in the table and plot above, floods and hurricanes/typhoons cause the most economic damage, followed by tornados, storm surges and hail.