Synopsis

The goal of this assignment is to explore the NOAA Storm Database and determine which types of severe weather events have the most serious effects on population health and the economy.

Documentation of the data can be found at the following link:

Storm Data documentation

The total numbers of fatalities and injuries and the total damages were calculated for each event type, and the largest totals were displayed as bar plots in decreasing order.

R version 4.0.2 [1], RStudio (version 1.3.1056), and the plyr [2], ggplot2 [3], and R.utils [4] packages were used for data processing, analysis, and plotting.

  1. R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  2. Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/.
  3. Hadley Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
  4. Henrik Bengtsson (2019). R.utils: Various Programming Utilities. R package version 2.9.2. https://CRAN.R-project.org/package=R.utils

Data Processing

The storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA) was used for the analyses. It can be downloaded from the course website:

Storm data

The data were extracted and imported using the read.csv() function. Two subsets were made: the first (stormdata_health) contains the records where the number of fatalities or injuries was greater than zero, and the second (stormdata_dmg) contains the records where at least one type of damage (property or crop) was greater than zero.

Using the tapply() function, the total numbers of fatalities and injuries were calculated for each event type. To reduce the number of event types to plot, I kept only those with more than 100 fatalities in the fatality summary and those with more than 1000 injuries in the injury summary. These totals were plotted using the ggplot() function.
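The aggregation step can be illustrated on a small example. The sketch below uses made-up numbers (the data frame toy and its values are hypothetical, not taken from the NOAA database) and only shows how tapply() sums a column within each event type before a cutoff is applied.

```r
# Toy data: hypothetical values, not taken from the NOAA database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(90, 20, 30, 5, 10),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type; the result is a named 1-d array
toy_fatal <- with(toy, tapply(FATALITIES, EVTYPE, sum))

# Keep only the event types whose total exceeds the cutoff
toy_fatal_top <- toy_fatal[toy_fatal > 100]

# Convert to a data frame so that it can be passed to ggplot()
toy_fatal_df <- data.frame(EVTYPE     = names(toy_fatal_top),
                           FATALITIES = as.vector(toy_fatal_top))
print(toy_fatal_df)
```

Here only TORNADO (90 + 30 = 120) passes the cutoff of 100; FLOOD (30) and HEAT (5) are dropped.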

For damages, the PROPDMGEXP and CROPDMGEXP variables had to be recoded to the numeric multipliers given in the documentation, and the total damages were then calculated by multiplying PROPDMG and CROPDMG by these multipliers.
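To make the recoding concrete, here is a minimal sketch on hypothetical records. It uses base R match() in place of plyr's mapvalues(), purely for illustration, and covers only four of the exponent codes; the names toy_dmg, codes and mult are assumptions introduced for this example.

```r
# Hypothetical records; only a few exponent codes are shown
toy_dmg <- data.frame(
  PROPDMG    = c(2.5, 10, 1, 3),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Lookup table: exponent code -> numeric multiplier
# (an empty code is mapped to 0, as in this report's mapping)
codes <- c("K", "M", "B", "")
mult  <- c(1e3, 1e6, 1e9, 0)

# match() gives the position of each code in the lookup table
toy_dmg$PROPDMGEXP_num <- mult[match(toy_dmg$PROPDMGEXP, codes)]

# Damage in dollars = reported amount * multiplier
toy_dmg$PROPDMG_tot <- toy_dmg$PROPDMG * toy_dmg$PROPDMGEXP_num
print(toy_dmg)
```

For instance, a record with PROPDMG = 2.5 and PROPDMGEXP = "K" corresponds to 2,500 dollars, and a record with an empty code contributes 0.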

After these calculations, the two types of damage were added up, and a subset was made of the event types where the total damage was more than 10 billion dollars. The amounts were converted to units of 1 billion dollars, and a bar graph was made in order of decreasing damage.
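The combination step can be sketched the same way. The values below are hypothetical, and the cutoff of 1e10 mirrors the one in this report's code; the sketch only shows the order of operations: add the two damage types, sum by event type, filter, then rescale to billions.

```r
# Hypothetical per-record damages in dollars (not NOAA data)
toy <- data.frame(
  EVTYPE      = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
  PROPDMG_tot = c(8e9, 2e9, 5e9, 1e9),
  CROPDMG_tot = c(1e9, 1e9, 0,   3e9),
  stringsAsFactors = FALSE
)

# Total damage per record = property damage + crop damage
toy$tot <- toy$PROPDMG_tot + toy$CROPDMG_tot

# Sum by event type, then keep totals above the cutoff
toy_sum <- with(toy, tapply(tot, EVTYPE, sum))
toy_top <- toy_sum[toy_sum > 1e10]

# Express the remaining totals in billions of dollars
toy_top_df <- data.frame(EVTYPE = names(toy_top),
                         DAMAGE = as.vector(toy_top) / 1e9)
print(toy_top_df)
```

In this toy data only FLOOD remains, with a total of 14 (billion dollars).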

stormdata<-read.csv("repdata_data_StormData.csv")

# Subset the records with at least one fatality or injury

stormdata_health <-stormdata[stormdata$FATALITIES>0 | stormdata$INJURIES>0, c("EVTYPE", "FATALITIES", "INJURIES")]
storm_fatal<-with(stormdata,tapply(FATALITIES, EVTYPE, sum))
storm_fatal<-storm_fatal[storm_fatal>100]
storm_fatal_df<-data.frame(EVTYPE=names(storm_fatal),FATALITIES=storm_fatal)
storm_inj<-with(stormdata,tapply(INJURIES, EVTYPE, sum))
storm_inj<-storm_inj[storm_inj>1000]
storm_inj_df<-data.frame(EVTYPE=names(storm_inj), INJURIES=storm_inj)
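library(plyr)     # mapvalues()
library(ggplot2)  # ggplot(), used for the plots in the Results section
library(knitr)    # kable(), used for the tables in the Results section

# Subset the records with positive crop or property damage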
stormdata_dmg <-stormdata[stormdata$CROPDMG>0 | stormdata$PROPDMG>0, c("EVTYPE", "CROPDMG", "PROPDMG","PROPDMGEXP","CROPDMGEXP")]

# Property damages

stormdata_dmg$PROPDMGEXP_num <- mapvalues(stormdata_dmg$PROPDMGEXP, from = c("K", "M", "",  "B", "m", "+", "0", "5", "6", "4", "2", "3", "h", "7", "H", "-"), to = c(1000, 1000000, 0, 1000000000, 1000000, 10, 10, 10, 0, 10, 10, 10, 100,10, 100, 0))
stormdata_dmg$PROPDMGEXP_num <- as.numeric(stormdata_dmg$PROPDMGEXP_num)
stormdata_dmg$PROPDMG_tot <- stormdata_dmg$PROPDMG * stormdata_dmg$PROPDMGEXP_num

# Crop damages

stormdata_dmg$CROPDMGEXP_num <- mapvalues(stormdata_dmg$CROPDMGEXP, from = c("", "M", "K", "m", "B", "?", "0", "k"), to = c(0, 1000000, 1000, 1000000, 1000000000, 0, 10, 1000))
stormdata_dmg$CROPDMGEXP_num <- as.numeric(stormdata_dmg$CROPDMGEXP_num)
stormdata_dmg$CROPDMG_tot <- stormdata_dmg$CROPDMG * stormdata_dmg$CROPDMGEXP_num
stormdata_dmg$tot<-stormdata_dmg$CROPDMG_tot+stormdata_dmg$PROPDMG_tot
storm_dmg<-with(stormdata_dmg,tapply(tot, EVTYPE, sum))
storm_dmg<-storm_dmg[storm_dmg>10000000000]
storm_dmg_df<-data.frame(EVTYPE=names(storm_dmg),DAMAGE=storm_dmg/1000000000)

Results

Total number of fatalities by event type

kable(storm_fatal, row.names=TRUE, col.names = "Number of deaths", caption = "Table 1. Number of deaths by event type")

Table 1. Number of deaths by event type

Event type                Number of deaths
AVALANCHE                 224
BLIZZARD                  101
EXCESSIVE HEAT            1903
EXTREME COLD              160
EXTREME COLD/WIND CHILL   125
FLASH FLOOD               978
FLOOD                     470
HEAT                      937
HEAT WAVE                 172
HEAVY SNOW                127
HIGH SURF                 101
HIGH WIND                 248
LIGHTNING                 816
RIP CURRENT               368
RIP CURRENTS              204
STRONG WIND               103
THUNDERSTORM WIND         133
TORNADO                   5633
TSTM WIND                 504
WINTER STORM              206

Plot_fatal <-  ggplot(data=storm_fatal_df, aes(x=reorder(EVTYPE, -FATALITIES), y=FATALITIES)) +
              geom_bar(fill="blue",stat="identity")  +  
              ylab("Number of fatalities") + xlab("Event type") +
              theme(legend.position="none") + theme(axis.text.x = element_text(size = 6, angle = 30, hjust = 1))
print(Plot_fatal)
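
The decreasing order of the bars comes from reorder(), which sorts the factor levels by the supplied values; negating the values reverses the order. A minimal sketch with hypothetical numbers (the data frame toy is an assumption made for this example):

```r
# Hypothetical totals, not taken from the NOAA database
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(30, 5, 120),
  stringsAsFactors = FALSE
)

# reorder() sorts levels by increasing value, so negating the values
# puts the event type with the largest total first
ordered_evtype <- reorder(toy$EVTYPE, -toy$FATALITIES)
print(levels(ordered_evtype))
# → "TORNADO" "FLOOD" "HEAT"
```

ggplot() draws a discrete x axis in level order, which is why the largest bar appears on the left.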

Total number of injuries by event type

kable(storm_inj, row.names=TRUE, col.names = "Number of injuries", caption = "Table 2. Number of injuries by event type")

Table 2. Number of injuries by event type

Event type          Number of injuries
EXCESSIVE HEAT      6525
FLASH FLOOD         1777
FLOOD               6789
HAIL                1361
HEAT                2100
HEAVY SNOW          1021
HIGH WIND           1137
HURRICANE/TYPHOON   1275
ICE STORM           1975
LIGHTNING           5230
THUNDERSTORM WIND   1488
TORNADO             91346
TSTM WIND           6957
WINTER STORM        1321

Total number of injuries by event type

Plot_inj <-  ggplot(data=storm_inj_df, aes(x=reorder(EVTYPE, -INJURIES), y=INJURIES)) +
              geom_bar(fill="red", stat="identity")  +  
              ylab("Number of injuries") + xlab("Event type") +
              theme(legend.position="none") + theme(axis.text.x = element_text(size = 6, angle = 30, hjust = 1))
print(Plot_inj)

Total damage by event type

kable(storm_dmg, row.names=TRUE, col.names = "Damage in $", caption = "Table 3. Damage by event type")

Table 3. Damage by event type

Event type          Damage in $
DROUGHT             15018672000
FLASH FLOOD         17562132111
FLOOD               150319678250
HAIL                18758224527
HURRICANE           14610229010
HURRICANE/TYPHOON   71913712800
RIVER FLOOD         10148404500
STORM SURGE         43323541000
TORNADO             57352118147

Plot_dmg <-  ggplot(data=storm_dmg_df, aes(x=reorder(EVTYPE, -DAMAGE), y=DAMAGE)) +
              geom_bar(fill="darkgreen",stat="identity")  +  
              ylab("Damage in billion dollars") + xlab("Event type") +
              theme(legend.position="none") + theme(axis.text.x = element_text(size = 6, angle = 30, hjust = 1))
print(Plot_dmg)

Conclusion

Based on the tables and figures, tornadoes have the largest impact on population health (both fatalities and injuries), and floods have the largest economic impact.