The goal of this assignment is to explore the NOAA Storm Database and identify which severe weather events have the most serious effects on population health and the economy.
Documentation of the data can be found at the following link:
Total numbers of fatalities and injuries, and total damages, were calculated for each event type; the largest totals are shown on bar plots in decreasing order.
R version 4.0.2 [1], RStudio (Version 1.3.1056), and the plyr [2], ggplot2 [3] and R.utils [4] packages were used for data processing, analysis and plotting.
The storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA) was used for the analyses; it can be downloaded from the course website:
The data were extracted and imported using the read.csv() function. Two subsets were made: the first (stormdata_health) contains the records where the number of fatalities or injuries was neither zero nor missing, and the second (stormdata_dmg) contains the records where at least one type of damage was neither zero nor missing.
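The subsetting rule described above can be sketched on a small made-up data frame. This is only an illustration, not the code used in this report: `toy`, `keep` and `toy_health` are hypothetical names and the values are invented. The sketch uses `which()`, which is one common way to make sure that rows whose comparison evaluates to `NA` are dropped instead of being returned as all-`NA` rows.

```r
# Toy data standing in for the storm database (values are invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(3, 0, NA, 0),
  INJURIES   = c(10, 0, 2, NA),
  stringsAsFactors = FALSE
)

# which() returns only the positions where the condition is TRUE,
# so rows where the condition is NA (here: HEAT) are excluded
keep <- which(toy$FATALITIES > 0 | toy$INJURIES > 0)
toy_health <- toy[keep, c("EVTYPE", "FATALITIES", "INJURIES")]
print(toy_health)
```

Indexing with the bare logical vector instead of `which()` would keep an extra row filled with `NA` for every record whose condition cannot be decided.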
Using the tapply() function, the total numbers of fatalities and injuries were calculated for each event type. To reduce the number of events to plot, I kept the event types with more than 100 fatalities and, separately, those with more than 1000 injuries. These totals were plotted using the ggplot() function.
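As a minimal sketch of this aggregation step, the example below sums fatalities by event type on invented data, keeps the totals above a threshold and sorts them in decreasing order. The names `toy`, `totals`, `large` and `large_df` are hypothetical, and the numbers are not taken from the storm database.

```r
# Invented records: several rows can share one event type
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL", "FLOOD"),
  FATALITIES = c(120, 30, 90, 0, 20),
  stringsAsFactors = FALSE
)

# Total fatalities per event type (a named 1-d array)
totals <- with(toy, tapply(FATALITIES, EVTYPE, sum))

# Keep only event types above the threshold used for fatalities in this report
large <- totals[totals > 100]

# Convert to a data frame and sort in decreasing order, ready for plotting
large_df <- data.frame(
  EVTYPE     = names(large),
  FATALITIES = as.vector(large),
  stringsAsFactors = FALSE
)
large_df <- large_df[order(-large_df$FATALITIES), ]
print(large_df)
```

The same pattern applies to injuries with a threshold of 1000.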
For damages, the PROPDMGEXP and CROPDMGEXP variables had to be recoded to the numeric multipliers given in the documentation, and the total damages were calculated by multiplying the damage values by these multipliers.
After these calculations, the two types of damages were added up, and a subset was made of the events where the total damage was more than 10 billion dollars. The amounts were converted to units of 1 billion dollars, and a bar plot was made in order of decreasing damage.
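The recoding and totalling steps can also be sketched in base R with a named lookup vector, as an alternative to plyr's mapvalues(). This is an illustration under stated assumptions, not the method used in this report: `multiplier` and `toy` are hypothetical names, the data are invented, only a few exponent codes are listed, and treating every unlisted code (including the empty string) as 0 is an assumption made for the example.

```r
# Multipliers for a few exponent codes (K = thousand, M = million, B = billion)
multiplier <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Invented damage records
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 10, 500, 1),
  PROPDMGEXP = c("B", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Look up each code by name; codes missing from the table come back as NA
toy$PROPDMGEXP_num <- unname(multiplier[toy$PROPDMGEXP])
# Assumption for this sketch: unknown or empty codes contribute no damage
toy$PROPDMGEXP_num[is.na(toy$PROPDMGEXP_num)] <- 0

# Damage in dollars, then totals per event type in billions of dollars
toy$PROPDMG_tot <- toy$PROPDMG * toy$PROPDMGEXP_num
dmg_billion <- with(toy, tapply(PROPDMG_tot, EVTYPE, sum)) / 1e9
print(dmg_billion)
```

A lookup vector keeps the multipliers numeric from the start, so no as.numeric() conversion is needed afterwards.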
stormdata<-read.csv("repdata_data_StormData.csv")
# Subset the records with at least one fatality or injury
stormdata_health <-stormdata[stormdata$FATALITIES>0 | stormdata$INJURIES>0, c("EVTYPE", "FATALITIES", "INJURIES")]
storm_fatal<-with(stormdata,tapply(FATALITIES, EVTYPE, sum))
storm_fatal<-storm_fatal[storm_fatal>100]
storm_fatal_df<-data.frame(EVTYPE=names(storm_fatal),FATALITIES=storm_fatal)
storm_inj<-with(stormdata,tapply(INJURIES, EVTYPE, sum))
storm_inj<-storm_inj[storm_inj>1000]
storm_inj_df<-data.frame(EVTYPE=names(storm_inj), INJURIES=storm_inj)
library(plyr)
stormdata_dmg <-stormdata[stormdata$CROPDMG>0 | stormdata$PROPDMG>0, c("EVTYPE", "CROPDMG", "PROPDMG","PROPDMGEXP","CROPDMGEXP")]
# Property damages
stormdata_dmg$PROPDMGEXP_num <- mapvalues(stormdata_dmg$PROPDMGEXP, from = c("K", "M", "", "B", "m", "+", "0", "5", "6", "4", "2", "3", "h", "7", "H", "-"), to = c(1000, 1000000, 0, 1000000000, 1000000, 10, 10, 10, 0, 10, 10, 10, 100,10, 100, 0))
stormdata_dmg$PROPDMGEXP_num <- as.numeric(stormdata_dmg$PROPDMGEXP_num)
stormdata_dmg$PROPDMG_tot <- stormdata_dmg$PROPDMG * stormdata_dmg$PROPDMGEXP_num
# Crop damages
stormdata_dmg$CROPDMGEXP_num <- mapvalues(stormdata_dmg$CROPDMGEXP, from = c("", "M", "K", "m", "B", "?", "0", "k"), to = c(0, 1000000, 1000, 1000000, 1000000000, 0, 10, 1000))
stormdata_dmg$CROPDMGEXP_num <- as.numeric(stormdata_dmg$CROPDMGEXP_num)
stormdata_dmg$CROPDMG_tot <- stormdata_dmg$CROPDMG * stormdata_dmg$CROPDMGEXP_num
stormdata_dmg$tot<-stormdata_dmg$CROPDMG_tot+stormdata_dmg$PROPDMG_tot
storm_dmg<-with(stormdata_dmg,tapply(tot, EVTYPE, sum))
storm_dmg<-storm_dmg[storm_dmg>10000000000]
storm_dmg_df<-data.frame(EVTYPE=names(storm_dmg),DAMAGE=storm_dmg/1000000000)
library(knitr)
kable(storm_fatal, row.names=TRUE, col.names = "Number of deaths", caption = "Table 1. Number of deaths by event type")
| | Number of deaths |
|---|---|
| AVALANCHE | 224 |
| BLIZZARD | 101 |
| EXCESSIVE HEAT | 1903 |
| EXTREME COLD | 160 |
| EXTREME COLD/WIND CHILL | 125 |
| FLASH FLOOD | 978 |
| FLOOD | 470 |
| HEAT | 937 |
| HEAT WAVE | 172 |
| HEAVY SNOW | 127 |
| HIGH SURF | 101 |
| HIGH WIND | 248 |
| LIGHTNING | 816 |
| RIP CURRENT | 368 |
| RIP CURRENTS | 204 |
| STRONG WIND | 103 |
| THUNDERSTORM WIND | 133 |
| TORNADO | 5633 |
| TSTM WIND | 504 |
| WINTER STORM | 206 |
library(ggplot2)
# Bar plot of total fatalities by event type, in decreasing order
Plot_fatal <- ggplot(data=storm_fatal_df, aes(x=reorder(EVTYPE, -FATALITIES), y=FATALITIES)) +
  geom_bar(fill="blue", stat="identity") +
  ylab("Number of fatalities") + xlab("Event type") +
  theme(legend.position="none") + theme(axis.text.x = element_text(size = 6, angle = 30, hjust = 1))
print(Plot_fatal)
kable(storm_inj,row.names=T, col.names = "Number of injuries", caption = "Table 2. Number of injuries by event type")
| | Number of injuries |
|---|---|
| EXCESSIVE HEAT | 6525 |
| FLASH FLOOD | 1777 |
| FLOOD | 6789 |
| HAIL | 1361 |
| HEAT | 2100 |
| HEAVY SNOW | 1021 |
| HIGH WIND | 1137 |
| HURRICANE/TYPHOON | 1275 |
| ICE STORM | 1975 |
| LIGHTNING | 5230 |
| THUNDERSTORM WIND | 1488 |
| TORNADO | 91346 |
| TSTM WIND | 6957 |
| WINTER STORM | 1321 |
Plot_inj <- ggplot(data=storm_inj_df, aes(x=reorder(EVTYPE, -INJURIES), y=INJURIES)) +
geom_bar(fill="red", stat="identity") +
ylab("Number of injuries") + xlab("Event type") +
theme(legend.position="none") + theme(axis.text.x = element_text(size = 6, angle = 30, hjust = 1))
print(Plot_inj)
kable(storm_dmg,row.names=T, col.names = "Damage in $", caption = "Table 3. Damage by event type")
| | Damage in $ |
|---|---|
| DROUGHT | 15018672000 |
| FLASH FLOOD | 17562132111 |
| FLOOD | 150319678250 |
| HAIL | 18758224527 |
| HURRICANE | 14610229010 |
| HURRICANE/TYPHOON | 71913712800 |
| RIVER FLOOD | 10148404500 |
| STORM SURGE | 43323541000 |
| TORNADO | 57352118147 |
Plot_dmg <- ggplot(data=storm_dmg_df, aes(x=reorder(EVTYPE, -DAMAGE), y=DAMAGE)) +
geom_bar(fill="darkgreen",stat="identity") +
ylab("Damage in billion dollars") + xlab("Event type") +
theme(legend.position="none") + theme(axis.text.x = element_text(size = 6, angle = 30, hjust = 1))
print(Plot_dmg)
Based on the tables and figures, tornadoes have the largest effect on population health, causing the most fatalities and injuries, and floods have the largest economic impact.