This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It mainly examines which types of events are most harmful to population health (i.e. cause the most fatalities and injuries). It also identifies the events whose damage to property and crops has the most severe economic consequences. For each case, the results list the ten most harmful event types and plot the calculated values. The most harmful events are tabulated both separately (fatalities and injuries; property and crop damage) and cumulatively, for health and economic consequences alike.
#load libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(pander)
#download the data (read.csv reads the bzip2-compressed file directly, so no unzip step is needed)
data_url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("weather.csv.bz2")){
download.file(data_url, destfile = "weather.csv.bz2", mode = "wb")
}
#read data
storm_data<-read.csv("weather.csv.bz2",header = TRUE, stringsAsFactors = FALSE)
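#lookup table mapping the damage exponent codes to powers of ten
#(h/H = hundreds, K = thousands, m/M = millions, B = billions; digit codes are used as the exponent itself)
lut<-c("0"=0,"1"=1,"2"=2,"3"=3,"4"=4,"5"=5,"6"=6,"7"=7,"8"=8,"h"=2,"H"=2,"K"=3,"m"=6,"M"=6,"B"=9)
#replace the codes in the PROPDMGEXP & CROPDMGEXP columns with the corresponding exponents;
#any code missing from the lookup table (including blank entries) becomes NA
storm_data$PROPDMGEXP<-unname(lut[storm_data$PROPDMGEXP])
storm_data$CROPDMGEXP<-unname(lut[storm_data$CROPDMGEXP])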
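To make the exponent conversion concrete, here is a small sketch that applies the same lookup table to a handful of exponent codes. The vector `codes` is hypothetical, not drawn from the data. It shows the multiplier each code implies, and that a code absent from the table, such as a blank or `?`, comes back as `NA`.

```r
# Lookup table as defined in the report: exponent code -> power of ten
lut <- c("0"=0,"1"=1,"2"=2,"3"=3,"4"=4,"5"=5,"6"=6,"7"=7,"8"=8,
         "h"=2,"H"=2,"K"=3,"m"=6,"M"=6,"B"=9)

# Hypothetical exponent codes as they might appear in PROPDMGEXP
codes <- c("K", "M", "B", "h", "", "?")

# Indexing a named vector by character; names not in lut (and "") give NA
exps <- unname(lut[codes])
print(exps)        # 3 6 9 2 NA NA

# Multipliers applied to the damage values; NA propagates
print(10^exps)
```

Because an `NA` exponent turns the product `PROPDMG*10^PROPDMGEXP` into `NA`, such records drop out of the damage totals only because of `na.rm=T` in the damage summary.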
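The following results and plots show that tornadoes are by far the most harmful events to population health, causing 5,633 fatalities and 91,346 injuries. Excessive heat, thunderstorm wind (TSTM WIND) and floods follow when fatalities and injuries are combined. The rankings differ between the two outcomes, however: excessive heat is second for fatalities but fourth for injuries, while TSTM WIND is second for injuries but sixth for fatalities. Injuries dominate the combined totals, outnumbering fatalities for every event in the top ten. Note that the raw EVTYPE labels are not standardized, so closely related labels such as TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT, are counted as separate events.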
#calculate the number of fatalities and injuries corresponding to each event type
E1<-storm_data %>%
select(EVTYPE,FATALITIES, INJURIES) %>%
group_by(EVTYPE) %>%
summarise(Fatalities=sum(FATALITIES),Injuries=sum(INJURIES)) %>%
mutate(Total=(Fatalities+Injuries))
#order the events based on the cumulative effect (fatalities+injuries)
E1t<-E1[order(E1$Total,decreasing = T),]
#Extract and order the top events corresponding to fatalities
E1f<-select(E1,EVTYPE,Fatalities)[order(E1$Fatalities,decreasing = T),]
#Extract and order the top events corresponding to injuries
E1i<-select(E1,EVTYPE,Injuries)[order(E1$Injuries,decreasing = T),]
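The aggregation above can be tried on a tiny data set. In this sketch the data frame `toy` and its values are made up for illustration. It mirrors the `E1` pipeline but uses dplyr's `arrange(desc())` in place of `order()`, and it shows that the ranking by combined total need not match the ranking by fatalities alone.

```r
library(dplyr)

# Hypothetical toy records standing in for storm_data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(1, 4, 0, 2, 3),
  INJURIES   = c(20, 1, 5, 2, 0),
  stringsAsFactors = FALSE
)

# Same aggregation as E1: per-event sums plus their combined total
totals <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES),
            Injuries   = sum(INJURIES)) %>%
  mutate(Total = Fatalities + Injuries)

# Rank by combined total and by fatalities alone
by_total      <- arrange(totals, desc(Total))
by_fatalities <- arrange(totals, desc(Fatalities))
print(by_total)       # TORNADO, HEAT, FLOOD
print(by_fatalities)  # HEAT, FLOOD, TORNADO
```

Here TORNADO tops the combined ranking because of its injuries, yet ranks last on fatalities, which is the same kind of divergence seen between the tables below.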
- Top ten events causing the most fatalities:
## ----------------------------
## EVTYPE            Fatalities
## ----------------  ----------
## TORNADO                 5633
## EXCESSIVE HEAT          1903
## FLASH FLOOD              978
## HEAT                     937
## LIGHTNING                816
## TSTM WIND                504
## FLOOD                    470
## RIP CURRENT              368
## HIGH WIND                248
## AVALANCHE                224
## ----------------------------
- Top ten events causing the most injuries:
## ---------------------------
## EVTYPE             Injuries
## -----------------  --------
## TORNADO               91346
## TSTM WIND              6957
## FLOOD                  6789
## EXCESSIVE HEAT         6525
## LIGHTNING              5230
## HEAT                   2100
## ICE STORM              1975
## FLASH FLOOD            1777
## THUNDERSTORM WIND      1488
## HAIL                   1361
## ---------------------------
- Top ten events causing the most fatalities and injuries combined (Total = Fatalities + Injuries):
## ----------------------------------------------
## EVTYPE             Fatalities  Injuries  Total
## -----------------  ----------  --------  -----
## TORNADO                  5633     91346  96979
## EXCESSIVE HEAT           1903      6525   8428
## TSTM WIND                 504      6957   7461
## FLOOD                     470      6789   7259
## LIGHTNING                 816      5230   6046
## HEAT                      937      2100   3037
## FLASH FLOOD               978      1777   2755
## ICE STORM                  89      1975   2064
## THUNDERSTORM WIND         133      1488   1621
## WINTER STORM              206      1321   1527
## ----------------------------------------------
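The statement that injuries dominate the combined totals can be checked against the table above. The following sketch transcribes its ten rows into a data frame and computes each event's injury share of the total. The data frame `health` and the column `Injury_share` are names introduced here for illustration.

```r
# Values transcribed from the combined fatalities+injuries table
health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"),
  Fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  Injuries   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Recompute the totals and the fraction of each total due to injuries
health$Total <- health$Fatalities + health$Injuries
health$Injury_share <- round(health$Injuries / health$Total, 3)
print(health[, c("EVTYPE", "Total", "Injury_share")])

# TRUE when injuries outnumber fatalities for every listed event
all(health$Injuries > health$Fatalities)
```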
#gather data to use fatalities/injuries for coloring the bars
kk<-gather(head(E1t,10), value="Total_by_Type",key="type",Fatalities,Injuries,na.rm = T)
#Fatalities+Injuries Vs. events
pp1<-ggplot(data=kk,aes(x=reorder(EVTYPE,-Total_by_Type),y=Total_by_Type,fill=type))+
geom_bar(stat="identity")+
scale_x_discrete(name="Event Type")+
scale_y_continuous(name="Fatalities+Injuries")+
labs(title="Top 10 events with negative consequences on health (Injuries+Fatalities)")+
theme(axis.text.x=element_text(angle = 90,vjust=0.5,hjust=1))+
scale_fill_grey()
#display the plot
pp1
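The `gather()` call reshapes the top-ten table into long format, with one row per event and outcome, so that `fill=type` can stack fatalities and injuries within each bar. `gather()` is superseded in current tidyr by `pivot_longer()`. The sketch below shows the equivalent `pivot_longer()` call on the first two rows of the combined table. It also checks how `reorder(EVTYPE,-Total_by_Type)` orders the bars: `reorder()` averages `-Total_by_Type` over the two rows of each event, which is half the negated total, so the bars come out in decreasing order of the combined total.

```r
library(tidyr)

# First two rows of the combined fatalities+injuries table
top <- data.frame(EVTYPE     = c("TORNADO", "EXCESSIVE HEAT"),
                  Fatalities = c(5633, 1903),
                  Injuries   = c(91346, 6525),
                  Total      = c(96979, 8428),
                  stringsAsFactors = FALSE)

# Equivalent of gather(top, key = "type", value = "Total_by_Type", Fatalities, Injuries)
long <- pivot_longer(top, cols = c(Fatalities, Injuries),
                     names_to = "type", values_to = "Total_by_Type")
print(long)

# Bar order used by the plot: reorder() takes the mean of -Total_by_Type per event
bar_order <- levels(reorder(factor(long$EVTYPE), -long$Total_by_Type))
print(bar_order)   # "TORNADO" "EXCESSIVE HEAT"
```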
The following results and plots show that floods cause the largest property losses, followed by hurricanes/typhoons, tornadoes and storm surges. Droughts, on the other hand, cause the largest crop losses, followed by floods, river floods and ice storms. Property losses are generally much larger than crop losses: the largest property loss (floods, about 144.7 billion USD) is roughly ten times the largest crop loss (droughts, about 14.0 billion USD). Consequently, property damage accounts for most of the combined loss in eight of the ten events with the highest cumulative losses; droughts and ice storms are the exceptions.
#calculate the total property and crop damage (in US dollars) for each event type;
#records whose exponent is NA (codes missing from the lookup table) are dropped by na.rm
E2<-storm_data %>%
select(EVTYPE,PROPDMG,PROPDMGEXP, CROPDMG,CROPDMGEXP) %>%
group_by(EVTYPE) %>%
summarise(Property_damage=sum(PROPDMG*10^PROPDMGEXP,na.rm=T),
Crop_damage=sum(CROPDMG*10^CROPDMGEXP,na.rm=T)) %>%
mutate(Total_Dmg=Property_damage+Crop_damage)
#order the events based on the cumulative effect (property damage+crop damage)
E2t<-E2[order(E2$Total_Dmg,decreasing=T),]
#Extract and order the top events corresponding to property damage
E2p<-select(E2,EVTYPE,Property_damage)[order(E2$Property_damage,decreasing = T),]
#Extract and order the top events corresponding to crop damage
E2c<-select(E2,EVTYPE,Crop_damage)[order(E2$Crop_damage,decreasing = T),]
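Because the damage sums use `na.rm=T`, any record whose exponent code was mapped to `NA` is silently left out of the totals. The sketch below illustrates this on hypothetical records (the data frame `toy` and its values are made up): the FLOOD record with a missing exponent contributes nothing to the FLOOD total. Treating such codes as a multiplier of 1 instead would be a different modelling choice, which this report does not make.

```r
library(dplyr)

# Hypothetical records with PROPDMGEXP already converted via the lookup table;
# NA marks a code that was not in the table (e.g. a blank entry)
toy <- data.frame(EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
                  PROPDMG    = c(2.5, 100, 40),
                  PROPDMGEXP = c(6, NA, 3),
                  stringsAsFactors = FALSE)

# Same computation as Property_damage in E2
dmg <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Property_damage = sum(PROPDMG * 10^PROPDMGEXP, na.rm = TRUE))
print(dmg)   # FLOOD 2500000, HAIL 40000

# Number of records with an unmapped exponent, i.e. excluded from the sums
n_dropped <- sum(is.na(toy$PROPDMGEXP))
```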
- Top ten events causing the highest property damage (USD):
## ----------------------------------
## EVTYPE             Property_damage
## -----------------  ---------------
## FLOOD                 144657709800
## HURRICANE/TYPHOON      69305840000
## TORNADO                56947380614
## STORM SURGE            43323536000
## FLASH FLOOD            16822673772
## HAIL                   15735267456
## HURRICANE              11868319010
## TROPICAL STORM          7703890550
## WINTER STORM            6688497251
## HIGH WIND               5270046260
## ----------------------------------
- Top ten events causing the highest crop damage (USD):
## ------------------------------
## EVTYPE             Crop_damage
## -----------------  -----------
## DROUGHT            13972566000
## FLOOD               5661968450
## RIVER FLOOD         5029459000
## ICE STORM           5022113500
## HAIL                3025537470
## HURRICANE           2741910000
## HURRICANE/TYPHOON   2607872800
## FLASH FLOOD         1421317100
## EXTREME COLD        1292973000
## FROST/FREEZE        1094086000
## ------------------------------
- Top ten events causing the highest combined economic losses (property damage + crop damage, USD):
## -------------------------------------------------------------
## EVTYPE             Property_damage  Crop_damage     Total_Dmg
## -----------------  ---------------  -----------  ------------
## FLOOD                 144657709800   5661968450  150319678250
## HURRICANE/TYPHOON      69305840000   2607872800   71913712800
## TORNADO                56947380614    414953270   57362333884
## STORM SURGE            43323536000         5000   43323541000
## HAIL                   15735267456   3025537470   18760804926
## FLASH FLOOD            16822673772   1421317100   18243990872
## DROUGHT                 1046106000  13972566000   15018672000
## HURRICANE              11868319010   2741910000   14610229010
## RIVER FLOOD             5118945500   5029459000   10148404500
## ICE STORM               3944927860   5022113500    8967041360
## -------------------------------------------------------------
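The claim that property damage accounts for most of the combined loss in eight of the top ten events can be verified from the table above. This sketch transcribes the table into a data frame and computes each event's property share. The data frame `econ` and the columns `Total_bn` and `Property_share` are names introduced here for illustration.

```r
# Values transcribed from the combined damage table (USD)
econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  Property_damage = c(144657709800, 69305840000, 56947380614, 43323536000,
                      15735267456, 16822673772, 1046106000, 11868319010,
                      5118945500, 3944927860),
  Crop_damage = c(5661968450, 2607872800, 414953270, 5000, 3025537470,
                  1421317100, 13972566000, 2741910000, 5029459000, 5022113500),
  stringsAsFactors = FALSE
)

# Combined loss in billions of USD and the fraction due to property damage
econ$Total_bn <- (econ$Property_damage + econ$Crop_damage) / 1e9
econ$Property_share <- econ$Property_damage /
  (econ$Property_damage + econ$Crop_damage)
print(econ[, c("EVTYPE", "Total_bn", "Property_share")], digits = 3)

# Events where crop damage exceeds property damage
econ$EVTYPE[econ$Property_share < 0.5]   # "DROUGHT" "ICE STORM"
```

River floods are the closest case: their property share is only slightly above one half.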
#gather data to use Property_damage,Crop_damage for coloring the bars
mm<-gather(head(E2t,10), value="Total_by_Type",key="type",Property_damage,Crop_damage,na.rm = T)
#Property Damage+Crop Damage Vs. events
pp2<-ggplot(data=mm,aes(x=reorder(EVTYPE,-Total_by_Type),y=Total_by_Type/10^9,fill=type))+
geom_bar(stat="identity")+
scale_x_discrete(name="Event Type")+
scale_y_continuous(name="Property Damage+Crop Damage (billions of USD)")+
labs(title="Top 10 events with negative economic consequences (Property damage+Crop damage)")+
theme(axis.text.x=element_text(angle = 90,vjust=0.5,hjust=1))+
scale_fill_grey()
#display the plot
pp2
In conclusion, this analysis identifies the weather events that cause the greatest harm to public health and the largest economic losses, and reports those losses per event type. Communities and municipalities could use these results to allocate resources to preventive measures and reduce such losses where possible.