In this report we are analysing the effects of severe weather conditions and events in terms of the cost to the population and economy. The two questions that we are trying to answer are:
1) Across the United States, which types of events are most harmful with respect to population health?
2) Across the United States, which types of events have the greatest economic consequences?
To perform this analysis, we use data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database records various weather events and their characteristics, such as when they occurred and how many injuries and fatalities they caused.
As you will see in the report, the data has been filtered, cleaned and transformed to extract the 20 weather events with the greatest impact on public health and on the economy.
First we download the NOAA storm database into our working directory and read the data into the variable storm. A quick review of the data shows that many entries, especially the older ones, have no record of the consequences of the weather event. Since our focus is on the economic and public health consequences of weather events, we keep only those entries where at least one of the four consequences (fatalities, injuries, property damage, crop damage) is greater than 0.
storm <- read.csv("repdata_data_StormData.csv.bz2")
storm <- subset(storm, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) #Filtering data
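The filter keeps a row as soon as any one of the four consequence columns is positive. A minimal sketch on a made-up four-row data frame (the values are illustrative, not taken from the NOAA data) shows which rows survive:

```r
# Toy data: only the column names match the storm database; the values are invented.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "FOG"),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(0, 0, 0, 0),
  PROPDMG    = c(0, 0, 2.5, 0),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Same condition as in the report: at least one consequence greater than 0.
kept <- subset(toy, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

# Equivalent formulation: count the positive columns per row.
consequence_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
kept_alt <- toy[rowSums(toy[, consequence_cols] > 0) > 0, ]

kept$EVTYPE
```

Here only the TORNADO and FLOOD rows remain, because HAIL and FOG have zeros in all four columns.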
After filtering the data, the column 'EVTYPE' needs further cleaning. First, all entries under EVTYPE are converted to upper case, which removes duplicates that differ only in case. Next, event types that occur under different names are grouped together. For example, thunderstorm winds are recorded in the database as TSTM, Thunderstrom Wind, Thunderstorm Winds etc. To ensure that all instances are captured under a single event type, every entry that starts with "T" and contains "WIND" (the pattern ^T.+(WIND|WINDS)) is assigned the EVTYPE "THUNDERSTORM WINDS". The other groups are built the same way, by partial match on a keyword such as "FLOOD" or "HEAT".
storm$EVTYPE <- toupper(storm$EVTYPE)
storm[grep("FLOOD",storm$EVTYPE),"EVTYPE"] <- "FLOOD"
storm[grep("TORNADO",storm$EVTYPE),"EVTYPE"] <- "TORNADO"
storm[grep("HEAT",storm$EVTYPE),"EVTYPE"] <- "HEAT"
storm[grep("^T.+(WIND|WINDS)",storm$EVTYPE),"EVTYPE"] <- "THUNDERSTORM WINDS"
storm[grep("^WILD.+FIRE",storm$EVTYPE),"EVTYPE"] <- "WILDFIRE"
storm[grep("(HURRICANE|TYPHOON)",storm$EVTYPE),"EVTYPE"] <- "HURRICANE"
storm[grep("(HIGH|STRONG) WIND",storm$EVTYPE),"EVTYPE"] <- "HIGH WINDS"
storm[grep("RIP CURRENT",storm$EVTYPE),"EVTYPE"] <- "RIP CURRENT"
storm[grep("COLD",storm$EVTYPE),"EVTYPE"] <- "EXTREME COLD"
storm[grep("HIGH SURF",storm$EVTYPE),"EVTYPE"] <- "HIGH SURF"
storm[grep("HEAVY RAIN",storm$EVTYPE),"EVTYPE"] <- "HEAVY RAIN"
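To see how the pattern-based grouping behaves, the sketch below applies a subset of the rules, in the same order, to a handful of labels. The labels are illustrative (three of them are the spellings mentioned in the text), and the `rules` vector is a hypothetical way of writing the same replacements as a table rather than the form used in the report.

```r
# Illustrative labels only, converted to upper case as in the report.
labels <- toupper(c("TSTM WIND", "Thunderstrom Wind", "Thunderstorm Winds",
                    "Flash Flood", "Tornado F0", "High Wind"))

# Ordered pattern -> canonical name pairs (a subset of the rules above).
rules <- c("FLOOD"              = "FLOOD",
           "TORNADO"            = "TORNADO",
           "^T.+(WIND|WINDS)"   = "THUNDERSTORM WINDS",
           "(HIGH|STRONG) WIND" = "HIGH WINDS")

# Apply each rule in turn; later rules see the result of earlier ones.
for (pattern in names(rules)) {
  labels[grepl(pattern, labels)] <- rules[[pattern]]
}

labels
```

The three thunderstorm spellings collapse into "THUNDERSTORM WINDS". Because the thunderstorm pattern matches any label that starts with "T" and later contains "WIND", it is worth checking `unique(storm$EVTYPE)` before and after to confirm that no unrelated event type is swept into the group.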
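The database needs further cleaning before property and crop damage can be calculated. Each damage figure is recorded in two columns, a magnitude (PROPDMG, CROPDMG) and an exponent code (PROPDMGEXP, CROPDMGEXP) where H, K, M and B stand for hundreds, thousands, millions and billions. The exponent codes are converted to powers of ten and multiplied with the magnitude to give the actual dollar amounts. Once this step is done, total damage to property and crops is calculated per event type.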
storm$PROPDMGEXP = as.character(storm$PROPDMGEXP)
storm$PROPDMGEXP = toupper(storm$PROPDMGEXP)
storm$PROPDMGEXP[grep("K",storm$PROPDMGEXP)] = "3"
storm$PROPDMGEXP[grep("M",storm$PROPDMGEXP)] = "6"
storm$PROPDMGEXP[grep("B",storm$PROPDMGEXP)] = "9"
storm$PROPDMGEXP[grep("H",storm$PROPDMGEXP)] = "2"
storm$PROPDMGEXP[grep("[[:punct:]]",storm$PROPDMGEXP)]="0"
storm$PROPDMGEXP[grep("^$",storm$PROPDMGEXP)] = "0"
storm$PROPDMGEXP = as.numeric(storm$PROPDMGEXP)
storm$PropertyDamage = storm$PROPDMG * 10^storm$PROPDMGEXP
summary(storm$PropertyDamage)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.000e+00 2.000e+03 1.000e+04 1.682e+06 3.500e+04 1.150e+11
storm$CROPDMGEXP = as.character(storm$CROPDMGEXP)
storm$CROPDMGEXP = toupper(storm$CROPDMGEXP)
storm$CROPDMGEXP[grep("K",storm$CROPDMGEXP)] = "3"
storm$CROPDMGEXP[grep("M",storm$CROPDMGEXP)] = "6"
storm$CROPDMGEXP[grep("B",storm$CROPDMGEXP)] = "9"
storm$CROPDMGEXP[grep("H",storm$CROPDMGEXP)] = "2"
storm$CROPDMGEXP[grep("[[:punct:]]",storm$CROPDMGEXP)]="0"
storm$CROPDMGEXP[grep("^$",storm$CROPDMGEXP)] = "0"
storm$CROPDMGEXP = as.numeric(storm$CROPDMGEXP)
storm$CropDamage = storm$CROPDMG * 10^storm$CROPDMGEXP
summary(storm$CropDamage)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.000e+00 0.000e+00 0.000e+00 1.928e+05 0.000e+00 5.000e+09
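The property and crop blocks above repeat the same conversion. As a sketch, the mapping from exponent code to power of ten could be wrapped in one helper; `exp_to_power` is a hypothetical name, and unlike the original code (where an unrecognised letter would become NA in as.numeric) this version assumes any unrecognised code should count as 0.

```r
# Hypothetical helper: convert an exponent code (H, K, M, B, a digit,
# blank or punctuation) into the power of ten it stands for.
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])            # NA for anything that is not H/K/M/B
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  power[is.na(power)] <- 0                 # assumption: blanks, punctuation, unknown codes -> 0
  power
}

codes <- c("K", "m", "B", "h", "", "+", "5")
exp_to_power(codes)         # 3 6 9 2 0 0 5

# Worked example: a magnitude of 2.5 with code "K" is 2.5 * 10^3 dollars.
2.5 * 10^exp_to_power("K")  # 2500
```

With such a helper, each damage column would reduce to a single line of the form `storm$PROPDMG * 10^exp_to_power(storm$PROPDMGEXP)`.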
library(plyr)
Fatalities <- ddply(storm,"EVTYPE",summarise,Total=sum(FATALITIES))
Fatalities <- Fatalities[order(Fatalities$Total, decreasing=T),]
Injuries <- ddply(storm,"EVTYPE",summarise,Total=sum(INJURIES))
Injuries <- Injuries[order(Injuries$Total, decreasing=T),]
# TOP 20 CAUSES OF FATALITIES
Fatalities[1:20,]
## EVTYPE Total
## 213 TORNADO 5661
## 85 HEAT 3138
## 42 FLOOD 1525
## 140 LIGHTNING 816
## 209 THUNDERSTORM WINDS 710
## 173 RIP CURRENT 577
## 38 EXTREME COLD 451
## 115 HIGH WINDS 422
## 9 AVALANCHE 224
## 245 WINTER STORM 206
## 110 HIGH SURF 146
## 116 HURRICANE 135
## 92 HEAVY SNOW 127
## 12 BLIZZARD 101
## 89 HEAVY RAIN 98
## 237 WILDFIRE 90
## 127 ICE STORM 89
## 43 FOG 62
## 217 TROPICAL STORM 58
## 131 LANDSLIDE 38
# TOP 20 CAUSES OF INJURIES
Injuries[1:20,]
## EVTYPE Total
## 213 TORNADO 91407
## 209 THUNDERSTORM WINDS 9469
## 85 HEAT 9224
## 42 FLOOD 8604
## 140 LIGHTNING 5230
## 127 ICE STORM 1975
## 115 HIGH WINDS 1846
## 237 WILDFIRE 1606
## 68 HAIL 1361
## 116 HURRICANE 1333
## 245 WINTER STORM 1321
## 92 HEAVY SNOW 1021
## 12 BLIZZARD 805
## 43 FOG 734
## 173 RIP CURRENT 529
## 33 DUST STORM 440
## 247 WINTER WEATHER 398
## 24 DENSE FOG 342
## 217 TROPICAL STORM 340
## 38 EXTREME COLD 316
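The aggregate-then-sort step is the same for every measure. A minimal sketch of that step in base R, using `aggregate` instead of plyr's `ddply`, is shown below; `top_events` is a hypothetical helper name and the data frame is invented for illustration.

```r
# Hypothetical helper: total one numeric column per event type and
# return the n event types with the largest totals.
top_events <- function(data, column, n = 20) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- "Total"
  totals <- totals[order(totals$Total, decreasing = TRUE), ]
  head(totals, n)
}

# Toy data with invented counts.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 2, 4, 1),
  stringsAsFactors = FALSE
)

res <- top_events(toy, "FATALITIES", n = 2)
res
```

On the toy data this returns TORNADO with a total of 7 followed by HEAT with a total of 2. The same call with a different column name would cover injuries and the two damage columns.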
PropDamage <- ddply(storm,"EVTYPE",summarise,Total=sum(PropertyDamage))
PropDamage <- PropDamage[order(PropDamage$Total, decreasing=T),]
PropDamage$Total <- PropDamage$Total/(1*10^9) # Dividing by 1 billion to assess damage in billions of dollars
CropDamage <- ddply(storm,"EVTYPE",summarise,Total=sum(CropDamage))
CropDamage <- CropDamage[order(CropDamage$Total, decreasing=T),]
CropDamage$Total <- CropDamage$Total/(1*10^9) # Dividing by 1 billion to assess damage in billions of dollars
# TOP 20 CAUSES OF PROPERTY DAMAGE
PropDamage[1:20,]
## EVTYPE Total
## 42 FLOOD 168.2122158
## 116 HURRICANE 85.3564100
## 213 TORNADO 58.6033179
## 203 STORM SURGE 43.3235360
## 68 HAIL 15.7352675
## 209 THUNDERSTORM WINDS 9.9623128
## 237 WILDFIRE 8.3910635
## 217 TROPICAL STORM 7.7038905
## 245 WINTER STORM 6.6884973
## 115 HIGH WINDS 6.2430926
## 204 STORM SURGE/TIDE 4.6411880
## 127 ICE STORM 3.9449279
## 89 HEAVY RAIN 3.2120711
## 179 SEVERE THUNDERSTORM 1.2053600
## 27 DROUGHT 1.0461060
## 92 HEAVY SNOW 0.9327591
## 140 LIGHTNING 0.9303794
## 12 BLIZZARD 0.6592140
## 131 LANDSLIDE 0.3245960
## 82 HAILSTORM 0.2410000
# TOP 20 CAUSES OF CROP DAMAGE
CropDamage[1:20,]
## EVTYPE Total
## 27 DROUGHT 13.9725660
## 42 FLOOD 12.3801091
## 116 HURRICANE 5.5161178
## 127 ICE STORM 5.0221135
## 68 HAIL 3.0259545
## 38 EXTREME COLD 1.4097655
## 209 THUNDERSTORM WINDS 1.2243790
## 53 FROST/FREEZE 1.0941860
## 85 HEAT 0.9044693
## 89 HEAVY RAIN 0.7938998
## 115 HIGH WINDS 0.7617554
## 217 TROPICAL STORM 0.6783460
## 45 FREEZE 0.4567250
## 213 TORNADO 0.4174615
## 237 WILDFIRE 0.4022816
## 23 DAMAGING FREEZE 0.2962300
## 37 EXCESSIVE WETNESS 0.1420000
## 92 HEAVY SNOW 0.1346531
## 12 BLIZZARD 0.1120600
## 52 FROST 0.0660000
The plots below summarise these results.
library(ggplot2)
library(gridExtra)
options(scipen=10)
p1 <- ggplot(data=Fatalities[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) + geom_bar(stat="identity") + xlab("Weather Events") + ylab("Total number of fatalities") + ggtitle("Impact of weather events on Fatalities") + theme(axis.text.x=element_text(angle=45, hjust=1)) + theme(axis.title.x = element_text(face="bold", colour="gray30",vjust=3),axis.title.y=element_text(face="bold",color="gray30"), plot.title = element_text(size=13,lineheight=.8, face="bold",vjust=2)) + coord_cartesian(ylim=c(0, 6000)) + scale_y_continuous(breaks=seq(0, 6000, 1000)) + theme(legend.position="none") + theme(plot.background = element_rect(size=1,linetype="solid",color="black"))
p2 <- ggplot(data=Injuries[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) + geom_bar(stat="identity") + theme(axis.text.x=element_text(angle=45, hjust=1)) + xlab("Weather Events") + ylab("Total number of injuries") + ggtitle("Impact of weather events on Injuries") + theme(axis.title.x = element_text(face="bold", colour="gray30",vjust=3),axis.title.y=element_text(face="bold",color="gray30"), plot.title = element_text(size=13,lineheight=.8, face="bold",vjust=2)) + coord_cartesian(ylim=c(0, 100000)) + scale_y_continuous(breaks=seq(0, 100000, 20000)) + theme(legend.position="none") + theme(plot.background = element_rect(size=1,linetype="solid",color="black"))
# gridExtra 2.0 and later name the title argument `top`; older versions used `main`
grid.arrange(p1,p2,top=grid::textGrob("Impact of severe weather on Population Health",gp=grid::gpar(fontsize=14,fontface="bold"),vjust=0.3),ncol=2)
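The first pair of plots shows the 20 weather events with the highest totals of fatalities and of injuries. Tornadoes are by far the most harmful to human health, with 5,661 fatalities and 91,407 injuries. Heat is the second largest cause of fatalities (3,138) and thunderstorm winds the second largest cause of injuries (9,469).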
library(ggplot2)
library(gridExtra)
options(scipen=10)
p3 <- ggplot(data=PropDamage[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) + geom_bar(stat="identity") + xlab("Weather Events") + ylab("Total Property Damage in billions") + ggtitle("Impact of weather events on Property Damage") + theme(axis.text.x=element_text(angle=45, hjust=1)) + theme(axis.title.x = element_text(face="bold", colour="gray30",vjust=2),axis.title.y=element_text(face="bold",color="gray30"), plot.title = element_text(size=13,lineheight=.8, face="bold",vjust=2,hjust=1)) + theme(legend.position="none") + theme(plot.background = element_rect(size=1,linetype="solid",color="black"))
p4 <- ggplot(data=CropDamage[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) + geom_bar(stat="identity") + theme(axis.text.x=element_text(angle=45, hjust=1)) + xlab("Weather Events") + ylab("Total Crop Damage in billions") + ggtitle("Impact of weather events on Crop Damage") + theme(axis.title.x = element_text(face="bold", colour="gray30",vjust=2),axis.title.y=element_text(face="bold",color="gray30"), plot.title = element_text(size=13,lineheight=.8, face="bold",vjust=2,hjust=1))+ theme(legend.position="none") + theme(plot.background = element_rect(size=1,linetype="solid",color="black")) + coord_cartesian(ylim=c(0, 15)) + scale_y_continuous(breaks=seq(0, 15, 3))
# gridExtra 2.0 and later name the title argument `top`; older versions used `main`
grid.arrange(p3,p4,top=grid::textGrob("Economic Impact of weather events",gp=grid::gpar(fontsize=14,fontface="bold"),vjust=0.3),ncol=2)
The second pair of plots shows the top 20 weather events by property damage and by crop damage. Floods have the worst effect on property (about 168 billion dollars), while drought causes the most crop damage (about 14 billion dollars).
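The four bar charts differ only in their data, axis label and title, so the shared part could be factored into a small function. The sketch below assumes ggplot2 is installed; `event_barplot` is a hypothetical helper, it omits the per-plot axis limits and border styling, and the three totals are rounded from the property damage table above.

```r
library(ggplot2)

# Hypothetical helper: bar chart of totals per event type, largest first.
event_barplot <- function(totals, y_label, title) {
  ggplot(totals, aes(x = reorder(EVTYPE, -Total), y = Total, fill = EVTYPE)) +
    geom_bar(stat = "identity") +
    xlab("Weather Events") +
    ylab(y_label) +
    ggtitle(title) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = "none")
}

# Top three rows of the property damage table, rounded to one decimal.
toy_totals <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE", "TORNADO"),
  Total  = c(168.2, 85.4, 58.6)
)

p <- event_barplot(toy_totals,
                   "Total Property Damage in billions",
                   "Impact of weather events on Property Damage")
```

Printing `p` draws the chart. Plot-specific settings such as `coord_cartesian()` can still be added to the returned object with `+`.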
From the results and plots, the following conclusions are made: