Impact of Severe Weather Events on Population Health and Economy

Synopsis

In this report we are analysing the effects of severe weather conditions and events in terms of the cost to the population and economy. The two questions that we are trying to answer are:

1) Across the United States, which types of events are most harmful with respect to population health?
2) Across the United States, which types of events have the greatest economic consequences?

To perform this analysis, we use data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database records information about weather events and their characteristics, such as when they occurred and how many injuries and fatalities they caused.

As the report shows, the data has been filtered, cleaned and transformed to extract the 20 weather events with the greatest impact on public health and the economy.

Data Processing

First, we download the NOAA storm database into our working directory and read it into the variable storm. A quick review of the data shows that many entries, especially the older ones, have no recorded consequences. Since our focus is on the economic and public health consequences of weather events, we keep only those entries where at least one of the four consequence measures (fatalities, injuries, property damage, crop damage) is greater than 0.

storm <- read.csv("repdata_data_StormData.csv.bz2")
storm <- subset(storm, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) #Filtering data 
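
As a rough illustration of what this filter keeps and drops, the same condition can be applied to a small made-up data frame. The `toy` table and its values are assumptions for demonstration only and are not taken from the NOAA data.

```r
# Toy stand-in for the storm table (values are invented for illustration).
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "FLOOD", "FOG"),
  FATALITIES = c(0, 2, 0, 0),
  INJURIES   = c(0, 10, 0, 0),
  PROPDMG    = c(0, 5, 25, 0),
  CROPDMG    = c(0, 0, 0, 0)
)

# Same condition as above: keep a row if any of the four measures is positive.
kept <- subset(toy, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

nrow(toy) - nrow(kept)   # rows dropped because all four measures are zero
kept$EVTYPE              # the TORNADO and FLOOD rows remain
```

Comparing `nrow()` before and after the filter on the real data is a quick way to see how many entries carry no recorded consequence.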

After filtering the data, further cleaning is needed on the column 'EVTYPE'. First, all entries under EVTYPE are converted to upper case, which removes duplicates that differ only in case. Next, event types that occur under different names are grouped together. For example, the event type Thunderstorm Winds is recorded in the database as TSTM, Thunderstrom Wind, Thunderstorm Winds, etc. To capture all instances of Thunderstorm Winds under a single event type, every entry that begins with “T” and contains “WIND” is matched with a regular expression and assigned the EVTYPE “THUNDERSTORM WINDS”. This pattern is broad: it runs after the TORNADO rule, but it can still match other entries that begin with “T” and mention wind.

storm$EVTYPE <- toupper(storm$EVTYPE)
storm[grep("FLOOD",storm$EVTYPE),"EVTYPE"] <- "FLOOD"
storm[grep("TORNADO",storm$EVTYPE),"EVTYPE"] <- "TORNADO"
storm[grep("HEAT",storm$EVTYPE),"EVTYPE"] <- "HEAT"
storm[grep("^T.+(WIND|WINDS)",storm$EVTYPE),"EVTYPE"] <- "THUNDERSTORM WINDS"
storm[grep("^WILD.+FIRE",storm$EVTYPE),"EVTYPE"] <- "WILDFIRE"
storm[grep("(HURRICANE|TYPHOON)",storm$EVTYPE),"EVTYPE"] <- "HURRICANE"
storm[grep("(HIGH|STRONG) WIND",storm$EVTYPE),"EVTYPE"] <- "HIGH WINDS"
storm[grep("RIP CURRENT",storm$EVTYPE),"EVTYPE"] <- "RIP CURRENT"
storm[grep("COLD",storm$EVTYPE),"EVTYPE"] <- "EXTREME COLD"
storm[grep("HIGH SURF",storm$EVTYPE),"EVTYPE"] <- "HIGH SURF"
storm[grep("HEAVY RAIN",storm$EVTYPE),"EVTYPE"] <- "HEAVY RAIN"
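
The grouping rules above can also be written as a small lookup table applied in order. The sketch below is one possible arrangement, not the code used for this report; `recode_events`, `rules` and the toy event names are hypothetical, and only three of the rules are reproduced.

```r
# Apply pattern -> label rules one after another, as the assignments above do.
# Later rules see the labels written by earlier ones, so order matters.
recode_events <- function(evtype, rules) {
  evtype <- toupper(evtype)
  for (i in seq_len(nrow(rules))) {
    evtype[grepl(rules$pattern[i], evtype)] <- rules$label[i]
  }
  evtype
}

rules <- data.frame(
  pattern = c("FLOOD", "TORNADO", "^T.+WIND"),
  label   = c("FLOOD", "TORNADO", "THUNDERSTORM WINDS"),
  stringsAsFactors = FALSE
)

toy <- c("Flash Flood", "tstm wind", "TORNADO F2", "Thunderstorm Winds", "HAIL")
recode_events(toy, rules)
# "FLOOD" "THUNDERSTORM WINDS" "TORNADO" "THUNDERSTORM WINDS" "HAIL"

# The wind pattern only needs a leading "T" followed later by "WIND":
grepl("^T.+WIND", c("TSTM WIND", "THUNDERSTORM WINDS", "TORNADO", "HIGH WIND"))
# TRUE TRUE FALSE FALSE
```

Keeping the rules in a table makes their order explicit, which matters here because a later pattern can match a label assigned by an earlier one.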

The database needs further cleaning before property and crop damage can be calculated. Each damage figure is recorded in two columns, a magnitude (PROPDMG, CROPDMG) and an exponent code (PROPDMGEXP, CROPDMGEXP), so the two must be combined to give the actual amount. Once this is done, the damage to property and crops is calculated for each entry.

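# Convert exponent codes to powers of ten: H = 2, K = 3, M = 6, B = 9; blanks and punctuation become 0
storm$PROPDMGEXP = as.character(storm$PROPDMGEXP)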
storm$PROPDMGEXP = toupper(storm$PROPDMGEXP)
storm$PROPDMGEXP[grep("K",storm$PROPDMGEXP)] = "3"
storm$PROPDMGEXP[grep("M",storm$PROPDMGEXP)] = "6"
storm$PROPDMGEXP[grep("B",storm$PROPDMGEXP)] = "9"
storm$PROPDMGEXP[grep("H",storm$PROPDMGEXP)] = "2"
storm$PROPDMGEXP[grep("[[:punct:]]",storm$PROPDMGEXP)]="0"
storm$PROPDMGEXP[grep("^$",storm$PROPDMGEXP)] = "0"
storm$PROPDMGEXP = as.numeric(storm$PROPDMGEXP)
storm$PropertyDamage = storm$PROPDMG * 10^storm$PROPDMGEXP
summary(storm$PropertyDamage)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 2.000e+03 1.000e+04 1.682e+06 3.500e+04 1.150e+11
# Same conversion for the crop damage exponent
storm$CROPDMGEXP = as.character(storm$CROPDMGEXP)
storm$CROPDMGEXP = toupper(storm$CROPDMGEXP)
storm$CROPDMGEXP[grep("K",storm$CROPDMGEXP)] = "3"
storm$CROPDMGEXP[grep("M",storm$CROPDMGEXP)] = "6"
storm$CROPDMGEXP[grep("B",storm$CROPDMGEXP)] = "9"
storm$CROPDMGEXP[grep("H",storm$CROPDMGEXP)] = "2"
storm$CROPDMGEXP[grep("[[:punct:]]",storm$CROPDMGEXP)]="0"
storm$CROPDMGEXP[grep("^$",storm$CROPDMGEXP)] = "0"
storm$CROPDMGEXP = as.numeric(storm$CROPDMGEXP)
storm$CropDamage = storm$CROPDMG * 10^storm$CROPDMGEXP
summary(storm$CropDamage)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 1.928e+05 0.000e+00 5.000e+09
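
Because the same conversion is applied to both exponent columns, it could be wrapped in one helper. The following is a minimal sketch under stated assumptions: `exp_to_power` is a hypothetical name, and it matches single-character codes exactly, whereas the code above uses `grep` and so matches any entry containing the letter.

```r
# Turn exponent codes into numeric powers of ten.
# Assumes each code is a single character: H, K, M, B, a digit, blank or punctuation.
exp_to_power <- function(code) {
  code   <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power  <- suppressWarnings(as.numeric(code))      # digit codes convert directly
  is_letter <- code %in% names(lookup)
  power[is_letter] <- unname(lookup[code[is_letter]])
  power[is.na(power)] <- 0                          # blanks and punctuation
  power
}

exp_to_power(c("K", "m", "", "+", "5", "B", "h"))
# 3 6 0 0 5 9 2

# Worked example: a magnitude of 25 with code "K" is 25 * 10^3 = 25000.
25 * 10^exp_to_power("K")
```

With such a helper, each damage column would reduce to one line of the form `storm$PROPDMG * 10^exp_to_power(storm$PROPDMGEXP)`.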

Results

Total Fatalities and Injuries

library(plyr)
## Warning: package 'plyr' was built under R version 3.1.1
Fatalities <- ddply(storm,"EVTYPE",summarise,Total=sum(FATALITIES))
Fatalities <- Fatalities[order(Fatalities$Total, decreasing=T),]

Injuries <- ddply(storm,"EVTYPE",summarise,Total=sum(INJURIES))
Injuries <- Injuries[order(Injuries$Total, decreasing=T),]

# TOP 20 CAUSES OF FATALITIES
Fatalities[1:20,]
##                 EVTYPE Total
## 213            TORNADO  5661
## 85                HEAT  3138
## 42               FLOOD  1525
## 140          LIGHTNING   816
## 209 THUNDERSTORM WINDS   710
## 173        RIP CURRENT   577
## 38        EXTREME COLD   451
## 115         HIGH WINDS   422
## 9            AVALANCHE   224
## 245       WINTER STORM   206
## 110          HIGH SURF   146
## 116          HURRICANE   135
## 92          HEAVY SNOW   127
## 12            BLIZZARD   101
## 89          HEAVY RAIN    98
## 237           WILDFIRE    90
## 127          ICE STORM    89
## 43                 FOG    62
## 217     TROPICAL STORM    58
## 131          LANDSLIDE    38
# TOP 20 CAUSES OF INJURIES
Injuries[1:20,]
##                 EVTYPE Total
## 213            TORNADO 91407
## 209 THUNDERSTORM WINDS  9469
## 85                HEAT  9224
## 42               FLOOD  8604
## 140          LIGHTNING  5230
## 127          ICE STORM  1975
## 115         HIGH WINDS  1846
## 237           WILDFIRE  1606
## 68                HAIL  1361
## 116          HURRICANE  1333
## 245       WINTER STORM  1321
## 92          HEAVY SNOW  1021
## 12            BLIZZARD   805
## 43                 FOG   734
## 173        RIP CURRENT   529
## 33          DUST STORM   440
## 247     WINTER WEATHER   398
## 24           DENSE FOG   342
## 217     TROPICAL STORM   340
## 38        EXTREME COLD   316
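
The per-event totals above come from `ddply` with `summarise`. For readers without plyr, the same kind of total can be sketched with base R's `aggregate`; the toy data below is invented for illustration and is not part of the storm database.

```r
# Toy data: several records per event type (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type, then sort from largest to smallest.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
names(totals)[2] <- "Total"
totals <- totals[order(totals$Total, decreasing = TRUE), ]

totals   # TORNADO 5, HEAT 3, FLOOD 0
```

Swapping `FATALITIES` for `INJURIES` in the formula gives the corresponding injury totals.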

Total Crop Damage and Property Damage

PropDamage <- ddply(storm,"EVTYPE",summarise,Total=sum(PropertyDamage))
PropDamage <- PropDamage[order(PropDamage$Total, decreasing=T),]
PropDamage$Total <- PropDamage$Total/(1*10^9) # Dividing by 1 billion to assess damage in billions of dollars

CropDamage <- ddply(storm,"EVTYPE",summarise,Total=sum(CropDamage))
CropDamage <- CropDamage[order(CropDamage$Total, decreasing=T),]
CropDamage$Total <- CropDamage$Total/(1*10^9) # Dividing by 1 billion to assess damage in billions of dollars

# TOP 20 CAUSES OF PROPERTY DAMAGE
PropDamage[1:20,]
##                  EVTYPE       Total
## 42                FLOOD 168.2122158
## 116           HURRICANE  85.3564100
## 213             TORNADO  58.6033179
## 203         STORM SURGE  43.3235360
## 68                 HAIL  15.7352675
## 209  THUNDERSTORM WINDS   9.9623128
## 237            WILDFIRE   8.3910635
## 217      TROPICAL STORM   7.7038905
## 245        WINTER STORM   6.6884973
## 115          HIGH WINDS   6.2430926
## 204    STORM SURGE/TIDE   4.6411880
## 127           ICE STORM   3.9449279
## 89           HEAVY RAIN   3.2120711
## 179 SEVERE THUNDERSTORM   1.2053600
## 27              DROUGHT   1.0461060
## 92           HEAVY SNOW   0.9327591
## 140           LIGHTNING   0.9303794
## 12             BLIZZARD   0.6592140
## 131           LANDSLIDE   0.3245960
## 82            HAILSTORM   0.2410000
# TOP 20 CAUSES OF CROP DAMAGE
CropDamage[1:20,]
##                 EVTYPE      Total
## 27             DROUGHT 13.9725660
## 42               FLOOD 12.3801091
## 116          HURRICANE  5.5161178
## 127          ICE STORM  5.0221135
## 68                HAIL  3.0259545
## 38        EXTREME COLD  1.4097655
## 209 THUNDERSTORM WINDS  1.2243790
## 53        FROST/FREEZE  1.0941860
## 85                HEAT  0.9044693
## 89          HEAVY RAIN  0.7938998
## 115         HIGH WINDS  0.7617554
## 217     TROPICAL STORM  0.6783460
## 45              FREEZE  0.4567250
## 213            TORNADO  0.4174615
## 237           WILDFIRE  0.4022816
## 23     DAMAGING FREEZE  0.2962300
## 37   EXCESSIVE WETNESS  0.1420000
## 92          HEAVY SNOW  0.1346531
## 12            BLIZZARD  0.1120600
## 52               FROST  0.0660000

The plots below summarise these results.

library(ggplot2)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.1.2
## Loading required package: grid
options(scipen=10)
# Bar chart of the 20 event types with the most fatalities
p1 <- ggplot(data=Fatalities[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) +
  geom_bar(stat="identity") +
  xlab("Weather Events") + ylab("Total number of fatalities") +
  ggtitle("Impact of weather events on Fatalities") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(face="bold", colour="gray30", vjust=3),
        axis.title.y=element_text(face="bold", color="gray30"),
        plot.title=element_text(size=13, lineheight=.8, face="bold", vjust=2)) +
  coord_cartesian(ylim=c(0, 6000)) +
  scale_y_continuous(breaks=seq(0, 6000, 1000)) +
  theme(legend.position="none") +
  theme(plot.background=element_rect(size=1, linetype="solid", color="black"))

# Bar chart of the 20 event types with the most injuries
p2 <- ggplot(data=Injuries[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) +
  geom_bar(stat="identity") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  xlab("Weather Events") + ylab("Total number of injuries") +
  ggtitle("Impact of weather events on Injuries") +
  theme(axis.title.x=element_text(face="bold", colour="gray30", vjust=3),
        axis.title.y=element_text(face="bold", color="gray30"),
        plot.title=element_text(size=13, lineheight=.8, face="bold", vjust=2)) +
  coord_cartesian(ylim=c(0, 100000)) +
  scale_y_continuous(breaks=seq(0, 100000, 20000)) +
  theme(legend.position="none") +
  theme(plot.background=element_rect(size=1, linetype="solid", color="black"))

# gridExtra 2.0 and later take the overall title via `top`; older releases used `main`
grid.arrange(p1, p2, top=grid::textGrob("Impact of severe weather on Population Health", gp=grid::gpar(fontsize=14, fontface="bold"), vjust=0.3), ncol=2)

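[Figure: Impact of severe weather on Population Health. Left panel: total fatalities by weather event; right panel: total injuries by weather event.]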

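The first plot shows the impact of the top 20 weather events on fatalities and on injuries. Tornadoes are the most harmful to human health on both measures, with 5,661 fatalities and 91,407 injuries, well ahead of heat (3,138 fatalities) and thunderstorm winds (9,469 injuries).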

library(ggplot2)
library(gridExtra)
options(scipen=10)
# Bar chart of the 20 event types with the most property damage
p3 <- ggplot(data=PropDamage[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) +
  geom_bar(stat="identity") +
  xlab("Weather Events") + ylab("Total Property Damage in billions") +
  ggtitle("Impact of weather events on Property Damage") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  theme(axis.title.x=element_text(face="bold", colour="gray30", vjust=2),
        axis.title.y=element_text(face="bold", color="gray30"),
        plot.title=element_text(size=13, lineheight=.8, face="bold", vjust=2, hjust=1)) +
  theme(legend.position="none") +
  theme(plot.background=element_rect(size=1, linetype="solid", color="black"))

# Bar chart of the 20 event types with the most crop damage
p4 <- ggplot(data=CropDamage[1:20,], aes(x=reorder(EVTYPE,-Total), y=Total, fill=EVTYPE)) +
  geom_bar(stat="identity") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  xlab("Weather Events") + ylab("Total Crop Damage in billions") +
  ggtitle("Impact of weather events on Crop Damage") +
  theme(axis.title.x=element_text(face="bold", colour="gray30", vjust=2),
        axis.title.y=element_text(face="bold", color="gray30"),
        plot.title=element_text(size=13, lineheight=.8, face="bold", vjust=2, hjust=1)) +
  theme(legend.position="none") +
  theme(plot.background=element_rect(size=1, linetype="solid", color="black")) +
  coord_cartesian(ylim=c(0, 15)) +
  scale_y_continuous(breaks=seq(0, 15, 3))

# gridExtra 2.0 and later take the overall title via `top`; older releases used `main`
grid.arrange(p3, p4, top=grid::textGrob("Economic Impact of weather events", gp=grid::gpar(fontsize=14, fontface="bold"), vjust=0.3), ncol=2)

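[Figure: Economic Impact of weather events. Left panel: total property damage in billions; right panel: total crop damage in billions.]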

The second plot shows the impact of the top 20 weather events on property damage and crop damage. Floods cause the most property damage (about 168 billion dollars), while drought causes the most crop damage (about 14 billion dollars).

From the results and plots, the following conclusions are made:

TORNADO is the leading cause of both Fatalities and Injuries.

FLOOD is the leading cause of Property Damage.

DROUGHT is the leading cause of Crop Damage.