Analysis of the impacts of severe weather events

Synopsis

The goal of this research is to explore the National Oceanic and Atmospheric Administration (NOAA) Storm Database and answer some basic questions about severe weather events. We analyze the impact on population health through two variables, fatalities and injuries. We find that heat exposure causes the most fatalities, followed by tornadoes, and that tornadoes cause the most injuries, followed by heat exposure.

We also analyze the types of events that have the greatest economic consequences. We find that floods are the costliest cause of damage to both crops and property, followed by hurricanes in both categories.

Data Processing

First we read NOAA’s storm data into R for analysis.

library(knitr)
storm <- read.csv("repdata-data-StormData.csv.bz2", header = T, na.strings="NA")
dim(storm)
## [1] 902297     37

The data is quite large (902,297 rows by 37 columns) so we will subset the data so that we can focus on the relevant columns, as well as the relevant events. The columns that we have identified as being of the most use are: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP. We also show some data processing in the results section that was specifically required for each of the individual questions.

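# Select the relevant columns by name rather than by position
keycolm <- storm[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
                     "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]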
keycolm$EVTYPE <- tolower(keycolm$EVTYPE) #standardizing events to lower case
keycolm$BGN_DATE <- gsub("0:00:00", "", keycolm$BGN_DATE) #tidying up the dates
keycolm$BGN_DATE <- as.Date(strptime(keycolm$BGN_DATE, format = "%m/%d/%Y"))  
keycolm$BGN_DATE <- as.numeric(format(keycolm$BGN_DATE,format="%Y"))
hist(keycolm$BGN_DATE, breaks=61, col="light blue", main="Most observations are after 1995", xlab="Years", ylab="Number of observations")

Figure: histogram of the number of observations per year

data <- subset(keycolm, BGN_DATE >=1995)

From the histogram above we can see that there are far fewer observations before 1995, so we focus on the data from 1995 onwards. During exploratory data analysis we also observed that the pre-1995 data was of poorer quality. We then run an initial aggregation before cleaning the event type (EVTYPE).
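The year cutoff can also be checked numerically rather than only by eye. The following is a minimal sketch on a few made-up date strings in the same layout as BGN_DATE; the vector `raw_dates` and its values are illustrative assumptions, not rows from the dataset.

```r
# Toy date strings in the BGN_DATE layout (illustrative values only)
raw_dates <- c("4/18/1950 0:00:00", "6/1/1994 0:00:00",
               "1/6/1995 0:00:00", "8/29/2005 0:00:00",
               "4/27/2011 0:00:00")

# Drop the time part, parse the date, and extract the year as a number
date_part <- sub(" .*$", "", raw_dates)
years <- as.numeric(format(as.Date(date_part, format = "%m/%d/%Y"), "%Y"))

# Count observations per year, then keep 1995 onwards
per_year <- table(years)
recent <- years[years >= 1995]
print(per_year)
print(recent)
```

On the full dataset, the same `table()` call on the year column would give the counts that the histogram displays.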

datafatal <- aggregate(FATALITIES ~ EVTYPE, data=data, sum)
head(datafatal[order(datafatal$FATALITIES, decreasing= T),], 30)
##                      EVTYPE FATALITIES
## 99           excessive heat       1903
## 594                 tornado       1545
## 120             flash flood        934
## 200                    heat        924
## 314               lightning        729
## 129                   flood        423
## 403             rip current        360
## 251               high wind        241
## 610               tstm wind        241
## 16                avalanche        223
## 404            rip currents        204
## 707            winter storm        195
## 201               heat wave        161
## 536       thunderstorm wind        131
## 107            extreme cold        128
## 108 extreme cold/wind chill        125
## 220              heavy snow        115
## 456             strong wind        103
## 244               high surf        102
## 60          cold/wind chill         95
## 207              heavy rain         95
## 109            extreme heat         91
## 288               ice storm         84
## 695                wildfire         75
## 25                 blizzard         71
## 276       hurricane/typhoon         64
## 141                     fog         61
## 269               hurricane         61
## 605          tropical storm         57
## 239    heavy surf/high surf         42
datainjur <- aggregate(INJURIES ~ EVTYPE, data=data, sum)
head(datainjur[order(datainjur$INJURIES, decreasing= T),], 30)
##                 EVTYPE INJURIES
## 594            tornado    21765
## 129              flood     6769
## 99      excessive heat     6525
## 314          lightning     4631
## 610          tstm wind     3630
## 200               heat     2030
## 120        flash flood     1734
## 536  thunderstorm wind     1426
## 707       winter storm     1298
## 276  hurricane/typhoon     1275
## 251          high wind     1093
## 175               hail      916
## 695           wildfire      911
## 220         heavy snow      751
## 141                fog      718
## 693   wild/forest fire      545
## 561 thunderstorm winds      444
## 91          dust storm      420
## 711     winter weather      398
## 25            blizzard      385
## 201          heat wave      378
## 288          ice storm      357
## 605     tropical storm      338
## 404       rip currents      297
## 456        strong wind      278
## 67           dense fog      276
## 207         heavy rain      237
## 403        rip current      227
## 160              glaze      216
## 16           avalanche      159

The simple analysis above shows that event types in the storm dataset are not labelled consistently: for example, "rip current" and "rip currents", or "tstm wind" and "thunderstorm wind", appear as separate types. As a result there are many more event types than necessary. We therefore group the most relevant event types into standardized categories. This initial cleaning is enough to attempt the first question.

data$EVTYPE[grepl("heat|warm", data$EVTYPE, ignore.case=T)] <- "heat exposure"
data$EVTYPE[grepl("torn", data$EVTYPE, ignore.case=T)] <- "tornado"
data$EVTYPE[grepl("flood|fld", data$EVTYPE, ignore.case=T)] <- "flood"
data$EVTYPE[grepl("wind", data$EVTYPE, ignore.case=T)] <- "high wind"
data$EVTYPE[grepl("cold|winter|hypo", data$EVTYPE, ignore.case=T)] <- "cold exposure"
data$EVTYPE[grepl("hurr|trop|typh", data$EVTYPE, ignore.case=T)] <-"hurricane"
data$EVTYPE[grepl("fog", data$EVTYPE, ignore.case=T)] <-"fog accident"
data$EVTYPE[grepl("fire", data$EVTYPE, ignore.case=T)] <- "wild fire"
data$EVTYPE[grepl("rip", data$EVTYPE, ignore.case=T)] <- "rip current"
data$EVTYPE[grepl("surf|seas|coast", data$EVTYPE, ignore.case=T)] <- "rough seas"
data$EVTYPE[grepl("snow|blizz", data$EVTYPE, ignore.case=T)] <- "snow"
data$EVTYPE[grepl("land|mud", data$EVTYPE, ignore.case=T)] <- "landslide"
data$EVTYPE[grepl("surge", data$EVTYPE, ignore.case=T)] <- "storm surge"
data$EVTYPE[grepl("rain", data$EVTYPE, ignore.case=T)] <- "heavy rain"
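Because each rule above overwrites EVTYPE in place, later patterns only see the labels that earlier rules left behind, so the order of the rules affects the result. The following is a minimal sketch on a toy vector; the helper `group_events` and the `rules` list are hypothetical names introduced for illustration, and only a subset of the rules is reproduced.

```r
# Ordered (pattern, label) rules; a subset of those used in the report
rules <- list(
  c("heat|warm",        "heat exposure"),
  c("torn",             "tornado"),
  c("flood|fld",        "flood"),
  c("wind",             "high wind"),
  c("cold|winter|hypo", "cold exposure")
)

# Hypothetical helper: apply each rule in turn, overwriting matching labels
group_events <- function(ev, rules) {
  for (r in rules) {
    ev[grepl(r[1], ev)] <- r[2]
  }
  ev
}

toy <- c("excessive heat", "tstm wind", "extreme cold/wind chill", "winter storm")
grouped <- group_events(toy, rules)
print(grouped)
```

In this toy run "extreme cold/wind chill" ends up as "high wind", because the wind rule fires before the cold rule. Reordering the rules would change that assignment.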

Results

Question 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

library(ggplot2)
datafatal <- aggregate(FATALITIES ~ EVTYPE, data=data, sum)
datainjur <- aggregate(INJURIES ~ EVTYPE, data=data, sum)
fatalinjur <- merge(datafatal, datainjur, by="EVTYPE")
fatalinjur <- head(fatalinjur[order(fatalinjur$FATALITIES, decreasing= T),], 10)
fatal <- fatalinjur[,1:2]
names(fatal) <- c("EVTYPE","NUMBER")
fatal$TYPE <- "fatal"

injured <- fatalinjur[,c(1,3)]
names(injured) <- c("EVTYPE","NUMBER")
injured$TYPE <- "injured"

casualty <- rbind(injured, fatal)

g <- ggplot(casualty, aes(x=EVTYPE, y=NUMBER, fill=TYPE))+
        geom_bar(stat="identity",colour="blue")+
        theme(axis.text.x =element_text(size  = 10,angle = 90,hjust = 1,vjust= 1))+
        ggtitle("Top 10 event types that cause casualties")+
        labs(x="EVENT TYPES", y="NUMBER OF PEOPLE")
print(g)

Figure: stacked bar chart of fatalities and injuries for the top 10 event types

While heat exposure caused the most fatalities, tornadoes caused by far the most injuries and therefore the most casualties overall. Tornadoes also account for the second-highest number of deaths, and heat exposure for the second-highest number of injuries. Other events that cause numerous fatalities and injuries are floods and high winds.
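One way to put such a ranking in context is to express each event type's count as a share of the total. The following is a minimal sketch, assuming a small data frame with the same column names as `fatalinjur`; the numbers are invented for illustration and are not results from the storm data.

```r
# Toy totals per event type (invented values, not from the storm data)
toy <- data.frame(
  EVTYPE     = c("heat exposure", "tornado", "flood"),
  FATALITIES = c(30, 15, 5),
  INJURIES   = c(60, 220, 70),
  stringsAsFactors = FALSE
)

# Percentage share of each event type within each outcome
toy$FATAL_SHARE  <- round(100 * toy$FATALITIES / sum(toy$FATALITIES), 1)
toy$INJURY_SHARE <- round(100 * toy$INJURIES / sum(toy$INJURIES), 1)

# Event type with the largest count for each outcome
top_fatal  <- toy$EVTYPE[which.max(toy$FATALITIES)]
top_injury <- toy$EVTYPE[which.max(toy$INJURIES)]
print(toy)
```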

Question 2

Across the United States, which types of events have the greatest economic consequences? The key data for this question are fairly untidy and spread over four columns: PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. The dollar amounts are not recorded in consistent units.

unique(data$PROPDMGEXP)
##  [1]   B M K m + 0 5 6 ? 4 2 3 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(data$CROPDMGEXP)
## [1]   M m K B ? 0 k 2
## Levels:  ? 0 2 B k K m M

According to the FAQ, the alphabetical characters used to signify magnitude are “K” for thousands, “M” for millions, “B” for billions and “H” for hundreds. The data uses both upper and lower case. Other values, for example “+”, “5” and “6”, are assigned a multiplier of 1 when computing the dollar amounts. The next step is therefore to convert the damage amounts to a common unit, which requires adding some extra columns for the calculations.
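The conversion rule described above can also be written as a lookup table. The following is a minimal sketch on toy values; the names `multipliers` and `exp_to_multiplier` are hypothetical, and the fallback of 1 for unrecognized codes mirrors the assumption made in this report.

```r
# Multipliers for the documented magnitude codes (lower case)
multipliers <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical helper: convert exponent codes to numeric multipliers
exp_to_multiplier <- function(codes) {
  out <- multipliers[tolower(as.character(codes))]
  out[is.na(out)] <- 1   # "", "+", "?", digits and so on fall back to 1
  unname(out)
}

# Toy amounts and codes (illustrative values only)
toy_amount <- c(25, 2.5, 1, 10, 3)
toy_code   <- c("K", "m", "B", "+", "")
toy_cost   <- toy_amount * exp_to_multiplier(toy_code)
print(toy_cost)
```

A single lookup keeps the upper and lower case handling in one place and applies the same rule to both the property and the crop columns.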

data$PROPDMGUNIT <- 1
data$PROPDMGUNIT[data$PROPDMGEXP %in% c("b","B")] <- 1000000000
data$PROPDMGUNIT[data$PROPDMGEXP %in% c("m","M")] <- 1000000
data$PROPDMGUNIT[data$PROPDMGEXP %in% c("k","K")] <- 1000
data$PROPDMGUNIT[data$PROPDMGEXP %in% c("h","H")] <- 100
data$PROPDMGAMT <- data$PROPDMG*data$PROPDMGUNIT
dataprop <- aggregate(PROPDMGAMT ~ EVTYPE, data=data, sum)

data$CROPDMGUNIT <- 1
data$CROPDMGUNIT[data$CROPDMGEXP %in% c("b","B")] <- 1000000000
data$CROPDMGUNIT[data$CROPDMGEXP %in% c("m","M")] <- 1000000
data$CROPDMGUNIT[data$CROPDMGEXP %in% c("k","K")] <- 1000
data$CROPDMGAMT <- data$CROPDMG*data$CROPDMGUNIT
datacrop <- aggregate(CROPDMGAMT ~ EVTYPE, data=data, sum)

databoth <- merge(datacrop, dataprop, by="EVTYPE")
databoth<- head(databoth[order(databoth$PROPDMGAMT, decreasing=T),],10)

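# Select by name: merge() places CROPDMGAMT before PROPDMGAMT,
# so positional indexing would swap the two asset labels
propdata <- databoth[, c("EVTYPE", "PROPDMGAMT")]
names(propdata) <- c("EVTYPE","COST")
propdata$ASSET <- "Property"

cropdata <- databoth[, c("EVTYPE", "CROPDMGAMT")]
names(cropdata) <- c("EVTYPE","COST")
cropdata$ASSET <- "Crop"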

cropprop <- rbind(cropdata, propdata)

g <- ggplot(cropprop, aes(x=EVTYPE,y=COST/10^9, fill=ASSET))+
        geom_bar(stat="identity", colour="blue")+
        theme(axis.text.x =element_text(size  = 10,angle = 90,hjust = 1,vjust = 1))+
        ggtitle("Economic cost from natural disasters")+
        labs(x="EVENT TYPES", y="DAMAGE AMOUNT(Billion $)")
print(g)

Figure: stacked bar chart of crop and property damage, in billions of dollars, for the top 10 event types

The graph above shows the economic consequences by event type. Floods are the main cause of property damage, at $160B, and also cause significant damage to crops ($7B). The next major cause of economic damage is hurricanes, with $9.3B of damage to property and $6.2B of damage to crops. Storm surges also cause a considerable amount of property damage, at $4.8B.
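When stacking the two damage columns for plotting, selecting columns by name rather than by position guards against depending on the column order that `merge` produces. The following is a minimal sketch on toy totals; the values are invented for illustration, and the column names mirror those used in the code above.

```r
# Toy per-event totals (invented values, not from the storm data)
toy_crop <- data.frame(EVTYPE = c("flood", "hurricane"), CROPDMGAMT = c(2, 1))
toy_prop <- data.frame(EVTYPE = c("flood", "hurricane"), PROPDMGAMT = c(10, 5))

# merge() returns the key column, then the columns of x, then those of y
both <- merge(toy_crop, toy_prop, by = "EVTYPE")

# Build the long table by column name so each label matches its values
long <- rbind(
  data.frame(EVTYPE = both$EVTYPE, COST = both$PROPDMGAMT, ASSET = "Property"),
  data.frame(EVTYPE = both$EVTYPE, COST = both$CROPDMGAMT, ASSET = "Crop")
)
print(long)
```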