Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

The basic goal of this project is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Questions To Be Answered

Your data analysis must address the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb]

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

Load the required libraries

library(data.table)
library(ggplot2)
library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

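Download the storm data

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Keep the .bz2 extension: the downloaded file is still bzip2-compressed
download.file(url, "StormData.csv.bz2", method = "curl")

Read in the downloaded storm data

# read.csv decompresses bzip2 files transparently
stormdata <- read.csv("StormData.csv.bz2", header = TRUE, sep = ",")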
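
R's file readers detect bzip2 compression from the file contents, which is why read.csv can read the downloaded file without a separate decompression step. A minimal sketch of that behavior, using a small made-up table and a temporary file (both are assumptions for illustration, not the real storm data):

```r
# Toy data standing in for the storm file (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 0))

# Write the toy table as a bzip2-compressed CSV to a temporary path
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the compression
toy_back <- read.csv(path, header = TRUE)
print(toy_back)
```

The same call works on an uncompressed CSV, so the reading code does not need to change if the file is decompressed by hand first.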

Subset for the required variables

cols <- c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')
stormsubset <- stormdata[, cols]
summary(stormsubset)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000

Check the unique elements of some of the variables. The following code is not run because eval=FALSE is set in the code chunk.

unique(stormsubset$EVTYPE)
unique(stormsubset$FATALITIES)
unique(stormsubset$INJURIES)
unique(stormsubset$PROPDMG)
unique(stormsubset$CROPDMG)

Create subset of required rows

stormsubset1 <- stormsubset[(stormsubset$EVTYPE != "?" & (stormsubset$INJURIES > 0 | stormsubset$FATALITIES > 0 |
                             stormsubset$PROPDMG > 0 | stormsubset$CROPDMG > 0)), cols]
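
The row filter keeps an event only if its type is known and it caused at least one fatality, injury, or some property or crop damage. A small sketch of the same condition on a made-up data frame (the rows are hypothetical and chosen so that each branch of the condition is exercised):

```r
# Four made-up events: one harmful, one with unknown type,
# one with no recorded harm, one with property damage only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "?", "HAIL", "FLOOD"),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(0, 3, 0, 0),
  PROPDMG    = c(0, 0, 0, 2.5),
  CROPDMG    = c(0, 0, 0, 0)
)

# Same logical test as the report: known type AND any nonzero impact
keep <- toy$EVTYPE != "?" &
  (toy$INJURIES > 0 | toy$FATALITIES > 0 | toy$PROPDMG > 0 | toy$CROPDMG > 0)

toy_kept <- toy[keep, ]
print(toy_kept$EVTYPE)
```

Here the "?" row is dropped despite its injuries, and the HAIL row is dropped because every impact column is zero.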

Convert the values in the PROPDMGEXP column to their real values

unique(stormsubset1$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "4" "h" "2" "7" "3" "H" "-"
stormsubset1$PROPDMGEXP <- mapvalues(stormsubset1$PROPDMGEXP, 
                                     from = c("K", "M","", "B", "m", "+", "0", "5", "6", "4", "2", "3", "h", 
                                              "7", "H", "-"), 
                                     to = c(10^3, 10^6, 1, 10^9, 10^6, 0, 1, 10^5, 10^6, 10^4, 10^2, 10^3, 10^2, 
                                            10^7, 10^2, 0))

Convert the values in the CROPDMGEXP column to their real values

unique(stormsubset1$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k"
stormsubset1$CROPDMGEXP <- mapvalues(stormsubset1$CROPDMGEXP, 
                                    from = c("", "M", "K", "m", "B", "?", "0", "k"), 
                                    to = c(1, 10^6, 10^3, 10^6, 10^9, 0, 1, 10^3))

Create two new columns for Property Damage Cost and Crop Damage Cost

class(stormsubset1$PROPDMGEXP)
## [1] "character"
class(stormsubset1$PROPDMG)
## [1] "numeric"
stormsubset1$PROPDMGEXP <- as.numeric(as.character(stormsubset1$PROPDMGEXP))

stormsubset1$PROPDMGCOST <- (stormsubset1$PROPDMG * stormsubset1$PROPDMGEXP)
class(stormsubset1$CROPDMGEXP)
## [1] "character"
class(stormsubset1$CROPDMG)
## [1] "numeric"
stormsubset1$CROPDMGEXP <- as.numeric(as.character(stormsubset1$CROPDMGEXP))

stormsubset1$CROPDMGCOST <- (stormsubset1$CROPDMG * stormsubset1$CROPDMGEXP)
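
The exponent conversion and cost calculation can also be sketched with a named lookup vector in base R instead of mapvalues. This is an alternative illustration, not the method used in this report: the helper name exp_to_multiplier and the toy rows are hypothetical, while the multipliers are copied from the PROPDMGEXP mapping above.

```r
# Multipliers copied from the PROPDMGEXP mapping used in this report
exp_lookup <- c(K = 10^3, M = 10^6, B = 10^9, m = 10^6, h = 10^2, H = 10^2,
                "+" = 0, "-" = 0, "0" = 1, "2" = 10^2, "3" = 10^3,
                "4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7)

# Hypothetical helper: turn exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  mult <- unname(exp_lookup[code])
  # A blank code cannot be a vector name, so handle it separately
  # (treated as a multiplier of 1, as in the mapping above)
  mult[code == ""] <- 1
  mult
}

# Made-up damage records
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 5),
                  PROPDMGEXP = c("K", "M", "", "B"),
                  stringsAsFactors = FALSE)

# Cost in dollars = reported value times its multiplier
toy$PROPDMGCOST <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

For example, a PROPDMG of 25 with exponent "K" becomes 25 * 1000 = 25000 dollars. Codes missing from the lookup come back as NA, which makes unexpected values easy to spot.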

Analysis

Aggregate FATALITIES and INJURIES by EVTYPE

TotalOnHealth1 <- aggregate(cbind(FATALITIES,INJURIES, SUM_FATALITIES_INJURIES=FATALITIES + INJURIES)~EVTYPE, 
                            data = stormsubset1, FUN = sum)
TotalOnHealth2 <- TotalOnHealth1[order(-TotalOnHealth1$SUM_FATALITIES_INJURIES),][1:10,]

head(TotalOnHealth2)
##             EVTYPE FATALITIES INJURIES SUM_FATALITIES_INJURIES
## 406        TORNADO       5633    91346                   96979
## 60  EXCESSIVE HEAT       1903     6525                    8428
## 422      TSTM WIND        504     6957                    7461
## 85           FLOOD        470     6789                    7259
## 257      LIGHTNING        816     5230                    6046
## 150           HEAT        937     2100                    3037
TotalOnHealth2 <- as.data.table(TotalOnHealth2)
HealthImpact <- melt(TotalOnHealth2, id.vars="EVTYPE", variable.name="Fatalities_Injuries")

head(HealthImpact)
##            EVTYPE Fatalities_Injuries value
## 1:        TORNADO          FATALITIES  5633
## 2: EXCESSIVE HEAT          FATALITIES  1903
## 3:      TSTM WIND          FATALITIES   504
## 4:          FLOOD          FATALITIES   470
## 5:      LIGHTNING          FATALITIES   816
## 6:           HEAT          FATALITIES   937
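
The aggregation step sums each measure per event type and then ranks event types by the combined total. A minimal sketch on made-up numbers, assuming the same cbind formula pattern as above (the column name TOTAL and the toy values are illustrative only):

```r
# Made-up events; TORNADO and FLOOD each appear twice
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3)
)

# Sum fatalities, injuries, and their combined total per event type
totals <- aggregate(cbind(FATALITIES, INJURIES, TOTAL = FATALITIES + INJURIES) ~ EVTYPE,
                    data = toy, FUN = sum)

# Rank by combined total, largest first, and keep the top two
top2 <- totals[order(-totals$TOTAL), ][1:2, ]
print(top2)
```

TORNADO ranks first with a combined total of 20 (5 fatalities plus 15 injuries), followed by FLOOD with 6.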

Plot the Top 10 Weather Events Harmful to Population Health in The US

# Store the plot as gg_plot1 so the Results section can refer to it by name
gg_plot1 <- ggplot(HealthImpact, aes(x = reorder(EVTYPE, -value), y = value)) + 
        geom_bar(stat = "identity", aes(fill = Fatalities_Injuries), position = "dodge") + 
        ylab("Health Impact") + 
        xlab("Event Type") +
        theme(axis.text.x = element_text(angle = 30, hjust = 1)) + 
        ggtitle("Top 10 Weather Events Harmful to Population Health in The US") + 
        theme(plot.title = element_text(hjust = 0.5))

gg_plot1

Aggregate property damage and crop damage costs by EVTYPE

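Totalcost1 <- aggregate(cbind(PROPDMGCOST,CROPDMGCOST,SUM_PROPDMGCOST_CROPDMGCOST=PROPDMGCOST + CROPDMGCOST)~
                                EVTYPE, data = stormsubset1, FUN = sum)
# Express the cost columns in billions of dollars, matching the output below
# and the plot's axis label
Totalcost1[, -1] <- Totalcost1[, -1] / 10^9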

Totalcost2 <- Totalcost1[order(-Totalcost1$SUM_PROPDMGCOST_CROPDMGCOST),][1:10,]
head(Totalcost2)
##                EVTYPE PROPDMGCOST CROPDMGCOST SUM_PROPDMGCOST_CROPDMGCOST
## 85              FLOOD   144.65771   5.6619684                   150.31968
## 223 HURRICANE/TYPHOON    69.30584   2.6078728                    71.91371
## 406           TORNADO    56.94738   0.4149533                    57.36233
## 349       STORM SURGE    43.32354   0.0000050                    43.32354
## 133              HAIL    15.73527   3.0259545                    18.76122
## 72        FLASH FLOOD    16.82267   1.4213171                    18.24399
Totalcost2 <- as.data.table(Totalcost2)
Economic_Consequence <- melt(Totalcost2, id.vars="EVTYPE", variable.name="PROPDMGCOST_CROPDMGCOST")
head(Economic_Consequence)
##               EVTYPE PROPDMGCOST_CROPDMGCOST     value
## 1:             FLOOD             PROPDMGCOST 144.65771
## 2: HURRICANE/TYPHOON             PROPDMGCOST  69.30584
## 3:           TORNADO             PROPDMGCOST  56.94738
## 4:       STORM SURGE             PROPDMGCOST  43.32354
## 5:              HAIL             PROPDMGCOST  15.73527
## 6:       FLASH FLOOD             PROPDMGCOST  16.82267

Plot the Top 10 Weather Events with Greatest Economic Consequences in The US

gg_plot2 <- ggplot(Economic_Consequence, aes(x = reorder(EVTYPE, -value), y = value)) +
        geom_bar(stat = "identity", aes(fill = PROPDMGCOST_CROPDMGCOST), position = "dodge") + 
        ylab("Economic Consequences in billions") + xlab("Event Type")+
        theme(axis.text.x = element_text(angle=90, hjust=1)) + 
        ggtitle("Top 10 Weather Events with Greatest Economic Consequences in The US")

gg_plot2

Results

Question 1: The first plot shows that TORNADO is the most harmful weather event for population health, with 5,633 fatalities and 91,346 injuries.

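Question 2: The second plot (gg_plot2) shows that FLOOD has the greatest economic consequences, at about $150 billion in combined property and crop damage, followed by HURRICANE/TYPHOON and TORNADO.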