Title : Data Analysis of Adverse Weather Impacts on U.S. Population and Economy

1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events have resulted in fatalities, injuries, and property damage. Preventing such outcomes to the extent possible is a key concern.

This project uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The goal of this project is to explore the NOAA storm database and answer some basic questions about severe weather events. The database is used to answer those questions, and all code for the analysis is shown. The analysis consists of tables, figures, and other summaries, and several R packages are used to support it.

2. Key Questions to be Addressed in the Data Analysis

Question 1 : Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Question 2 : Across the United States, which types of events have the greatest economic consequences?

3. Data Processing

3.1 Data

The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is downloaded from the web site: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Some documentation of the database is also available. It explains how some of the variables are constructed and defined.

  1. National Weather Service Storm Data Documentation (website link : https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf)

  2. National Climatic Data Center Storm Events FAQ (website link : https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf)

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

3.2 Data Preparation

Step 3.2.1 : Load libraries and set the global option to suppress warning messages

library(rmarkdown)
library(knitr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(grid)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
knitr::opts_chunk$set(warning=FALSE)

Step 3.2.2 : Download data file to current working directory and read the data

## Perform the file download
if(!file.exists("./stormData.csv.bz2")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="./stormData.csv.bz2")
}

## read the data file
storm <- read.csv("stormData.csv.bz2", header=TRUE, sep=",")

## look at data
head(storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
summary(storm)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
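
In Step 3.2.2, read.csv is given the .bz2 file directly, with no separate decompression step. The following is a minimal sketch of that behaviour, assuming a toy data frame and a temporary file (both hypothetical, not part of the analysis): base R writes a bzip2-compressed CSV through bzfile() and reads it back by path.

```r
# Toy data frame with made-up values (not taken from the NOAA file)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write the toy data as a bzip2-compressed CSV to a temporary path
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path in read mode, where R detects the compression itself
toy_back <- read.csv(path, header = TRUE, sep = ",")
toy_back
```

Because the compression is detected on read, the downloaded file never has to be unpacked on disk.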

Step 3.2.3 : Select relevant data for this analysis

storm_subset <- storm[ , c("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Step 3.2.4 : Transform data for this analysis

## Convert the raw BGN_DATE strings to date-time (POSIXct) values:
storm_subset$BGN_DATE <- as.POSIXct(storm_subset$BGN_DATE,format="%m/%d/%Y %H:%M:%S")

## Look at data
head(storm_subset)
##    EVTYPE   BGN_DATE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 1950-04-18          0       15    25.0          K       0           
## 2 TORNADO 1950-04-18          0        0     2.5          K       0           
## 3 TORNADO 1951-02-20          0        2    25.0          K       0           
## 4 TORNADO 1951-06-08          0        2     2.5          K       0           
## 5 TORNADO 1951-11-15          0        2     2.5          K       0           
## 6 TORNADO 1951-11-15          0        6     2.5          K       0
str(storm_subset)
## 'data.frame':    902297 obs. of  8 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_DATE  : POSIXct, format: "1950-04-18" "1950-04-18" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
summary(storm_subset)
##     EVTYPE             BGN_DATE                     FATALITIES      
##  Length:902297      Min.   :1950-01-03 00:00:00   Min.   :  0.0000  
##  Class :character   1st Qu.:1995-04-20 00:00:00   1st Qu.:  0.0000  
##  Mode  :character   Median :2002-03-19 00:00:00   Median :  0.0000  
##                     Mean   :1998-12-30 13:50:20   Mean   :  0.0168  
##                     3rd Qu.:2007-07-28 00:00:00   3rd Qu.:  0.0000  
##                     Max.   :2011-11-30 00:00:00   Max.   :583.0000  
##                     NA's   :219                                     
##     INJURIES            PROPDMG         PROPDMGEXP           CROPDMG       
##  Min.   :   0.0000   Min.   :   0.00   Length:902297      Min.   :  0.000  
##  1st Qu.:   0.0000   1st Qu.:   0.00   Class :character   1st Qu.:  0.000  
##  Median :   0.0000   Median :   0.00   Mode  :character   Median :  0.000  
##  Mean   :   0.1557   Mean   :  12.06                      Mean   :  1.527  
##  3rd Qu.:   0.0000   3rd Qu.:   0.50                      3rd Qu.:  0.000  
##  Max.   :1700.0000   Max.   :5000.00                      Max.   :990.000  
##                                                                            
##   CROPDMGEXP       
##  Length:902297     
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
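
The summary above reports 219 NA values in BGN_DATE after the as.POSIXct conversion. Their cause is not established here. One possibility, stated purely as an assumption, is that as.POSIXct interprets the strings in the session's local time zone. The sketch below shows an alternative that does not depend on the session time zone: as.Date reads only the date fields named in the format and ignores the trailing time. The three strings are examples in the same layout as the raw column.

```r
# Example strings in the same layout as BGN_DATE in the raw file
raw_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "11/15/1951 0:00:00")

# as.Date() parses the fields named in `format`; trailing characters are ignored
bgn_date <- as.Date(raw_dates, format = "%m/%d/%Y")

# Count parse failures, then count events per calendar year
n_failed <- sum(is.na(bgn_date))
events_per_year <- table(format(bgn_date, "%Y"))
n_failed
events_per_year
```

A per-year count of this kind is also one way to check the remark in Section 3.1 that earlier years contain fewer recorded events.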

3.3 Exploratory Data Analysis

QUESTION 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Step 3.3.1 : Data pre-processing (Question 1)

## Aggregate fatalities and injuries by Event Type
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=storm_subset, sum)
injuries <- aggregate(INJURIES ~ EVTYPE, data=storm_subset, sum)

## Arrange fatalities and injuries in descending order by Event Type
## and keep the top 10
## (note: arrange() and desc() exist in both dplyr and plyr; because plyr
## was attached last in Step 3.2.1, the plyr versions are the ones used here)
top10_fatalities <- arrange(fatalities,desc(FATALITIES),EVTYPE)[1:10,]
top10_injuries <- arrange(injuries,desc(INJURIES),EVTYPE)[1:10,]

## convert event type variable to factor for analysis
top10_fatalities$EVTYPE <- factor(top10_fatalities$EVTYPE, levels=top10_fatalities$EVTYPE)
top10_injuries$EVTYPE <- factor(top10_injuries$EVTYPE, levels=top10_injuries$EVTYPE)

## look at the fatalities by top 10 weather Event Type
top10_fatalities
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
## look at the injuries by top 10 weather Event Type
top10_injuries
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
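
The aggregate-then-rank pattern of Step 3.3.1 can be illustrated on a small scale. This is a sketch only, assuming a toy data frame with made-up counts. It uses base R's order() and head() in place of arrange() and `[1:10, ]`, a substitution made here to avoid the plyr/dplyr masking, not the method used in the analysis.

```r
# Toy data frame with hypothetical counts (not taken from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)

# Sum fatalities within each event type
toy_sum <- aggregate(FATALITIES ~ EVTYPE, data = toy, sum)

# Sort by descending fatalities, breaking ties alphabetically by event type
toy_ranked <- toy_sum[order(-toy_sum$FATALITIES, toy_sum$EVTYPE), ]

# Keep the top 2 rows; head() never pads with NA rows when fewer rows exist
toy_top <- head(toy_ranked, 2)
toy_top
```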

Step 3.3.2 : Plots of fatalities and injuries by top 10 weather event type (Question 1)

# plot of fatalities by Event Type
top10_fatalities_plot <- ggplot(top10_fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
      geom_bar(stat = "identity", fill = "blue", width = NULL) + 
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      xlab("Event Type") + ylab("Fatalities") 

# plot of injuries by Event Type
top10_injuries_plot <- ggplot(top10_injuries, aes(x = EVTYPE, y = INJURIES)) + 
      geom_bar(stat = "identity", fill = "blue", width = NULL) + 
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      xlab("Event Type") + ylab("Injuries") 

# arrange the two plots side by side
# (grid.arrange is from gridExtra; textGrob and gpar are from grid)
grid.arrange(top10_fatalities_plot, top10_injuries_plot, ncol=2, nrow=1,
     top = textGrob("Fatalities & Injuries for top 10 Weather Events",gp=gpar(fontsize=14,font=3)))
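
Step 3.3.1 converts EVTYPE to a factor whose levels follow the ranked order. The reason is that ggplot2 draws a discrete axis in factor-level order, and the default level order is alphabetical. The sketch below shows the difference using base R only; the three event names are taken from the fatalities table, and the variable names are illustrative.

```r
# Event types in ranked order (first three rows of the fatalities table)
evtype_ranked <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD")

# Default factor levels are sorted alphabetically, which loses the ranking
levels_default <- levels(factor(evtype_ranked))

# Passing levels explicitly keeps the ranked order for the plot axis
levels_ranked <- levels(factor(evtype_ranked, levels = evtype_ranked))

levels_default
levels_ranked
```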

QUESTION 2: Across the United States, which types of events have the greatest economic consequences?

With respect to economic consequences, the damage caused by weather events covers both property and crops.

In the database, the property damage (“PROPDMG” column) and crop damage (“CROPDMG” column) amounts must each be scaled by a companion ‘exponent’ column (“PROPDMGEXP” and “CROPDMGEXP” respectively), which holds a code for the multiplier. For example, a PROPDMG of 25 with a PROPDMGEXP of “K” represents 25 × 1,000 = $25,000.

Step 3.3.3 : Data pre-processing (Question 2)

# map the exponent codes (property and crop) to numeric multipliers
# (note: mapvalues comes from the plyr package)
# (note: lowercase "h" is mapped to 1 below, whereas uppercase "H" is mapped to 1e2)
unique(storm_subset$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm_subset$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
tmp_PROPDMG <- mapvalues(storm_subset$PROPDMGEXP,
                          c("K", "M", "", "B", "m", "+", "0", "5", "6", "?", "4",
                            "2", "3", "h", "7", "H", "-", "1", "8"), 
                          c(1e3, 1e6, 1, 1e9,1e6,  1,  1,1e5,1e6,  1,1e4,1e2,1e3,
                            1,1e7,1e2,  1, 10,1e8)
                        )
tmp_CROPDMG <- mapvalues(storm_subset$CROPDMGEXP,
                          c("", "M", "K", "m", "B", "?", "0", "k", "2"),
                          c(1, 1e6, 1e3, 1e6, 1e9, 1, 1, 1e3, 1e2)
                        )
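
The mapping above lists upper- and lower-case codes separately. A sketch of a more compact alternative follows, assuming a hypothetical helper `exp_to_multiplier` that is not part of the analysis. It upper-cases letter codes before lookup, treats a single digit d as 10^d (which agrees with the digit entries in the mapping above), and, as an explicit assumption, gives every other code a multiplier of 1. Under this sketch "h" and "H" both map to 1e2, which differs from the mapping above for "h".

```r
# Hypothetical multiplier table keyed by upper-case letter codes
letter_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert exponent codes to numeric multipliers
exp_to_multiplier <- function(code) {
  # Letter codes: case-insensitive lookup (unmatched codes become NA)
  out <- unname(letter_multiplier[toupper(code)])
  # Single-digit codes: treat d as a power of ten
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])
  # Assumption: anything else ("", "?", "+", "-") leaves the amount unscaled
  out[is.na(out)] <- 1
  out
}

codes <- c("K", "k", "m", "h", "H", "B", "", "?", "5", "0")
multipliers <- exp_to_multiplier(codes)
multipliers
```

Unlike mapvalues, this returns a numeric vector directly, so no as.numeric conversion is needed before multiplying.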

Step 3.3.4 : Calculate the damage amount (Question 2)

# calculate damage (property and crop)
storm_subset$TOTAL_PROPDMG <- as.numeric(tmp_PROPDMG) * storm_subset$PROPDMG
storm_subset$TOTAL_CROPDMG <- as.numeric(tmp_CROPDMG) * storm_subset$CROPDMG

# Total Damage Amount = Property Damage Amount + Crop Damage Amount
storm_subset$TOTAL_DMG <- storm_subset$TOTAL_PROPDMG + storm_subset$TOTAL_CROPDMG

# sum the totals by event type for (1) property damage, (2) crop damage, and
# (3) combined property and crop damage
propdamage_sum <- aggregate(TOTAL_PROPDMG ~ EVTYPE, data=storm_subset, sum)
cropdamage_sum <- aggregate(TOTAL_CROPDMG ~ EVTYPE, data=storm_subset, sum)
totaldamage_sum <- aggregate(TOTAL_DMG ~ EVTYPE, data=storm_subset, sum)

# Arrange property, crop, and total damage in descending order by Weather Event Type (EVTYPE)
# and keep the top 10 events
propdamage_sum_desc <- arrange(propdamage_sum, desc(TOTAL_PROPDMG), EVTYPE)[1:10,]
cropdamage_sum_desc <- arrange(cropdamage_sum, desc(TOTAL_CROPDMG), EVTYPE)[1:10,]
totaldamage_sum_desc <- arrange(totaldamage_sum, desc(TOTAL_DMG), EVTYPE)[1:10,]

# Show table for total of PROPERTY DAMAGE by Weather Event Type (in descending order)
propdamage_sum_desc
##               EVTYPE TOTAL_PROPDMG
## 1              FLOOD  144657709807
## 2  HURRICANE/TYPHOON   69305840000
## 3            TORNADO   56947380677
## 4        STORM SURGE   43323536000
## 5        FLASH FLOOD   16822673979
## 6               HAIL   15735267513
## 7          HURRICANE   11868319010
## 8     TROPICAL STORM    7703890550
## 9       WINTER STORM    6688497251
## 10         HIGH WIND    5270046295
# Show table for total of CROP DAMAGE by Weather Event Type (in descending order)
cropdamage_sum_desc
##               EVTYPE TOTAL_CROPDMG
## 1            DROUGHT   13972566000
## 2              FLOOD    5661968450
## 3        RIVER FLOOD    5029459000
## 4          ICE STORM    5022113500
## 5               HAIL    3025954473
## 6          HURRICANE    2741910000
## 7  HURRICANE/TYPHOON    2607872800
## 8        FLASH FLOOD    1421317100
## 9       EXTREME COLD    1292973000
## 10      FROST/FREEZE    1094086000
# Show table for total of PROPERTY DAMAGE and CROP DAMAGE by Weather Event Type (in descending order)
totaldamage_sum_desc
##               EVTYPE    TOTAL_DMG
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57362333947
## 4        STORM SURGE  43323541000
## 5               HAIL  18761221986
## 6        FLASH FLOOD  18243991079
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041360
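
The dollar totals in these tables are easier to compare in billions. As a small worked check, the sketch below copies the FLOOD row from each of the three tables, confirms that the combined total equals property plus crop damage, and rescales the values. The variable names are illustrative.

```r
# FLOOD totals copied from the three tables above (US dollars)
flood <- c(property = 144657709807, crop = 5661968450, total = 150319678257)

# The combined total should equal property damage plus crop damage
totals_agree <- flood[["property"]] + flood[["crop"]] == flood[["total"]]

# Express each total in billions of dollars, rounded to one decimal place
flood_billions <- round(flood / 1e9, 1)

totals_agree
flood_billions
```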

Step 3.3.5 : Plot graphs for damage amounts (Question 2)

# build plot for TOTAL PROPERTY DAMAGE by Event Type (with EVTYPE set as a factor in ranked order)

propdamage_sum_desc$EVTYPE <- factor(propdamage_sum_desc$EVTYPE, levels=propdamage_sum_desc$EVTYPE)
propdamage_ggplot <- ggplot(propdamage_sum_desc, aes(x=EVTYPE, y=TOTAL_PROPDMG)) + 
                  geom_bar(stat="identity", fill="red") + 
                  theme(axis.text.x = element_text(angle=90, hjust=1)) + 
                  xlab("Event Type") + ylab("Property Damages ($)") 

# build plot for TOTAL CROP DAMAGE by Event Type (with EVTYPE set as a factor in ranked order)

cropdamage_sum_desc$EVTYPE <- factor(cropdamage_sum_desc$EVTYPE, levels=cropdamage_sum_desc$EVTYPE)
cropdamage_ggplot <- ggplot(cropdamage_sum_desc, aes(x=EVTYPE, y=TOTAL_CROPDMG)) + 
                        geom_bar(stat="identity", fill="green") + 
                        theme(axis.text.x = element_text(angle=90, hjust=1)) + 
                        xlab("Event Type") + ylab("Crop Damages ($)") 

# build plot for TOTAL DAMAGE by Event Type (with EVTYPE set as a factor in ranked order)
totaldamage_sum_desc$EVTYPE <- factor(totaldamage_sum_desc$EVTYPE, levels = totaldamage_sum_desc$EVTYPE)
totaldamage_ggplot <- ggplot(totaldamage_sum_desc, aes(x=EVTYPE, y=TOTAL_DMG)) + 
                    geom_bar(stat="identity", fill="blue") + 
                    theme(axis.text.x = element_text(angle=90, hjust=1)) + 
                    xlab("Event Type") + ylab("Total Prop & Crop Damages ($)") 

# plot the final graph
grid.arrange(propdamage_ggplot,cropdamage_ggplot, totaldamage_ggplot,
             ncol=3, nrow=1, top=textGrob("Impacts of Damages on Property, Crop, & Overall from the top 10 Weather Events", gp=gpar(fontsize=14,font=3)
                                          )
             )

4. Results

Findings from the 2 graphs for Question 1:

  • Tornadoes cause the most fatalities (5,633) and injuries (91,346) and therefore have the most significant harmful impact on public health.

  • Excessive heat and flash floods are the second and third leading causes of fatalities, respectively.

  • Thunderstorm wind (TSTM WIND) and floods are the second and third leading causes of injuries, respectively.

Findings from the 3 graphs for Question 2:

  • Floods cause the greatest total property damage.

  • Drought causes the greatest total crop damage.

  • Hurricanes/typhoons and tornadoes cause the second- and third-greatest total property damage, respectively.

  • Floods and river floods cause the second- and third-greatest total crop damage, respectively.

  • Floods, hurricanes/typhoons, and tornadoes are the three leading causes of combined property and crop damage.