Assignment: Reproducible Research: Week 4 - Course Project 2

Overview

Weather events cause public health and economic problems for communities and municipalities. Severe events result in fatalities, injuries, and damage. Predicting and/or preventing these outcomes is a primary objective.

This analysis examines the damaging effects of severe weather conditions (e.g. hurricanes, tornadoes, thunderstorms, floods, etc.) on human populations and the econonomy in the U.S. from 1950 to 2011.

As a result, the analysis will highlight the severe weather events associated with the greatest impact on the economy and population health.

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

In this report,effect of weather events on personal as well as property damages was studied. Barplots were plotted seperately for the top 8 weather events that causes highest fatalities and highest injuries. Results indicate that most Fatalities and injuries were caused by Tornados.Also, barplots were plotted for the top 8 weather events that causes the highest property damage and crop damage.

DATA PROCESSING

  1. Data The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb] There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer the following basic questions about severe weather events.

Modus Operandi / Process

Data Preparation

1.1 Install packages & Load libraries

Install packages …

library(rmarkdown)
library(knitr)
#install.packages("R.utils",repos="http://cran.us.r-project.org")
library(R.utils)
 #install.packages("gridExtra", repos="'https://cran.rstudio.com")
 library(gridExtra)
#install.packages("ggplot2", repos="'https://cran.rstudio.com")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3

1.2 Download the storm data file into the designated working directory folder

temp <- tempfile()
 
if(!file.exists("/stormData.csv.bz2")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="./stormData.csv.bz2")
}
## Uncompressing the file
if(!file.exists("stormdata.csv"))
{
  bunzip2("stormData.csv.bz2","stormdata.csv",remove=F)
}

# 1.3 load & read data from file

storm <- read.csv("stormdata.csv",header=TRUE,sep=",")
summary(storm)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

RESULTS

QUESTION 1.

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

1.1 Variable selection (reducing the data set to only needed columns and variables)

variables<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP","CROPDMG","CROPDMGEXP")

strmdata<-storm[variables]

dim(strmdata)
## [1] 902297      7
names(strmdata)
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"

1.2 Reviewing events that cause the most fatalities ( The Top-10 Fatalities by Weather Event )

Procedure = aggregate the top 10 fatalities by the event type and sort the output in descending order

Fatalities <- aggregate(FATALITIES ~ EVTYPE, data = strmdata, FUN = sum)

Top10_Fatalities <- Fatalities[order(-Fatalities$FATALITIES), ][1:10, ] 

Top10_Fatalities 
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

1.3 Reviewing events that cause the most injuries ( The Top-10 Injuries by Weather Event )

Procedure = aggregate the top 10 injuries by the event type and sort the output in descending order

Injuries <- aggregate(INJURIES ~ EVTYPE, data = strmdata, FUN = sum)

Top10_Injuries <- Injuries[order(-Injuries$INJURIES), ][1:10, ] 

Top10_Injuries 
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

1.4 Plot of Top 10 Fatalities & Injuries for Weather Event Types ( Population Health Impact )

Proecedure = plot graphs showing the top 10 fatalities and injuries

p1 = ggplot(Top10_Fatalities, aes(x = EVTYPE, y = FATALITIES, theme_set(theme_bw()))) + 
    geom_bar(stat = "identity", fill = "orange") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 6)) + 
    xlab("Event Type") + ylab("Fatalities") + ggtitle("Fatalities by top 10 Weather Event Types") +
    theme(plot.title = element_text(size = 10))

p2 = ggplot(Top10_Injuries, aes(x = EVTYPE, y = INJURIES, theme_set(theme_bw()))) + 
    geom_bar(stat = "identity", fill = "pink") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 6)) + 
    xlab("Event Type") + ylab("Injuries") + ggtitle("Injuries by top 10 Weather Event Types") +
    theme(plot.title = element_text(size = 10))

## Plot both side by side using gridExtra package
grid.arrange(p1, p2, ncol = 2, top = "Most Harmful Events with Respect to Population Health")

Figure 1: The weather event responsbile for the highest fatalities and injuries is the ‘Tornado’

QUESTION 2. Across the United States, which types of events have the greatest economic consequences?

An analysis of the weather events responsible for the greatest economic consequences

Hypothesis: Economic consequences means damages. The two significant types of damage typically caused by weather events include ‘properties and crops’

2.1 Data Exploration & Findings …

Upon reviewing the column names, the property damage(PROPDMG) and crop damage(CROPDMG) columns both have another related column titled ‘exponents’ (i.e - PROPDMGEXP and CROPDMGEXP respectively).

As a result, let’s convert the exponent columns into numeric data for the calculation of total property and crop damages encountered.

2.2 Defining & Calcuating [ Property Damage ]

Property damage exponents for each level listed out & assigned those values for the property exponent data.

Invalid data was excluded by assigning the value as ‘0’.

Then, the property damage value was calculated by multiplying the property damage and property exponent value.

unique(strmdata$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
#Assigning values for the property exponent strmdata 
strmdata$PROPEXP[strmdata$PROPDMGEXP == "K"] <- 1000
strmdata$PROPEXP[strmdata$PROPDMGEXP == "M"] <- 1e+06
strmdata$PROPEXP[strmdata$PROPDMGEXP == ""] <- 1
strmdata$PROPEXP[strmdata$PROPDMGEXP == "B"] <- 1e+09
strmdata$PROPEXP[strmdata$PROPDMGEXP == "m"] <- 1e+06
strmdata$PROPEXP[strmdata$PROPDMGEXP == "0"] <- 1
strmdata$PROPEXP[strmdata$PROPDMGEXP == "5"] <- 1e+05
strmdata$PROPEXP[strmdata$PROPDMGEXP == "6"] <- 1e+06
strmdata$PROPEXP[strmdata$PROPDMGEXP == "4"] <- 10000
strmdata$PROPEXP[strmdata$PROPDMGEXP == "2"] <- 100
strmdata$PROPEXP[strmdata$PROPDMGEXP == "3"] <- 1000
strmdata$PROPEXP[strmdata$PROPDMGEXP == "h"] <- 100
strmdata$PROPEXP[strmdata$PROPDMGEXP == "7"] <- 1e+07
strmdata$PROPEXP[strmdata$PROPDMGEXP == "H"] <- 100
strmdata$PROPEXP[strmdata$PROPDMGEXP == "1"] <- 10
strmdata$PROPEXP[strmdata$PROPDMGEXP == "8"] <- 1e+08

# Assigning '0' to invalid exponent strmdata
strmdata$PROPEXP[strmdata$PROPDMGEXP == "+"] <- 0
strmdata$PROPEXP[strmdata$PROPDMGEXP == "-"] <- 0
strmdata$PROPEXP[strmdata$PROPDMGEXP == "?"] <- 0

# Calculating the property damage value
strmdata$PROPDMGVAL <- strmdata$PROPDMG * strmdata$PROPEXP


# 2.3 Defining & Calcuating [ Crop Damage ]

## Crop damage exponents for each level listed out & assigned those values for the crop exponent data. 
## Invalid data was excluded by assigning the value as '0'. 
## Then, the crop damage value was calculated by multiplying the crop damage and crop exponent value.

unique(strmdata$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
# Assigning values for the crop exponent strmdata 
strmdata$CROPEXP[strmdata$CROPDMGEXP == "M"] <- 1e+06
strmdata$CROPEXP[strmdata$CROPDMGEXP == "K"] <- 1000
strmdata$CROPEXP[strmdata$CROPDMGEXP == "m"] <- 1e+06
strmdata$CROPEXP[strmdata$CROPDMGEXP == "B"] <- 1e+09
strmdata$CROPEXP[strmdata$CROPDMGEXP == "0"] <- 1
strmdata$CROPEXP[strmdata$CROPDMGEXP == "k"] <- 1000
strmdata$CROPEXP[strmdata$CROPDMGEXP == "2"] <- 100
strmdata$CROPEXP[strmdata$CROPDMGEXP == ""] <- 1

# Assigning '0' to invalid exponent strmdata
strmdata$CROPEXP[strmdata$CROPDMGEXP == "?"] <- 0

# calculating the crop damage 
strmdata$CROPDMGVAL <- strmdata$CROPDMG * strmdata$CROPEXP
# 2.4 Property Damage Summary

## Procedure = aggregate the property damage by the event type and sort the output it in descending order

prop <- aggregate(PROPDMGVAL~EVTYPE,data=strmdata,FUN=sum,na.rm=TRUE)

prop <- prop[with(prop,order(-PROPDMGVAL)),]

prop <- head(prop,10)

print(prop)
##                EVTYPE   PROPDMGVAL
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380617
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046260
# Q2.5 Crop Damage Summary

## Procedure = aggregate the crop damage by the event type and sort the output it in descending order

crop <- aggregate(CROPDMGVAL~EVTYPE,data=strmdata,FUN=sum,na.rm=TRUE)

crop <- crop[with(crop,order(-CROPDMGVAL)),]

crop <- head(crop,10)

print(crop)
##                EVTYPE  CROPDMGVAL
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000
## Plot using ggplot2 for crop
p1 <- ggplot(crop, aes(x = EVTYPE, y = CROPDMGVAL, theme_set(theme_bw()))) +
    geom_bar(stat = "identity", fill = "green") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Total Damage in $USD") + ggtitle("Total  Crop Damage")

## Plot using ggplot2 for property
p2 <- ggplot(prop, aes(x = EVTYPE, y = PROPDMGVAL, theme_set(theme_bw()))) +
    geom_bar(stat = "identity", fill = "green") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Total Damage in $USD") + ggtitle("Total Property  Damage")

## Plot both side by side using gridExtra package
grid.arrange(p1, p2, ncol = 2, top = "Total Property & Crop Damage by top 10 Weather Events")

Figure 2: ‘Floods’ are responsbile for the highest property damage while ‘droughts’ cause the greatest crop damage.

Summary of Conclusions Tornados are responsible for the maximum number of fatalities and injuries, followed by Excessive Heat for fatalities and Thunderstorm wind for injuries.

Floods are responsbile for maximum property damage, while Droughts cause maximum crop damage. Second major events that caused the maximum damage was Hurricanes/Typhoos for property damage and Floods for crop damage.