Reproducible Research

Effect of U.S. storm event types on public health and the economy

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The goal is to answer the following two questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
We will show that the top 3 event types for the first question are (in descending order) tornado, flood, and ice storm, while the top 3 for the second question are (in descending order) flood, hurricane/typhoon, and tornado.

Data processing

The data for this assignment comes in the form of a comma-separated-value file.

# general libraries used:
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(scales)

Reading the data

Reading the data file into a data frame (the chunk sets cache=TRUE, since the CSV file is large and reading it may take a while).

StormData<-read.csv("repdata_data_StormData.csv",header=TRUE)

Subsetting the data to only relevant columns and rows:

A new data set (called sub_storm) is generated from the main data set (called StormData).
It includes only the columns and rows that are relevant for the analysis in this project.

Relevant columns to subset on:
- BGN_DATE
- EVTYPE
- FATALITIES
- INJURIES
- PROPDMG
- PROPDMGEXP
- CROPDMG
- CROPDMGEXP
Relevant rows to subset on:
- Ignoring rows where PROPDMGEXP or CROPDMGEXP is equal to ?, +, or -, or is empty.
- Removing rows from before 1966, since not all event types were recorded before that year.

# exponent codes that cannot be converted to a multiplier:
bad_exp<-c("+","-","?","NA","")
# as.character() keeps this working whether read.csv returned factors or strings
PROP_val<-setdiff(unique(as.character(StormData$PROPDMGEXP)),bad_exp)
CROP_val<-setdiff(unique(as.character(StormData$CROPDMGEXP)),bad_exp)
col_val<-c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
# subsetting on relevant columns:
sub_storm<-select(StormData,all_of(col_val))
# subsetting on rows with proper values:
sub_storm<-filter(sub_storm,CROPDMGEXP %in% CROP_val & PROPDMGEXP %in% PROP_val)
# keeping only rows from 1966 onward:
sub_storm$BGN_DATE<-as.Date(sub_storm$BGN_DATE,format="%m/%d/%Y")
# compare against a Date object; the string "01/01/1966" is not parsed as 1 January 1966
sub_storm<-filter(sub_storm,BGN_DATE>=as.Date("1966-01-01"))
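As a quick check of the row filters described above, the following sketch applies the same two conditions to a small made-up data frame. The data frame `toy`, its values, and the names `bad_exp` and `kept` are invented for illustration; only the column names come from the storm data set.

```r
library(dplyr)

# Hypothetical toy data: three rows, column names borrowed from the storm data
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "6/1/1995 0:00:00", "8/29/2005 0:00:00"),
  PROPDMGEXP = c("K", "?", "B"),
  CROPDMGEXP = c("K", "K", "M"),
  stringsAsFactors = FALSE
)

# Exponent codes treated as unusable (assumption: same set as in this report)
bad_exp <- c("+", "-", "?", "NA", "")

# Parse the date part of the string; the trailing time is ignored
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Drop rows with unusable exponent codes, then rows from before 1966
kept <- filter(toy, !(PROPDMGEXP %in% bad_exp), !(CROPDMGEXP %in% bad_exp))
kept <- filter(kept, BGN_DATE >= as.Date("1966-01-01"))

print(kept)
```

In this toy example the 1950 row is dropped by the date condition and the 1995 row by its `?` exponent, so only the 2005 row remains.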

Adding additional columns to the small data set.

Adding the following new columns:
- health_incidents: The total number of injuries and fatalities for each event (FATALITIES+INJURIES).
- PROP_D: The total property damage for each event (in US $, calculation explained below).
- CROP_D: The total crop damage for each event (in US $, calculation explained below).
- total_damage: CROP_D + PROP_D (for each event).

PROP_D and CROP_D are computed as follows:
- PROP_D = PROPDMG x factor
- CROP_D = CROPDMG x factor

where “factor” is set according to the value in PROPDMGEXP or CROPDMGEXP, per the following table. For example, PROPDMG = 25 with PROPDMGEXP = K gives PROP_D = 25 x 1000 = 25000 US $.

CROPDMGEXP/PROPDMGEXP	factor
H or h	100
K or k	1000
M or m	1000000
B or b	1000000000
digit between 0 and 9	10^digit
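The table above can also be expressed as a lookup vector, which keeps the letter codes in one place. The sketch below is only an illustration of that idea: `letter_factor` and `exp_to_factor()` are hypothetical names that do not appear in the report's own code, and codes outside the table are assumed to map to NA.

```r
# Multipliers for the letter codes (upper case only; input is upper-cased first)
letter_factor <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map exponent codes to numeric multipliers
exp_to_factor <- function(code) {
  code <- toupper(as.character(code))
  is_digit <- grepl("^[0-9]$", code)
  out <- unname(letter_factor[code])              # NA for anything not H/K/M/B
  out[is_digit] <- 10^as.numeric(code[is_digit])  # a digit d means 10^d
  out
}

# Worked example: 25 with code "K" and 3 with code "2"
damage <- c(25, 3) * exp_to_factor(c("K", "2"))
print(damage)
```

Unknown codes such as `?` yield NA here rather than an error, so they should be filtered out before or after the conversion.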

# adding health_incidents
sub_storm<-mutate(sub_storm,health_incidents=INJURIES+FATALITIES)
# Multiplier for each exponent code, per the table above
# (codes not listed here return NA):
factor1<-function(x) {
        case_when (
                x == "M" ~ 1000000,
                x == "m" ~ 1000000,
                x == "B" ~ 1000000000,
                x == "b" ~ 1000000000,
                x == "K" ~ 1000,
                x == "k" ~ 1000,
                x == "h" ~ 100,
                x == "H" ~ 100,
                x == "0" ~ 1,
                x == "1" ~ 10,
                x == "2" ~ 100,
                x == "3" ~ 1000,
                x == "4" ~ 10000,
                x == "5" ~ 100000,
                x == "6" ~ 1000000,
                x == "7" ~ 10000000,
                x == "8" ~ 100000000,
                x == "9" ~ 1000000000
        )
}

# Adding new columns CROP_D and PROP_D to data set:
sub_storm<-mutate(sub_storm,PROP_D=PROPDMG*factor1(PROPDMGEXP))
sub_storm<-mutate(sub_storm,CROP_D=CROPDMG*factor1(CROPDMGEXP))
# generating total_damage columns:
sub_storm<-mutate(sub_storm,total_damage=CROP_D+PROP_D)

Analyzing the effect of different events on public health:

The analysis sums the number of injuries and fatalities (the new column “health_incidents”) for each event type.
The sum is taken over all years (from 1966 on). The event types are then sorted by health_incidents (high to low), and the plot below shows the 10 event types with the most health_incidents.

health_summary<-tapply(sub_storm$health_incidents,sub_storm$EVTYPE,sum,na.rm=TRUE)
health_summary<-as.data.frame((health_summary))
health_summary<-mutate(health_summary,event_type=rownames(health_summary))
# dropping event types with no remaining rows (their sum is NA):
health_summary<-filter(health_summary,!is.na(`(health_summary)`))
# ordering the data from high to low
health_summary<-arrange(health_summary,desc(`(health_summary)`))
# plotting a bar plot for the 10 highest:
par(mar=c(4,8,2,2))
barplot(health_summary$`(health_summary)`[1:10],col="purple",names.arg = health_summary$event_type[1:10],horiz = TRUE,
        xlab="Total number of injuries and fatalities",
        main="Total number of Injuries and Fatalities per event type \n (USA from year 1966 and on)",
        xlim=c(0,14000),las=1,cex.axis = 0.8,cex.names = 0.6)
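The per-event-type totals can also be computed with dplyr's grouped summaries instead of tapply. The sketch below shows this on made-up data: the data frame `toy` and the names `total` and `top_health` are invented for illustration, and this is an alternative formulation rather than the code used to produce the figure.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  health_incidents = c(10, 4, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum per event type, sort from high to low, keep the top 2
by_type <- group_by(toy, EVTYPE)
totals <- summarise(by_type, total = sum(health_incidents, na.rm = TRUE))
top_health <- head(arrange(totals, desc(total)), 2)

print(top_health)
```

This form gives the result column an ordinary name (`total`), which avoids the backtick-quoted column name that `as.data.frame` creates from a tapply result.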

Analyzing the effect of different events on economics:

The analysis sums the combined property and crop damage (the new column “total_damage”) for each event type.
The sum is taken over all years (from 1966 on). The event types are then sorted by total_damage (high to low), and the plot below shows the 15 event types with the most total_damage.

damage_summary<-tapply(sub_storm$total_damage,sub_storm$EVTYPE,sum,na.rm=TRUE)
damage_summary<-as.data.frame((damage_summary))
damage_summary<-mutate(damage_summary,event_type=rownames(damage_summary))
# converting to millions of US $:
damage_summary<-mutate(damage_summary,`(damage_summary)`=`(damage_summary)`/1000000)
# dropping event types with no remaining rows (their sum is NA):
damage_summary<-filter(damage_summary,!is.na(`(damage_summary)`))
# ordering the data from high to low
damage_summary<-arrange(damage_summary,desc(`(damage_summary)`))
# plotting a bar plot for the 15 highest:
par(mar=c(4,8,2,2))
barplot(damage_summary$`(damage_summary)`[1:15],col="violet",names.arg = damage_summary$event_type[1:15],horiz = TRUE,
        xlab="Total Property and Crop related damage in Millions of $",
        main="Total Property and Crop related damage in US Dollars (Million $) per event type \n (USA from year 1966 and on)",
        xlim=c(0,140000),las=1,cex.axis = 0.8,cex.names = 0.6)

Results

# damage: share of total damage covered by the top 10 and top 3 event types
sum_damage<-sum(damage_summary$`(damage_summary)`)
damage_summary<-mutate(damage_summary,percent_damage=`(damage_summary)`/sum_damage)
damage_first10<-sum(damage_summary$percent_damage[1:10])
damage_first3<-sum(damage_summary$percent_damage[1:3])
# health: share of all injuries and fatalities covered by the top 15 and top 3 event types
sum_health<-sum(health_summary$`(health_summary)`)
health_summary<-mutate(health_summary,percent_health=`(health_summary)`/sum_health)
health_first15<-sum(health_summary$percent_health[1:15])
health_first3<-sum(health_summary$percent_health[1:3])

The 15 most harmful event types account for 92.7% of all injuries and fatalities across all years; the first figure above shows the top 10 of them.
The top 3 event types, which together account for 63.7% of the injuries and fatalities, are (in descending order):
- Tornado
- Flood
- Ice storm
The 10 costliest event types account for 91.6% of the property and crop damage across all years; they are the first 10 of the 15 event types shown in the second figure above.
The top 3 event types, which together account for 70.3% of the property and crop damage, are (in descending order):
- Flood
- Hurricane/Typhoon
- Tornado
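The "top N event types account for X%" figures are cumulative shares of a sorted total. A minimal sketch of that arithmetic on made-up numbers follows; the vector `totals` and its values are invented for illustration and are not taken from the storm data.

```r
# Hypothetical totals per event type, already sorted from high to low
totals <- c(A = 50, B = 30, C = 15, D = 5)

# Share of the grand total for each type, then the running (cumulative) share
share <- totals / sum(totals)
cum_share <- cumsum(share)

# Cumulative percentage covered by the top 1, 2, 3 and 4 types
print(round(100 * cum_share, 1))
```

Reading off the third element of `cum_share` gives the share covered by the top 3 types, which is the same quantity that `sum(share[1:3])` computes.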

Reproducible Research - Project #2

Milca Tarshish

5/22/2019