Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The purpose is to answer the following two questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
We will show that the top three event types for the first question are, in descending order, tornado, flood, and ice storm, while the top three for the second question are, in descending order, flood, hurricane/typhoon, and tornado.
The data for this assignment comes in the form of a comma-separated-value file.
# general libraries used:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(scales)
Reading the data file into a data frame (the chunk sets cache=TRUE because the CSV file is quite large and this step may take a while):
StormData<-read.csv("repdata_data_StormData.csv",header=T)
A new data set (called sub_storm) will be generated from the main data set (called StormData).
The new data set will include only the columns and rows that are relevant to the analysis in this project.
Relevant columns to subset on:
- BGN_DATE
- EVTYPE
- FATALITIES
- INJURIES
- PROPDMG
- PROPDMGEXP
- CROPDMG
- CROPDMGEXP
Relevant rows to subset on:
- Ignoring rows where PROPDMGEXP or CROPDMGEXP is equal to ?, + or - (the code below also drops rows where either value is empty).
- Removing rows from before 1966, since not all event types were recorded before that year.
# valid exponent codes: everything except "+", "-", "?", "NA" and the empty string
bad_exp<-c("+","-","?","NA","")
PROP_val<-setdiff(as.character(unique(StormData$PROPDMGEXP)),bad_exp)
CROP_val<-setdiff(as.character(unique(StormData$CROPDMGEXP)),bad_exp)
col_val<-c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
# subsetting on relevant columns:
sub_storm<-select(StormData,all_of(col_val))
# subsetting on rows with proper values:
sub_storm<-filter(sub_storm,CROPDMGEXP %in% CROP_val & PROPDMGEXP %in% PROP_val)
# keeping only rows from 1966 onward (compare against a Date, not a string):
sub_storm$BGN_DATE<-as.Date(sub_storm$BGN_DATE,format="%m/%d/%Y")
sub_storm<-filter(sub_storm,BGN_DATE>=as.Date("1966-01-01"))
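A minimal sketch of the date handling, using made-up toy rows rather than the NOAA file. It assumes the cutoff is best expressed as a Date object: by default as.Date() only tries the formats %Y-%m-%d and %Y/%m/%d, so a month/day/year string is not read as intended unless a format is supplied. The names toy, cutoff and toy_recent are hypothetical.

```r
# Toy data in the same "month/day/year time" style as BGN_DATE (values are made up).
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "12/31/1965 0:00:00",
               "1/1/1966 0:00:00", "7/4/2001 0:00:00"),
  stringsAsFactors = FALSE
)

# With an explicit format, as.Date() parses the date and ignores the trailing time.
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Express the cutoff as a Date so the comparison is date against date.
cutoff <- as.Date("1966-01-01")
toy_recent <- toy[toy$BGN_DATE >= cutoff, , drop = FALSE]

print(toy_recent$BGN_DATE)
```

Only the two rows dated on or after 1 January 1966 remain in toy_recent.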
Adding the following new columns:
- health_incidents: The total number of injuries and fatalities for each event (FATALITIES+INJURIES).
- PROP_D: The total property damage for each event (in US $, calculation explained below).
- CROP_D: The total crop damage for each event (in US $, calculation explained below).
- total_damage: CROP_D + PROP_D (for each event).
PROP_D and CROP_D will be generated according to the following formulas:
- PROP_D = PROPDMG x factor
- CROP_D = CROPDMG x factor
where “factor” is set according to the value in PROPDMGEXP or CROPDMGEXP, per the following table:
| CROPDMGEXP/PROPDMGEXP | factor |
|---|---|
| M or m | 1000000 |
| B or b | 1000000000 |
| K or k | 1000 |
| h or H | 100 |
| number between 0-9 | 10^number |
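The table above can also be expressed as a named lookup vector. The following is a sketch only, not the function used in this report; exp_lookup and exp_to_factor are hypothetical names, and mapping codes outside the table to NA is an assumption.

```r
# Named lookup vector: exponent code -> multiplier (mirrors the table above).
exp_lookup <- c(
  H = 1e2, K = 1e3, M = 1e6, B = 1e9,
  setNames(10^(0:9), as.character(0:9))
)

# Hypothetical helper: upper-cases the code so "k" and "K" are treated alike,
# and returns NA for codes that are not in the table (e.g. "?", "+", "-", "").
exp_to_factor <- function(code) {
  unname(exp_lookup[toupper(as.character(code))])
}

# Example: a recorded damage of 25 with exponent "K" is 25 * 1000 dollars.
damage <- 25 * exp_to_factor("K")
print(damage)
```

A lookup vector keeps the mapping in one place, which makes a missing or duplicated code easier to spot than in a long list of conditions.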
# adding health_incidents
sub_storm<-mutate(sub_storm,health_incidents=INJURIES+FATALITIES)
# The function below maps an exponent code to its multiplier, per the table above:
factor1<-function(x) {
  case_when (
    x == "M" ~ 1000000,
    x == "m" ~ 1000000,
    x == "B" ~ 1000000000,
    x == "b" ~ 1000000000,
    x == "K" ~ 1000,
    x == "k" ~ 1000,
    x == "h" ~ 100,
    x == "H" ~ 100,
    x == "0" ~ 1,
    x == "1" ~ 10,
    x == "2" ~ 100,
    x == "3" ~ 1000,
    x == "4" ~ 10000,
    x == "5" ~ 100000,
    x == "6" ~ 1000000,
    x == "7" ~ 10000000,
    x == "8" ~ 100000000,
    x == "9" ~ 1000000000
  )
}
# Adding new columns CROP_D and PROP_D to data set:
sub_storm<-mutate(sub_storm,PROP_D=PROPDMG*factor1(PROPDMGEXP))
sub_storm<-mutate(sub_storm,CROP_D=CROPDMG*factor1(CROPDMGEXP))
# generating total_damage columns:
sub_storm<-mutate(sub_storm,total_damage=CROP_D+PROP_D)
The analysis sums the number of injuries and fatalities (the new column “health_incidents”) for each event type.
The sum is taken over all years from 1966 onward. The event types are then sorted by health_incidents (high to low), and the plot below shows the 10 event types with the most health_incidents.
health_summary<-tapply(sub_storm$health_incidents,sub_storm$EVTYPE,sum,na.rm=T)
health_summary<-as.data.frame((health_summary))
health_summary<-mutate(health_summary,event_type=rownames(health_summary))
# dropping event types whose total is missing
health_summary<-filter(health_summary,!is.na(`(health_summary)`))
# ordering the data from high to low
health_summary<-arrange(health_summary,desc(`(health_summary)`))
# plotting a bar plot for the 10 highest:
par(mar=c(4,8,2,2))
barplot(health_summary$`(health_summary)`[1:10],col="purple",names.arg = health_summary$event_type[1:10],horiz = T,
xlab="Total number of injuries and fatalities",
main="Total number of Injuries and Fatalities per event type \n (USA from year 1966 and on)",
xlim=c(0,14000),las=1,cex.axis = 0.8,cex.names = 0.6)
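As an alternative sketch, the same sum-per-event-type and top-N selection can be written as a dplyr pipeline, which gives the total a plain column name. This is not the code used for the plot above; toy, total and top_events are hypothetical names and the values are made up.

```r
library(dplyr)

# Toy stand-in for sub_storm (values are made up).
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  health_incidents = c(10, 3, 5, 7, NA, 0)
)

top_events <- toy %>%
  group_by(EVTYPE) %>%                                        # one group per event type
  summarise(total = sum(health_incidents, na.rm = TRUE)) %>%  # sum, ignoring NA
  arrange(desc(total)) %>%                                    # high to low
  head(2)                                                     # keep the top 2

print(top_events)
```

With the toy values, TORNADO (15) and HEAT (7) are the two event types that remain.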
The analysis sums the total crop and property damage (the new column “total_damage”) for each event type.
The sum is taken over all years from 1966 onward. The event types are then sorted by total_damage (high to low), and the plot below shows the 15 event types with the most total_damage.
damage_summary<-tapply(sub_storm$total_damage,sub_storm$EVTYPE,sum,na.rm=T)
damage_summary<-as.data.frame((damage_summary))
damage_summary<-mutate(damage_summary,event_type=rownames(damage_summary))
damage_summary<-mutate(damage_summary,`(damage_summary)`=`(damage_summary)`/1000000)
damage_summary<-filter(damage_summary,!is.na(`(damage_summary)`))
# ordering the data from high to low
damage_summary<-arrange(damage_summary,desc(`(damage_summary)`))
par(mar=c(4,8,2,2))
barplot(damage_summary$`(damage_summary)`[1:15],col="violet",names.arg = damage_summary$event_type[1:15],horiz = T,
xlab="Total Property and Crop related damage in Millions of $",
main="Total Property and Crop related damage in US Dollars (Million $) per event type \n (USA from year 1966 and on)",
xlim=c(0,140000),las=1,cex.axis = 0.8,cex.names = 0.6)
# damage: share of the total covered by the top event types
sum_damage<-sum(damage_summary$`(damage_summary)`)
damage_summary<-mutate(damage_summary,percent_damage=`(damage_summary)`/sum_damage)
damage_first10<-sum(damage_summary$percent_damage[1:10])
damage_first3<-sum(damage_summary$percent_damage[1:3])
# health: share of the total covered by the top event types
sum_health<-sum(health_summary$`(health_summary)`)
health_summary<-mutate(health_summary,percent_health=`(health_summary)`/sum_health)
health_first15<-sum(health_summary$percent_health[1:15])
health_first3<-sum(health_summary$percent_health[1:3])
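The share calculation above can be illustrated on a small made-up vector. This is a sketch under the assumption that the totals are already sorted high to low; totals, share, cum_share and top3 are hypothetical names.

```r
# Toy totals per event type, already sorted high to low (values are made up).
totals <- c(TORNADO = 50, FLOOD = 30, HEAT = 15, HAIL = 5)

# Share of each event type in the overall total.
share <- totals / sum(totals)

# Cumulative share: element n is the fraction covered by the top n event types.
cum_share <- cumsum(share)

# Fraction covered by the top 3 event types, formatted as a percentage.
top3 <- unname(cum_share[3])
print(sprintf("%.0f%%", 100 * top3))
```

Using cumsum() gives the coverage for every top-n cutoff at once, instead of one sum per cutoff.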