This analysis describes the impact of storms and other severe weather events on public health and the economy of communities and municipalities in the US. The U.S. National Oceanic and Atmospheric Administration (NOAA) tracks the characteristics of major storms and weather events in the United States and regularly publishes the resulting data set.
The report presents the following study.
The Storm Data file obtained from the NOAA website is a bzip2-compressed CSV file. We first download it into the working directory.
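The download step is not shown in this report, so the following is only a minimal sketch of how it might be done. The URL is the location commonly distributed with this data set and is an assumption (it does not appear in the report and may change); `fetch_if_missing` is a hypothetical helper name.

```r
## Sketch: fetch the compressed storm data only when it is not already present.
## Assumption: storm_url is the commonly distributed location of the file;
## it is not taken from this report.
storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata-data-StormData.csv.bz2"

fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary bz2 archive intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}
```

Calling `fetch_if_missing(storm_url, storm_file)` once before reading the data avoids repeating the large download on every run. `read.csv` can read the `.bz2` file directly, so no separate decompression step is needed.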
Load Libraries
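## Load the required R libraries, installing tidyr first only if it is missing
if (!requireNamespace("tidyr", quietly = TRUE)) install.packages("tidyr", repos = "https://cran.rstudio.com/")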
library(tidyr)
library(dplyr)
library(ggplot2)
Here we load the required data.
StormData <- as_tibble(read.csv("repdata-data-StormData.csv.bz2"))
After loading the dataset we check the number of observations and variables.
dim(StormData)
## [1] 902297 37
The data set contains 902,297 observations of 37 variables.
Here are the first few rows of the storm data. We are particularly interested in the variables used in the summaries that follow: EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG.
head(StormData)
## Source: local data frame [6 x 37]
##
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## (dbl) (fctr) (fctr) (fctr) (dbl) (fctr) (fctr)
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## Variables not shown: EVTYPE (fctr), BGN_RANGE (dbl), BGN_AZI (fctr),
## BGN_LOCATI (fctr), END_DATE (fctr), END_TIME (fctr), COUNTY_END (dbl),
## COUNTYENDN (lgl), END_RANGE (dbl), END_AZI (fctr), END_LOCATI (fctr),
## LENGTH (dbl), WIDTH (dbl), F (int), MAG (dbl), FATALITIES (dbl),
## INJURIES (dbl), PROPDMG (dbl), PROPDMGEXP (fctr), CROPDMG (dbl),
## CROPDMGEXP (fctr), WFO (fctr), STATEOFFIC (fctr), ZONENAMES (fctr),
## LATITUDE (dbl), LONGITUDE (dbl), LATITUDE_E (dbl), LONGITUDE_ (dbl),
## REMARKS (fctr), REFNUM (dbl)
To present our analysis, we summarize the observations by health impact and economic impact for each event type.
## Group the storm data by event type and sum fatalities and injuries separately, plus the total of both together
## Keep only event types with a positive total, then select the top 20 event types by total impact
## Gather the two measures into long format (one row per event type and impact type) for plotting
HealthImpactByStorm <-
  StormData %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES), TotalHealthImpact = sum(FATALITIES + INJURIES)) %>%
  filter(TotalHealthImpact > 0) %>%
  top_n(20, TotalHealthImpact) %>%
  gather(ImpactType, ImpactByType, Fatalities:Injuries) %>%
  arrange(desc(TotalHealthImpact))
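When a summary is reshaped to long format, each event type occupies one row per impact type, so a top-N selection on rows is not the same as a top-N selection on event types. A minimal base R sketch shows the difference; the numbers are made up and purely illustrative.

```r
## Sketch with made-up counts: three event types, two impact measures.
totals <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                     Fatalities = c(5, 1, 2),
                     Injuries = c(50, 3, 20),
                     stringsAsFactors = FALSE)
totals$Total <- totals$Fatalities + totals$Injuries

## Long format: every event type now appears twice, once per impact type.
long <- rbind(
  data.frame(EVTYPE = totals$EVTYPE, Total = totals$Total,
             ImpactType = "Fatalities", ImpactByType = totals$Fatalities,
             stringsAsFactors = FALSE),
  data.frame(EVTYPE = totals$EVTYPE, Total = totals$Total,
             ImpactType = "Injuries", ImpactByType = totals$Injuries,
             stringsAsFactors = FALSE)
)

## Taking the top 2 rows of the long table covers a single event type.
top_rows_after <- head(long[order(-long$Total), ], 2)
unique(top_rows_after$EVTYPE)

## Taking the top 2 event types before reshaping keeps two event types.
top_events <- head(totals[order(-totals$Total), "EVTYPE"], 2)
top_events
```

In general, a selection of N rows made after the reshape retains roughly N/2 event types, whereas a selection made before the reshape retains N.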
## Group the storm data by event type and sum property and crop damage separately, plus the total of both together
## Note: PROPDMG and CROPDMG are summed as raw magnitudes; the PROPDMGEXP and CROPDMGEXP multipliers are not applied
## Keep only event types with a positive total, then select the top 20 event types by total impact
## Gather the two measures into long format (one row per event type and impact type) for plotting
EconomicImpactByStorm <-
  StormData %>%
  group_by(EVTYPE) %>%
  summarise(PropertyDamage = sum(PROPDMG), CropDamage = sum(CROPDMG), TotalEconomicImpact = sum(PROPDMG + CROPDMG)) %>%
  filter(TotalEconomicImpact > 0) %>%
  top_n(20, TotalEconomicImpact) %>%
  gather(ImpactType, ImpactByType, PropertyDamage:CropDamage) %>%
  arrange(desc(TotalEconomicImpact))
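The data set stores each damage figure as a magnitude (PROPDMG, CROPDMG) together with an exponent code (PROPDMGEXP, CROPDMGEXP), and the summary above sums the magnitudes only. The following is a rough sketch of how the codes could be applied. The mapping of H, K, M and B to hundreds, thousands, millions and billions, and the choice to treat every other code as a multiplier of 1, are assumptions; `damage_in_dollars` is a hypothetical helper name.

```r
## Sketch: convert a damage magnitude and its exponent code to dollars.
## Assumption: H/K/M/B mean hundreds, thousands, millions, billions;
## any other code (blank, digits, symbols) is treated as a multiplier of 1.
damage_in_dollars <- function(magnitude, exponent) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(exponent))   # also handles factor columns
  scale <- unname(multipliers[code])        # unknown codes look up as NA
  scale[is.na(scale)] <- 1
  magnitude * scale
}

## 25 K -> 25,000; 2.5 M -> 2,500,000; 10 with a blank code -> 10; 5 b -> 5,000,000,000
damage_in_dollars(c(25, 2.5, 10, 5), c("K", "M", "", "b"))
```

With such a helper, the economic summary could sum `damage_in_dollars(PROPDMG, PROPDMGEXP)` and `damage_in_dollars(CROPDMG, CROPDMGEXP)` instead of the raw magnitudes, which may change the ranking of event types.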
The graph below shows that tornadoes are by far the weather event with the greatest impact on public health in the US, in terms of both injuries and fatalities.
ggplot(HealthImpactByStorm,aes(EVTYPE,ImpactByType, fill=ImpactType))+
geom_bar(position = "stack",stat = "identity")+
ggtitle("Impact of Storm on Health (Injuries or Fatalities) in US") +
xlab("Event Type") +
ylab("Total Impact") +
guides(fill=guide_legend(title="Impact Type")) +
theme(plot.title = element_text(lineheight=3, face="bold", color="black", size=13)
, axis.text.x = element_text(angle = 90)
, axis.title.x = element_text(size=10, face="bold")
, axis.title.y = element_text(size=10, face="bold")
, legend.title = element_text(size=10, face="bold")
)
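ggplot2 places a factor on the x axis in the order of its levels, which is alphabetical by default, so the tallest bar has to be searched for. One possible refinement, sketched here with made-up totals, is to reorder the factor levels by descending total using base R's `reorder`.

```r
## Sketch with made-up totals: order factor levels by descending total.
impact <- data.frame(EVTYPE = factor(c("FLOOD", "HEAT", "TORNADO")),
                     Total = c(20, 5, 50))

## reorder() sorts levels by increasing value, so negate for descending order.
impact$EVTYPE <- reorder(impact$EVTYPE, -impact$Total)
levels(impact$EVTYPE)
```

In the plot, the same idea could be applied inside the aesthetic mapping, for example `aes(reorder(EVTYPE, -TotalHealthImpact), ImpactByType, fill = ImpactType)`.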
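The graph below suggests that tornadoes are also the weather event with the greatest impact on the US economy in terms of property and crop damage. This comparison sums the raw PROPDMG and CROPDMG magnitudes without applying the PROPDMGEXP and CROPDMGEXP multipliers, so the ranking should be read with caution.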
ggplot(EconomicImpactByStorm,aes(EVTYPE,ImpactByType/1000, fill=ImpactType))+
geom_bar(position = "stack",stat = "identity")+
ggtitle("Impact of Storm on Economy") +
xlab("Event Type") +
ylab("Total Impact (thousands)") +
guides(fill=guide_legend(title="Impact Type")) +
theme(plot.title = element_text(lineheight=3, face="bold", color="black", size=15)
, axis.text.x = element_text(angle = 90)
, axis.title.x = element_text(size=12, face="bold")
, axis.title.y = element_text(size=12, face="bold")
, legend.title = element_text(size=12, face="bold")
)