Weather events have severe public health and economic impacts in the US. Understanding which types of events cause the most harm could help researchers and planners prepare in advance and minimize losses.
This report analyzes weather event data from 1950 to 2011 recorded by the U.S. National Oceanic and Atmospheric Administration (NOAA) [The Storm Data, https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2]. We focus on the event type (EVTYPE), FATALITIES, INJURIES and property damage (PROPDMG) to find the weather events with the greatest impact on public health and the economy.
The weather data is pre-downloaded to the repository for reading in R. The code below reads the data directly from the .bz2 file, then keeps only the columns we need: EVTYPE, FATALITIES, INJURIES and PROPDMG.
rawdata<-read.csv("repdata%2Fdata%2FStormData.csv.bz2")
mydata<-rawdata[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG")]
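The read step relies on read.csv() decompressing bzip2 files on the fly, so no separate unzip step is needed. A minimal sketch of that behaviour, assuming a small made-up data frame (toy) and a temporary file in place of the real storm data:

```r
# Toy stand-in for the storm data; the values are made up for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Write the toy data to a temporary bzip2-compressed CSV file.
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading.
toy_back <- read.csv(toy_path, stringsAsFactors = FALSE)
print(toy_back)
```

Passing stringsAsFactors explicitly keeps the result the same across R versions, since the default changed in R 4.0.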
The data contains 985 distinct EVTYPE labels, with columns for deaths (FATALITIES), injuries (INJURIES) and property loss (PROPDMG). The structure and the first few records are shown below.
str(mydata)
## 'data.frame': 902297 obs. of 4 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
head(mydata)
## EVTYPE FATALITIES INJURIES PROPDMG
## 1 TORNADO 0 15 25.0
## 2 TORNADO 0 0 2.5
## 3 TORNADO 0 2 25.0
## 4 TORNADO 0 2 2.5
## 5 TORNADO 0 2 2.5
## 6 TORNADO 0 6 2.5
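The 985 figure counts raw labels, and the first factor level in the str() output (" HIGH SURF ADVISORY") starts with a space, so some labels may differ only in whitespace or letter case. A small sketch of how one might check this, using a made-up vector (toy_evtype) rather than the real column:

```r
# Hypothetical labels that differ only by case or leading whitespace.
toy_evtype <- c("TORNADO", "Tornado", " TORNADO", "FLOOD", "FLOOD")

# Number of distinct labels exactly as recorded.
n_raw <- length(unique(toy_evtype))

# Number of distinct labels after trimming whitespace and upper-casing.
n_clean <- length(unique(toupper(trimws(toy_evtype))))

print(c(raw = n_raw, clean = n_clean))
```

The same two expressions could be applied to mydata$EVTYPE to see how far the count drops after normalization. This report does not apply that step.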
To make it easier to find the top 10 weather events by fatalities, injuries and property damage, we prepare three subsets: mydata_f, mydata_i and mydata_p.
mydata_f<-mydata[,c("EVTYPE","FATALITIES")]
mydata_i<-mydata[,c("EVTYPE","INJURIES")]
mydata_p<-mydata[,c("EVTYPE","PROPDMG")]
First we load the required libraries, then define an analysis function, “myplotfunc”, that sums each loss by weather event. The function then ranks the events, keeps the 10 with the highest totals and plots them as a bar chart using ggplot.
library(reshape2)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
myplotfunc <- function(plotdata, plottype) {
  # Reshape to long format: one row per (EVTYPE, variable, value)
  melted <- melt(plotdata, id.vars = c("EVTYPE"))
  # Total the loss for each event type
  grouped <- group_by(melted, EVTYPE)
  sum_data <- summarise(grouped, sum = sum(value))
  # Rank by total, largest first, and keep the top 10
  sum_data_order <- sum_data[order(sum_data$sum, decreasing = TRUE), ]
  plot_data <- head(sum_data_order, n = 10)
  # Fix the factor order so the bars are drawn in ranked order
  plot_data$EVTYPE <- factor(plot_data$EVTYPE, ordered = TRUE, levels = plot_data$EVTYPE)
  ggplot(plot_data, aes(EVTYPE, sum)) +
    geom_bar(aes(fill = EVTYPE), stat = "identity") +
    theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
    xlab("Weather Event") + ylab(plottype) +
    ggtitle(paste(plottype, "vs. Weather Event", sep = " "))
}
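The sum-and-rank step inside the function can be tried on its own, without the plotting. The sketch below uses base R's tapply() in place of the melt/group_by/summarise chain so that it runs without extra packages. The helper name top_events and the toy numbers are assumptions for illustration and are not part of the report.

```r
# Toy data; event names mirror EVTYPE but the counts are made up.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(5, 2, 3, 0, 1, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total one column per event type, largest first,
# and keep the top n event types.
top_events <- function(data, value_col, n = 10) {
  totals <- tapply(data[[value_col]], data$EVTYPE, sum)
  head(sort(totals, decreasing = TRUE), n)
}

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)  # TORNADO = 8, HEAT = 4
```

Checking the ranked totals on a small input like this is a quick way to confirm the ordering before handing the result to ggplot.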
With the function defined, we pass the fatalities (mydata_f), injuries (mydata_i) and property damage (mydata_p) data to it to plot the top 10 weather events for each measure.
myplotfunc(mydata_f,"Fatalities")
myplotfunc(mydata_i,"Injuries")
myplotfunc(mydata_p,"Property Damages")
The plots show that tornadoes have the largest impact on fatalities, injuries and property damage (measured as the summed PROPDMG values) among all weather events. Focusing resources on preparing for and mitigating tornado impact could help reduce the public health and economic consequences.