This data comes from the National Weather Service storm database, described in the NWS Storm Data Documentation. The events in the database start in 1950 and end in November 2011. Earlier years contain fewer recorded events, most likely because of incomplete record-keeping, so more recent years should be considered more complete.
We first load the packages and the data.
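library(dplyr)    # mutate()
library(ggplot2)  # ggplot() bar charts
# Working directory holding the CSV file (this path is specific to the author's machine)
setwd("C:/Users/Bobby/Google Drive/Data Science/R Files/4) Reproducible Reseach/Assignment 2")
Data <- read.csv("repdata-data-StormData.csv", header = TRUE, sep = ",")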
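Next we clean the data. We convert the begin date to a year and keep only events from 1996 onward, because the records up to and including 1995 are messy. We also keep only events that recorded at least one injury; note that this restricts the fatality and damage totals below to injury-causing events as well. We then sum fatalities and injuries by event type, sort the totals from highest to lowest, and keep the top 10.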
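# Parse the begin date, then keep only the year
Data$BGN_DATE <- as.Date(Data$BGN_DATE, "%m/%d/%Y")
Data$BGN_DATE <- as.numeric(format(Data$BGN_DATE, "%Y"))
# Keep events from 1996 onward
Data <- Data[which(Data$BGN_DATE > 1995), ]
# Keep only events with at least one injury (this also limits the fatality and damage totals)
Data <- Data[which(Data$INJURIES > 0), ]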
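The date handling above relies on `as.Date()` ignoring anything after the date part of `BGN_DATE`. The sketch below checks this on a few made-up strings, assuming (not confirmed by this report) that the raw values look like `"4/18/1950 0:00:00"`; it then applies the same year extraction and `> 1995` filter.

```r
# Hypothetical sample values, assumed to follow a "m/d/Y H:MM:SS" layout
sampleDates <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/30/2011 0:00:00")

# strptime-based parsing ignores trailing characters, so the time part is dropped
parsed <- as.Date(sampleDates, "%m/%d/%Y")

# Keep only the year, then apply the same filter as the analysis
years <- as.numeric(format(parsed, "%Y"))
keep <- years > 1995
print(years)
print(keep)
```

Because the filter is a strict `> 1995`, the year 1995 itself is excluded and the analysis starts in 1996.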
FatalPlot <- aggregate(Data$FATALITIES, by=list(Data$EVTYPE), FUN="sum")
InjurePlot <- aggregate(Data$INJURIES, by=list(Data$EVTYPE), FUN="sum")
FatalPlot <- FatalPlot[order(FatalPlot$x, decreasing=T),]
FatalPlot <- FatalPlot[1:10,]
InjurePlot <- InjurePlot[order(InjurePlot$x, decreasing=T),]
InjurePlot <- InjurePlot[1:10,]
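The aggregate, sort and truncate pattern used for both fatalities and injuries can be tried on a small hypothetical data frame (the event names and counts below are invented for illustration, and N is 2 rather than 10):

```r
# Hypothetical toy records; only EVTYPE and FATALITIES matter here
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "LIGHTNING"),
  FATALITIES = c(3, 1, 2, 4, 0, 1)
)

# Sum fatalities per event type; aggregate() names the columns Group.1 and x
totals <- aggregate(toy$FATALITIES, by = list(toy$EVTYPE), FUN = "sum")

# Sort from highest to lowest and keep the top N
totals <- totals[order(totals$x, decreasing = TRUE), ]
top <- head(totals, 2)
print(top)
```

`head()` is used here instead of `[1:N, ]` because it simply returns fewer rows when there are fewer than N groups, whereas `[1:N, ]` pads the result with NA rows.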
With fatalities and injuries summarised, we move on to economic damage. Each damage amount is stored as a base value (PROPDMG, CROPDMG) plus a separate exponent code (PROPDMGEXP, CROPDMGEXP), so we first convert the codes K = thousands, M = millions and B = billions into numeric multipliers, then multiply them out to get the property and crop damage for each event.
# Convert an exponent code into a numeric multiplier.
# -, +, ?, h/H and 0 map to 0; K, M and B map to thousands, millions and billions.
# Anything still non-numeric afterwards (including blanks) becomes NA and is set to 0.
expToMultiplier <- function(code) {
  code <- as.character(code)
  code <- gsub("\\-|\\+|\\?|h|H|0", "0", code)
  code <- gsub("k|K", "1000", code)
  code <- gsub("m|M", "1000000", code)
  code <- gsub("b|B", "1000000000", code)
  multiplier <- suppressWarnings(as.numeric(code))
  multiplier[is.na(multiplier)] <- 0
  multiplier
}
Data$PROPDMGEXP <- expToMultiplier(Data$PROPDMGEXP)
Data$CROPDMGEXP <- expToMultiplier(Data$CROPDMGEXP)
Data <- mutate(Data, Property = PROPDMG * PROPDMGEXP, Crops = CROPDMG * CROPDMGEXP)
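As a worked example of the conversion: a PROPDMG of 25 with a PROPDMGEXP of `K` means 25 × 1,000 = $25,000. The base-R sketch below applies the same substitutions to a few hypothetical codes and damage values, so the resulting multipliers can be checked by eye.

```r
# Hypothetical exponent codes and base damage values, for illustration only
codes <- c("K", "m", "B", "", "+", "H")
dmg <- c(25, 1.5, 2, 10, 5, 3)

# Same substitutions as the analysis: symbols, h/H and 0 -> 0; K/M/B -> powers of ten
mult <- gsub("\\-|\\+|\\?|h|H|0", "0", codes)
mult <- gsub("k|K", "1000", mult)
mult <- gsub("m|M", "1000000", mult)
mult <- gsub("b|B", "1000000000", mult)

# Blanks cannot be parsed as numbers, become NA, and are then set to 0
mult <- suppressWarnings(as.numeric(mult))
mult[is.na(mult)] <- 0

damage <- dmg * mult
print(damage)
```

Under these rules an `H` (hundreds) code and a blank code both contribute zero damage, so any such rows drop out of the economic totals.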
We then add property and crop damage into a single economic-damage figure, aggregate it by event type as before, and keep the top 10. The event names are converted to a factor whose levels follow the sorted order, so the bars are plotted from highest to lowest.
Data$EconDmg <- Data$Property+Data$Crops
EconPlot <- aggregate(Data$EconDmg, by=list(Data$EVTYPE), FUN="sum")
EconPlot <- EconPlot[order(EconPlot$x, decreasing=T),]
EconPlot <- EconPlot[1:10,]
EconPlot$Group.1 <- factor(EconPlot$Group.1, as.character(EconPlot$Group.1))
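ggplot2 draws a discrete axis in factor-level order, which is why the event names are turned into a factor with levels taken from the sorted totals. A base-R sketch with hypothetical totals shows how the level order changes:

```r
# Hypothetical totals, already sorted from highest to lowest
totals <- data.frame(Group.1 = c("TORNADO", "FLOOD", "HAIL"), x = c(30, 20, 10))

# Default factor() sorts levels alphabetically, which would scramble the bar order
print(levels(factor(totals$Group.1)))

# Supplying the sorted names as levels keeps the highest-to-lowest order
totals$Group.1 <- factor(totals$Group.1, levels = as.character(totals$Group.1))
print(levels(totals$Group.1))
```

An equivalent alternative is to leave the column as it is and write `reorder(Group.1, -x)` inside `aes()`.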
# Fix the level order so bars are drawn from highest to lowest
FatalPlot$Group.1 <- factor(FatalPlot$Group.1, levels = as.character(FatalPlot$Group.1))
g <- ggplot(FatalPlot, aes(x = Group.1, y = x))
g + geom_bar(stat = "identity") +
  labs(x = "Weather Event", y = "No. Fatalities",
       title = "Fatalities from Top 10 Extreme Weather Events, 1996-2011") +
  theme_classic() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
InjurePlot$Group.1 <- factor(InjurePlot$Group.1, levels = as.character(InjurePlot$Group.1))
g <- ggplot(InjurePlot, aes(x = Group.1, y = x))
g + geom_bar(stat = "identity") +
  labs(x = "Weather Event", y = "No. Injuries",
       title = "Injuries from Top 10 Extreme Weather Events, 1996-2011") +
  theme_classic() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Plot damage in billions of dollars
g <- ggplot(EconPlot, aes(x = Group.1, y = x / 1e9))
g + geom_bar(stat = "identity") +
  labs(x = "Weather Event", y = "Economic Damage (Billions $)",
       title = "Economic Damage from Top 10 Extreme Weather Events, 1996-2011") +
  theme_classic() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
In conclusion, among events from 1996 onward that caused at least one injury, tornadoes have the greatest impact on health, with the highest fatalities and injuries. They also cause the most economic damage, followed closely by hurricanes/typhoons.