The basic goal of this assignment is to explore the NOAA Storm Database and to identify which severe weather events have been most harmful with respect to population health and to the economy. The results suggest that “Tornado” is by far the most fatal event type for the population, while “Flood” and “Hurricane/Typhoon” have the greatest economic consequences.
Libraries used
library(dplyr)
library(car)
library(ggplot2)
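Loading and preprocessing the data
Assuming that the repdata-data-StormData.csv.bz2 file has been downloaded to the working directory, we can load it with the following R command. There is no need to unzip it first: bzfile() lets read.csv() read the bzip2-compressed file directly.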
### Assumes the data file has been downloaded and saved to the working directory
df <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
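Since the rest of the analysis uses only a handful of columns, a quick check right after loading can catch a wrong or truncated file early. This is a minimal sketch added for illustration, not part of the original analysis; the helper name check_storm_columns and the one-row toy data frame are hypothetical.

```r
# Hypothetical helper: stop early if any column used later in the analysis is missing
check_storm_columns <- function(data) {
  needed <- c("EVTYPE", "FATALITIES", "INJURIES",
              "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
  missing <- setdiff(needed, names(data))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical one-row stand-in; on the real data call check_storm_columns(df)
toy <- data.frame(EVTYPE = "TORNADO", FATALITIES = 1, INJURIES = 2,
                  PROPDMG = 25, PROPDMGEXP = "K", CROPDMG = 0, CROPDMGEXP = "",
                  stringsAsFactors = FALSE)
ok <- check_storm_columns(toy)
```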
To calculate the financial damage, the PROPDMG and CROPDMG values must first be scaled by their exponent codes in PROPDMGEXP and CROPDMGEXP: H (or h) stands for hundreds, K for thousands, M for millions and B for billions. The code below recodes these exponents into numeric multipliers and computes the total property and crop damage for each record.
# Map exponent codes to dollar multipliers; any other code (blank, digits, "+", "-", "?") becomes 0
df$PROPDM_RECODE <- car::recode(df$PROPDMGEXP, "c('B', 'b')=1e9; c('M', 'm')=1e6; c('K', 'k')=1e3; c('H', 'h')=1e2; else=0")
df$CROPDM_RECODE <- car::recode(df$CROPDMGEXP, "c('B', 'b')=1e9; c('M', 'm')=1e6; c('K', 'k')=1e3; c('H', 'h')=1e2; else=0")
# Total damage in dollars = scaled property damage + scaled crop damage
df$TOTAL_DM <- as.numeric(as.character(df$PROPDM_RECODE)) * df$PROPDMG +
  as.numeric(as.character(df$CROPDM_RECODE)) * df$CROPDMG
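To sanity-check the multiplier logic, the sketch below applies it to a few hypothetical rows. It swaps car::recode for a base-R named-vector lookup (an alternative technique, not the code used above) but follows the same rule that any unrecognised exponent code maps to 0. The helper name multiplier and the toy values are assumptions for illustration only.

```r
# Hypothetical toy rows; the real values come from StormData.csv
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "M", "B", "?"),
                  CROPDMG = c(0, 500, 0, 7),
                  CROPDMGEXP = c("", "k", "", "M"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: exponent code -> multiplier (works on factors too, via tolower)
multiplier <- function(code) {
  lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
  m <- lookup[tolower(code)]
  m[is.na(m)] <- 0   # same "else=0" rule as the recode above
  unname(m)
}

toy$TOTAL_DM <- multiplier(toy$PROPDMGEXP) * toy$PROPDMG +
  multiplier(toy$CROPDMGEXP) * toy$CROPDMG
```

For example, the second row gives 2.5 × 10^6 + 500 × 10^3 = 3,000,000 dollars, while the "?" property exponent in the last row contributes nothing.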
To determine which severe weather events are most harmful with respect to population health, we focus solely on fatalities and ignore injuries. However, as expected, fatalities and injuries are positively correlated.
# Total fatalities and injuries per event type, sorted by fatalities
q1df <- select(df, EVTYPE, FATALITIES, INJURIES) %>% group_by(EVTYPE) %>%
  summarize(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES)) %>%
  arrange(desc(Fatalities))
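The positive correlation between fatalities and injuries is stated rather than shown; with q1df available it can be checked with cor(q1df$Fatalities, q1df$Injuries). The sketch below runs the same aggregation and correlation on a few hypothetical rows, so the numbers it produces say nothing about the real data. A rank (Spearman) correlation is included as an assumption-light alternative, since one dominant event type can inflate the Pearson value.

```r
library(dplyr)

# Hypothetical records standing in for the real StormData rows
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "LIGHTNING"),
                  FATALITIES = c(5, 3, 4, 1, 0),
                  INJURIES = c(60, 40, 20, 10, 2),
                  stringsAsFactors = FALSE)

# Same per-event aggregation as q1df
totals <- toy %>%
  group_by(EVTYPE) %>%
  summarize(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES))

# Correlation across event types
r_pearson <- cor(totals$Fatalities, totals$Injuries)
r_spearman <- cor(totals$Fatalities, totals$Injuries, method = "spearman")
```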
ggplot(head(q1df, 5), aes(x = reorder(EVTYPE, -Fatalities), y = Fatalities)) + geom_col() +
  labs(title = "Total Fatalities by Event", x = "Event", y = "Fatalities")
From the bar plot we can see that “Tornado” is by far the most fatal event type for the population.
To determine which types of events have the greatest economic consequences, we include both property and crop damage. The damage values were first scaled using the recoded exponents in PROPDMGEXP and CROPDMGEXP, as described above, to obtain the total financial damage (TOTAL_DM).
# Total property + crop damage (in dollars) per event type, sorted in descending order
q2df <- select(df, EVTYPE, TOTAL_DM) %>% group_by(EVTYPE) %>%
  summarize(Total_Damage = sum(TOTAL_DM)) %>% arrange(desc(Total_Damage))
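Because TOTAL_DM merges property and crop damage, it can help to see which component drives each event type's total. A minimal sketch, assuming damage has already been converted to dollars; the column names PROP_USD and CROP_USD and the toy rows are hypothetical.

```r
library(dplyr)

# Hypothetical rows with damage already scaled to dollars
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
                  PROP_USD = c(2e9, 1e9, 5e8, 1e6),
                  CROP_USD = c(1e8, 0, 0, 4e6),
                  stringsAsFactors = FALSE)

# Separate property and crop totals per event; Total_Damage reuses the two new columns
q2_split <- toy %>%
  group_by(EVTYPE) %>%
  summarize(Property = sum(PROP_USD), Crop = sum(CROP_USD),
            Total_Damage = Property + Crop) %>%
  arrange(desc(Total_Damage))
```

On the real data, PROP_USD and CROP_USD would correspond to the two scaled products that are added together in TOTAL_DM.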
ggplot(head(q2df, 5), aes(x = reorder(EVTYPE, -Total_Damage), y = Total_Damage)) + geom_col() +
  labs(title = "Total Financial Damage by Event in $", x = "Event", y = "Total Damage")
From the bar plot we can see that “Flood” and “Hurricane/Typhoon” have the greatest economic consequences.