The Severe Weather data was downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and extracted as a csv file.
The data was subset to include only the necessary columns and only rows in which the variables for harmful effects to the human population and economic consequences (FATALITIES, INJURIES, PROPDMG, CROPDMG) are not 0.
Nonstandard event names were replaced with the standard names.
Then the PROPDMG and CROPDMG variables were converted to dollar amounts in two new columns, and their sum was added as a third column holding the total damage amount.
Then the data was grouped by Events (EVTYPE) and the sums of the damages (to crop and property) and the impact on health (fatalities and injuries) were plotted to analyze the trends.
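It was concluded that hurricanes/typhoons cost the most in damages, tornadoes cause the most fatalities, and floods cause the most injuries.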
The data was downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, extracted as a csv file called StormData.csv, and read into R. The data set was then subset to include only the necessary columns. Only rows in which Fatalities, Injuries, Property Damage and Crop Damage are all non-zero were kept.
#Read in the csv file; keep text columns as character so event names can be edited later
storm <- read.csv("StormData.csv", stringsAsFactors = FALSE)
#subset and take only needed columns
library(dplyr)
stormsub <- storm[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
#keep only rows where all four measures are non-zero (comma-separated conditions are ANDed)
stormsub <- filter(stormsub, FATALITIES != 0, INJURIES != 0, PROPDMG != 0, CROPDMG != 0)
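To make the row filter concrete, here is a small sketch on a made-up data frame (the `toy` values are hypothetical and not taken from the storm data). In dplyr, conditions separated by commas inside `filter()` are combined with AND, so only rows where every listed column is non-zero are kept. The `|` version is shown only for contrast and is not what this analysis uses.

```r
library(dplyr)

# Hypothetical toy data, not taken from StormData.csv
toy <- data.frame(
  FATALITIES = c(0, 1, 2, 0),
  INJURIES   = c(0, 0, 5, 0),
  PROPDMG    = c(0, 10, 3, 0),
  CROPDMG    = c(0, 0, 1, 4)
)

# Comma-separated conditions are ANDed: keep rows where all four are non-zero
all_nonzero <- filter(toy, FATALITIES != 0, INJURIES != 0, PROPDMG != 0, CROPDMG != 0)

# For contrast only: keep rows where at least one of the four is non-zero
any_nonzero <- filter(toy, FATALITIES != 0 | INJURIES != 0 | PROPDMG != 0 | CROPDMG != 0)

nrow(all_nonzero)
nrow(any_nonzero)
```

The two filters can keep very different numbers of rows, so the choice between them affects every total computed afterwards.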
Replace nonstandard event names with the standard names. The standard names were obtained from the document at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
stormsub$EVTYPE[stormsub$EVTYPE=="EXCESSIVE HEAT"] <- "HEAT"
stormsub$EVTYPE[stormsub$EVTYPE=="HEAT WAVE DROUGHT"] <- "DROUGHT"
stormsub$EVTYPE[stormsub$EVTYPE=="THUNDERSTORM WINDS"] <- "THUNDERSTORM WIND"
stormsub$EVTYPE[stormsub$EVTYPE=="TROPICAL STORM GORDON"] <- "TROPICAL STORM"
stormsub$EVTYPE[stormsub$EVTYPE=="TSTM WIND"] <- "THUNDERSTORM WIND"
stormsub$EVTYPE[stormsub$EVTYPE=="WINTER STORM HIGH WINDS"] <- "WINTER STORM"
stormsub$EVTYPE[stormsub$EVTYPE=="WINTER STORMS"] <- "WINTER STORM"
stormsub$EVTYPE[stormsub$EVTYPE=="HIGH WINDS"] <- "HIGH WIND"
stormsub$EVTYPE[stormsub$EVTYPE=="HURRICANE"] <- "HURRICANE/TYPHOON"
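The one-line-per-name replacements above can also be expressed as a lookup table, which may be easier to extend. The following is only a sketch: `fixes` repeats a few of the replacements from this report, and `recode_events` is a hypothetical helper name, not a function from any package.

```r
# Lookup table: names are the nonstandard spellings, values are the standard ones.
# Only a few of the replacements from this report are repeated here.
fixes <- c(
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "WINTER STORMS"      = "WINTER STORM",
  "HIGH WINDS"         = "HIGH WIND"
)

# Hypothetical helper: replace names found in the table, leave all others unchanged
recode_events <- function(evtype, lookup) {
  evtype <- as.character(evtype)
  hit <- evtype %in% names(lookup)
  evtype[hit] <- unname(lookup[evtype[hit]])
  evtype
}

recoded <- recode_events(c("TSTM WIND", "TORNADO", "HIGH WINDS"), fixes)
recoded
```

Under this assumption the real call would be something like `stormsub$EVTYPE <- recode_events(stormsub$EVTYPE, fixes)`, with the table extended to cover the remaining names.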
Create new columns for the Property Damage amount and the Crop Damage amount by multiplying the number in each damage column by the multiplier given in its exponent column (K = thousands, M = millions, B = billions).
#Property Damage: start with NA so the column has one entry per row, then scale by exponent code
stormsub$propamt <- NA_real_
stormsub$propamt[which(stormsub$PROPDMGEXP=="M")] <- stormsub$PROPDMG[which(stormsub$PROPDMGEXP=="M")] * 1000000
stormsub$propamt[which(stormsub$PROPDMGEXP=="K")] <- stormsub$PROPDMG[which(stormsub$PROPDMGEXP=="K")] * 1000
stormsub$propamt[which(stormsub$PROPDMGEXP=="B")] <- stormsub$PROPDMG[which(stormsub$PROPDMGEXP=="B")] * 1000000000
#Crop damage: same approach as for property damage
stormsub$cropamt <- NA_real_
stormsub$cropamt[which(stormsub$CROPDMGEXP=="M")] <- stormsub$CROPDMG[which(stormsub$CROPDMGEXP=="M")] * 1000000
stormsub$cropamt[which(stormsub$CROPDMGEXP=="K")] <- stormsub$CROPDMG[which(stormsub$CROPDMGEXP=="K")] * 1000
stormsub$cropamt[which(stormsub$CROPDMGEXP=="B")] <- stormsub$CROPDMG[which(stormsub$CROPDMGEXP=="B")] * 1000000000
Create total column for the damages
stormsub$totalamt <- stormsub$propamt + stormsub$cropamt
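As a worked illustration of the conversion, a damage value of 2.5 with exponent code "M" becomes 2.5 × 1,000,000 = 2,500,000 dollars. The sketch below wraps the same arithmetic in a single helper. `to_dollars` is a hypothetical name, the input values are made up, and treating lowercase codes like their uppercase equivalents is an assumption not made in the code above.

```r
# Hypothetical helper: scale a damage figure by its exponent code
to_dollars <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # Assumption: lowercase codes mean the same as uppercase ones
  code <- toupper(as.character(exp_code))
  # Codes other than K, M or B have no multiplier and give NA
  value * unname(multipliers[code])
}

# Made-up values for illustration only
propamt  <- to_dollars(c(2.5, 10, 1, 7), c("M", "K", "B", "?"))
cropamt  <- to_dollars(c(1, 5, 0.5, 2), c("K", "k", "M", "K"))
totalamt <- propamt + cropamt
totalamt
```

Note that an NA in either amount makes the total NA as well, so rows with unrecognized exponent codes need to be handled before summing by event.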
The results are shown as plots below.
Draw the plot for the damages
library(ggplot2)
group <- group_by(stormsub, EVTYPE)
damagesumm <- summarize(group, sum(totalamt))
damagesumm <- as.data.frame(damagesumm)
names(damagesumm) <- c("event","amount")
ggplot(damagesumm, aes(event,amount)) +
  geom_bar(stat="identity") +
  labs(title="Damage amounts for events", x="Events", y="Damage Amounts") +
  theme(axis.text.x=element_text(angle=90,hjust=1))
The plot shows that hurricanes/typhoons caused the most damage in dollar terms.
Draw the plot for the Fatalities
fatalities <- as.data.frame(summarize(group, sum(FATALITIES)))
names(fatalities) <- c("event","amount")
ggplot(fatalities, aes(event,amount)) +
  geom_bar(stat="identity") +
  labs(title="Fatalities for events", x="Events", y="Fatalities") +
  theme(axis.text.x=element_text(angle=90,hjust=1))
The plot shows that Tornadoes cause the most fatalities.
Draw the plot for Injuries
injuries <- as.data.frame(summarize(group, sum(INJURIES)))
names(injuries) <- c("event","amount")
ggplot(injuries, aes(event,amount)) +
  geom_bar(stat="identity") +
  labs(title="Injuries for events", x="Events", y="Injuries") +
  theme(axis.text.x=element_text(angle=90,hjust=1))
The plot shows that Floods cause the most injuries.
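The tallest bar in each plot can also be checked numerically. The sketch below uses a made-up data frame (`toy` is hypothetical and its values are not results from the storm data) to compute all three totals per event in one `summarize()` call and then pick the event with the largest total for each measure.

```r
library(dplyr)

# Hypothetical toy data standing in for stormsub; values are made up
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(1, 5, 2, 4, 3),
  INJURIES   = c(20, 10, 15, 2, 1),
  totalamt   = c(1e6, 2e5, 3e6, 1e4, 5e5),
  stringsAsFactors = FALSE
)

# One summary table with a total per event for each measure
totals <- toy %>%
  group_by(EVTYPE) %>%
  summarize(
    fatalities = sum(FATALITIES),
    injuries   = sum(INJURIES),
    damage     = sum(totalamt)
  ) %>%
  as.data.frame()

# Event with the largest total for each measure
top_fatalities <- totals$EVTYPE[which.max(totals$fatalities)]
top_injuries   <- totals$EVTYPE[which.max(totals$injuries)]
top_damage     <- totals$EVTYPE[which.max(totals$damage)]
```

Applied to `stormsub`, the same pattern would give a table that can be compared against the three plots.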