This analysis uses the US NOAA storm database, which covers events from 1950 to November 2011, to determine which event types (EVTYPE) cause the most harm to population health (fatalities and injuries) and the most economic damage (property and crop).
library(dplyr)
We analyse the Storm Events Database. We start by downloading the compressed data file (csv.bz2), reading it into R, and storing it in a data frame named ‘storm’.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","Storm_data.csv.bz2")
storm<-read.csv("Storm_data.csv.bz2",sep=",",header=TRUE,stringsAsFactors = FALSE)
Upon viewing the data, it is clear that we need the 8th column (EVTYPE) and the 23rd to 28th columns (FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) to answer the questions asked.
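Selecting columns by position depends on the column order of the file, so it can help to confirm the positions against the column names first. The following is a minimal sketch on a made-up data frame (`toy` and its values are illustrative assumptions, not the real storm data, which has many more columns); it shows that selecting by name and selecting by the matched positions give the same result.

```r
# Toy stand-in for the storm data frame (hypothetical values).
toy <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  INJURIES   = c(3, 2),
  stringsAsFactors = FALSE
)

# Columns we want, identified by name rather than by position.
wanted <- c("EVTYPE", "FATALITIES", "INJURIES")

# Look up the position of each wanted column in the data frame.
positions <- match(wanted, names(toy))
print(positions)

# Selecting by name and by the matched positions yields the same subset.
subset_by_name <- toy[, wanted]
subset_by_pos  <- toy[, positions]
print(identical(subset_by_name, subset_by_pos))
```

Running the same `match()` call against `names(storm)` would be one way to check that EVTYPE really sits in column 8 and the damage fields in columns 23 to 28.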
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? As stated earlier, we need the 8th column (EVTYPE), the 23rd column (FATALITIES) and the 24th column (INJURIES) to find the answer. We select those columns and group them by EVTYPE. From the grouped data we sum the fatalities and the injuries for each event type. We then use the combined total of fatalities and injuries as the indicator of how harmful each event type is, arrange the result in descending order of that total, and select the top 20 by slicing.
Table.Harmful<-storm[ ,c(8,23:24)]%>%
group_by(EVTYPE)%>%
summarise(FT=sum(FATALITIES),IJ=sum(INJURIES))%>%
arrange(desc(FT+IJ))%>%
slice(1:20)
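To make the group, summarise and arrange steps concrete, here is a small sketch on a made-up data frame. The object names `toy` and `toy_harm` and all the numbers are assumptions for illustration only; the pipeline mirrors the one used on the storm data, without the final slice.

```r
library(dplyr)

# Hypothetical records: one row per reported event.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type, then rank by their total.
toy_harm <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FT = sum(FATALITIES), IJ = sum(INJURIES)) %>%
  arrange(desc(FT + IJ))

print(toy_harm)
```

In this toy data TORNADO ranks first with 5 fatalities and 15 injuries, followed by HEAT and then FLOOD.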
Across the United States, which types of events have the greatest economic consequences? We need the 8th column (EVTYPE) and the 25th to 28th columns (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) to answer this question. We form a subset (econ) by selecting the required columns and use the combination of the ‘mutate’ and ‘ifelse’ functions to convert the exponent codes “H”, “K”, “M” and “B” (in upper or lower case) into the multipliers 100 (hundred), 1,000 (thousand), 1,000,000 (million) and 1,000,000,000 (billion) respectively. Any other exponent code is given a multiplier of 0, so those records do not contribute to the totals. We add the property and crop damage values, express the sum in billions of USD, group it by EVTYPE as we did for question 1, and keep the 20 event types with the highest totals.
econ<-storm[ ,c(8,25:28)]%>%
mutate(PR_DMG_VAL = PROPDMG*ifelse(PROPDMGEXP=="H"|PROPDMGEXP=="h",10^2,
ifelse(PROPDMGEXP=="K"|PROPDMGEXP=="k",10^3,
ifelse(PROPDMGEXP=="M"|PROPDMGEXP=="m",10^6,
ifelse(PROPDMGEXP=="B"|PROPDMGEXP=="b",10^9,0)))),
CR_DMG_VAL = CROPDMG*ifelse(CROPDMGEXP=="H"|CROPDMGEXP=="h",10^2,
ifelse(CROPDMGEXP=="K"|CROPDMGEXP=="k",10^3,
ifelse(CROPDMGEXP=="M"|CROPDMGEXP=="m",10^6,
ifelse(CROPDMGEXP=="B"|CROPDMGEXP=="b",10^9,0)))),
PRP_CRP_DMG_VAL_IN_BILLION_USD= (PR_DMG_VAL+CR_DMG_VAL)/10^9) %>%
group_by(EVTYPE) %>%
summarise(TOTAL_DMG_VAL_IN_BILLION_USD=sum(PRP_CRP_DMG_VAL_IN_BILLION_USD)) %>%
arrange(desc(TOTAL_DMG_VAL_IN_BILLION_USD))%>%
slice(1:20)
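The nested ‘ifelse’ calls above can also be expressed as a lookup on a named vector. The sketch below is an alternative written in base R, not the method used in this analysis; the helper name `exp_multiplier` and the sample values are assumptions for illustration. It applies the same rule: H, K, M and B in either case map to their multipliers, and every other code maps to 0.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
exp_multiplier <- function(code) {
  # Named lookup vector for the four recognised codes.
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Upper-case the codes so "k" and "K" are treated alike.
  mult <- unname(lookup[toupper(as.character(code))])
  # Unrecognised or empty codes come back as NA; treat them as 0.
  mult[is.na(mult)] <- 0
  mult
}

# Made-up damage figures and exponent codes.
dmg  <- c(25, 2.5, 1)
code <- c("K", "m", "")
damage_usd <- dmg * exp_multiplier(code)
print(damage_usd)
```

With this helper, the two damage columns could be computed as `PROPDMG * exp_multiplier(PROPDMGEXP)` and `CROPDMG * exp_multiplier(CROPDMGEXP)` inside ‘mutate’.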
For Q1
As is evident from the bar plot below, the Tornado is the event type that causes the most harm to population health.
barplot(t(Table.Harmful[,-1]),
names.arg = Table.Harmful$EVTYPE,
ylim=c(0,95000),
beside=TRUE,cex.names=0.8,
las=2,col=c("blue","red"),
main ="Total Disaster Casualties")
legend("topright",c("Fatalities","Injuries"),fill=c("blue","red"))
For Q2
As is evident from the bar plot below, it is the Flood which causes the most economic damage.
barplot(t(econ[,-1]),
names.arg = econ$EVTYPE,
ylim=c(0,200),
beside=TRUE,cex.names=0.8,
las=2,col="red",ylab="Cost in Billion USD",
main ="Total Property and Crop Damage")