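We explore the “Storm Data” from the NOAA Storm Database to answer two questions: which types of events are most harmful to population health (fatalities and injuries), and which types of events cause the greatest economic damage (property and crop damage)?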
Approach
To answer the first question, we total the fatalities per event type and identify the events with the highest counts. Similarly, we total the injuries per event type and identify the leading events.
To answer the second question, we first clean the columns related to economic damage, add up property and crop damage, and identify the top 5 events causing them.
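The aggregate, sort and take-the-top pattern described above can be sketched on a tiny made-up data frame. The `toy` data frame and the `top_events` helper are hypothetical names used only for illustration; they are not part of the Storm Data or of the analysis code in this report.

```r
# Hypothetical toy data standing in for the real storm records
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0),
  stringsAsFactors = FALSE
)

# Sum a numeric column per event type and return the n largest totals
top_events <- function(data, column, n = 5) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  head(totals, n)
}

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

The same helper could be pointed at an injuries or damage column, assuming the column is numeric.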
Storm <- read.csv("repdata%2Fdata%2FStormData.csv")
storm <- Storm[c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
storm_types <- aggregate(FATALITIES ~ EVTYPE,sum,data = storm)
# sorting the data by number of fatalities
storm_types <- storm_types[order(storm_types$FATALITIES,decreasing = TRUE),]
#picking the top 10 events
storm_types10 <- storm_types[1:10,]
#plotting a barplot
qplot(EVTYPE,FATALITIES,data = storm_types10,color = EVTYPE,xlab = "Top Fatalities Causing Events",ylab="No. of Fatalities",main = "Most Harmful Events - Fatalities (1950 - 2011)")+theme(axis.text.x = element_text(angle = 60, hjust = 1))+geom_bar(stat = "identity")
We observe that “Tornado” causes the highest number of fatalities, followed by “Excessive Heat”.
storm_types1 <- aggregate(INJURIES ~ EVTYPE,sum,data = storm)
# sorting the data by number of injuries
storm_types1 <- storm_types1[order(storm_types1$INJURIES,decreasing = TRUE),]
#picking the top 10 events
storm_injuries10 <- storm_types1[1:10,]
#plotting a barplot
qplot(EVTYPE,INJURIES,data = storm_injuries10,color = EVTYPE,xlab = "Top Injuries Causing Events",ylab="No. of Injuries",main = "Most Harmful Events - Injuries (1950-2011)")+theme(axis.text.x = element_text(angle = 60, hjust = 1))+geom_bar(stat = "identity")
Here too, “Tornado” causes by far the most injuries, well ahead of every other event type. It is followed by “Thunderstorms”, “Flood” and “Excessive Heat”.
Damage is recorded in two columns: the amount (for example “PROPDMG”) and an exponent code (for example “PROPDMGEXP”) that indicates the power of 10 by which the amount is scaled. We therefore need to standardise all values to a common unit.
We shall represent all values in millions.
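As a compact illustration of the scaling described above, the exponent codes can be kept in a named lookup vector instead of one assignment per code. This is only a sketch: `exp_multiplier` and `to_millions` are hypothetical names, and the mapping (including treating blank or unrecognised codes as zero) is an assumption that mirrors the choices made in this report rather than an official NOAA definition.

```r
# Assumed multiplier for each exponent code, mirroring the mapping in this report
exp_multiplier <- c(
  "0" = 1, "1" = 10, "2" = 1e2, "h" = 1e2, "H" = 1e2,
  "3" = 1e3, "k" = 1e3, "K" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "m" = 1e6, "M" = 1e6, "7" = 1e7, "8" = 1e8,
  "B" = 1e9
)

# Convert an amount and its exponent code to millions
to_millions <- function(amount, exp_code) {
  # as.character() guards against factor columns being used as integer indices
  multiplier <- unname(exp_multiplier[as.character(exp_code)])
  # blank, "?", "+", "-" and any other unknown code count as zero
  multiplier[is.na(multiplier)] <- 0
  amount * multiplier / 1e6
}

# 25 K is 0.025 million, 2.5 M is 2.5 million, 1.2 B is 1200 million
scaled <- to_millions(c(25, 2.5, 1.2, 10), c("K", "M", "B", ""))
print(scaled)
```

A lookup vector keeps the property and crop columns on exactly the same mapping, which avoids the two sets of assignments drifting apart.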
storm$prop[storm$PROPDMGEXP == "0"]<- 1
storm$prop[storm$PROPDMGEXP == "1"]<- 10
storm$prop[storm$PROPDMGEXP == "2"|storm$PROPDMGEXP == "h"|storm$PROPDMGEXP == "H"]<- 100
storm$prop[storm$PROPDMGEXP == "3" | storm$PROPDMGEXP == "K"]<- 1000
storm$prop[storm$PROPDMGEXP == "4"]<- 10000
storm$prop[storm$PROPDMGEXP == "5"]<- 100000
storm$prop[storm$PROPDMGEXP == "6"|storm$PROPDMGEXP == "m"|storm$PROPDMGEXP == "M"]<- 1000000
storm$prop[storm$PROPDMGEXP == "7"]<- 10000000
storm$prop[storm$PROPDMGEXP == "8"]<- 100000000
storm$prop[storm$PROPDMGEXP == "B"]<- 1000000000
storm$prop[storm$PROPDMGEXP == ""|storm$PROPDMGEXP == "?"|storm$PROPDMGEXP == "+"|storm$PROPDMGEXP == "-"]<- 0
storm$prop <- as.integer(storm$prop)
storm$prop <- storm$prop/1000000
# keep fractional amounts (as.integer would truncate e.g. 2.5 to 2)
storm$PROPDMG <- as.numeric(storm$PROPDMG)
storm$prop_expense <- storm$prop*storm$PROPDMG
# multipliers for crop damage, keyed on CROPDMGEXP
storm$crop[storm$CROPDMGEXP == "0"]<- 1
storm$crop[storm$CROPDMGEXP == "2"]<- 100
storm$crop[storm$CROPDMGEXP == "B"]<- 1000000000
storm$crop[storm$CROPDMGEXP == "k"|storm$CROPDMGEXP == "K"]<- 1000
storm$crop[storm$CROPDMGEXP == "m"|storm$CROPDMGEXP == "M"]<- 1000000
# blank or unknown exponents count as zero, as for property damage
storm$crop[storm$CROPDMGEXP == ""|storm$CROPDMGEXP == "?"]<- 0
# express the multiplier in millions
storm$crop <- storm$crop/1000000
storm$CROPDMG <- as.numeric(storm$CROPDMG)
storm$crop_expense <- storm$crop*storm$CROPDMG
storm$total_exp <- storm$prop_expense + storm$crop_expense
storm_total_exp <-aggregate(total_exp ~ EVTYPE,sum,data = storm)
storm_total_exp <- storm_total_exp[order(storm_total_exp$total_exp,decreasing = TRUE),]
Let's look at the top 5 causes of property and crop damage:
head(storm_total_exp,5)
##                EVTYPE total_exp
## 170         HURRICANE 814035.95
## 178 HURRICANE/TYPHOON 796839.98
## 62              FLOOD 230614.68
## 331           TORNADO  82541.93
## 50        FLASH FLOOD  54734.72
qplot(EVTYPE,total_exp,data = storm_total_exp[1:5,],color = EVTYPE,xlab="Top Events causing economic loss",ylab = "Economic Loss in Millions",main = "Events causing economic loss to US 1950-2011")+theme(axis.text.x = element_text(angle = 60, hjust = 1))+geom_bar(stat = "identity")
According to this ranking, “Hurricane” events have caused by far the largest economic loss to the country.
From the above results we can summarize that:
1. “Tornado” causes the highest number of both fatalities and injuries. It is followed by “Excessive Heat” for fatalities, and by “Thunderstorms”, “Flood” and “Excessive Heat” for injuries.
2. The top 5 events causing economic loss to the country are “Hurricane”, “Hurricane/Typhoon”, “Flood”, “Tornado” and “Flash Flood”, in descending order.