This report analyzes the NOAA Storm Events data. The raw data spans 1950 to November 2011, with 902297 observations and 37 variables, of which only 7 are relevant to the analysis. The sections below show how the data was cleaned, re-ordered and analyzed to answer the following questions:
In both cases, the top 10 events are displayed. The direct conclusions that can be drawn are:
The data was downloaded from the website and read into the variable “storm”. The file is bzip2-compressed, so it is read directly rather than unzipped.
#The following code checks whether the file is present. If not, it is downloaded
if(!file.exists("repdata%2Fdata%2FStormData.csv.bz2")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "repdata%2Fdata%2FStormData.csv.bz2", mode = "wb")
}
#The dataset is then stored in storm; read.csv() decompresses .bz2 files on the fly
storm <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
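As a small check on the loading step, the sketch below shows that read.csv() can read a bzip2-compressed CSV without a separate unzip step. The data frame `toy` and its values are made up for illustration and are not taken from the storm data.

```r
# Toy data frame standing in for the storm data (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it out as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file directly
toy_back <- read.csv(path)
unlink(path)

print(toy_back)
```

unzip() is meant for .zip archives, so it is not needed for a .bz2 file.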
These are the variables selected for the analysis:
#Using the dplyr package, the important variables were selected and the dataset was stored in dat
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dat <- select(storm, EVTYPE, FATALITIES, INJURIES,
              PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
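Economic damage falls into two categories: property damage (PROPDMG) and crop damage (CROPDMG).
Each damage figure is calculated by multiplying the amount by a power of ten given by its exponent code, for example PROPDMG * 10^PROPDMGEXP. However, PROPDMGEXP holds character codes, so each code first needed a numerical value. This was done as follows.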
#A dmgexp dataframe was created with two columns. One contains the codes found in PROPDMGEXP, including "-" and the empty string. The other contains the corresponding numerical assignment of the exponent. Codes that are not listed here (for example "?" and "+") are left as NA by match().
#The exponent column is numeric from the start, so no factor-to-numeric conversion is needed (as.numeric() on a factor would return level codes, not the exponents).
dmgexp <- data.frame(
  PROPDMGEXP = c("K", "M", "B", "0", "5", "6", "4", "2", "3", "H", "7", "1", "8", "-", ""),
  PRPEXP = c(3, 6, 9, 0, 5, 6, 4, 2, 3, 2, 7, 1, 8, 1, 1),
  stringsAsFactors = FALSE)
#Matching dmgexp against the dat dataset creates the new variable, PRPEXP, assigned according to PROPDMGEXP
dat$PRPEXP <- dmgexp$PRPEXP[match(dat$PROPDMGEXP, dmgexp$PROPDMGEXP)]
#A similar operation was carried out for CROPDMGEXP
names(dmgexp) <- c("CROPDMGEXP", "CRPEXP")
dat$CRPEXP <- dmgexp$CRPEXP[match(dat$CROPDMGEXP, dmgexp$CROPDMGEXP)]
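A note of caution on converting exponents to numbers: when the values are held as a factor (the default for character columns created by as.data.frame() before R 4.0), as.numeric() returns the internal level codes rather than the values themselves. The sketch below uses a made-up factor `f` to illustrate the difference.

```r
# Hypothetical exponent values stored as a factor
f <- factor(c("3", "6", "9", "0"))

# Level codes: positions in the sorted levels "0", "3", "6", "9"
codes <- as.numeric(f)

# Actual exponents: convert to character first, then to numeric
exponents <- as.numeric(as.character(f))

print(codes)
print(exponents)
```

Going through as.character(), or keeping the column numeric from the start, avoids the mismatch.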
Total economic damage was calculated by adding Property and Crop damage.
#Total damage was stored under a new variable, TOTALDMG, in the dat dataset
dat$TOTALDMG <- dat$PROPDMG * 10^dat$PRPEXP + dat$CROPDMG * 10^dat$CRPEXP
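To make the arithmetic concrete, here is a minimal worked example of the lookup and the damage calculation on a handful of made-up rows. The names `lookup`, `codes` and `amount` are hypothetical and used only for illustration.

```r
# Hypothetical exponent codes and damage amounts
codes  <- c("K", "M", "B", "?")
amount <- c(25, 2.5, 1, 10)

# Small lookup table: code -> power of ten
lookup <- data.frame(code = c("K", "M", "B"),
                     power = c(3, 6, 9),
                     stringsAsFactors = FALSE)

# match() returns NA for codes that are not in the table ("?" here)
power <- lookup$power[match(codes, lookup$code)]

# Damage in dollars: amount * 10^power; NA wherever the code was not matched
damage <- amount * 10^power
print(damage)
```

So 25 with code "K" becomes 25 * 10^3 = 25000, while a row with an unmatched code yields NA instead of a dollar figure.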
The following bar plot shows the 10 event types with the most fatalities.
#Data was aggregated based on fatalities and event type
fatalities <- aggregate(FATALITIES ~ EVTYPE, dat, sum)
#The fatalities were arranged in descending order and the top 10 were kept
fatalities <- arrange(fatalities, desc(FATALITIES))
top10_fatalities <- fatalities[1:10,]
#Using ggplot2 package to plot the barplot
library(ggplot2)
ggplot(top10_fatalities, aes(x = EVTYPE, y = FATALITIES)) +
geom_bar(stat = "identity", fill = "blue", col= "black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("EVENT TYPE") + ylab("Fatalities") +
ggtitle("Number of Fatalities by Top 10 Weather Events")
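ggplot2 draws a discrete x axis in factor-level order, which for plain event names is alphabetical. One possible refinement, sketched here on made-up counts, is to reorder the event types by their totals so the bars appear from largest to smallest. The data frame `top` and its values are hypothetical.

```r
# Hypothetical totals for three event types
top <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(10, 50, 30),
                  stringsAsFactors = FALSE)

# reorder() sorts the factor levels by the supplied values;
# negating the counts puts the largest first
top$EVTYPE <- reorder(top$EVTYPE, -top$FATALITIES)

print(levels(top$EVTYPE))
```

The same idea can be used inside the plot call, for example aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES).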
Similarly, the following bar plot shows the 10 event types with the most injuries.
#Data was aggregated based on injuries and event type
injuries <- aggregate(INJURIES ~ EVTYPE, dat, sum)
#The injuries were arranged in descending order and the top 10 were kept
injuries <- arrange(injuries, desc(INJURIES))
top10_injuries <- injuries[1:10,]
#Using ggplot2 package to plot the barplot
ggplot(top10_injuries, aes(x = EVTYPE, y = INJURIES)) +
geom_bar(stat = "identity", fill = "red", col= "black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("EVENT TYPE") + ylab("Injuries") +
ggtitle("Number of Injuries by Top 10 Weather Events")
Tornadoes cause the most injuries and fatalities.
The bar plot below shows the top 10 weather events that have caused the most economic damage.
economic <- aggregate(TOTALDMG ~ EVTYPE, dat, sum)
economic <- arrange(economic, desc(TOTALDMG))
top10_totaldamage <- economic[1:10,]
ggplot(top10_totaldamage, aes(x = EVTYPE, y = TOTALDMG)) +
geom_bar(stat = "identity", fill = "dark green", col= "black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("EVENT TYPE") + ylab("Total Cost of Damage") +
ggtitle("Adverse Economic Consequence by Top 10 Weather Events")
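One detail worth keeping in mind when reading these totals: with the formula interface, aggregate() drops rows whose value is NA by default (na.action = na.omit), so any row whose damage could not be computed does not contribute to the sum. A minimal sketch with made-up rows (the data frame `toy` is hypothetical):

```r
# Hypothetical damage values, one of them missing
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
                  TOTALDMG = c(100, NA, 50))

# The NA row is omitted before summing
agg <- aggregate(TOTALDMG ~ EVTYPE, toy, sum)
print(agg)

# Counting the missing values shows how many rows were left out
n_missing <- sum(is.na(toy$TOTALDMG))
print(n_missing)
```

Checking sum(is.na(dat$TOTALDMG)) on the real data would show how many records are excluded this way.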
The plot indicates that floods have the greatest economic consequence, with the highest combined property and crop damage.
In summary, the analysis has shown that tornadoes caused the most fatalities and the most injuries. Floods had the greatest economic impact, and hurricanes/typhoons caused the second-highest damage.