library("dplyr")
library("sqldf")
library("ggplot2")
library("colorspace")
library("plotrix")
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
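In this project we analyze the storm dataset to answer two questions: which types of events are most harmful to population health, and which have the greatest economic consequences.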
The data for this analysis come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. We can download the file from the course web site:
The documentation of the database is available at:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
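R can read a bzip2-compressed CSV without decompressing it first. The sketch below shows this on a tiny made-up file written to a temporary location; the rows and the names `toy`, `tmp` and `toy_back` are invented for illustration and are not taken from the NOAA data.

```r
# Toy rows that mimic a few columns of the storm data (values are invented).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 0),
  PROPDMG    = c(25, 1.5, 0),
  PROPDMGEXP = c("K", "M", ""),
  stringsAsFactors = FALSE
)

# Write the rows through a bzip2 connection to a temporary file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file through a bzfile() connection.
toy_back <- read.csv(bzfile(tmp), header = TRUE, stringsAsFactors = FALSE)
str(toy_back)
```

The same pattern applies to the real file once it has been downloaded: only the path changes.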
#Download the data, save it into the working directory and set the path below
setwd("G:/vimal/data science/JHU/RepResearch/RR2")
#Read the data and store it in the working data frame
#(stringsAsFactors = TRUE keeps text columns as factors, as in the summary output shown below)
stromdata <- read.csv('repdata-data-StormData.csv.bz2', header = TRUE, stringsAsFactors = TRUE)
#Extract the data of interest from the data frame and store it in sd_req
#(the *DMGEXP columns hold text codes such as "K" or "M", so they are not compared with 0)
sd_req <- stromdata %>% filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 |
CROPDMG > 0 ) %>% select(STATE, EVTYPE, FATALITIES,
INJURIES, PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP)
summary(sd_req)
##      STATE                      EVTYPE        FATALITIES
##  TX     : 22144   TSTM WIND        :63234   Min.   :  0.0
##  IA     : 16093   THUNDERSTORM WIND:43655   1st Qu.:  0.0
##  OH     : 13337   TORNADO          :39944   Median :  0.0
##  MS     : 12023   HAIL             :26130   Mean   :  0.1
##  GA     : 11207   FLASH FLOOD      :20967   3rd Qu.:  0.0
##  AL     : 11121   LIGHTNING        :13293   Max.   :583.0
##  (Other):168708   (Other)          :47410
##     INJURIES         PROPDMG       PROPDMGEXP        CROPDMG
##  Min.   :   0.0   Min.   :   0   K      :231428   Min.   :  0.0
##  1st Qu.:   0.0   1st Qu.:   2          : 11585   1st Qu.:  0.0
##  Median :   0.0   Median :   5   M      : 11320   Median :  0.0
##  Mean   :   0.6   Mean   :  43   0      :   210   Mean   :  5.4
##  3rd Qu.:   0.0   3rd Qu.:  25   B      :    40   3rd Qu.:  0.0
##  Max.   :1700.0   Max.   :5000   5      :    18   Max.   :990.0
##                                  (Other):    32
##    CROPDMGEXP
##         :152664
##  K      : 99932
##  M      :  1985
##  k      :    21
##  0      :    17
##  B      :     7
##  (Other):     7
To answer this question we consider both injuries and fatalities. We sum them by EVTYPE and draw a barplot of the top 10 events to find the natural calamity most harmful to population health.
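As a small illustration of the sum-and-rank step just described, the sketch below uses a made-up data frame and base R's `aggregate()` instead of dplyr, purely to stay dependency-free. All numbers and the names `toy` and `by_event` are invented.

```r
# Invented records: several rows per event type.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 6, 0),
  INJURIES   = c(10, 2, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type.
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined casualties, then sort from most to least harmful.
by_event$tot <- by_event$FATALITIES + by_event$INJURIES
by_event <- by_event[order(-by_event$tot), ]

# Keep only the top entries (top 2 here, top 10 in the analysis).
head(by_event, n = 2)
```

Ranking on the combined total treats one injury and one fatality as equally weighted, which is the choice made in this analysis.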
#Summing the fatalities and injuries by event type
sd_req <- group_by(sd_req, EVTYPE)
sum_human_ef <- summarise(sd_req,
FATALITIES = sum(FATALITIES),
INJURIES = sum(INJURIES),
tot = FATALITIES + INJURIES)
sum_human_ef <- arrange(sum_human_ef, desc(tot), desc(FATALITIES),desc(INJURIES))
#extract only top 10
top10_human_ef <- head(sum_human_ef, n = 10)
top_hev <- as.character(top10_human_ef[1,]$EVTYPE)
top_hval <- as.integer(top10_human_ef[1,]$tot)
#print the barplot
options(scipen=999)
par(las=1) # always draw axis labels horizontally
par(mar=c(6,10,4,2)) # increase y-axis margin.
bplt <- barplot(top10_human_ef$tot, horiz = TRUE, col = heat.colors(10), names.arg = top10_human_ef$EVTYPE,
cex.names=0.8, xlim = c(0,110000), xlab = 'Number of Casualties (Deaths + Injuries)',
main = 'Total Casualties caused by different Natural Calamities')
text(x=top10_human_ef$tot, y= bplt , labels=as.character(top10_human_ef$tot), pos = 4)
From the above plot it is very clear that TORNADO is the most dangerous natural calamity in terms of human health, with 96979 casualties (fatalities and injuries combined).
To answer this question, we have to munge the data a bit. The original dataset records two types of economic loss: PROPDMG for property damage and CROPDMG for damage to crops. These amounts are not in actual units (USD); their magnitudes are given separately as codes in PROPDMGEXP and CROPDMGEXP.
So we first calculate the actual damage in USD, then add up the property and crop damage to get the total damage. We use a barplot of the top 15 natural calamities to display the most dangerous natural calamity in terms of economic consequences.
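To make the conversion concrete: a record with PROPDMG = 25 and PROPDMGEXP = "K" stands for 25 × 1,000 = 25,000 USD. The sketch below applies that rule to invented values using a named lookup vector. The names `multiplier` and `to_usd` are hypothetical, and treating every code other than K, M or B as a multiplier of 1 is an assumption that mirrors the default used in this analysis.

```r
# Multipliers for the letter codes; names are the upper-case codes.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Convert amounts and their exponent codes to USD.
# Codes that are not K, M or B (blank, digits, symbols) fall back to 1.
to_usd <- function(amount, code) {
  m <- multiplier[toupper(as.character(code))]
  m[is.na(m)] <- 1
  amount * unname(m)
}

to_usd(c(25, 2.5, 10, 7), c("K", "m", "", "?"))
```

The first two values come out as 25,000 and 2,500,000; the last two are left unchanged because their codes are not recognised.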
#function to return the multiplier for each exponent code (B/b = 1e9, M/m = 1e6, K/k = 1e3, anything else = 1)
exponent.value <- function (cvec) sapply(cvec, function (c) switch (as.character(c), "B"=1e9, "b" = 1e9,
"M"=1e6, "m" = 1e6, "k" = 1e3, "K"=1e3, 1))
#calculate the actual value
sd_req <- mutate(sd_req, PROPVALNEW = PROPDMG*exponent.value(PROPDMGEXP),
CROPVALNEW=CROPDMG*exponent.value(CROPDMGEXP))
#calculate the sums
sum_eco_ef <- summarise(sd_req,
PROPVAL = sum(PROPVALNEW),
CROPVAL = sum(CROPVALNEW),
tot = PROPVAL + CROPVAL)
sum_eco_ef <- arrange(sum_eco_ef, desc(tot), desc(PROPVAL),desc(CROPVAL))
#keep the 15 event types with the largest total loss
top10_eco_ef <- head(sum_eco_ef, n = 15)
#build a 16th row, 'OTHERS', holding the combined loss of all remaining event types
top16 <- top10_eco_ef[15,]
top16$EVTYPE = 'OTHERS'
top16$PROPVAL = sum(sum_eco_ef$PROPVAL) - sum(top10_eco_ef$PROPVAL)
top16$CROPVAL = sum(sum_eco_ef$CROPVAL) - sum(top10_eco_ef$CROPVAL)
top16$tot = sum(sum_eco_ef$tot) -sum(top10_eco_ef$tot)
top16_eco_ef <- rbind(top10_eco_ef,top16)
top10_eco_ef <- top16_eco_ef
#express the totals in billions of USD
top10_eco_ef <- mutate(top10_eco_ef , tot_bn = round(tot/1000000000, 2))
top_eev <- as.character(top10_eco_ef[1,]$EVTYPE)
top_eval <- as.integer(top10_eco_ef[1,]$tot_bn)
#barplot of economic loss by natural calamity
options(scipen=999)
par(las=1) # always draw axis labels horizontally
par(mar=c(6,10,4,2)) # increase y-axis margin.
bplt <- barplot(top10_eco_ef$tot_bn, horiz = TRUE, col = heat.colors(16), names.arg = top10_eco_ef$EVTYPE,
cex.names=0.8, xlab = 'Quantum of Loss ($ in Billion)',
main = 'Total Economic Loss caused by different Natural Calamities', xlim = c(0,180))
text(x=top10_eco_ef$tot_bn, y= bplt , labels=as.character(top10_eco_ef$tot_bn), pos = 4)
From the above plot it is very clear that FLOOD has caused the greatest economic loss, with an amount greater than 150 billion USD.
The same thing can also be shown with a pie chart, as follows:
pie3D(top10_eco_ef$tot_bn,labels = top10_eco_ef$EVTYPE, explode=0.1,labelcex = 0.6, start = 180,
height = .01, theta = .5, main = 'Pie Chart of Proportional Economic Loss due to natural Calamities', radius = 1.5)
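Since a pie chart encodes each category's share of the total, the same information can also be tabulated as percentages. A minimal sketch with invented totals (these are not the values from this analysis; `loss_bn` and `share_pct` are hypothetical names):

```r
# Invented losses in billions of USD, for illustration only.
loss_bn <- c(FLOOD = 50, HURRICANE = 30, TORNADO = 15, OTHERS = 5)

# Share of each category in the overall total, as a percentage.
share_pct <- round(100 * prop.table(loss_bn), 1)
share_pct
```

A table like this is often easier to compare across categories than the slices of a 3D pie.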
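With the above analysis we can conclude that TORNADO is the event type most harmful to population health, and that FLOOD has the greatest economic consequences.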