Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is available from the course web site; its URL appears in the download code below.
Some documentation of the database is also available; it describes how some of the variables are constructed and defined.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
# Set the working directory to the current user's Desktop (Windows only)
username <- Sys.getenv('USERNAME') # user name, used to build the Desktop path
directory <- paste('C:\\Users\\', username, '\\Desktop', sep = '')
setwd(directory)
# Create a "Reproducible Research" folder on the Desktop to store the data
if (!file.exists('./Reproducible Research')) {
  dir.create('./Reproducible Research')
}
setwd('./Reproducible Research')
dataurl <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
download.file(dataurl, destfile = 'StormData.csv.bz2', mode = 'wb')
# Read the data (this might take a while)
dat <- read.csv("StormData.csv.bz2")
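Because the download and the read are both slow, it can help to skip the download when a local copy already exists. The following is a minimal sketch of that idea; the helper name `download_if_missing` and the placeholder URL are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download a file only when no local copy exists.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(FALSE)  # local copy found, nothing is downloaded
  }
  download.file(url, destfile = destfile, mode = "wb")
  TRUE             # a download was performed
}

# Demonstration with a temporary file standing in for an existing copy,
# so no network access happens here. The URL is a placeholder.
existing <- tempfile(fileext = ".csv.bz2")
file.create(existing)
downloaded <- download_if_missing("https://example.invalid/not-used", existing)
print(downloaded)
```

In the report itself the call would presumably be `download_if_missing(dataurl, 'StormData.csv.bz2')`, placed before `read.csv`.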
Grouping and summarizing the data
require(dplyr)
# Total fatalities and injuries per event type; keep the 20 deadliest types
total.fatalities <- dat %>%
  group_by(EVTYPE) %>%
  summarize(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES)) %>%
  top_n(20, wt = Fatalities) %>%
  print
## Source: local data frame [20 x 3]
##
## EVTYPE Fatalities Injuries
## (fctr) (dbl) (dbl)
## 1 AVALANCHE 224 170
## 2 BLIZZARD 101 805
## 3 EXCESSIVE HEAT 1903 6525
## 4 EXTREME COLD 160 231
## 5 EXTREME COLD/WIND CHILL 125 24
## 6 FLASH FLOOD 978 1777
## 7 FLOOD 470 6789
## 8 HEAT 937 2100
## 9 HEAT WAVE 172 309
## 10 HEAVY SNOW 127 1021
## 11 HIGH SURF 101 152
## 12 HIGH WIND 248 1137
## 13 LIGHTNING 816 5230
## 14 RIP CURRENT 368 232
## 15 RIP CURRENTS 204 297
## 16 STRONG WIND 103 280
## 17 THUNDERSTORM WIND 133 1488
## 18 TORNADO 5633 91346
## 19 TSTM WIND 504 6957
## 20 WINTER STORM 206 1321
# Sort the top-20 table by fatalities and, separately, by injuries
fatalities <- total.fatalities[order(total.fatalities$Fatalities, decreasing = TRUE), ]
injuries <- total.fatalities[order(total.fatalities$Injuries, decreasing = TRUE), ]
# Code for plots: two bar charts side by side
par(mfrow = c(1, 2))
barplot(fatalities$Fatalities, names.arg = fatalities$EVTYPE, main = 'Fatalities Count grouped by Event Type', ylab = 'Count of Fatalities', cex.names = .7, las = 2)
# Bar labels follow the injuries ordering so each bar matches its event type
barplot(injuries$Injuries, names.arg = injuries$EVTYPE, main = 'Injuries Count grouped by Event Type', ylab = 'Count of Injuries', cex.names = .7, las = 2, cex.axis = .7)
To address the question of which event types are most harmful to population health, we created two bar charts showing the count of fatalities (left) and injuries (right) per event type. Only the 20 event types with the most fatalities are included. Tornadoes lead both counts by a wide margin (5,633 fatalities and 91,346 injuries), and excessive heat is the second leading cause of fatalities (1,903).
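The table above lists some event types under more than one spelling, for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND, so their counts are split across rows. A rough sketch of merging such labels before summarizing is shown below, using base R and a toy data frame built from four rows of the table. The function `normalize_evtype` and its two rules are illustrative assumptions; a real analysis would need a longer, reviewed list of rules.

```r
# Toy data: four EVTYPE rows and their fatality counts from the table above.
toy <- data.frame(
  EVTYPE = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND", "THUNDERSTORM WIND"),
  FATALITIES = c(368, 204, 504, 133),
  stringsAsFactors = FALSE
)

# Hypothetical cleaning rules, for illustration only.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                       # unify case, strip stray spaces
  x <- sub("^TSTM", "THUNDERSTORM", x)          # expand the TSTM abbreviation
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)  # fold the plural into the singular
  x
}

toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Sum fatalities per cleaned event type.
merged <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
print(merged)
```

With these four rows the merged totals are 572 for RIP CURRENT and 637 for THUNDERSTORM WIND.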
#storing the data to a second dataframe
dat2 <- dat
#checking the values that need recoding so they can be used in our analysis
unique(dat2$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
All of these values require recoding before they can be used as multipliers.
#Alphabetical characters used to signify magnitude include "K" for thousands, "M" for millions, and "B" for billions
require(car)
levels(dat2$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
dat2$PROPDMGEXP <- as.numeric(recode(as.character(dat2$PROPDMGEXP),
"'0'=1;'1'=10;'2'=10^2;'3'=10^3;'4'=10^4;'5'=10^5;'6'=10^6;'7'=10^7;'8'=10^8;'B'=10^9;'h'=10^2;'H'=10^2;'K'=10^3;'m'=10^6;'M'=10^6;'-'=0;'?'=0;'+'=0"))
levels(dat2$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
dat2$CROPDMGEXP <- as.numeric(recode(as.character(dat2$CROPDMGEXP),
"'2'=10^2;'B'=10^9;'K'=10^3;'k'=10^3;'m'=10^6;'M'=10^6;'?'=0;'0'=1"))
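The same recoding can also be written as a named lookup vector in base R, which makes each code-to-multiplier pair easy to audit. The sketch below is an illustration, not the method used in this report. The function name `exponent_to_multiplier` is hypothetical, and treating a blank code as a multiplier of 1 is an assumption of the sketch. One point worth checking in any such recoding: `as.numeric()` applied to a string that was not recoded, such as `""`, returns `NA`, and `sum()` returns `NA` for a group that contains an `NA` unless `na.rm = TRUE` is set.

```r
# Code-to-multiplier table. "K", "M", "B" follow the comment above
# (thousands, millions, billions); the symbols map to 0 as in the recode.
exp_lookup <- c(
  "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "-" = 0,   "?" = 0,   "+" = 0
)

# Hypothetical helper: map exponent codes to numeric multipliers.
exponent_to_multiplier <- function(code) {
  code <- toupper(as.character(code))      # "k" and "K" share one key
  mult <- unname(exp_lookup[code])         # unknown codes come back as NA
  mult[!is.na(code) & code == ""] <- 1     # assumption: blank means "as recorded"
  mult
}

codes <- c("K", "k", "m", "B", "", "5", "?")
multipliers <- exponent_to_multiplier(codes)
print(multipliers)

# NA handling when totalling: without na.rm one NA makes the whole sum NA.
print(sum(c(1000, NA, 250)))
print(sum(c(1000, NA, 250), na.rm = TRUE))
```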
Calculating the damages in dollars
# Convert the recorded values to dollars by applying the multipliers
dat2$PROPDMGDOLLARS <- dat2$PROPDMGEXP*dat2$PROPDMG
dat2$CROPDMGDOLLARS <- dat2$CROPDMGEXP*dat2$CROPDMG
# Total property damage per event type, top 15 in descending order
total.dollars.prop <- dat2 %>%
  group_by(EVTYPE) %>%
  summarize(Damages.Dollars = sum(PROPDMGDOLLARS)) %>%
  top_n(15, Damages.Dollars) %>%
  arrange(desc(Damages.Dollars)) %>%
  print
## Source: local data frame [15 x 2]
##
## EVTYPE Damages.Dollars
## (fctr) (dbl)
## 1 TORNADOES, TSTM WIND, HAIL 1600000000
## 2 WILD FIRES 624100000
## 3 HAILSTORM 241000000
## 4 HIGH WINDS/COLD 110500000
## 5 River Flooding 106155000
## 6 MAJOR FLOOD 105000000
## 7 HURRICANE OPAL/HIGH WINDS 100000000
## 8 WINTER STORM HIGH WINDS 60000000
## 9 HURRICANE EMILY 50000000
## 10 Erosion/Cstl Flood 16200000
## 11 COASTAL FLOODING/EROSION 15000000
## 12 Heavy Rain/High Surf 13500000
## 13 LAKESHORE FLOOD 7540000
## 14 HIGH WINDS HEAVY RAINS 7500000
## 15 FLOODS 6000000
# Total crop damage per event type, top 15 in descending order (ties are kept)
total.dollars.crop <- dat2 %>%
  group_by(EVTYPE) %>%
  summarize(Damages.Dollars = sum(CROPDMGDOLLARS)) %>%
  top_n(15, Damages.Dollars) %>%
  arrange(desc(Damages.Dollars)) %>%
  print
## Source: local data frame [16 x 2]
##
## EVTYPE Damages.Dollars
## (fctr) (dbl)
## 1 EXCESSIVE WETNESS 142000000
## 2 COLD AND WET CONDITIONS 66000000
## 3 Early Frost 42000000
## 4 Damaging Freeze 34103000
## 5 Freeze 10500000
## 6 HURRICANE OPAL/HIGH WINDS 10000000
## 7 UNSEASONAL RAIN 10000000
## 8 HIGH WINDS/COLD 5200000
## 9 Unseasonable Cold 5100000
## 10 COOL AND WET 5000000
## 11 WINTER STORM HIGH WINDS 5000000
## 12 TORNADOES, TSTM WIND, HAIL 2500000
## 13 Heavy Rain/High Surf 1500000
## 14 DUST STORM/HIGH WINDS 50000
## 15 FOREST FIRES 50000
## 16 TROPICAL STORM GORDON 50000
Code used for creating the plots
require(ggplot2)
require(gridExtra)
# Property damage plot: bars ordered from largest to smallest total
p.prop <- ggplot(total.dollars.prop, aes(x = reorder(EVTYPE, -Damages.Dollars), y = Damages.Dollars)) +
  geom_bar(stat = 'identity') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7), axis.title.x = element_blank(), axis.title.y = element_blank())
# Crop damage plot, built the same way (named p.crop to avoid masking base::c)
p.crop <- ggplot(total.dollars.crop, aes(x = reorder(EVTYPE, -Damages.Dollars), y = Damages.Dollars)) +
  geom_bar(stat = 'identity') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7), axis.title.x = element_blank(), axis.title.y = element_blank())
# Stack the two plots: property on top, crops below
grid.arrange(p.prop, p.crop, top = 'Amounts ($) of economic damages caused by extreme weather events from 1950-2011 in USA')
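To address the question of economic consequences, we created two bar charts showing the amount of economic damage to property (top) and crops (bottom). These charts identify the event types associated with the greatest damage. We included only the top 15 of each; the crop table has 16 rows because of ties in the last place. The leading entry for property damage is the combined event type "TORNADOES, TSTM WIND, HAIL" (about $1.6 billion), and for crop damage it is "EXCESSIVE WETNESS" (about $142 million).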
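Dollar totals such as 1600000000 are hard to read in tables and on axes. A small formatting helper can make them easier to compare; the sketch below uses base R only, and the function name `format_dollars` and its cut-offs are illustrative choices rather than part of the original analysis.

```r
# Hypothetical helper: abbreviate dollar amounts with K, M, or B suffixes.
format_dollars <- function(x) {
  ifelse(x >= 1e9, sprintf("$%.2fB", x / 1e9),
  ifelse(x >= 1e6, sprintf("$%.2fM", x / 1e6),
  ifelse(x >= 1e3, sprintf("$%.2fK", x / 1e3),
         sprintf("$%.0f", x))))
}

# Three totals taken from the tables above.
labels <- format_dollars(c(1600000000, 624100000, 50000))
print(labels)
```

Since ggplot2 scales accept a function for their `labels` argument, something like `scale_y_continuous(labels = format_dollars)` could presumably be added to each plot to label the y axis this way.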