Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site; the download URL appears in the code under Reading the Data.

Documentation of the database is also available; it describes how some of the variables are constructed and defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.


Reading the Data

#setting the working directory (Windows only: the path points to the current user's desktop)
#getting the username in order to create a path to the desktop
username <- Sys.getenv('USERNAME')
directory <- paste('C:\\Users\\',username,'\\Desktop', sep='')
setwd(directory)

#creating a desktop directory Reproducible Research to store the data
if (!file.exists('./Reproducible Research')){       
      dir.create('./Reproducible Research')
}
setwd('./Reproducible Research')



dataurl<- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
download.file(dataurl,destfile = 'StormData.csv.bz2', mode='wb')

#reading the data (this might take a while)
dat <- read.csv("StormData.csv.bz2")
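
The download above runs every time the report is built. A minimal sketch of skipping it when the file is already on disk follows; `download_if_missing` and its `fetch` parameter are hypothetical names introduced here for illustration, not part of the original analysis.

```r
# Hypothetical helper: download only when the file is not already on disk.
# `fetch` defaults to download.file; it is a parameter so that another
# function with the same arguments can be substituted.
download_if_missing <- function(url, destfile, fetch = download.file) {
  if (file.exists(destfile)) {
    return(FALSE)            # file already present, nothing downloaded
  }
  fetch(url, destfile = destfile, mode = "wb")
  TRUE                       # a download was attempted
}

# Possible usage with the names defined above:
# download_if_missing(dataurl, 'StormData.csv.bz2')
```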

Q1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Data Processing

Grouping and summarizing the data

require(dplyr)
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 3.2.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
total.fatalities <- dat %>%
      group_by(EVTYPE) %>%
      summarize(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES)) %>%
      top_n(20, wt = Fatalities) %>%
      print
## Source: local data frame [20 x 3]
## 
##                     EVTYPE Fatalities Injuries
##                     (fctr)      (dbl)    (dbl)
## 1                AVALANCHE        224      170
## 2                 BLIZZARD        101      805
## 3           EXCESSIVE HEAT       1903     6525
## 4             EXTREME COLD        160      231
## 5  EXTREME COLD/WIND CHILL        125       24
## 6              FLASH FLOOD        978     1777
## 7                    FLOOD        470     6789
## 8                     HEAT        937     2100
## 9                HEAT WAVE        172      309
## 10              HEAVY SNOW        127     1021
## 11               HIGH SURF        101      152
## 12               HIGH WIND        248     1137
## 13               LIGHTNING        816     5230
## 14             RIP CURRENT        368      232
## 15            RIP CURRENTS        204      297
## 16             STRONG WIND        103      280
## 17       THUNDERSTORM WIND        133     1488
## 18                 TORNADO       5633    91346
## 19               TSTM WIND        504     6957
## 20            WINTER STORM        206     1321
#both orderings reuse the 20 event types with the most fatalities selected above
fatalities <- total.fatalities[order(total.fatalities$Fatalities, decreasing = TRUE), ]
injuries <- total.fatalities[order(total.fatalities$Injuries, decreasing = TRUE), ]

Results:

#Code for plots
par(mfrow=c(1,2))

barplot(fatalities$Fatalities, names.arg = fatalities$EVTYPE, main = 'Fatalities Count grouped by Event Type', ylab = 'Count of Fatalities', cex.names =  .7,las=2)

#labels must come from the injuries ordering, otherwise the bars are mislabelled
barplot(injuries$Injuries, names.arg = injuries$EVTYPE, main = 'Injuries Count grouped by Event Type', ylab = 'Count of Injuries', cex.names =  .7,las=2, cex.axis = .7)

To address the question we created two bar charts showing the count of fatalities (left) and injuries (right) by event type. Both charts cover the 20 event types with the most fatalities. Tornadoes lead on both measures by a wide margin (5,633 fatalities and 91,346 injuries). Excessive heat ranks second for fatalities (1,903), while TSTM wind (6,957), flood (6,789), and excessive heat (6,525) follow for injuries.
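
The injuries ranking above is restricted to the event types that were selected by fatality count. A minimal sketch of ranking each measure independently follows, in base R on made-up toy data; the helper `top_by` and the toy values are assumptions for illustration and do not come from the NOAA file.

```r
# Toy data, made up for illustration (not taken from the storm database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "FLOOD", "ICE"),
  FATALITIES = c(5, 3, 6, 1, 0, 2),
  INJURIES   = c(40, 10, 2, 30, 25, 0),
  stringsAsFactors = FALSE
)

# Sum both measures per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical helper: the n rows with the largest values in one column
top_by <- function(df, column, n) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

# Each measure gets its own top-n selection
top.fatalities <- top_by(totals, "FATALITIES", 2)
top.injuries   <- top_by(totals, "INJURIES", 2)
```

With the real data, `totals` would be the full per-event summary before any `top_n` step, so that an event type with many injuries but few fatalities is not excluded from the injuries chart.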

----------------------------------------

Q2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

#storing the data to a second dataframe
dat2 <- dat

#checking the values that need recoding so they can be used in our analysis
unique(dat2$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

All of these values require recoding before they can be used as numeric multipliers.

#Alphabetical characters used to signify magnitude include "K" for thousands, "M" for millions, and "B" for billions
require(car)
## Loading required package: car
## Warning: package 'car' was built under R version 3.2.4
levels(dat2$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
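#digits are read as powers of ten; '-', '?' and '+' are treated as 0
#note: the blank code ('') is not listed below, so it stays '' and becomes NA after as.numeric()
dat2$PROPDMGEXP <- as.numeric(recode(as.character(dat2$PROPDMGEXP), 
    "'0'=1;'1'=10;'2'=10^2;'3'=10^3;'4'=10^4;'5'=10^5;'6'=10^6;'7'=10^7;'8'=10^8;'B'=10^9;'h'=10^2;'H'=10^2;'K'=10^3;'m'=10^6;'M'=10^6;'-'=0;'?'=0;'+'=0"))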


levels(dat2$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
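#'K' and 'k' both mean thousands (10^3), matching the property recoding
#note: the blank code ('') is not listed below, so it stays '' and becomes NA after as.numeric()
dat2$CROPDMGEXP <- as.numeric(recode(as.character(dat2$CROPDMGEXP), 
    "'2'=10^2;'B'=10^9;'K'=10^3;'k'=10^3;'m'=10^6;'M'=10^6;'?'=0;'0'=1"))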

Now to Calculate the Damages in Dollars

#converting the values in dollars
dat2$PROPDMGDOLLARS  <- dat2$PROPDMGEXP*dat2$PROPDMG
dat2$CROPDMGDOLLARS  <- dat2$CROPDMGEXP*dat2$CROPDMG
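
The multiplication above yields NA wherever the exponent code was blank, and a single NA makes `sum()` return NA for the whole event type. A minimal sketch of one alternative follows: a named lookup vector plus `na.rm = TRUE`, shown on made-up toy data. Treating a blank code as a multiplier of 1 is an assumption made here for illustration; `exp.lookup` and `exp_to_multiplier` are hypothetical names.

```r
# Multipliers for the letter codes named in the comments above
exp.lookup <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6,
                "B" = 1e9, "H" = 1e2, "h" = 1e2)

# Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  code <- as.character(code)
  out <- unname(exp.lookup[code])            # NA for codes not in the lookup
  digit <- grepl("^[0-8]$", code)
  out[digit] <- 10^as.numeric(code[digit])   # '0'..'8' read as powers of ten
  out[code == ""] <- 1                       # ASSUMPTION: blank means "as given"
  out[code %in% c("-", "?", "+")] <- 0       # same treatment as the recoding above
  out
}

# Toy data, made up for illustration
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 3),
  PROPDMGEXP = c("M", "", "K"),
  stringsAsFactors = FALSE
)
toy$PROPDMGDOLLARS <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# na.rm = TRUE keeps one unrecognised code from wiping out a whole event type
toy.totals <- tapply(toy$PROPDMGDOLLARS, toy$EVTYPE, sum, na.rm = TRUE)
```

Whether a blank code should count as 1 or as 0 is a choice to check against the database documentation before applying this to `dat2`.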

total.dollars.prop <- dat2 %>% group_by(EVTYPE) %>%
      summarize(Damages.Dollars= sum(PROPDMGDOLLARS)) %>%
      top_n(15,Damages.Dollars) %>%
      arrange(desc(Damages.Dollars)) %>%
      print
## Source: local data frame [15 x 2]
## 
##                        EVTYPE Damages.Dollars
##                        (fctr)           (dbl)
## 1  TORNADOES, TSTM WIND, HAIL      1600000000
## 2                  WILD FIRES       624100000
## 3                   HAILSTORM       241000000
## 4             HIGH WINDS/COLD       110500000
## 5              River Flooding       106155000
## 6                 MAJOR FLOOD       105000000
## 7   HURRICANE OPAL/HIGH WINDS       100000000
## 8     WINTER STORM HIGH WINDS        60000000
## 9             HURRICANE EMILY        50000000
## 10         Erosion/Cstl Flood        16200000
## 11  COASTAL  FLOODING/EROSION        15000000
## 12       Heavy Rain/High Surf        13500000
## 13            LAKESHORE FLOOD         7540000
## 14     HIGH WINDS HEAVY RAINS         7500000
## 15                     FLOODS         6000000
total.dollars.crop <- dat2 %>% group_by(EVTYPE) %>%
      summarize(Damages.Dollars= sum(CROPDMGDOLLARS)) %>%
      top_n(15,Damages.Dollars) %>%
      arrange(desc(Damages.Dollars)) %>%
      print
## Source: local data frame [16 x 2]
## 
##                        EVTYPE Damages.Dollars
##                        (fctr)           (dbl)
## 1           EXCESSIVE WETNESS       142000000
## 2     COLD AND WET CONDITIONS        66000000
## 3                 Early Frost        42000000
## 4             Damaging Freeze        34103000
## 5                      Freeze        10500000
## 6   HURRICANE OPAL/HIGH WINDS        10000000
## 7             UNSEASONAL RAIN        10000000
## 8             HIGH WINDS/COLD         5200000
## 9           Unseasonable Cold         5100000
## 10               COOL AND WET         5000000
## 11    WINTER STORM HIGH WINDS         5000000
## 12 TORNADOES, TSTM WIND, HAIL         2500000
## 13       Heavy Rain/High Surf         1500000
## 14      DUST STORM/HIGH WINDS           50000
## 15               FOREST FIRES           50000
## 16      TROPICAL STORM GORDON           50000

Results:

Code used for creating the plots

require(ggplot2)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.2.3
require(gridExtra) 
## Loading required package: gridExtra
## Warning: package 'gridExtra' was built under R version 3.2.3
p <- ggplot(total.dollars.prop,aes(x=reorder(EVTYPE,-Damages.Dollars),y=Damages.Dollars) ) + 
      geom_bar(stat = 'identity') +
      theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),axis.title.x=element_blank(), axis.title.y=element_blank()) 


#named crop.plot rather than c to avoid masking base::c
crop.plot <- ggplot(total.dollars.crop,aes(x=reorder(EVTYPE,-Damages.Dollars),y=Damages.Dollars) ) + 
      geom_bar(stat = 'identity') +
      theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),axis.title.x=element_blank(), axis.title.y=element_blank())

#p is the property-damage plot built above
grid.arrange(p, crop.plot, top= 'Amounts ($) of economic damages caused by extreme weather events from 1950-2011 in USA') 

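To address the question we created two bar charts showing the amount of economic damage to property (top) and crops (bottom). We have included the top 15 event types for property and, because three event types tie for the last place, 16 for crops. In these tables the leading entry for property damage is the combined event type TORNADOES, TSTM WIND, HAIL, and for crops it is EXCESSIVE WETNESS. These rankings have an important limitation: recode() leaves blank exponent codes unmatched, so they become NA after as.numeric(), sum() then returns NA for every event type with at least one blank exponent, and top_n() drops those rows. The tables therefore cover only event types whose records all carry an exponent code.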
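
The event labels in the tables above mix upper and lower case and irregular spacing (for example River Flooding next to FLOODS, and COASTAL  FLOODING/EROSION with a double space), so labels that differ only in case or spacing are grouped separately. A minimal sketch of a light clean-up step follows; `normalize_evtype` is a hypothetical helper, and it deliberately does nothing about spelling or plural variants such as RIP CURRENT and RIP CURRENTS.

```r
# Hypothetical helper: upper-case, trim, and collapse repeated whitespace
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  gsub("\\s+", " ", x)
}

# Labels taken from the tables above
labels <- c("River Flooding", "COASTAL  FLOODING/EROSION", "Early Frost")
cleaned <- normalize_evtype(labels)
```

Applied before `group_by(EVTYPE)`, a step like this would merge labels that differ only in case or spacing; merging true synonyms would need an explicit mapping.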