1 Synopsis

This project analyzed the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine the effects of weather events on the U.S. population and economy. Impact on the population, measured in injuries and fatalities, followed a similar pattern of weather events for both measures, with tornadoes inflicting the harshest toll. Economic impact, measured in crop and property damage, followed a very different pattern of weather events, with floods causing the largest total damage. (According to the NWS Storm Data Documentation: “Property damage estimates should be entered as actual dollar amounts, if a reasonably accurate estimate from an insurance company or other qualified individual is available. If this estimate is not available, then the preparer has two choices: either check the ‘no information available’ box, or make an estimate. The exception is for flood events. The Storm Data preparer must enter monetary damage amounts for flood events, even if it is a ‘guesstimate.’ The U.S. Army Corps of Engineers requires the NWS to provide monetary damage amounts (property and/or crop) resulting from any flood event.”)

2.0 Assignment & Structure

Storms and other severe weather events can cause both public health and economic problems. Severe events can result in fatalities, injuries, and property damage. The prevention of such outcomes is a key concern. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

For this analysis, the database is available at: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Details on its content are presented in the following documents:

- National Weather Service Storm Data Documentation: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
- National Climatic Data Center Storm Events FAQ: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

2.1 Data Processing

The data for this assignment came in the form of a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The data was downloaded from the Coursera site, together with additional documentation about the database:

- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ
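R's read.csv() can read a bzip2-compressed file directly and decompresses it on the fly, so the .bz2 file does not need to be extracted first. The round trip below is a small sketch on a throwaway temporary file. The toy columns are hypothetical and are not the storm data. It writes a compressed CSV through a bzfile() connection and reads it back with read.csv():

```r
# Hypothetical toy data standing in for the storm records
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(0, 1))

# Write the CSV through a bzip2-compressing connection
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses transparently
back <- read.csv(path)
print(back)
```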

Page 12 of the National Weather Service Storm Data Documentation lists the levels of the damage exponent, which are converted into numerical values later in the assignment. The events in the database start in 1950 and end in November 2011. The earlier years of the database contain fewer recorded events, most likely due to poor record keeping. More recent years should be considered more complete.

Loading Data & Subsetting

The initial data set was loaded into R. The raw data set (Raw.Storm.Data) was left unchanged, and all data manipulation was done on the subset Storm.Data.

library(knitr)
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(DT)
knitr::opts_chunk$set(echo = TRUE)

# Set the working directory to the project folder if needed, e.g.
# setwd("C:/Users/damjan/Desktop/Course Projects/StormDataProject2")

# Download the compressed data only if it is not already present
if (!file.exists("StormData.csv.bz2")) {
   download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
   destfile = "StormData.csv.bz2")
}

# Read the raw data directly from the compressed file;
# read.csv() decompresses bzip2 transparently, so no separate bzfile() step is needed
Raw.Storm.Data <- read.csv("StormData.csv.bz2")

## Subsetting data using dplyr for Analysis - Raw Data remains untouched
Storm.Data<- select(Raw.Storm.Data,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,  CROPDMG,CROPDMGEXP)
head(Storm.Data)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Cleaning, transforming, and filtering the data.

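# Count records per event type, as recorded in the raw file
typeevents <- ddply(Storm.Data, ~EVTYPE, count)
ntypeevents <- nrow(typeevents)

# Convert event types to upper case so that spellings differing only in case are merged
Storm.Data$EVTYPE <- toupper(Storm.Data$EVTYPE)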
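Event types in the raw file are entered free-form, so the same event can appear with different capitalization or stray whitespace. The toy vector below uses hypothetical values. It sketches why upper-casing reduces the number of distinct types. It also shows how trimming whitespace with trimws() would merge a few more; that trimming step is an extra, assumed step and is not applied in the code above:

```r
# Hypothetical EVTYPE values with inconsistent case and spacing
ev <- c("Tornado", "TORNADO", "tornado", " TSTM WIND", "TSTM WIND", "Hail")

n_raw   <- length(unique(ev))                  # distinct as recorded
n_upper <- length(unique(toupper(ev)))         # after upper-casing
n_clean <- length(unique(trimws(toupper(ev)))) # after upper-casing and trimming

c(raw = n_raw, upper = n_upper, trimmed = n_clean)
```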

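Filtering the data: records with an empty state or county are dropped. STATE and COUNTY are not part of the Storm.Data subset, so they are taken from Raw.Storm.Data, whose rows are aligned with Storm.Data.

stormdata <- Storm.Data[Raw.Storm.Data$STATE != "" & Raw.Storm.Data$COUNTY != "", ]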
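The filter keeps only the rows in which both fields are non-empty. The sketch below runs on a few hypothetical records. It also shows why the filter must refer to columns that actually exist. `$` returns NULL for an absent column, so the comparison yields an empty logical vector and no rows are selected:

```r
# Hypothetical records; an empty string marks a missing state or county
recs <- data.frame(
  STATE      = c("AL", "", "TX", "OK"),
  COUNTY     = c("97", "3", "", "109"),
  FATALITIES = c(1, 0, 2, 4)
)

# Keep rows where both STATE and COUNTY are non-empty
complete <- recs[recs$STATE != "" & recs$COUNTY != "", ]
nrow(complete)

# A column that does not exist returns NULL, so the test selects no rows
nrow(recs[recs$NOT_A_COLUMN != "", ])
```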

2.2 Calculating Property Damage Values

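As indicated above, page 12 of “NationalWeatherService_Project2.pdf” explains the damage exponent values. These are stored as a mix of numeric and alphabetical codes, for example "K" for thousands, "M" for millions and "B" for billions. Once the levels of the exponent were identified, they were converted into numerical multipliers. Damage in US dollars was then calculated by multiplying each DMG value by its converted EXP value. This conversion is applied to both the property and the crop exponents further below, where total economic damage is computed. The code in this section first summarizes fatalities and injuries per event type for the health plots.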
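As a worked example of the multiplication, take the first record in the head() output above: PROPDMG = 25.0 with PROPDMGEXP = "K" (thousands). Its damage is 25.0 × 10^3 = 25,000 US dollars. The sketch below expresses that lookup as a named vector. It rests on a few assumptions. Mapping "H" to hundreds is an assumption. Numeric and symbol codes are outside the scope of this sketch and simply map to a multiplier of 1. The helper name to_dollars is hypothetical:

```r
# Multipliers for the magnitude codes; "H" = hundreds is an assumption
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: damage value times the multiplier for its code
to_dollars <- function(dmg, exp_code) {
  code <- toupper(as.character(exp_code))
  m <- unname(mult[code])
  m[is.na(m)] <- 1          # blank or unrecognised codes: no scaling
  dmg * m
}

to_dollars(c(25.0, 2.5, 1.55), c("K", "k", "B"))
```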

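# Sum fatalities and injuries per event type (used for the plots below)
majorhealthevents <- ddply(Storm.Data, ~EVTYPE, summarise, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
# Keep the top 25 event types by fatalities
topfatalevents <- top_n(majorhealthevents, 25, fatalities)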

2.3 Summarizing Fatality Data by Weather Event Type

fatalities <- aggregate(Storm.Data$FATALITIES, by = list(EVTYPE = Storm.Data$EVTYPE), sum)
fatalities <- fatalities[order(fatalities$x, decreasing = TRUE), ]
datatable(head(fatalities), rownames = head(LETTERS))

2.4 Summarizing Injury Data by Weather Event Type

As with fatalities, base R's aggregate() was used to sum injuries per event type into the data frame injuries. The result was then sorted in descending order with order(), so that the table lists the event types with the most injuries first.

## Summarizing Injuries
injuries <- aggregate(Storm.Data$INJURIES, by = list(EVTYPE = Storm.Data$EVTYPE), sum)
injuries <- injuries[order(injuries$x, decreasing = TRUE), ]
datatable(head(injuries), rownames = head(LETTERS))
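Sorting the rows of a data frame does not by itself change the order of the bars in ggplot2. A character or factor x-axis is drawn in factor-level order, which is alphabetical for character data. Base R's reorder() sets the levels from a summary value instead, so reordering by the negated total gives a descending axis. A minimal sketch with hypothetical totals:

```r
# Hypothetical injury totals, already sorted in descending order
inj <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"), x = c(90, 50, 10))

# Default factor levels are alphabetical, regardless of row order
levels(factor(inj$EVTYPE))

# reorder() sorts levels by the (negated) value: descending totals
levels(reorder(inj$EVTYPE, -inj$x))
```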

2.5 Modifying Data for Plot 3

To find out which types of events have the greatest economic consequences, the following strategy was chosen:

- the property and crop damage exponents were converted into numerical multipliers;
- property and crop damage were added together to obtain the total damage in US dollars;
- the total damage was summed per event type with aggregate() and then sorted in descending order with order().

datasub <- Storm.Data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Inspect the exponent codes present in the data
unique(c(as.character(datasub$PROPDMGEXP), as.character(datasub$CROPDMGEXP)))
# Convert exponent codes to powers of ten: K = thousands, M = millions,
# B = billions; H = hundreds is an assumption; digit codes are taken as
# powers of ten; blanks and "+", "-", "?" mean no scaling
exp_to_power <- function(e) {
  e <- toupper(trimws(as.character(e)))
  power <- rep(0, length(e))
  power[e == "H"] <- 2
  power[e == "K"] <- 3
  power[e == "M"] <- 6
  power[e == "B"] <- 9
  digits <- grepl("^[0-9]$", e)
  power[digits] <- as.numeric(e[digits])
  power
}
# Now combine the exponent with the value
datasub$PROPDMGEXP <- 10^exp_to_power(datasub$PROPDMGEXP)
datasub$CROPDMGEXP <- 10^exp_to_power(datasub$CROPDMGEXP)
datasub[is.na(datasub$PROPDMG), "PROPDMG"] <- 0
datasub[is.na(datasub$CROPDMG), "CROPDMG"] <- 0
# Calculate the total damage in US dollars
datasub <- within(datasub, TOTALDMG <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)

DamageByType <- aggregate(datasub$TOTALDMG, by = list(EVTYPE = datasub$EVTYPE), 
    FUN = sum)
DamageByType <- DamageByType[order(DamageByType$x, decreasing = TRUE), ]

datatable(head(DamageByType), rownames = head(LETTERS))
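The same pipeline can be checked end to end on a few hypothetical rows whose exponents have already been converted to multipliers. Total damage is property plus crop damage. It is summed per event type with aggregate() and then sorted with order(). The values are invented for illustration only:

```r
# Hypothetical rows with exponents already converted to multipliers
dmg <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.0, 1.5, 250, 10),
  PROPDMGEXP = c(1e9, 1e6, 1e3, 1e3),
  CROPDMG    = c(0.5, 0, 0, 5),
  CROPDMGEXP = c(1e6, 1, 1, 1e3)
)

# Total damage per row in US dollars
dmg <- within(dmg, TOTALDMG <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)

# Sum per event type, then rank in descending order
by_type <- aggregate(dmg$TOTALDMG, by = list(EVTYPE = dmg$EVTYPE), FUN = sum)
by_type <- by_type[order(by_type$x, decreasing = TRUE), ]
by_type
```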

3.1 Fatalities by Weather Event Type

Plotting fatalities by weather event type in descending order highlights the 25 event types that caused the most significant human toll over the 60+ years covered by the database.

# reorder() puts the bars in descending order of fatalities
g1 <- ggplot(topfatalevents, aes(reorder(EVTYPE, -fatalities), fatalities))
g1 + geom_col(fill = "chocolate") + 
      labs(x = "Types of Events", y = "Number of Fatalities", title = "Fatal storm events:", subtitle = "Tornadoes, excessive heat, floods and lightning are among the most fatal") + 
      theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) + 
      coord_cartesian(ylim = c(0,8000))

3.2 Injuries by Weather Event Type

Plotting injuries by weather event type in descending order highlights the 25 event types that inflicted the largest number of injuries on the U.S. population over the same period.

# Select the top 25 event types by injuries (not by fatalities)
topinjuryevents <- top_n(majorhealthevents, 25, injuries)
g2 <- ggplot(topinjuryevents, aes(reorder(EVTYPE, -injuries), injuries))
g2 + geom_col(fill = "gold") + 
      labs(x = "Types of Events", y = "Number of Injuries", title = "Storms with most injuries reported:", subtitle = "Again, tornadoes are the most sudden and unpredictable, and rank highest on the injuries list") + 
      theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))

3.3 Crop & Property Damage by Weather Event Type

To plot the data in ggplot, the total damage values (property plus crop damage, with exponents applied) were expressed in billions of US dollars. Total economic damage follows a very different weather event pattern from the health analysis. Tornadoes may be the deadliest weather event, yet floods are responsible for the largest economic damage.

# Top 25 event types by exponent-adjusted total (property + crop) damage
economic_events <- head(DamageByType, 25)
economic_events$damage <- economic_events$x / 1e9   # billions of US dollars

# plotting the economic impact
g3 <- ggplot(economic_events, aes(reorder(EVTYPE, -damage), damage))
g3 + geom_col(fill = "tomato") + 
      labs(x = "Types of Events", y = "Total Damage (billions of USD)", title = "Storms with the highest economic damage:", subtitle = "Floods cause the largest combined property and crop damage") + 
      theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))