Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

You can get the Storm Data here: Storm Data (NOAA)

Main Question - What are the Major Weather Events that are hazardous in terms of Population health and Economic Damages?

In the first part, the analysis processes the raw data to identify the weather/storm event types that cause the most harm to population health. This covers the recorded fatalities and injuries attributed to each event type. Summing the two variables gives an estimate of the number of people affected by each event type. We then determine the top 15 most hazardous event types.

In the second part, we take economic damage into account, namely property damage and crop damage. Summing the two variables gives the estimated economic damage caused by each event type. We then determine the top 15 event types that are most harmful in terms of economic damage.

Data Processing

This section describes how we process the source data for our needs.

First we download the file from the URL and load the data into a data frame.

##DOWNLOADING FILE FROM URL
if (!file.exists("Storm Data.csv.bz2")) {
  ##mode = "wb" KEEPS THE COMPRESSED FILE INTACT WHEN DOWNLOADING ON WINDOWS
  download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                destfile = "./Storm Data.csv.bz2", mode = "wb")
}

##EXTRACTING DATA FROM BZIP2 FILE
storm_data <- read.csv("Storm Data.csv.bz2", header = TRUE)
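No separate extraction step is needed because R detects bzip2 compression when a file is opened for reading, so read.csv can be pointed at the .bz2 file directly. The following is a minimal, self-contained sketch of that behaviour using a small temporary file; the toy data frame and the names tmp_path and roundtrip are illustrative assumptions and not part of the analysis.

```r
# Toy data standing in for the storm file (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv decompresses transparently when given the file name
roundtrip <- read.csv(tmp_path, header = TRUE)
print(roundtrip)
```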

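Next we apply the exponent codes in PROPDMGEXP to the property damage amounts so that they are expressed in dollars. Note that the code below scales only PROPDMG; CROPDMG is left at its recorded value and is not multiplied by CROPDMGEXP.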

##H/h MEANS HUNDREDS, K MEANS THOUSANDS, m/M MEANS MILLIONS, B MEANS BILLIONS
prop_exp <- storm_data$PROPDMGEXP
storm_data$PROPDMG[prop_exp %in% c("h", "H")] <- storm_data$PROPDMG[prop_exp %in% c("h", "H")] * 10^2
storm_data$PROPDMG[prop_exp %in% "K"] <- storm_data$PROPDMG[prop_exp %in% "K"] * 10^3
storm_data$PROPDMG[prop_exp %in% c("m", "M")] <- storm_data$PROPDMG[prop_exp %in% c("m", "M")] * 10^6
storm_data$PROPDMG[prop_exp %in% "B"] <- storm_data$PROPDMG[prop_exp %in% "B"] * 10^9
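The assignments above cover only the property damage column. As a hedged sketch of how the same scaling could be written once and reused for both damage columns, the example below uses a lookup table on toy data. The names exp_multiplier and scale_damage are hypothetical, and two choices here are assumptions that differ from the code above: codes are matched case-insensitively (so "k" counts as thousands), and any code outside the table is treated as a multiplier of 1.

```r
# Hypothetical lookup table: exponent code -> multiplier
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Scale an amount by its exponent code; unknown or blank codes leave it unchanged (assumption)
scale_damage <- function(amount, exp_code) {
  mult <- exp_multiplier[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 1
  unname(amount * mult)
}

# Toy rows (made-up values) covering known, blank, and unrecognised codes
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  CROPDMG    = c(5, 0, 3, 1),
  CROPDMGEXP = c("k", "", "M", "?"),
  stringsAsFactors = FALSE
)

toy$PROP_USD <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
toy$CROP_USD <- scale_damage(toy$CROPDMG, toy$CROPDMGEXP)
print(toy)
```

Applying this to the real data would change the crop damage totals, so the rankings reported later would need to be recomputed rather than assumed to hold.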

Analysis

Load the packages required for the analysis.

##LOADING PACKAGES
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Subset the required variables into a separate data frame.

##SUBSETTING REQUIRED DATA
##COLUMNS 23 TO 28 OF THE STORM DATA RUN FROM FATALITIES TO CROPDMGEXP
req_data <- select(.data = storm_data, 
                   EVTYPE, FATALITIES:CROPDMGEXP)

Question 1 - Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question, we add fatalities and injuries together into a single measure of population health damage and then total that measure for each event type using the sum function.

##AGGREGATING THE SUM OF FATALITIES & INJURIES BY EVENT TYPE - APPLYING SUM FUNCTION
human_agg <- aggregate(req_data$FATALITIES + req_data$INJURIES, 
                       by = list(req_data$EVTYPE), 
                       FUN = sum, na.rm = TRUE)

##SORTING THE PUBLIC HEALTH DAMAGE DATA IN DESCENDING ORDER
human_agg <- human_agg[order(human_agg$x, decreasing = TRUE), ]
##SUBSETTING TOP 15 EVENT TYPES RESPONSIBLE FOR PUBLIC HEALTH DAMAGE
top15_human <- human_agg[1:15,]
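The aggregate call above returns columns named Group.1 and x, which is why those names appear in the later steps. As a small sketch on toy data (made-up values, with the hypothetical column name HARMED), the formula interface of aggregate produces the same group, sum, sort, and top-N result with descriptive column names. Unlike the call above, the formula interface drops rows containing NA before summing, so the toy data avoids missing values.

```r
# Toy records (made-up values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 3, 5, 1, 2)
)

# Combine both measures into one per-record count
toy$HARMED <- toy$FATALITIES + toy$INJURIES

# Sum per event type; the result keeps the names EVTYPE and HARMED
toy_agg <- aggregate(HARMED ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order and keep the top 2 event types
toy_agg <- toy_agg[order(toy_agg$HARMED, decreasing = TRUE), ]
top2 <- head(toy_agg, 2)
print(top2)
```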

Question 2 - Across the United States, which types of events have the greatest economic consequences?

For this part, we add crop and property damage together into a single measure of economic damage and then total that measure for each event type.

##AGGREGATING THE SUM OF CROP & PROPERTY DAMAGE BY EVENT TYPE - APPLYING SUM FUNCTION
eco_agg <- aggregate(req_data$PROPDMG + req_data$CROPDMG, 
                     by = list(req_data$EVTYPE), 
                     FUN = sum, na.rm = TRUE)

##SORTING THE ECONOMIC DAMAGE DATA IN DESCENDING ORDER
eco_agg <- eco_agg[order(eco_agg$x, decreasing = TRUE), ]
##SUBSETTING TOP 15 EVENT TYPES RESPONSIBLE FOR ECONOMIC DAMAGE
top15_economic <- eco_agg[1:15,]

Results

Results Part 1 - Following are the top 15 weather event types responsible for maximum human fatalities and injuries.

##PLOT FOR PUBLIC HEALTH
ggplot(data = top15_human, 
       aes(x = factor(Group.1, levels = factor(Group.1)), y = x, fill = x)) + 
  geom_col() +
  ggtitle(label = "Population Health Damage (Number of Fatalities & Injuries) \nTop 15 Event Types") +
  theme(axis.text.x = element_text(angle = 45, size = 6, colour = "black",hjust = 1),
        plot.title = element_text(hjust = 0.5), 
        legend.title = element_blank()) + 
  xlab("Event Types") + 
  ylab("Fatalities & Injuries")
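The x aesthetic in the plot wraps the event type in factor with explicit levels. A brief sketch of why that matters, using three event names as toy input: without explicit levels, factor sorts its levels alphabetically, which would reorder the bars; passing the already sorted values as levels keeps the descending order produced earlier. The variable names here are illustrative only.

```r
# Event types already sorted by descending impact (toy subset)
events <- c("TORNADO", "EXCESSIVE HEAT", "FLOOD")

# Default: levels are sorted alphabetically, so the axis order would change
default_levels <- levels(factor(events))

# Explicit levels: the existing sorted-by-impact order is preserved
ordered_levels <- levels(factor(events, levels = events))

print(default_levels)
print(ordered_levels)
```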

Top 15 Weather Events which cause maximum damage to population health are:

top15_human$Group.1
##  [1] "TORNADO"           "EXCESSIVE HEAT"    "TSTM WIND"        
##  [4] "FLOOD"             "LIGHTNING"         "HEAT"             
##  [7] "FLASH FLOOD"       "ICE STORM"         "THUNDERSTORM WIND"
## [10] "WINTER STORM"      "HIGH WIND"         "HAIL"             
## [13] "HURRICANE/TYPHOON" "HEAVY SNOW"        "WILDFIRE"

Results Part 2 - Following are the top 15 weather event types responsible for maximum economic damage, classified as property and crop damage.

##PLOT FOR ECONOMIC DAMAGE
ggplot(data = top15_economic, 
       aes(x = factor(Group.1, levels = factor(Group.1)), y = x, fill = x)) + 
  geom_col() +
  ggtitle(label = "Economic Damage (Property & Crop) \nTop 15 Event Types") +
  theme(axis.text.x = element_text(angle = 45, size = 6, colour = "black"  , hjust = 1),
        plot.title = element_text(hjust = 0.5), 
        legend.title = element_blank()) + 
  xlab("Event Types") + 
  ylab("Economic Damage(In $)")

Top 15 Weather Events which cause maximum economic damage are:

top15_economic$Group.1
##  [1] "FLOOD"             "HURRICANE/TYPHOON" "TORNADO"          
##  [4] "STORM SURGE"       "FLASH FLOOD"       "HAIL"             
##  [7] "HURRICANE"         "TROPICAL STORM"    "WINTER STORM"     
## [10] "HIGH WIND"         "RIVER FLOOD"       "WILDFIRE"         
## [13] "STORM SURGE/TIDE"  "TSTM WIND"         "ICE STORM"

This analysis shows the following results:
1. Tornadoes are the most harmful events for population health (counting both injuries and fatalities).
2. Floods are responsible for the most economic damage (counting both property and crop damage).