In this report we aim to identify the weather events that have caused the most harm to public health and the greatest economic damage. To investigate these events, we obtained the storm database maintained by the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers severe weather events from 1950 to November 2011.
Based on our analysis, Wild Fires have had the greatest impact on public health, with the highest mean number of fatalities and injuries per event. Storm Surge, on the other hand, has caused the highest mean economic damage to property and crops per event.
From the NOAA website we acquired the Storm Data along with the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ. After downloading the data file “repdata-data-StormData.csv.bz2” to the current working directory, we use read.csv() with a bzfile() connection to read the compressed file.
raw_data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
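As a small illustration of reading through a bzfile() connection, the sketch below writes a tiny CSV to a temporary bzip2 file and reads it back. The data frame `toy` and the temporary file are made up for the example and are not part of the NOAA data.

```r
# Write a tiny CSV through a bzip2 connection, then read it back the same way.
# `toy` and the temporary file are illustrative only.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the connection, decompresses on the fly, and closes it again
back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
str(back)
```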
Steps for analyzing data for population harm are summarized below.
- We subset on the Event type, Fatalities and Injuries from the raw data.
- Using mutate() from the dplyr package, we create a new variable (IMPACT_PUBLIC_HEALTH) that adds up the number of reported fatalities and injuries.
- We drop the records that have no fatalities or injuries reported for the event.
- Using aggregate(), we calculate the mean per event type.
- Lastly, we sort the data frame in descending order of population harm and subset the top 20 events.
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(reshape2)
## Loading required package: reshape2
NOAA_data1 <- raw_data[,c("EVTYPE","FATALITIES","INJURIES")]
# create a new variable to sum the public harm from fatalities and injuries
NOAA_data1 <- mutate(NOAA_data1, IMPACT_PUBLIC_HEALTH = round(FATALITIES + INJURIES))
# drop any event not giving any Fatalities or injuries data for this part of the analysis
NOAA_data1 <- NOAA_data1[NOAA_data1$IMPACT_PUBLIC_HEALTH > 0,]
# Find the means per event
NOAA_agg1 <- aggregate(IMPACT_PUBLIC_HEALTH ~ EVTYPE, data = NOAA_data1, FUN = mean, na.rm= TRUE)
#Get top 20 events in decreasing order
NOAA_plot1 <- head(NOAA_agg1[order((NOAA_agg1$IMPACT_PUBLIC_HEALTH), decreasing = TRUE),],n=20)
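Because zero-casualty records are dropped before averaging, the resulting mean is conditional on an event having at least one fatality or injury. The sketch below shows the same steps as a dplyr pipeline on invented toy data; the names `toy`, `harm`, `mean_harm`, and `n_events` are hypothetical, and the event count is an addition for illustration rather than part of the analysis above.

```r
library(dplyr)

# Toy stand-in for the storm data (values are invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "HEAT", "HEAT"),
  FATALITIES = c(0, 2, 0, 1, 0),
  INJURIES   = c(0, 8, 0, 3, 0)
)

harm <- toy %>%
  mutate(IMPACT_PUBLIC_HEALTH = FATALITIES + INJURIES) %>%
  filter(IMPACT_PUBLIC_HEALTH > 0) %>%            # keep only events with casualties
  group_by(EVTYPE) %>%
  summarise(mean_harm = mean(IMPACT_PUBLIC_HEALTH),
            n_events  = n()) %>%                  # number of events behind each mean
  arrange(desc(mean_harm))

print(harm)
```

In this toy data the TORNADO mean is 10 from a single qualifying record, whereas averaging over all three TORNADO records would give about 3.3. Keeping a count next to each mean makes it easier to see when a ranking rests on very few events.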
The steps for analyzing economic damage are similar. We create a separate data frame because the events causing public harm can differ from the events causing economic damage.
- We create a new data frame with event type, property damage, property damage exponent, crop damage and crop damage exponent.
- A special processing step is needed to calculate the dollar impact. We multiply each property and crop damage value by the multiplier encoded in its exponent (H for hundreds, K for thousands, M for millions, B for billions) and divide by 1 billion to express the damage in billions of dollars.
- Using mutate() from the dplyr package, we create a new variable (PROP_VALUE) for the value of property damage and a second variable (CROP_VALUE) for the value of crop damage. We add PROP_VALUE and CROP_VALUE to find the total impact in a third variable (IMPACT_PROP_DAMAGE).
- We drop the rows that have no reported economic damage after rounding to whole billions of dollars.
- Using aggregate(), we calculate the mean per event type.
- Lastly, we sort the aggregated data in descending order and subset the top 20 events for the results.
# create a table for highest impact on property damage
NOAA_data2 <- raw_data[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
#Create a new variable and multiply the exponent with the property damage value
NOAA_data2 <- mutate(NOAA_data2, PROP_VALUE =
    ifelse(PROPDMGEXP %in% c("h","H"), PROPDMG*100,
    ifelse(PROPDMGEXP %in% c("k","K"), PROPDMG*1000,
    ifelse(PROPDMGEXP %in% c("m","M"), PROPDMG*1000000,
    ifelse(PROPDMGEXP %in% c("b","B"), PROPDMG*1000000000, 0)))))
#Create a new variable and multiply the exponent with the crop damage value
NOAA_data2 <- mutate(NOAA_data2, CROP_VALUE =
    ifelse(CROPDMGEXP %in% c("h","H"), CROPDMG*100,
    ifelse(CROPDMGEXP %in% c("k","K"), CROPDMG*1000,
    ifelse(CROPDMGEXP %in% c("m","M"), CROPDMG*1000000,
    ifelse(CROPDMGEXP %in% c("b","B"), CROPDMG*1000000000, 0)))))
#Add the property damage value and the crop damage value to create a new variable for the total economic impact.
#Normalize the damage to billions of dollars and round to the nearest whole billion,
#so events with less than about half a billion dollars of damage become 0
NOAA_data2 <- mutate(NOAA_data2,
    IMPACT_PROP_DAMAGE = round((PROP_VALUE + CROP_VALUE)/1000000000))
#drop any event that has no economic impact for this section of the analysis.
NOAA_data2 <- NOAA_data2[NOAA_data2$IMPACT_PROP_DAMAGE > 0,]
#Calculate mean per event before presenting the results
NOAA_agg2 <- aggregate(IMPACT_PROP_DAMAGE ~ EVTYPE, data = NOAA_data2, FUN = mean, na.rm= TRUE)
#Get top 20 event types causing the economic damage
NOAA_plot2 <- head(NOAA_agg2[order((NOAA_agg2$IMPACT_PROP_DAMAGE), decreasing = TRUE),],n=20)
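The exponent handling above can also be written as a lookup in a named vector, which keeps the code-to-multiplier mapping in one place. This is only a sketch on invented toy data: `exp_multiplier`, `to_dollars`, and `toy` are hypothetical names, and, as in the analysis above, any code other than H, K, M, or B is assumed to contribute 0.

```r
# Multipliers for the exponent codes handled in the analysis (case-insensitive)
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to dollars
to_dollars <- function(value, exp_code) {
  mult <- exp_multiplier[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 0          # blank or unrecognized codes contribute nothing
  unname(value * mult)
}

# Toy values, invented for illustration
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2, 10, 3),
                  PROPDMGEXP = c("K", "m", "B", "", "?"))
toy$PROP_VALUE <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

The same helper could be applied to CROPDMG and CROPDMGEXP, so the mapping is defined once rather than repeated in two nested ifelse() chains.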
The figures below show the top 20 events resulting in population harm and economic damage. Instead of plotting the aggregated data from the analysis above, we use box plots to show both the central tendency and the variability in the data.
require(ggplot2)
## Loading required package: ggplot2
#subset the unaggregated data for the top 20 events identified with the highest mean
plot1 <- ggplot(NOAA_data1[NOAA_data1$EVTYPE %in% NOAA_plot1$EVTYPE,], aes(factor(EVTYPE), IMPACT_PUBLIC_HEALTH))
#outlier.colour is an argument of geom_boxplot(), so it is set there
plot1 + geom_boxplot(outlier.colour = "red") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(x = "Event Types",
         title = "Weather Events Responsible for Highest Fatalities & Injuries (1950-2011)",
         y = "Population Impact")
#subset the unaggregated data for the top 20 events identified with the highest mean
plot2 <- ggplot(NOAA_data2[NOAA_data2$EVTYPE %in% NOAA_plot2$EVTYPE,], aes(factor(EVTYPE), IMPACT_PROP_DAMAGE))
#outlier.colour is an argument of geom_boxplot(), so it is set there
plot2 + geom_boxplot(outlier.colour = "red") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(x = "Event Types",
         title = "Weather Events Responsible for Economic Damage (1950-2011)",
         y = "Economic Impact in Billions of US Dollars")
Q1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Based on the above analysis, WILD FIRES are the most harmful with respect to population health, with a mean of 153 fatalities or injuries per event. The table below shows the top 20 severe weather events causing the most harm, in decreasing order.
NOAA_plot1$IMPACT_PUBLIC_HEALTH <- round(NOAA_plot1$IMPACT_PUBLIC_HEALTH, 2)
colnames(NOAA_plot1) <- c("Event Type", "Fatalities & Injuries")
NOAA_plot1
##                     Event Type Fatalities & Injuries
## 208                 WILD FIRES                153.00
## 196                    TSUNAMI                 81.00
## 70                   Heat Wave                 70.00
## 109          HURRICANE/TYPHOON                 51.50
## 190      TROPICAL STORM GORDON                 51.00
## 218         WINTER WEATHER MIX                 34.00
## 200  UNSEASONABLY WARM AND DRY                 29.00
## 181              THUNDERSTORMW                 27.00
## 216              WINTER STORMS                 27.00
## 145                RECORD HEAT                 26.00
## 220                 WINTRY MIX                 26.00
## 3                    BLACK ICE                 25.00
## 187 TORNADOES, TSTM WIND, HAIL                 25.00
## 33          EXCESSIVE RAINFALL                 23.00
## 95          HIGH WIND AND SEAS                 23.00
## 206         WATERSPOUT/TORNADO                 22.50
## 117                  ICE STORM                 21.73
## 72           HEAT WAVE DROUGHT                 19.00
## 164            SNOW/HIGH WINDS                 18.00
## 39                EXTREME HEAT                 17.93
Q2. Across the United States, which types of events have the greatest economic consequences?
Based on the above analysis, STORM SURGE has had the greatest economic consequences, with a mean of 21 billion dollars of damage per event. The table below shows the top 20 severe weather events causing the highest economic damage, in decreasing order.
NOAA_plot2$IMPACT_PROP_DAMAGE <- round(NOAA_plot2$IMPACT_PROP_DAMAGE,2)
colnames(NOAA_plot2) <- c("Event Type", "Economic Impact (Billions US$)")
NOAA_plot2
##                    Event Type Economic Impact (Billions US$)
## 14                STORM SURGE                          21.00
## 4                       FLOOD                          10.83
## 12                RIVER FLOOD                          10.00
## 11                  ICE STORM                           5.00
## 10          HURRICANE/TYPHOON                           4.73
## 15           STORM SURGE/TIDE                           4.00
## 19             TROPICAL STORM                           3.00
## 23               WINTER STORM                           3.00
## 6   HEAVY RAIN/SEVERE WEATHER                           2.00
## 18 TORNADOES, TSTM WIND, HAIL                           2.00
## 17                    TORNADO                           1.75
## 8                   HURRICANE                           1.50
## 9              HURRICANE OPAL                           1.50
## 21           WILD/FOREST FIRE                           1.50
## 5                        HAIL                           1.25
## 1                     DROUGHT                           1.00
## 2                EXTREME COLD                           1.00
## 3                 FLASH FLOOD                           1.00
## 7                   HIGH WIND                           1.00
## 13        SEVERE THUNDERSTORM                           1.00