This report performs an exploratory data analysis of the NOAA (National Oceanic and Atmospheric Administration) Storm Database to evaluate the population health and economic consequences of specific weather events. The database records major storms and weather events in the United States, including their dates and locations along with estimates of fatalities, injuries, and property damage. The data analyzed in this report cover the years 1950 to 2011. The analysis found that tornadoes are the most harmful to population health (as measured by estimates of fatalities and injuries), whereas tornadoes, thunderstorms, and floods appear to be the leading causes of economic damage in the US.
Across the United States,
which types of weather events are most harmful with respect to population health?
which types of weather events have the greatest economic consequences?
The dataset analyzed in this report was downloaded from the link below.
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "./StormData.csv.bz2")
Data.Storm <- read.csv("./StormData.csv.bz2")
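Re-running the report repeats the download every time. A minimal sketch of one way to avoid that, assuming a hypothetical helper named `download_if_missing` (it is not part of the original analysis) that fetches the file only when no local copy exists:

```r
# Hypothetical helper (an assumption, not from the original report):
# download a file only when no local copy exists yet.
download_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

storm_url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Usage (commented out so the sketch itself needs no network access):
# download_if_missing(storm_url, "./StormData.csv.bz2")
# Data.Storm <- read.csv("./StormData.csv.bz2")
```

The helper returns the destination path invisibly, so it can be passed straight to `read.csv` if desired.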
## evaluate dimensions of the dataset
dim(Data.Storm)
## [1] 902297 37
Examine the names of the 37 variables
names(Data.Storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Select the variables of interest: EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG
- EVTYPE = weather event
- FATALITIES = fatalities as a result of a weather event
- INJURIES = injuries as a result of a weather event
- PROPDMG = property damage as a result of a weather event
- CROPDMG = crop damage as a result of a weather event
- ECONOMICDMG = property damage plus crop damage; this variable was created in the reduced dataset
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## select variables of interest, and Combine Crop damage and Property Damage into a new variable: Economic Damage
Data.Storm.reduced <- select(Data.Storm, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
Data.Storm.reduced <- mutate(Data.Storm.reduced, ECONOMICDMG = CROPDMG + PROPDMG)
names(Data.Storm.reduced)
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "CROPDMG"
## [6] "ECONOMICDMG"
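PROPDMG and CROPDMG hold only the leading figure of each estimate. The dataset stores the magnitude separately in PROPDMGEXP and CROPDMGEXP (for example K for thousands, M for millions, B for billions), and the sum above does not apply those codes. The following is a sketch of how they could be applied, not what the original analysis does. It assumes that only K, M, and B (in either case) are meaningful and treats every other code as a multiplier of 1. The helper name `exp_multiplier` and the column `ECONOMICDMG.USD` are hypothetical, and the toy rows are made up for illustration.

```r
# Hypothetical helper: translate a magnitude code into a numeric multiplier.
# Assumption: only K/M/B are interpreted; anything else (blank, digits,
# symbols, NA) falls back to 1.
exp_multiplier <- function(code) {
  codes <- toupper(as.character(code))
  mult <- c(1e3, 1e6, 1e9)[match(codes, c("K", "M", "B"))]
  mult[is.na(mult)] <- 1
  mult
}

# Made-up rows, not taken from the Storm Database
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "k", "B")
)

# Scale each damage figure by its own multiplier before adding them
toy$ECONOMICDMG.USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
```

Scaling before summing matters because a single event recorded in billions can outweigh many thousands of events recorded in thousands.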
Examine the weather event names recorded under the EVTYPE variable in the reduced dataset
summary(Data.Storm.reduced$EVTYPE)
## HAIL TSTM WIND THUNDERSTORM WIND
## 288661 219940 82563
## TORNADO FLASH FLOOD FLOOD
## 60652 54277 25326
## THUNDERSTORM WINDS HIGH WIND LIGHTNING
## 20843 20212 15754
## HEAVY SNOW HEAVY RAIN WINTER STORM
## 15708 11723 11433
## WINTER WEATHER FUNNEL CLOUD MARINE TSTM WIND
## 7026 6839 6175
## MARINE THUNDERSTORM WIND WATERSPOUT STRONG WIND
## 5812 3796 3566
## URBAN/SML STREAM FLD WILDFIRE BLIZZARD
## 3392 2761 2719
## DROUGHT ICE STORM EXCESSIVE HEAT
## 2488 2006 1678
## HIGH WINDS WILD/FOREST FIRE FROST/FREEZE
## 1533 1457 1342
## DENSE FOG WINTER WEATHER/MIX TSTM WIND/HAIL
## 1293 1104 1028
## EXTREME COLD/WIND CHILL HEAT HIGH SURF
## 1002 767 725
## TROPICAL STORM FLASH FLOODING EXTREME COLD
## 690 682 655
## COASTAL FLOOD LAKE-EFFECT SNOW FLOOD/FLASH FLOOD
## 650 636 624
## LANDSLIDE SNOW COLD/WIND CHILL
## 600 587 539
## FOG RIP CURRENT MARINE HAIL
## 538 470 442
## DUST STORM AVALANCHE WIND
## 427 386 340
## RIP CURRENTS STORM SURGE FREEZING RAIN
## 304 261 250
## URBAN FLOOD HEAVY SURF/HIGH SURF EXTREME WINDCHILL
## 249 228 204
## STRONG WINDS DRY MICROBURST ASTRONOMICAL LOW TIDE
## 196 186 174
## HURRICANE RIVER FLOOD LIGHT SNOW
## 174 173 154
## STORM SURGE/TIDE RECORD WARMTH COASTAL FLOODING
## 148 146 143
## DUST DEVIL MARINE HIGH WIND UNSEASONABLY WARM
## 141 135 126
## FLOODING ASTRONOMICAL HIGH TIDE MODERATE SNOWFALL
## 120 103 101
## URBAN FLOODING WINTRY MIX HURRICANE/TYPHOON
## 98 90 88
## FUNNEL CLOUDS HEAVY SURF RECORD HEAT
## 87 84 81
## FREEZE HEAT WAVE COLD
## 74 74 72
## RECORD COLD ICE THUNDERSTORM WINDS HAIL
## 64 61 61
## TROPICAL DEPRESSION SLEET UNSEASONABLY DRY
## 60 59 56
## FROST GUSTY WINDS THUNDERSTORM WINDSS
## 53 53 51
## MARINE STRONG WIND OTHER SMALL HAIL
## 48 48 47
## FUNNEL FREEZING FOG THUNDERSTORM
## 46 45 45
## Temperature record TSTM WIND (G45) Coastal Flooding
## 43 39 38
## WATERSPOUTS MONTHLY PRECIPITATION WINDS
## 37 36 36
## (Other)
## 2940
Standardize the weather event names in the EVTYPE variable because they contain redundant labels and typos. Tropical Storms and Tropical Depressions were grouped with floods because their damage is mainly due to flooding.
Data.Storm.reduced$EVTYPE <- gsub("^HEAT$", "EXCESSIVE HEAT", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^TSTM WIND$", "THUNDERSTORM WIND", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^THUNDERSTORM WIND$", "THUNDERSTORM WINDS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Flood$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Floods$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Flooding$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Flash Flood$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Coastal Flood$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FLD$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^NADO$", "TORNADO", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FUNNEL$", "TORNADO", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^WATERSPOUT$", "TORNADO", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FROST$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FREEZ$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^BLIZZARD$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^WINTER$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^COLD$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^LOW$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
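The `^` and `$` anchors in the patterns above mean that each substitution fires only when the whole label equals the pattern, so a label such as `FROST/FREEZE` is left unchanged by the `^FROST$` rule. The sketch below uses a few made-up labels to contrast the anchored form with an unanchored `grepl` test. Whether the broader match is appropriate for this analysis is a judgment call. It is shown only for comparison and is not what the code above does.

```r
# Made-up labels for illustration
labels <- c("FROST", "FROST/FREEZE", "frost", "WINTER STORM")

# Anchored: replaces only labels that are exactly "FROST" (any case)
anchored <- gsub("^FROST$", "COLD WEATHER", labels, ignore.case = TRUE)

# Unanchored alternative: relabel every entry that contains "FROST"
unanchored <- labels
unanchored[grepl("FROST", labels, ignore.case = TRUE)] <- "COLD WEATHER"

print(data.frame(labels, anchored, unanchored))
```

Comparing `table()` of the column before and after each rule is a quick way to confirm that a pattern matched the labels it was meant to.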
Aggregate injuries, fatalities, and economic damage by weather event, and name the resulting column of totals in each aggregated dataset accordingly.
Data.Injuries.Agg <- aggregate(Data.Storm.reduced$INJURIES, by=list(Data.Storm.reduced$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(Data.Injuries.Agg) = c("weather.event", "total.injuries")
Data.Fatalities.Agg <- aggregate(Data.Storm.reduced$FATALITIES, by=list(Data.Storm.reduced$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(Data.Fatalities.Agg) = c("weather.event", "total.fatalities")
Data.EconomicDamage.Agg <- aggregate(Data.Storm.reduced$ECONOMICDMG, by=list(Data.Storm.reduced$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(Data.EconomicDamage.Agg) = c("weather.event", "economic.damage")
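The three aggregations above differ only in the column being summed. As an alternative sketch, the formula interface of `aggregate` can total several columns per event type in one call. The toy data frame below is made up for illustration. Note one behavioural difference: the formula interface drops any row containing an `NA` in the listed columns, rather than skipping `NA`s column by column as `na.rm=TRUE` does.

```r
# Made-up rows, not taken from the Storm Database
toy <- data.frame(
  EVTYPE      = c("TORNADO", "FLOODS", "TORNADO", "HAIL"),
  INJURIES    = c(10, 2, 5, 0),
  FATALITIES  = c(1, 0, 2, 0),
  ECONOMICDMG = c(100, 50, 25, 5)
)

# One call: sum each listed column within every EVTYPE group
totals <- aggregate(cbind(INJURIES, FATALITIES, ECONOMICDMG) ~ EVTYPE,
                    data = toy, FUN = sum)
print(totals)
```

The result has one row per event type and one column per total, which can then be sorted by whichever measure is of interest.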
Arrange the aggregated datasets in descending order
Injuries.Ordered <- arrange(Data.Injuries.Agg, desc(total.injuries))
Fatalities.Ordered <- arrange(Data.Fatalities.Agg, desc(total.fatalities))
Economic.Damage.Ordered <- arrange(Data.EconomicDamage.Agg, desc(economic.damage))
Select the top 10 counts from the ordered datasets
Top.Injuries <- Injuries.Ordered[1:10,]
Top.Injuries$weather.event <- factor(Top.Injuries$weather.event, levels = Top.Injuries$weather.event, ordered = TRUE)
Top.Fatalities <- Fatalities.Ordered[1:10,]
Top.Fatalities$weather.event <- factor(Top.Fatalities$weather.event, levels = Top.Fatalities$weather.event, ordered = TRUE)
Top.Economic.Damage <- Economic.Damage.Ordered[1:10,]
Top.Economic.Damage$weather.event <- factor(Top.Economic.Damage$weather.event, levels = Top.Economic.Damage$weather.event, ordered = TRUE)
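The order-then-subset-then-refactor pattern above repeats for each measure. A minimal sketch of the same idea as a reusable function, assuming a hypothetical helper named `top_n_events` and a made-up aggregated table. It uses `head()` instead of `[1:10,]` so that a table with fewer than `n` rows does not gain `NA` rows, and it sets the factor levels in ranked order so that a bar chart keeps the ranking on its x axis.

```r
# Hypothetical helper: keep the n largest rows of an aggregated table and
# fix the factor levels of weather.event in that ranked order.
top_n_events <- function(agg, value_col, n = 10) {
  ranked <- agg[order(agg[[value_col]], decreasing = TRUE), ]
  top <- head(ranked, n)
  top$weather.event <- factor(top$weather.event, levels = top$weather.event)
  top
}

# Made-up aggregated totals for illustration
agg <- data.frame(
  weather.event  = c("HAIL", "TORNADO", "FLOODS"),
  total.injuries = c(3, 20, 7),
  stringsAsFactors = FALSE
)

top2 <- top_n_events(agg, "total.injuries", n = 2)
print(top2)
```

Without the explicit levels, `ggplot2` would order the bars alphabetically by event name rather than by total.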
Plot the top 10 weather events by total injuries
library(ggplot2)
plot.injuries <- ggplot(Top.Injuries, aes(x=weather.event, y=total.injuries)) + geom_bar(stat = "identity") + xlab("Weather Event") + ylab("Total Injuries") + ggtitle("Injuries by Weather Event, 1950 to 2011") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot.injuries
Plot the top 10 weather events by total fatalities
plot.fatalities <- ggplot(Top.Fatalities, aes(x=weather.event, y=total.fatalities)) + geom_bar(stat = "identity") + xlab("Weather Event") + ylab("Total Fatalities") + ggtitle("Fatalities by Weather Event, 1950 to 2011") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot.fatalities
Plot the top 10 weather events by total economic damage (property damage plus crop damage)
plot.economic.damage <- ggplot(Top.Economic.Damage, aes(x=weather.event, y=economic.damage)) + geom_bar(stat = "identity") + xlab("Weather Event") + ylab("Total Economic Damage (dollars)") + ggtitle("Economic Damage by Weather Event, 1950 to 2011") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot.economic.damage
Findings:
1) Tornadoes were the weather events that affected human health the most in the United States between 1950 and 2011.
2) Tornadoes, thunderstorms, and floods were the weather events with the greatest economic consequences in the United States between 1950 and 2011.