Synopsis

This report performs an exploratory data analysis of the NOAA (National Oceanic and Atmospheric Administration) Storm Database to evaluate the population health and economic consequences of specific weather events. The database covers major storms and weather events in the United States from 1950 to 2011, and records their dates and locations along with estimates of fatalities, injuries, and property damage. The analysis found that tornadoes are the most harmful to population health (as measured by estimated fatalities and injuries), whereas tornadoes, thunderstorms, and floods appear to be the leading causes of economic damage in the US.

Questions answered in this report

  1. Across the United States,

    1. which types of weather events are most harmful with respect to population health?

    2. which types of weather events have the greatest economic consequences?

Loading and Processing of the Data

The dataset analyzed in this report was loaded from the web link listed below.

download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "./StormData.csv.bz2")

Data.Storm <- read.csv("./StormData.csv.bz2")
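The download above runs every time the report is knitted. As a minimal sketch, assuming a hypothetical helper named fetch_once (not part of the original analysis), the download can be skipped when a local copy already exists:

```r
# Hypothetical helper: download a file only when no local copy exists.
# mode = "wb" requests a binary transfer, which matters for .bz2 archives.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}
```

It would be called with the same URL and destination path as the download.file() call above; read.csv() can then read the .bz2 file directly, as it does in this report.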

Processing of the Data

## evaluate dimensions of the dataset
dim(Data.Storm)
## [1] 902297     37

Examine the names of the 37 variables

names(Data.Storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Select the variables of interest: EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG.

  - EVTYPE = weather event
  - FATALITIES = fatalities as a result of a weather event
  - INJURIES = injuries as a result of a weather event
  - PROPDMG = property damage as a result of a weather event
  - CROPDMG = crop damage as a result of a weather event
  - ECONOMICDMG = property damage plus crop damage; this variable is created in the reduced dataset

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## select variables of interest, and Combine Crop damage and Property Damage into a new variable: Economic Damage
Data.Storm.reduced <- select(Data.Storm, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
Data.Storm.reduced <- mutate(Data.Storm.reduced, ECONOMICDMG = CROPDMG + PROPDMG)
names(Data.Storm.reduced)
## [1] "EVTYPE"      "FATALITIES"  "INJURIES"    "PROPDMG"     "CROPDMG"    
## [6] "ECONOMICDMG"
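ECONOMICDMG as computed above adds the raw PROPDMG and CROPDMG values. The full dataset also carries PROPDMGEXP and CROPDMGEXP columns (see the names() output earlier), which were not selected here. The following is a rough sketch of how those codes could be used to scale the damage figures to dollars. It assumes that H, K, M, and B stand for hundred, thousand, million, and billion, and it treats every other code as a multiplier of 1. The function name damage_multiplier and the toy data are illustrative only.

```r
# Sketch: turn an exponent code into a numeric multiplier.
# Assumption: H/K/M/B = hundred/thousand/million/billion (case-insensitive);
# any other code (blank, digits, symbols) falls back to 1.
damage_multiplier <- function(exp_code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows standing in for the storm data (values are made up).
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG = c(0, 10, 5),
  CROPDMGEXP = c("", "k", "B")
)

# Scale each mantissa by its own multiplier before adding.
toy$ECONOMICDMG.USD <- toy$PROPDMG * damage_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * damage_multiplier(toy$CROPDMGEXP)
```

Applying this to the real data would require keeping PROPDMGEXP and CROPDMGEXP in the select() call, and the resulting totals could differ from the ones reported here.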

Examine the weather event names recorded in the EVTYPE variable of the reduced dataset

summary(Data.Storm.reduced$EVTYPE)
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE                     SNOW          COLD/WIND CHILL 
##                      600                      587                      539 
##                      FOG              RIP CURRENT              MARINE HAIL 
##                      538                      470                      442 
##               DUST STORM                AVALANCHE                     WIND 
##                      427                      386                      340 
##             RIP CURRENTS              STORM SURGE            FREEZING RAIN 
##                      304                      261                      250 
##              URBAN FLOOD     HEAVY SURF/HIGH SURF        EXTREME WINDCHILL 
##                      249                      228                      204 
##             STRONG WINDS           DRY MICROBURST    ASTRONOMICAL LOW TIDE 
##                      196                      186                      174 
##                HURRICANE              RIVER FLOOD               LIGHT SNOW 
##                      174                      173                      154 
##         STORM SURGE/TIDE            RECORD WARMTH         COASTAL FLOODING 
##                      148                      146                      143 
##               DUST DEVIL         MARINE HIGH WIND        UNSEASONABLY WARM 
##                      141                      135                      126 
##                 FLOODING   ASTRONOMICAL HIGH TIDE        MODERATE SNOWFALL 
##                      120                      103                      101 
##           URBAN FLOODING               WINTRY MIX        HURRICANE/TYPHOON 
##                       98                       90                       88 
##            FUNNEL CLOUDS               HEAVY SURF              RECORD HEAT 
##                       87                       84                       81 
##                   FREEZE                HEAT WAVE                     COLD 
##                       74                       74                       72 
##              RECORD COLD                      ICE  THUNDERSTORM WINDS HAIL 
##                       64                       61                       61 
##      TROPICAL DEPRESSION                    SLEET         UNSEASONABLY DRY 
##                       60                       59                       56 
##                    FROST              GUSTY WINDS      THUNDERSTORM WINDSS 
##                       53                       53                       51 
##       MARINE STRONG WIND                    OTHER               SMALL HAIL 
##                       48                       48                       47 
##                   FUNNEL             FREEZING FOG             THUNDERSTORM 
##                       46                       45                       45 
##       Temperature record          TSTM WIND (G45)         Coastal Flooding 
##                       43                       39                       38 
##              WATERSPOUTS    MONTHLY PRECIPITATION                    WINDS 
##                       37                       36                       36 
##                  (Other) 
##                     2940

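Standardize the weather event names in the EVTYPE variable, because the raw values contain redundant labels and typos (for example, TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS all describe the same kind of event). Each pattern below is anchored with ^ and $, so only values that match the whole pattern are replaced. Tropical storms and tropical depressions were grouped with floods because their damage is mainly due to flooding.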

Data.Storm.reduced$EVTYPE <- gsub("^HEAT$", "EXCESSIVE HEAT", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^TSTM WIND$", "THUNDERSTORM WIND", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^THUNDERSTORM WIND$", "THUNDERSTORM WINDS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Flood$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Floods$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Flooding$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Flash Flood$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^Coastal Flood$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FLD$", "FLOODS", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^NADO$", "TORNADO", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FUNNEL$", "TORNADO", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^WATERSPOUT$", "TORNADO", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FROST$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^FREEZ$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^BLIZZARD$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^WINTER$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^COLD$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
Data.Storm.reduced$EVTYPE <- gsub("^LOW$", "COLD WEATHER", Data.Storm.reduced$EVTYPE, ignore.case = TRUE)
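To illustrate how the anchored patterns behave, here is a small sketch on a made-up vector of event names. It contrasts an anchored gsub() call, as used above, with an unanchored grepl() search. The vector events and the pattern "FLOOD|FLD" are illustrative assumptions, not part of the original analysis.

```r
# Made-up event names for illustration.
events <- c("FLOOD", "flood", "FLASH FLOODING", "URBAN/SML STREAM FLD")

# Anchored pattern: only values that are exactly "flood" (any case) change.
anchored <- gsub("^FLOOD$", "FLOODS", events, ignore.case = TRUE)

# Unanchored search: flags every value that mentions FLOOD or FLD anywhere.
has_flood <- grepl("FLOOD|FLD", events, ignore.case = TRUE)
```

Here anchored leaves "FLASH FLOODING" and "URBAN/SML STREAM FLD" untouched, while has_flood is TRUE for all four values. Which behavior is preferable depends on how broadly the event categories should be merged.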

Aggregate the reduced dataset by weather event, summing injuries, fatalities, and economic damage separately, and name the resulting columns total.injuries, total.fatalities, and economic.damage, respectively.

Data.Injuries.Agg <- aggregate(Data.Storm.reduced$INJURIES, by=list(Data.Storm.reduced$EVTYPE), FUN=sum, na.rm=TRUE)

colnames(Data.Injuries.Agg) = c("weather.event", "total.injuries")

Data.Fatalities.Agg <- aggregate(Data.Storm.reduced$FATALITIES, by=list(Data.Storm.reduced$EVTYPE), FUN=sum, na.rm=TRUE)

colnames(Data.Fatalities.Agg) = c("weather.event", "total.fatalities")

Data.EconomicDamage.Agg <- aggregate(Data.Storm.reduced$ECONOMICDMG, by=list(Data.Storm.reduced$EVTYPE), FUN=sum, na.rm=TRUE)

colnames(Data.EconomicDamage.Agg) = c("weather.event", "economic.damage")
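As an alternative sketch, the formula interface of aggregate() can sum several columns in one call. The toy data frame below is made up for illustration. Note that na.action = na.pass is needed so that rows containing NA reach sum(), where na.rm = TRUE drops the missing values.

```r
# Toy data standing in for Data.Storm.reduced (values are made up).
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOODS", "TORNADO", "FLOODS"),
  INJURIES = c(10, 1, 5, NA),
  FATALITIES = c(2, 0, 1, 1)
)

# Sum both outcome columns per event type in a single call.
totals <- aggregate(
  cbind(INJURIES, FATALITIES) ~ EVTYPE,
  data = toy,
  FUN = sum,
  na.rm = TRUE,
  na.action = na.pass
)
```

The result has one row per event type, with the column names taken from the formula, so no separate colnames() step is needed.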

Arrange the aggregated datasets in descending order of their totals

Injuries.Ordered <- arrange(Data.Injuries.Agg, desc(total.injuries))

Fatalities.Ordered <- arrange(Data.Fatalities.Agg, desc(total.fatalities))

Economic.Damage.Ordered <- arrange(Data.EconomicDamage.Agg, desc(economic.damage))

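Select the top 10 rows from each ordered dataset, and convert weather.event to a factor whose levels follow that order, so that the bars in the plots are drawn from largest to smallest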

Top.Injuries <- Injuries.Ordered[1:10,]
Top.Injuries$weather.event <- factor(Top.Injuries$weather.event, levels = Top.Injuries$weather.event, ordered = TRUE)

Top.Fatalities <- Fatalities.Ordered[1:10,]
Top.Fatalities$weather.event <- factor(Top.Fatalities$weather.event, levels = Top.Fatalities$weather.event, ordered = TRUE)

Top.Economic.Damage <- Economic.Damage.Ordered[1:10,]
Top.Economic.Damage$weather.event <- factor(Top.Economic.Damage$weather.event, levels = Top.Economic.Damage$weather.event, ordered = TRUE)
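The ordering, top-10 selection, and factor conversion are repeated three times above. A possible sketch of a single helper that combines them is shown below, using base R only. The name top_events and the toy data are hypothetical.

```r
# Hypothetical helper: rank by a value column, keep the top n rows, and
# fix the factor levels of weather.event in that ranked order.
top_events <- function(agg, value_col, n = 10) {
  ranked <- agg[order(agg[[value_col]], decreasing = TRUE), ]
  top <- head(ranked, n)
  events <- as.character(top$weather.event)
  top$weather.event <- factor(events, levels = events)
  rownames(top) <- NULL
  top
}

# Toy aggregated data (values are made up).
toy <- data.frame(
  weather.event = c("HAIL", "TORNADO", "FLOODS"),
  total.injuries = c(3, 50, 20)
)
top2 <- top_events(toy, "total.injuries", n = 2)
```

Because the factor levels follow the ranked order rather than alphabetical order, ggplot2 would draw TORNADO before FLOODS on the x-axis for this toy example.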

Results

Top 10 Weather Events that Cause the Most Injuries in the United States

Plot the 10 weather events with the highest injury totals

library(ggplot2)
plot.injuries <- ggplot(Top.Injuries, aes(x = weather.event, y = total.injuries)) +
  geom_bar(stat = "identity") +
  xlab("Weather Event") +
  ylab("Total Injuries") +
  ggtitle("Injuries by Weather Event, 1950 to 2011") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot.injuries

Top 10 Weather Events that Cause the Most Fatalities in the United States

Plot the 10 weather events with the highest fatality totals

plot.fatalities <- ggplot(Top.Fatalities, aes(x = weather.event, y = total.fatalities)) +
  geom_bar(stat = "identity") +
  xlab("Weather Event") +
  ylab("Total Fatalities") +
  ggtitle("Fatalities by Weather Event, 1950 to 2011") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot.fatalities

Top 10 Weather Events that Cause the Most Economic Damage in the United States

Plot the 10 weather events with the highest economic damage totals (property damage plus crop damage)

plot.economic.damage <- ggplot(Top.Economic.Damage, aes(x = weather.event, y = economic.damage)) +
  geom_bar(stat = "identity") +
  xlab("Weather Event") +
  ylab("Total Economic Damage (dollars)") +
  ggtitle("Economic Damage by Weather Event, 1950 to 2011") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot.economic.damage

Findings:

  1. Tornadoes were the weather events that affected population health the most in the United States between 1950 and 2011.

  2. Tornadoes, thunderstorms, and floods were the weather events with the greatest economic consequences in the United States between 1950 and 2011.