Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The storm database covers weather events from 1950 through 2011. For each event it contains estimates of the number of fatalities and injuries, as well as the cost of damage to property and crops.

The estimates for fatalities and injuries were used to determine weather events with the most harmful impact to population health. Property damage and crop damage cost estimates were used to determine weather events with the greatest economic consequences.

Data

The data for this assignment come in the form of a comma-separated-value file compressed with the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site:

Storm Data [https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2]

Some documentation of the database is also available. It describes how some of the variables are constructed and defined.

National Weather Service Storm Data Documentation [https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf]

National Climatic Data Center Storm Events FAQ [https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf]

General Settings

knitr::opts_chunk$set(fig.path='Plots/')

Load required libraries

library(data.table)
library(ggplot2)
library(R.utils)
library(knitr)

Data Processing

We first need to get the data: download the file, decompress it, and read it into R.

Download and unzip data

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "stormData.csv.bz2")
bunzip2("stormData.csv.bz2", overwrite = TRUE, remove = FALSE)

Read the data into R

stormData <- read.csv("stormData.csv")
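As an aside, read.csv() can usually read a bzip2-compressed file directly, because R's file connections detect the compression when reading. That makes the explicit bunzip2() step optional. A minimal sketch with a tiny made-up file (the file and its two rows are hypothetical, not taken from the storm database):

```r
# Write a tiny, made-up CSV through a bzip2 connection.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)
```

For the real file this would amount to calling read.csv() on "stormData.csv.bz2", at the cost of decompressing on every read.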

Then we take a quick look at the data.

dim(stormData)
## [1] 902297     37
names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
head(stormData, 3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
tail(stormData, 3)
##        STATE__           BGN_DATE    BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 902295       2  11/8/2011 0:00:00 02:58:00 PM       AKS    213     AKZ213    AK
## 902296       2  11/9/2011 0:00:00 10:21:00 AM       AKS    202     AKZ202    AK
## 902297       1 11/28/2011 0:00:00 08:00:00 PM       CST      6     ALZ006    AL
##            EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI           END_DATE    END_TIME
## 902295  HIGH WIND         0                     11/9/2011 0:00:00 01:15:00 PM
## 902296   BLIZZARD         0                     11/9/2011 0:00:00 05:00:00 PM
## 902297 HEAVY SNOW         0                    11/29/2011 0:00:00 04:00:00 AM
##        COUNTY_END COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH  F MAG
## 902295          0         NA         0                         0     0 NA  81
## 902296          0         NA         0                         0     0 NA   0
## 902297          0         NA         0                         0     0 NA   0
##        FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO
## 902295          0        0       0          K       0          K AFG
## 902296          0        0       0          K       0          K AFG
## 902297          0        0       0          K       0          K HUN
##              STATEOFFIC
## 902295 ALASKA, Northern
## 902296 ALASKA, Northern
## 902297   ALABAMA, North
##                                                            ZONENAMES LATITUDE
## 902295 ST LAWRENCE IS. BERING STRAIT - ST LAWRENCE IS. BERING STRAIT        0
## 902296                 NORTHERN ARCTIC COAST - NORTHERN ARCTIC COAST        0
## 902297                                             MADISON - MADISON        0
##        LONGITUDE LATITUDE_E LONGITUDE_
## 902295         0          0          0
## 902296         0          0          0
## 902297         0          0          0
##        REMARKS
## 902295 EPISODE NARRATIVE: A 960 mb low over the southern Aleutians at 0300AKST on the 8th intensified to 945 mb near the Gulf of Anadyr by 2100AKST on the 8th. The low crossed the Chukotsk Peninsula as a 956 mb low at 0900AKST on the 9th, and moved into the southern Chukchi Sea as a 958 mb low by 2100AKST on the 9th. The low then tracked to the northwest and weakened to 975 mb about 150 miles north of Wrangel Island by 1500AKST on the 10th. The storm was one of the strongest storms to impact the west coast of Alaska since November 1974. \n\nZone 201: Blizzard conditions were observed at Wainwright from approximately 1153AKST through 1611AKST on the 9th. The visibility was frequently reduced to one quarter mile in snow and blowing snow. There was a peak wind gust to 43kt (50 mph) at the Wainwright ASOS. During this event, there was also a peak wind gust to \n68 kt (78 mph) at the Cape Lisburne AWOS. \n\nZone 202: Blizzard conditions were observed at Barrow from approximately 1021AKST through 1700AKST on the 9th. The visibility was frequently reduced to one quarter mile or less in blowing snow. There was a peak wind gust to 46 kt (53 mph) at the Barrow ASOS. \n\nZone 207: Blizzard conditions were observed at Kivalina from approximately 0400AKST through 1230AKST on the 9th. The visibility was frequently reduced to one quarter of a mile in snow and blowing snow. There was a peak wind gust to 61 kt (70 mph) at the Kivalina ASOS.  The doors to the village transportation shed were blown out to sea.  Many homes lost portions of their tin roofing, and satellite dishes were ripped off of roofs. One home had its door blown off.  At Point Hope, severe blizzard conditions were observed. There was a peak wind gust of 68 kt (78 mph) at the Point Hope AWOS before power was lost to the AWOS. It was estimated that the wind gusted as high as 85 mph in the village during the height of the storm during the morning and early afternoon hours on the 9th. 
Five power poles were knocked down in the storm EVENT NARRATIVE: 
## 902296 EPISODE NARRATIVE: A 960 mb low over the southern Aleutians at 0300AKST on the 8th intensified to 945 mb near the Gulf of Anadyr by 2100AKST on the 8th. The low crossed the Chukotsk Peninsula as a 956 mb low at 0900AKST on the 9th, and moved into the southern Chukchi Sea as a 958 mb low by 2100AKST on the 9th. The low then tracked to the northwest and weakened to 975 mb about 150 miles north of Wrangel Island by 1500AKST on the 10th. The storm was one of the strongest storms to impact the west coast of Alaska since November 1974. \n\nZone 201: Blizzard conditions were observed at Wainwright from approximately 1153AKST through 1611AKST on the 9th. The visibility was frequently reduced to one quarter mile in snow and blowing snow. There was a peak wind gust to 43kt (50 mph) at the Wainwright ASOS. During this event, there was also a peak wind gust to \n68 kt (78 mph) at the Cape Lisburne AWOS. \n\nZone 202: Blizzard conditions were observed at Barrow from approximately 1021AKST through 1700AKST on the 9th. The visibility was frequently reduced to one quarter mile or less in blowing snow. There was a peak wind gust to 46 kt (53 mph) at the Barrow ASOS. \n\nZone 207: Blizzard conditions were observed at Kivalina from approximately 0400AKST through 1230AKST on the 9th. The visibility was frequently reduced to one quarter of a mile in snow and blowing snow. There was a peak wind gust to 61 kt (70 mph) at the Kivalina ASOS.  The doors to the village transportation shed were blown out to sea.  Many homes lost portions of their tin roofing, and satellite dishes were ripped off of roofs. One home had its door blown off.  At Point Hope, severe blizzard conditions were observed. There was a peak wind gust of 68 kt (78 mph) at the Point Hope AWOS before power was lost to the AWOS. It was estimated that the wind gusted as high as 85 mph in the village during the height of the storm during the morning and early afternoon hours on the 9th. 
Five power poles were knocked down in the storm EVENT NARRATIVE: 
## 902297                           EPISODE NARRATIVE: An intense upper level low developed on the 28th at the base of a highly amplified upper trough across the Great Lakes and Mississippi Valley.  The upper low closed off over the mid South and tracked northeast across the Tennessee Valley during the morning of the 29th.   A warm conveyor belt of heavy rainfall developed in advance of the low which dumped from around 2 to over 5 inches of rain across the eastern two thirds of north Alabama and middle Tennessee.  The highest rain amounts were recorded in Jackson and DeKalb Counties with 3 to 5 inches.  The rain fell over 24 to 36 hour period, with rainfall remaining light to moderate during most its duration.  The rainfall resulted in minor river flooding along the Little River, Big Wills Creek and Paint Rock.   A landslide occurred on Highway 35 just north of Section in Jackson County.  A driver was trapped in his vehicle, but was rescued unharmed.  Trees, boulders and debris blocked 100 to 250 yards of Highway 35.\n\nThe rain mixed with and changed to snow across north Alabama during the afternoon and  evening hours of the 28th, and lasted into the 29th.  The heaviest bursts of snow occurred in northwest Alabama during the afternoon and evening hours, and in north central and northeast Alabama during the overnight and morning hours.  Since ground temperatures were in the 50s, and air temperatures in valley areas only dropped into the mid 30s, most of the snowfall melted on impact with mostly trace amounts reported in valley locations.  However, above 1500 foot elevation, snow accumulations of 1 to 2 inches were reported.  The heaviest amount was 2.3 inches on Monte Sano Mountain, about 5 miles northeast of Huntsville.EVENT NARRATIVE: Snowfall accumulations of up to 2.3 inches were reported on the higher elevations of eastern Madison County.  
A snow accumulation of 1.5 inches was reported 2.7 miles south of Gurley, while 2.3 inches was reported 3 miles east of Huntsville atop Monte Sano Mountain.
##        REFNUM
## 902295 902295
## 902296 902296
## 902297 902297

We then create a subset of the data including only the following columns. This will make it easier to work with.

BGN_DATE - Date
EVTYPE - Event type
FATALITIES - No. of fatalities
INJURIES - No. of injuries
PROPDMG - Property damage
PROPDMGEXP - Unit of property damage amount
CROPDMG - Crop damage
CROPDMGEXP - Unit of crop damage amount

keepCols <- c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
StormData_Used <- stormData[keepCols]

Format the date and create a new column for year

StormData_Used$Year <- as.numeric(format(as.Date(StormData_Used$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
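To make the date step concrete, here is a small sketch that applies the same conversion to two date strings in the format seen in BGN_DATE. The vector names are illustrative only.

```r
# Two date strings in the BGN_DATE format (month/day/year plus a time part).
dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse to Date, then extract the four-digit year as a number.
parsed <- as.Date(dates, format = "%m/%d/%Y %H:%M:%S")
years <- as.numeric(format(parsed, "%Y"))
print(years)
```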

As the units for property and crop damage are not the same for each entry, we will have to normalise the data.

unique(StormData_Used$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
StormData_Used$PROPDMGEXP <- as.character(StormData_Used$PROPDMGEXP)
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'H'] <- "2"
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'K'] <- "3"
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'M'] <- "6"
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'B'] <- "9"
StormData_Used$PROPDMGEXP <- as.numeric(StormData_Used$PROPDMGEXP)
## Warning: NAs introduced by coercion
StormData_Used$PROPDMGEXP[is.na(StormData_Used$PROPDMGEXP)] <- 0
StormData_Used$TOTALPROPDMG <- StormData_Used$PROPDMG * 10^StormData_Used$PROPDMGEXP
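The same mapping can be sketched as a small helper, shown here on a toy vector. The function name exp_to_power is hypothetical. Treating digit codes as powers of ten and symbols such as "+", "-" and "?" as 0 mirrors the code above, which is one possible interpretation of these codes rather than an official definition.

```r
# Hypothetical helper: convert an exponent code to a power of ten.
exp_to_power <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  code <- toupper(as.character(code))
  # Digit codes convert directly; everything else becomes NA for now.
  power <- suppressWarnings(as.numeric(code))
  is_letter <- code %in% names(lookup)
  power[is_letter] <- unname(lookup[code[is_letter]])
  # Blank and symbol codes are treated as 10^0, as in the code above.
  power[is.na(power)] <- 0
  power
}

powers <- exp_to_power(c("K", "m", "", "+", "5", "B", "h"))
damage <- c(25, 2.5) * 10^exp_to_power(c("K", "M"))
print(powers)
print(damage)
```

Keeping the mapping in one function lets the property and crop columns share a single definition of the codes.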

Now for Crop Damage

unique(StormData_Used$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
StormData_Used$CROPDMGEXP <- as.character(StormData_Used$CROPDMGEXP)
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'H'] <- "2"
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'K'] <- "3"
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'M'] <- "6"
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'B'] <- "9"
StormData_Used$CROPDMGEXP <- as.numeric(StormData_Used$CROPDMGEXP)
## Warning: NAs introduced by coercion
StormData_Used$CROPDMGEXP[is.na(StormData_Used$CROPDMGEXP)] <- 0
StormData_Used$TOTALCROPDMG <- StormData_Used$CROPDMG * 10^StormData_Used$CROPDMGEXP

Now let’s look at the new dataset after these transformations.

head(StormData_Used, 3)
##            BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 4/18/1950 0:00:00 TORNADO          0       15    25.0          3       0
## 2 4/18/1950 0:00:00 TORNADO          0        0     2.5          3       0
## 3 2/20/1951 0:00:00 TORNADO          0        2    25.0          3       0
##   CROPDMGEXP Year TOTALPROPDMG TOTALCROPDMG
## 1          0 1950        25000            0
## 2          0 1950         2500            0
## 3          0 1951        25000            0

Results

To determine the impact on public health we would like to plot the data for both injuries and fatalities caused by severe weather events.

To do this we will take the ten most damaging event types for each measure, combine them into one dataset, and plot the result.

Fatalities

TotFatalities <- aggregate(StormData_Used$FATALITIES, by = list(StormData_Used$EVTYPE), "sum")
names(TotFatalities) <- c("Event", "Fatalities")
TotFatalities <- TotFatalities[order(-TotFatalities$Fatalities), ][1:10, ]
TotFatalities
##              Event Fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
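The aggregate-then-rank pattern used above can be tried on a toy data frame. The values below are made up, and the formula interface to aggregate() is used here as an alternative to the by = list(...) form.

```r
# Made-up events, several rows per event type.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, then keep the two largest totals.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
top2 <- head(totals[order(-totals$FATALITIES), ], 2)
print(top2)
```

Using head() instead of indexing with 1:10 avoids rows of NA when there are fewer groups than requested.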

Injuries

TotInjuries <- aggregate(StormData_Used$INJURIES, by = list(StormData_Used$EVTYPE), "sum")
names(TotInjuries) <- c("Event", "Injuries")
TotInjuries <- TotInjuries[order(-TotInjuries$Injuries), ][1:10, ]
TotInjuries
##                 Event Injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

Merge the datasets

TotHealthDamage <- merge(x = TotFatalities, y = TotInjuries, by = "Event", all = TRUE)
TotHealthDamage <- melt(as.data.table(TotHealthDamage), id.vars = "Event")

Plot the data

ggplot(TotHealthDamage, aes(Event, value)) +
        geom_bar(aes(fill = variable), position = "dodge", stat = "identity") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Event Type") +
        ylab("Number of people") + ggtitle("Fatalities and injuries by event type")
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_bar()`).
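The "Removed 6 rows" warning is consistent with the outer merge: with all = TRUE, an event that appears in only one of the two top-ten tables gets NA for the other measure, and three events are unique to each table. A small base-R sketch with made-up numbers shows the effect. The long format is built by hand here instead of with melt(), purely for illustration.

```r
# Two made-up "top" tables that only partly overlap.
fatalities <- data.frame(Event = c("TORNADO", "HEAT", "FLOOD"),
                         Fatalities = c(50, 20, 10),
                         stringsAsFactors = FALSE)
injuries <- data.frame(Event = c("TORNADO", "FLOOD", "HAIL"),
                       Injuries = c(900, 70, 15),
                       stringsAsFactors = FALSE)

# Outer merge: events missing from one table get NA in that column.
wide <- merge(fatalities, injuries, by = "Event", all = TRUE)

# Long format: one row per (event, measure) pair.
long <- data.frame(
  Event = rep(wide$Event, times = 2),
  variable = rep(c("Fatalities", "Injuries"), each = nrow(wide)),
  value = c(wide$Fatalities, wide$Injuries)
)
n_missing <- sum(is.na(long$value))
print(long)
print(n_missing)
```

Here HEAT has no injury total and HAIL has no fatality total, so two of the eight rows are NA and would be dropped by the plot.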

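In conclusion, tornadoes are by far the most harmful events to population health in this dataset, with 5,633 fatalities and 91,346 injuries, well ahead of excessive heat (1,903 fatalities) and thunderstorm wind (6,957 injuries).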

Economic Impact

To determine the impact on the economy we would like to plot the data for both property and crop damage caused by severe weather events.

To do this we will take the ten most damaging event types for each measure, combine them into one dataset, and plot the result.

What are the top ten causes of property damage?

TotPropDmg <- aggregate(StormData_Used$TOTALPROPDMG, by = list(StormData_Used$EVTYPE), "sum")
names(TotPropDmg) <- c("Event", "Prop_Cost")
TotPropDmg <- TotPropDmg[order(-TotPropDmg$Prop_Cost), ][1:10, ]
TotPropDmg
##                 Event    Prop_Cost
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380677
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046295

What are the top 10 causes of crop damage?

TotCropDmg <- aggregate(StormData_Used$TOTALCROPDMG, by = list(StormData_Used$EVTYPE), "sum")
names(TotCropDmg) <- c("Event", "Crop_Cost")
TotCropDmg <- TotCropDmg[order(-TotCropDmg$Crop_Cost), ][1:10, ]
TotCropDmg
##                 Event   Crop_Cost
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000

Let’s merge the datasets

TotEcoDamage <- merge(x = TotPropDmg, y = TotCropDmg, by = "Event", all = TRUE)
TotEcoDamage <- melt(as.data.table(TotEcoDamage), id.vars = "Event")

Now let’s look at the data.

ggplot(TotEcoDamage, aes(Event, value)) +
        geom_bar(aes(fill = variable), position = "dodge", stat = "identity") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Event Type") +
        ylab("Damage, USD") + ggtitle("Crop and property damage by event type")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_bar()`).
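To rank events by combined economic cost rather than comparing the two measures side by side, one option is to add the two damage columns before aggregating. The sketch below uses made-up values. The column name TOTALDMG is hypothetical and is not part of the analysis above.

```r
# Made-up per-event damage figures in USD.
toy <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  TOTALPROPDMG = c(100, 0, 50, 20),
  TOTALCROPDMG = c(5, 90, 10, 30),
  stringsAsFactors = FALSE
)

# Combined cost per row, then total per event type, largest first.
toy$TOTALDMG <- toy$TOTALPROPDMG + toy$TOTALCROPDMG
combined <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)
combined <- combined[order(-combined$TOTALDMG), ]
print(combined)
```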

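In conclusion, floods cause the most property damage (about $144.7 billion) and the greatest combined economic cost, while drought causes the most crop damage (about $14.0 billion), with floods second at about $5.7 billion.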