Synopsis

Every day, people check the weather report to decide what to wear for heat, cold, or rain, whether to move property indoors ahead of high winds, and whether to relocate above a flood or below ground for a tornado. Then they check again to plan for the next day, week, or season. At last month’s public hearing, local citizens asked the City Council to protect them from recent storms. In response, the Council asked our department to analyze historical weather events to help it prioritize its limited resources. This report analyzes weather events recorded across the US in the NOAA Storm Database from 1950 to November 2011. The database holds almost a million observations, with data on fatalities, injuries, property damage, and crop damage, plus each event’s date, location, and size. Our analysis shows which weather events cause the most deaths and injuries, which are the deadliest relative to injuries, and which cause the most property and crop damage. Tornadoes cause the most deaths, the most injuries, and the largest property damage; hail causes the most crop damage. This report follows Literate Statistical Programming practice, which weaves human-readable text together with machine-readable code in a single document.

Data Processing

The NOAA data set was loaded directly without any modifications.

library(readr)
library(data.table)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#read in data
setwd("C://Users//hlevy//Documents//R//StormData")
stormData <- read_csv("./repdata%2Fdata%2FStormData.csv.bz2")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   STATE__ = col_double(),
##   COUNTY = col_double(),
##   BGN_RANGE = col_double(),
##   COUNTY_END = col_double(),
##   END_RANGE = col_double(),
##   LENGTH = col_double(),
##   WIDTH = col_double(),
##   F = col_integer(),
##   MAG = col_double(),
##   FATALITIES = col_double(),
##   INJURIES = col_double(),
##   PROPDMG = col_double(),
##   CROPDMG = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   LATITUDE_E = col_double(),
##   LONGITUDE_ = col_double(),
##   REFNUM = col_double()
## )
## See spec(...) for full column specifications.
knitr::opts_chunk$set(echo = TRUE)

Exploring the data showed it to be a very noisy data set, with many spelling, capitalization, and punctuation variants of the same weather event, for example “Freezing rain”, “Freezing Rain”, “FREEZING RAIN”, “FREEZING RAIN AND SLEET”, “FREEZING RAIN AND SNOW”, “FREEZING RAIN SLEET AND”, “FREEZING RAIN SLEET AND LIGHT”, “FREEZING RAIN/SLEET”, and “FREEZING RAIN/SNOW”. Across 902,297 observations there are 997 unique event names, and 818 of them have ten or fewer observations. We decided to focus on the 87 event names with at least 50 observations, which occur often enough for their totals to be meaningful.

#Create alphabetical list of event names, discover many slight variations on the same event, WINTER WEATHER, 
#WINTER WEATHER MIX, WINTER WEATHER/MIX, WINTERY MIX, Wintry mix, Wintry Mix 
EventList <- sort(unique(stormData$EVTYPE)) 

#Very noisy data, 818 named weather events have 10 or fewer observations out of 902,297 observations
EventList1 <- table(stormData$EVTYPE)
EventList1 <- EventList1 <= 10
sum(EventList1)
## [1] 818
#create a list of weather events with at least 50 observations
EventList2 <- table(stormData$EVTYPE)
EventList2 <- EventList2 >= 50
EventList2 <- sort(EventList2 == 1, decreasing = TRUE)
sum(EventList2 == 1)
## [1] 87
EventList3 <- EventList2[1:87]
EventList3
## 
##   ASTRONOMICAL HIGH TIDE    ASTRONOMICAL LOW TIDE                AVALANCHE 
##                     TRUE                     TRUE                     TRUE 
##                 BLIZZARD            COASTAL FLOOD         COASTAL FLOODING 
##                     TRUE                     TRUE                     TRUE 
##                     COLD          COLD/WIND CHILL                DENSE FOG 
##                     TRUE                     TRUE                     TRUE 
##                  DROUGHT           DRY MICROBURST               DUST DEVIL 
##                     TRUE                     TRUE                     TRUE 
##               DUST STORM           EXCESSIVE HEAT             EXTREME COLD 
##                     TRUE                     TRUE                     TRUE 
##  EXTREME COLD/WIND CHILL        EXTREME WINDCHILL              FLASH FLOOD 
##                     TRUE                     TRUE                     TRUE 
##           FLASH FLOODING                    FLOOD        FLOOD/FLASH FLOOD 
##                     TRUE                     TRUE                     TRUE 
##                 FLOODING                      FOG                   FREEZE 
##                     TRUE                     TRUE                     TRUE 
##            FREEZING RAIN                    FROST             FROST/FREEZE 
##                     TRUE                     TRUE                     TRUE 
##             FUNNEL CLOUD            FUNNEL CLOUDS              GUSTY WINDS 
##                     TRUE                     TRUE                     TRUE 
##                     HAIL                     HEAT                HEAT WAVE 
##                     TRUE                     TRUE                     TRUE 
##               HEAVY RAIN               HEAVY SNOW               HEAVY SURF 
##                     TRUE                     TRUE                     TRUE 
##     HEAVY SURF/HIGH SURF                HIGH SURF                HIGH WIND 
##                     TRUE                     TRUE                     TRUE 
##               HIGH WINDS                HURRICANE        HURRICANE/TYPHOON 
##                     TRUE                     TRUE                     TRUE 
##                      ICE                ICE STORM         LAKE-EFFECT SNOW 
##                     TRUE                     TRUE                     TRUE 
##                LANDSLIDE               LIGHT SNOW                LIGHTNING 
##                     TRUE                     TRUE                     TRUE 
##              MARINE HAIL         MARINE HIGH WIND MARINE THUNDERSTORM WIND 
##                     TRUE                     TRUE                     TRUE 
##         MARINE TSTM WIND        MODERATE SNOWFALL              RECORD COLD 
##                     TRUE                     TRUE                     TRUE 
##              RECORD HEAT            RECORD WARMTH              RIP CURRENT 
##                     TRUE                     TRUE                     TRUE 
##             RIP CURRENTS              RIVER FLOOD                    SLEET 
##                     TRUE                     TRUE                     TRUE 
##                     SNOW              STORM SURGE         STORM SURGE/TIDE 
##                     TRUE                     TRUE                     TRUE 
##              STRONG WIND             STRONG WINDS        THUNDERSTORM WIND 
##                     TRUE                     TRUE                     TRUE 
##       THUNDERSTORM WINDS  THUNDERSTORM WINDS HAIL      THUNDERSTORM WINDSS 
##                     TRUE                     TRUE                     TRUE 
##                  TORNADO      TROPICAL DEPRESSION           TROPICAL STORM 
##                     TRUE                     TRUE                     TRUE 
##                TSTM WIND           TSTM WIND/HAIL         UNSEASONABLY DRY 
##                     TRUE                     TRUE                     TRUE 
##        UNSEASONABLY WARM              URBAN FLOOD           URBAN FLOODING 
##                     TRUE                     TRUE                     TRUE 
##     URBAN/SML STREAM FLD               WATERSPOUT         WILD/FOREST FIRE 
##                     TRUE                     TRUE                     TRUE 
##                 WILDFIRE                     WIND             WINTER STORM 
##                     TRUE                     TRUE                     TRUE 
##           WINTER WEATHER       WINTER WEATHER/MIX               WINTRY MIX 
##                     TRUE                     TRUE                     TRUE
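The spelling and capitalization variants described above could be collapsed before counting. The following is a minimal sketch on a toy vector, not the method used in this report: `normalize_event` is a hypothetical helper, the sample names are illustrative, and the threshold of 2 stands in for the report's 50 so that the toy data produces a result. It uses base R only, so it runs without extra packages.

```r
# Toy sample of EVTYPE spellings (illustrative; the real column has 997 distinct values)
evtype <- c("Freezing rain", "FREEZING RAIN", " Freezing Rain ",
            "FREEZING RAIN/SLEET", "Wintry mix", "Wintry Mix", "WINTRY MIX")

# Hypothetical helper: upper-case, trim, and collapse repeated whitespace
normalize_event <- function(x) {
  x <- toupper(trimws(x))
  gsub("[[:space:]]+", " ", x)
}

clean <- normalize_event(evtype)

# Count observations per cleaned name, most frequent first
counts <- sort(table(clean), decreasing = TRUE)

# Keep the names that meet a minimum count (the report uses 50; 2 suits the toy data)
min_obs <- 2
frequent <- names(counts)[counts >= min_obs]
print(frequent)
```

This only merges variants that differ in case or spacing. Names such as “FREEZING RAIN/SLEET” would still need a manual mapping to a common event type.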

Results: Population Health

The City Council wanted to know which weather events are most harmful to our citizens. We measured harm three ways: total fatalities, total injuries, and deadliness, defined as the ratio of fatalities to injuries. The code below totals fatalities and injuries by event type and lists the top 10 of each.

#calculate highest fatalities
stormData1 <- stormData
stormData1 <- group_by(stormData1, EVTYPE) %>% summarize(TotalInjuries = sum(INJURIES, na.rm = TRUE),
                                                         TotalFatalities = sum(FATALITIES, na.rm = TRUE))
stormData1a <- arrange(stormData1, desc(TotalFatalities))
stormData1a<- stormData1a[1:10,]
stormData1a
## # A tibble: 10 x 3
##            EVTYPE TotalInjuries TotalFatalities
##             <chr>         <dbl>           <dbl>
##  1        TORNADO         91346            5633
##  2 EXCESSIVE HEAT          6525            1903
##  3    FLASH FLOOD          1777             978
##  4           HEAT          2100             937
##  5      LIGHTNING          5230             816
##  6      TSTM WIND          6957             504
##  7          FLOOD          6789             470
##  8    RIP CURRENT           232             368
##  9      HIGH WIND          1137             248
## 10      AVALANCHE           170             224
#calculate highest injuries
stormData1b <- arrange(stormData1, desc(TotalInjuries))
stormData1b<- stormData1b[1:10,]
stormData1b
## # A tibble: 10 x 3
##               EVTYPE TotalInjuries TotalFatalities
##                <chr>         <dbl>           <dbl>
##  1           TORNADO         91346            5633
##  2         TSTM WIND          6957             504
##  3             FLOOD          6789             470
##  4    EXCESSIVE HEAT          6525            1903
##  5         LIGHTNING          5230             816
##  6              HEAT          2100             937
##  7         ICE STORM          1975              89
##  8       FLASH FLOOD          1777             978
##  9 THUNDERSTORM WIND          1488             133
## 10              HAIL          1361              15
#Interesting data, the deadliest event is on neither top 10 list - cold/wind chill has few injuries, but significant deaths
#is.finite() drops NaN (0 deaths / 0 injuries) and Inf (deaths with 0 injuries)
stormData1c <- mutate(stormData1, Deadliest = TotalFatalities/TotalInjuries) %>% filter(Deadliest > 0 & is.finite(Deadliest)) %>%
                                                        arrange(desc(Deadliest))
head(stormData1c)
## # A tibble: 6 x 4
##                    EVTYPE TotalInjuries TotalFatalities Deadliest
##                     <chr>         <dbl>           <dbl>     <dbl>
## 1         COLD/WIND CHILL            12              95  7.916667
## 2          HURRICANE ERIN             1               6  6.000000
## 3 EXTREME COLD/WIND CHILL            24             125  5.208333
## 4              ROUGH SURF             1               4  4.000000
## 5            SNOW AND ICE             1               4  4.000000
## 6       EXTREME WINDCHILL             5              17  3.400000

Notably, the top 10 weather events by fatalities are not the same as the top 10 by injuries. Rip Currents, High Wind, and Avalanches are among the top 10 causes of fatalities, but not of injuries. Ice Storms, Thunderstorm Wind, and Hail are among the top 10 causes of injuries, but not of fatalities. The deadliest event type is Cold/Wind Chill, with 95 deaths against only 12 injuries. Tornadoes are by far the leading cause of both injuries and fatalities; they cause significant damage and can occur suddenly with little warning.
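The deadliness ratio can be illustrated on a handful of rows. This is a sketch in base R rather than the dplyr pipeline used above: the first three rows copy totals from the tables above, while the two “EXAMPLE” rows are hypothetical and exist only to show how a zero injury count produces `Inf` or `NaN`.

```r
# First three rows use totals from the tables above; EXAMPLE rows are hypothetical
totals <- data.frame(
  EVTYPE          = c("TORNADO", "COLD/WIND CHILL", "RIP CURRENT",
                      "EXAMPLE A", "EXAMPLE B"),
  TotalInjuries   = c(91346, 12, 232, 0, 0),
  TotalFatalities = c(5633, 95, 368, 3, 0),
  stringsAsFactors = FALSE
)

# Ratio of fatalities to injuries; 3/0 gives Inf and 0/0 gives NaN
totals$Deadliest <- totals$TotalFatalities / totals$TotalInjuries

# Keep only finite, positive ratios, then rank from deadliest down
ranked <- totals[is.finite(totals$Deadliest) & totals$Deadliest > 0, ]
ranked <- ranked[order(-ranked$Deadliest), ]
print(ranked)
```

Because the ratio is undefined when no injuries were recorded, event types with deaths but no injuries drop out of this ranking entirely, which is worth keeping in mind when reading it.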

Fortunately, injuries far outnumber deaths: there are more than 16 times as many tornado-related injuries as tornado-related deaths. The two charts below are therefore drawn on very different scales.

#plot Total Fatalities and Total Injuries with base graphics
par(mfrow = c(2,1), mar = c(4, 6, 1, 1), las = 1)
barplot(stormData1a$TotalFatalities, col = "red", xlab = "Number of Fatalities",  horiz = TRUE,
        main = "Total Fatalities and Injuries from Weather Events", axisnames = TRUE, 
        names.arg = stormData1a$EVTYPE, cex.names = .5)
barplot(stormData1b$TotalInjuries, col = "orange", xlab = "Number of Injuries", horiz = TRUE,
        axisnames = TRUE, names.arg = stormData1b$EVTYPE, cex.names = .5)

Results: Economic Consequences

People can survive a tornado and still suffer significant financial losses. The NOAA data records both property and crop damage, which we totaled by event type.

#calculate property damage
stormData2 <- stormData
stormData2 <- group_by(stormData2, EVTYPE) %>% summarize(TotalPropertyDamage = sum(PROPDMG, na.rm = TRUE),
                                                         TotalCropDamage = sum(CROPDMG, na.rm = TRUE))
stormData2a <- arrange(stormData2, desc(TotalPropertyDamage))
stormData2a<- stormData2a[1:10,]
stormData2a
## # A tibble: 10 x 3
##                EVTYPE TotalPropertyDamage TotalCropDamage
##                 <chr>               <dbl>           <dbl>
##  1            TORNADO           3212258.2       100018.52
##  2        FLASH FLOOD           1420174.6       179200.46
##  3          TSTM WIND           1336073.6       109202.60
##  4              FLOOD            899938.5       168037.88
##  5  THUNDERSTORM WIND            876844.2        66791.45
##  6               HAIL            688693.4       579596.28
##  7          LIGHTNING            603351.8         3580.61
##  8 THUNDERSTORM WINDS            446293.2        18684.93
##  9          HIGH WIND            324731.6        17283.21
## 10       WINTER STORM            132720.6         1978.99
#calculate crop damage
stormData2b <- arrange(stormData2, desc(TotalCropDamage))
stormData2b<- stormData2b[1:10,]
stormData2b
## # A tibble: 10 x 3
##                EVTYPE TotalPropertyDamage TotalCropDamage
##                 <chr>               <dbl>           <dbl>
##  1               HAIL           688693.38       579596.28
##  2        FLASH FLOOD          1420174.59       179200.46
##  3              FLOOD           899938.48       168037.88
##  4          TSTM WIND          1336073.61       109202.60
##  5            TORNADO          3212258.16       100018.52
##  6  THUNDERSTORM WIND           876844.17        66791.45
##  7            DROUGHT             4099.05        33898.62
##  8 THUNDERSTORM WINDS           446293.18        18684.93
##  9          HIGH WIND           324731.56        17283.21
## 10         HEAVY RAIN            50842.14        11122.80

Tornadoes cause the largest property losses, but Hail, which can cover a large geographic area, causes the most crop damage. Property damage, which includes buildings and infrastructure, is a much larger amount than crop damage, and the different scales of the two charts reflect that.

#plot property and crop damage
options(scipen = 10) #avoid scientific notation on the axis labels
par(mfrow = c(2,1), mar = c(4, 6, 1, 1), las = 1)
barplot(stormData2a$TotalPropertyDamage, col = "blue", xlab = "Property Damage (US Dollars)",  horiz = TRUE,
        main = "Total Property & Crop Damage from Weather Events", axisnames = TRUE, 
        names.arg = stormData2a$EVTYPE, cex.names = .5)
barplot(stormData2b$TotalCropDamage, col = "green", xlab = "Crop Damage (US Dollars)", horiz = TRUE,
        axisnames = TRUE, names.arg = stormData2b$EVTYPE, cex.names = .5)
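The property and crop totals in this section sum the raw PROPDMG and CROPDMG columns. The Storm Database also stores a magnitude code next to each amount (PROPDMGEXP and CROPDMGEXP, with codes such as K, M, and B for thousands, millions, and billions), so dollar totals depend on applying it. The following is a sketch on toy rows, not part of the analysis above: the amounts are invented, only three letter codes are mapped, and treating every other code as a multiplier of 1 is an assumption.

```r
# Toy rows (invented amounts); column names match the Storm Database
storms <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Multipliers for the three letter codes handled in this sketch
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier; unmapped codes come back as NA
scale <- unname(multiplier[toupper(storms$PROPDMGEXP)])
scale[is.na(scale)] <- 1  # assumption: blank or unknown code means no scaling

storms$PropertyDollars <- storms$PROPDMG * scale

# Total scaled damage by event type
dollars <- aggregate(PropertyDollars ~ EVTYPE, data = storms, FUN = sum)
print(dollars)
```

In the toy data, HAIL has the largest raw amount (500) but the smallest scaled total, which shows how rankings built on raw amounts can differ from rankings in dollars.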

Data Processing Check

Because these plots show only the top 10 weather events, we checked that they still represent the entire data set. The top 10 weather events caused 79.8% of all fatalities and 89.3% of all injuries. Tornadoes alone caused 37.2% of all reported fatalities and 65.0% of all injuries.

#calculate the top 10's share of total fatalities and of total injuries
sum(stormData1a$TotalFatalities)/sum(stormData1$TotalFatalities)
## [1] 0.797689
sum(stormData1b$TotalInjuries)/sum(stormData1$TotalInjuries)
## [1] 0.893402
#calculate Tornado's share of total fatalities and of total injuries
stormData1a[1,3]/sum(stormData1$TotalFatalities)
##   TotalFatalities
## 1       0.3719379
stormData1a[1,2]/sum(stormData1$TotalInjuries)
##   TotalInjuries
## 1     0.6500199
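The tornado fatality share can be cross-checked from the printed tables alone. This sketch assumes nothing beyond the top 10 fatality counts and the top 10 share (0.797689) shown above; the overall total is backed out from those two figures rather than read from the data, so it is an approximation rounded to a whole number of deaths.

```r
# Top 10 fatality totals copied from the table above
top10_fatalities <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `FLASH FLOOD` = 978,
                      HEAT = 937, LIGHTNING = 816, `TSTM WIND` = 504, FLOOD = 470,
                      `RIP CURRENT` = 368, `HIGH WIND` = 248, AVALANCHE = 224)

# Share of all fatalities caused by the top 10, as printed in the output above
top10_share <- 0.797689

# Back out the overall total; rounded because deaths are whole numbers
total_fatalities <- round(sum(top10_fatalities) / top10_share)

# Tornado share of all fatalities, as a percentage to one decimal place
tornado_share <- top10_fatalities[["TORNADO"]] / total_fatalities
tornado_pct <- round(100 * tornado_share, 1)
print(tornado_pct)
```

Looking the value up by name (`[["TORNADO"]]`) rather than by row and column position keeps the check correct even if the table is re-sorted.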

We also checked that the top 10 property and crop damage events capture most of the economic risk. The top 10 property events account for 91.3% of total property damage, and the top 10 crop events account for 93.2% of all crop damage. The leading event in each category contributes little to the other: tornadoes account for only 7.3% of all crop damage, and hail for only 6.3% of all property damage.

#calculate the top 10's share of total property damage and of total crop damage
sum(stormData2a$TotalPropertyDamage)/sum(stormData2$TotalPropertyDamage)
## [1] 0.9133244
sum(stormData2b$TotalCropDamage)/sum(stormData2$TotalCropDamage)
## [1] 0.9317835
#calculate Tornado's share of total crop damage and Hail's share of total property damage
stormData2a[1,3]/sum(stormData2$TotalCropDamage)
##   TotalCropDamage
## 1      0.07259148
stormData2b[1,2]/sum(stormData2$TotalPropertyDamage)
##   TotalPropertyDamage
## 1          0.06327285
dev.off()
## null device 
##           1
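The two single-event shares printed above are selected by row and column position, which makes it easy to pick up the wrong column. The following sketch looks values up by name instead. It uses only figures printed in this report: the top 10 tables and the two top 10 shares. The overall totals are backed out from those shares, so the results are approximate, and `share` is a hypothetical helper.

```r
# Top 10 property damage totals, copied from the property table above
top10_property <- c(TORNADO = 3212258.2, `FLASH FLOOD` = 1420174.6,
                    `TSTM WIND` = 1336073.6, FLOOD = 899938.5,
                    `THUNDERSTORM WIND` = 876844.2, HAIL = 688693.4,
                    LIGHTNING = 603351.8, `THUNDERSTORM WINDS` = 446293.2,
                    `HIGH WIND` = 324731.6, `WINTER STORM` = 132720.6)

# Top 10 crop damage totals, copied from the crop table above
top10_crop <- c(HAIL = 579596.28, `FLASH FLOOD` = 179200.46, FLOOD = 168037.88,
                `TSTM WIND` = 109202.60, TORNADO = 100018.52,
                `THUNDERSTORM WIND` = 66791.45, DROUGHT = 33898.62,
                `THUNDERSTORM WINDS` = 18684.93, `HIGH WIND` = 17283.21,
                `HEAVY RAIN` = 11122.80)

# Overall totals backed out from the printed top 10 shares (approximate)
total_property <- sum(top10_property) / 0.9133244
total_crop     <- sum(top10_crop) / 0.9317835

# Hypothetical helper: one event's share of a total, looked up by name
share <- function(values, event, total) values[[event]] / total

# Cross shares, matching the two values printed above
tornado_crop  <- share(top10_crop, "TORNADO", total_crop)
hail_property <- share(top10_property, "HAIL", total_property)

# Each leading event's share of its own category
tornado_property <- share(top10_property, "TORNADO", total_property)
hail_crop        <- share(top10_crop, "HAIL", total_crop)

print(round(c(tornado_crop, hail_property, tornado_property, hail_crop), 3))
```

The first two values reproduce the 7.3% and 6.3% printed above. The last two, each leading event's share of its own category, come out much higher, at roughly 30% for tornado property damage and roughly 42% for hail crop damage.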