Every day, people check the weather report to decide what to wear for heat, cold, or rain, whether to move property inside to protect it from high winds, or whether to relocate above a flood or below ground during a tornado. Then they check the weather again to plan for the next day, the next week, and the next season. At last month’s public hearing, local citizens asked the City Council to protect them from recent storms. Instead, the Council asked our department to analyze historical weather events and help it prioritize how to use its limited resources. This report analyzes weather events recorded in the NOAA Storm Database across the US from 1950 to November 2011. The database includes almost a million observations (902,297) with data on fatalities, injuries, and property and crop damage, plus the date, location, and size of each event. Our department’s research identifies which weather events cause the most deaths and injuries, which are the deadliest, and which cause the most property and crop damage. Tornadoes cause the most deaths, the most injuries, and the largest property damage, and hail causes the most crop damage. This report follows literate statistical programming practice, combining human-readable text and machine-readable code in the same document.
The NOAA data set was loaded directly without any modifications.
library(readr)  # read_csv() decompresses the .bz2 file automatically
library(dplyr)  # group_by(), summarize(), arrange(), mutate(), filter()
# Read in the data (the .bz2 file is assumed to be in the working directory)
stormData <- read_csv("repdata%2Fdata%2FStormData.csv.bz2")
## Parsed with column specification:
## cols(
## .default = col_character(),
## STATE__ = col_double(),
## COUNTY = col_double(),
## BGN_RANGE = col_double(),
## COUNTY_END = col_double(),
## END_RANGE = col_double(),
## LENGTH = col_double(),
## WIDTH = col_double(),
## F = col_integer(),
## MAG = col_double(),
## FATALITIES = col_double(),
## INJURIES = col_double(),
## PROPDMG = col_double(),
## CROPDMG = col_double(),
## LATITUDE = col_double(),
## LONGITUDE = col_double(),
## LATITUDE_E = col_double(),
## LONGITUDE_ = col_double(),
## REFNUM = col_double()
## )
## See spec(...) for full column specifications.
knitr::opts_chunk$set(echo = TRUE)
Exploring the data showed it to be a very noisy data set, with many spelling, capitalization, and punctuation variations of the same weather event. For example, freezing rain appears as “Freezing rain”, “Freezing Rain”, “FREEZING RAIN”, “FREEZING RAIN AND SLEET”, “FREEZING RAIN AND SNOW”, “FREEZING RAIN SLEET AND”, “FREEZING RAIN SLEET AND LIGHT”, “FREEZING RAIN/SLEET”, and “FREEZING RAIN/SNOW”. The 902,297 observations use 997 unique event names, and 818 of those names have ten or fewer observations. We therefore focused on the 87 event names with at least 50 observations each, which are reported often enough to compare meaningfully.
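Many of these variants differ only in capitalization or spacing, so a first cleaning pass could standardize them before counting. The sketch below is an illustration and was not part of the original analysis. normalizeEventType() is a hypothetical helper, and it merges only case and whitespace variants. Names such as “FREEZING RAIN/SLEET”, or “TSTM WIND” versus “THUNDERSTORM WIND”, stay distinct.

```r
# Hypothetical helper (not used elsewhere in this report): standardize
# capitalization and whitespace in event names
normalizeEventType <- function(x) {
  x <- toupper(x)                  # "Freezing rain" -> "FREEZING RAIN"
  x <- trimws(x)                   # drop leading and trailing spaces
  gsub("[[:space:]]+", " ", x)     # collapse runs of internal whitespace
}

# Toy vector built from variants quoted in this section
evtype <- c("Freezing rain", "Freezing Rain", "FREEZING RAIN",
            "FREEZING RAIN/SLEET", "Wintry mix", "Wintry Mix", "WINTRY MIX")
length(unique(evtype))                      # 7 distinct spellings
length(unique(normalizeEventType(evtype)))  # 3 after normalization
```

You could apply it as `stormData$EVTYPE <- normalizeEventType(stormData$EVTYPE)` before the counts below. Doing so would likely change the 997, 818, and 87 figures, which were computed on the raw names.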
# Create an alphabetical list of event names; it reveals many slight variations of the same event,
# e.g. WINTER WEATHER, WINTER WEATHER MIX, WINTER WEATHER/MIX, WINTERY MIX, Wintry mix, Wintry Mix
EventList <- sort(unique(stormData$EVTYPE))
# Very noisy data: 818 event names have 10 or fewer of the 902,297 observations
EventList1 <- table(stormData$EVTYPE) <= 10
sum(EventList1)
## [1] 818
# Flag event names with at least 50 observations and sort so the TRUE values come first
EventList2 <- sort(table(stormData$EVTYPE) >= 50, decreasing = TRUE)
sum(EventList2)
## [1] 87
# Keep the flagged names without hard-coding the count
EventList3 <- EventList2[seq_len(sum(EventList2))]
EventList3
##
## ASTRONOMICAL HIGH TIDE ASTRONOMICAL LOW TIDE AVALANCHE
## TRUE TRUE TRUE
## BLIZZARD COASTAL FLOOD COASTAL FLOODING
## TRUE TRUE TRUE
## COLD COLD/WIND CHILL DENSE FOG
## TRUE TRUE TRUE
## DROUGHT DRY MICROBURST DUST DEVIL
## TRUE TRUE TRUE
## DUST STORM EXCESSIVE HEAT EXTREME COLD
## TRUE TRUE TRUE
## EXTREME COLD/WIND CHILL EXTREME WINDCHILL FLASH FLOOD
## TRUE TRUE TRUE
## FLASH FLOODING FLOOD FLOOD/FLASH FLOOD
## TRUE TRUE TRUE
## FLOODING FOG FREEZE
## TRUE TRUE TRUE
## FREEZING RAIN FROST FROST/FREEZE
## TRUE TRUE TRUE
## FUNNEL CLOUD FUNNEL CLOUDS GUSTY WINDS
## TRUE TRUE TRUE
## HAIL HEAT HEAT WAVE
## TRUE TRUE TRUE
## HEAVY RAIN HEAVY SNOW HEAVY SURF
## TRUE TRUE TRUE
## HEAVY SURF/HIGH SURF HIGH SURF HIGH WIND
## TRUE TRUE TRUE
## HIGH WINDS HURRICANE HURRICANE/TYPHOON
## TRUE TRUE TRUE
## ICE ICE STORM LAKE-EFFECT SNOW
## TRUE TRUE TRUE
## LANDSLIDE LIGHT SNOW LIGHTNING
## TRUE TRUE TRUE
## MARINE HAIL MARINE HIGH WIND MARINE THUNDERSTORM WIND
## TRUE TRUE TRUE
## MARINE TSTM WIND MODERATE SNOWFALL RECORD COLD
## TRUE TRUE TRUE
## RECORD HEAT RECORD WARMTH RIP CURRENT
## TRUE TRUE TRUE
## RIP CURRENTS RIVER FLOOD SLEET
## TRUE TRUE TRUE
## SNOW STORM SURGE STORM SURGE/TIDE
## TRUE TRUE TRUE
## STRONG WIND STRONG WINDS THUNDERSTORM WIND
## TRUE TRUE TRUE
## THUNDERSTORM WINDS THUNDERSTORM WINDS HAIL THUNDERSTORM WINDSS
## TRUE TRUE TRUE
## TORNADO TROPICAL DEPRESSION TROPICAL STORM
## TRUE TRUE TRUE
## TSTM WIND TSTM WIND/HAIL UNSEASONABLY DRY
## TRUE TRUE TRUE
## UNSEASONABLY WARM URBAN FLOOD URBAN FLOODING
## TRUE TRUE TRUE
## URBAN/SML STREAM FLD WATERSPOUT WILD/FOREST FIRE
## TRUE TRUE TRUE
## WILDFIRE WIND WINTER STORM
## TRUE TRUE TRUE
## WINTER WEATHER WINTER WEATHER/MIX WINTRY MIX
## TRUE TRUE TRUE
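The 87-name list is identified here, but the summaries later in this report group every row of stormData. That is why rarely reported names such as HURRICANE ERIN, ROUGH SURF, and SNOW AND ICE, which are not among the 87, appear in the fatality-to-injury ratio table. The minimal base R sketch below applies the threshold to the rows first. keepFrequentEvents() is a hypothetical helper, and the toy data are invented for illustration.

```r
# Hypothetical helper: keep only rows whose EVTYPE occurs at least minObs times
keepFrequentEvents <- function(df, minObs = 50) {
  counts <- table(df$EVTYPE)
  frequent <- names(counts)[counts >= minObs]
  df[df$EVTYPE %in% frequent, , drop = FALSE]
}

# Invented toy data: three TORNADO rows and one HURRICANE ERIN row
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "HURRICANE ERIN"),
                  FATALITIES = c(1, 0, 2, 6))
keepFrequentEvents(toy, minObs = 2)   # returns only the three TORNADO rows
```

With the report's threshold of 50, `keepFrequentEvents(stormData)` could replace stormData as the input to the `group_by()` summaries.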
The City Council wanted to know which weather events are most harmful to our citizens. Harm was measured in three ways: total fatalities, total injuries, and deadliness, defined as the ratio of fatalities to injuries. Total fatalities and injuries were first calculated for each event type.
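The sketch below works through the deadliness ratio using totals copied from the tables in this section. deadliness() is a hypothetical helper. It treats 0/0 and deaths with zero injuries as undefined (NA), an assumption that matches the filter in the code below, which excludes NaN and Inf ratios.

```r
# Hypothetical helper: fatalities per injury, NA when the ratio is undefined
deadliness <- function(fatalities, injuries) {
  ratio <- fatalities / injuries
  ratio[!is.finite(ratio)] <- NA   # 0/0 gives NaN; deaths with 0 injuries give Inf
  ratio
}

# Totals copied from the fatality and ratio tables in this report
events <- data.frame(EVTYPE = c("TORNADO", "COLD/WIND CHILL", "HURRICANE ERIN"),
                     fatalities = c(5633, 95, 6),
                     injuries = c(91346, 12, 1))
events$Deadliest <- round(deadliness(events$fatalities, events$injuries), 3)
events$Deadliest   # 0.062 7.917 6.000
```

Tornadoes lead in total deaths but have a low ratio (about 0.06). By contrast, a single HURRICANE ERIN record with one injury produces a ratio of 6, which shows how unstable the ratio is for rarely reported events.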
# Total injuries and fatalities for each event type
stormData1 <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(TotalInjuries = sum(INJURIES, na.rm = TRUE),
            TotalFatalities = sum(FATALITIES, na.rm = TRUE))
# Top 10 event types by total fatalities
stormData1a <- arrange(stormData1, desc(TotalFatalities))
stormData1a<- stormData1a[1:10,]
stormData1a
## # A tibble: 10 x 3
## EVTYPE TotalInjuries TotalFatalities
## <chr> <dbl> <dbl>
## 1 TORNADO 91346 5633
## 2 EXCESSIVE HEAT 6525 1903
## 3 FLASH FLOOD 1777 978
## 4 HEAT 2100 937
## 5 LIGHTNING 5230 816
## 6 TSTM WIND 6957 504
## 7 FLOOD 6789 470
## 8 RIP CURRENT 232 368
## 9 HIGH WIND 1137 248
## 10 AVALANCHE 170 224
#calculate highest injuries
stormData1b <- arrange(stormData1, desc(TotalInjuries))
stormData1b<- stormData1b[1:10,]
stormData1b
## # A tibble: 10 x 3
## EVTYPE TotalInjuries TotalFatalities
## <chr> <dbl> <dbl>
## 1 TORNADO 91346 5633
## 2 TSTM WIND 6957 504
## 3 FLOOD 6789 470
## 4 EXCESSIVE HEAT 6525 1903
## 5 LIGHTNING 5230 816
## 6 HEAT 2100 937
## 7 ICE STORM 1975 89
## 8 FLASH FLOOD 1777 978
## 9 THUNDERSTORM WIND 1488 133
## 10 HAIL 1361 15
# The deadliest events (highest fatality-to-injury ratio) are not in the top 10 lists:
# cold/wind chill events cause few injuries but significant deaths.
# is.finite() drops NaN (0 deaths, 0 injuries) and Inf (deaths with no injuries).
stormData1c <- mutate(stormData1, Deadliest = TotalFatalities / TotalInjuries) %>%
  filter(is.finite(Deadliest) & Deadliest > 0) %>%
  arrange(desc(Deadliest))
head(stormData1c)
## # A tibble: 6 x 4
## EVTYPE TotalInjuries TotalFatalities Deadliest
## <chr> <dbl> <dbl> <dbl>
## 1 COLD/WIND CHILL 12 95 7.916667
## 2 HURRICANE ERIN 1 6 6.000000
## 3 EXTREME COLD/WIND CHILL 24 125 5.208333
## 4 ROUGH SURF 1 4 4.000000
## 5 SNOW AND ICE 1 4 4.000000
## 6 EXTREME WINDCHILL 5 17 3.400000
Notably, the top 10 weather events for fatalities are not the same as the top 10 for injuries. Avalanches and rip currents are in the top 10 for fatalities but not for injuries. Hail and ice storms are in the top 10 for injuries but not for fatalities. Measured by fatalities per injury, the deadliest event type is Cold/Wind Chill, because its number of injuries is so low compared with its number of deaths. Tornadoes are by far the leading cause of both injuries and fatalities: they cause significant damage and can strike suddenly with little warning.
Fortunately, there are many more injuries than deaths. Tornadoes caused more than 16 times as many injuries (91,346) as deaths (5,633). The two charts below therefore use very different scales.
# Plot total fatalities and total injuries with base graphics
par(mfrow = c(2,1), mar = c(4, 6, 1, 1), las = 1)
barplot(stormData1a$TotalFatalities, col = "red", xlab = "Number of Fatalities", horiz = TRUE,
main = "Total Fatalities and Injuries from Weather Events", axisnames = TRUE,
names.arg = stormData1a$EVTYPE, cex.names = .5)
barplot(stormData1b$TotalInjuries, col = "orange", xlab = "Number of Injuries", horiz = TRUE,
axisnames = TRUE, names.arg = stormData1b$EVTYPE, cex.names = .5)
People who survive a tornado can still suffer significant financial losses. The NOAA data record property and crop damage for each event, and these were totaled by event type.
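One caution applies before totaling these columns. In the NOAA file, PROPDMG and CROPDMG are recorded together with magnitude codes in the PROPDMGEXP and CROPDMGEXP columns. The NWS Storm Data documentation uses “K” for thousands, “M” for millions, and “B” for billions. The sums that follow add the raw PROPDMG and CROPDMG values without applying these codes. The minimal sketch below converts to dollars. damageInDollars() is a hypothetical helper, and treating a blank or any other code as a multiplier of 1 is an assumption, not NOAA guidance.

```r
# Hypothetical helper: apply a NOAA magnitude code to a damage value.
# Assumption: only K/M/B (any case) are multipliers; NA or any other code counts as 1.
damageInDollars <- function(value, expCode) {
  code <- toupper(trimws(expCode))
  multiplier <- ifelse(code == "K", 1e3,
                ifelse(code == "M", 1e6,
                ifelse(code == "B", 1e9, 1)))
  multiplier[is.na(multiplier)] <- 1   # missing codes: keep the raw value
  value * multiplier
}

# Values: 25 thousand, 2.5 million, 1.2 billion, 10
damageInDollars(c(25, 2.5, 1.2, 10), c("K", "M", "b", NA))
```

To get totals in dollars, sum `damageInDollars(PROPDMG, PROPDMGEXP)` inside `summarize()` in place of `PROPDMG`. The rankings shown in this report were computed on the raw values.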
# Total property and crop damage for each event type
stormData2 <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(TotalPropertyDamage = sum(PROPDMG, na.rm = TRUE),
            TotalCropDamage = sum(CROPDMG, na.rm = TRUE))
# Top 10 event types by total property damage
stormData2a <- arrange(stormData2, desc(TotalPropertyDamage))
stormData2a<- stormData2a[1:10,]
stormData2a
## # A tibble: 10 x 3
## EVTYPE TotalPropertyDamage TotalCropDamage
## <chr> <dbl> <dbl>
## 1 TORNADO 3212258.2 100018.52
## 2 FLASH FLOOD 1420174.6 179200.46
## 3 TSTM WIND 1336073.6 109202.60
## 4 FLOOD 899938.5 168037.88
## 5 THUNDERSTORM WIND 876844.2 66791.45
## 6 HAIL 688693.4 579596.28
## 7 LIGHTNING 603351.8 3580.61
## 8 THUNDERSTORM WINDS 446293.2 18684.93
## 9 HIGH WIND 324731.6 17283.21
## 10 WINTER STORM 132720.6 1978.99
#calculate crop damage
stormData2b <- arrange(stormData2, desc(TotalCropDamage))
stormData2b<- stormData2b[1:10,]
stormData2b
## # A tibble: 10 x 3
## EVTYPE TotalPropertyDamage TotalCropDamage
## <chr> <dbl> <dbl>
## 1 HAIL 688693.38 579596.28
## 2 FLASH FLOOD 1420174.59 179200.46
## 3 FLOOD 899938.48 168037.88
## 4 TSTM WIND 1336073.61 109202.60
## 5 TORNADO 3212258.16 100018.52
## 6 THUNDERSTORM WIND 876844.17 66791.45
## 7 DROUGHT 4099.05 33898.62
## 8 THUNDERSTORM WINDS 446293.18 18684.93
## 9 HIGH WIND 324731.56 17283.21
## 10 HEAVY RAIN 50842.14 11122.80
Tornadoes cause the most property damage, but hail, which can cover a large geographic area, causes the most crop damage. Property damage, which includes buildings and infrastructure, is much larger than crop damage, and the different scales of the two charts reflect that.
# Plot property and crop damage, suppressing scientific notation on the axes
options(scipen = 10)
par(mfrow = c(2, 1), mar = c(4, 6, 1, 1), las = 1)
barplot(stormData2a$TotalPropertyDamage, col = "blue", xlab = "Property Damage (US Dollars)", horiz = TRUE,
main = "Total Property & Crop Damage from Weather Events", axisnames = TRUE,
names.arg = stormData2a$EVTYPE, cex.names = .5)
barplot(stormData2b$TotalCropDamage, col = "green", xlab = "Crop Damage (US Dollars)", horiz = TRUE,
axisnames = TRUE, names.arg = stormData2b$EVTYPE, cex.names = .5)
Because these plots show only the top 10 weather events, we checked that they are still representative of the entire data set. The top 10 events by fatalities account for 79.8% of all fatalities, and the top 10 events by injuries account for 89.3% of all injuries. Tornadoes alone caused 37.2% of all reported fatalities and 65.0% of all injuries.
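These shares can be wrapped in a small function so the same check applies to any summary column. topShare() is a hypothetical helper, shown here on invented numbers. It sorts a column in decreasing order and divides the sum of the n largest values by the column total, which is the same calculation as the code below.

```r
# Hypothetical helper: fraction of the total contributed by the n largest values
topShare <- function(x, n = 10) {
  x <- sort(x, decreasing = TRUE)
  sum(head(x, n)) / sum(x)
}

fatalities <- c(50, 30, 10, 5, 5)   # invented totals for five event types
topShare(fatalities, n = 2)          # (50 + 30) / 100 = 0.8
```

`topShare(stormData1$TotalFatalities)` performs the same calculation as `sum(stormData1a$TotalFatalities)/sum(stormData1$TotalFatalities)`.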
# Share of all fatalities and injuries accounted for by the top 10 events
sum(stormData1a$TotalFatalities)/sum(stormData1$TotalFatalities)
## [1] 0.797689
sum(stormData1b$TotalInjuries)/sum(stormData1$TotalInjuries)
## [1] 0.893402
# Tornado share of all fatalities and injuries
stormData1a[1,3]/sum(stormData1$TotalFatalities)
## TotalFatalities
## 1 0.3719379
stormData1a[1,2]/sum(stormData1$TotalInjuries)
## TotalInjuries
## 1 0.6500199
We also checked property and crop damage to make sure the top 10 lists represent the weather events with the most economic risk. The top 10 property-damage events account for 91.3% of total property damage, and the top 10 crop-damage events account for 93.2% of all crop damage. Property and crop losses are driven by different events: tornadoes account for only 7.3% of all crop damage, and hail for only 6.3% of all property damage.
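The ratios in the code below select cells by position, which makes them easy to misread. stormData2a[1,3] is TORNADO's TotalCropDamage, and stormData2b[1,2] is HAIL's TotalPropertyDamage. The sketch below selects the event and the column by name instead. eventShare() is a hypothetical helper, and the toy table is invented.

```r
# Hypothetical helper: one event type's share of a summary column's total,
# selecting the event and the column by name rather than by position
eventShare <- function(summaryTable, event, column) {
  values <- summaryTable[[column]]
  sum(values[summaryTable$EVTYPE %in% event]) / sum(values)
}

toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                  TotalPropertyDamage = c(60, 10, 30),
                  TotalCropDamage = c(5, 50, 45))
eventShare(toy, "TORNADO", "TotalPropertyDamage")   # 60 / 100 = 0.6
eventShare(toy, "HAIL", "TotalCropDamage")          # 50 / 100 = 0.5
```

Applied to the report's data, `eventShare(stormData2, "TORNADO", "TotalPropertyDamage")` and `eventShare(stormData2, "HAIL", "TotalCropDamage")` would give each category leader's share of its own category. The positional code does not report that figure.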
# Share of total property and crop damage accounted for by the top 10 events
sum(stormData2a$TotalPropertyDamage)/sum(stormData2$TotalPropertyDamage)
## [1] 0.9133244
sum(stormData2b$TotalCropDamage)/sum(stormData2$TotalCropDamage)
## [1] 0.9317835
# Tornado share of total crop damage and hail share of total property damage
stormData2a[1,3]/sum(stormData2$TotalCropDamage)
## TotalCropDamage
## 1 0.07259148
stormData2b[1,2]/sum(stormData2$TotalPropertyDamage)
## TotalPropertyDamage
## 1 0.06327285
dev.off()
## null device
## 1