This document will serve as the final assessment for the Reproducible Research Course sponsored by Johns Hopkins University and accessed via Coursera.
The goal of this assessment is to process storm data from NOAA to explore both the risks to human life and health and the economic consequences of different types of storms.
The following code serves to set defaults, load libraries, and reset one default parameter in RStudio.
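The setup code is not reproduced in this text. As a minimal sketch, assuming the analysis needs only the two packages whose functions appear later (dplyr for `%>%`, `group_by()`, `summarize()` and `slice_max()`; ggplot2 for the charts), the library calls could look like the following. The RStudio default that was reset is not stated, so it is left out here.

```r
# Assumed setup: packages inferred from the functions used later in the report
library(dplyr)    # pipe (%>%), group_by(), summarize(), slice_max()
library(ggplot2)  # ggplot(), geom_bar(), coord_flip()
```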
Data Acquisition
Storm data collected by the National Weather Service from 1950 through November 2011 are used, as downloaded from the following site:
https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
Additional descriptive documentation is available at:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf)
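# Download the compressed file once; read.csv() reads .bz2 files directly
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "repdata_data_StormData.csv.bz2"
if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb")  # binary mode keeps the archive intact on Windows
}
weather <- read.csv(destfile)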
summary(weather)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.00 Min. : 0.0 Min. : 0.00000 Min. : 0.0000
## 1st Qu.:0.00 1st Qu.: 0.0 1st Qu.: 0.00000 1st Qu.: 0.0000
## Median :1.00 Median : 50.0 Median : 0.00000 Median : 0.0000
## Mean :0.91 Mean : 46.9 Mean : 0.01678 Mean : 0.1557
## 3rd Qu.:1.00 3rd Qu.: 75.0 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :5.00 Max. :22000.0 Max. :583.00000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
head(weather)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
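# BGN_DATE is character; parse it so min/max are chronological, not alphabetical
bgn_date <- as.Date(weather$BGN_DATE, format = "%m/%d/%Y")
start <- min(bgn_date, na.rm = TRUE)
end <- max(bgn_date, na.rm = TRUE)
These data were collected from 1950 through November 2011.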
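Because read.csv() loads BGN_DATE as character, min() and max() applied to it compare strings rather than dates, which is why bounds such as 1/1/1966 and 9/9/2011 can appear even though the records start in 1950. The sketch below illustrates the difference on four date strings in the same format as the file; the vector names are hypothetical and the values are only examples.

```r
# Example strings in the BGN_DATE format (month/day/year followed by a time)
raw_dates <- c("4/18/1950 0:00:00", "1/1/1966 0:00:00",
               "9/9/2011 0:00:00", "11/15/1951 0:00:00")

# Comparing the strings directly orders them character by character,
# so the result does not follow the calendar
string_min <- min(raw_dates)
string_max <- max(raw_dates)

# Parsing first gives true chronological bounds; the trailing time is ignored
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
date_min <- min(parsed)
date_max <- max(parsed)

print(c(string_min = string_min, string_max = string_max))
print(c(date_min = format(date_min), date_max = format(date_max)))
```

The same as.Date() call can be applied to the full weather$BGN_DATE column before computing the collection period.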
Because the types and amount of data collected changed across the sampling period, this analysis summarizes all years of data collapsed together.
Explore human health consequences for different event types.
Examine the EVTYPE (event type) variable to determine which storms present the greatest hazard to human health.
Note that there are hundreds of event types documented, many of which have no corresponding fatalities or injuries. Here, we explore the fatality numbers associated with different event types:
# calculate total fatalities for each EVTYPE and keep the 20 largest
sumfatality <- weather %>%
  group_by(EVTYPE) %>%
  summarize(sumdeaths = sum(FATALITIES, na.rm = TRUE)) %>%
  slice_max(order_by = sumdeaths, n = 20)
print(sumfatality)
## # A tibble: 20 × 2
## EVTYPE sumdeaths
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
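The table above lists several labels that appear to describe the same kind of event (for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND), so their totals are split across rows. The following is a minimal sketch of one way to merge such labels before grouping. The helper name clean_evtype and the specific rules are assumptions chosen for illustration, not an official NOAA mapping, and they cover only a few obvious cases.

```r
# Hypothetical helper: normalize case and spacing, then merge a few
# obvious spelling variants of the same event label
clean_evtype <- function(x) {
  x <- toupper(trimws(x))                     # "Heat Wave" -> "HEAT WAVE"
  x <- gsub("\\s+", " ", x)                   # collapse repeated spaces
  x <- sub("^TSTM", "THUNDERSTORM", x)        # expand the common abbreviation
  x <- sub("^(THUNDERSTORM WIND|RIP CURRENT|HIGH WIND)S$", "\\1", x)  # drop plural
  x
}

# Toy records, not taken from the storm data
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WINDS", "RIP CURRENTS",
             "RIP CURRENT", " Heat Wave"),
  FATALITIES = c(2, 1, 3, 4, 5)
)
toy$EVTYPE_CLEAN <- clean_evtype(toy$EVTYPE)

# Total fatalities per cleaned label
clean_totals <- tapply(toy$FATALITIES, toy$EVTYPE_CLEAN, sum)
print(clean_totals)
```

Applied to the full data set, a step like this would be run on weather$EVTYPE before the group_by() call, and the rule list would need to be extended after inspecting the labels that remain.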
Here, we explore the mean number of injuries per recorded event for each event type:
# calculate mean injuries per event for each EVTYPE (sumhurt holds a mean, not a total)
suminjury <- weather %>%
  group_by(EVTYPE) %>%
  summarize(sumhurt = mean(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(sumhurt)) %>%
  slice(1:20)
print(suminjury)
## # A tibble: 20 × 2
## EVTYPE sumhurt
## <chr> <dbl>
## 1 Heat Wave 70
## 2 TROPICAL STORM GORDON 43
## 3 WILD FIRES 37.5
## 4 THUNDERSTORMW 27
## 5 HIGH WIND AND SEAS 20
## 6 SNOW/HIGH WINDS 18
## 7 GLAZE/ICE STORM 15
## 8 HEAT WAVE DROUGHT 15
## 9 WINTER STORM HIGH WINDS 15
## 10 HURRICANE/TYPHOON 14.5
## 11 WINTER WEATHER MIX 11.3
## 12 EXTREME HEAT 7.05
## 13 NON-SEVERE WIND DAMAGE 7
## 14 GLAZE 6.75
## 15 TSUNAMI 6.45
## 16 WINTER STORMS 5.67
## 17 TORNADO F2 5.33
## 18 EXCESSIVE RAINFALL 5.25
## 19 WATERSPOUT/TORNADO 5.25
## 20 HEAT WAVE 4.18
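The injury ranking above is based on mean(), whereas the fatality ranking uses sum(), so the two tables are not directly comparable. A mean can be dominated by a label with a single severe record. The sketch below shows the effect on made-up numbers and computes both statistics side by side; the data frame and column names other than EVTYPE and INJURIES are hypothetical.

```r
library(dplyr)

# Toy data: one frequent event type and one label with a single record
toy <- data.frame(
  EVTYPE   = c("TORNADO", "TORNADO", "TORNADO", "Heat Wave"),
  INJURIES = c(10, 0, 80, 70)
)

injury_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarize(
    total_injuries = sum(INJURIES, na.rm = TRUE),   # overall burden
    mean_injuries  = mean(INJURIES, na.rm = TRUE),  # severity per record
    n_events       = n()                            # how many records back the mean
  )

# The two rankings disagree: the frequent type leads on totals,
# the single-record label leads on the mean
by_total <- injury_summary %>% arrange(desc(total_injuries))
by_mean  <- injury_summary %>% arrange(desc(mean_injuries))

print(by_total)
print(by_mean)
```

Reporting n_events next to the mean makes it easier to spot rankings that rest on very few records.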
Explore the economic consequences of different storm types.
Examine the EVTYPE (event type) variable to determine which storms carry the greatest economic cost.
Here, I sum the costs due to crop damage and those due to property damage, then provide the top 20 event types by economic loss.
Note that PROPDMG and CROPDMG are added as recorded, without applying the magnitude codes in PROPDMGEXP and CROPDMGEXP, so the totals are in thousands of dollars only for records whose code is K.
# calculate total economic cost (property + crop damage) for each EVTYPE
sumcosts <- weather %>%
  mutate(costs = PROPDMG + CROPDMG) %>%
  group_by(EVTYPE) %>%
  summarize(sumcost = sum(costs, na.rm = TRUE)) %>%
  slice_max(order_by = sumcost, n = 20)
print(sumcosts[1:20,])
## # A tibble: 20 × 2
## EVTYPE sumcost
## <chr> <dbl>
## 1 TORNADO 3312277.
## 2 FLASH FLOOD 1599325.
## 3 TSTM WIND 1445168.
## 4 HAIL 1268290.
## 5 FLOOD 1067976.
## 6 THUNDERSTORM WIND 943636.
## 7 LIGHTNING 606932.
## 8 THUNDERSTORM WINDS 464978.
## 9 HIGH WIND 342015.
## 10 WINTER STORM 134700.
## 11 HEAVY SNOW 124418.
## 12 WILDFIRE 88824.
## 13 ICE STORM 67690.
## 14 STRONG WIND 64611.
## 15 HEAVY RAIN 61965.
## 16 HIGH WINDS 57385.
## 17 TROPICAL STORM 54323.
## 18 WILD/FOREST FIRE 43534.
## 19 DROUGHT 37998.
## 20 FLASH FLOODING 33623.
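The totals above add the raw PROPDMG and CROPDMG values. In the storm data, the companion columns PROPDMGEXP and CROPDMGEXP carry a magnitude code (K for thousands, M for millions, B for billions), so a dollar figure requires multiplying each value by its code first. The following is a minimal sketch of that conversion on toy rows. The helper name exp_multiplier is hypothetical, and treating blank or unrecognized codes as a multiplier of 1 is an assumption that should be checked against the documentation.

```r
# Hypothetical helper: translate a magnitude code into a numeric multiplier
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]   # unknown or blank codes come back as NA
  mult[is.na(mult)] <- 1          # assumption: treat them as "no scaling"
  unname(mult)
}

# Toy rows, not taken from the storm data
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 10, 0),
  CROPDMGEXP = c("", "k", "")
)

# Scale each damage value by its own code, then add property and crop damage
toy$cost_usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

print(toy)
```

With a cost_usd column computed this way on the full data, the grouped sum would be expressed in dollars, and the ranking of event types may differ from the unscaled one.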
Graphically demonstrate the effects of weather events on human health and economic costs during the data collection period, from 1950 through November 2011.
ggplot(sumfatality, aes(x = reorder(EVTYPE, -sumdeaths), y = sumdeaths)) +
  geom_bar(stat = "identity", fill = "#f68060", width = 0.5) +
  coord_flip() +
  ylab("Total Deaths per Event Type") +
  xlab("Event Type")
Here, we can see that tornadoes are, by far, the most deadly of the event types. Fatalities due to tornadoes exceeded 5000 during the data period.
ggplot(suminjury, aes(x = reorder(EVTYPE, -sumhurt), y = sumhurt)) +
  geom_bar(stat = "identity", fill = "#60d6f6", width = 0.4) +
  coord_flip() +
  ylab("Mean Injuries per Recorded Event") +
  xlab("Event Type")
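Here, we can see that the event types with the most injuries per recorded event are narrowly labeled categories such as Heat Wave and TROPICAL STORM GORDON, and that heat-related labels appear four times in the top 20. Because these values are means rather than totals, a label with only a few records can rank highly, so this chart does not show which event types injure the most people overall.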
ggplot(sumcosts, aes(x = reorder(EVTYPE, -sumcost), y = sumcost)) +
  geom_bar(stat = "identity", fill = "forestgreen", width = 0.5) +
  coord_flip() +
  ylab("Sum of PROPDMG + CROPDMG per Event Type (unscaled)") +
  xlab("Event Type")
Finally, we consider the economic costs of different forms of weather events. Property damage and crop damage were summed, as both are significant economic indicators.
The combined recorded damage to property and crops is greatest for tornadoes and flash floods. If every value is read as thousands of dollars, the damages from these two event types sum to about $4.9B over the time period.
To limit economic and human consequences from extreme weather, investment in stronger predictive forecasting should be a priority.