Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
Here we explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
We find that tornadoes are the event type most harmful to population health, and that floods are the event type with the greatest economic consequences across the U.S.
Both data and documentation are available on the Coursera website.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "data.csv.bz2", method = "curl")
original_data <- read.csv("data.csv.bz2", header = TRUE, stringsAsFactors = FALSE)
# check structure
str(original_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
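There are 902297 observations of 37 variables. The events start in 1950 and continue to November 2011. To answer the two questions (which event types are most harmful to population health, and which have the greatest economic consequences), we consider the following variables: state, event type, numbers of fatalities and injuries, and the property and crop damage estimates with their exponent codes.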
# subset data frame
data <- original_data[, c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# look at structure
str(data)
## 'data.frame': 902297 obs. of 8 variables:
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
# check for missing values
sum(is.na(data))
## [1] 0
This data frame has 902297 observations of 8 variables and contains no NA values.
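The `is.na()` check counts only `NA` entries, so blank strings such as the empty `CROPDMGEXP` values visible in the structure output are not flagged. A minimal sketch of how blanks could be counted separately, using a toy data frame; the `toy` values and the `count_blank` helper are illustrative assumptions, not part of the original analysis:

```r
# Toy data frame mimicking three of the storm columns (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMGEXP = c("K", "", "M"),
  CROPDMGEXP = c("", "", "K"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count empty-string entries in each character column
count_blank <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  vapply(df[is_chr], function(col) sum(col == "", na.rm = TRUE), integer(1))
}

na_total     <- sum(is.na(toy))   # same check as above: counts NA only
blank_counts <- count_blank(toy)  # blanks per character column

print(na_total)
print(blank_counts)
```

Applied to the real data, `count_blank(data)` would report how many exponent codes are blank, which matters later when those codes are converted to multipliers.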
First, we investigate the event type that caused the largest total number of fatalities from 1950 to November 2011.
fatal <- aggregate(FATALITIES ~ EVTYPE, data, sum)
# order
fatal <- fatal[order(fatal$FATALITIES, decreasing = TRUE), ]
# plot
barplot(height = fatal$FATALITIES[1:20], names.arg = fatal$EVTYPE[1:20], las = 2, cex.names = 0.3, col = rainbow(20, start = 0, end = 0.5))
title(main = "Top 20 Severe Weather Events that Cause Fatalities from 1950-2011")
title(ylab = "Total Number of Fatalities")
The above plot shows the total number of fatalities for the top 20 severe weather events. Tornadoes caused the most fatalities over 1950-2011.
Next, we investigate the event type that caused the largest total number of injuries from 1950 to November 2011.
injury <- aggregate(INJURIES ~ EVTYPE, data, sum)
# order
injury <- injury[order(injury$INJURIES, decreasing = TRUE), ]
# plot
barplot(height = injury$INJURIES[1:20]/1000, names.arg = injury$EVTYPE[1:20], las = 2, cex.names = 0.3, col = rainbow(20, start = 0, end = 0.5))
title(main = "Top 20 Severe Weather Events that Cause Injuries from 1950-2011")
title(ylab = "Total Number of Injuries in Thousands")
The above plot shows the total number of injuries, in thousands, for the top 20 severe weather events. Tornadoes are responsible for the most injuries over 1950-2011. Across the U.S., tornadoes are the severe weather event most harmful with respect to population health, as they cause the most fatalities and the most injuries.
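The two rankings can also be produced in a single aggregation and combined into one casualty figure. The sketch below uses a toy data frame with made-up values. Weighting one fatality the same as one injury is an assumption of this example, not a choice made in the analysis above, and the `CASUALTIES` column is introduced here for illustration only:

```r
# Toy data (illustrative values only, not taken from the NOAA database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(20, 5, 10, 1, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in one call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined casualty count (assumption: fatalities and injuries weighted equally)
health$CASUALTIES <- health$FATALITIES + health$INJURIES

# Sort so the most harmful event type comes first
health <- health[order(health$CASUALTIES, decreasing = TRUE), ]
print(health)
```

With the real `data` frame in place of `toy`, the first row of `health` would give the event type with the largest combined count under this equal-weight assumption.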
We now investigate the total economic damage produced by each type of severe weather event. We consider both property and crop damage in the U.S. over 1950-2011.
# check exponent data-- there is a mix of "K", "M", etc.
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# force to upper case for consistency
data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
# look at values and counts
table(data$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B H K M
## 4 5 1 40 7 424665 11337
# convert the exponents to numerical values
data$PROPEXP[data$PROPDMGEXP == "K"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "M"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == ""] <- 1
data$PROPEXP[data$PROPDMGEXP == "B"] <- 1e+09
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPEXP[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "4"] <- 10000
data$PROPEXP[data$PROPDMGEXP == "2"] <- 100
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPEXP[data$PROPDMGEXP == "H"] <- 100
data$PROPEXP[data$PROPDMGEXP == "1"] <- 10
data$PROPEXP[data$PROPDMGEXP == "8"] <- 1e+08
# set any invalid exponent to zero
data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0
# compute property damage
data$PROPDMGVAL <- data$PROPDMG * data$PROPEXP
# aggregate
propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, data = data, FUN = sum)
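The per-code assignments above can also be expressed as a single lookup table. The following is a sketch of that alternative, assuming the same mapping as the assignments above: letters map to hundreds, thousands, millions, and billions; digits are read as powers of ten; blanks map to 1; anything else maps to 0. The names `exp_lookup` and `exp_to_multiplier` are hypothetical:

```r
# Multiplier lookup mirroring the assignments above; digits 0-8 are powers of ten
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                setNames(10^(0:8), as.character(0:8)))

# Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  code <- toupper(code)             # "m" and "M" are treated alike
  mult <- unname(exp_lookup[code])  # NA for codes not in the table
  mult[code == ""] <- 1             # blank code: damage value taken as is
  mult[is.na(mult)] <- 0            # "+", "-", "?" and other codes treated as invalid
  mult
}

codes <- c("K", "m", "", "?", "5", "B")
print(exp_to_multiplier(codes))
```

Under these assumptions, `data$PROPEXP <- exp_to_multiplier(data$PROPDMGEXP)` would replace the block of individual assignments, and the same helper could be reused for `CROPDMGEXP`.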
### Prepare the Crop Damage Data
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# force to upper case for consistency
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
# look at values and counts
table(data$CROPDMGEXP)
##
## ? 0 2 B K M
## 618413 7 19 1 9 281853 1995
# convert the exponents to numerical values
data$CROPEXP[data$CROPDMGEXP == "M"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "B"] <- 1e+09
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "2"] <- 100
data$CROPEXP[data$CROPDMGEXP == ""] <- 1
# set any invalid exponent to zero
data$CROPEXP[data$CROPDMGEXP == "?"] <- 0
# compute the crop damage
data$CROPDMGVAL <- data$CROPDMG * data$CROPEXP
# aggregate
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, data = data, FUN = sum)
# get top 10 events with highest property damage
propdmg10 <- propdmg[order(-propdmg$PROPDMGVAL), ][1:10, ]
# get top 10 events with highest crop damage
cropdmg10 <- cropdmg[order(-cropdmg$CROPDMGVAL), ][1:10, ]
# make a panel plot
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmg10$PROPDMGVAL/(10^9), las = 3, names.arg = propdmg10$EVTYPE,
        main = "Top 10 Events with Greatest Property Damages", ylab = "Cost of damages ($ billions)",
        col = "blue")
barplot(cropdmg10$CROPDMGVAL/(10^9), las = 3, names.arg = cropdmg10$EVTYPE,
        main = "Top 10 Events With Greatest Crop Damages", ylab = "Cost of damages ($ billions)",
        col = "blue")
Over 1950-2011 across the U.S., flooding caused the greatest amount of property damage and drought caused the greatest amount of crop damage.
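To rank event types by overall economic consequences, the property and crop totals can be merged and summed. The sketch below shows one way to do this on toy per-event totals. The values are made up for illustration, and the names `propdmg_toy`, `cropdmg_toy`, and `TOTALDMGVAL` are assumptions of this example:

```r
# Toy per-event totals in dollars (illustrative values only)
propdmg_toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "DROUGHT"),
                          PROPDMGVAL = c(9e9, 6e9, 1e9),
                          stringsAsFactors = FALSE)
cropdmg_toy <- data.frame(EVTYPE = c("DROUGHT", "FLOOD", "HAIL"),
                          CROPDMGVAL = c(4e9, 2e9, 1e9),
                          stringsAsFactors = FALSE)

# Full outer join so event types missing from one table are kept
total <- merge(propdmg_toy, cropdmg_toy, by = "EVTYPE", all = TRUE)

# An event type absent from one table contributed no damage of that kind
total[is.na(total)] <- 0

# Combined damage, sorted with the costliest event type first
total$TOTALDMGVAL <- total$PROPDMGVAL + total$CROPDMGVAL
total <- total[order(-total$TOTALDMGVAL), ]
print(total)
```

Running the same steps on `propdmg` and `cropdmg` would give a single ranking against which the overall economic conclusion can be checked.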