Storms and other severe weather events have been a serious problem in terms of public health and economic impact for communities and municipalities.
This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database records major storms and weather events in the United States, including the time and place of their occurrence, as well as estimates of any fatalities, injuries, and property damage.
The following questions are addressed in this analysis:
Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health across the United States?
Which types of events have the greatest economic consequences across the United States?
After the data are aggregated by storm event type and analyzed, the following conclusions are reached:
Tornadoes are the most harmful events with respect to population health (both injuries and fatalities).
Floods are responsible for the greatest economic damage: they are the leading cause of property damage, while droughts are the leading cause of crop damage.
The storm data can be downloaded from the following website: Storm Data.
The documentation of the storm database can be downloaded from the following website: Storm Data Documentation.
Finally, the National Climatic Data Center Storm Events FAQ can be found on the following website: National Climatic Data Center Storm Events FAQ.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
Set working directory.
setwd("C:/Users/Owner/Documents/Coursera/NOAA")
Download the storm dataset (if not present) and load it into R.
data.file.name <- "C:/Users/Owner/Documents/Coursera/NOAA/repdata-data-StormData.csv.bz2"
if (!file.exists(data.file.name)) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(url = url, destfile = data.file.name, mode = "wb")
}
# read.csv() decompresses the .bz2 file transparently
storm.data <- read.csv(data.file.name)
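Let us examine the size of the storm dataset and list the first six rows. The event type names (EVTYPE) are first converted to upper case so that differently cased spellings of the same event are later aggregated together.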
storm.data$EVTYPE = toupper(storm.data$EVTYPE)
dim(storm.data)
## [1] 902297 37
head(storm.data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
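The effect of the upper-case conversion can be illustrated on a few made-up labels. This is only a sketch: the vector `evtype` below is hypothetical and is not taken from the database.

```r
# Toy event labels (hypothetical, not taken from the storm database)
evtype <- c("Tornado", "TORNADO", "tornado", "Flood", "FLOOD")

# Without normalisation, every spelling forms its own group
raw.counts <- table(evtype)

# After toupper(), the spellings collapse into one label per event
norm.counts <- table(toupper(evtype))

length(raw.counts)        # 5 distinct labels
length(norm.counts)       # 2 distinct labels
norm.counts[["TORNADO"]]  # 3 records now share one label
```

Note that case conversion merges only spellings that differ in case; labels that differ in wording or whitespace remain separate groups.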
This storm dataset contains substantially more information than is required for the current analysis. Only the columns related to health and economic impact are extracted.
storm.event <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
req.storm.data <- storm.data[storm.event]
Let us examine the size of the storm dataset, now reduced to just the seven required columns, and list the first six rows again.
dim(req.storm.data)
## [1] 902297 7
head(req.storm.data)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
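The property damage exponent (PROPDMGEXP) encodes the magnitude of the recorded property damage (PROPDMG): H denotes hundreds, K thousands, M millions, and B billions, while a digit n is treated as the power of ten 10^n. The distinct exponent codes are listed and each is assigned a numerical multiplier; a blank exponent is given a multiplier of one. Invalid codes (+, -, ?) are excluded by assigning a multiplier of zero. The property damage value is then calculated by multiplying PROPDMG by the multiplier derived from PROPDMGEXP.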
unique(req.storm.data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "K"] <- 1e+03
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "M"] <- 1e+06
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "" ] <- 1e+00
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "B"] <- 1e+09
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "m"] <- 1e+06
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "0"] <- 1e+00
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "5"] <- 1e+05
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "6"] <- 1e+06
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "4"] <- 1e+04
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "2"] <- 1e+02
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "3"] <- 1e+03
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "h"] <- 1e+02
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "7"] <- 1e+07
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "H"] <- 1e+02
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "1"] <- 1e+01
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "8"] <- 1e+08
# Assign the value of zero to invalid exponent data
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "+"] <- 0.0
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "-"] <- 0.0
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "?"] <- 0.0
# Calculate the value of property damage
req.storm.data$PROPDMGVAL <- req.storm.data$PROPDMG * req.storm.data$PROPEXP
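The same mapping could also be written more compactly as a named lookup vector. The following is a minimal sketch of that alternative, not the code used for the results in this report; `exp.lookup`, `exp.to.multiplier`, and the `toy` data frame are hypothetical names and values introduced for illustration. The lower-case "k" is included because it occurs in the crop exponent column.

```r
# Named lookup vector: exponent code -> multiplier.
# Values mirror the assignments above; "+", "-" and "?" map to 0.
exp.lookup <- c(
  "K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9,
  "h" = 1e2, "H" = 1e2,
  "0" = 1e0, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "+" = 0, "-" = 0, "?" = 0
)

# Hypothetical helper: convert a vector of exponent codes to multipliers
exp.to.multiplier <- function(codes) {
  codes <- as.character(codes)        # also works when codes is a factor
  mult <- unname(exp.lookup[codes])   # NA for any code not in the table
  mult[which(codes == "")] <- 1       # blank exponent: value is taken as is
  mult
}

# Toy rows (hypothetical values, not from the database)
toy <- data.frame(PROPDMG = c(25, 2.5, 3, 10, 7),
                  PROPDMGEXP = c("K", "M", "", "?", "B"),
                  stringsAsFactors = FALSE)
toy$PROPDMGVAL <- toy$PROPDMG * exp.to.multiplier(toy$PROPDMGEXP)
toy$PROPDMGVAL  # 25000, 2500000, 3, 0, 7e9
```

With this approach, a code that is missing from the table yields NA rather than being silently skipped, which makes unmapped codes easy to spot.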
Crop damage exponents (CROPDMGEXP) are listed and assigned appropriate numerical values. Invalid data are excluded by assigning the value of zero. The crop damage value is calculated by multiplying the crop damage (CROPDMG) and the crop exponent value extracted from crop damage exponents (CROPDMGEXP).
unique(req.storm.data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "M"] <- 1e+06
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "K"] <- 1e+03
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "m"] <- 1e+06
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "B"] <- 1e+09
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "0"] <- 1e+00
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "k"] <- 1e+03
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "2"] <- 1e+02
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "" ] <- 1e+00
# Assigning the value of zero to invalid exponent data
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "?"] <- 0.0
# Calculate the value of crop damage
req.storm.data$CROPDMGVAL <- req.storm.data$CROPDMG * req.storm.data$CROPEXP
In this analysis, harm to population health is measured by the FATALITIES and INJURIES variables. Similarly, economic consequences are measured by property damage (PROPDMG) and crop damage (CROPDMG), using the damage values computed above.
The totals of these four measures are calculated for each event type.
total.fatalities <- aggregate(FATALITIES ~ EVTYPE, req.storm.data, FUN = sum)
total.injuries <- aggregate(INJURIES ~ EVTYPE, req.storm.data, FUN = sum)
total.propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, req.storm.data, FUN = sum)
total.cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, req.storm.data, FUN = sum)
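To make the aggregation step concrete, here is a small sketch on made-up data. The `toy` data frame and its values are hypothetical; only the `aggregate()` call pattern matches the code above.

```r
# Toy records (hypothetical values, not from the database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# One row per event type, holding the column total for that type
toy.fatalities <- aggregate(FATALITIES ~ EVTYPE, toy, FUN = sum)
toy.fatalities

# cbind() on the left-hand side sums several columns in one call
toy.both <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, FUN = sum)
toy.both
```

The result is ordered by event type (FLOOD, HEAT, TORNADO), not by the totals, which is why a separate sorting step is needed before picking the leading causes.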
Top ten causes of fatalities and injuries are calculated and plotted. Clearly, TORNADOES are the number one cause of both fatalities and injuries in the United States.
top.fatalities <- total.fatalities[order(-total.fatalities$FATALITIES), ][1:10, ]
top.injuries <- total.injuries[order(-total.injuries$INJURIES), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(top.fatalities$FATALITIES, las = 3, names.arg = top.fatalities$EVTYPE, main = "Main Causes of Fatalities",
ylab = "Number of Fatalities", col = "red")
barplot(top.injuries$INJURIES, las = 3, names.arg = top.injuries$EVTYPE, main = "Main Causes of Injuries",
ylab = "Number of Injuries", col = "red")
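The ranking step (sort in descending order, keep the first ten rows) can be sketched as a small helper. `top.n` and the `totals` data frame are hypothetical names and values used only for illustration. The sketch uses `head()` instead of `[1:10, ]` because indexing past the end of a data frame pads the result with NA rows.

```r
# Toy totals per event type (hypothetical values)
totals <- data.frame(EVTYPE = c("FLOOD", "HEAT", "TORNADO", "WIND"),
                     FATALITIES = c(1, 4, 5, 2))

# Hypothetical helper: sort descending by one column, keep at most n rows
top.n <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

top3 <- top.n(totals, "FATALITIES", n = 3)
top3$EVTYPE                       # TORNADO, HEAT, WIND

# With fewer than ten event types, head() returns only the rows that exist
nrow(top.n(totals, "FATALITIES")) # 4
```

For the storm database the distinction does not matter in practice, since it contains far more than ten event types.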
Top ten causes of property and crop damage are calculated and plotted. Clearly, FLOODS are the number one cause of property damage and DROUGHTS are the number one cause of crop damage in the United States.
top.propdmg <- total.propdmg[order(-total.propdmg$PROPDMGVAL), ][1:10, ]
top.cropdmg <- total.cropdmg[order(-total.cropdmg$CROPDMGVAL), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(top.propdmg$PROPDMGVAL/10^9, las = 3, names.arg = top.propdmg$EVTYPE, main = "Main Causes of Property Damage",
ylab = "Cost of Damage ($ Billions)", col = "blue")
barplot(top.cropdmg$CROPDMGVAL/10^9, las = 3, names.arg = top.cropdmg$EVTYPE, main = "Main Causes of Crop Damage",
ylab = "Cost of Damage ($ Billions)", col = "blue")
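Property and crop damage are ranked separately above. If a single ranking of overall economic damage is wanted, one possible approach is to add the two totals per event type. The sketch below shows this on made-up numbers in arbitrary units; `toy.prop`, `toy.crop`, `toy.econ`, and `TOTALDMG` are hypothetical and the values are not results from the database.

```r
# Toy totals in arbitrary units (hypothetical, not from the database)
toy.prop <- data.frame(EVTYPE = c("DROUGHT", "FLOOD", "HURRICANE"),
                       PROPDMGVAL = c(2, 20, 9))
toy.crop <- data.frame(EVTYPE = c("DROUGHT", "FLOOD", "HURRICANE"),
                       CROPDMGVAL = c(6, 1, 3))

# Join the two tables on event type and add the damage columns
toy.econ <- merge(toy.prop, toy.crop, by = "EVTYPE")
toy.econ$TOTALDMG <- toy.econ$PROPDMGVAL + toy.econ$CROPDMGVAL

# Rank by combined damage, largest first
toy.econ <- toy.econ[order(toy.econ$TOTALDMG, decreasing = TRUE), ]
toy.econ
```

In this toy example the event with the largest crop damage (DROUGHT) is not the one with the largest combined damage (FLOOD), which illustrates why the two rankings can differ.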
TORNADOES are by far the number one cause of both fatalities and injuries in the United States.
FLOODS are the number one cause of property damage and DROUGHTS are the number one cause of crop damage in the United States.