Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The aim of this report is to explore the NOAA storm database, which covers extreme weather events from 1950 to November 2011, and to answer the following two questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Main conclusions of the study:
1. Tornadoes are the most harmful event type for population health, with more than 5600 deaths and 91400 injuries.
2. Floods cause the greatest economic damage, at more than 157 billion USD.
#Load the data
The data for this report are available here.
Some documentation of the variables is available here.
First, let’s download the compressed data file if it is not already in the working directory.
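# Download the compressed file only if it is not already present
if (!file.exists("repdata_data_StormData.csv.bz2")) {
    print("Downloading File.....")
    # mode = "wb" keeps the binary .bz2 file intact on Windows
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata_data_StormData.csv.bz2", mode = "wb")
}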
Then, let’s read the CSV file and store the data in a data frame called storm. The bzfile() connection decompresses the file as it is read.
if (!"storm" %in% ls()) {
storm <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), sep = ",", header = TRUE, stringsAsFactors = FALSE)
}
dim(storm)
## [1] 902297 37
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
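Because `bzfile()` opens a decompressing connection, `read.csv()` can read a `.bz2` file directly and no separate unzip step is needed. The round trip below is a minimal sketch of that behaviour on a small temporary file; the file contents and the name `demo` are made up for illustration.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (hypothetical contents).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# Read it back through a bzfile() connection, as done for the storm data.
demo <- read.csv(bzfile(tmp), sep = ",", header = TRUE, stringsAsFactors = FALSE)
print(demo)

# Remove the temporary file.
unlink(tmp)
```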
According to the documentation, there are 48 types of events, which we store in the vector “events”.
events <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
In addition, some recorded events are combined events or use variant names. We therefore use regular expressions to match each standard event type against the raw EVTYPE values.
events_regex <- c("Astronomical Low Tide|Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill|Extreme Cold|Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze|Frost|Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon|Hurricane|Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind|Marine tstm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind|tstm wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
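To see how these patterns behave, here is a minimal sketch on a few made-up EVTYPE strings (the `sample_types` vector is illustrative and not taken from the database). It applies `grepl()` with `ignore.case = TRUE`, the same kind of matching the extraction step relies on.

```r
# Hypothetical raw EVTYPE values, chosen only to illustrate the matching.
sample_types <- c("TSTM WIND", "THUNDERSTORM WIND/HAIL", "HURRICANE OPAL", "FLASH FLOOD")

# Two of the patterns defined above.
pat_tstm <- "Thunderstorm Wind|tstm wind"
pat_hurricane <- "Hurricane/Typhoon|Hurricane|Typhoon"

# grepl() returns one logical per input string; case is ignored.
is_tstm <- grepl(pat_tstm, sample_types, ignore.case = TRUE)
is_hurricane <- grepl(pat_hurricane, sample_types, ignore.case = TRUE)

print(data.frame(EVTYPE = sample_types, is_tstm, is_hurricane))
```

The alternation with `|` is what lets an abbreviated name such as "TSTM WIND" fall under the standard type "Thunderstorm Wind".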
Let’s extract the columns of storm that are relevant for our analysis. These are the following:
EVTYPE: Type of event
FATALITIES: Number of fatalities
INJURIES: Number of injuries
PROPDMG: Property damage amount, to be scaled by PROPDMGEXP
PROPDMGEXP: Order of magnitude for property damage
CROPDMG: Crop damage amount, to be scaled by CROPDMGEXP
CROPDMGEXP: Order of magnitude for crop damage
# Start with an empty data frame and append the rows matched by each pattern
newdata <- data.frame(EVTYPE = character(0), FATALITIES = numeric(0), INJURIES = numeric(0), PROPDMG = numeric(0), PROPDMGEXP = character(0), CROPDMG = numeric(0), CROPDMGEXP = character(0))
for (i in seq_along(events)) {
    # Rows whose EVTYPE matches the i-th pattern, ignoring case
    rows <- storm[grep(events_regex[i], storm$EVTYPE, ignore.case = TRUE), ]
    rows <- rows[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
    # Label the matched rows with the standard event name
    CLEANNAME <- c(rep(events[i], nrow(rows)))
    rows <- cbind(rows, CLEANNAME)
    newdata <- rbind(newdata, rows)
}
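One property of this loop is worth checking: each pattern is matched independently, so a raw event name that matches several patterns is copied once per pattern. The sketch below counts matches per name on made-up names and a subset of the patterns; `sample_types`, `patterns`, and `matches_per_name` are illustrative names, not part of the analysis.

```r
# Hypothetical raw names and a subset of the patterns from events_regex.
sample_types <- c("FLASH FLOOD", "FLOOD", "EXCESSIVE HEAT", "TORNADO")
patterns <- c("Flash Flood", "Flood", "Excessive Heat", "Heat", "Tornado")

# One row per name, one column per pattern; TRUE where the pattern matches.
hits <- sapply(patterns, function(p) grepl(p, sample_types, ignore.case = TRUE))
rownames(hits) <- sample_types

# Number of patterns each name matches, i.e. how many copies of that row
# a loop over these patterns would append.
matches_per_name <- rowSums(hits)
print(matches_per_name)
```

In this toy case "FLASH FLOOD" also counts toward Flood and "EXCESSIVE HEAT" toward Heat, so totals for a broader category include those of the narrower one.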
The orders of magnitude for property and crop damage are labelled with letters such as H, K, M and B. We convert K, M and B (in either case) to the exponents 3, 6 and 9 as shown below; other labels are left unchanged.
newdata[(newdata$PROPDMGEXP == "K" | newdata$PROPDMGEXP == "k"), ]$PROPDMGEXP <- 3
newdata[(newdata$PROPDMGEXP == "M" | newdata$PROPDMGEXP == "m"), ]$PROPDMGEXP <- 6
newdata[(newdata$PROPDMGEXP == "B" | newdata$PROPDMGEXP == "b"), ]$PROPDMGEXP <- 9
newdata[(newdata$CROPDMGEXP == "K" | newdata$CROPDMGEXP == "k"), ]$CROPDMGEXP <- 3
newdata[(newdata$CROPDMGEXP == "M" | newdata$CROPDMGEXP == "m"), ]$CROPDMGEXP <- 6
newdata[(newdata$CROPDMGEXP == "B" | newdata$CROPDMGEXP == "b"), ]$CROPDMGEXP <- 9
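The same recoding can be sketched with a named lookup vector, which keeps the letter-to-exponent mapping in one place. This is an alternative under stated assumptions, not the method used above: `exp_lookup`, `labels`, and `powers` are hypothetical names, and unlike the assignments above this version turns every label missing from the lookup into NA.

```r
# Lookup from exponent letter to power of ten (only the letters handled above).
exp_lookup <- c(K = 3, M = 6, B = 9)

# Hypothetical exponent labels: upper case, lower case, blank, and an unhandled letter.
labels <- c("K", "m", "B", "", "H")

# toupper() makes the match case-insensitive; labels not in the lookup give NA.
powers <- unname(exp_lookup[toupper(labels)])
print(powers)
```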
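Next, let’s scale the property and crop damage amounts by their powers of ten. Labels that are still non-numeric after the recoding (for example H or a blank) yield NA here.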
suppressWarnings(newdata$PROPDMG <- newdata$PROPDMG * 10^as.numeric(newdata$PROPDMGEXP))
suppressWarnings(newdata$CROPDMG <- newdata$CROPDMG * 10^as.numeric(newdata$CROPDMGEXP))
The total economic damage is the sum of the property damage and the crop damage.
suppressWarnings(TOTECODMG <- newdata$PROPDMG + newdata$CROPDMG)
newdata <- cbind(newdata, TOTECODMG)
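A small worked example helps to follow the arithmetic and the handling of missing exponents. It is a sketch on made-up rows (the `toy` data frame is hypothetical): an amount of 25 with exponent 3 becomes 25 * 10^3 = 25000, while a blank exponent converts to NA and makes the row's total NA as well. The formula interface of `aggregate()` omits NA rows by default, which may be why a category's total can come out lower than its property damage alone.

```r
# Hypothetical rows, with exponent labels as they look after the K/M/B recoding.
toy <- data.frame(
  CLEANNAME = c("Flood", "Flood", "Tornado"),
  PROPDMG = c(25, 2.5, 10),
  PROPDMGEXP = c("3", "6", "3"),
  CROPDMG = c(0, 1, 0),
  CROPDMGEXP = c("", "3", "3"),
  stringsAsFactors = FALSE
)

# Scale each amount by its power of ten; "" is not numeric, so it becomes NA.
prop <- toy$PROPDMG * 10^as.numeric(toy$PROPDMGEXP)
crop <- toy$CROPDMG * 10^as.numeric(toy$CROPDMGEXP)
toy$PROPDMG <- prop
toy$TOTECODMG <- prop + crop

# aggregate() with a formula drops rows where the summed variable is NA.
prop_sum <- aggregate(PROPDMG ~ CLEANNAME, data = toy, FUN = sum)
total_sum <- aggregate(TOTECODMG ~ CLEANNAME, data = toy, FUN = sum)
print(prop_sum)
print(total_sum)
```

In this toy case the first Flood row has a blank crop exponent, so it contributes to the property sum but not to the total.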
Let’s delete the columns ‘PROPDMGEXP’ and ‘CROPDMGEXP’ which are not needed now that we have the total damage variable.
newdata <- newdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG", "CLEANNAME", "TOTECODMG")]
We are now ready to start answering the two questions.
##Results
Let’s aggregate the data for fatalities.
fatalities <- aggregate(FATALITIES ~ CLEANNAME, data = newdata, FUN = sum)
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
# 10 most harmful causes of fatalities
MaxFatalities <- fatalities[1:10, ]
print(MaxFatalities)
## CLEANNAME FATALITIES
## 38 Tornado 5661
## 19 Heat 3138
## 11 Excessive Heat 1922
## 14 Flood 1525
## 13 Flash Flood 1035
## 28 Lightning 817
## 37 Thunderstorm Wind 753
## 33 Rip Current 577
## 12 Extreme cold/Wind Chill 382
## 23 High Wind 299
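The aggregate, sort, and truncate pattern used here can be followed on a toy table. This is a minimal sketch with made-up numbers; `toy`, `totals`, and `top2` are illustrative names.

```r
# Hypothetical per-event records.
toy <- data.frame(
  CLEANNAME = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  FATALITIES = c(3, 5, 4, 1, 1)
)

# Sum fatalities within each event type.
totals <- aggregate(FATALITIES ~ CLEANNAME, data = toy, FUN = sum)

# Sort from most to least fatalities and keep the top two.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top2 <- head(totals, 2)
print(top2)
```

The row labels in the printed result are the positions from before sorting, which is why the labels in the output above do not run from 1 to 10.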
Let’s do the same for injuries.
injuries <- aggregate(INJURIES ~ CLEANNAME, data = newdata, FUN = sum)
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
# 10 most harmful causes of injuries
MaxInjuries <- injuries[1:10, ]
print(MaxInjuries)
## CLEANNAME INJURIES
## 38 Tornado 91407
## 37 Thunderstorm Wind 9493
## 19 Heat 9224
## 14 Flood 8604
## 11 Excessive Heat 6525
## 28 Lightning 5232
## 25 Ice Storm 1992
## 13 Flash Flood 1802
## 23 High Wind 1523
## 18 Hail 1467
Let’s plot a pair of graphs of Total Fatalities and Total Injuries caused by these natural events.
par(mfrow = c(1, 2), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(MaxFatalities$FATALITIES, las = 3, names.arg = MaxFatalities$CLEANNAME, main = "Events with\n The Top 10 Highest Fatalities", ylab = "Number of Fatalities", col = "blue")
barplot(MaxInjuries$INJURIES, las = 3, names.arg = MaxInjuries$CLEANNAME, main = "Events with\n The Top 10 Highest Injuries", ylab = "Number of Injuries", col = "blue")
Based on the bar plots above, tornadoes and heat have caused the most fatalities, and tornadoes have caused by far the most injuries across the United States between 1950 and 2011.
As with the impact on public health, we aggregate the damage figures by event type and sort them to assess the economic impact.
First, let’s aggregate the data for property damage.
propdmg <- aggregate(PROPDMG ~ CLEANNAME, data = newdata, FUN = sum)
propdmg <- propdmg[order(propdmg$PROPDMG, decreasing = TRUE), ]
# 10 events with the greatest property damage
propdmgMax <- propdmg[1:10, ]
print(propdmgMax)
## CLEANNAME PROPDMG
## 14 Flood 168212215589
## 24 Hurricane/Typhoon 85356410010
## 38 Tornado 58603317864
## 18 Hail 17622990956
## 13 Flash Flood 17588791879
## 37 Thunderstorm Wind 11575228673
## 40 Tropical Storm 7714390550
## 45 Winter Storm 6749997251
## 23 High Wind 6166300000
## 44 Wildfire 4865614000
Let’s do the same with the data for crop damage.
cropdmg <- aggregate(CROPDMG ~ CLEANNAME, data = newdata, FUN = sum)
cropdmg <- cropdmg[order(cropdmg$CROPDMG, decreasing = TRUE), ]
# 10 events with the greatest crop damage
cropdmgMax <- cropdmg[1:10, ]
print(cropdmgMax)
## CLEANNAME CROPDMG
## 8 Drought 13972621780
## 14 Flood 12380109100
## 24 Hurricane/Typhoon 5516117800
## 25 Ice Storm 5022113500
## 18 Hail 3114212870
## 16 Frost/Freeze 1997061000
## 13 Flash Flood 1532197150
## 12 Extreme cold/Wind Chill 1313623000
## 37 Thunderstorm Wind 1255947980
## 19 Heat 904469280
Finally, we aggregate the total economic damage.
ecodmg <- aggregate(TOTECODMG ~ CLEANNAME, data = newdata, FUN = sum)
ecodmg <- ecodmg[order(ecodmg$TOTECODMG, decreasing = TRUE), ]
The 10 events with the greatest total economic damage are:
ecodmgMax <- ecodmg[1:10, ]
print(ecodmgMax)
## CLEANNAME TOTECODMG
## 14 Flood 157764680787
## 24 Hurricane/Typhoon 44330000800
## 38 Tornado 18172843863
## 18 Hail 11681050140
## 13 Flash Flood 9224527227
## 37 Thunderstorm Wind 7098296330
## 25 Ice Storm 5925150850
## 44 Wildfire 3685468370
## 23 High Wind 3472442200
## 8 Drought 1886667000
Let’s plot the graphs of total property damages, total crop damages and total economic damages caused by these natural events.
par(mfrow = c(1, 3), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmgMax$PROPDMG/(10^9), las = 3, names.arg = propdmgMax$CLEANNAME, main = "Top 10 Events with\nGreatest Property Damages", ylab = "Cost of damages (in $ billions)", col = "blue")
barplot(cropdmgMax$CROPDMG/(10^9), las = 3, names.arg = cropdmgMax$CLEANNAME, main = "Top 10 Events with\nGreatest Crop Damages", ylab = "Cost of damages (in $ billions)", col = "blue")
barplot(ecodmgMax$TOTECODMG/(10^9), las = 3, names.arg = ecodmgMax$CLEANNAME, main = "Top 10 Events with\nGreatest Economic Damages", ylab = "Cost of damages (in $ billions)", col = "blue")
By total economic damage, the events with the greatest consequences are Flood, Hurricane/Typhoon, Tornado and Hail.
Across the United States, Flood, Hurricane/Typhoon and Tornado have caused the greatest damage to property.
Drought and Flood have caused the greatest damage to crops.
Main conclusions of the study:
1. Tornadoes are the most harmful event type for population health, with more than 5600 deaths and 91400 injuries.
2. Floods cause the greatest economic damage, at more than 157 billion USD.