Weather events can cause public health and economic problems. Severe events result in fatalities, injuries, and property damage.
This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, the type of event, and estimates of the associated fatalities, injuries, and various forms of damage.
The analysis examines the damaging effects of severe weather conditions (e.g., thunderstorms and floods) on the population and the economy of the U.S. from 1950 to 2011, and highlights the event types associated with the greatest impact on population health and the economy.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The data was downloaded from the URL provided in the course project page. The file was unzipped and extracted to the working directory manually.
data = read.csv("repdata_data_StormData.csv")
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The dataset contains many variables (columns) that this analysis does not need, so only the relevant columns are kept.
imp = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
"PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
rel_data = data[imp]
Processing property damage data:
We first list the distinct property damage exponent symbols (PROPDMGEXP) and assign a numeric multiplier to each. Invalid symbols (‘+’, ‘-’ and ‘?’) receive a multiplier of ‘0’, which removes those records from the damage totals. The property damage value is then calculated by multiplying the property damage (PROPDMG) by its multiplier.
unique(rel_data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
# Assigning a numeric multiplier to each property exponent symbol
rel_data$PROPEXP[rel_data$PROPDMGEXP == "K"] = 1000
rel_data$PROPEXP[rel_data$PROPDMGEXP == "M"] = 1e+06
rel_data$PROPEXP[rel_data$PROPDMGEXP == ""] = 1
rel_data$PROPEXP[rel_data$PROPDMGEXP == "B"] = 1e+09
rel_data$PROPEXP[rel_data$PROPDMGEXP == "m"] = 1e+06
rel_data$PROPEXP[rel_data$PROPDMGEXP == "0"] = 1
rel_data$PROPEXP[rel_data$PROPDMGEXP == "5"] = 1e+05
rel_data$PROPEXP[rel_data$PROPDMGEXP == "6"] = 1e+06
rel_data$PROPEXP[rel_data$PROPDMGEXP == "4"] = 10000
rel_data$PROPEXP[rel_data$PROPDMGEXP == "2"] = 100
rel_data$PROPEXP[rel_data$PROPDMGEXP == "3"] = 1000
rel_data$PROPEXP[rel_data$PROPDMGEXP == "h"] = 100
rel_data$PROPEXP[rel_data$PROPDMGEXP == "7"] = 1e+07
rel_data$PROPEXP[rel_data$PROPDMGEXP == "H"] = 100
rel_data$PROPEXP[rel_data$PROPDMGEXP == "1"] = 10
rel_data$PROPEXP[rel_data$PROPDMGEXP == "8"] = 1e+08
# Assigning '0' to invalid exponent symbols
rel_data$PROPEXP[rel_data$PROPDMGEXP == "+"] = 0
rel_data$PROPEXP[rel_data$PROPDMGEXP == "-"] = 0
rel_data$PROPEXP[rel_data$PROPDMGEXP == "?"] = 0
# Calculating the property damage value
rel_data$PROPDMGVAL = rel_data$PROPDMG * rel_data$PROPEXP
Processing crop damage data:
We will use the same process for crop damage data.
unique(rel_data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# Assigning a numeric multiplier to each crop exponent symbol
rel_data$CROPEXP[rel_data$CROPDMGEXP == "M"] = 1e+06
rel_data$CROPEXP[rel_data$CROPDMGEXP == "K"] = 1000
rel_data$CROPEXP[rel_data$CROPDMGEXP == "m"] = 1e+06
rel_data$CROPEXP[rel_data$CROPDMGEXP == "B"] = 1e+09
rel_data$CROPEXP[rel_data$CROPDMGEXP == "0"] = 1
rel_data$CROPEXP[rel_data$CROPDMGEXP == "k"] = 1000
rel_data$CROPEXP[rel_data$CROPDMGEXP == "2"] = 100
rel_data$CROPEXP[rel_data$CROPDMGEXP == ""] = 1
# Assigning '0' to invalid exponent symbols
rel_data$CROPEXP[rel_data$CROPDMGEXP == "?"] = 0
# Calculating the crop damage value
rel_data$CROPDMGVAL = rel_data$CROPDMG * rel_data$CROPEXP
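The symbol-to-multiplier mapping used above for both property and crop damage can also be written as a single lookup. The following is only a minimal sketch: the helper name `exp_to_multiplier` and the toy rows are hypothetical and not part of the original analysis. It uses `match()` rather than indexing a named vector, because an empty string never matches a name in R.

```r
# Hypothetical helper (not part of the original analysis): convert exponent
# symbols into numeric multipliers, following the same mapping as the code above.
exp_to_multiplier <- function(sym) {
  keys <- c("", "0", "1", "2", "3", "4", "5", "6", "7", "8",
            "H", "h", "K", "k", "M", "m", "B")
  vals <- c(1, 1, 10, 100, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
            100, 100, 1e3, 1e3, 1e6, 1e6, 1e9)
  mult <- vals[match(sym, keys)]  # NA for any symbol that is not in `keys`
  mult[is.na(mult)] <- 0          # "+", "-", "?" and anything else count as invalid
  mult
}

# Toy rows, invented for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 3, 10),
                  PROPDMGEXP = c("K", "M", "", "?"),
                  stringsAsFactors = FALSE)
toy$PROPDMGVAL <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

In this toy example, 25 with exponent "K" becomes 25,000 USD, and the row with "?" contributes 0. Unlike the step-by-step assignments, a symbol that was never listed also ends up as 0 rather than NA, which may or may not be the desired behavior.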
To find the ten event types that caused the most fatalities, we sum the fatalities for each event type with the aggregate() function. We then use the dplyr package’s %>% (chaining) operator and arrange() function to sort the rows in descending order, and keep the first ten.
fatalities = aggregate(FATALITIES ~ EVTYPE, data = rel_data, sum)
ordered_fatalities = fatalities %>% arrange(desc(FATALITIES))
t10f = ordered_fatalities[1:10,]
t10f
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
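To make the aggregate-then-rank step concrete, here is a small sketch on invented data. It uses base R's `order()` in place of `arrange(desc(...))`, purely so the example runs without any packages; the toy values are assumptions and do not come from the storm database.

```r
# Toy event records, invented for illustration only
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type (one output row per EVTYPE)
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, sum)

# Sort by descending total and keep the first three rows
ranked <- totals[order(-totals$FATALITIES), ]
top3 <- head(ranked, 3)
print(top3)
```

Here TORNADO ranks first with 3 + 2 = 5 fatalities, followed by HEAT (4) and FLOOD (1).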
Next, we do the same to find the event types that caused the most injuries.
injuries = aggregate(INJURIES ~ EVTYPE, data = rel_data, sum)
ordered_injuries = injuries %>% arrange(desc(INJURIES))
t10i = ordered_injuries[1:10,]
t10i
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
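The table lists "TSTM WIND" and "THUNDERSTORM WIND" as separate event types, because the totals are grouped by the exact EVTYPE string. As a rough sketch of how merging such labels would affect the ranking, the example below re-aggregates the ten totals shown above. It assumes that "TSTM WIND" is an abbreviation of "THUNDERSTORM WIND"; it covers only these ten rows, not every label variant in the full dataset.

```r
# Injury totals copied from the top-10 table above
injuries10 <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
             "HEAT", "ICE STORM", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL"),
  INJURIES = c(91346, 6957, 6789, 6525, 5230, 2100, 1975, 1777, 1488, 1361),
  stringsAsFactors = FALSE
)

# ASSUMPTION: "TSTM WIND" and "THUNDERSTORM WIND" describe the same event type
injuries10$EVTYPE[injuries10$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"

# Re-aggregate after relabelling and sort by descending total
merged <- aggregate(INJURIES ~ EVTYPE, data = injuries10, sum)
merged <- merged[order(-merged$INJURIES), ]
print(head(merged, 3))
```

Under that assumption, the combined thunderstorm wind total is 6957 + 1488 = 8445 injuries, which still ranks second after tornadoes.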
Now, we can plot our findings:
par(mfrow = c(1,2), mar = c(11, 5, 3, 2), mgp = c(3,1,0), cex = 0.8)
barplot(t10f$FATALITIES, names.arg = t10f$EVTYPE, col = "red",
las = 3,
ylab = "Fatalities",
main = "Top 10 Fatalities")
barplot(t10i$INJURIES, names.arg = t10i$EVTYPE, col = "yellow",
las = 3,
ylab = "Injuries",
main = "Top 10 Injuries")
Figure 1: Top 10 event types by number of fatalities (left) and injuries (right).
Property damage and crop damage are the two forms of economic damage recorded in the data. The PROPDMG and CROPDMG columns hold the base amounts, and the PROPDMGEXP and CROPDMGEXP columns hold the corresponding magnitude symbols. These were already combined into dollar values (PROPDMGVAL and CROPDMGVAL) in the Data Processing section.
We repeat the steps used for fatalities and injuries.
Finding events causing highest property damage:
prop = aggregate(PROPDMGVAL ~ EVTYPE, data = rel_data, sum)
ordered_prop = prop %>% arrange(desc(PROPDMGVAL))
t10p = ordered_prop[1:10,]
t10p
## EVTYPE PROPDMGVAL
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56947380617
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16822673979
## 6 HAIL 15735267513
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046260
Finding events causing highest crop damage:
crop = aggregate(CROPDMGVAL ~ EVTYPE, data = rel_data, sum)
ordered_crop = crop %>% arrange(desc(CROPDMGVAL))
t10c = ordered_crop[1:10,]
t10c
## EVTYPE CROPDMGVAL
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954473
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
Now, we plot our findings:
par(mfrow = c(1,2), mar = c(11, 5, 3, 2), mgp = c(3,1,0), cex = 0.8)
barplot(t10p$PROPDMGVAL/(10^9), names.arg = t10p$EVTYPE, col = "red",
las = 3,
ylab = "Property Damage (Billion USD)",
main = "Top 10 Property Damages")
barplot(t10c$CROPDMGVAL/(10^9), names.arg = t10c$EVTYPE, col = "yellow",
las = 3,
ylab = "Crop Damage (Billion USD)",
main = "Top 10 Crop Damages")
Figure 2: Top 10 event types by property damage (left) and crop damage (right), in billion USD.
Tornadoes caused the most fatalities (5,633) and the most injuries (91,346). Excessive heat ranks second for fatalities (1,903), and thunderstorm winds (recorded as TSTM WIND) rank second for injuries (6,957).
Floods caused the most property damage (about 144.7 billion USD), while droughts caused the most crop damage (about 14.0 billion USD).