This report analyses the effect that severe weather events can have on public health and the economy in the U.S. The analysis is based on historical data collected from 1950 to November 2011 in the Storm Data database of the U.S. National Oceanic and Atmospheric Administration (NOAA). This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property and crop damage. From this data, we found that tornadoes cause the most harm to public health, whereas floods cause the most property damage and droughts cause the most crop damage.
The data used for this analysis comes from the Storm Events Database as a comma-separated-value file compressed with bzip2. Documentation for the database is available from the National Weather Service.
Load the required libraries.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(gridExtra)
Load the data with read.csv(), which reads the bzip2-compressed file directly, so no manual decompression is needed.
storm_data <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, stringsAsFactors = FALSE)
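To check that read.csv() really handles the compression on its own, the sketch below writes a tiny CSV into a temporary bzip2 file with bzfile() and reads it back with a plain read.csv() call. The file contents are toy values chosen for illustration, not rows from Storm Data.

```r
# Write a tiny CSV into a temporary bzip2-compressed file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects bzip2 compression
# when reading, so the compressed file can be read like a plain CSV.
demo <- read.csv(tmp, stringsAsFactors = FALSE)
str(demo)
```

The same mechanism is what lets the single read.csv() call above load repdata-data-StormData.csv.bz2 without an explicit decompression step.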
The dim() function shows that the data contains 902,297 observations of 37 variables.
dim(storm_data)
## [1] 902297 37
Inspect the structure of the dataset with str().
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Display the first six rows of the dataset with head().
head(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
List all variable names with names() to determine which columns are needed to answer the questions.
names(storm_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
The dataset contains 37 variables, but this analysis requires only 8 of them.
We start by creating a dataset with the 8 required variables: STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. We then sum the fatalities and injuries for each event type (EVTYPE).
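# Keep only the columns needed for the health and economic questions
data <- storm_data[, c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Total fatalities and injuries per event type
deaths <- aggregate(FATALITIES ~ EVTYPE, data = data, FUN = sum)
injuries <- aggregate(INJURIES ~ EVTYPE, data = data, FUN = sum)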
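The two aggregate() calls can also be merged into one by putting both columns on the left-hand side with cbind(). The sketch below uses a three-row toy data frame (illustrative values, not Storm Data) to show the shape of the result. One difference worth knowing: with cbind(), a row is dropped if any of the listed columns is NA, not just the one being summed.

```r
# Toy data with the same column names as the storm data subset.
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "HAIL"),
                  FATALITIES = c(1, 2, 0),
                  INJURIES = c(15, 0, 2),
                  stringsAsFactors = FALSE)

# Sum both health columns per event type in a single call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
health
```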
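Convert PROPDMG and CROPDMG to values on the same scale (US dollars). The exponent columns PROPDMGEXP and CROPDMGEXP give the order of magnitude of each value, e.g. "K" for thousands, "M" for millions and "B" for billions.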
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# Map each exponent code to a multiplier; codes not listed below (e.g. "") stay NA
data$PROPDMGMUL <- NA_real_
data$PROPDMGMUL[data$PROPDMGEXP == "b" | data$PROPDMGEXP == "B"] <- 1e+09
data$PROPDMGMUL[data$PROPDMGEXP == "m" | data$PROPDMGEXP == "M"] <- 1e+06
data$PROPDMGMUL[data$PROPDMGEXP == "k" | data$PROPDMGEXP == "K"] <- 1e+03
data$PROPDMGMUL[data$PROPDMGEXP == "h" | data$PROPDMGEXP == "H"] <- 100
# Digit codes n are read as 10^n
data$PROPDMGMUL[data$PROPDMGEXP == "8"] <- 1e+08
data$PROPDMGMUL[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPDMGMUL[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPDMGMUL[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPDMGMUL[data$PROPDMGEXP == "4"] <- 1e+04
data$PROPDMGMUL[data$PROPDMGEXP == "3"] <- 1e+03
data$PROPDMGMUL[data$PROPDMGEXP == "2"] <- 100
data$PROPDMGMUL[data$PROPDMGEXP == "1"] <- 10
data$PROPDMGMUL[data$PROPDMGEXP == "0"] <- 1
# Symbol codes
data$PROPDMGMUL[data$PROPDMGEXP == "-"] <- 1
data$PROPDMGMUL[data$PROPDMGEXP == "+"] <- 0
data$PROPDMGMUL[data$PROPDMGEXP == "?"] <- 0
# Property damage in US dollars
data$PROPDMGVAL <- data$PROPDMG * data$PROPDMGMUL
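The chain of assignments above can also be expressed as a single named lookup vector. The sketch below is one way to do it; `exp_multiplier` is a hypothetical helper name, and the mapping is assumed to be the same as the property-damage code (digits n mean 10^n, h/k/m/b mean hundreds, thousands, millions and billions, "-" means 1, "+" and "?" mean 0), with letter codes matched case-insensitively.

```r
# Hypothetical helper: map damage-exponent codes to numeric multipliers.
# Any code not in the table (for example the empty string) maps to NA.
exp_multiplier <- function(code) {
  lookup <- c(
    h = 1e2, k = 1e3, m = 1e6, b = 1e9,    # letter codes, lower case
    "-" = 1, "+" = 0, "?" = 0,             # symbol codes
    setNames(10^(0:8), as.character(0:8))  # digit n means 10^n
  )
  # tolower() makes "K"/"k", "M"/"m", etc. hit the same entry
  unname(lookup[tolower(code)])
}

exp_multiplier(c("K", "m", "5", "", "?"))  # values 1e3, 1e6, 1e5, NA, 0
```

With this helper, the property conversion becomes `data$PROPDMGVAL <- data$PROPDMG * exp_multiplier(data$PROPDMGEXP)`. A lookup table keeps the whole mapping in one place, which makes a typo such as a repeated "M" condition easier to spot.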
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# Same mapping for crop damage exponents; codes not listed below (e.g. "") stay NA
data$CROPDMGMUL <- NA_real_
# Digit codes n are read as 10^n, as for property damage
data$CROPDMGMUL[data$CROPDMGEXP == "2"] <- 100
data$CROPDMGMUL[data$CROPDMGEXP == "0"] <- 1
data$CROPDMGMUL[data$CROPDMGEXP == "?"] <- 0
data$CROPDMGMUL[data$CROPDMGEXP == "b" | data$CROPDMGEXP == "B"] <- 1e+09
data$CROPDMGMUL[data$CROPDMGEXP == "m" | data$CROPDMGEXP == "M"] <- 1e+06
data$CROPDMGMUL[data$CROPDMGEXP == "k" | data$CROPDMGEXP == "K"] <- 1e+03
# Crop damage in US dollars
data$CROPDMGVAL <- data$CROPDMG * data$CROPDMGMUL
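# Total damage per event type; rows with an NA value are dropped by aggregate()'s default na.action
property_damage <- aggregate(PROPDMGVAL ~ EVTYPE, data = data, FUN = sum)
crop_damage <- aggregate(CROPDMGVAL ~ EVTYPE, data = data, FUN = sum)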
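Because some exponent codes (for example the empty string) are not assigned a multiplier, the matching PROPDMGVAL and CROPDMGVAL entries are NA. The formula interface of aggregate() uses `na.action = na.omit` by default, so those rows are silently left out of the totals. The toy example below (illustrative values only) contrasts this with tapply(), where sum() has to be told to skip NAs.

```r
# Toy damage values; the second FLOOD row has an unmapped exponent (NA value).
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
                  VAL = c(100, NA, 50),
                  stringsAsFactors = FALSE)

# Formula interface: the NA row is dropped before summing.
by_formula <- aggregate(VAL ~ EVTYPE, data = toy, FUN = sum)

# tapply() keeps the NA row, so a plain sum() returns NA for FLOOD ...
raw_sum <- tapply(toy$VAL, toy$EVTYPE, sum)
# ... unless na.rm = TRUE is passed through to sum().
by_tapply <- tapply(toy$VAL, toy$EVTYPE, sum, na.rm = TRUE)
```

Either way the NA rows contribute nothing to the totals, which is equivalent to giving the unmapped codes a multiplier of 0.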
The following table shows the top 20 severe weather events that caused the most fatalities.
deaths_20 <- arrange(deaths, desc(FATALITIES))[1:20,]
deaths_20
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
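The `[1:20,]` subset assumes the table has at least 20 rows; that holds here, but if it did not, row indexing would pad the result with NA rows. The base R sketch below, using three fatality totals copied from the table above, shows the difference and uses head() as a safer way to take the top N.

```r
# Fatality totals for three event types, copied from the table above.
totals <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "LIGHTNING"),
                     FATALITIES = c(470, 5633, 816),
                     stringsAsFactors = FALSE)

# Sort in descending order (base R equivalent of arrange(desc(FATALITIES))).
ranked <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# Indexing past the last row pads the result with NA rows ...
padded <- ranked[1:5, ]
# ... whereas head() returns at most the rows that exist.
top <- head(ranked, 5)
```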
g1 <- ggplot(data = deaths_20, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity", fill = heat.colors(20), color = "black") +
  labs(x = "", y = "Number of Fatalities",
       title = "Top 20 Weather Events by Fatalities in the U.S. (1950 - 2011)") +
  scale_y_continuous(breaks = seq(0, 6000, by = 1000)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(g1)
The plot shows the total number of fatalities for the top 20 weather events in descending order. Tornadoes caused by far the most fatalities (5,633), nearly three times as many as excessive heat (1,903).
The following table shows the top 20 severe weather events that caused the most injuries.
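EVTYPE contains several labels for closely related events, for example EXCESSIVE HEAT, HEAT and HEAT WAVE. As a quick sanity check, the sketch below sums the totals from the table above under a hypothetical grouping of related labels. It only covers labels that appear in the top 20, so the grouped figures are lower bounds, but they suggest tornadoes stay first even when the heat labels are combined.

```r
# Fatality totals copied from the top-20 table above.
fatalities <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "HEAT" = 937,
                "HEAT WAVE" = 172, "FLASH FLOOD" = 978, "FLOOD" = 470)

# Hypothetical grouping of related labels into broader categories.
group <- c("TORNADO" = "TORNADO",
           "EXCESSIVE HEAT" = "HEAT", "HEAT" = "HEAT", "HEAT WAVE" = "HEAT",
           "FLASH FLOOD" = "FLOOD", "FLOOD" = "FLOOD")

# Sum the totals within each broader category and rank them.
grouped <- tapply(fatalities, group[names(fatalities)], sum)
sort(grouped, decreasing = TRUE)
```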
injuries_20 <- arrange(injuries, desc(INJURIES))[1:20,]
injuries_20
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
## 11 WINTER STORM 1321
## 12 HURRICANE/TYPHOON 1275
## 13 HIGH WIND 1137
## 14 HEAVY SNOW 1021
## 15 WILDFIRE 911
## 16 THUNDERSTORM WINDS 908
## 17 BLIZZARD 805
## 18 FOG 734
## 19 WILD/FOREST FIRE 545
## 20 DUST STORM 440
g2 <- ggplot(data = injuries_20, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity", fill = topo.colors(20), color = "black") +
  labs(x = "", y = "Number of Injuries",
       title = "Top 20 Weather Events by Injuries in the U.S. (1950 - 2011)") +
  scale_y_continuous(breaks = seq(0, 90000, by = 10000)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(g2)
The plot shows the total number of injuries for the top 20 weather events in descending order. Tornadoes caused by far the most injuries (91,346), more than 13 times as many as the next event type, TSTM WIND (6,957).
The following table shows the top 10 severe weather events that caused the most property damage, in USD.
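The plotting code for the fatality and injury charts differs only in the data, the value column, the labels and the palette. A minimal sketch of a helper that removes this duplication is shown below, assuming ggplot2 3.0 or later for the `.data` pronoun. `plot_top_events` is a hypothetical name, and it uses a single fill colour instead of the per-bar palettes above.

```r
library(ggplot2)

# Hypothetical helper: bar chart of a ranked summary table.
# df          - data frame with an EVTYPE column and a numeric value column
# value_col   - name of the value column as a string, e.g. "INJURIES"
# ylab, title - axis label and plot title
# scale       - divisor applied to the values (e.g. 1e9 for billions of USD)
plot_top_events <- function(df, value_col, ylab, title, scale = 1) {
  ggplot(df, aes(x = reorder(EVTYPE, -.data[[value_col]]),
                 y = .data[[value_col]] / scale)) +
    geom_col(fill = "steelblue", color = "black") +
    labs(x = "", y = ylab, title = title) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Example with the first three rows of the injuries table above.
injuries_top <- data.frame(EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD"),
                           INJURIES = c(91346, 6957, 6789))
p <- plot_top_events(injuries_top, "INJURIES", "Number of Injuries",
                     "Top weather events by injuries")
```

Calling `print(p)` draws the chart. Passing the column name as a string keeps the helper usable for FATALITIES, INJURIES, PROPDMGVAL and CROPDMGVAL alike.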
property_10 <- arrange(property_damage, desc(PROPDMGVAL))[1:10,]
property_10
## EVTYPE PROPDMGVAL
## 1 FLOOD 144657709800
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56935880614
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16822673772
## 6 HAIL 15730367456
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046275
The following table shows the top 10 severe weather events that caused the most crop damage, in USD.
crop_10 <- arrange(crop_damage, desc(CROPDMGVAL))[1:10,]
crop_10
## EVTYPE CROPDMGVAL
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954470
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
Plot the top 10 events that caused the most property and crop damage in the U.S. from 1950 to 2011.
p1 <- ggplot(data = property_10, aes(x = reorder(EVTYPE, -PROPDMGVAL), y = PROPDMGVAL / 1e+09)) +
  geom_bar(stat = "identity", fill = topo.colors(10), color = "black") +
  labs(x = "", y = "Cost of Property Damage ($ Billions)",
       title = "Top 10 Weather Events Causing Highest Property Damage") +
  scale_y_continuous(breaks = seq(0, 150, by = 10)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(size = 8))
p2 <- ggplot(data = crop_10, aes(x = reorder(EVTYPE, -CROPDMGVAL), y = CROPDMGVAL / 1e+09)) +
  geom_bar(stat = "identity", fill = terrain.colors(10), color = "black") +
  labs(x = "", y = "Cost of Crop Damage ($ Billions)",
       title = "Top 10 Weather Events Causing Highest Crop Damage") +
  scale_y_continuous(breaks = seq(0, 15, by = 2)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(size = 8))
# Place the two plots side by side
grid.arrange(p1, p2, ncol = 2)
The plots show the total property and crop damage, in billions of US dollars, for the 10 most damaging weather events in descending order. Floods caused the most property damage (about $144.7 billion) and droughts caused the most crop damage (about $14.0 billion).
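The billion-dollar figures quoted in the text come from dividing the USD totals by 1e9, as the plots do. The short sketch below reproduces that arithmetic for a few totals copied from the top-10 tables above.

```r
# Damage totals in USD, copied from the top-10 tables above.
property_usd <- c(FLOOD = 144657709800,
                  "HURRICANE/TYPHOON" = 69305840000,
                  TORNADO = 56935880614)
crop_usd <- c(DROUGHT = 13972566000, FLOOD = 5661968450)

# Convert to billions of USD, rounded to one decimal place.
property_bn <- round(property_usd / 1e9, 1)  # 144.7, 69.3, 56.9
crop_bn <- round(crop_usd / 1e9, 1)          # 14.0, 5.7
```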
Over the roughly 60 years covered by the data, tornadoes caused more deaths and injuries than any other type of event: 5,633 deaths and 91,346 injuries.
In terms of property damage, the most severe weather event is flooding, which caused a total of nearly $145 billion worth of damage. It is followed by hurricanes/typhoons (about $69 billion) and tornadoes (about $57 billion).
With regard to crop damage, drought caused the most damage, nearly $14 billion. It was followed by floods, river floods and ice storms.