C. Saquel
8 November 2018
Storms and other severe weather events can cause public health and economic problems for communities and municipalities. Many severe events result in deaths, injuries and property damage, and preventing such outcomes as far as possible is a key concern.
This project explores the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). This database tracks the characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of deaths, injuries and property damage.
This report identifies the types of events that are most harmful to the health of the population and those that have the greatest economic consequences.
The data come from a database of weather events that occurred in the United States between 1950 and 2011. You can download the file from the website:
Documentation for the database is also available. It describes how some of the variables are constructed and defined:
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
folder <- "C:/Users/HP/Documents/Data Science/Reproducible Research/Week 4/Project"
setwd(folder)
download.file(url, "StormData.csv.bz2")
StormData <- read.csv("StormData.csv.bz2")
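The download step above depends on a working directory that exists only on the author's machine and fetches the file again on every run. A minimal sketch of a more repeatable variant follows. The helper name `fetch_once()` and the `data` subfolder are assumptions introduced for illustration and are not part of the original analysis.

```r
# Hypothetical helper (not in the original report): download a file only
# if it is not already present, and return the local path.
fetch_once <- function(url, destfile) {
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  destfile
}

# Usage with the URL from the report (needs network access on the first run):
# url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# StormData <- read.csv(fetch_once(url, file.path("data", "StormData.csv.bz2")))
```

Because `read.csv()` can read the bz2-compressed file directly, no separate decompression step is needed.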
Once the data has been downloaded, we can see the variables it contains:
names(StormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
From these data we keep only the variables of interest. The variables related to the health of the population are stored in DataMostHarmful and those related to economic consequences in DataEconConseq. The variable names are also converted to lowercase.
library(dplyr)
namesSD <- names(StormData)
DataMostHarmful <- select(StormData,namesSD[c(8,23:24)])
names(DataMostHarmful) <- tolower(names(DataMostHarmful))
DataEconConseq <- select(StormData,namesSD[c(8,25:28)])
names(DataEconConseq) <- tolower(names(DataEconConseq))
Fatalities and injuries are then summed by type of event (evtype), and the result is sorted in descending order, once by fatalities and once by injuries.
MostHarmful <- summarise(group_by(DataMostHarmful, evtype), fatalities = sum(fatalities, na.rm = TRUE), injuries = sum(injuries, na.rm = TRUE))
MostHarmfulFat <- arrange(MostHarmful,desc(fatalities))
MostHarmfulInj <- arrange(MostHarmful,desc(injuries))
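The grouping step can be checked on a small example. The sketch below uses invented toy values (not taken from the NOAA file) and the pipe operator, which is only a stylistic alternative to the nested calls used in this report.

```r
library(dplyr)

# Toy data (invented for illustration, not taken from the NOAA file)
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  fatalities = c(3, 1, 2, 4, NA),
  injuries   = c(10, 0, 5, 2, 1)
)

# Sum both measures per event type, ignoring missing values,
# then sort by fatalities in descending order
toy_summary <- toy %>%
  group_by(evtype) %>%
  summarise(
    fatalities = sum(fatalities, na.rm = TRUE),
    injuries   = sum(injuries, na.rm = TRUE)
  ) %>%
  arrange(desc(fatalities))

toy_summary
```

In this toy case the missing fatality count for the second FLOOD row is dropped by `na.rm = TRUE`, so FLOOD ends up with a total of 1 rather than `NA`.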
Economic damage is handled in a similar way: property damage (propdmg) and crop damage (cropdmg) are summed by type of event (evtype) and sorted. Before that, the values must be scaled by the orders of magnitude given in the variables propdmgexp and cropdmgexp.
levels(DataEconConseq$propdmgexp)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(DataEconConseq$cropdmgexp)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
The alphabetic characters used to signify magnitude include “K” for thousands, “M” for millions and “B” for billions. For the remaining symbols we use the values given on the website How to handle the exponent value of PROPDMGEXP and CROPDMGEXP, which provides multipliers for the codes not covered by the official documentation.
valuesMult <- unique(c(levels(DataEconConseq$propdmgexp),levels(DataEconConseq$cropdmgexp)))
Mult <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,100,100,10^3,10^6,10^6,10^3)
convert <- data.frame( valuesMult = valuesMult, Mult = Mult)
tbl_df(convert)
## # A tibble: 20 x 2
## valuesMult Mult
## <fct> <dbl>
## 1 "" 0
## 2 - 0
## 3 ? 0
## 4 + 1
## 5 0 10
## 6 1 10
## 7 2 10
## 8 3 10
## 9 4 10
## 10 5 10
## 11 6 10
## 12 7 10
## 13 8 10
## 14 B 1000000000
## 15 h 100
## 16 H 100
## 17 K 1000
## 18 m 1000000
## 19 M 1000000
## 20 k 1000
This generates a variable, convert, which is used to apply the corresponding multiplier before grouping and ordering the data.
DataEconConseq$propdmgMult <- convert$Mult[match(DataEconConseq$propdmgexp, convert$valuesMult)]
DataEconConseq$cropdmgMult <- convert$Mult[match(DataEconConseq$cropdmgexp, convert$valuesMult)]
DataEconConseq$propdmgMult <- DataEconConseq$propdmgMult*DataEconConseq$propdmg
DataEconConseq$cropdmgMult <- DataEconConseq$cropdmgMult*DataEconConseq$cropdmg
EconConseq <- summarise(group_by(DataEconConseq, evtype), propdmg = sum(propdmgMult, na.rm = TRUE), cropdmg = sum(cropdmgMult, na.rm = TRUE))
EconConseq <- mutate(EconConseq, total = propdmg + cropdmg)
EconConseqProp <- arrange(EconConseq,desc(propdmg))
EconConseqCrop <- arrange(EconConseq,desc(cropdmg))
EconConseqTotal<- arrange(EconConseq,desc(total))
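The conversion can be illustrated on a few values. The sketch below is an alternative formulation, not the code used in this report: it stores the same multipliers as the convert table in a named vector and folds letter codes to upper case. The names `exp_mult` and `to_dollars()` are hypothetical.

```r
# Named lookup vector reproducing the multipliers of the report's convert
# table (letters listed once; case is folded with toupper() below).
exp_mult <- c(
  "-" = 0, "?" = 0, "+" = 1,
  "0" = 10, "1" = 10, "2" = 10, "3" = 10, "4" = 10,
  "5" = 10, "6" = 10, "7" = 10, "8" = 10,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# Hypothetical helper: scale a damage value by its exponent code.
to_dollars <- function(value, exp_code) {
  code <- toupper(as.character(exp_code))
  mult <- unname(exp_mult[code])
  # The empty code maps to 0 in the convert table; treating any other
  # unknown code the same way is an assumption of this sketch.
  mult[is.na(mult)] <- 0
  value * mult
}

to_dollars(c(2.5, 10, 1.2, 7), c("K", "m", "B", ""))
```

Here 2.5 with code "K" becomes 2,500, 10 with "m" becomes 10 million, 1.2 with "B" becomes 1.2 billion, and the entry with an empty code contributes nothing.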
The following table combines the ten event types with the most fatalities and the ten with the most injuries. Most event types appear in both lists, so the combined table has only 13 rows.
arrange(merge(head(MostHarmfulFat,10),head(MostHarmfulInj,10),all = TRUE),desc(fatalities),desc(injuries))
## evtype fatalities injuries
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
## 7 FLOOD 470 6789
## 8 RIP CURRENT 368 232
## 9 HIGH WIND 248 1137
## 10 AVALANCHE 224 170
## 11 THUNDERSTORM WIND 133 1488
## 12 ICE STORM 89 1975
## 13 HAIL 15 1361
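The table has 13 rows rather than 20 because `merge(..., all = TRUE)` keeps every row found in either top-10 list and matches identical rows only once. A small sketch with invented values and base R only shows the same behaviour for two top-2 lists:

```r
# Toy totals (invented values, not from the NOAA data)
totals <- data.frame(
  evtype     = c("A", "B", "C", "D"),
  fatalities = c(50, 40, 5, 1),
  injuries   = c(100, 2, 300, 80),
  stringsAsFactors = FALSE
)

top_fat <- head(totals[order(-totals$fatalities), ], 2)  # rows A and B
top_inj <- head(totals[order(-totals$injuries), ], 2)    # rows C and A

# all = TRUE keeps rows present in either input; row A matches in both
top_union <- merge(top_fat, top_inj, all = TRUE)
top_union[order(-top_union$fatalities), ]
```

Event A is in both lists, so the union contains three rows (A, B, C) instead of four.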
The top 10 events with the highest total fatalities and injuries are shown below.
library(ggplot2)
library(gridExtra)
library(grid)
n <- 10
p1 <- ggplot(data=head(MostHarmfulFat,n), aes(x=reorder(tolower(evtype), fatalities), y=fatalities)) + geom_bar(fill="royalblue",stat="identity", width = 0.9) + coord_flip() +
ylab("Total number of fatalities") + xlab("Event type") +
theme(legend.position="none")
p2 <- ggplot(data=head(MostHarmfulInj,n), aes(x=reorder(tolower(evtype), injuries), y=injuries)) +
geom_bar(fill="firebrick3",stat="identity") + coord_flip() +
ylab("Total number of injuries") + xlab("Event type")
grid.arrange(p1, p2, nrow = 2, top = "Health impact of weather events in the US - Top 10")
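The bars are sorted because `reorder()` turns the event type into a factor whose levels are ordered by the plotted value. A short sketch using three totals from the table above shows the resulting level order:

```r
evtype     <- c("tornado", "flood", "heat")
fatalities <- c(5633, 470, 937)   # totals taken from the table above

# Levels are sorted by increasing fatalities
ordered_evtype <- reorder(evtype, fatalities)
levels(ordered_evtype)
```

The levels come out in increasing order of fatalities. After `coord_flip()` the first level is drawn at the bottom and the last at the top, so the event type with the largest total appears first when reading the chart from the top.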
Tornadoes are by far the leading cause of both deaths (5,633) and injuries (91,346), and are therefore the event type most harmful to the health of the population.
The following table combines the ten event types with the highest property damage, crop damage and total damage (total damage is property damage plus crop damage). Many event types appear in more than one of the three lists.
arrange(merge(head(EconConseqTotal,10),merge(head(EconConseqProp,10),head(EconConseqCrop,10), all = TRUE), all = TRUE),desc(total),desc(propdmg),desc(cropdmg))
## evtype propdmg cropdmg total
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937162897 414954710 57352117607
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15732269877 3025954650 18758224527
## 6 FLASH FLOOD 16140815011 1421317100 17562132111
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944928310 5022113500 8967041810
## 11 TROPICAL STORM 7703890550 678346000 8382236550
## 12 WINTER STORM 6688497260 26944000 6715441260
## 13 HIGH WIND 5270046280 638571300 5908617580
## 14 EXTREME COLD 67737400 1292973000 1360710400
## 15 FROST/FREEZE 9480000 1094086000 1103566000
The top 10 events with the highest property damage, crop damage and total damage are shown below.
n <- 10
p1 <- ggplot(data=head(EconConseqProp,n), aes(x=reorder(tolower(evtype), propdmg), y=propdmg/1000000)) +
geom_bar(fill="royalblue",stat="identity", width = 0.9) + coord_flip() +
ylab("Property damage (MUS$)") + xlab("Event type") +
theme(legend.position="none")
p2 <- ggplot(data=head(EconConseqCrop,n), aes(x=reorder(tolower(evtype), cropdmg), y=cropdmg/1000000)) +
geom_bar(fill="firebrick3",stat="identity") + coord_flip() +
ylab("Crop damage (MUS$)") + xlab("Event type") + scale_y_continuous(limits = c(0,max(EconConseqTotal$total/1000000))) + theme(legend.position="none")
p3 <- ggplot(data=head(EconConseqTotal,n), aes(x=reorder(tolower(evtype), total), y=total/1000000)) +
geom_bar(fill="darkolivegreen3",stat="identity") + coord_flip() +
ylab("Property + crop damage (MUS$)") + xlab("Event type") + scale_y_continuous(limits = c(0,max(EconConseqTotal$total/1000000))) +
theme(legend.position="none")
grid.arrange(p1, p2, p3, nrow = 3, top = "Economic Consequences of weather events in the US - Top 10")
Floods cause the most property damage (about US$145 billion), followed by hurricanes/typhoons, tornadoes and storm surges.
Drought is the largest contributor to crop damage (about US$14 billion).
Overall, crop damage is small compared with property damage.