This document explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (N = 902,297). The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property and crop damage. The analysis addresses two questions: which types of events are most harmful to population health, and which have the greatest economic consequences. It also includes a time perspective to see whether the most harmful and most costly events changed over the period 1950-2011 covered by the database. The results indicate that tornadoes are the events most harmful to population health and most damaging to property. Hail, flash floods, floods and thunderstorm winds (TSTM WIND) are the most damaging to crops. Although tornadoes have become progressively more damaging to property over the last fifty years, the fatalities and injuries they cause occur at different times without a clear trend, with peaks in 2011, 1999 and 1953.
After installing the packages required for the analysis, the database was downloaded from the Storm Data link that Professor Roger Peng made available on the Coursera website. Once downloaded, it was opened with the read.csv function and explored, as follows:
## Install and load packages
packages = c("tidyverse", "lubridate","ggplot2","data.table","ggpubr","kableExtra")
## Now load or install & load all
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
# Load data
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"StormData.csv.bz2", method = "libcurl")
data<-read.csv("StormData.csv.bz2", header = TRUE, sep = ",")
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
To work with the database without problems, the variables used in the analysis were converted to the appropriate types:
## Convert variable types
data$BGN_DATE<-mdy_hms(as.character(data$BGN_DATE))
data$BGN_YEAR<-year(data$BGN_DATE)
data$END_DATE<-mdy_hms(as.character(data$END_DATE))
data$FATALITIES<-as.numeric(data$FATALITIES)
data$INJURIES<-as.numeric(data$INJURIES)
data$PROPDMG<-as.numeric(data$PROPDMG)
data$CROPDMG<-as.numeric(data$CROPDMG)
data$PROPDMGEXP<-as.factor(data$PROPDMGEXP)
data$CROPDMGEXP<-as.factor(data$CROPDMGEXP)
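PROPDMG and CROPDMG hold a number whose magnitude is given by the accompanying PROPDMGEXP and CROPDMGEXP codes (the rows printed above all carry K). A minimal sketch of turning the two columns into a single amount is shown below. The helper name `exp_multiplier`, the mapping (H, K, M, B) and the choice to treat every other code as 1 are assumptions made for illustration, and the toy rows are made up.

```r
# Hypothetical helper: map a damage exponent code to a numeric multiplier.
# Assumed mapping: H = 1e2, K = 1e3, M = 1e6, B = 1e9 (case-insensitive).
# Any other code (blank, "+", "?", digits, ...) is treated as 1 here.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1   # unmatched codes fall back to 1
  mult
}

# Toy rows in the shape of the storm data (values are illustrative only).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Scaled amount = raw value times the multiplier implied by its code.
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

On the real data the same call could be applied to `data$PROPDMGEXP` and `data$CROPDMGEXP` before summing; the sums reported in this document use the raw PROPDMG and CROPDMG values.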
The first results indicate that the same events tend to lead in fatalities, injuries and property damage in the United States over the period 1950 to 2011. Tornadoes are by far the events with the most fatalities (5,633), injuries (91,346) and property damage (3,212,258, the sum of the raw PROPDMG values without the PROPDMGEXP multiplier). Among the events in Table 1, however, the greatest damage to crops comes from flash floods (179,200). Table 1 also shows that some events are highly fatal yet cause relatively little property damage: excessive heat, for example, is the second most fatal event but ranks beyond tenth place in property damage. A better overview is therefore obtained from Graph 1.
data %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES),
INJURIES=sum(INJURIES),
PROPDMG=sum(PROPDMG),
CROPDMG=sum(CROPDMG)) %>% arrange(-FATALITIES,
-INJURIES,
-PROPDMG,
-CROPDMG) %>%
slice_head(n = 30) %>%
kable(caption = "Table 1. The 30 events with the highest fatalities, injuries, property damages and crops damages in the United States") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
EVTYPE | FATALITIES | INJURIES | PROPDMG | CROPDMG |
---|---|---|---|---|
TORNADO | 5633 | 91346 | 3212258.16 | 100018.52 |
EXCESSIVE HEAT | 1903 | 6525 | 1460.00 | 494.40 |
FLASH FLOOD | 978 | 1777 | 1420124.59 | 179200.46 |
HEAT | 937 | 2100 | 298.50 | 662.70 |
LIGHTNING | 816 | 5230 | 603351.78 | 3580.61 |
TSTM WIND | 504 | 6957 | 1335965.61 | 109202.60 |
FLOOD | 470 | 6789 | 899938.48 | 168037.88 |
RIP CURRENT | 368 | 232 | 1.00 | 0.00 |
HIGH WIND | 248 | 1137 | 324731.56 | 17283.21 |
AVALANCHE | 224 | 170 | 1623.90 | 0.00 |
WINTER STORM | 206 | 1321 | 132720.59 | 1978.99 |
RIP CURRENTS | 204 | 297 | 162.00 | 0.00 |
HEAT WAVE | 172 | 309 | 1269.25 | 255.30 |
EXTREME COLD | 160 | 231 | 7657.54 | 6121.14 |
THUNDERSTORM WIND | 133 | 1488 | 876844.17 | 66791.45 |
HEAVY SNOW | 127 | 1021 | 122251.99 | 2165.72 |
EXTREME COLD/WIND CHILL | 125 | 24 | 2654.00 | 50.00 |
STRONG WIND | 103 | 280 | 62993.81 | 1616.90 |
BLIZZARD | 101 | 805 | 25318.48 | 172.00 |
HIGH SURF | 101 | 152 | 3041.62 | 0.00 |
HEAVY RAIN | 98 | 251 | 50842.14 | 11122.80 |
EXTREME HEAT | 96 | 155 | 5.11 | 5.00 |
COLD/WIND CHILL | 95 | 12 | 1990.00 | 600.00 |
ICE STORM | 89 | 1975 | 66000.67 | 1688.95 |
WILDFIRE | 75 | 911 | 84459.34 | 4364.20 |
HURRICANE/TYPHOON | 64 | 1275 | 5839.37 | 4798.48 |
THUNDERSTORM WINDS | 64 | 908 | 446293.18 | 18684.93 |
FOG | 62 | 734 | 8849.81 | 0.00 |
HURRICANE | 61 | 46 | 15513.68 | 5339.31 |
TROPICAL STORM | 58 | 340 | 48423.68 | 5899.12 |
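
Table 1 lists several labels that appear to describe the same kind of event (RIP CURRENT and RIP CURRENTS; TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS), so their totals are split across rows. The sketch below shows one way such labels could be collapsed before grouping. The helper `normalize_evtype` and its rules are illustrative assumptions, not an official NOAA mapping, and the toy counts are made up.

```r
# Hypothetical helper: collapse a few near-duplicate labels seen in Table 1.
# The rules are illustrative assumptions and cover only these cases.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM WINDS?$", "THUNDERSTORM WIND", x)
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

# Toy rows with made-up fatality counts.
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WINDS", "Rip Currents", "RIP CURRENT"),
  FATALITIES = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Normalize the labels, then total fatalities per cleaned label.
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

Whether two labels really denote the same event type is a judgment call, so any such mapping should be reviewed against the database documentation before it is applied to the full data.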
Graph 1 shows separately the events that cause the most fatalities, injuries and economic damage. The events with the most fatalities are tornadoes (5,633), excessive heat (1,903), flash floods (978) and heat (937). Tornadoes cause by far the most injuries (91,346); well behind come thunderstorm winds (TSTM WIND, 6,957), floods (6,789) and excessive heat (6,525). The greatest property damage comes from tornadoes, flash floods, thunderstorm winds and floods. Finally, the greatest crop damage comes from hail, flash floods and floods. Hail thus stands out as a weather event that causes a lot of economic damage (to crops) but has very little effect on people’s lives and health.
## Group the data by event type
a<-data %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES),
INJURIES=sum(INJURIES),
PROPDMG=sum(PROPDMG),
CROPDMG=sum(CROPDMG),
MAG=sum(MAG)) %>% arrange(-FATALITIES)
## Graphs
g1<-a %>% filter(FATALITIES>200) %>% ggplot(aes(x=reorder(EVTYPE,FATALITIES), y=FATALITIES)) +
geom_bar(stat="identity", fill="green", colour="black", width=0.8) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
theme_bw() + coord_flip() +
geom_text(aes(label = format(FATALITIES, big.mark = ",", scientific = FALSE)), hjust=-0.1, colour = "black", size=3.0) +
labs(title="Fatalities",
x="EVTYPE",
y = "FATALITIES")
g2<-a %>% filter(INJURIES>1000) %>% ggplot(aes(x=reorder(EVTYPE,INJURIES), y=INJURIES)) +
geom_bar(stat="identity", fill="blue", colour="black", width=0.8) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
theme_bw() + coord_flip() +
geom_text(aes(label = format(INJURIES, big.mark = ",", scientific = FALSE)), hjust=-0.1, colour = "black", size=3.0) +
labs(title="Injuries",
x="EVTYPE",
y = "INJURIES")
g3<-a %>% filter(PROPDMG>30000) %>% mutate(PROPDMG=PROPDMG/10) %>%
ggplot(aes(x=reorder(EVTYPE,PROPDMG), y=round(PROPDMG))) +
geom_bar(stat="identity", fill="purple", colour="black", width=0.8) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
theme_bw() + coord_flip() +
geom_text(aes(label = format(round(PROPDMG), big.mark = ",", scientific = FALSE)), hjust=-0.1, colour = "black", size=3.0) +
labs(title="Property damage (divided by 10)",
x="EVTYPE",
y = "PROPDMG")
g4<-a %>% filter(CROPDMG>5000) %>% mutate(CROPDMG=CROPDMG/10) %>%
ggplot(aes(x=reorder(EVTYPE,CROPDMG), y=round(CROPDMG))) +
geom_bar(stat="identity", fill="red", colour="black", width=0.8) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
theme_bw() + coord_flip() +
geom_text(aes(label = format(round(CROPDMG), big.mark = ",", scientific = FALSE)), hjust=-0.1,
colour = "black", size=3.0) +
labs(title="Crop damage (divided by 10)",
x="EVTYPE",
y = "CROPDMG")
g5<-ggarrange(g1,g2,g3,g4,ncol = 2,nrow = 2)
annotate_figure(g5, top=text_grob("Graph 1. The most damaging and economically costly weather events in the United States"),
bottom = text_grob("National Climatic Data Center Storm Events.
For a better visualization, the values of economic damages are presented divided by 10.",
hjust = 1, x = 1, size = 10))
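Because Table 1 is sorted by fatalities, an event that ranks high on one measure only (as hail does for crop damage) does not appear in it, which is why each panel of Graph 1 ranks its own measure. A minimal sketch of that per-measure ranking follows. The helper `top_events` is hypothetical, and the event names and totals are made up.

```r
# Made-up totals per event type; only the column names mirror the summary.
toy <- data.frame(
  EVTYPE     = c("A", "B", "C", "D"),
  FATALITIES = c(50, 10, 0, 5),
  CROPDMG    = c(1, 20, 300, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the top n event types for one measure.
top_events <- function(df, measure, n = 2) {
  ord <- order(df[[measure]], decreasing = TRUE)
  df$EVTYPE[head(ord, n)]
}

top_fatal <- top_events(toy, "FATALITIES")  # ranked by fatalities
top_crop  <- top_events(toy, "CROPDMG")     # ranked by crop damage
print(top_fatal)
print(top_crop)
```

In this toy case event C leads on crop damage but would be missing from a list ranked by fatalities.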
Finally, Graph 2 shows how the economic damage and the effects on people’s lives of six of the most harmful events evolved over time. No event other than tornadoes has records of crop damage, fatalities or injuries before 1990, so the main conclusions about trends apply to tornadoes. Tornadoes have become progressively more damaging to property over the last fifty years (purple line), but the fatalities (green line) and injuries (blue line) they cause occur at different times without a clear trend. The years with the most injuries are 2011, 1999 and 1953.
data %>% filter(EVTYPE %in% c("TORNADO","EXCESSIVE HEAT","FLASH FLOOD",
"FLOOD","TSTM WIND",
"HAIL")) %>%
group_by(EVTYPE,BGN_YEAR) %>% summarise(FATALITIES=sum(FATALITIES),
INJURIES=sum(INJURIES),
PROPDMG=(sum(PROPDMG)/10),
CROPDMG=sum(CROPDMG)/10) %>%
gather(key = "type", value = "value", -EVTYPE,-BGN_YEAR) %>%
ggplot(aes( x = BGN_YEAR, y =value, fill = type, color = type)) + facet_wrap(~EVTYPE) +
geom_line() + theme_bw() +
labs(title="Graph 2. The most damaging and economically costly weather events in the USA over time",
x="Year",
y = "Fatalities, injuries, property damage and crop damage",
caption = "National Climatic Data Center Storm Events.
For a better visualization, the values of economic damages are presented divided by 10.") +
theme(legend.position = "bottom")
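The observation that only tornadoes have early records can be checked numerically by finding, for each event type, the first year with a nonzero count. The sketch below does this with base R on made-up rows; only the column names mirror the storm data.

```r
# Made-up yearly rows; values are illustrative only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "FLOOD"),
  BGN_YEAR   = c(1950, 1993, 1950, 1993, 1995),
  FATALITIES = c(3, 1, 0, 2, 4),
  stringsAsFactors = FALSE
)

# Keep rows with a nonzero count, then take the earliest year per event type.
nonzero <- toy[toy$FATALITIES > 0, ]
first_year <- aggregate(BGN_YEAR ~ EVTYPE, data = nonzero, FUN = min)
print(first_year)
```

On the real data, `toy` would be replaced by `data`, and the same step could be repeated for INJURIES and CROPDMG.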