Synopsis

This document explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (N = 902,297). The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property and crop damage. The analysis addresses two questions: which types of events are most harmful to population health, and which have the greatest economic consequences. It also takes a time perspective to see whether the most harmful and most costly events changed over the period covered by the database, 1950-2011. The results indicate that tornadoes are the most harmful events for population health and the most damaging to property, while hail, flash floods and floods are the most damaging to crops. Although tornadoes have become progressively more damaging to property over the last fifty years, the fatalities and injuries they cause show no clear trend, with peaks in 2011, 1999 and 1953.

Data processing

After installing the packages required for the analysis, the database was downloaded from the Storm Data link that Professor Roger Peng made available on the Coursera website. Once downloaded, it was read with the read.csv function and explored as follows:

## Install and load packages

packages = c("tidyverse", "lubridate","ggplot2","data.table","ggpubr","kableExtra")

## Now load or install & load all
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }
)



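# Load data (download only if the file is not already present)
if (!file.exists("StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                "StormData.csv.bz2", method = "libcurl", mode = "wb")
}

# read.csv reads the bzip2-compressed file directly
data <- read.csv("StormData.csv.bz2", header = TRUE, sep = ",")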

head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

So that the variables used in the analysis could be handled correctly, they were converted to the appropriate types:

## Convert variables to appropriate types
data$BGN_DATE<-mdy_hms(as.character(data$BGN_DATE))
data$BGN_YEAR<-year(data$BGN_DATE)
data$END_DATE<-mdy_hms(as.character(data$END_DATE))

data$FATALITIES<-as.numeric(data$FATALITIES)
data$INJURIES<-as.numeric(data$INJURIES)
data$PROPDMG<-as.numeric(data$PROPDMG)
data$CROPDMG<-as.numeric(data$CROPDMG)
data$PROPDMGEXP<-as.factor(data$PROPDMGEXP)
data$CROPDMGEXP<-as.factor(data$CROPDMGEXP)
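
The PROPDMGEXP and CROPDMGEXP columns converted above hold magnitude codes for the damage figures, and the sums reported in this document use PROPDMG and CROPDMG as they are. As a minimal sketch, not part of the original analysis, the codes could be turned into multipliers as follows. The helper name `exp_multiplier`, the toy data frame, and the mapping (H = hundreds, K = thousands, M = millions, B = billions, anything else left unscaled) are assumptions made for illustration:

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumed mapping: H = 1e2, K = 1e3, M = 1e6, B = 1e9.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1  # assumption: blank or unknown codes leave the value unscaled
  unname(mult)
}

# Toy rows for illustration only; they are not taken from the NOAA file.
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2, 3),
                  PROPDMGEXP = c("K", "M", "B", ""))
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```

Applied to the full data set, the same multiplication would put damage totals in dollars before they are summed by event type, which may change the rankings shown below.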

Results

The first results show that the same kinds of events lead in fatalities, injuries and property damage in the United States over the period 1950 to 2011. Tornadoes are by far the events with the most fatalities (5,633), injuries (91,346) and property damage (3,212,258, the sum of the PROPDMG column). Among the events listed in Table 1, however, flash floods cause the greatest crop damage (179,200, the sum of the CROPDMG column). Table 1 also shows that some events are highly fatal without causing much property damage: excessive heat, for example, is the second most fatal event but ranks beyond tenth place in property damage. A better overview is therefore obtained from Graph 1.

data %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES),
                                        INJURIES=sum(INJURIES),
                                        PROPDMG=sum(PROPDMG),
                                        CROPDMG=sum(CROPDMG)) %>% arrange(-FATALITIES,
                                                                          -INJURIES,
                                                                          -PROPDMG,
                                                                          -CROPDMG) %>% 
  slice_head(n = 30) %>% 
  kable(caption = "Table 1. The 30 events with the highest fatalities, injuries, property damages and crops damages in the United States") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Table 1. The 30 events with the highest fatalities, injuries, property damages and crops damages in the United States

EVTYPE                   FATALITIES  INJURIES     PROPDMG    CROPDMG
TORNADO                        5633     91346  3212258.16  100018.52
EXCESSIVE HEAT                 1903      6525     1460.00     494.40
FLASH FLOOD                     978      1777  1420124.59  179200.46
HEAT                            937      2100      298.50     662.70
LIGHTNING                       816      5230   603351.78    3580.61
TSTM WIND                       504      6957  1335965.61  109202.60
FLOOD                           470      6789   899938.48  168037.88
RIP CURRENT                     368       232        1.00       0.00
HIGH WIND                       248      1137   324731.56   17283.21
AVALANCHE                       224       170     1623.90       0.00
WINTER STORM                    206      1321   132720.59    1978.99
RIP CURRENTS                    204       297      162.00       0.00
HEAT WAVE                       172       309     1269.25     255.30
EXTREME COLD                    160       231     7657.54    6121.14
THUNDERSTORM WIND               133      1488   876844.17   66791.45
HEAVY SNOW                      127      1021   122251.99    2165.72
EXTREME COLD/WIND CHILL         125        24     2654.00      50.00
STRONG WIND                     103       280    62993.81    1616.90
BLIZZARD                        101       805    25318.48     172.00
HIGH SURF                       101       152     3041.62       0.00
HEAVY RAIN                       98       251    50842.14   11122.80
EXTREME HEAT                     96       155        5.11       5.00
COLD/WIND CHILL                  95        12     1990.00     600.00
ICE STORM                        89      1975    66000.67    1688.95
WILDFIRE                         75       911    84459.34    4364.20
HURRICANE/TYPHOON                64      1275     5839.37    4798.48
THUNDERSTORM WINDS               64       908   446293.18   18684.93
FOG                              62       734     8849.81       0.00
HURRICANE                        61        46    15513.68    5339.31
TROPICAL STORM                   58       340    48423.68    5899.12
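
Table 1 lists several near-duplicate labels as separate events, for example RIP CURRENT and RIP CURRENTS, or TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS. The following is a minimal sketch, not part of the original analysis, of how such variants might be merged before aggregating. The helper `normalize_evtype` and its three rules are hypothetical and cover only the variants visible in Table 1; the fatality counts in the toy data frame are copied from the table:

```r
# Hypothetical helper: collapse a few label variants seen in Table 1.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM", "THUNDERSTORM", x)  # expand the TSTM abbreviation
  x <- sub("WINDS$", "WIND", x)         # plural to singular
  x <- sub("CURRENTS$", "CURRENT", x)   # plural to singular
  x
}

# Fatality counts taken from Table 1.
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
                 "RIP CURRENT", "RIP CURRENTS"),
  FATALITIES = c(504, 133, 64, 368, 204)
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Sum fatalities over the merged labels.
merged <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
merged
```

With these rules the three thunderstorm wind labels add up to 701 fatalities and the two rip current labels to 572. Whether such labels should be merged at all is a judgment call that the original analysis does not make.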

Graph 1 shows separately the events that cause the most fatalities, injuries and economic damage. The events with the most fatalities are tornadoes (5,633), excessive heat (1,903), flash floods (978) and heat (937). Tornadoes cause by far the most injuries (91,346), followed at a distance by tstm wind (6,957), floods (6,789) and excessive heat (6,525). The events with the greatest property damage are tornadoes, flash floods, tstm wind and floods. Finally, the events with the greatest crop damage are hail, flash floods and floods. Hail thus stands out as a weather event that causes a great deal of economic damage (to crops) but very little harm to people’s lives and health.

## Group the data by event type
a<-data %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES),
                                        INJURIES=sum(INJURIES),
                                        PROPDMG=sum(PROPDMG),
                                        CROPDMG=sum(CROPDMG),
                                        MAG=sum(MAG)) %>% arrange(-FATALITIES)

## Graphs

g1<-a %>% filter(FATALITIES>200) %>% ggplot(aes(x=reorder(EVTYPE,FATALITIES), y=FATALITIES)) +
  geom_bar(stat="identity", fill="green", colour="black", width=0.8) + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
  theme_bw() +  coord_flip() + 
  geom_text(aes(label = format(FATALITIES, big.mark = ",", scientific = FALSE)), hjust=-0.1, colour = "black", size=3.0) +
  labs(title="Fatalities",
       x="EVTYPE", 
       y = "FATALITIES")

g2<-a %>% filter(INJURIES>1000) %>% ggplot(aes(x=reorder(EVTYPE,INJURIES), y=INJURIES)) +
  geom_bar(stat="identity", fill="blue", colour="black", width=0.8) + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
  theme_bw() +  coord_flip() + 
  geom_text(aes(label = format(INJURIES, big.mark = ",", scientific = FALSE)), hjust=-0.1, colour = "black", size=3.0) +
  labs(title="Injuries",
       x="EVTYPE", 
       y = "INJURIES")

g3<-a %>% filter(PROPDMG>30000) %>% mutate(PROPDMG=PROPDMG/10) %>% 
  ggplot(aes(x=reorder(EVTYPE,PROPDMG), y=round(PROPDMG))) +
  geom_bar(stat="identity", fill="purple", colour="black", width=0.8) + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
  theme_bw() +  coord_flip() + 
  geom_text(aes(label = format(round(PROPDMG), big.mark = ",", scientific = FALSE)), hjust=-0.1, colour = "black", size=3.0) +
  labs(title="Property damage (divided by 10)",
       x="EVTYPE", 
       y = "PROPDMG")

g4<-a %>% filter(CROPDMG>5000) %>% mutate(CROPDMG=CROPDMG/10) %>% 
  ggplot(aes(x=reorder(EVTYPE,CROPDMG), y=round(CROPDMG))) +
  geom_bar(stat="identity", fill="red", colour="black", width=0.8) + 
  scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
  theme_bw() +  coord_flip() + 
  geom_text(aes(label = format(round(CROPDMG), big.mark = ",", scientific = FALSE)), hjust=-0.1, 
            colour = "black", size=3.0) +
  labs(title="Crop damage (divided by 10)",
       x="EVTYPE", 
       y = "CROPDMG")

g5<-ggarrange(g1,g2,g3,g4,ncol = 2,nrow = 2)
annotate_figure(g5, top=text_grob("Graph 1. The most damaging and economically costly weather events in the United States"),
                bottom = text_grob("National Climatic Data Center Storm Events.
                                   For better visualization, economic damage values are shown divided by 10.",
                                       hjust = 1, x = 1, size = 10))

Finally, Graph 2 shows how the economic damage and the harm to people caused by the six most injurious events evolved over time. None of these events except tornadoes has records of crop damage, fatalities or injuries for the period before 1990, so the main conclusions about trends apply to tornadoes alone. Tornadoes have become progressively more damaging to property over the last fifty years (purple line), but the fatalities (green line) and injuries (blue line) they cause show no clear trend. The years with the most injuries are 2011, 1999 and 1953.

data %>% filter(EVTYPE %in% c("TORNADO","EXCESSIVE HEAT","FLASH FLOOD",
                              "FLOOD","TSTM WIND",
                              "HAIL")) %>% 
  group_by(EVTYPE,BGN_YEAR) %>% summarise(FATALITIES=sum(FATALITIES),
                                          INJURIES=sum(INJURIES),
                                          PROPDMG=(sum(PROPDMG)/10),
                                          CROPDMG=sum(CROPDMG)/10) %>% 
  gather(key = "type", value = "value", -EVTYPE,-BGN_YEAR) %>% 
  ggplot(aes( x = BGN_YEAR, y =value, fill = type, color = type)) + facet_wrap(~EVTYPE) + 
  geom_line() + theme_bw() + 
  labs(title="Graph 2. The most damaging and economically costly weather events in the USA over time",
       x="Year",
       y = "Fatalities, injuries, property damage and crop damage",
       caption = "National Climatic Data Center Storm Events.
       For better visualization, economic damage values are shown divided by 10.") + 
  theme(legend.position = "bottom")
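
The gap in early records noted for Graph 2 can be checked by finding, for each event type, the first year with a nonzero count. This is a minimal sketch on made-up rows, assuming the EVTYPE, BGN_YEAR and FATALITIES columns created earlier in this document; the values below are invented for illustration and are not taken from the NOAA file:

```r
# Made-up rows for illustration only; they are not taken from the NOAA file.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "FLOOD"),
  BGN_YEAR   = c(1950, 1953, 1990, 1993, 1995),
  FATALITIES = c(0, 5, 0, 2, 1)
)

# Keep only rows with at least one fatality, then take the earliest year per event.
recorded   <- toy[toy$FATALITIES > 0, ]
first_year <- tapply(recorded$BGN_YEAR, recorded$EVTYPE, min)
first_year
```

Running the same two steps on the full data frame, once per outcome column, would show from which year each event type starts contributing to the lines in Graph 2.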