Synopsis

This report identifies the weather events that are most harmful to population health and those that cause the most economic damage. Harm to population health is measured by the number of fatalities and injuries; economic damage is measured by the cost, in thousands of dollars, of property and crop damage. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in 1950 and end in November 2011.

Reading data

First, the compressed data file is downloaded if it is not already present. The data is a CSV file compressed with the bz2 algorithm, and the read.csv function can read it directly, so the file is not extracted beforehand.

file.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file.name <- "StormData.csv.bz2"

if(!file.exists(file.name)) {
  download.file(file.url, destfile = file.name)
}

storm.data <- read.csv(file.name, stringsAsFactors = FALSE)
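To illustrate why no extraction step is needed, here is a minimal sketch that round-trips a small table through a bzip2-compressed file. The toy data frame and the temporary file are made up for illustration and are not part of the storm dataset.

```r
# Toy table (hypothetical values) written to a bzip2-compressed CSV.
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv is given the compressed path directly; the compression is
# detected when the file is opened for reading.
toy.read <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)
```

The same mechanism is what lets the call above read StormData.csv.bz2 without unpacking it first.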

The data has 37 variables, listed below:

str(storm.data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The number of records is:

nrow(storm.data)
## [1] 902297

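The variables of interest are EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG. PROPDMG and CROPDMG hold property damage and crop damage respectively, and this analysis treats them as thousands of dollars. Note that the dataset stores the unit of each amount in the companion columns PROPDMGEXP and CROPDMGEXP (for example "K" in the output above), which are not used here.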

library(dplyr)
storm.data <- select(storm.data, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
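Because every later step groups by EVTYPE, inconsistent labels would split one event type into several groups. The sketch below shows one possible clean-up, assuming labels may differ only in letter case or surrounding whitespace. The toy labels are invented for illustration, and this step is not part of the analysis in this report.

```r
# Toy labels standing in for EVTYPE; the variants are assumed, not taken
# from the storm database.
evtype <- c("TORNADO", " tornado", "Tornado ", "HAIL", "hail")

# Trim surrounding whitespace and convert to upper case so that variants
# of the same label collapse into one group.
evtype.clean <- toupper(trimws(evtype))
table(evtype.clean)
```

If applied to the real data, the cleaned vector would replace the EVTYPE column before any group_by call.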

Results

Population health impact analysis

First, the average number of fatalities and the average number of injuries per recorded event are computed for each event type.

library(ggplot2)

health.impact <- storm.data %>% 
  group_by(EVTYPE) %>%
  summarise(AVG.FATALITIES = mean(FATALITIES, na.rm = TRUE),
            AVG.INJURIES = mean(INJURIES, na.rm = TRUE))
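Ranking by the average per event answers a different question than ranking by the total. The following sketch uses made-up numbers to show how a rare event type can lead on the average while a frequent one leads on the total. The names toy and toy.summary are hypothetical.

```r
library(dplyr)

# Toy data (invented): one rare but deadly event type and one frequent
# event type with fewer fatalities per record.
toy <- data.frame(
  EVTYPE = c("RARE EVENT", rep("COMMON EVENT", 4)),
  FATALITIES = c(5, 2, 3, 1, 2)
)

# Compute the record count, the average and the total per event type.
toy.summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(N.EVENTS = n(),
            AVG.FATALITIES = mean(FATALITIES),
            TOTAL.FATALITIES = sum(FATALITIES))
```

Here "RARE EVENT" has the higher average and "COMMON EVENT" the higher total, so the rankings in this report should be read as harm per recorded event rather than overall harm.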

10 most harmful weather events by average fatalities.

fatalities.ranking <- health.impact %>% 
  select(EVTYPE, AVG.FATALITIES) %>%
  arrange(desc(AVG.FATALITIES), .by_group=TRUE) %>%
  slice_head(n=10)

ggplot(data=fatalities.ranking, aes(x=EVTYPE, y=AVG.FATALITIES, fill = EVTYPE)) +
  geom_bar(stat="identity", color="black") + 
  labs(y="Average number of fatalities", 
       x = "Weather event",
       title="10 most harmful weather events by average fatalities") + 
  scale_fill_brewer(palette="PiYG") +
  theme(legend.position="none") +
  scale_x_discrete(guide = guide_axis(n.dodge=3))
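With x=EVTYPE mapped to a character column, ggplot2 places the bars in alphabetical order rather than in ranking order. A possible variation, sketched here on toy values, is to turn the labels into a factor ordered by the plotted value using reorder. The names toy.ranking and EVTYPE.ORDERED are hypothetical.

```r
library(ggplot2)

# Toy ranking (invented values) standing in for fatalities.ranking.
toy.ranking <- data.frame(
  EVTYPE = c("A", "B", "C"),
  AVG.FATALITIES = c(1, 3, 2)
)

# Order the factor levels by descending average so that the tallest bar
# is drawn first.
toy.ranking$EVTYPE.ORDERED <- reorder(toy.ranking$EVTYPE,
                                      -toy.ranking$AVG.FATALITIES)

p <- ggplot(toy.ranking, aes(x = EVTYPE.ORDERED, y = AVG.FATALITIES)) +
  geom_col(color = "black") +
  labs(x = "Weather event", y = "Average number of fatalities")
```

The same idea would apply to the injuries and economic damage plots by reordering on their respective value columns.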

10 most harmful weather events by average injuries.

injuries.ranking <- health.impact %>% 
  select(EVTYPE, AVG.INJURIES) %>%
  arrange(desc(AVG.INJURIES), .by_group=TRUE) %>%
  slice_head(n=10)

ggplot(data=injuries.ranking, aes(x=EVTYPE, y=AVG.INJURIES, fill = EVTYPE)) +
  geom_bar(stat="identity", color="black") + 
  labs(y="Average number of injuries", 
       x = "Weather event",
       title="10 most harmful weather events by average injuries") + 
  scale_fill_brewer(palette="Spectral") +
  theme(legend.position="none") +
  scale_x_discrete(guide = guide_axis(n.dodge=3))

Economic damage analysis

The next figure shows the 10 weather events that cause the most economic damage, measured as the sum of property damage and crop damage in thousands of US dollars.

library(ggplot2)

economic.damage <- storm.data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL.DAMAGES.IN.DOLLARS = sum(PROPDMG, na.rm = TRUE) + 
              sum(CROPDMG, na.rm = TRUE)) %>%
  arrange(desc(TOTAL.DAMAGES.IN.DOLLARS), .by_group=TRUE) %>%
  slice_head(n=10)

ggplot(data=economic.damage, aes(x=EVTYPE, y=TOTAL.DAMAGES.IN.DOLLARS, fill = EVTYPE)) +
  geom_bar(stat="identity", color="black") + 
  labs(y="Thousands of dollars", 
       x = "Weather event",
       title="10 weather events causing the most economic damage") + 
  scale_fill_brewer(palette="Paired") +
  theme(legend.position="none") +
  scale_x_discrete(guide = guide_axis(n.dodge=2))
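The totals above add PROPDMG and CROPDMG as recorded. The raw data also carries PROPDMGEXP and CROPDMGEXP, and the str output shows "K" as one of the codes. The sketch below shows how amounts could be scaled before summing, under the assumption that "K", "M" and "B" stand for thousands, millions and billions and that any other code means no scaling. The helper to.dollars is hypothetical and is not used in this report.

```r
# Hypothetical helper: scale a damage amount by its exponent code.
# Assumption: K = 1e3, M = 1e6, B = 1e9; any other code is treated as 1.
to.dollars <- function(value, exp.code) {
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[toupper(exp.code)]
  multiplier[is.na(multiplier)] <- 1
  unname(value * multiplier)
}

# Toy amounts and codes (invented for illustration).
damage <- to.dollars(c(25, 2.5, 1, 3), c("K", "m", "B", ""))
```

Applying such a conversion would require keeping the two exponent columns in the select step and could change the ranking, since events recorded in millions or billions would no longer be counted on the same scale as those recorded in thousands.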