Synopsis

We answer the question posed in the title using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Using these data, we investigated the harm that weather events cause to human life and health, as well as the economic damage they cause. Tornadoes cause the greatest harm to human health, while floods have the greatest economic impact.

Loading and Processing the Raw Data

We obtained the data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

library (tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library (lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library (data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
# Download the compressed data set once and cache it locally
if (!file.exists("repdata_data_StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "repdata_data_StormData.csv.bz2", method = "curl")
}
# Read the CSV from the bzip2 archive unless it is already in memory
if (!exists("data1")) {
  data1 <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), header = TRUE)
}

We keep only the columns needed for the analysis:

data2<-data1[,c("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES",
               "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

According to the National Centers for Environmental Information (https://www.ncdc.noaa.gov/stormevents/details.jsp), all 48 event types defined in NWS Directive 10-1605 have been recorded only from 1996 to the present. From 1950 through 1954, only tornado events were recorded. From 1955 through 1992, only tornado, thunderstorm wind and hail events were keyed from the paper publications into digital data. From 1993 to 1995, only tornado, thunderstorm wind and hail events were extracted from the Unformatted Text Files. Consequently, only data from January 1996 onward are suitable for our analysis.

data2$BGN_DATE <- as.Date(data2$BGN_DATE, "%m/%d/%Y")
data2$YEAR<-year(data2$BGN_DATE)
data3<-subset(data2,YEAR>1995)
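
The date filter above can be checked on a few rows. The following is a minimal sketch, not part of the original analysis: the `toy` data frame and its values are made up for illustration, and only the column name BGN_DATE and its date layout mirror the storm database. It uses base R only, so it does not depend on which package's year() is in effect.

```r
# Hypothetical sample rows; the date strings follow the BGN_DATE layout.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "HAIL", "FLOOD"),
  BGN_DATE = c("4/18/1950 0:00:00", "12/31/1995 0:00:00", "1/1/1996 0:00:00"),
  stringsAsFactors = FALSE
)

# as.Date() parses the leading month/day/year and ignores the trailing time.
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Extract the year with base R and keep records from 1996 onward.
toy$YEAR <- as.integer(format(toy$BGN_DATE, "%Y"))
toy_recent <- subset(toy, YEAR > 1995)
toy_recent$EVTYPE
```

Only the row dated 1 January 1996 passes the filter, which matches the YEAR>1995 condition used for data3.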

Examining the resulting data lets us formulate a series of commands to clean the EVTYPE column. We convert all names to upper case and correct them to match the list of events given in Directive 10-1605 (table 2.1.1). This does not bring the column into full agreement with the Directive's list, but since we are interested only in the events that cause the greatest harm, the method is sufficient.

data3$EVTYPE <- toupper(data3$EVTYPE)

data3$EVTYPE[(data3$EVTYPE == "TSTM WIND")] <- "THUNDERSTORM WIND"
data3$EVTYPE[(data3$EVTYPE == "HURRICANE/TYPHOON"|
              data3$EVTYPE == "HURRICANE"|
              data3$EVTYPE == "TYPHOON")]<- "HURRICANE (TYPHOON)"
data3$EVTYPE[(data3$EVTYPE == "WILD/FOREST FIRE")] <- "WILDFIRE"
data3$EVTYPE[(data3$EVTYPE == "RIP CURRENTS")] <- "RIP CURRENT"
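
The same corrections could also be expressed as a lookup table, which may be easier to extend if more variants need to be mapped. This is only a sketch of an alternative: the names `evtype_map` and `normalize_evtype` are hypothetical and do not appear in the original analysis, while the label pairs are the ones replaced above.

```r
# Named vector: names are the raw labels, values are the corrected labels.
evtype_map <- c(
  "TSTM WIND"         = "THUNDERSTORM WIND",
  "HURRICANE/TYPHOON" = "HURRICANE (TYPHOON)",
  "HURRICANE"         = "HURRICANE (TYPHOON)",
  "TYPHOON"           = "HURRICANE (TYPHOON)",
  "WILD/FOREST FIRE"  = "WILDFIRE",
  "RIP CURRENTS"      = "RIP CURRENT"
)

# Hypothetical helper: upper-case the labels, then replace those found in the map.
normalize_evtype <- function(x, map = evtype_map) {
  x <- toupper(x)
  hit <- x %in% names(map)          # labels that have a correction
  x[hit] <- unname(map[x[hit]])     # look up the corrected label
  x
}

normalize_evtype(c("Tstm Wind", "Typhoon", "FLOOD"))
```

Labels that are not in the map, such as "FLOOD" here, pass through unchanged apart from the conversion to upper case.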

To understand the data in the columns PROPDMGEXP and CROPDMGEXP, we list their unique values:

unique (data3$PROPDMGEXP)
## [1] "K" ""  "M" "B" "0"
unique (data3$CROPDMGEXP)
## [1] "K" ""  "M" "B"

According to NWS Directive 10-1605 (section 2.7), the values “K”, “M” and “B” in the columns PROPDMGEXP and CROPDMGEXP denote thousands, millions and billions of USD, respectively. We scale the columns PROPDMG and CROPDMG accordingly; the remaining values (the empty string and “0”) keep a multiplier of 1:

data3$TPROPDMGEXP<-1
data3$TPROPDMGEXP [data3$PROPDMGEXP=="K"]<-10^3
data3$TPROPDMGEXP [data3$PROPDMGEXP=="M"]<-10^6
data3$TPROPDMGEXP [data3$PROPDMGEXP=="B"]<-10^9

data3$TCROPDMGEXP<-1
data3$TCROPDMGEXP [data3$CROPDMGEXP=="K"]<-10^3
data3$TCROPDMGEXP [data3$CROPDMGEXP=="M"]<-10^6
data3$TCROPDMGEXP [data3$CROPDMGEXP=="B"]<-10^9

data3$PROPDMG<-data3$PROPDMG*data3$TPROPDMGEXP
data3$CROPDMG<-data3$CROPDMG*data3$TCROPDMGEXP
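
The scaling step can be illustrated on a handful of values. The sketch below is an assumption-labelled alternative using a named vector; `exp_map`, `exp_multiplier`, `dmg` and `code` are hypothetical names and values chosen for illustration. As in the code above, any code other than K, M or B keeps a multiplier of 1.

```r
# Multipliers for the exponent codes K, M and B.
exp_map <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper that converts exponent codes to numeric multipliers.
exp_multiplier <- function(code, map = exp_map) {
  mult <- unname(map[code])   # NA for codes that are not in the map
  mult[is.na(mult)] <- 1      # "" and "0" fall back to a multiplier of 1
  mult
}

# Made-up damage figures and exponent codes.
dmg  <- c(25, 2.5, 1, 10, 5)
code <- c("K", "M", "B", "", "0")
dmg * exp_multiplier(code)
```

For example, 2.5 with code "M" becomes 2,500,000 USD, while 10 with an empty code stays 10.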

The final stage of processing creates two data frames that address the two questions: which types of events are most harmful to population health, and which have the greatest economic consequences. We subset each frame by a quantile of the total to obtain the most harmful events. A probability of 0.977 leaves the top 10 events in each case.

data.health<-summarise(group_by(data3,EVTYPE),
                 FATALITIES=sum(FATALITIES),
                 INJURIES=sum(INJURIES),
                 TOTAL=FATALITIES+INJURIES)
data.health<-arrange(subset(data.health, 
                    TOTAL > quantile(TOTAL, probs = 0.977)),
                    desc (TOTAL))

data.health
## # A tibble: 10 x 4
##    EVTYPE              FATALITIES INJURIES TOTAL
##    <chr>                    <dbl>    <dbl> <dbl>
##  1 TORNADO                   1511    20667 22178
##  2 EXCESSIVE HEAT            1797     6391  8188
##  3 FLOOD                      414     6758  7172
##  4 THUNDERSTORM WIND          371     5029  5400
##  5 LIGHTNING                  651     4141  4792
##  6 FLASH FLOOD                887     1674  2561
##  7 WILDFIRE                    87     1456  1543
##  8 WINTER STORM               191     1292  1483
##  9 HEAT                       237     1222  1459
## 10 HURRICANE (TYPHOON)        125     1326  1451
data.economic<-summarise (group_by(data3,EVTYPE),
                          PROPDMG=sum(PROPDMG),
                          CROPDMG=sum(CROPDMG),
                          TOTAL=PROPDMG+CROPDMG)
data.economic<-arrange (subset(data.economic,
                               TOTAL>quantile(TOTAL, probs=0.977)),
                        desc(TOTAL))
data.economic
## # A tibble: 10 x 4
##    EVTYPE                   PROPDMG     CROPDMG        TOTAL
##    <chr>                      <dbl>       <dbl>        <dbl>
##  1 FLOOD               143944833550  4974778400 148919611950
##  2 HURRICANE (TYPHOON)  81718889010  5350107800  87068996810
##  3 STORM SURGE          43193536000        5000  43193541000
##  4 TORNADO              24616945710   283425010  24900370720
##  5 HAIL                 14595143420  2476029450  17071172870
##  6 FLASH FLOOD          15222203910  1334901700  16557105610
##  7 DROUGHT               1046101000 13367566000  14413667000
##  8 THUNDERSTORM WIND     7860710880   952246350   8812957230
##  9 TROPICAL STORM        7642475550   677711000   8320186550
## 10 WILDFIRE              7760449500   402255130   8162704630
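
The number of rows that a quantile threshold keeps depends on how many event types there are, so the probability has to be tuned to the data. The sketch below uses made-up totals (the `totals` data frame is hypothetical) to show how the probability relates to the number of rows kept when all totals are distinct, and compares it with selecting the top rows by rank, which is one possible alternative rather than the method used above.

```r
# Hypothetical totals for 40 event types; all values are distinct.
totals <- data.frame(
  EVTYPE = paste0("EVENT_", 1:40),
  TOTAL  = (1:40)^2,
  stringsAsFactors = FALSE
)

k <- 10             # number of top events wanted
n <- nrow(totals)   # number of event types

# Quantile approach: with distinct totals, probs = 1 - k / n keeps k rows.
by_quantile <- subset(totals, TOTAL > quantile(TOTAL, probs = 1 - k / n))
by_quantile <- by_quantile[order(-by_quantile$TOTAL), ]

# Rank approach: sort in descending order and take the first k rows.
by_rank <- head(totals[order(-totals$TOTAL), ], k)

# Both approaches select the same events in the same order.
identical(by_quantile$EVTYPE, by_rank$EVTYPE)
```

Selecting by rank does not need the probability to be adjusted when the number of event types changes.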

Results

data.health<-as.data.table(data.health)
health.for.gg <- melt(data.health, id.vars = "EVTYPE", variable.name = "Consequences")

plot1<-ggplot (health.for.gg,aes(x = reorder(EVTYPE, -value), y = value))
plot1+geom_bar (aes(fill=Consequences), stat ="identity", 
                position="dodge",show.legend = FALSE)+
facet_wrap(.~Consequences, scales="free_y")+
  ylab("Value") +  
  xlab("Event Type") + 
  theme(axis.text.x = element_text(angle=90, hjust=1)) + 
  ggtitle("The Most Harmful Weather Events for People's Health in US") + 
  theme(plot.title = element_text(hjust = 0.5))

As we can see, the tornado is the most harmful weather event for people’s health in the US by total casualties. For people’s lives, however, excessive heat is more dangerous: it caused more fatalities than tornadoes (1,797 versus 1,511).

names(data.economic)<-c ("EVTYPE","Property damage",
                         "Crop damage", "Total damage")
data.economic<-as.data.table(data.economic)
economic.for.gg <- melt(data.economic, id.vars = "EVTYPE", variable.name = "Consequences")

plot2<-ggplot (economic.for.gg,aes(x = reorder(EVTYPE, -value), y = value))
plot2+geom_bar (aes(fill=Consequences), stat ="identity", 
                show.legend = FALSE, position="dodge")+
  facet_wrap(.~Consequences, scales="free_y")+
  ylab("Value") +  
  xlab("Event Type") + 
  theme(axis.text.x = element_text(angle=90, hjust=1)) + 
  ggtitle("The Most Harmful Weather Events for Economics in US") + 
  theme(plot.title = element_text(hjust = 0.5))

Floods cause the greatest total economic damage. For crops, however, the most damaging event is drought.