We answer the question posed in the title by analyzing the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Using this data, we investigated the harm that weather events cause to human life and health, as well as the economic damage they cause. We found that tornadoes cause the greatest harm to human health, while floods have the greatest economic impact.
We obtained the data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
library (tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library (lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library (data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
if (!file.exists("repdata_data_StormData.csv.bz2"))
{download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"repdata_data_StormData.csv.bz2", method = "curl")}
if (!exists("data1"))
{data1 <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),header = TRUE)}
We extract the columns needed for the analysis:
data2<-data1[,c("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES",
"PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
According to the National Centers for Environmental Information (https://www.ncdc.noaa.gov/stormevents/details.jsp), all 48 event types defined in NWS Directive 10-1605 have been recorded only from 1996 to the present. From 1950 through 1954, only tornado events were recorded. From 1955 through 1992, only tornado, thunderstorm wind and hail events were keyed from the paper publications into digital data. From 1993 to 1995, only tornado, thunderstorm wind and hail events were extracted from the Unformatted Text Files. Consequently, only data from January 1996 onward are suitable for comparing event types in our analysis.
data2$BGN_DATE <- as.Date(data2$BGN_DATE, "%m/%d/%Y")
data2$YEAR<-year(data2$BGN_DATE)
data3<-subset(data2,YEAR>1995)
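The date filter can be checked on a few sample values. This is a minimal sketch in base R, assuming the raw BGN_DATE strings have the layout month/day/year followed by a time part; the sample values and the variable names are illustrative only.

```r
# Sample values in the assumed BGN_DATE layout (month/day/year plus a time part).
bgn <- c("4/18/1950 0:00:00", "12/31/1995 0:00:00", "1/1/1996 0:00:00")

# as.Date() reads only what the format describes; trailing characters are ignored.
dates <- as.Date(bgn, format = "%m/%d/%Y")

# Extract the year with base R (lubridate::year() returns the same year number).
years <- as.integer(format(dates, "%Y"))

# Keep only the events that began in 1996 or later.
kept <- bgn[years > 1995]
print(kept)
```

Only the last sample value passes the filter, which matches the cutoff used for data3.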
Examining the resulting data lets us formulate a series of commands to clean the EVTYPE column. We convert all names to upper case and correct them to match the list of events given in Directive 10-1605 (table 2.1.1). This does not bring the column into full compliance with the list from the Directive, but since we are only interested in the events that cause the greatest harm, the method appears sufficient.
data3$EVTYPE <- toupper(data3$EVTYPE)
data3$EVTYPE[(data3$EVTYPE == "TSTM WIND")] <- "THUNDERSTORM WIND"
data3$EVTYPE[(data3$EVTYPE == "HURRICANE/TYPHOON"|
data3$EVTYPE == "HURRICANE"|
data3$EVTYPE == "TYPHOON")]<- "HURRICANE (TYPHOON)"
data3$EVTYPE[(data3$EVTYPE == "WILD/FOREST FIRE")] <- "WILDFIRE"
data3$EVTYPE[(data3$EVTYPE == "RIP CURRENTS")] <- "RIP CURRENT"
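The same corrections can also be written as a single lookup table, which may be easier to extend if more spelling variants turn up. This is a sketch rather than the method used in this report; evtype_map and recode_evtype are hypothetical names introduced for illustration.

```r
# Hypothetical lookup table: names are raw labels, values are the Directive's names.
evtype_map <- c(
  "TSTM WIND"         = "THUNDERSTORM WIND",
  "HURRICANE/TYPHOON" = "HURRICANE (TYPHOON)",
  "HURRICANE"         = "HURRICANE (TYPHOON)",
  "TYPHOON"           = "HURRICANE (TYPHOON)",
  "WILD/FOREST FIRE"  = "WILDFIRE",
  "RIP CURRENTS"      = "RIP CURRENT"
)

# Upper-case the labels, then replace only those that appear in the lookup table.
recode_evtype <- function(x) {
  x <- toupper(x)
  hit <- x %in% names(evtype_map)
  x[hit] <- unname(evtype_map[x[hit]])
  x
}

sample_types <- c("Tstm Wind", "HURRICANE", "FLOOD", "rip currents")
cleaned <- recode_evtype(sample_types)
print(cleaned)
```

Labels that are not in the table, such as FLOOD, pass through unchanged apart from the upper-casing.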
To understand the data in the columns PROPDMGEXP and CROPDMGEXP, we list their unique values:
unique (data3$PROPDMGEXP)
## [1] "K" "" "M" "B" "0"
unique (data3$CROPDMGEXP)
## [1] "K" "" "M" "B"
From NWS Directive 10-1605 (section 2.7) we conclude that in the columns PROPDMGEXP and CROPDMGEXP “K” means thousands of USD, “M” millions of USD, and “B” billions of USD. We convert the values of the columns PROPDMG and CROPDMG to USD accordingly; any other code is treated as a multiplier of 1.
data3$TPROPDMGEXP<-1
data3$TPROPDMGEXP [data3$PROPDMGEXP=="K"]<-10^3
data3$TPROPDMGEXP [data3$PROPDMGEXP=="M"]<-10^6
data3$TPROPDMGEXP [data3$PROPDMGEXP=="B"]<-10^9
data3$TCROPDMGEXP<-1
data3$TCROPDMGEXP [data3$CROPDMGEXP=="K"]<-10^3
data3$TCROPDMGEXP [data3$CROPDMGEXP=="M"]<-10^6
data3$TCROPDMGEXP [data3$CROPDMGEXP=="B"]<-10^9
data3$PROPDMG<-data3$PROPDMG*data3$TPROPDMGEXP
data3$CROPDMG<-data3$CROPDMG*data3$TCROPDMGEXP
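As a worked check of the conversion, the multipliers can be kept in one named vector and applied to a few sample amounts. This is a minimal sketch assuming the same rule as above (K, M and B scale the amount, any other code leaves it unchanged); exp_multiplier and to_dollars are hypothetical names, and the amounts are made up for illustration.

```r
# Multipliers for the exponent codes described in the text.
exp_multiplier <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# Convert amounts to USD; codes without a known multiplier (e.g. "" or "0") count as 1.
to_dollars <- function(amount, exp_code) {
  mult <- unname(exp_multiplier[exp_code])
  mult[is.na(mult)] <- 1
  amount * mult
}

# Illustrative amounts only, not values from the storm database.
damage <- to_dollars(c(2.5, 10, 1.2, 7), c("K", "M", "B", ""))
print(damage)
```

For example, an amount of 2.5 with code K becomes 2,500 USD, and an amount with an empty code is left as it is.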
The final stage of data processing creates two data frames, one for each question: which types of events are most harmful to population health, and which have the greatest economic consequences. We subset each frame by a quantile of its TOTAL column to keep only the most harmful events; with prob = 0.977 the subset contains the top 10 events.
data.health<-summarise(group_by(data3,EVTYPE),
FATALITIES=sum(FATALITIES),
INJURIES=sum(INJURIES),
TOTAL=sum(FATALITIES)+sum(INJURIES))
data.health<-arrange(subset(data.health,
TOTAL > quantile(TOTAL, prob = 0.977)),
desc (TOTAL))
data.health
## # A tibble: 10 x 4
## EVTYPE FATALITIES INJURIES TOTAL
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 1511 20667 22178
## 2 EXCESSIVE HEAT 1797 6391 8188
## 3 FLOOD 414 6758 7172
## 4 THUNDERSTORM WIND 371 5029 5400
## 5 LIGHTNING 651 4141 4792
## 6 FLASH FLOOD 887 1674 2561
## 7 WILDFIRE 87 1456 1543
## 8 WINTER STORM 191 1292 1483
## 9 HEAT 237 1222 1459
## 10 HURRICANE (TYPHOON) 125 1326 1451
data.economic<-summarise (group_by(data3,EVTYPE),
PROPDMG=sum(PROPDMG),
CROPDMG=sum(CROPDMG),
TOTAL=sum(PROPDMG)+sum(CROPDMG))
data.economic<-arrange (subset(data.economic,
TOTAL>quantile(TOTAL, prob=0.977)),
desc(TOTAL))
data.economic
## # A tibble: 10 x 4
## EVTYPE PROPDMG CROPDMG TOTAL
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 143944833550 4974778400 148919611950
## 2 HURRICANE (TYPHOON) 81718889010 5350107800 87068996810
## 3 STORM SURGE 43193536000 5000 43193541000
## 4 TORNADO 24616945710 283425010 24900370720
## 5 HAIL 14595143420 2476029450 17071172870
## 6 FLASH FLOOD 15222203910 1334901700 16557105610
## 7 DROUGHT 1046101000 13367566000 14413667000
## 8 THUNDERSTORM WIND 7860710880 952246350 8812957230
## 9 TROPICAL STORM 7642475550 677711000 8320186550
## 10 WILDFIRE 7760449500 402255130 8162704630
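The quantile threshold used for subsetting yields 10 rows only for the number of event types present in this particular dataset; with a different number of types the same threshold would return a different count. A direct top-N selection avoids that dependence. The sketch below uses base R on a small made-up table; top_n_events is a hypothetical helper, not part of this report's code.

```r
# Made-up totals for five placeholder event types (illustration only).
totals <- data.frame(
  EVTYPE = c("A", "B", "C", "D", "E"),
  TOTAL  = c(5, 50, 20, 1, 30),
  stringsAsFactors = FALSE
)

# Sort by TOTAL in decreasing order and keep the first n rows.
top_n_events <- function(df, n) {
  ranked <- df[order(df$TOTAL, decreasing = TRUE), ]
  head(ranked, n)
}

top3 <- top_n_events(totals, 3)
print(top3)
```

Calling top_n_events(data.frame(data.health), 10) on the full per-event summary (before the quantile subset) would be one way to request exactly ten rows regardless of how many event types exist.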
data.health<-as.data.table(data.health)
health.for.gg <- melt(data.health, id.vars = "EVTYPE", variable.name = "Consequences")
plot1<-ggplot (health.for.gg,aes(x = reorder(EVTYPE, -value), y = value))
plot1+geom_bar (aes(fill=Consequences), stat ="identity",
position="dodge",show.legend = FALSE)+
facet_wrap(.~Consequences, scales="free_y")+
ylab("Value") +
xlab("Event Type") +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
ggtitle("The Most Harmful Weather Events for People's Health in US") +
theme(plot.title = element_text(hjust = 0.5))
As we can see, the most harmful weather event for people’s health in the US by total consequences is the tornado. However, excessive heat is more dangerous to people’s lives: it caused more fatalities than tornadoes did (1,797 versus 1,511).
names(data.economic)<-c ("EVTYPE","Property damage",
"Crop damage", "Total damage")
data.economic<-as.data.table(data.economic)
economic.for.gg <- melt(data.economic, id.vars = "EVTYPE", variable.name = "Consequences")
plot2<-ggplot (economic.for.gg,aes(x = reorder(EVTYPE, -value), y = value))
plot2+geom_bar (aes(fill=Consequences), stat ="identity",
show.legend = FALSE, position="dodge")+
facet_wrap(.~Consequences, scales="free_y")+
ylab("Value") +
xlab("Event Type") +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
ggtitle("The Most Harmful Weather Events for Economics in US") +
theme(plot.title = element_text(hjust = 0.5))
By total damage, floods have the greatest economic consequences. At the same time, drought is the most damaging event for crops.