Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.24.0 (2020-08-26 16:11:58 UTC) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following object is masked from 'package:R.methodsS3':
##
## throw
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, load, save
## R.utils v2.10.1 (2020-08-26 22:50:31 UTC) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:tidyr':
##
## extract
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
## warnings
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
# Machine-specific working directory; adjust or remove this line to run the analysis elsewhere
setwd("C:/Users/ventd/OneDrive/Escritorio/Coursera/Reproducible Research/RRProject2")
filename <- "Dataset.csv.bz2"
# Download the dataset only if it is not already present
if (!file.exists(filename)) {
  fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileURL, filename)
}
# Read the compressed CSV directly through a bzip2 connection
xdata <- read.csv(bzfile(filename))
dim(xdata)
## [1] 902297 37
str(xdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
# Keep only the variables used in the analysis
pdata <- select(xdata, BGN_DATE, STATE, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES, INJURIES)
Check whether the selected data contain any NA values.
sum(is.na(pdata))
## [1] 0
Format the BGN_DATE variable as a date. The most recent years carry the most complete information: all 48 event types have been recorded only since 1996, so the analysis keeps events from 1996 onward.
pdata$BGN_DATE <- as.Date(pdata$BGN_DATE, "%m/%d/%Y")
pdata$YEAR <- year(pdata$BGN_DATE)
pdata <- filter(pdata, YEAR >= 1996)
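As a small illustration of the date handling above, the sketch below parses a few BGN_DATE-style strings with base R only. The first two strings appear in the data preview; the third is a made-up value added so that one record survives the 1996 cutoff.

```r
# Two strings from the data preview plus one hypothetical 1996 record
raw_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "6/8/1996 0:00:00")

# as.Date() reads the month/day/year part and ignores the trailing time
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Extract the year as an integer (base-R equivalent of lubridate::year)
years <- as.integer(format(parsed, "%Y"))

# Keep only the records from 1996 onward
recent <- parsed[years >= 1996]
print(recent)
```

Because the time component is always "0:00:00" in the rows shown, dropping it loses no information for a year-level filter.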
Keep only the events that had a health impact or caused economic damage.
pdata <- filter(pdata, PROPDMG > 0 | CROPDMG > 0 | FATALITIES > 0 | INJURIES > 0)
table(pdata$PROPDMGEXP)
##
## B K M
## 8448 32 185474 7364
table(pdata$CROPDMGEXP)
##
## B K M
## 102767 2 96787 1762
pdata$PROPDMGEXP <- toupper(pdata$PROPDMGEXP)
pdata$CROPDMGEXP <- toupper(pdata$CROPDMGEXP)
pdata <- pdata %>%
  mutate(CROPDMGFACTOR = case_when(CROPDMGEXP == "" ~ 10^0 * CROPDMG,
                                   CROPDMGEXP == "?" ~ 10^0 * CROPDMG,
                                   CROPDMGEXP == "0" ~ 10^0 * CROPDMG,
                                   CROPDMGEXP == "2" ~ 10^2 * CROPDMG,
                                   CROPDMGEXP == "K" ~ 10^3 * CROPDMG,
                                   CROPDMGEXP == "M" ~ 10^6 * CROPDMG,
                                   CROPDMGEXP == "B" ~ 10^9 * CROPDMG)) %>%
  mutate(PROPDMGFACTOR = case_when(PROPDMGEXP == "" ~ 10^0 * PROPDMG,
                                   PROPDMGEXP == "-" ~ 10^0 * PROPDMG,
                                   PROPDMGEXP == "?" ~ 10^0 * PROPDMG,
                                   PROPDMGEXP == "+" ~ 10^0 * PROPDMG,
                                   PROPDMGEXP == "0" ~ 10^0 * PROPDMG,
                                   PROPDMGEXP == "1" ~ 10^1 * PROPDMG,
                                   PROPDMGEXP == "2" ~ 10^2 * PROPDMG,
                                   PROPDMGEXP == "3" ~ 10^3 * PROPDMG,
                                   PROPDMGEXP == "4" ~ 10^4 * PROPDMG,
                                   PROPDMGEXP == "5" ~ 10^5 * PROPDMG,
                                   PROPDMGEXP == "6" ~ 10^6 * PROPDMG,
                                   PROPDMGEXP == "7" ~ 10^7 * PROPDMG,
                                   PROPDMGEXP == "8" ~ 10^8 * PROPDMG,
                                   PROPDMGEXP == "H" ~ 10^2 * PROPDMG,
                                   PROPDMGEXP == "K" ~ 10^3 * PROPDMG,
                                   PROPDMGEXP == "M" ~ 10^6 * PROPDMG,
                                   PROPDMGEXP == "B" ~ 10^9 * PROPDMG)) %>%
  mutate(SUMDMG = PROPDMGFACTOR + CROPDMGFACTOR) %>%
  mutate(SUMFATINJ = FATALITIES + INJURIES)
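The exponent codes can also be converted with a lookup vector, which may be easier to maintain than one case per code. The sketch below is only an illustration: `exp_multiplier` and the `PROPDMGVALUE` column are hypothetical names, the sample rows are made up, and unlike the `case_when` calls above, which return NA for unlisted codes, this version treats every unrecognised code as a multiplier of 1.

```r
# Hypothetical helper (not part of the original analysis): returns the
# power-of-ten multiplier for each exponent code.
exp_multiplier <- function(code) {
  code <- toupper(code)
  # Letter codes: hundreds, thousands, millions, billions
  lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  mult <- unname(lookup[code])
  # Single digits are read as the exponent itself, e.g. "5" -> 10^5
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  # Assumption: blanks and symbols such as "?", "+", "-" leave the value unscaled
  mult[is.na(mult)] <- 1
  mult
}

# Made-up sample rows for illustration
damage <- data.frame(PROPDMG = c(25, 2.5, 1, 3),
                     PROPDMGEXP = c("K", "m", "B", ""),
                     stringsAsFactors = FALSE)
damage$PROPDMGVALUE <- damage$PROPDMG * exp_multiplier(damage$PROPDMGEXP)
print(damage)
```

The same helper could be applied to the crop damage columns, so both conversions would share one definition of the codes.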
Check whether the newly computed columns introduced any NA values.
sum(is.na(pdata))
## [1] 0
sumpdata <- pdata %>%
  group_by(EVTYPE) %>%
  summarize(SUMFATALITIES = sum(FATALITIES),
            SUMINJURIES = sum(INJURIES),
            TOTALFATINJ = sum(SUMFATINJ),
            SUMPROPDMG = sum(PROPDMGFACTOR),
            SUMCROPDMG = sum(CROPDMGFACTOR),
            TOTALDMG = sum(SUMDMG))
## `summarise()` ungrouping output (override with `.groups` argument)
head(sumpdata)
## # A tibble: 6 x 7
## EVTYPE SUMFATALITIES SUMINJURIES TOTALFATINJ SUMPROPDMG SUMCROPDMG TOTALDMG
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 " HIGH~ 0 0 0 200000 0 200000
## 2 " FLASH ~ 0 0 0 50000 0 50000
## 3 " TSTM W~ 0 0 0 8100000 0 8100000
## 4 " TSTM W~ 0 0 0 8000 0 8000
## 5 "AGRICUL~ 0 0 0 0 28820000 28820000
## 6 "ASTRONO~ 0 0 0 9425000 0 9425000
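The first rows of the summary show EVTYPE labels that begin with whitespace, so the same event can be split across several groups. One possible cleanup, which the analysis here does not apply, is to trim and upper-case the labels before grouping. The sketch below uses made-up labels and damage values with base R only.

```r
# Made-up example: three spellings of one event type plus a second type
events <- data.frame(
  EVTYPE = c("TORNADO", " TORNADO", "Tornado ", "FLOOD"),
  TOTALDMG = c(100, 20, 3, 50),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace and unify the case
events$EVTYPE <- toupper(trimws(events$EVTYPE))

# Sum the damage per cleaned label
totals <- aggregate(TOTALDMG ~ EVTYPE, data = events, FUN = sum)
print(totals)
```

Trimming and case folding only merge labels that differ in spacing or capitalisation; abbreviations and genuinely different spellings would still need a manual mapping.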
Population Health Impact Results
Impact <- arrange(sumpdata, desc(TOTALFATINJ))
ImpactData <- head(Impact,10)
ImpactData
## # A tibble: 10 x 7
## EVTYPE SUMFATALITIES SUMINJURIES TOTALFATINJ SUMPROPDMG SUMCROPDMG TOTALDMG
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TORNADO 1511 20667 22178 2.46e10 283425010 2.49e10
## 2 EXCESSI~ 1797 6391 8188 7.72e 6 492402000 5.00e 8
## 3 FLOOD 414 6758 7172 1.44e11 4974778400 1.49e11
## 4 LIGHTNI~ 651 4141 4792 7.43e 8 6898440 7.50e 8
## 5 TSTM WI~ 241 3629 3870 4.48e 9 553915350 5.03e 9
## 6 FLASH F~ 887 1674 2561 1.52e10 1334901700 1.66e10
## 7 THUNDER~ 130 1400 1530 3.38e 9 398331000 3.78e 9
## 8 WINTER ~ 191 1292 1483 1.53e 9 11944000 1.54e 9
## 9 HEAT 237 1222 1459 1.52e 6 176500 1.70e 6
## 10 HURRICA~ 64 1275 1339 6.93e10 2607872800 7.19e10
Economic Impact Results
Economic <- arrange(sumpdata, desc(TOTALDMG))
EconomicData <- head(Economic,10)
EconomicData
## # A tibble: 10 x 7
## EVTYPE SUMFATALITIES SUMINJURIES TOTALFATINJ SUMPROPDMG SUMCROPDMG TOTALDMG
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FLOOD 414 6758 7172 1.44e11 4.97e 9 1.49e11
## 2 HURRICA~ 64 1275 1339 6.93e10 2.61e 9 7.19e10
## 3 STORM S~ 2 37 39 4.32e10 5.00e 3 4.32e10
## 4 TORNADO 1511 20667 22178 2.46e10 2.83e 8 2.49e10
## 5 HAIL 7 713 720 1.46e10 2.48e 9 1.71e10
## 6 FLASH F~ 887 1674 2561 1.52e10 1.33e 9 1.66e10
## 7 HURRICA~ 61 46 107 1.18e10 2.74e 9 1.46e10
## 8 DROUGHT 0 4 4 1.05e 9 1.34e10 1.44e10
## 9 TROPICA~ 57 338 395 7.64e 9 6.78e 8 8.32e 9
## 10 HIGH WI~ 235 1083 1318 5.25e 9 6.34e 8 5.88e 9
ImpactData$EVTYPE <- with(ImpactData, reorder(EVTYPE, -TOTALFATINJ))
ImpactDataS <- ImpactData %>%
  gather(key = "Type", value = "TOTALIMPACT", c("SUMFATALITIES", "SUMINJURIES")) %>%
  select(EVTYPE, Type, TOTALIMPACT)
ImpactDataS$Type[ImpactDataS$Type %in% c("SUMFATALITIES")] <- "Deaths"
ImpactDataS$Type[ImpactDataS$Type %in% c("SUMINJURIES")] <- "Injuries"
plot1 <- ggplot(ImpactDataS, aes(x = EVTYPE, y = TOTALIMPACT, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge2") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") +
  ylab("Health Impact") +
  ggtitle("Health Impact Events") +
  theme(plot.title = element_text(hjust = 0.5))
plot1
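The wide-to-long reshaping that precedes the plot uses `gather()`, which tidyr now marks as superseded in favour of `pivot_longer()`. As a hedged sketch of the equivalent step, the example below reshapes a two-row data frame built from the TORNADO and FLOOD figures in the table above; the object names `top` and `long` are illustrative only.

```r
library(tidyr)

# Two rows taken from the health impact table above
top <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  SUMFATALITIES = c(1511, 414),
  SUMINJURIES = c(20667, 6758),
  stringsAsFactors = FALSE
)

# One row per event type and measure, as needed for a grouped bar chart
long <- pivot_longer(top,
                     cols = c(SUMFATALITIES, SUMINJURIES),
                     names_to = "Type",
                     values_to = "TOTALIMPACT")

# Replace the column names with readable legend labels
long$Type <- ifelse(long$Type == "SUMFATALITIES", "Deaths", "Injuries")
print(long)
```

The result has the same three columns that the plotting code expects (EVTYPE, Type, TOTALIMPACT), so it could be passed to `ggplot()` in the same way.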
EconomicData$EVTYPE <- with(EconomicData, reorder(EVTYPE, -TOTALDMG))
EconomicDataS <- EconomicData %>%
  gather(key = "Type", value = "TOTALDAMAGE", c("SUMPROPDMG", "SUMCROPDMG")) %>%
  select(EVTYPE, Type, TOTALDAMAGE)
EconomicDataS$Type[EconomicDataS$Type %in% c("SUMPROPDMG")] <- "Property damage"
EconomicDataS$Type[EconomicDataS$Type %in% c("SUMCROPDMG")] <- "Crop damage"
plot2 <- ggplot(EconomicDataS, aes(x = EVTYPE, y = TOTALDAMAGE, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge2") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") +
  ylab("Economic Impact") +
  ggtitle("Economic Impact Events") +
  theme(plot.title = element_text(hjust = 0.5))
plot2
The plots show which event types are most harmful to population health and which are most costly. Since 1996, tornadoes account for the most combined fatalities and injuries, and floods cause the greatest total property and crop damage. Better ways to anticipate these events and evacuate people could reduce the danger to the population, and better material preparation for events such as floods could reduce the economic impact.