This report summarises the top five weather-related causes of harm to population health (fatalities and injuries) and of economic damage in the United States. Data were downloaded from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (Storm Data). The data were summarised by event type for the whole United States, and the top five event types for fatalities, injuries and economic damage are reported as a proportion of the respective total. Tornadoes are the leading cause of injuries, fatalities and economic damage in the United States. The top five causes of fatalities were tornado > excessive heat > flash flood > heat > lightning (37% > 13% > 6% > 6% > 5%), and the top five causes of injuries were tornado > TSTM wind > flood > excessive heat > lightning (65% > 5% > 5% > 5% > 4%). For economic loss, tornado (32%) was followed by flood > flash flood > hail > TSTM wind (12% > 11% > 8% > 7%).
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.3 magrittr_1.5 tools_3.6.3 htmltools_0.4.0
## [5] Rcpp_1.0.4.6 stringi_1.4.6 rmarkdown_2.1 knitr_1.28
## [9] stringr_1.4.0 xfun_0.13 digest_0.6.25 rlang_0.4.6
## [13] evaluate_0.14
library(ggplot2)
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------- tidyverse 1.3.0 --
## v tibble 3.0.1 v dplyr 0.8.5
## v tidyr 1.0.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## v purrr 0.3.4
## -- Conflicts ------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
The data were downloaded from the link provided, decompressed and imported into R using the following code:
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, "data.bz2")
data <- bzfile("data.bz2")
data <- read.csv(data, header = TRUE, sep = ",")
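As a small self-contained illustration of the import step, the sketch below writes a toy CSV to a bzip2-compressed temporary file and reads it back. It relies on `read.csv()` detecting compressed files when given a path, so the explicit `bzfile()` connection is optional when reading. The toy columns and values are invented for illustration and are not taken from Storm Data.

```r
# Toy stand-in for the storm data (values are invented for illustration)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the toy data to a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression from the file itself, so the path
# can be passed directly without opening a bzfile() connection first
toy_back <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
```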
The data were grouped by event type, and the fatalities and injuries were summed for each type. Each event type's percentage of total fatalities and of total injuries was then calculated. Finally, the data were sorted in descending order and the five event types with the highest percentages were stored in a data frame.
#Summarise injuries and fatalities by event type
df0 <- data %>% group_by(EVTYPE) %>% summarise(FAT = sum(FATALITIES), INJ = sum(INJURIES))
#Select only columns that are needed
df1 <- select(df0, EVTYPE, FAT, INJ)
#Calculate fatalities and injuries by percentage of total
df1$FATPct <- (df1$FAT / sum(df1$FAT))*100
df1$InPct <- (df1$INJ / sum(df1$INJ))*100
#Arrange in descending order and keep the top 5 (EVTYPE and FATPct columns)
df1a <- arrange(df1, desc(FATPct))
df1a <- as.data.frame(df1a[1:5, c(1, 4)])
#Format the percentages to one significant digit for display, then convert back to numeric
df1b <- as.data.frame(df1a) %>% format(digits = 1)
df1b$FATPct <- as.numeric(df1b$FATPct)
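The percentage-of-total calculation can be checked by hand on a toy table. The sketch below uses base R (`aggregate()` and `order()`) rather than dplyr, with invented event counts, so it illustrates the arithmetic rather than reproducing the analysis above.

```r
# Invented records: one row per storm event (not real Storm Data values)
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(4, 2, 2, 1, 1, 0)
)

# Sum fatalities within each event type
by_type <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Express each event type as a percentage of all fatalities
by_type$FATPct <- by_type$FATALITIES / sum(by_type$FATALITIES) * 100

# Sort from largest to smallest share and keep the top two
top2 <- head(by_type[order(by_type$FATPct, decreasing = TRUE), ], 2)
```

With these toy numbers, tornadoes account for 6 of 10 fatalities (60%) and heat for 3 of 10 (30%).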
The economic data are split into crop damage and property damage, each recorded as a value with its exponent held in a separate column (PROPDMGEXP and CROPDMGEXP). The exponent columns were converted to numeric. Total damage was calculated by multiplying each value by ten raised to its exponent and summing the crop and property amounts. The proportion of total economic damage attributable to each event type was then calculated. The data were sorted in descending order and the top five causes were combined in a data frame.
#Keep only the variables needed and convert the exponent columns to numeric
data2 <- select(data, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
data2$PROPDMGEXP <- as.numeric(data2$PROPDMGEXP)
data2$CROPDMGEXP <- as.numeric(data2$CROPDMGEXP)
#Scale each damage value by its exponent and sum the two types of damage (crop and property)
data2a <- mutate(data2, TPD= PROPDMG *10^(PROPDMGEXP) + CROPDMG *10^(CROPDMGEXP))
#Calculate total damage by event type
data2b <- data2a %>% group_by(EVTYPE) %>% summarise(Damage = sum(TPD))
#Calculate damage by percentage of total
data2b$PerDam <- (data2b$Damage / sum(data2b$Damage))*100
#Pick top 5 causes of damage
data2c <- arrange(data2b, desc(PerDam))
data2c <- data2c[1:5,c(1,3)]
#Format the percentages to one significant digit for display, then convert back to numeric
data2c <- as.data.frame(data2c) %>% format(digits = 1)
data2c$PerDam <- as.numeric(data2c$PerDam)
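One point worth checking in the step above: under R 3.6, `read.csv()` imports text columns as factors by default, and `as.numeric()` on a factor returns its integer level codes rather than a magnitude. The exponent columns in Storm Data hold codes such as K, M and B, so an explicit lookup is one way to turn them into powers of ten. The sketch below is an illustration under stated assumptions, not the method used for the figures in this report: the helper name `exp_to_power` is hypothetical, the mapping (H = 2, K = 3, M = 6, B = 9, digits as themselves, anything else as 0) is a commonly used convention rather than something stated here, and the input values are invented.

```r
# Hypothetical helper: convert exponent codes to powers of ten.
# Assumed mapping: H = 2, K = 3, M = 6, B = 9 (case-insensitive),
# single digits map to themselves, anything else ("", "+", "?", ...) to 0.
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])            # NA where the code is not H/K/M/B
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  power[is.na(power)] <- 0
  power
}

# Invented example values (not real Storm Data records)
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1),
  PROPDMGEXP = factor(c("K", "M", "")),
  CROPDMG = c(0, 1, 5),
  CROPDMGEXP = factor(c("", "k", "2"))
)

# Total damage: value * 10^exponent for property plus the same for crops
toy$TPD <- toy$PROPDMG * 10^exp_to_power(toy$PROPDMGEXP) +
  toy$CROPDMG * 10^exp_to_power(toy$CROPDMGEXP)
```

Converting through `as.character()` first means the helper behaves the same whether the columns arrive as factors or as character vectors.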
The effect of weather on population health is captured by two variables in the data: fatalities and injuries. The data were processed so that fatalities and injuries were summed per weather event type. Each event type's share of the total was then calculated, and the top five causes of fatalities and of injuries were reported separately. Overall, tornadoes were the number one cause of weather-related health effects in the US for both fatalities and injuries (Figure 1 & Figure 2): they caused 37% of all weather-related fatalities and 65% of all weather-related injuries. The top five causes of fatalities in the US were tornado > excessive heat > flash flood > heat > lightning (37% > 13% > 6% > 6% > 5%). The top five causes of injuries in the US were tornado > TSTM wind > flood > excessive heat > lightning (65% > 5% > 5% > 5% > 4%).
#Order in descending order
df1b$EVTYPE <- factor(df1b$EVTYPE, levels = df1b$EVTYPE[order(df1b$FATPct, decreasing = TRUE)])
#Set colors
col = c("turquoise1", "hotpink", "yellow", "greenyellow", "blueviolet")
p1a <- ggplot(data = df1b, aes(x = EVTYPE, y = FATPct))+
geom_bar(position='stack', stat='identity', fill = col)+
theme_classic()+
theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 0, hjust = 0.5), plot.caption = element_text(hjust = 0, face = "bold"))+
labs(title = "TOP 5 Causes of Fatalities by Weather in the USA", x = "Event type", y = "Fatalities (%)", caption = "Figure 1: Top 5 causes of fatalities in the United States")+
geom_text(aes(label = FATPct, hjust = "left"))
#Arrange in descending order and keep the top 5 (EVTYPE and InPct columns)
df1c <- arrange(df1, desc(InPct))
df1c <- as.data.frame(df1c[1:5, c(1, 5)])
df1c <- as.data.frame(df1c) %>% format(digits = 1)
df1c$InPct <- as.numeric(df1c$InPct)
#Order in descending order
df1c$EVTYPE <- factor(df1c$EVTYPE, levels = df1c$EVTYPE[order(df1c$InPct, decreasing = TRUE)])
#Make a figure
p1b <- ggplot(data = df1c, aes(x = EVTYPE, y = InPct))+
geom_bar(position='stack', stat='identity', fill = col)+
theme_classic()+
theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 0, hjust = 0.5), plot.caption = element_text(hjust = 0, face = "bold"))+
labs(title = "TOP 5 Causes of Injuries by Weather in the USA", x = "Event type", y = "Injuries (%)", caption = "Figure 2: Top 5 causes of injuries in the United States")+
geom_text(aes(label = InPct, hjust = "left"))
p1a
p1b
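The bars appear in descending order because the event types are stored as a factor whose levels follow the sorted percentages, and ggplot2 draws a discrete axis in level order. A toy illustration of that reordering step, with invented values:

```r
# Invented percentages, deliberately out of order
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
                  Pct = c(10, 60, 30),
                  stringsAsFactors = FALSE)

# Set the factor levels by descending percentage
toy$EVTYPE <- factor(toy$EVTYPE,
                     levels = toy$EVTYPE[order(toy$Pct, decreasing = TRUE)])

# The levels now run from the largest share to the smallest
toy_levels <- levels(toy$EVTYPE)
```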
The economic effect of weather is captured by two measurements: property damage and crop damage. For this analysis, each recorded value was converted to its actual amount by multiplying it by ten raised to its exponent. The resulting property and crop amounts were added together and summed by event type. Finally, each event type's percentage of the total economic loss was calculated and the top five causes were plotted (Figure 3). The leading weather-related cause of economic damage in the USA was tornado (32%), followed by flood > flash flood > hail > TSTM wind (12% > 11% > 8% > 7%).
#Order in descending order
data2c$EVTYPE <- factor(data2c$EVTYPE, levels = data2c$EVTYPE[order(data2c$PerDam, decreasing = TRUE)])
#Set color
col = c("turquoise1", "hotpink", "yellow", "greenyellow", "blueviolet")
#Make a figure
p4 <- ggplot(data = data2c, aes(x=EVTYPE, y=PerDam))+
geom_bar(position='stack', stat='identity', fill = col)+
theme_classic()+
theme(strip.background =element_rect(fill="red"))+
theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 0, hjust = 0.5), plot.caption = element_text(hjust = 0, face = "bold"))+
labs(title = "Top 5 causes of economic damage in the USA", x = "Event type", y = "Damage (%)", caption = "Figure 3: Top 5 causes of economic damage due to weather in the United States")+
geom_text(aes(label = PerDam, hjust = "left"))
p4
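The three bar charts share almost all of their code, so they could be produced by one helper. The sketch below is one possible refactoring, assuming ggplot2 is installed. The function name `plot_top5`, its arguments and the column name `Pct` are hypothetical, the data are invented, and it uses `geom_col()` with a single fill colour and `vjust` to place labels above the bars, so its output differs slightly from the figures above.

```r
library(ggplot2)

# Hypothetical helper: bar chart of percentage by event type.
# `df` needs a column EVTYPE (ideally a factor ordered by share)
# and a numeric column Pct.
plot_top5 <- function(df, title, ylab, caption) {
  ggplot(df, aes(x = EVTYPE, y = Pct)) +
    geom_col(fill = "turquoise1") +
    geom_text(aes(label = Pct), vjust = -0.5) +   # label just above each bar
    theme_classic() +
    theme(plot.title = element_text(hjust = 0.5),
          plot.caption = element_text(hjust = 0, face = "bold")) +
    labs(title = title, x = "Event type", y = ylab, caption = caption)
}

# Invented data, not results from this report
toy <- data.frame(EVTYPE = factor(c("TORNADO", "HEAT", "FLOOD"),
                                  levels = c("TORNADO", "HEAT", "FLOOD")),
                  Pct = c(60, 30, 10))
p_toy <- plot_top5(toy, "Toy example", "Share (%)", "Toy data, not real results")
```

Each of the three figures would then need only its own data frame, title, axis label and caption.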