The objective of this report is to identify the weather events that are most harmful to population health and those that cause the most economic damage. Harm to population health is measured by the number of fatalities and injuries; economic damage is measured by the cost of property and crop damage, in thousands of dollars. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events in the database start in 1950 and end in November 2011.
First, the compressed data file is downloaded if it is not already present. The data is a CSV file compressed with the bzip2 algorithm, so it can be read directly with the read.csv function, without extracting it first.
file.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file.name <- "StormData.csv.bz2"
if (!file.exists(file.name)) {
  download.file(file.url, destfile = file.name)
}
storm.data <- read.csv(file.name, stringsAsFactors = FALSE)
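To illustrate why no extraction step is needed, here is a minimal sketch on made-up data. It writes a two-row CSV through a bzip2 connection and reads it back with read.csv. The names tmp, toy and toy.back are hypothetical and are not part of the analysis.

```r
# Toy data standing in for the real storm file (made-up values)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Write the CSV through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path in read mode, where R detects the compression,
# so the file does not have to be extracted first
toy.back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy.back)
```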
The data set has 37 variables:
str(storm.data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
It contains the following number of records:
nrow(storm.data)
## [1] 902297
The variables of interest are EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG. PROPDMG and CROPDMG hold property damage and crop damage respectively. This analysis treats both as thousands of dollars and does not use the companion columns PROPDMGEXP and CROPDMGEXP.
library(dplyr)
storm.data <- select(storm.data, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
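The str() output above lists PROPDMGEXP and CROPDMGEXP next to the damage columns. If these hold magnitude codes, as the "K" values suggest, the damage figures could be converted to dollars before summing. The sketch below shows one possible way on made-up data. The helper exp.to.multiplier is hypothetical. The mapping of "K", "M" and "B" to thousand, million and billion is an assumption, as is treating any other code as a multiplier of 1. In the real data the two EXP columns would also have to be kept in the select() call.

```r
# Hypothetical helper: map an exponent code to a dollar multiplier.
# Assumption: K = thousand, M = million, B = billion; anything else counts as 1.
exp.to.multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1   # unknown or empty codes fall back to 1
  unname(mult)
}

# Made-up rows for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

toy$PROPDMG.USD <- toy$PROPDMG * exp.to.multiplier(toy$PROPDMGEXP)
print(toy)
```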
Compute the average number of fatalities and injuries per recorded event for each event type.
library(ggplot2)
health.impact <- storm.data %>%
  group_by(EVTYPE) %>%
  summarise(AVG.FATALITIES = mean(FATALITIES, na.rm = TRUE),
            AVG.INJURIES = mean(INJURIES, na.rm = TRUE))
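Ranking by the average per event can differ from ranking by the total. An event type with a single severe record can come first on average even when a more frequent type causes more fatalities overall. The sketch below shows this on made-up data; toy and toy.summary are hypothetical names and the figures are not taken from the storm database.

```r
library(dplyr)

# Made-up records: four tornado events and one heat wave event
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "HEAT WAVE"),
  FATALITIES = c(0, 2, 1, 5, 5),
  stringsAsFactors = FALSE
)

# Same grouping as above, with the event count and the total added for comparison
toy.summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(EVENTS = n(),
            AVG.FATALITIES = mean(FATALITIES, na.rm = TRUE),
            TOTAL.FATALITIES = sum(FATALITIES, na.rm = TRUE)) %>%
  arrange(desc(AVG.FATALITIES))

print(toy.summary)
```

In this toy data the heat wave ranks first by average (5 against 2), while tornadoes rank first by total (8 against 5).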
10 most harmful weather events by average fatalities.
fatalities.ranking <- health.impact %>%
  select(EVTYPE, AVG.FATALITIES) %>%
  arrange(desc(AVG.FATALITIES)) %>%
  slice_head(n = 10)
ggplot(data = fatalities.ranking, aes(x = EVTYPE, y = AVG.FATALITIES, fill = EVTYPE)) +
  geom_bar(stat = "identity", color = "black") +
  labs(y = "Average number of fatalities",
       x = "Weather event",
       title = "10 most harmful weather events by average fatalities") +
  scale_fill_brewer(palette = "PiYG") +
  theme(legend.position = "none") +
  scale_x_discrete(guide = guide_axis(n.dodge = 3))
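ggplot2 places the categories of a character x variable in alphabetical order, so the bars do not follow the ranking computed with arrange(). One possible way to draw them in rank order is to turn EVTYPE into a factor whose levels follow the values, for example with reorder(). The sketch below shows the effect on the level order using made-up data; toy.ranking, alphabetical and ranked are hypothetical names.

```r
# Made-up ranking table for illustration
toy.ranking <- data.frame(
  EVTYPE = c("B EVENT", "A EVENT", "C EVENT"),
  AVG.FATALITIES = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Default factor conversion sorts the labels alphabetically
alphabetical <- levels(factor(toy.ranking$EVTYPE))

# reorder() sorts levels by ascending score; negating the value puts the largest first
ranked <- levels(reorder(toy.ranking$EVTYPE, -toy.ranking$AVG.FATALITIES))

print(alphabetical)
print(ranked)

# In a plot this would correspond to aes(x = reorder(EVTYPE, -AVG.FATALITIES), ...)
```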
10 most harmful weather events by average injuries.
injuries.ranking <- health.impact %>%
  select(EVTYPE, AVG.INJURIES) %>%
  arrange(desc(AVG.INJURIES)) %>%
  slice_head(n = 10)
ggplot(data = injuries.ranking, aes(x = EVTYPE, y = AVG.INJURIES, fill = EVTYPE)) +
  geom_bar(stat = "identity", color = "black") +
  labs(y = "Average number of injuries",
       x = "Weather event",
       title = "10 most harmful weather events by average injuries") +
  scale_fill_brewer(palette = "Spectral") +
  theme(legend.position = "none") +
  scale_x_discrete(guide = guide_axis(n.dodge = 3))
The next figure shows the 10 weather events that cause the most economic damage, measured as the sum of property and crop damage in thousands of US dollars.
library(ggplot2)
# Total damage per event type: property plus crop damage, top 10
economic.damage <- storm.data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL.DAMAGES.IN.DOLLARS = sum(PROPDMG, na.rm = TRUE) +
              sum(CROPDMG, na.rm = TRUE)) %>%
  arrange(desc(TOTAL.DAMAGES.IN.DOLLARS)) %>%
  slice_head(n = 10)
ggplot(data = economic.damage, aes(x = EVTYPE, y = TOTAL.DAMAGES.IN.DOLLARS, fill = EVTYPE)) +
  geom_bar(stat = "identity", color = "black") +
  labs(y = "Thousands of dollars",
       x = "Weather event",
       title = "10 weather events causing the most economic damage") +
  scale_fill_brewer(palette = "Paired") +
  theme(legend.position = "none") +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))