The goal of this report is to identify which types of weather events have the greatest effect on public health and the greatest economic consequences.
The analysis addresses the following two questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
First, we check whether the data file is already present in the root directory of this project and download it if it is not. The file is then loaded with a read.csv() call, which reads the bz2-compressed CSV directly.
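# Download the compressed data set only if it is not already present.
if (!file.exists("StormData.csv.bz2")) {
  # mode = "wb" keeps the binary archive intact on Windows.
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2", mode = "wb")
}
# read.csv() decompresses the .bz2 file transparently.
data <- read.csv("StormData.csv.bz2")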
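The full file contains many more columns than this analysis uses, and reading all of them can be slow. As a minimal sketch, assuming only EVTYPE, FATALITIES, INJURIES, and PROPDMG are needed, the colClasses argument of read.csv() can be used to skip the rest. The toy CSV below and all of its values are a hypothetical stand-in for StormData.csv.bz2.

```r
# Toy CSV standing in for the real file (hypothetical values).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,REMARKS",
  "AL,TORNADO,0,15,25,note a",
  "AL,TSTM WIND,0,0,2.5,note b"
), path)

keep <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG")

# Read a single row just to learn the column names.
header <- names(read.csv(path, nrows = 1))

# "NULL" tells read.csv() to skip a column; NA lets it infer the type.
col_classes <- ifelse(header %in% keep, NA, "NULL")

subset_data <- read.csv(path, colClasses = col_classes)
print(names(subset_data))
```

The same pattern should carry over to the compressed file, since only the path changes.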
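We create a new dataset that keeps only the EVTYPE, FATALITIES, INJURIES, and PROPDMG columns, then group it by EVTYPE and sum each column per event type. Note that PROPDMG is summed as recorded, without applying the magnitude codes in the PROPDMGEXP column, so the property damage totals are not dollar amounts.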
library(dplyr)

# Keep the four columns of interest and total them per event type.
weather <- select(data, EVTYPE, FATALITIES, INJURIES, PROPDMG) %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), PROPDMG = sum(PROPDMG))
Here is a sample of the new dataset:
head(weather)
## # A tibble: 6 x 4
##   EVTYPE                FATALITIES INJURIES PROPDMG
##   <fct>                      <dbl>    <dbl>   <dbl>
## 1 " HIGH SURF ADVISORY"          0        0     200
## 2 " COASTAL FLOOD"               0        0       0
## 3 " FLASH FLOOD"                 0        0      50
## 4 " LIGHTNING"                   0        0       0
## 5 " TSTM WIND"                   0        0     108
## 6 " TSTM WIND (G45)"             0        0       8
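The sample above shows EVTYPE labels with leading whitespace, which means the same event can be counted under more than one label. One possible cleanup, which is not part of the original analysis, is to trim and upper-case the labels before grouping. The sketch below uses a toy data frame with hypothetical values. It does not merge abbreviations such as TSTM WIND and THUNDERSTORM WIND, which would need an explicit mapping.

```r
library(dplyr)

# Toy rows with hypothetical values; the labels mimic untrimmed EVTYPE strings.
toy <- data.frame(
  EVTYPE     = c(" TSTM WIND", "TSTM WIND", "Tstm Wind", " FLASH FLOOD", "FLASH FLOOD"),
  FATALITIES = c(0, 2, 1, 0, 3),
  INJURIES   = c(0, 5, 1, 0, 4)
)

cleaned <- toy %>%
  # as.character() guards against EVTYPE being stored as a factor.
  mutate(EVTYPE = toupper(trimws(as.character(EVTYPE)))) %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))

print(cleaned)
```

After the cleanup the five toy rows collapse into two event types.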
We now split the weather dataset into three smaller datasets that hold the top five event types by property damage, injuries, and fatalities. Each result is printed as it is created so that we can inspect the data.
damage <- arrange(weather, desc(PROPDMG)) %>%
  head(5) %>%
  print()
## # A tibble: 5 x 4
##   EVTYPE            FATALITIES INJURIES  PROPDMG
##   <fct>                  <dbl>    <dbl>    <dbl>
## 1 TORNADO                 5633    91346 3212258.
## 2 FLASH FLOOD              978     1777 1420125.
## 3 TSTM WIND                504     6957 1335966.
## 4 FLOOD                    470     6789  899938.
## 5 THUNDERSTORM WIND        133     1488  876844.
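The PROPDMG totals in the table above are sums of the raw column. In the storm data, each PROPDMG value is paired with a magnitude code in PROPDMGEXP, where K, M, and B stand for thousands, millions, and billions. The following is a minimal sketch of converting to dollars before summing, using a toy data frame with hypothetical values. Treating every other code as a multiplier of 1 is an assumption of this sketch, and the names multiplier and PROPDMG_USD are illustrative.

```r
library(dplyr)

# Toy rows with hypothetical values.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", "")
)

# Multipliers for the K/M/B codes; any other code falls back to 1 (an assumption).
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

dollars <- toy %>%
  mutate(
    exp_code    = toupper(as.character(PROPDMGEXP)),
    mult        = ifelse(exp_code %in% names(multiplier),
                         unname(multiplier[exp_code]), 1),
    PROPDMG_USD = PROPDMG * mult
  ) %>%
  group_by(EVTYPE) %>%
  summarize(PROPDMG_USD = sum(PROPDMG_USD)) %>%
  arrange(desc(PROPDMG_USD))

print(dollars)
```

In the toy data the single billion-dollar flood outweighs both tornado rows, an effect that a ranking on raw PROPDMG cannot show.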
injuries <- arrange(weather, desc(INJURIES)) %>%
  head(5) %>%
  print()
## # A tibble: 5 x 4
##   EVTYPE         FATALITIES INJURIES  PROPDMG
##   <fct>               <dbl>    <dbl>    <dbl>
## 1 TORNADO              5633    91346 3212258.
## 2 TSTM WIND             504     6957 1335966.
## 3 FLOOD                 470     6789  899938.
## 4 EXCESSIVE HEAT       1903     6525    1460
## 5 LIGHTNING             816     5230  603352.
fatalities <- arrange(weather, desc(FATALITIES)) %>%
  head(5) %>%
  print()
## # A tibble: 5 x 4
##   EVTYPE         FATALITIES INJURIES  PROPDMG
##   <fct>               <dbl>    <dbl>    <dbl>
## 1 TORNADO              5633    91346 3212258.
## 2 EXCESSIVE HEAT       1903     6525    1460
## 3 FLASH FLOOD           978     1777 1420125.
## 4 HEAT                  937     2100     298.
## 5 LIGHTNING             816     5230  603352.
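The three rankings above repeat the same arrange-and-head pattern with a different column each time. As a sketch, that pattern could be wrapped in a small helper. The function name top_events and its arguments are hypothetical, and the data frame below contains only a handful of rows copied from the tables above rather than the full weather dataset.

```r
library(dplyr)

# Hypothetical helper: the n event types with the largest total in `column`.
top_events <- function(df, column, n = 5) {
  df %>%
    arrange(desc(.data[[column]])) %>%
    head(n)
}

# A few rows taken from the printed tables above.
sample_rows <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "LIGHTNING", "TSTM WIND", "FLOOD"),
  FATALITIES = c(5633, 1903, 978, 816, 504, 470),
  INJURIES   = c(91346, 6525, 1777, 5230, 6957, 6789)
)

top_fatalities <- top_events(sample_rows, "FATALITIES", n = 3)
top_injuries   <- top_events(sample_rows, "INJURIES", n = 3)

print(top_fatalities$EVTYPE)
print(top_injuries$EVTYPE)
```

Passing the column name as a string keeps the helper usable for any numeric column in the summary table.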
Here are the graphs that show the results of the three data sets:
library(ggplot2)

# geom_col() draws bars whose heights are the values in the data.
ggplot(damage, aes(x = reorder(EVTYPE, -PROPDMG), y = PROPDMG)) +
  geom_col() +
  labs(title = "Damage by Event Type", x = "Event Type", y = "Property Damage")
ggplot(injuries, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_col() +
  labs(title = "Injuries by Event Type", x = "Event Type", y = "Number of Injuries")
ggplot(fatalities, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_col() +
  labs(title = "Fatalities by Event Type", x = "Event Type", y = "Number of Fatalities")
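The three plots differ only in the column plotted and the labels. As a sketch, a single plotting function could cover all three. The name plot_top and its arguments are hypothetical, and the data frame below copies the five rows of the fatalities table above so the block runs on its own.

```r
library(ggplot2)

# Hypothetical helper: bar chart of `column` by event type, tallest bar first.
plot_top <- function(df, column, title, ylab) {
  ggplot(df, aes(x = reorder(EVTYPE, -.data[[column]]), y = .data[[column]])) +
    geom_col() +
    labs(title = title, x = "Event Type", y = ylab)
}

# Rows copied from the fatalities table above.
fatal_rows <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 978, 937, 816)
)

p <- plot_top(fatal_rows, "FATALITIES", "Fatalities by Event Type", "Number of Fatalities")
print(p)
```

The same call with "INJURIES" or "PROPDMG" and matching labels would produce the other two charts.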
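Tornadoes rank first in all three categories, but the second through fifth places differ from one category to the next. Excessive heat ranks second in fatalities, thunderstorm wind (recorded as TSTM WIND) ranks second in injuries, and flash floods rank second in property damage, measured here by the summed PROPDMG column. Because TSTM WIND and THUNDERSTORM WIND are counted as separate event types, the totals for thunderstorm wind are split across two rows.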