Each year, severe weather has a big impact on the United States. In this analysis, we examine the impact of storms in the US in terms of injuries, fatalities, and economic consequences. We find that tornadoes are by far the most dangerous events for human life, and they also cause substantial property damage. Heat is another event type that causes many deaths. Floods, on the other hand, have the greatest economic consequences.
We load the libraries used in the analysis.
library(dplyr)
library(ggplot2)
library(xtable)
For the purpose of this project, we are going to use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
We start by downloading the compressed CSV file (if it is not already present) and loading it into a data frame.
if (!file.exists("storm_data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "storm_data.csv.bz2")
}
storm_data <- read.csv("storm_data.csv.bz2")
names(storm_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
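Only a handful of these 37 columns are used in the rest of the report. As a minimal sketch, the helper below (the names `needed_cols` and `select_needed` are hypothetical, not part of the original analysis) checks that the required columns exist and keeps only those. It runs on a made-up toy data frame rather than the NOAA data.

```r
# Columns the analysis actually touches (taken from the code in this report).
needed_cols <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")

# Hypothetical helper: fail early if a column is missing,
# otherwise return only the needed columns.
select_needed <- function(df, cols = needed_cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

# Toy stand-in for storm_data (made-up values, not NOAA data).
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL"),
  FATALITIES = c(1, 0),
  INJURIES = c(3, 0),
  PROPDMG = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  REMARKS = c("", ""),
  stringsAsFactors = FALSE
)
toy_small <- select_needed(toy)
names(toy_small)
```

Dropping unused columns such as REMARKS early can reduce memory use, although the analysis works without this step.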
We group the rows by event type and compute the total number of fatalities and injuries for each type.
storm_grouped_harm <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_INJURIES = sum(INJURIES), TOTAL_FATALITIES = sum(FATALITIES))
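Grouping by EVTYPE treats every distinct spelling as its own event type. The tables in this report contain closely related labels (for example HEAT and EXCESSIVE HEAT), and the raw labels may also differ in case or surrounding whitespace. The sketch below is an optional cleaning step that is not part of the original analysis: it normalizes case and whitespace on a made-up toy data frame and then aggregates with base R's `aggregate()` instead of dplyr, so that it runs without extra packages.

```r
# Toy data with inconsistent labels (made-up values, not NOAA data).
toy <- data.frame(
  EVTYPE = c("TORNADO", " tornado", "Heat ", "HEAT"),
  INJURIES = c(10, 5, 2, 1),
  FATALITIES = c(1, 0, 3, 1),
  stringsAsFactors = FALSE
)

# Assumption: labels that differ only in case or surrounding
# whitespace refer to the same event type.
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Sum injuries and fatalities per cleaned event type.
toy_harm <- aggregate(
  toy[, c("INJURIES", "FATALITIES")],
  by = list(EVTYPE = toy$EVTYPE),
  FUN = sum
)
names(toy_harm)[2:3] <- c("TOTAL_INJURIES", "TOTAL_FATALITIES")
toy_harm
```

This only merges trivial spelling variants. Merging genuinely different labels, such as HEAT and EXCESSIVE HEAT, would be a separate modelling decision.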
We sort the events in descending order by fatalities and by injuries and save the results in two separate data frames. We keep only the top 5 of each to make our visualization less cluttered. Because head() returns six rows by default, we pass n = 5 explicitly.
top5_fatalities <- head(storm_grouped_harm %>% arrange(-TOTAL_FATALITIES), n = 5)
top5_injuries <- head(storm_grouped_harm %>% arrange(-TOTAL_INJURIES), n = 5)
print(xtable(top5_fatalities), type="html")
Rank | EVTYPE | TOTAL_INJURIES | TOTAL_FATALITIES |
---|---|---|---|
1 | TORNADO | 91346.00 | 5633.00 |
2 | EXCESSIVE HEAT | 6525.00 | 1903.00 |
3 | FLASH FLOOD | 1777.00 | 978.00 |
4 | HEAT | 2100.00 | 937.00 |
5 | LIGHTNING | 5230.00 | 816.00 |
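The sort-then-truncate step is repeated for fatalities, injuries, and later for cost, so it can be wrapped in a small function. The sketch below uses base R and a hypothetical helper name, `top_n_by`; the toy values are made up. It also shows that `head()` without an explicit `n` returns six rows, which matters when a "top 5" is intended.

```r
# Hypothetical helper: return the n rows with the largest values in `column`.
top_n_by <- function(df, column, n = 5) {
  ordered <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

# Toy totals per event type (made-up values, not NOAA data).
toy_harm <- data.frame(
  EVTYPE = LETTERS[1:7],
  TOTAL_FATALITIES = c(3, 50, 7, 1, 20, 9, 4),
  stringsAsFactors = FALSE
)

top5 <- top_n_by(toy_harm, "TOTAL_FATALITIES")
nrow(top5)            # five rows, as requested
nrow(head(toy_harm))  # six rows: the default of head()
```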
For both data frames, we convert EVTYPE to a factor whose levels follow the sorted order, so that the bars keep that order when plotted.
top5_fatalities$EVTYPE <- factor(top5_fatalities$EVTYPE, levels = top5_fatalities$EVTYPE)
top5_injuries$EVTYPE <- factor(top5_injuries$EVTYPE, levels = top5_injuries$EVTYPE)
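To see why the explicit `levels` argument matters, here is a minimal illustration on a three-element vector (the variable names are only for this example). Without `levels`, `factor()` sorts the levels alphabetically, and ggplot2 would draw the bars in that alphabetical order rather than by size.

```r
# Event types already sorted by some measure, largest first.
events <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD")

# Default: levels are sorted alphabetically, so the ranking is lost.
default_levels <- levels(factor(events))

# Explicit levels: the existing order of the vector is preserved.
ordered_levels <- levels(factor(events, levels = events))

default_levels
ordered_levels
```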
We plot the top 5 event types in terms of fatalities.
ggplot(top5_fatalities, aes(x = EVTYPE, y = TOTAL_FATALITIES)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 most fatal event types") +
  xlab("Event type") +
  ylab("Number of fatalities")
We do the same for injuries.
ggplot(top5_injuries, aes(x = EVTYPE, y = TOTAL_INJURIES)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 event types in terms of injuries") +
  xlab("Event type") +
  ylab("Number of injuries")
Tornadoes are by far the most harmful events, with 5,633 fatalities and 91,346 injuries in the data. Other harmful events are those associated with excessive heat and thunderstorm wind.
Next, we find the total property damage cost of each event type. Crop damage (CROPDMG) is not included in this analysis.
We first need to multiply the property damage column by its multiplier. Let’s check which distinct multipliers appear in the data.
unique(storm_data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
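Rows with an empty multiplier, a special character (“-”, “?”, “+”), or a digit 0-8 (footnotes) keep their PROPDMG value unchanged, that is, they get a multiplier of 1, because these codes do not add anything new to our data. We interpret the remaining codes as follows: “H” means hundreds (1e2), “K” thousands (1e3), “M” or “m” millions (1e6), and “B” billions (1e9). Note that the code below matches the uppercase “H” only, so the lowercase “h” also falls through to a multiplier of 1.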
We multiply the property damage column according to the multipliers.
storm_data <- storm_data %>%
  mutate(PROPDMG = PROPDMG * ifelse(PROPDMGEXP == "H", 1e2,
                             ifelse(PROPDMGEXP == "K", 1e3,
                             ifelse(PROPDMGEXP == "M", 1e6,
                             ifelse(PROPDMGEXP == "m", 1e6,
                             ifelse(PROPDMGEXP == "B", 1e9,
                                    1))))))
head(storm_data$PROPDMG, 10)
## [1] 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000
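The nested `ifelse()` chain can also be written as a lookup in a named vector, which is easier to extend. The sketch below is an alternative formulation, not the code used in this report; `exponent_multiplier` is a hypothetical name and the toy values are made up. One deliberate difference: it upper-cases the code first, so a lowercase "h" is treated as hundreds, whereas the chain above gives it a multiplier of 1.

```r
# Hypothetical helper: map PROPDMGEXP codes to numeric multipliers.
# Codes without an entry in the lookup ("", "?", "+", "-", digits) get 1.
exponent_multiplier <- function(exp_code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Assumption: case is not meaningful, so "m" and "h" behave like "M" and "H".
  mult <- unname(lookup[toupper(exp_code)])
  mult[is.na(mult)] <- 1
  mult
}

# Toy codes and damage values (made-up, not NOAA data).
codes <- c("K", "M", "m", "B", "", "?", "5")
damage <- c(25, 2.5, 1, 0.1, 10, 10, 10)
damage_usd <- damage * exponent_multiplier(codes)
damage_usd
```

Indexing a named vector with a name that is absent, including the empty string, yields NA, which is why the NA entries are replaced by 1 afterwards.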
Now we are ready to sum the property damage for every event type.
storm_grouped_cost <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_COST = sum(PROPDMG))
print(xtable(head(storm_grouped_cost)), type="html")
Row | EVTYPE | TOTAL_COST |
---|---|---|
1 | HIGH SURF ADVISORY | 200000.00 |
2 | COASTAL FLOOD | 0.00 |
3 | FLASH FLOOD | 50000.00 |
4 | LIGHTNING | 0.00 |
5 | TSTM WIND | 8100000.00 |
6 | TSTM WIND (G45) | 8000.00 |
We sort the data frame in decreasing order of total cost and keep the top 5. Because head() returns six rows by default, we pass n = 5 explicitly.
storm_grouped_cost <- storm_grouped_cost %>%
  arrange(-TOTAL_COST)
top5_cost <- head(storm_grouped_cost, n = 5)
print(xtable(top5_cost), type="html")
Rank | EVTYPE | TOTAL_COST |
---|---|---|
1 | FLOOD | 144657709807.00 |
2 | HURRICANE/TYPHOON | 69305840000.00 |
3 | TORNADO | 56937160778.70 |
4 | STORM SURGE | 43323536000.00 |
5 | FLASH FLOOD | 16140812067.10 |
We convert the event type column to a factor to preserve the order on the plot.
top5_cost$EVTYPE <- factor(top5_cost$EVTYPE, levels = top5_cost$EVTYPE)
We plot the top 5 to get better insight. We express the y-axis in billions of dollars.
ggplot(top5_cost, aes(x = EVTYPE, y = TOTAL_COST/1e9)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 event types in terms of total cost") +
  xlab("Event type") +
  ylab("Billions of dollars")
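Events related to rain and wind have the greatest economic consequences in terms of property damage. The costliest event type is flood (about 145 billion dollars), followed by hurricanes/typhoons (about 69 billion) and tornadoes (about 57 billion).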