This report analyzes which severe weather events are the most harmful to population health and which have the greatest economic consequences. The data pre-processing steps are shown, and plots are provided to visualize the findings.
We first read the data and convert it into a data table.
library("data.table")
storm_data <- read.csv("repdata_data_StormData.csv") #read raw file
storm_tb <- as.data.table(storm_data) #convert to data table
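As a possible alternative to reading with `read.csv` and then converting, `data.table::fread` returns a data table directly. The sketch below is only illustrative: it writes a small made-up file (`tmp`, `toy_tb`, and the two rows are hypothetical stand-ins, not the storm data) and reads it back.

```r
library(data.table)

# Hypothetical toy file standing in for the storm data CSV
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(EVTYPE = c("FLOOD", "TORNADO"), FATALITIES = c(1, 2)), tmp)

# fread() returns a data.table, so no separate conversion step is needed
toy_tb <- fread(tmp)
is.data.table(toy_tb)
```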
Next, we extract the columns of interest: the type of event (EVTYPE), fatalities (FATALITIES), injuries (INJURIES), and the columns that describe the amount of damage (those containing DMG).
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
cleaned_storm <- storm_tb %>% select(EVTYPE, FATALITIES, INJURIES, contains("DMG"))
Notice that the PROPDMGEXP and CROPDMGEXP columns hold alphanumeric exponent codes (for example, K for thousands and M for millions). We convert them to numeric multipliers so they can be used to calculate the property and crop costs later on.
columns <- c("PROPDMGEXP", "CROPDMGEXP")
# Upper-case the exponent codes so that, e.g., "k" and "K" are treated alike
cleaned_storm[, (columns) := lapply(.SD, toupper), .SDcols = columns]
# Lookup tables from exponent code to multiplier. Empty strings and any other
# unlisted codes have no entry, so they come back as NA and are set to 1 below.
property_dmg <- c("-" = 10^0, "+" = 10^0,
                  "0" = 10^0, "1" = 10^1, "2" = 10^2,
                  "3" = 10^3, "4" = 10^4, "5" = 10^5,
                  "6" = 10^6, "7" = 10^7, "8" = 10^8,
                  "9" = 10^9, "H" = 10^2, "K" = 10^3,
                  "M" = 10^6, "B" = 10^9)
crop_dmg <- c("?" = 10^0, "0" = 10^0,
              "K" = 10^3, "M" = 10^6, "B" = 10^9)
# Replace each code with its multiplier; unmatched codes default to 1
cleaned_storm[, PROPDMGEXP := property_dmg[as.character(PROPDMGEXP)]]
cleaned_storm[is.na(PROPDMGEXP), PROPDMGEXP := 10^0]
cleaned_storm[, CROPDMGEXP := crop_dmg[as.character(CROPDMGEXP)]]
cleaned_storm[is.na(CROPDMGEXP), CROPDMGEXP := 10^0]
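To illustrate the lookup step in isolation, here is a minimal sketch on a made-up vector of exponent codes. The names `exp_codes`, `multipliers`, and `looked_up` are hypothetical and the table is only a subset of the codes; the point is that indexing a named vector by a code with no matching name (including the empty string) yields NA, which is then replaced by 1.

```r
# Toy exponent codes; "" and "?" are deliberately absent from the lookup table
exp_codes <- c("K", "m", "B", "", "?")

# Named numeric vector used as a lookup table (a subset of the codes)
multipliers <- c("K" = 10^3, "M" = 10^6, "B" = 10^9)

# Indexing by name returns NA for any code without a matching name
looked_up <- unname(multipliers[toupper(exp_codes)])

# Treat unmatched codes as a multiplier of 1
looked_up[is.na(looked_up)] <- 1
looked_up
```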
First, we create new columns for property cost and crop cost by multiplying each damage amount by its multiplier.
cleaned_storm$prop_cost <- cleaned_storm$PROPDMG * cleaned_storm$PROPDMGEXP
cleaned_storm$crop_cost <- cleaned_storm$CROPDMG * cleaned_storm$CROPDMGEXP
Then we compute the total property cost, total crop cost, and their sum for each event type, and keep the 10 event types with the highest total cost.
total_cost <- cleaned_storm[, .(Total_Cost = sum(prop_cost) + sum(crop_cost), property_cost = sum(prop_cost), crop_cost = sum(crop_cost)), by = .(EVTYPE)]
total_cost <- head(arrange(total_cost,desc(Total_Cost)), n = 10)
total_cost
##                EVTYPE   Total_Cost property_cost   crop_cost
##  1:             FLOOD 150319678257  144657709807  5661968450
##  2: HURRICANE/TYPHOON  71913712800   69305840000  2607872800
##  3:           TORNADO  57362333946   56947380676   414953270
##  4:       STORM SURGE  43323541000   43323536000        5000
##  5:              HAIL  18761221986   15735267513  3025954473
##  6:       FLASH FLOOD  18243991078   16822673978  1421317100
##  7:           DROUGHT  15018672000    1046106000 13972566000
##  8:         HURRICANE  14610229010   11868319010  2741910000
##  9:       RIVER FLOOD  10148404500    5118945500  5029459000
## 10:         ICE STORM   8967041360    3944927860  5022113500
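The group-then-rank pattern used here can be seen on a small scale in the sketch below. The table `toy` and its numbers are invented for illustration, and the sort uses data.table's own `order()` inside `[` rather than `arrange()`, which is one possible substitute and not the approach used in this report.

```r
library(data.table)

# Hypothetical per-event costs (made-up numbers)
toy <- data.table(
  EVTYPE    = c("FLOOD", "FLOOD", "HAIL", "TORNADO"),
  prop_cost = c(100, 50, 30, 80),
  crop_cost = c(10, 0, 20, 5)
)

# Sum both cost columns within each event type
totals <- toy[, .(Total_Cost = sum(prop_cost) + sum(crop_cost)), by = .(EVTYPE)]

# Sort by descending total cost and keep the top two event types
top2 <- head(totals[order(-Total_Cost)], n = 2)
top2
```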
Next, we compute the total fatalities, total injuries, and their sum for each event type, again keeping the top 10.
total_fat_inj <- cleaned_storm[, .(
total_incidents = sum(INJURIES) + sum(FATALITIES),
injuries = sum(INJURIES),
fatalities = sum(FATALITIES)
), by = .(EVTYPE)]
total_fat_inj <- head(arrange(total_fat_inj,desc(total_incidents)), n = 10)
total_fat_inj
##                EVTYPE total_incidents injuries fatalities
##  1:           TORNADO           96979    91346       5633
##  2:    EXCESSIVE HEAT            8428     6525       1903
##  3:         TSTM WIND            7461     6957        504
##  4:             FLOOD            7259     6789        470
##  5:         LIGHTNING            6046     5230        816
##  6:              HEAT            3037     2100        937
##  7:       FLASH FLOOD            2755     1777        978
##  8:         ICE STORM            2064     1975         89
##  9: THUNDERSTORM WIND            1621     1488        133
## 10:      WINTER STORM            1527     1321        206
With the calculations done, we can now analyze the results.
First, we reshape the health data to a long format so that the fatalities, injuries, and their combined total are stacked in one column, with a second column identifying which measure each value belongs to. This makes the plot easier to create.
incidents_table <- melt(total_fat_inj , id.vars="EVTYPE", variable.name = "incidents")
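As a small illustration of what this reshaping does, the sketch below applies `melt` to a made-up two-row table (`wide` and its values are hypothetical). Each measure column becomes a level of the `incidents` column, and its numbers move into a single `value` column.

```r
library(data.table)

# Hypothetical wide table: one row per event type, one column per measure
wide <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD"),
  injuries   = c(10, 4),
  fatalities = c(2, 1)
)

# Long format: one row per (event type, measure) pair
long <- melt(wide, id.vars = "EVTYPE", variable.name = "incidents")
long
```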
Here is a bar plot of the 10 most harmful severe weather events, ranked by the total number of injuries and fatalities.
library(ggplot2)
colors <- c("#1887ab", "#ea8553", "#eec76b")
health_chart <- ggplot(incidents_table, aes(fill = incidents, y = value, x = reorder(EVTYPE, -value))) +
geom_bar(position = "dodge", stat = "identity", color="black") +
labs(title = "Top 10 Severe Weather Events Most Harmful to Population Health",
y = "Frequency", x = "Event Type") +
theme(axis.text.x = element_text(angle = 90), legend.title = element_blank()) +
scale_fill_manual(values = colors, labels = c("Total Incidents", "Injuries", "Fatalities"))
health_chart
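The bar order in the chart comes from `reorder(EVTYPE, -value)`. The sketch below, on made-up vectors (`event` and `value` here are hypothetical), shows how that call behaves: `reorder` sorts the factor levels by a summary of the second argument within each group (the mean by default), so negating the values places the group with the largest mean first.

```r
# Hypothetical long-format data: two values per event type
event <- c("FLOOD", "TORNADO", "HAIL", "FLOOD", "TORNADO", "HAIL")
value <- c(7, 90, 1, 3, 10, 1)

# Levels are sorted by the mean of -value within each event type,
# so the event with the largest mean value comes first
ordered_event <- reorder(event, -value)
levels(ordered_event)
```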
From the bar plot, we can see that tornadoes caused by far the most injuries and fatalities. They are therefore the most harmful event type for population health.
Next, we reshape the cost data to a long format so that the property costs, crop costs, and their combined total are stacked in one column, with a second column identifying the cost type. This makes the plot easier to create.
economic_table <- melt(total_cost , id.vars="EVTYPE", variable.name = "cost_type")
Here is a bar plot of the 10 severe weather events with the greatest economic impact, ranked by total cost.
library(ggplot2)
colors <- c("#1887ab", "#ea8553", "#eec76b")
economic_chart <- ggplot(economic_table, aes(fill = cost_type, y = value, x = reorder(EVTYPE, -value))) +
geom_bar(position = "dodge", stat = "identity", color="black") +
labs(title = "Top 10 Severe Weather Events with the Greatest Economic Consequences",
y = "Cost (USD)", x = "Event Type") +
theme(axis.text.x = element_text(angle = 90), legend.title = element_blank()) +
scale_fill_manual(values = colors, labels = c("Total Cost", "Property Cost", "Crops Cost"))
economic_chart
From the bar plot, floods incurred the highest total cost among the severe weather events. They therefore have the greatest economic consequences.