Synopsis

This report analyzes which severe weather events are most harmful to population health and which have the greatest economic consequences. The data pre-processing steps are shown, and plots are provided to visualize the findings.

Data Processing

We first read the raw data and convert it into a data table.

library("data.table")
storm_data <- read.csv("repdata_data_StormData.csv") #read raw file
storm_tb <- as.data.table(storm_data) #convert to data table
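As an aside, data.table can also read a CSV directly with fread(), which returns a data table without a separate conversion step. The sketch below uses a small made-up file (the file contents and the toy_tb name are illustrative assumptions), not the storm data set.

```r
library(data.table)

# Write a tiny made-up CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(EVTYPE = c("FLOOD", "HAIL"), FATALITIES = c(2, 0)), tmp)

# fread() reads the file and returns a data.table directly.
toy_tb <- fread(tmp)
is.data.table(toy_tb)
```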

Next, we extract the columns of interest: the type of event (EVTYPE), fatalities (FATALITIES), injuries (INJURIES), and the columns that describe the amount of damage (those whose names contain DMG).

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
cleaned_storm <- storm_tb %>% select(EVTYPE, FATALITIES, INJURIES, contains("DMG"))

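Notice that the PROPDMGEXP and CROPDMGEXP columns hold alphanumeric exponent codes (for example, K for thousands, M for millions, and B for billions). We convert them to numeric multipliers so they can be used to calculate the property and crop costs later on. Codes that are not in the lookup vectors, including blanks, are treated as a multiplier of 1.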

columns <- c("PROPDMGEXP", "CROPDMGEXP")
cleaned_storm [,  (columns) := c(lapply(.SD, toupper)), .SDcols = columns]
property_dmg <-  c("\"\"" = 10^0,"-" = 10^0, "+" = 10^0,
                 "0" = 10^0,"1" = 10^1,"2" = 10^2,
                 "3" = 10^3,"4" = 10^4,"5" = 10^5,
                 "6" = 10^6,"7" = 10^7,"8" = 10^8,
                 "9" = 10^9,"H" = 10^2,"K" = 10^3,
                 "M" = 10^6,"B" = 10^9)

crop_dmg <-  c("\"\"" = 10^0,"?" = 10^0, "0" = 10^0,
                "K" = 10^3,"M" = 10^6,"B" = 10^9)

# Replace each exponent code with its numeric multiplier; unmatched codes become NA
cleaned_storm[, PROPDMGEXP := property_dmg[as.character(PROPDMGEXP)]]
# Treat unmatched codes (including blanks) as a multiplier of 1
cleaned_storm[is.na(PROPDMGEXP), PROPDMGEXP := 10^0]
cleaned_storm[, CROPDMGEXP := crop_dmg[as.character(CROPDMGEXP)]]
cleaned_storm[is.na(CROPDMGEXP), CROPDMGEXP := 10^0]
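To make the lookup concrete, here is a minimal sketch on a toy vector of exponent codes. The codes, the shortened lookup vector, and the multiplier() helper are made up for illustration and are not part of the analysis above. It shows that any code missing from the lookup vector, including the empty string, comes back as NA and is then set to 1.

```r
# Toy exponent codes; "" and "?" are deliberately absent from the lookup.
codes <- c("K", "m", "", "B", "?", "2")

# Lookup vector in the same style as property_dmg (shortened for the example).
lookup <- c("2" = 10^2, "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Hypothetical helper: upper-case the code, look it up, default to 1.
multiplier <- function(x, lut) {
  out <- unname(lut[toupper(x)])  # names that do not match give NA
  out[is.na(out)] <- 1            # unmatched codes count as "no scaling"
  out
}

mult <- multiplier(codes, lookup)
mult
```

A character subscript of "" never matches a name in R, so blank codes always fall through to the default.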

Calculations

First, we create new columns for property cost and crop cost by multiplying each damage figure by its numeric multiplier.

cleaned_storm$prop_cost <- cleaned_storm$PROPDMG * cleaned_storm$PROPDMGEXP
cleaned_storm$crop_cost <- cleaned_storm$CROPDMG * cleaned_storm$CROPDMGEXP
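The same multiplication can also be written with data.table's := operator, which adds both columns in one step. The sketch below runs on a toy table with made-up values (the toy name and its rows are assumptions for illustration), so a damage figure of 25 with a multiplier of 1,000 gives a cost of 25,000.

```r
library(data.table)

# Toy rows (made-up values): damage figures and already-converted multipliers.
toy <- data.table(
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c(1e3, 1e6, 1),
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c(1, 1e3, 1e6)
)

# Add both cost columns by reference in a single call.
toy[, `:=`(prop_cost = PROPDMG * PROPDMGEXP,
           crop_cost = CROPDMG * CROPDMGEXP)]
toy
```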

Then we compute the total cost, total property cost, and total crop cost for each event type and keep the 10 event types with the highest total cost.

total_cost <- cleaned_storm[, .(Total_Cost = sum(prop_cost) + sum(crop_cost), property_cost = sum(prop_cost), crop_cost = sum(crop_cost)),  by = .(EVTYPE)]
total_cost <- head(arrange(total_cost,desc(Total_Cost)), n = 10)
total_cost
##                EVTYPE   Total_Cost property_cost   crop_cost
##  1:             FLOOD 150319678257  144657709807  5661968450
##  2: HURRICANE/TYPHOON  71913712800   69305840000  2607872800
##  3:           TORNADO  57362333946   56947380676   414953270
##  4:       STORM SURGE  43323541000   43323536000        5000
##  5:              HAIL  18761221986   15735267513  3025954473
##  6:       FLASH FLOOD  18243991078   16822673978  1421317100
##  7:           DROUGHT  15018672000    1046106000 13972566000
##  8:         HURRICANE  14610229010   11868319010  2741910000
##  9:       RIVER FLOOD  10148404500    5118945500  5029459000
## 10:         ICE STORM   8967041360    3944927860  5022113500
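The group-then-rank step can also be expressed with data.table alone, by chaining an order() call onto the grouped sum. This is a minimal sketch on a toy table with made-up event rows (the toy and ranked names are illustrative), not a replacement for the computation above.

```r
library(data.table)

# Toy per-event costs (made-up values).
toy <- data.table(
  EVTYPE    = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "HAIL"),
  prop_cost = c(100, 20, 50, 70, 5),
  crop_cost = c(10, 30, 0, 1, 5)
)

# Sum costs per event type, then sort by total cost in descending order.
ranked <- toy[, .(Total_Cost    = sum(prop_cost) + sum(crop_cost),
                  property_cost = sum(prop_cost),
                  crop_cost     = sum(crop_cost)),
              by = EVTYPE][order(-Total_Cost)]

# Keep only the top rows, as head(..., n = 10) does in the report.
head(ranked, n = 2)
```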

Next, we compute the total fatalities and total injuries for each event type and keep the 10 event types with the highest combined count.

total_fat_inj <- cleaned_storm[, .(
  total_incidents = sum(INJURIES) + sum(FATALITIES),
  injuries = sum(INJURIES),
  fatalities = sum(FATALITIES)
), by = .(EVTYPE)]

total_fat_inj <-  head(arrange(total_fat_inj,desc(total_incidents)), n = 10)
total_fat_inj
##                EVTYPE total_incidents injuries fatalities
##  1:           TORNADO           96979    91346       5633
##  2:    EXCESSIVE HEAT            8428     6525       1903
##  3:         TSTM WIND            7461     6957        504
##  4:             FLOOD            7259     6789        470
##  5:         LIGHTNING            6046     5230        816
##  6:              HEAT            3037     2100        937
##  7:       FLASH FLOOD            2755     1777        978
##  8:         ICE STORM            2064     1975         89
##  9: THUNDERSTORM WIND            1621     1488        133
## 10:      WINTER STORM            1527     1321        206

Results

After we have done our calculations, we can now make our analysis.

Population Health

First, we reshape the data to a long format so that the fatalities, injuries, and their combined total are in one column, with the corresponding values in another. This makes the plot easier to create.

incidents_table <- melt(total_fat_inj , id.vars="EVTYPE", variable.name = "incidents")
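To illustrate what the reshape does, here is a small sketch that melts just the TORNADO and FLOOD rows from the table above. The wide and long object names are assumptions for the example. Each event type ends up with one row per measure, which is the shape that the fill aesthetic in ggplot2 expects.

```r
library(data.table)

# Two rows taken from the top-10 table above, in wide format.
wide <- data.table(
  EVTYPE          = c("TORNADO", "FLOOD"),
  total_incidents = c(96979, 7259),
  injuries        = c(91346, 6789),
  fatalities      = c(5633, 470)
)

# Long format: one row per (event type, measure) pair.
long <- melt(wide, id.vars = "EVTYPE", variable.name = "incidents")
long
```

The measure names become the levels of the incidents column, in the same order as the original columns, which is why the legend labels in the plot are listed in that order.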

Here’s a barplot of the top 10 most harmful severe weather events, ranked by the total number of incidents (injuries plus fatalities).

library(ggplot2)
colors <- c("#1887ab", "#ea8553", "#eec76b")  
health_chart <- ggplot(incidents_table, aes(fill = incidents, y = value, x = reorder(EVTYPE, -value))) +
  geom_bar(position = "dodge", stat = "identity", color="black") +
  labs(title = "Top 10 Most Harmful Severe Weather Conditions to Population Health",
       y = "Number of People", x = "Event Type") +
  theme(axis.text.x = element_text(angle = 90), legend.title = element_blank()) +
  scale_fill_manual(values = colors, labels = c("Total Incidents", "Injuries", "Fatalities"))
health_chart

From the barplot, we can see that tornadoes recorded by far the highest number of injuries and fatalities (96,979 combined). Therefore, they are the most harmful to population health.

Economic Consequences

First, we reshape the data to a long format so that the property costs, crop costs, and total costs are in one column, with the corresponding values in another. This makes the plot easier to create.

economic_table <- melt(total_cost , id.vars="EVTYPE", variable.name = "cost_type")

Here’s a barplot of the top 10 severe weather events affecting the economy, ranked by total cost.

library(ggplot2)
colors <- c("#1887ab", "#ea8553", "#eec76b")  
economic_chart <- ggplot(economic_table, aes(fill = cost_type, y = value, x = reorder(EVTYPE, -value))) +
  geom_bar(position = "dodge", stat = "identity", color="black") +
  labs(title = "Top 10 Severe Weather Conditions with the Greatest Economic Consequences",
       y = "Cost (USD)", x = "Event Type") +
  theme(axis.text.x = element_text(angle = 90), legend.title = element_blank()) +
  scale_fill_manual(values = colors, labels = c("Total Cost", "Property Cost", "Crops Cost"))
economic_chart

From the barplot, floods incurred the highest total cost among the severe weather events (about 150 billion US dollars, most of it property damage). Therefore, they have the greatest economic consequences.