Synopsis

This analysis identifies the types of events that are most harmful to population health and those with the greatest economic consequences. The top 15 event types are ranked for each outcome. Tornadoes are by far the most harmful to population health (96,979 fatalities and injuries combined). Floods lead the economic burden by a wide margin.

Data processing

The dataset is downloaded directly from the internet. The file is then decompressed with a program installed on the computer, so the code may need adapting before this second step will run. Find the path of the '.exe' file used to open compressed files and break it into components as in the following command: executable <- file.path("C:", "Program Files", "WinRAR", "WinRAR.exe")

setwd("G:/My Drive/From Dropbox/Training/Data science specialization/Assignments/Project 2")

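# Load the packages used below (dplyr for %>%, arrange and select; ggplot2 for the figures)
library(dplyr)
library(ggplot2)

# Load the data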

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, file.path("./", "repdata_data_StormData.bz2"))


# Decompress with the WinRAR utility on Windows 11
# (the archive name must match the file saved by download.file above):
executable <- file.path("C:", "Program Files", "WinRAR", "WinRAR.exe")
cmd <- paste(paste0("\"", executable, "\""), "x",
             paste0("\"", file.path("./", "repdata_data_StormData.bz2"), "\""))
system(cmd)
## [1] 1
data <- read.csv("repdata_data_StormData.csv", sep = ",")

#Create a folder for figures
if (!dir.exists("figure")) dir.create("figure")
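
As an alternative to the external decompression step above, base R can usually read a bzip2-compressed CSV directly, because `read.csv` opens the file through a connection that detects the compression. The sketch below shows this on a tiny stand-in file rather than the real download; the temporary file and the `demo` data frame are illustrative assumptions, not part of the original analysis.

```r
# Write a tiny CSV through a bzip2 connection (a stand-in for StormData.csv.bz2)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     FATALITIES = c(2, 1)),
          con, row.names = FALSE)
close(con)

# read.csv reads the compressed file without any external program
demo <- read.csv(tmp)
print(demo)
```

Under the same assumption, the file saved by `download.file` above could be passed to `read.csv` in the same way, which would remove the dependency on a Windows-specific executable.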

Results

To find the most harmful events with respect to population health, we compute the sum of fatalities and injuries for each record and store it in the variable pophealth. Then we total this variable by event type and sort the totals in descending order to select the top 15.

#Sum of fatalities and injuries
data$pophealth <- data$INJURIES + data$FATALITIES
pophealth <- (tapply(data$pophealth, data$EVTYPE, sum))
str(pophealth)
##  num [1:985(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ : chr [1:985] "   HIGH SURF ADVISORY" " COASTAL FLOOD" " FLASH FLOOD" " LIGHTNING" ...
dim(pophealth)
## [1] 985
pophealth <-as.data.frame(pophealth) 
pophealth$evtype <- rownames(pophealth)
rownames(pophealth) <- NULL
pophealth <- pophealth %>%
  arrange(desc(pophealth))
pophealth <- pophealth %>% select(evtype, pophealth)
top_15_ph <- pophealth[1:15, ]
  1. The top 15 events with the most fatalities and injuries
print(top_15_ph)
##               evtype pophealth
## 1            TORNADO     96979
## 2     EXCESSIVE HEAT      8428
## 3          TSTM WIND      7461
## 4              FLOOD      7259
## 5          LIGHTNING      6046
## 6               HEAT      3037
## 7        FLASH FLOOD      2755
## 8          ICE STORM      2064
## 9  THUNDERSTORM WIND      1621
## 10      WINTER STORM      1527
## 11         HIGH WIND      1385
## 12              HAIL      1376
## 13 HURRICANE/TYPHOON      1339
## 14        HEAVY SNOW      1148
## 15          WILDFIRE       986
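
The aggregation above (sum by event type, sort, keep the top rows) can also be sketched as a single dplyr pipeline. This is a minimal illustration on made-up data, not the code used to produce the table; the `storms` data frame and its values are hypothetical, while the column names mirror those of the dataset.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the storm dataset
storms <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 5, 0),
  INJURIES   = c(10, 2, 5, 0, 1)
)

top_ph <- storms %>%
  group_by(EVTYPE) %>%                                   # one group per event type
  summarise(pophealth = sum(FATALITIES + INJURIES)) %>%  # total harm per event type
  arrange(desc(pophealth)) %>%                           # most harmful first
  head(2)                                                # keep the top rows (15 in the report)

print(top_ph)
```

This avoids the intermediate `tapply` array and the manual handling of row names, at the cost of depending on dplyr for the aggregation as well as the sorting.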

The pie chart below shows the distribution of fatalities and injuries across the top 15 events. Value labels were left out to avoid overlapping. We can clearly see that the most harmful event is TORNADO.

#Define palette colors for the top 15
custom_colors <- c("TORNADO"="#669E40",
                   "EXCESSIVE HEAT"="#FFD966", "TSTM WIND"="#FFD966", "FLOOD"="#FFD966", "LIGHTNING"="#FFD966",
                   "HEAT"="#FFD966", "FLASH FLOOD"="#FFD966", "ICE STORM"="#FFD966",
                   "THUNDERSTORM WIND"="#FD8D77", "WINTER STORM"="#FD8D77", "HIGH WIND"="#FD8D77",
                   "HAIL"="#FD8D77", "HURRICANE/TYPHOON"="#FD8D77",
                   "HEAVY SNOW"="#EB0335", "WILDFIRE"="#EB0335")

ggplot(top_15_ph, aes(x = "", y = pophealth, fill = evtype)) +
  geom_bar(stat = "identity", width = 1) +   # Create a bar chart
  coord_polar(theta = "y") +                 # Convert to a pie chart
  theme_void() +                              # Remove unnecessary grid elements
  labs(fill = "Events", title = "Distribution of the top 15 events with respect to the impact on the population health") +
  scale_fill_manual(values = custom_colors)    # Add custom colors

  2. Types of events with the greatest economic consequences

To find the events with the greatest economic consequences, measured here by property damage, we first put all costs on the same scale and then compute the total cost for each event. We sort the totals in descending order, select the top 15, and display them in a bar chart. Flood is by far the event with the greatest economic consequences. The next most important group is hurricane/typhoon, tornado and storm surge. Together with flood, they represent around 75% of the economic burden and could be prioritized for resource allocation.

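# Convert the property-damage exponent codes to numeric multipliers
data$costmag <- data$PROPDMGEXP
data$costmag[data$costmag=="K"] <- 1000
data$costmag[data$costmag=="M"] <- 1000000
data$costmag[data$costmag=="B"] <- 1000000000
data$costmag <- as.numeric(data$costmag)
## Warning: NAs introduced by coercion
# Blank or other non-numeric codes become NA above; count them as 0
data$costmag[is.na(data$costmag) ] <- 0
# Property damage on a common scale
data$ecoconsq <- data$PROPDMG*data$costmag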
eco_dmg <- tapply(data$ecoconsq, data$EVTYPE, sum)
eco_dmg <- as.data.frame(eco_dmg)
eco_dmg$evtype <- rownames(eco_dmg)
rownames(eco_dmg) <- NULL
eco_dmg <- eco_dmg %>%
  select(evtype, eco_dmg) %>%
  arrange(desc(eco_dmg))
top_15_cost <- eco_dmg[1:15, ]

print(top_15_cost)
##               evtype      eco_dmg
## 1              FLOOD 144657709800
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56925660991
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16140812087
## 6               HAIL  15727366870
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497250
## 10         HIGH WIND   5270046260
## 11       RIVER FLOOD   5118945500
## 12          WILDFIRE   4765114000
## 13  STORM SURGE/TIDE   4641188000
## 14         TSTM WIND   4484928440
## 15         ICE STORM   3944927810
ggplot(top_15_cost, aes(x = reorder(evtype, -eco_dmg), y = eco_dmg))+
  geom_bar(stat = "identity")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(x = "Events", y = "Economic consequences", title = "Ranking of the top 15 events with the greatest economic consequences")
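
The scaling step used to build costmag above can also be written as a lookup table. The sketch below is one possible formulation, not the code that produced the figures in this report: `exp_multiplier` and `scale_damage` are hypothetical names, and the choice to treat lower-case codes like their upper-case equivalents and to ignore numeric codes is an assumption that differs from the code above, so totals computed this way might differ slightly.

```r
# Hypothetical lookup table: exponent code -> multiplier
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale damage amounts by their exponent code
scale_damage <- function(amount, exp_code) {
  mult <- exp_multiplier[toupper(exp_code)]  # NA for codes not in the table
  mult[is.na(mult)] <- 0                     # unknown or blank codes contribute nothing
  unname(amount * mult)
}

scaled <- scale_damage(c(25, 2.5, 1, 10), c("K", "m", "B", ""))
print(scaled)
```

Keeping the mapping in one named vector makes it explicit which codes are recognized and avoids the coercion warning produced by `as.numeric`.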