Data Analysis: Effects of Severe Weather Events

Synopsis

This data analysis uses data from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, collected from 1950 to November 2011.

The goal of the data analysis is to investigate the effects of severe weather events on population health and economy.

We found that the weather events that cause the most harm to population health are tornados, as they caused the most fatalities, deaths and overall casualties. The weather events that caused the most severe agricultural damages are drought. The weather events that caused the most property damages and combined monetary damages are floods.

Data Processing

Load Data

From the course website, we downloaded the dataset provided by U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, and loaded the raw data into R.

file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(file_url, destfile="StormData.csv.bz2")
stormraw <- read.csv("StormData.csv.bz2")

After reading the raw dataset, we check the dataset dimension and column headers.

dim(stormraw)
## [1] 902297     37
names(stormraw)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Analysis for Population Health

To analyze the impact of weather events on population health, we create a new variable casualties which is the sum of fatalities and injury.

##subset for Fatalities and Injuries
storm_population <- stormraw[, c("EVTYPE", "FATALITIES", "INJURIES")]
##calculate Casualties= FATALITEIS+INJURIES
storm_population$Casualties <- storm_population$FATALITIES + storm_population$INJURIES
##confirm that there is no missing values in this subset
sum(is.na(storm_population))
## [1] 0

We then grouped the data by event types and evaluated the total deaths, injuries and casualties for each type of events.

##group data by event types
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
storm_casualties_bytype <- storm_population %>%
        group_by(EVTYPE) %>%
        summarise(occurence=n(),
                  fatalities_total=sum(FATALITIES),
                  injuries_total=sum(INJURIES),
                  casualties_total=sum(Casualties),
                  )
##acquire top 5 event types that causes the highest deaths, injuries and casualties
top5fatalities_total <- storm_casualties_bytype %>%
        slice_max(order_by = fatalities_total, n=5)
top5injuries_total <- storm_casualties_bytype %>%
        slice_max(order_by = injuries_total, n=5)
top5casualties_total <- storm_casualties_bytype %>%
        slice_max(order_by = casualties_total, n=5)

print(top5fatalities_total)
## # A tibble: 5 × 5
##   EVTYPE         occurence fatalities_total injuries_total casualties_total
##   <chr>              <int>            <dbl>          <dbl>            <dbl>
## 1 TORNADO            60652             5633          91346            96979
## 2 EXCESSIVE HEAT      1678             1903           6525             8428
## 3 FLASH FLOOD        54277              978           1777             2755
## 4 HEAT                 767              937           2100             3037
## 5 LIGHTNING          15754              816           5230             6046
print(top5injuries_total)
## # A tibble: 5 × 5
##   EVTYPE         occurence fatalities_total injuries_total casualties_total
##   <chr>              <int>            <dbl>          <dbl>            <dbl>
## 1 TORNADO            60652             5633          91346            96979
## 2 TSTM WIND         219940              504           6957             7461
## 3 FLOOD              25326              470           6789             7259
## 4 EXCESSIVE HEAT      1678             1903           6525             8428
## 5 LIGHTNING          15754              816           5230             6046
print(top5casualties_total)
## # A tibble: 5 × 5
##   EVTYPE         occurence fatalities_total injuries_total casualties_total
##   <chr>              <int>            <dbl>          <dbl>            <dbl>
## 1 TORNADO            60652             5633          91346            96979
## 2 EXCESSIVE HEAT      1678             1903           6525             8428
## 3 TSTM WIND         219940              504           6957             7461
## 4 FLOOD              25326              470           6789             7259
## 5 LIGHTNING          15754              816           5230             6046
##The event types that are in any of the above three top 5 rankings are: 
TopEvents_population <- union(union(top5injuries_total$EVTYP, top5fatalities_total$EVTYPE), top5casualties_total$EVTYPE)
## Warning: Unknown or uninitialised column: `EVTYP`.
print(TopEvents_population)
## [1] "TORNADO"        "EXCESSIVE HEAT" "FLASH FLOOD"    "HEAT"          
## [5] "LIGHTNING"      "TSTM WIND"      "FLOOD"

We see that the top 5 rankings of events that caused the most deaths, injuries and overall casualties mostly overlap, and we focus on the 7 event types and appear in these rankings.

##Prepare the dataset for plotting by subsetting for the above 7 top harmful events and reshape to long format
top_population_harm <- subset(storm_casualties_bytype, EVTYPE %in% TopEvents_population)
library(tidyr)
top_population_long <- top_population_harm %>%
        pivot_longer(cols=c("fatalities_total", "injuries_total", "casualties_total"), names_to = "harm", values_to = "total_number")

Analysis for Economic Consequences

To analyze the economic consequences of severe weather events, we look at the property damages, crop damages and total moneytary damages (sum of property and crop damages).

##subset for event types, property damages and crop damages.
storm_econ <- stormraw[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

In this dataset, each property damages are represented as a numeric value (“PROPDMG” and “CROPDMG) and a multiplier (”PROPDMGEXP” and “CROPDMGEXP”). We calculated the actual dollar amount is calculated by multiplying the numeric values by their multiplier. The total monetary damages are the sum of the property and crop damages after multiplier conversions.

##Create a function that read and convert the multiplier columns
unique(storm_econ$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm_econ$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
multiplier_conversion <- function(multiplier) {
  case_when(
    multiplier == "K" ~ 1e3,
    multiplier == "k" ~ 1e3,
    multiplier == "M" ~ 1e6,
    multiplier == "m" ~ 1e3,
    multiplier == "B" ~ 1e9,
    multiplier == "b" ~ 1e3,
    multiplier == "H" ~ 1e2,
    multiplier == "h" ~ 1e2,
    multiplier == "1" ~ 10,
    multiplier == "2" ~ 1e2,
    multiplier == "3" ~ 1e3,
    multiplier == "4" ~ 1e4,
    multiplier == "5" ~ 1e5,
    multiplier == "6" ~ 1e6,
    multiplier == "7" ~ 1e7,
    multiplier == "8" ~ 1e8,
    TRUE ~ 1  # Default if multiplier is not one of the above
  )
}
##calculate property and crop damages by multiplying the numeric values by their corresponding multipliers
storm_econ$property_dollar <- storm_econ$PROPDMG * multiplier_conversion(storm_econ$PROPDMGEXP)
storm_econ$crop_dollar <- storm_econ$CROPDMG * multiplier_conversion(storm_econ$CROPDMGEXP)
##create add up the property and crop damages
storm_econ$total_dollar <- storm_econ$property_dollar+storm_econ$crop_dollar
##confirm that there is no missing data
sum(is.na(storm_econ))
## [1] 0

We then grouped the data by event types and evaluated the total dollar amount of property damages, crop damages and combined monetary damages.

##group data by event types
storm_econ_bytype <- storm_econ %>%
        group_by(EVTYPE) %>%
        summarise(occurence = n(),
                  property_total = sum(property_dollar),
                  crop_total = sum(crop_dollar),
                  money_total = sum(total_dollar)
        )
##accquire top 5 events types that caused the highest property, crop and overall monetary damages.
top5prop_total <- storm_econ_bytype %>%
        slice_max(order_by = property_total, n=5)
top5crop_total <- storm_econ_bytype %>%
        slice_max(order_by = crop_total, n=5)
top5money <- storm_econ_bytype %>%
        slice_max(order_by = money_total, n=5)
print(top5prop_total)
## # A tibble: 5 × 5
##   EVTYPE            occurence property_total crop_total   money_total
##   <chr>                 <int>          <dbl>      <dbl>         <dbl>
## 1 FLOOD                 25326  144657709807  5661968450 150319678257 
## 2 HURRICANE/TYPHOON        88   69305840000  2607872800  71913712800 
## 3 TORNADO               60652   56935892176.  414953270  57350845446.
## 4 STORM SURGE             261   43323536000        5000  43323541000 
## 5 FLASH FLOOD           54277   16822673978. 1421317100  18243991078.
print(top5crop_total)
## # A tibble: 5 × 5
##   EVTYPE      occurence property_total  crop_total   money_total
##   <chr>           <int>          <dbl>       <dbl>         <dbl>
## 1 DROUGHT          2488    1046106000  13972566000  15018672000 
## 2 FLOOD           25326  144657709807   5661968450 150319678257 
## 3 RIVER FLOOD       173    5118945500   5029459000  10148404500 
## 4 ICE STORM        2006    3944927860   5022113500   8967041360 
## 5 HAIL           288661   15730372413.  3025954473  18756326886.
print(top5money)
## # A tibble: 5 × 5
##   EVTYPE            occurence property_total crop_total   money_total
##   <chr>                 <int>          <dbl>      <dbl>         <dbl>
## 1 FLOOD                 25326  144657709807  5661968450 150319678257 
## 2 HURRICANE/TYPHOON        88   69305840000  2607872800  71913712800 
## 3 TORNADO               60652   56935892176.  414953270  57350845446.
## 4 STORM SURGE             261   43323536000        5000  43323541000 
## 5 HAIL                 288661   15730372413. 3025954473  18756326886.
##The event types that are in any of the above three top 5 rankings are: 
TopEvents_econ <- union(union(top5prop_total$EVTYPE, top5crop_total$EVTYPE), top5money$EVTYPE)  
print(TopEvents_econ)
## [1] "FLOOD"             "HURRICANE/TYPHOON" "TORNADO"          
## [4] "STORM SURGE"       "FLASH FLOOD"       "DROUGHT"          
## [7] "RIVER FLOOD"       "ICE STORM"         "HAIL"

We see that the top 5 rankings by the three criteria have some overlaps and there are nine types of weather events that appear on these rankings.

##prepare the dataset for plotting by subsetting for the nine focus event types and reshape dataset to long format
top_econ_damage <- subset(storm_econ_bytype, EVTYPE %in% TopEvents_econ)
##convert the dollar amount to million
top_econ_damage$property_total_M <- top_econ_damage$property_total/1000000
top_econ_damage$crop_total_M <- top_econ_damage$crop_total/1000000
top_econ_damage$money_total_M <- top_econ_damage$money_total/1000000
top_econ_long <- top_econ_damage %>%
        pivot_longer(cols=c("property_total_M", "crop_total_M", "money_total_M"), names_to = "damage", values_to = "million")

Results

Most harmful event types with respect to population health

To show the harms to population health by severe weather events, we can make a clustered bar plots of the top event types that caused the most death, injuries or combined casualties.

##arrange order of clustered bars to display in order of 1.fatalities, 2.injuries, 3.combined casualties
top_population_long$harm <- factor(top_population_long$harm, levels = c("fatalities_total", "injuries_total", "casualties_total"))
##create clustered barplot
library(ggplot2)
g1 <- ggplot(top_population_long, aes(x=EVTYPE, y=total_number, fill=harm))
g1+geom_bar(stat="identity", position="dodge") +
        labs(x="Events", y="Total Number", fill="Color Legend", title = "Top Harmful Weather Events for Population Health") +
        scale_fill_discrete(labels = c("Fatalities", "Injuries", "Combined Casualties")) + 
        theme_minimal()+
        theme(axis.text.x = element_text(angle = 45, hjust = 1), 
              legend.key.size = unit(0.3,"cm"))

From this barplot, it is apparent that Tornados caused overwhelming the most deaths and injuries among all severe weather events, and therefore is the most harmful to population health.

Most damaging events with respect to economic consequences

To show the economic damages by severe weather events, we can make a clustered bar plots of the top event types that caused the most properties, crops and combined damages. The damage amount is shown in million dollars.

##arrange order of clustered bars to display in order of 1.property damages, 2.crop damages, 3.combined monetary damages
top_econ_long$damage <- factor(top_econ_long$damage, levels = c("property_total_M", "crop_total_M", "money_total_M"))
##create clustered barplot
g2 <- ggplot(top_econ_long, aes(x=EVTYPE, y=million, fill=damage))
g2 + geom_bar(stat="identity", position="dodge") + 
        labs(x="Events", y="Total Damage (million dollars)", fill="Damages", title="Top Damaging Weather Events for Economic Loss ") + 
        scale_fill_discrete(labels = c("Properties", "Crops","Combined Loss" )) + 
        theme_minimal()+
        theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.key.size = unit(0.3,"cm"))

From this barplot, it is apparent that floods caused caused the heaviest economic impact among all severe weather events, and this was mostly driven by the enormous property damages. Droughts, while caused less damages to properties, are the most severe events for agricultural losses.