Introduction / Synopsis

This report explores the NOAA Storm Database and answers some basic questions about severe weather events. In what follows we present two basic results. The first is the harm to human health (fatalities and injuries) caused by weather events. The second is the economic consequences of the same events. Both are computed over the whole time series, which starts in 1950.

Data Processing

This section describes the loading and data processing in RStudio.

The first step is to download and load the data.

#required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#Define the URL and destination of the files
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "storm.bz2"

#Download the bzip2-compressed file
download.file(fileUrl, destfile, method = "curl")

#Read the bz2 file as a data.frame
storm <- read.csv(bzfile(destfile))
sd <- storm
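
Because the compressed file is large, it can be convenient to skip the download when the file is already on disk. The following is a minimal sketch of that idea, not part of the original analysis; `fetch_if_missing` and its `downloader` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
# `downloader` defaults to utils::download.file and can be swapped out for testing.
fetch_if_missing <- function(url, dest, downloader = utils::download.file) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on Windows
    downloader(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Possible usage with the objects defined above (commented out to avoid a download here):
# fetch_if_missing(fileUrl, destfile)
# storm <- read.csv(bzfile(destfile))
```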

I will now convert the date variable BGN_DATE from text to a date-time format using the lubridate function mdy_hms().

#converting date (month/day/year) and time to respective formats using lubridate
sd <- sd |> 
        mutate(BGN_DATE = mdy_hms(BGN_DATE))
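
To illustrate what the conversion does, the sketch below applies mdy_hms() to a single sample string. The string is an assumed example of the month/day/year plus time layout, not a value copied from the dataset.

```r
library(lubridate)

# Assumed sample value in "month/day/year hour:minute:second" layout
sample_date <- "4/18/1950 0:00:00"

# mdy_hms() returns a POSIXct date-time (UTC by default)
parsed <- mdy_hms(sample_date)
format(parsed, "%Y-%m-%d")

# Strings that do not match the layout become NA; quiet = TRUE suppresses the warning
bad <- mdy_hms("not a date", quiet = TRUE)
is.na(bad)
```

Checking for NA values after the conversion is a quick way to spot rows whose dates did not parse.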

Results

The first question is which events are most harmful to population health. To answer it, I use the number of fatalities and the number of injuries as proxies for the harmfulness of an event. The basic quantity to look at is then the total number of injuries and fatalities per event type across all years.

harmful <- sd |> 
        group_by(EVTYPE) |> 
        summarize(total_harm = sum(FATALITIES+INJURIES)) |> 
        arrange(desc(total_harm))

print(harmful, n=20)
## # A tibble: 985 × 2
##    EVTYPE             total_harm
##    <chr>                   <dbl>
##  1 TORNADO                 96979
##  2 EXCESSIVE HEAT           8428
##  3 TSTM WIND                7461
##  4 FLOOD                    7259
##  5 LIGHTNING                6046
##  6 HEAT                     3037
##  7 FLASH FLOOD              2755
##  8 ICE STORM                2064
##  9 THUNDERSTORM WIND        1621
## 10 WINTER STORM             1527
## 11 HIGH WIND                1385
## 12 HAIL                     1376
## 13 HURRICANE/TYPHOON        1339
## 14 HEAVY SNOW               1148
## 15 WILDFIRE                  986
## 16 THUNDERSTORM WINDS        972
## 17 BLIZZARD                  906
## 18 FOG                       796
## 19 RIP CURRENT               600
## 20 WILD/FOREST FIRE          557
## # ℹ 965 more rows

Plotting the results shows that tornadoes are by far the most harmful weather event for the health of the US population. For readability, only the 15 most harmful event types are shown.

#Create a bar plot to make visualization easier
#selecting only the top 15 events
top_harm <- harmful |> 
        slice_max(total_harm, n = 15) 
#creating the bar plot
plot1 <-ggplot(top_harm, aes(x = reorder(EVTYPE, total_harm), y = total_harm)) +
        geom_bar(stat = "identity") +
        coord_flip() +  # Flip coordinates for better readability
        labs(title = "Total Harm by Event Type",
             x = "Event Type",
             y = "Total Harm") +
        theme_minimal()

print(plot1)

As we can see below, tornadoes are responsible for more than 60% of total human harm.

#calculating the share of each event in total harm, formatted as a percentage
harmful_perc <- harmful |> 
        mutate(percentage = scales::percent(total_harm/sum(total_harm), accuracy = 0.1))

print(harmful_perc, n=15)
## # A tibble: 985 × 3
##    EVTYPE            total_harm percentage
##    <chr>                  <dbl> <chr>     
##  1 TORNADO                96979 62.3%     
##  2 EXCESSIVE HEAT          8428 5.4%      
##  3 TSTM WIND               7461 4.8%      
##  4 FLOOD                   7259 4.7%      
##  5 LIGHTNING               6046 3.9%      
##  6 HEAT                    3037 2.0%      
##  7 FLASH FLOOD             2755 1.8%      
##  8 ICE STORM               2064 1.3%      
##  9 THUNDERSTORM WIND       1621 1.0%      
## 10 WINTER STORM            1527 1.0%      
## 11 HIGH WIND               1385 0.9%      
## 12 HAIL                    1376 0.9%      
## 13 HURRICANE/TYPHOON       1339 0.9%      
## 14 HEAVY SNOW              1148 0.7%      
## 15 WILDFIRE                 986 0.6%      
## # ℹ 970 more rows
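
The percentage column is each event's harm divided by the total harm over all event types. A minimal sketch on made-up numbers (the event names and counts below are hypothetical, chosen only so the arithmetic is easy to follow):

```r
# Hypothetical toy data: three event types with invented harm counts
toy <- data.frame(
  EVTYPE     = c("EVENT A", "EVENT B", "EVENT C"),
  total_harm = c(60, 30, 10)
)

# Share of each event in the overall total (the shares sum to 1)
toy$share <- toy$total_harm / sum(toy$total_harm)

# Same formatting as in the analysis: one decimal place with a percent sign
toy$percentage <- scales::percent(toy$share, accuracy = 0.1)
toy
```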

Separating Fatalities and Injuries

Let’s now look at fatalities and injuries separately, with a plot of the top 10 event types for each.

## plot graphs showing the top 10 fatalities and injuries

## Procedure = aggregate the top 10 fatalities by the event type and sort the output in descending order

fatalities <- aggregate(FATALITIES ~ EVTYPE, data = sd, FUN = sum)
Top10_Fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:10, ] 
Top10_Fatalities 
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
#Reviewing events that cause the most injuries ( The Top-10 Injuries by Weather Event )
## Procedure = aggregate the top 10 injuries by the event type and sort the output in descending order

Injuries <- aggregate(INJURIES ~ EVTYPE, data = sd, FUN = sum)
Top10_Injuries <- Injuries[order(-Injuries$INJURIES), ][1:10, ] 
Top10_Injuries 
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
#now the plot
par(mfrow=c(1,2),mar=c(10,2,2,3))
barplot(Top10_Fatalities$FATALITIES,names.arg=Top10_Fatalities$EVTYPE,las=2,col="sienna",ylab="fatalities",main="Top 10 fatalities")
barplot(Top10_Injuries$INJURIES,names.arg=Top10_Injuries$EVTYPE,las=2,col="sienna",ylab="injuries",main="Top 10 Injuries")
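
As a consistency check, the separate fatality and injury totals should add up to the combined total_harm reported earlier. The sketch below re-enters the printed values for five event types by hand (it does not recompute them from the raw data) and verifies the sums; `check` is a name introduced here for illustration.

```r
# Values copied from the tables printed in this report
check <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 504, 470, 816),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230),
  total_harm = c(96979, 8428, 7461, 7259, 6046)
)

# TRUE where fatalities + injuries reproduce the combined figure
check$matches <- check$FATALITIES + check$INJURIES == check$total_harm
check
```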

Exploring economic consequences of weather events

Which types of events have the greatest economic consequences?

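The procedure here is to calculate the total economic damage caused in two different categories: property and crops. The database stores each damage amount as a number (PROPDMG, CROPDMG) plus an exponent code (PROPDMGEXP, CROPDMGEXP), so we first need to translate each code into a numeric multiplier.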

# Assigning values for the property exponent Storm Data
unique(sd$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
strmdata <- storm
        
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "K"] <- 1000
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "M"] <- 1e+06
        strmdata$PROPEXP[strmdata$PROPDMGEXP == ""] <- 1
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "B"] <- 1e+09
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "m"] <- 1e+06
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "0"] <- 1
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "5"] <- 1e+05
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "6"] <- 1e+06
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "4"] <- 10000
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "2"] <- 100
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "3"] <- 1000
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "h"] <- 100
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "7"] <- 1e+07
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "H"] <- 100
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "1"] <- 10
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "8"] <- 1e+08   
        
#there are some invalid symbols
        
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "+"] <- 0
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "-"] <- 0
        strmdata$PROPEXP[strmdata$PROPDMGEXP == "?"] <- 0
        
        
        
        #Now I calculate the property damage value
        strmdata$PROPDMGVAL <- strmdata$PROPDMG * strmdata$PROPEXP

The same conversion is applied to crop damage.

unique(strmdata$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
        # Assigning values for the crop exponent strmdata 
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "M"] <- 1e+06
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "K"] <- 1000
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "m"] <- 1e+06
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "B"] <- 1e+09
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "0"] <- 1
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "k"] <- 1000
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "2"] <- 100
        strmdata$CROPEXP[strmdata$CROPDMGEXP == ""] <- 1
        
        # Assigning '0' to invalid exponent strmdata
        strmdata$CROPEXP[strmdata$CROPDMGEXP == "?"] <- 0
        
        # calculating the crop damage 
        strmdata$CROPDMGVAL <- strmdata$CROPDMG * strmdata$CROPEXP
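
The two blocks above apply the same rule: letter codes are case-insensitive (H, K, M, B), a single digit n means 10^n, an empty code means 1, and any other symbol is treated as 0. A more compact way to express that rule is a lookup helper. This is only a sketch under that reading of the codes; `exp_to_multiplier` is a hypothetical name, and unlike the assignments above it also returns 0 for any code not listed there.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(code)

  # Letter codes via named-vector lookup (unknown codes give NA for now)
  mult <- unname(lookup[code])

  # Single digits are powers of ten, e.g. "5" -> 1e5, "0" -> 1
  digits <- grepl("^[0-9]$", code)
  mult[digits] <- 10^as.numeric(code[digits])

  # Empty code: the amount is taken as is
  mult[code == ""] <- 1

  # Everything else ("+", "-", "?") is treated as invalid
  mult[is.na(mult)] <- 0
  mult
}

# Example: a recorded amount of 25 with code "K" becomes 25000
25 * exp_to_multiplier("K")
```

With such a helper, the property value could be computed as `strmdata$PROPDMG * exp_to_multiplier(strmdata$PROPDMGEXP)`, and likewise for crops.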

Property Damage Summary

#Procedure = aggregate the property damage by the event type and sort the output in descending order
        
        prop <- aggregate(PROPDMGVAL~EVTYPE,data=strmdata,FUN=sum,na.rm=TRUE)
        prop <- prop[with(prop,order(-PROPDMGVAL)),]
        prop <- head(prop,10)
        print(prop)
##                EVTYPE   PROPDMGVAL
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380616
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673978
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046260

Crop Damage Summary

# Procedure = aggregate the crop damage by the event type and sort the output in descending order
        
        crop <- aggregate(CROPDMGVAL~EVTYPE,data=strmdata,FUN=sum,na.rm=TRUE)
        crop <- crop[with(crop,order(-CROPDMGVAL)),]
        crop <- head(crop,10)
        print(crop)
##                EVTYPE  CROPDMGVAL
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000
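
Property and crop damage can also be combined into a single economic total per event type. The sketch below does this for a few rows re-entered by hand from the two tables above. Treating a missing value as 0 is an assumption made only for this illustration (for example, TORNADO is absent from the crop top 10, so its crop damage is unknown here); on the full data the two complete aggregates should be merged instead.

```r
# Selected rows copied from the printed top-10 tables
prop_top <- data.frame(
  EVTYPE     = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  PROPDMGVAL = c(144657709807, 69305840000, 56947380616)
)
crop_top <- data.frame(
  EVTYPE     = c("DROUGHT", "FLOOD", "HURRICANE/TYPHOON"),
  CROPDMGVAL = c(13972566000, 5661968450, 2607872800)
)

# Full outer join so events present in only one table are kept
total <- merge(prop_top, crop_top, by = "EVTYPE", all = TRUE)

# Assumption for this sketch: a missing value counts as 0
total[is.na(total)] <- 0

# Combined damage, sorted in descending order
total$TOTALVAL <- total$PROPDMGVAL + total$CROPDMGVAL
total <- total[order(-total$TOTALVAL), ]
total
```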

The plot below shows the final results: the top 10 weather event types by property damage and by crop damage, that is, their economic consequences.

        #Now plotting the result
        
        # Plot of Top 10 Property & Crop damages by Weather Event Types ( Economic Consequences )
        
        ##plot the graph showing the top 10 property and crop damages
        
        par(mfrow=c(1,2),mar=c(11,3,3,2))
        barplot(prop$PROPDMGVAL/(10^9),names.arg=prop$EVTYPE,las=2,col="blue",ylab="Prop.damage(billions)",main="Top10 Prop.Damages")
        barplot(crop$CROPDMGVAL/(10^9),names.arg=crop$EVTYPE,las=2,col="blue",ylab="Crop damage(billions)",main="Top10 Crop.Damages")

Conclusion

We can see that floods are the main cause of economic damage, driven by property losses, while droughts cause the most crop damage. Hurricanes/typhoons are the second-largest cause of property damage, and floods rank second for crop damage.