Pertinent system/program information at the time this report was generated

sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.35     R6_2.5.1          fastmap_1.1.1     xfun_0.43        
##  [5] cachem_1.0.8      knitr_1.46        htmltools_0.5.8.1 rmarkdown_2.26   
##  [9] lifecycle_1.0.4   cli_3.6.2         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.3.1    rstudioapi_0.16.0 tools_4.3.1       evaluate_0.23    
## [17] bslib_0.7.0       yaml_2.3.8        rlang_1.1.3       jsonlite_1.8.8

Synopsis

This project uses a storm data set from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The data set catalogs weather events across the United States from 1950 to November 2011 and, among other things, records the fatalities, injuries, property damage, and crop damage for each event. We are interested in determining which classes of weather events cause 1) the greatest effects on population health (through fatalities and injuries) and 2) the greatest economic losses (through property and crop damage).

Data Processing

Loading the data and cutting down to include only the pertinent variables

We begin by reading the data into R and getting a sense of its scope.

storm_data <- read.csv("repdata_data_StormData.csv.bz2")
head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

The data set is extremely large, and for the purpose at hand we only need to consider the variables EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.

# Keep only the columns needed for the analysis. Subsetting the data frame
# directly preserves each column's type, so the numeric columns stay numeric
# (cbind() would coerce everything to character first).
cut_down <- storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
                           "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

colnames(cut_down) <- c("event_type", "fatalities", "injuries",
  "property_damage", "property_damage_exp", "crop_damage", "crop_damage_exp")

head(cut_down)
##   event_type fatalities injuries property_damage property_damage_exp
## 1    TORNADO          0       15            25.0                   K
## 2    TORNADO          0        0             2.5                   K
## 3    TORNADO          0        2            25.0                   K
## 4    TORNADO          0        2             2.5                   K
## 5    TORNADO          0        2             2.5                   K
## 6    TORNADO          0        6             2.5                   K
##   crop_damage crop_damage_exp
## 1           0                
## 2           0                
## 3           0                
## 4           0                
## 5           0                
## 6           0
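
The same reduction could also be done at import time, so that the unused columns are never loaded. The following is a minimal sketch on a made-up temporary CSV, not the storm file itself; `toy_path`, `keep`, and `col_classes` are names introduced here for illustration. It relies on the `colClasses` argument of `read.csv()`, where `"NULL"` skips a column and `NA` lets R guess the type.

```r
# Toy stand-in for the storm file (hypothetical values, written to a temp file).
toy_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(STATE = c("AL", "AL"), EVTYPE = c("TORNADO", "FLOOD"),
             FATALITIES = c(0, 1), REMARKS = c("a", "b")),
  toy_path, row.names = FALSE
)

# Columns we want to keep.
keep <- c("EVTYPE", "FATALITIES")

# Read only the first row to learn the column names.
header <- names(read.csv(toy_path, nrows = 1))

# "NULL" tells read.csv() to skip a column; NA lets it guess the type.
col_classes <- ifelse(header %in% keep, NA, "NULL")

toy_subset <- read.csv(toy_path, colClasses = col_classes)
print(names(toy_subset))
```

Whether this is worth doing depends on how tight memory is; subsetting after import, as in the report, is simpler and gives the same columns.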

Extracting the explicit monetary value of property and crop damage

The property and crop damage amounts are coded through the variables “property_damage” and “crop_damage” (the mantissas) and “property_damage_exp” and “crop_damage_exp” (the exponents). The “exp” variables have the following unique values.

uniquepd <- unique(cut_down$property_damage_exp)
uniquecd <- unique(cut_down$crop_damage_exp)

uniquepd
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
uniquecd
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
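
`unique()` shows which codes exist but not how common each one is. A minimal sketch, using a made-up vector of codes rather than the real column, of how a frequency table could quantify that; `codes`, `recognised`, and `other_share` are illustrative names.

```r
# Toy vector of exponent codes (hypothetical values, not the real column).
codes <- c("K", "K", "M", "", "K", "5", "m", "", "B", "?")

# Count how often each code occurs, most frequent first.
code_counts <- sort(table(codes), decreasing = TRUE)
print(code_counts)

# Share of rows whose code is neither blank nor one of the letter
# multipliers (compared case-insensitively).
recognised <- c("H", "K", "M", "B")
other_share <- mean(!(toupper(codes) %in% recognised) & codes != "")
print(other_share)
```

On the real data the same calls would be applied to `cut_down$property_damage_exp` and `cut_down$crop_damage_exp`; a small share of unrecognised codes would support treating them as a multiplier of one.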

“B” is one billion, “M” or “m” is one million, “K” or “k” is one thousand, and “H” or “h” is one hundred. We assume, as a simplification, that every other code (blank, digits, “+”, “-”, “?”) stands for a multiplier of one. We use this conversion to append the variable economic_loss = property_damage_value + crop_damage_value. More simply, we also attach total_health_incidences = fatalities + injuries.

new_pd_exp = cut_down$property_damage_exp
new_cd_exp = cut_down$crop_damage_exp

number = length(cut_down[,1])

for (idx in 1:number){
  
  if(cut_down$property_damage_exp[idx] == "K"){
    new_pd_exp[idx] = 3
  }
  
  else if(cut_down$property_damage_exp[idx] == "M" | cut_down$property_damage_exp[idx] == "m"){
    new_pd_exp[idx] = 6
  }
  
  else if(cut_down$property_damage_exp[idx] == "B"){
    new_pd_exp[idx] = 9
  }
  
  else if(cut_down$property_damage_exp[idx] == "H" | cut_down$property_damage_exp[idx] == "h"){
    new_pd_exp[idx] = 2
  }
  
  else{
    new_pd_exp[idx] = 0
  }
  
}

new_pd_exp <- as.numeric(new_pd_exp)

for (idx in 1:number){
  
  # Both tests must look at the crop exponent column.
  if(cut_down$crop_damage_exp[idx] == "K" | cut_down$crop_damage_exp[idx] == "k"){
    new_cd_exp[idx] = 3
  }
  
  else if(cut_down$crop_damage_exp[idx] == "M" | cut_down$crop_damage_exp[idx] == "m"){
    new_cd_exp[idx] = 6
  }
  
  else if(cut_down$crop_damage_exp[idx] == "B"){
    new_cd_exp[idx] = 9
  }
  
  else{
    new_cd_exp[idx] = 0
  }
}

new_cd_exp <- as.numeric(new_cd_exp)

property_damage_value = cut_down$property_damage*10^(new_pd_exp)
crop_damage_value = cut_down$crop_damage*10^(new_cd_exp)

# Assemble the analysis table with data.frame(), which keeps numeric columns
# numeric (cbind() would coerce every column to character).
useful_data <- data.frame(
  event_type = cut_down$event_type,
  fatalities = cut_down$fatalities,
  injuries = cut_down$injuries,
  property_damage_value = property_damage_value,
  crop_damage_value = crop_damage_value
)

# Combined health impact and combined economic loss for each event.
useful_data$total_health_incidences <- useful_data$fatalities +
  useful_data$injuries
useful_data$economic_loss <- useful_data$property_damage_value +
  useful_data$crop_damage_value

head(useful_data)
##   event_type fatalities injuries property_damage_value crop_damage_value
## 1    TORNADO          0       15                 25000                 0
## 2    TORNADO          0        0                  2500                 0
## 3    TORNADO          0        2                 25000                 0
## 4    TORNADO          0        2                  2500                 0
## 5    TORNADO          0        2                  2500                 0
## 6    TORNADO          0        6                  2500                 0
##   total_health_incidences economic_loss
## 1                      15         25000
## 2                       0          2500
## 3                       2         25000
## 4                       2          2500
## 5                       2          2500
## 6                       6          2500
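
The two loops above apply the code-to-exponent rule one row at a time. As a sketch only, the same rule could be written as a named lookup vector applied to a whole column at once. `exp_lookup` and `code_to_exponent` are hypothetical names, the toy data frame is made up, and unlike the loops this version treats upper and lower case alike for every letter.

```r
# Hypothetical lookup table: exponent code -> power of ten.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Codes outside the table (blank, digits, "+", "-", "?") get exponent 0,
# matching the assumption stated in the text.
code_to_exponent <- function(codes) {
  exponent <- unname(exp_lookup[toupper(codes)])
  exponent[is.na(exponent)] <- 0
  exponent
}

# Toy rows (hypothetical values) to show the arithmetic.
toy <- data.frame(
  damage = c(25, 2.5, 1.2, 3, 7),
  code = c("K", "m", "B", "", "5")
)
toy$value <- toy$damage * 10^code_to_exponent(toy$code)
print(toy)
```

For example, a mantissa of 25 with code “K” becomes 25 × 10^3 = 25000, which agrees with the first row of the table above. On roughly a million rows a vectorized lookup like this should also run much faster than an explicit `for` loop.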

Results

When we compute the total health incidences grouped by weather event type, we see that “Tornado” events are by far the most dangerous, followed by “Excessive Heat” in a distant second place.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.2
twod_vec <- aggregate(useful_data$total_health_incidences,
                      by=list(useful_data$event_type),sum)

colnames(twod_vec) <- c("event_type","total_health_incidences")

twod_vec <- twod_vec[order(-twod_vec$total_health_incidences),]

ggplot(data=twod_vec[1:15,], aes(x=event_type,y=total_health_incidences))+
  geom_bar(stat="identity")+
  theme(axis.text.x = element_text(angle=90, vjust=.5, hjust=1))+
  xlab("Type of weather event") + ylab("Total health incidences")
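
One caveat about the plot call above: ggplot2 places a character x variable in alphabetical order, so the bars are not drawn from largest to smallest even though the table was sorted. A minimal sketch, on made-up totals, of how `reorder()` from base R could fix the order; `toy_totals` is a hypothetical stand-in for `twod_vec`.

```r
# Toy totals (hypothetical numbers) standing in for the aggregated table.
toy_totals <- data.frame(
  event_type = c("FLOOD", "HEAT", "TORNADO"),
  total = c(30, 50, 90)
)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating the totals puts the largest one first.
toy_totals$event_type <- reorder(toy_totals$event_type, -toy_totals$total)
print(levels(toy_totals$event_type))
```

In the plot itself the same idea would be written as `aes(x = reorder(event_type, -total_health_incidences), y = total_health_incidences)`.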

When we compute the total economic loss grouped by weather event type, we see that “Flooding” is the most significant contributor, followed by “Hurricane/Typhoon” and “Tornado”.

twod_vec <- aggregate(useful_data$economic_loss,
                      by=list(useful_data$event_type),sum)

colnames(twod_vec) <- c("event_type","economic_loss")

twod_vec <- twod_vec[order(-twod_vec$economic_loss),]

ggplot(data=twod_vec[1:15,], aes(x=event_type,y=economic_loss))+
  geom_bar(stat="identity")+
  theme(axis.text.x = element_text(angle=90, vjust=.5, hjust=1))+
  xlab("Type of weather event") + ylab("Economic loss in dollars")