Analysis of U.S. Storm Event Data and the Impact on Population Health and the Economy

Suhas P K

2023-05-17

Course Project

Reproducible Research: Project 2. The course project is available on Reproducible Research: Project 2.

Synopsis

Storms and other extreme weather events can cause both public health and economic problems. Many extreme events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This report presents an analysis whose goal was to identify the weather events most hazardous to population health and those with the greatest economic impact in the U.S., based on data collected by the U.S. National Oceanic and Atmospheric Administration (NOAA).

Libraries used

if (!require(ggplot2)){
    install.packages("ggplot2")
    library(ggplot2)
}
## Loading required package: ggplot2
if (!require(dplyr)) {
    install.packages("dplyr")
    library(dplyr, warn.conflicts = FALSE)
}
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
if (!require(xtable)) {
    install.packages("xtable")
    library(xtable, warn.conflicts = FALSE)
}
## Loading required package: xtable
if (!require(htmlTable)){
    install.packages("htmlTable")
    library(htmlTable, warn.conflicts = FALSE)
}
## Loading required package: htmlTable
## Warning: package 'htmlTable' was built under R version 4.2.3
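
The repeated install-then-load pattern above could also be written once as a small helper. The following is a minimal sketch, not the method used in this report; the function name load_or_install is hypothetical, and the example call uses packages that ship with R so that nothing is downloaded.

```r
# Hypothetical helper: load each package, installing it first only if it is missing.
load_or_install <- function(pkgs) {
    for (pkg in pkgs) {
        if (!requireNamespace(pkg, quietly = TRUE)) {
            install.packages(pkg)
        }
        # character.only = TRUE is needed because pkg holds the name as a string
        library(pkg, character.only = TRUE, warn.conflicts = FALSE)
    }
    invisible(pkgs)
}

# Example with base packages, so no installation is triggered here.
loaded <- load_or_install(c("stats", "utils"))
```

For this report the call would presumably be load_or_install(c("ggplot2", "dplyr", "xtable", "htmlTable")).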

To display session information,

sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8   
## [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] htmlTable_2.4.1 xtable_1.8-4    dplyr_1.1.0     ggplot2_3.4.1  
## 
## loaded via a namespace (and not attached):
##  [1] bslib_0.4.1       compiler_4.2.2    pillar_1.8.1      jquerylib_0.1.4  
##  [5] rmdformats_1.0.4  tools_4.2.2       digest_0.6.30     checkmate_2.2.0  
##  [9] jsonlite_1.8.4    evaluate_0.18     lifecycle_1.0.3   tibble_3.1.8     
## [13] gtable_0.3.1      pkgconfig_2.0.3   rlang_1.0.6       cli_3.6.0        
## [17] rstudioapi_0.14   yaml_2.3.6        xfun_0.39         fastmap_1.1.0    
## [21] withr_2.5.0       stringr_1.5.0     knitr_1.40        htmlwidgets_1.6.1
## [25] generics_0.1.3    vctrs_0.5.2       sass_0.4.2        tidyselect_1.2.0 
## [29] grid_4.2.2        glue_1.6.2        R6_2.5.1          fansi_1.0.3      
## [33] rmarkdown_2.17    bookdown_0.34     magrittr_2.0.3    backports_1.4.1  
## [37] scales_1.2.1      htmltools_0.5.4   colorspace_2.1-0  utf8_1.2.2       
## [41] stringi_1.7.8     munsell_0.5.0     cachem_1.0.6

Obtaining dataset.

# Url and path
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- paste(getwd(), "stormdata.csv", sep = "/")

# DOWNLOAD THE DATASET
if (!file.exists(destfile)){
    # The file is bz2-compressed; binary mode keeps the download intact on Windows
    download.file(url = url,
                  destfile = destfile,
                  mode = "wb")
}
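
The downloaded file is bzip2-compressed even though it is saved with a .csv name. As far as base R is concerned this should not matter, because read.csv() opens a path through file(), which detects the compression when reading. The sketch below illustrates this with a toy data frame and a temporary file (both are assumptions for illustration, not part of the storm data).

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data only).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
tmp <- tempfile(fileext = ".csv")
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given only the path; the compression is detected on reading.
toy_read <- read.csv(tmp)
unlink(tmp)
```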

Loading the dataset and displaying the dataset summary.

# READ THE .csv FILE
storm_Data <- read.csv("stormdata.csv")
names(storm_Data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(storm_Data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(storm_Data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

To view the dataset,

head(storm_Data,10)
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1        1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2        1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3        1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4        1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5        1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6        1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
## 7        1 11/16/1951 0:00:00     0100       CST      9     BLOUNT    AL
## 8        1  1/22/1952 0:00:00     0900       CST    123 TALLAPOOSA    AL
## 9        1  2/13/1952 0:00:00     2000       CST    125 TUSCALOOSA    AL
## 10       1  2/13/1952 0:00:00     2000       CST     57    FAYETTE    AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1  TORNADO         0                                               0         NA
## 2  TORNADO         0                                               0         NA
## 3  TORNADO         0                                               0         NA
## 4  TORNADO         0                                               0         NA
## 5  TORNADO         0                                               0         NA
## 6  TORNADO         0                                               0         NA
## 7  TORNADO         0                                               0         NA
## 8  TORNADO         0                                               0         NA
## 9  TORNADO         0                                               0         NA
## 10 TORNADO         0                                               0         NA
##    END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1          0                      14.0   100 3   0          0       15    25.0
## 2          0                       2.0   150 2   0          0        0     2.5
## 3          0                       0.1   123 2   0          0        2    25.0
## 4          0                       0.0   100 2   0          0        2     2.5
## 5          0                       0.0   150 2   0          0        2     2.5
## 6          0                       1.5   177 2   0          0        6     2.5
## 7          0                       1.5    33 2   0          0        1     2.5
## 8          0                       0.0    33 1   0          0        0     2.5
## 9          0                       3.3   100 3   0          1       14    25.0
## 10         0                       2.3   100 3   0          0        0    25.0
##    PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1           K       0                                         3040      8812
## 2           K       0                                         3042      8755
## 3           K       0                                         3340      8742
## 4           K       0                                         3458      8626
## 5           K       0                                         3412      8642
## 6           K       0                                         3450      8748
## 7           K       0                                         3405      8631
## 8           K       0                                         3255      8558
## 9           K       0                                         3334      8740
## 10          K       0                                         3336      8738
##    LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1        3051       8806              1
## 2           0          0              2
## 3           0          0              3
## 4           0          0              4
## 5           0          0              5
## 6           0          0              6
## 7           0          0              7
## 8           0          0              8
## 9        3336       8738              9
## 10       3337       8737             10

Data Processing

Creating a subset of the dataset

For this analysis, the dataset will be reduced to only include the necessary variables. Only observations with at least one non-zero value for fatalities, injuries, property damage, or crop damage will be included.

Variable     Description
EVTYPE       Event type (Flood, Heat, Hurricane, Tornado, …)
FATALITIES   Number of fatalities resulting from the event
INJURIES     Number of injuries resulting from the event
PROPDMG      Property damage amount in USD, before applying PROPDMGEXP
PROPDMGEXP   Unit multiplier for property damage (K, M, or B)
CROPDMG      Crop damage amount in USD, before applying CROPDMGEXP
CROPDMGEXP   Unit multiplier for crop damage (K, M, or B)
BGN_DATE     Begin date of the event
END_DATE     End date of the event
STATE        State where the event occurred

# TIDY DATA FROM STORM DATA
tidy_data <- subset(
    storm_Data, EVTYPE != "?" &
        (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0),
    select = c("EVTYPE",
               "FATALITIES",
               "INJURIES", 
               "PROPDMG",
               "PROPDMGEXP",
               "CROPDMG",
               "CROPDMGEXP",
               "BGN_DATE",
               "END_DATE",
               "STATE"
               )
)

To check the dimensions of tidy_data:

dim(tidy_data)
## [1] 254632     10

To check whether any NA values exist in the dataset:

sum(is.na(tidy_data))
## [1] 0

Data cleaning: the most tedious part, but a very important one.

Clean Event type data.

There are a total of 487 unique Event Type values in the current tidy data.

length(unique(tidy_data$EVTYPE))
## [1] 487

Exploring the Event Type data revealed many values that appeared to be similar; however, they were entered with different spellings, pluralization, mixed case and even misspellings. For example, Strong Wind, STRONG WIND, Strong Winds, and STRONG WINDS. The dataset was normalized by converting all Event Type values to uppercase and combining similar Event Type values into unique categories.

# converting all Event Type values to uppercase and combining similar Event Type values into unique categories.
tidy_data$EVTYPE <- toupper(tidy_data$EVTYPE)

A more compact approach is possible, but the explicit sequence of replacements below is easy to follow and verify, even though the repetition is not good programming practice.

# Avalanche
tidy_data$EVTYPE <- gsub('.*AVALANCE.*', 'AVALANCHE', tidy_data$EVTYPE)

# Blizzard
tidy_data$EVTYPE <- gsub('.*BLIZZARD.*', 'BLIZZARD', tidy_data$EVTYPE)

# Cloud
tidy_data$EVTYPE <- gsub('.*CLOUD.*', 'CLOUD', tidy_data$EVTYPE)

# Cold
tidy_data$EVTYPE <- gsub('.*COLD.*', 'COLD', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*FREEZ.*', 'COLD', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*FROST.*', 'COLD', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*ICE.*', 'COLD', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*LOW TEMPERATURE RECORD.*', 'COLD',tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*LO.*TEMP.*', 'COLD',tidy_data$EVTYPE)

# Dry
tidy_data$EVTYPE <- gsub('.*DRY.*', 'DRY',tidy_data$EVTYPE)

# Dust
tidy_data$EVTYPE <- gsub('.*DUST.*', 'DUST',tidy_data$EVTYPE)

# Fire
tidy_data$EVTYPE <- gsub('.*FIRE.*', 'FIRE', tidy_data$EVTYPE)

# flood
tidy_data$EVTYPE <- gsub('.*FLOOD.*', 'FLOOD', tidy_data$EVTYPE)

# Fog
tidy_data$EVTYPE <- gsub('.*FOG.*', 'FOG', tidy_data$EVTYPE)

# Hail
tidy_data$EVTYPE <- gsub('.*HAIL.*', 'HAIL', tidy_data$EVTYPE)

# HEAT
tidy_data$EVTYPE <- gsub('.*HEAT.*', 'HEAT', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*WARM.*', 'HEAT', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*HIGH.*TEMP.*', 'HEAT', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*RECORD HIGH TEMPERATURES.*', 'HEAT', tidy_data$EVTYPE)

# Hypothermia 
tidy_data$EVTYPE <- gsub('.*HYPOTHERMIA.*', 'HYPOTHERMIA/EXPOSURE',tidy_data$EVTYPE)

#Landslide
tidy_data$EVTYPE <- gsub('.*LANDSLIDE.*', 'LANDSLIDE', tidy_data$EVTYPE)

# Lightning
tidy_data$EVTYPE <- gsub('^LIGHTNING.*', 'LIGHTNING', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('^LIGNTNING.*', 'LIGHTNING', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('^LIGHTING.*', 'LIGHTNING', tidy_data$EVTYPE)

# Microburst
tidy_data$EVTYPE <- gsub('.*MICROBURST.*', 'MICROBURST', tidy_data$EVTYPE)

# Mudslide
tidy_data$EVTYPE <- gsub('.*MUDSLIDE.*', 'MUDSLIDE', tidy_data$EVTYPE)

# Rain
tidy_data$EVTYPE <- gsub('.*RAIN.*', 'RAIN', tidy_data$EVTYPE)

# Mudslide (two-word spelling)
tidy_data$EVTYPE <- gsub('.*MUD SLIDE.*', 'MUDSLIDE', tidy_data$EVTYPE)

# Rip current
tidy_data$EVTYPE <- gsub('.*RIP CURRENT.*', 'RIP CURRENT', tidy_data$EVTYPE)

# Storm
tidy_data$EVTYPE <- gsub('.*STORM.*', 'STORM', tidy_data$EVTYPE)

# Summary
tidy_data$EVTYPE <- gsub('.*SUMMARY.*', 'SUMMARY', tidy_data$EVTYPE)

# Tornado
tidy_data$EVTYPE <- gsub('.*TORNADO.*', 'TORNADO', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*TORNDAO.*', 'TORNADO', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*LANDSPOUT.*', 'TORNADO', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*WATERSPOUT.*', 'TORNADO', tidy_data$EVTYPE)

# Surf
tidy_data$EVTYPE <- gsub('.*SURF.*', 'SURF', tidy_data$EVTYPE)

# Volcanic
tidy_data$EVTYPE <- gsub('.*VOLCANIC.*', 'VOLCANIC', tidy_data$EVTYPE)

# Wet
tidy_data$EVTYPE <- gsub('.*WET.*', 'WET', tidy_data$EVTYPE)

# Wind
tidy_data$EVTYPE <- gsub('.*WIND.*', 'WIND', tidy_data$EVTYPE)

# Winter
tidy_data$EVTYPE <- gsub('.*WINTER.*', 'WINTER', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*WINTRY.*', 'WINTER', tidy_data$EVTYPE)
tidy_data$EVTYPE <- gsub('.*SNOW.*', 'WINTER', tidy_data$EVTYPE)

After tidying the dataset, the number of unique Event Type values was reduced to 81.

# Number of unique event type value
length(unique(tidy_data$EVTYPE))
## [1] 81
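
The pattern-by-pattern replacements above can also be expressed as data: a named vector of regular expressions and target categories, applied in order by one loop. The following is only a sketch under that assumption. The names evtype_rules and normalize_evtype are hypothetical, the rule list is a small excerpt rather than the full set, and the input is toy data.

```r
# Hypothetical lookup table: each regular expression maps to one category.
# Order matters: earlier rules are applied first, as with consecutive gsub() calls.
evtype_rules <- c(
    ".*FLOOD.*"   = "FLOOD",
    ".*HAIL.*"    = "HAIL",
    ".*TORNADO.*" = "TORNADO",
    ".*TORNDAO.*" = "TORNADO",
    ".*WIND.*"    = "WIND"
)

normalize_evtype <- function(x, rules) {
    x <- toupper(x)
    for (pattern in names(rules)) {
        x <- gsub(pattern, rules[[pattern]], x)
    }
    x
}

# Toy input, not taken from the dataset.
toy_events <- c("Strong Winds", "FLASH FLOOD", "TORNDAO", "small hail")
normalized <- normalize_evtype(toy_events, evtype_rules)
print(normalized)
```

Keeping the rules in one vector makes it easier to review, reorder, or extend them without touching the replacement logic.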

Cleaning Date type data.

Format date variables for any type of optional reporting or further analysis.

In the raw dataset, the BGN_DATE and END_DATE variables are stored as character strings; they should be converted to actual date types that can be manipulated and reported on. For now, time variables will be ignored.

Create four new variables based on date variables in the tidy dataset:

Variable     Description
DATE_START   Begin date of the event stored as a date type
DATE_END     End date of the event stored as a date type
YEAR         Year the event started
DURATION     Duration (in hours) of the event

# Cleaning Date data
tidy_data$DATE_START <- as.Date(tidy_data$BGN_DATE, format = "%m/%d/%Y")
tidy_data$DATE_END <- as.Date(tidy_data$END_DATE, format = "%m/%d/%Y")
tidy_data$YEAR <- as.integer(format(tidy_data$DATE_START, "%Y"))
# Date differences are in days, so request hours explicitly
tidy_data$DURATION <- as.numeric(difftime(tidy_data$DATE_END, tidy_data$DATE_START, units = "hours"))
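
Subtracting two Date values gives a difference in days, so a duration in hours needs an explicit unit. The sketch below shows this on two toy date strings in the same text format as BGN_DATE and END_DATE (the values are made up for illustration); an empty end date parses to NA, which carries through to the duration.

```r
# Toy dates in the same text format as BGN_DATE / END_DATE.
bgn <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00")
end <- c("4/19/1950 0:00:00", "")

# Only the date part is parsed; the trailing time text is ignored.
date_start <- as.Date(bgn, format = "%m/%d/%Y")
date_end   <- as.Date(end, format = "%m/%d/%Y")

# difftime() states the unit explicitly; the missing end date yields NA.
duration_hours <- as.numeric(difftime(date_end, date_start, units = "hours"))
print(duration_hours)
```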

Cleaning Economic data

According to the “National Weather Service Storm Data Documentation” (page 12), information about Property Damage is logged using two variables: PROPDMG and PROPDMGEXP. PROPDMG is the mantissa (the significand) rounded to three significant digits and PROPDMGEXP is the exponent (the multiplier). The same approach is used for Crop Damage, where the CROPDMG value is scaled by the CROPDMGEXP variable. The documentation also specifies that PROPDMGEXP and CROPDMGEXP should contain an alphabetical character signifying magnitude: “K” for thousands, “M” for millions, and “B” for billions. A quick review of the data, however, shows that several other characters are also logged.

# Cleaning Economic Data
table(toupper(tidy_data$PROPDMGEXP))
## 
##             -      +      0      2      3      4      5      6      7      B 
##  11585      1      5    210      1      1      4     18      3      3     40 
##      H      K      M 
##      7 231427  11327
table(toupper(tidy_data$CROPDMGEXP))
## 
##             ?      0      B      K      M 
## 152663      6     17      7  99953   1986

In order to calculate costs, the PROPDMGEXP and CROPDMGEXP variables will be mapped to a multiplier factor which will then be used to calculate the actual costs for both property and crop damage. Two new variables will be created to store damage costs:

  • PROP_COST

  • CROP_COST

# function to get multiplier factor
getMultiplier <- function(exp) {
    exp <- toupper(exp);
    if (exp == "")  return (10^0);
    if (exp == "-") return (10^0);
    if (exp == "?") return (10^0);
    if (exp == "+") return (10^0);
    if (exp == "0") return (10^0);
    if (exp == "1") return (10^1);
    if (exp == "2") return (10^2);
    if (exp == "3") return (10^3);
    if (exp == "4") return (10^4);
    if (exp == "5") return (10^5);
    if (exp == "6") return (10^6);
    if (exp == "7") return (10^7);
    if (exp == "8") return (10^8);
    if (exp == "9") return (10^9);
    if (exp == "H") return (10^2);
    if (exp == "K") return (10^3);
    if (exp == "M") return (10^6);
    if (exp == "B") return (10^9);
    return (NA);
}

# calculate property damage and crop damage costs (in billions)
tidy_data$PROP_COST <- with(tidy_data, as.numeric(PROPDMG) * sapply(PROPDMGEXP, getMultiplier))/10^9
tidy_data$CROP_COST <- with(tidy_data, as.numeric(CROPDMG) * sapply(CROPDMGEXP, getMultiplier))/10^9
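
The same mapping can be written as a named lookup vector and applied to a whole column at once, which avoids calling a function once per row. This is a sketch of that alternative, not the code used for the results in this report; exp_multiplier and get_multiplier_vec are hypothetical names, and the amounts and codes are toy values.

```r
# Named vector mapping each exponent code to its multiplier
# (the same mapping as the if-chain, written as data).
exp_multiplier <- c(
    "-" = 1, "?" = 1, "+" = 1,
    "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
    "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8, "9" = 1e9,
    "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

get_multiplier_vec <- function(exp) {
    exp <- toupper(exp)
    out <- unname(exp_multiplier[exp])  # unknown codes become NA
    out[exp %in% ""] <- 1               # "" cannot be a lookup name, so handle it here
    out
}

# Toy amounts and codes, not taken from the dataset.
amounts <- c(25, 2.5, 1, 3)
codes   <- c("K", "m", "B", "")
costs   <- amounts * get_multiplier_vec(codes)
print(costs)
```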

Summarize Data

Create a summarized dataset of health impact data (fatalities + injuries). Sort the results in descending order by health impact.

# Create a summarized dataset of health impact data (fatalities + injuries).
health_impact_data <- aggregate(
    x = list(HEALTH_IMPACT = tidy_data$FATALITIES + tidy_data$INJURIES),
    by = list(EVENT_TYPE = tidy_data$EVTYPE),
    FUN = sum,
    na.rm = TRUE
)
health_impact_data <- health_impact_data[order(health_impact_data$HEALTH_IMPACT,
                                               decreasing = TRUE),]
row.names(health_impact_data) <- NULL

Create a summarized dataset of damage impact costs (property damage + crop damage). Sort the results in descending order by damage cost.

# Create a summarized dataset of damage impact cost (property damage + crop damage).
damage_costImpact_data <- aggregate(
    x = list(DAMAGE_IMPACT = tidy_data$PROP_COST + tidy_data$CROP_COST),
    by = list(EVENT_TYPE = tidy_data$EVTYPE),
    FUN = sum,
    na.rm = TRUE
)
damage_costImpact_data <- damage_costImpact_data[order(damage_costImpact_data$DAMAGE_IMPACT,
                                                       decreasing = TRUE),]
row.names(damage_costImpact_data) <- NULL
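
The sum-by-group-then-sort step used for both summaries can be checked on a small example. The sketch below uses tapply() and order() on a toy data frame (the values are invented for illustration) as a compact base R alternative to aggregate(); it is not the code that produced the results in this report.

```r
# Toy data, not taken from the dataset.
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
    FATALITIES = c(1, 0, 2, 4),
    INJURIES   = c(10, 3, 5, 0)
)

# Sum fatalities + injuries per event type, then rank in descending order.
totals <- tapply(toy$FATALITIES + toy$INJURIES, toy$EVTYPE, sum)
ranked <- totals[order(totals, decreasing = TRUE)]
print(ranked)
```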

Result

Event Types Most Harmful to Population Health

Fatalities and injuries are used here as the measure of harm to population health. The results below display the 10 most harmful weather events in terms of population health in the U.S.

# Event types most harmful to population health.
htmlTable(head(health_impact_data,10),
          caption = "Top 10 Weather Events Most Harmful to Population Health"
          )
Top 10 Weather Events Most Harmful to Population Health
    EVENT_TYPE   HEALTH_IMPACT
1   TORNADO      97075
2   HEAT         12392
3   FLOOD        10127
4   WIND         9893
5   LIGHTNING    6049
6   STORM        4780
7   COLD         3100
8   WINTER       1924
9   FIRE         1698
10  HAIL         1512

For a dark-themed plot, load the ggdark package, which provides dark_theme_grey():

library(ggdark)

Code for Plotting.

health_impact_chart <- ggplot(head(health_impact_data,10),
                              aes(x=reorder(EVENT_TYPE, HEALTH_IMPACT),
                                  y = HEALTH_IMPACT, fill = EVENT_TYPE)) + 
    coord_flip() + 
    geom_bar(stat = "identity") + 
    xlab("Event Type") + 
    ylab("Total Fatalities and Injuries") + 
    ggtitle("Top 10 Weather Events Most Harmful to \nPopulation Health") +
    dark_theme_grey() +
    # theme() comes after the complete theme so the title settings are not overridden
    theme(plot.title = element_text(size = 14, hjust = 0.5))
## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().
print(health_impact_chart)

Event Types with Greatest Economic Consequences

Property and crop damage are used here as the measure of economic impact. The results below display the 10 most harmful weather events in terms of economic consequences in the U.S., with costs in billions of USD.

htmlTable(head(damage_costImpact_data,10),
          caption = "Top 10 Weather Events with Greatest Economic Consequences"
          
          )
Top 10 Weather Events with Greatest Economic Consequences
    EVENT_TYPE          DAMAGE_IMPACT
1   FLOOD               180.58155793491
2   HURRICANE/TYPHOON   71.9137128
3   STORM               70.4499388701
4   TORNADO             57.4278510465
5   HAIL                20.7372044097
6   DROUGHT             15.018672
7   HURRICANE           14.61022901
8   COLD                12.69943621
9   WIND                12.005544868
10  FIRE                8.90491013

Code for plotting.

damage_costImpact_chart <- ggplot(
    head(damage_costImpact_data,10),
    aes(x = reorder(EVENT_TYPE, DAMAGE_IMPACT),
        y = DAMAGE_IMPACT, fill = EVENT_TYPE)) + 
    coord_flip() + 
    geom_bar(stat = 'identity') + 
    xlab("Event Type") + 
    ylab("Total Property and Crop Damage Cost \n(in Billions of USD)") + 
    ggtitle("Top 10 Weather Events with \nGreatest Economic Consequences") +
    dark_theme_grey() +
    # theme() comes after the complete theme so the title settings are not overridden
    theme(plot.title = element_text(size = 14, hjust = 0.7))
print(damage_costImpact_chart)

Conclusion

Based on the evidence demonstrated in this analysis and supported by the included data and graphs, the following conclusions can be drawn:

  • Which types of weather events are most harmful to population health? Tornadoes are responsible for the greatest number of fatalities and injuries (97,075 combined), far ahead of heat (12,392) and floods (10,127).

  • Which types of weather events have the greatest economic consequences? Floods are responsible for the highest combined property and crop damage costs (about 180.6 billion USD).