Synopsis:

This analysis uses the NOAA Storm Database to answer two questions about severe weather events in the United States: (1) which types of events are most harmful to population health, and (2) which types of events have the greatest economic consequences. It is intended for government and municipal managers who prepare for severe weather and must prioritize resources. The analysis starts from the raw data file, loads and cleans it in R, and then summarizes fatalities, injuries, and property damage by event type. Tornadoes account for the most fatalities (5,633) and injuries (91,346) and for the largest property damage total in the data. Excessive heat and flash floods follow for fatalities, and flash floods and thunderstorm winds follow for property damage.

Data Processing:

To keep the analysis reproducible, processing starts from the raw compressed CSV file. The steps below describe how the data were loaded into R and prepared for analysis:

1. Loading the Required Packages:

We begin by loading the necessary packages for our analysis. In this case, we will use the tidyverse package, which provides a suite of tools for data manipulation and visualization.

# Load the required packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

2. Loading the Raw Data:

Next, we download the compressed NOAA Storm Database file and read it into R. The file is saved as “repdata_data_StormData.csv.bz2” in the current working directory, so the code does not depend on a machine-specific path.

# Set the URL of the compressed data file
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Path of the compressed file, relative to the working directory
bz2_file <- "repdata_data_StormData.csv.bz2"

# Download the compressed file once; mode = "wb" keeps the binary file intact on Windows
if (!file.exists(bz2_file)) {
  download.file(url, destfile = bz2_file, mode = "wb")
}

# Read the data; read.csv() decompresses bzip2 files transparently,
# so no intermediate decompressed copy is needed
data <- read.csv(bz2_file, stringsAsFactors = FALSE)

3. Initial Data Exploration:

Once the data is loaded, we can perform some initial exploration to understand its structure and contents. We can use functions such as head(), summary(), and str() to get a glimpse of the data set.

# Display the first few rows of the data set
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
# Summarize the main characteristics of the data set
summary(data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
# Explore the structure of the data set
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
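
Before cleaning, it can also help to check how consistently the event labels and damage magnitude codes are recorded. The following is a minimal sketch on a small, hypothetical data frame (`storms` and its values are made up for illustration and are not taken from the database). It counts the labels in EVTYPE and the codes in PROPDMGEXP, and compares the number of distinct labels before and after upper-casing.

```r
library(dplyr)

# Hypothetical toy rows that mimic three columns of the storm data
storms <- data.frame(
  EVTYPE     = c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND", "TSTM WIND", "tornado"),
  PROPDMG    = c(25, 2.5, 10, 0, 1),
  PROPDMGEXP = c("K", "K", "M", "", "k"),
  stringsAsFactors = FALSE
)

# Frequency of each event label, most common first
evtype_counts <- storms %>% count(EVTYPE, sort = TRUE)

# Frequency of each damage magnitude code, including blanks
exp_counts <- storms %>% count(PROPDMGEXP, sort = TRUE)

# Number of distinct labels before and after normalizing case
n_raw   <- n_distinct(storms$EVTYPE)
n_upper <- n_distinct(toupper(storms$EVTYPE))
```

On the real data, the same calls would be applied to `data` instead of `storms`. A large gap between `n_raw` and `n_upper` would indicate that many labels differ only in letter case.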

4. Data Cleaning and Transformation:

The raw data set has 37 columns, most of which are not needed here. We keep only the event type (EVTYPE), the health variables (FATALITIES and INJURIES), and the property damage value (PROPDMG), and drop rows with missing values in any of these columns. Note that PROPDMG is kept as recorded: the magnitude code in PROPDMGEXP (for example "K" for thousands) and the crop damage columns are not used, so the property damage totals below are sums of the recorded values rather than dollar amounts.

# Clean and filter the data for relevant variables
cleaned_data <- data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG) %>%
  filter(!is.na(EVTYPE), !is.na(FATALITIES), !is.na(INJURIES), !is.na(PROPDMG))
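
To express property damage in dollars, the recorded value would need to be scaled by its magnitude code. The sketch below shows one way this could be done. It is not part of the analysis above, and it runs on a small, hypothetical data frame. The mapping of "K", "M", and "B" to thousands, millions, and billions follows the usual reading of these codes; treating every other code (blank, digits, symbols) as a multiplier of 1 is an assumption made only for this sketch, and `exp_multiplier()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical toy rows; the real data has the same three columns
storms <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.5, 300, 10),
  PROPDMGEXP = c("K", "M", "B", "k", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude code to multiplier; any other code
# (blank, digits, symbols) is treated as 1 in this sketch
exp_multiplier <- function(code) {
  code <- toupper(code)
  case_when(
    code == "K" ~ 1e3,
    code == "M" ~ 1e6,
    code == "B" ~ 1e9,
    TRUE        ~ 1
  )
}

# Scale each record, then total the scaled damage per event type
damage_usd <- storms %>%
  mutate(prop_damage_usd = PROPDMG * exp_multiplier(PROPDMGEXP)) %>%
  group_by(EVTYPE) %>%
  summarise(total_prop_damage_usd = sum(prop_damage_usd)) %>%
  arrange(desc(total_prop_damage_usd))
```

Applied to the real data, this would require adding PROPDMGEXP to the `select()` call, and the resulting ranking could differ from the one based on unscaled values.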

Results:

# Calculate the total number of fatalities and injuries for each event type
event_health <- cleaned_data %>%
  group_by(EVTYPE) %>%
  summarise(total_fatalities = sum(FATALITIES),
            total_injuries = sum(INJURIES)) %>%
  arrange(desc(total_fatalities))

# Identify the event types with the highest impact on population health
top_health_events <- event_health[1:5, ]

# Calculate the total property damage for each event type
event_economic <- cleaned_data %>%
  group_by(EVTYPE) %>%
  summarise(total_prop_damage = sum(PROPDMG)) %>%
  arrange(desc(total_prop_damage))

# Identify the event types with the greatest economic consequences
top_economic_events <- event_economic[1:5, ]

# Most Harmful Events with Respect to Population Health
cat("Most Harmful Events with Respect to Population Health:\n")
## Most Harmful Events with Respect to Population Health:
top_health_events
## # A tibble: 5 × 3
##   EVTYPE         total_fatalities total_injuries
##   <chr>                     <dbl>          <dbl>
## 1 TORNADO                    5633          91346
## 2 EXCESSIVE HEAT             1903           6525
## 3 FLASH FLOOD                 978           1777
## 4 HEAT                        937           2100
## 5 LIGHTNING                   816           5230
# Events with the Greatest Economic Consequences
cat("\nEvents with the Greatest Economic Consequences:\n")
## 
## Events with the Greatest Economic Consequences:
top_economic_events
## # A tibble: 5 × 2
##   EVTYPE            total_prop_damage
##   <chr>                         <dbl>
## 1 TORNADO                    3212258.
## 2 FLASH FLOOD                1420125.
## 3 TSTM WIND                  1335966.
## 4 FLOOD                       899938.
## 5 THUNDERSTORM WIND           876844.

Most Harmful Events with Respect to Population Health

Ranked by total fatalities, the five event types with the highest impact on population health are:

  1. TORNADO: Tornadoes were the most harmful event type, with 5,633 fatalities and 91,346 injuries.
  2. EXCESSIVE HEAT: Excessive heat events were responsible for 1,903 fatalities and 6,525 injuries.
  3. FLASH FLOOD: Flash floods caused 978 fatalities and 1,777 injuries.
  4. HEAT: Heat-related events resulted in 937 fatalities and 2,100 injuries.
  5. LIGHTNING: Lightning strikes were associated with 816 fatalities and 5,230 injuries.

These findings highlight the importance of preparedness and resource allocation for these specific events to mitigate their impact on public health.
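
The ranking above is ordered by fatalities only. As a minimal sketch of how the ordering could be compared under other measures, the example below ranks a hypothetical summary table by fatalities, by injuries, and by their sum. The event names and numbers are placeholders, not results from the database, and adding fatalities and injuries with equal weight is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical per-event totals in the same shape as event_health
event_health <- data.frame(
  EVTYPE           = c("EVENT A", "EVENT B", "EVENT C", "EVENT D"),
  total_fatalities = c(100, 50, 10, 5),
  total_injuries   = c(200, 900, 20, 300),
  stringsAsFactors = FALSE
)

# Top two event types by each single measure
top_by_fatalities <- event_health %>% slice_max(total_fatalities, n = 2)
top_by_injuries   <- event_health %>% slice_max(total_injuries, n = 2)

# Top two by fatalities and injuries combined (equal weighting is an assumption)
top_by_casualties <- event_health %>%
  mutate(total_casualties = total_fatalities + total_injuries) %>%
  slice_max(total_casualties, n = 2)
```

With the real `event_health` table and `n = 5`, comparing the three results would show whether the choice of measure changes which event types appear in the top five.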

Events with the Greatest Economic Consequences

Ranked by the sum of recorded property damage values (PROPDMG), the five event types with the greatest economic consequences are listed below. These totals are not dollar amounts, because the magnitude codes in PROPDMGEXP (such as "K" for thousands) were not applied.

  1. TORNADO: Tornadoes had the largest total, 3,212,258.2.
  2. FLASH FLOOD: Flash floods had a total of 1,420,124.6.
  3. TSTM WIND: Thunderstorm winds recorded under the label "TSTM WIND" had a total of 1,335,965.6.
  4. FLOOD: Flood events had a total of 899,938.5.
  5. THUNDERSTORM WIND: Thunderstorm winds recorded under the label "THUNDERSTORM WIND" had a total of 876,844.2.

"TSTM WIND" and "THUNDERSTORM WIND" are two labels for the same kind of event, so thunderstorm winds are split across two rows. Together these two labels total 2,212,809.8, which is more than the FLASH FLOOD total. These findings suggest that measures to mitigate the economic consequences of these events are important for disaster management and resource allocation.
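
Because the same kind of event can appear under more than one label, totals can be split across rows. The sketch below shows one possible way to merge such labels before grouping. It uses hypothetical rows, and `normalize_evtype()` is a hypothetical helper whose rules (trim whitespace, upper-case, expand "TSTM", drop the plural in "WINDS") are assumptions chosen for illustration, not a complete cleaning scheme.

```r
library(dplyr)

# Hypothetical toy rows with inconsistent labels for the same kind of event
storms <- data.frame(
  EVTYPE  = c("TSTM WIND", "THUNDERSTORM WIND", " Tstm Wind", "FLOOD", "THUNDERSTORM WINDS"),
  PROPDMG = c(10, 20, 5, 30, 1),
  stringsAsFactors = FALSE
)

# Assumed normalization rules for this sketch only
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                          # trim spaces, unify case
  x <- gsub("TSTM", "THUNDERSTORM", x, fixed = TRUE)  # expand the abbreviation
  x <- sub("WINDS$", "WIND", x)                    # singular form at end of label
  x
}

# Total the recorded damage per normalized label
merged_damage <- storms %>%
  mutate(EVTYPE = normalize_evtype(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(total_prop_damage = sum(PROPDMG)) %>%
  arrange(desc(total_prop_damage))
```

In this toy example the four thunderstorm wind rows collapse into a single label, which then ranks above FLOOD.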

Government and municipal managers who prepare for severe weather events should prioritize resources by considering both the population health impact and the economic consequences of each event type.

# Plotting Most Harmful Events
library(ggplot2)

# Combine the data for most harmful events
most_harmful <- rbind(
  data.frame(Event_Type = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
             Category = rep("Population Health", 5),
             Value = c(5633, 1903, 978, 937, 816)),
  data.frame(Event_Type = c("TORNADO", "FLASH FLOOD", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND"),
             Category = rep("Economic Consequences", 5),
             Value = c(3212258.2, 1420124.6, 1335965.6, 899938.5, 876844.2))
)

# Create the bar plot. The two categories are in different units
# (fatalities vs. summed PROPDMG values), so each gets its own panel and axes;
# on a shared axis the fatality bars would be too small to see.
plot <- ggplot(most_harmful, aes(x = Event_Type, y = Value, fill = Category)) +
  geom_col(width = 0.7) +
  facet_wrap(~ Category, scales = "free") +
  labs(x = "Event Type", y = "Value", title = "Most Harmful Events with Respect to Population Health and Economic Consequences") +
  # Named values tie each color to its category. Unnamed values and labels are
  # matched in alphabetical order of Category, which swapped the legend labels.
  scale_fill_manual(values = c("Population Health" = "#1f77b4", "Economic Consequences" = "#ff7f0e")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

# Save the plot as a file (placeholder path; the target directory must exist)
ggsave("path/to/your/figure.png", plot, width = 10, height = 6)

# Print the plot
plot

The figure above presents a visual representation of the most harmful events with respect to population health and events with the greatest economic consequences. This plot can provide a comprehensive overview to aid in decision-making and resource allocation for future severe weather event preparedness.
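
The plotting code types the values in by hand. As a minimal sketch of an alternative, the plot data could be built from the summary tables themselves, so the figure stays in step with the results. The two small tables below are hypothetical stand-ins for `top_health_events` and `top_economic_events`, with placeholder names and numbers.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-ins for the top_health_events and top_economic_events tables
top_health <- data.frame(EVTYPE = c("EVENT A", "EVENT B"),
                         total_fatalities = c(100, 50),
                         stringsAsFactors = FALSE)
top_econ   <- data.frame(EVTYPE = c("EVENT A", "EVENT C"),
                         total_prop_damage = c(3000, 1200),
                         stringsAsFactors = FALSE)

# Stack both tables into one long data frame with a shared Value column
plot_data <- bind_rows(
  transmute(top_health, Event_Type = EVTYPE,
            Category = "Population Health", Value = total_fatalities),
  transmute(top_econ, Event_Type = EVTYPE,
            Category = "Economic Consequences", Value = total_prop_damage)
)

# One panel per category, each with its own axes, because the units differ
p <- ggplot(plot_data, aes(x = Event_Type, y = Value)) +
  geom_col() +
  facet_wrap(~ Category, scales = "free") +
  labs(x = "Event Type", y = "Value")
```

Replacing the stand-in tables with the real summary tables would reproduce the figure without copying numbers by hand.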

Please note that specific recommendations are beyond the scope of this analysis, but the results provide valuable insights for prioritizing resources and taking appropriate actions to minimize the impact of severe weather events.