Research Overview

Advanced Weather Impact Assessment Project

Statistical Analysis of Meteorological Hazards

Executive Summary

Meteorological phenomena pose significant threats to human welfare and economic stability across American communities. Understanding the severity and frequency of weather-related casualties and financial losses is crucial for disaster preparedness and resource allocation strategies.

This comprehensive study examines historical weather incident data to determine which atmospheric events pose the greatest risks to civilian populations and which phenomena generate the most substantial economic disruption. The analysis utilizes meteorological records maintained by the National Oceanic and Atmospheric Administration (NOAA) spanning six decades of weather-related incidents.

The database encompasses atmospheric events recorded from 1950 to 2011, documenting casualty statistics including deaths and injuries, alongside monetary assessments of infrastructure and agricultural losses for each recorded incident.

Our methodology focuses on casualty analysis to identify the most dangerous weather patterns threatening public safety, while economic impact assessment reveals which phenomena demand the highest financial recovery investments.

Technical Environment Configuration

Import required analytical libraries for statistical processing.

if (!require(ggplot2)) {
    install.packages("ggplot2")
    library(ggplot2)
}
## Loading required package: ggplot2
if (!require(dplyr)) {
    install.packages("dplyr")
    library(dplyr, warn.conflicts = FALSE)
}
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
if (!require(gridExtra)) {
    install.packages("gridExtra")
    library(gridExtra, warn.conflicts = FALSE)
}
## Loading required package: gridExtra
## Warning: package 'gridExtra' was built under R version 4.0.5
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Data Acquisition and Import

weatherDataPath <- "~/Reproducible Research/week2/repdata_data_StormData1.csv"
if (!file.exists('analysis_data')) {
    dir.create('analysis_data')
}

meteorologicalRecords <- read.csv(weatherDataPath, sep = ",", header = TRUE, stringsAsFactors = FALSE)

Examine dataset structure and composition

colnames(meteorologicalRecords)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(meteorologicalRecords)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(meteorologicalRecords, 3)

Data Transformation and Preparation

Dataset Optimization

weatherIncidents <- subset(meteorologicalRecords, EVTYPE != "" & EVTYPE != "?"
                           &
                           (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0),
                           select = c("EVTYPE",
                                      "FATALITIES",
                                      "INJURIES", 
                                      "PROPDMG",
                                      "PROPDMGEXP",
                                      "CROPDMG",
                                      "CROPDMGEXP",
                                      "BGN_DATE",
                                      "END_DATE",
                                      "STATE"))

cat("Optimized dataset dimensions:", dim(weatherIncidents), "\n")
## Optimized dataset dimensions: 254632 10
cat("Missing values detected:", sum(is.na(weatherIncidents)), "\n")
## Missing values detected: 0

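The refined analytical dataset contains 254,632 weather incidents across 10 variables, with no NA values. Blank strings, such as the empty END_DATE entries visible in the structure listing above, are not counted as missing by is.na().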
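The distinction between NA values and blank strings can be illustrated with a small sketch. The toy data frame below is made up for illustration (its name and values are assumptions, not drawn from the NOAA file); it only mimics two of the character columns.

```r
# Toy data frame mimicking two character columns (values are made up)
toy <- data.frame(
  END_DATE   = c("", "4/18/1950 0:00:00", ""),
  CROPDMGEXP = c("", "K", "M"),
  stringsAsFactors = FALSE
)

na_count    <- sum(is.na(toy))   # empty strings are not NA
blank_count <- sum(toy == "")    # counts "" cells across all columns

cat("NA cells:", na_count, "\n")
cat("Blank cells:", blank_count, "\n")
```

A check along these lines may be worth running on the real data before treating a zero NA count as evidence that every field is populated.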

Weather Event Classification Standardization

Initial assessment reveals 487 distinct weather event classifications requiring normalization.

cat("Unique weather phenomena:", length(unique(weatherIncidents$EVTYPE)), "\n")
## Unique weather phenomena: 487

Data quality assessment identified inconsistent categorization patterns including variations in capitalization, spelling inconsistencies, and redundant terminology. Standardization protocols were implemented to consolidate similar phenomena under unified classifications.

weatherIncidents$EVTYPE <- trimws(toupper(weatherIncidents$EVTYPE))
# AVALANCHE INCIDENTS
weatherIncidents$EVTYPE <- gsub('.*AVALANC.*', 'AVALANCHE', weatherIncidents$EVTYPE)

# BLIZZARD CONDITIONS
weatherIncidents$EVTYPE <- gsub('.*BLIZZARD.*', 'BLIZZARD', weatherIncidents$EVTYPE)

# ATMOSPHERIC PRESSURE EVENTS
weatherIncidents$EVTYPE <- gsub('.*CLOUD.*', 'ATMOSPHERIC_PRESSURE', weatherIncidents$EVTYPE)

# EXTREME COLD EVENTS
weatherIncidents$EVTYPE <- gsub('.*COLD.*|.*FREEZ.*|.*FROST.*|.*ICE.*', 'EXTREME_COLD', weatherIncidents$EVTYPE)
weatherIncidents$EVTYPE <- gsub('.*LOW.*TEMP.*|.*HYPOTHERM.*', 'EXTREME_COLD', weatherIncidents$EVTYPE)

# DROUGHT CONDITIONS
weatherIncidents$EVTYPE <- gsub('.*DRY.*|.*DROUGHT.*', 'DROUGHT', weatherIncidents$EVTYPE)

# DUST PHENOMENA
weatherIncidents$EVTYPE <- gsub('.*DUST.*', 'DUST_STORM', weatherIncidents$EVTYPE)

# WILDFIRE INCIDENTS
weatherIncidents$EVTYPE <- gsub('.*FIRE.*', 'WILDFIRE', weatherIncidents$EVTYPE)

# FLOODING EVENTS
weatherIncidents$EVTYPE <- gsub('.*FLOOD.*|.*FLD.*', 'FLOODING', weatherIncidents$EVTYPE)

# VISIBILITY REDUCTION
weatherIncidents$EVTYPE <- gsub('.*FOG.*', 'LOW_VISIBILITY', weatherIncidents$EVTYPE)

# HAILSTORM EVENTS
weatherIncidents$EVTYPE <- gsub('.*HAIL.*', 'HAILSTORM', weatherIncidents$EVTYPE)

# EXTREME HEAT EVENTS
weatherIncidents$EVTYPE <- gsub('.*HEAT.*|.*WARM.*|.*HIGH.*TEMP.*', 'EXTREME_HEAT', weatherIncidents$EVTYPE)

# GEOLOGICAL EVENTS
weatherIncidents$EVTYPE <- gsub('.*LANDSLIDE.*|.*LANDSLIP.*', 'LANDSLIDE', weatherIncidents$EVTYPE)

# ELECTRICAL STORMS
weatherIncidents$EVTYPE <- gsub('.*LIGHTNING.*|.*LIGHTING.*|.*LIGNTNING.*', 'LIGHTNING_STRIKE', weatherIncidents$EVTYPE)

# WIND MICROBURSTS
weatherIncidents$EVTYPE <- gsub('.*MICROBURST.*|.*MICRO BURST.*', 'MICROBURST', weatherIncidents$EVTYPE)

# MUD FLOW EVENTS
weatherIncidents$EVTYPE <- gsub('.*MUDSLIDE.*|.*MUD.*SLIDE.*', 'MUDFLOW', weatherIncidents$EVTYPE)

# PRECIPITATION EVENTS
weatherIncidents$EVTYPE <- gsub('.*RAIN.*|.*PRECIP.*', 'HEAVY_PRECIPITATION', weatherIncidents$EVTYPE)

# MARINE HAZARDS
weatherIncidents$EVTYPE <- gsub('.*RIP.*CURRENT.*', 'MARINE_HAZARD', weatherIncidents$EVTYPE)
weatherIncidents$EVTYPE <- gsub('.*SURF.*|.*SURGE.*', 'MARINE_HAZARD', weatherIncidents$EVTYPE)

# SEVERE THUNDERSTORMS
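weatherIncidents$EVTYPE <- gsub('.*THUNDER.*|.*TSTM.*', 'THUNDERSTORM', weatherIncidents$EVTYPE)
# NOTE: this catch-all also matches the THUNDERSTORM, DUST_STORM and HAILSTORM
# labels assigned above, so those categories are folded into SEVERE_STORM.
weatherIncidents$EVTYPE <- gsub('.*STORM.*', 'SEVERE_STORM', weatherIncidents$EVTYPE)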

# TORNADO ACTIVITY
weatherIncidents$EVTYPE <- gsub('.*TORNADO.*|.*TORNDAO.*|.*WATERSPOUT.*|.*LANDSPOUT.*', 'TORNADO_ACTIVITY', weatherIncidents$EVTYPE)

# VOLCANIC ACTIVITY
weatherIncidents$EVTYPE <- gsub('.*VOLCAN.*', 'VOLCANIC_ACTIVITY', weatherIncidents$EVTYPE)

# EXCESSIVE MOISTURE
weatherIncidents$EVTYPE <- gsub('.*WET.*', 'EXCESSIVE_MOISTURE', weatherIncidents$EVTYPE)

# HIGH WIND EVENTS
weatherIncidents$EVTYPE <- gsub('.*WIND.*', 'HIGH_WINDS', weatherIncidents$EVTYPE)

# WINTER WEATHER SYSTEMS
weatherIncidents$EVTYPE <- gsub('.*WINTER.*|.*SNOW.*|.*WINTRY.*|.*SLEET.*', 'WINTER_WEATHER', weatherIncidents$EVTYPE)

# DATA QUALITY ENTRIES
weatherIncidents$EVTYPE <- gsub('.*SUMMARY.*|.*MONTHLY.*|.*RECORD.*', 'DATA_ENTRY', weatherIncidents$EVTYPE)

Post-standardization analysis reduced weather phenomena categories to 69 distinct classifications.

cat("Standardized weather categories:", length(unique(weatherIncidents$EVTYPE)), "\n")
## Standardized weather categories: 69
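Because each gsub call rewrites the output of the previous one, the order of the rules affects the final labels. The sketch below applies four of the patterns from the standardization step to a few hypothetical raw labels (the sample vector and the variable names are assumptions for illustration) and shows how the '.*STORM.*' catch-all re-matches labels that earlier rules had already assigned.

```r
# Hypothetical raw labels, for illustration only
raw_labels <- c("HAIL", "TSTM WIND", "DUST STORM", "WINTER STORM", "TORNADO")

labels <- raw_labels
labels <- gsub('.*DUST.*', 'DUST_STORM', labels)
labels <- gsub('.*HAIL.*', 'HAILSTORM', labels)
labels <- gsub('.*THUNDER.*|.*TSTM.*', 'THUNDERSTORM', labels)
intermediate <- labels   # state just before the catch-all rule

# The catch-all matches any label containing "STORM", including the
# standardized names produced by the three rules above
labels <- gsub('.*STORM.*', 'SEVERE_STORM', labels)

print(data.frame(raw = raw_labels,
                 before_catch_all = intermediate,
                 after_catch_all = labels))
```

If separate hail, dust, and thunderstorm categories are wanted, one option would be to run the catch-all before the specific rules or to restrict it to labels that have not yet been standardized. This is a suggestion only; the analysis as written merges them.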

Temporal Data Processing

Extract temporal components for potential chronological analysis applications.

weatherIncidents$INCIDENT_START <- as.Date(weatherIncidents$BGN_DATE, format = "%m/%d/%Y")
weatherIncidents$INCIDENT_END <- as.Date(weatherIncidents$END_DATE, format = "%m/%d/%Y")
weatherIncidents$INCIDENT_YEAR <- as.integer(format(weatherIncidents$INCIDENT_START, "%Y"))
weatherIncidents$EVENT_DURATION_HOURS <- as.numeric(weatherIncidents$INCIDENT_END - weatherIncidents$INCIDENT_START) * 24
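As a minimal sketch of how these conversions behave, the example below parses a few strings in the same "month/day/year 0:00:00" layout as BGN_DATE. The end dates and variable names are assumptions chosen for illustration. Two points are worth noting: the time portion after the date is ignored by the "%m/%d/%Y" format, and a blank END_DATE becomes NA, which propagates into the duration.

```r
# Sample values in the same layout as BGN_DATE / END_DATE (end dates are made up)
start_raw <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00")
end_raw   <- c("4/19/1950 0:00:00", "")

start_date <- as.Date(start_raw, format = "%m/%d/%Y")
end_date   <- as.Date(end_raw, format = "%m/%d/%Y")   # "" parses to NA

# Date subtraction yields whole days, so the result is always a multiple of 24
duration_hours <- as.numeric(end_date - start_date) * 24
incident_year  <- as.integer(format(start_date, "%Y"))

print(data.frame(start_date, end_date, duration_hours, incident_year))
```

Since the dates carry no time of day, the hour figure has day-level resolution only; combining BGN_TIME and END_TIME would be needed for finer durations.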

Financial Impact Quantification

Economic damage is recorded in two parts: a numeric value (PROPDMG for property, CROPDMG for crops) and a magnitude code (PROPDMGEXP, CROPDMGEXP). Each code must be converted to a power of ten and multiplied by the value to obtain the loss in dollars.

cat("Property damage magnitude codes:\n")
## Property damage magnitude codes:
table(toupper(weatherIncidents$PROPDMGEXP))
## 
##             -      +      0      2      3      4      5      6      7      B 
##  11585      1      5    210      1      1      4     18      3      3     40 
##      H      K      M 
##      7 231427  11327
cat("\nCrop damage magnitude codes:\n")
## 
## Crop damage magnitude codes:
table(toupper(weatherIncidents$CROPDMGEXP))
## 
##             ?      0      B      K      M 
## 152663      6     17      7  99953   1986

The function below converts each magnitude code to a numeric multiplier, which is then used to calculate the financial losses.

# magnitude conversion algorithm
calculateMagnitude <- function(exponent) {
    exponent <- toupper(trimws(as.character(exponent)))
    if (is.na(exponent) || exponent == "") return(10^0)
    if (exponent == "-") return(10^0)
    if (exponent == "?") return(10^0)
    if (exponent == "+") return(10^0)
    if (exponent == "0") return(10^0)
    if (exponent == "1") return(10^1)
    if (exponent == "2") return(10^2)
    if (exponent == "3") return(10^3)
    if (exponent == "4") return(10^4)
    if (exponent == "5") return(10^5)
    if (exponent == "6") return(10^6)
    if (exponent == "7") return(10^7)
    if (exponent == "8") return(10^8)
    if (exponent == "9") return(10^9)
    if (exponent == "H") return(10^2)
    if (exponent == "K") return(10^3)
    if (exponent == "M") return(10^6)
    if (exponent == "B") return(10^9)
    return(10^0)
}

# calculate financial losses (converted to billions USD)
weatherIncidents$PROPERTY_LOSS_BILLIONS <- with(weatherIncidents, 
    as.numeric(PROPDMG) * sapply(PROPDMGEXP, calculateMagnitude)) / 10^9

weatherIncidents$AGRICULTURAL_LOSS_BILLIONS <- with(weatherIncidents, 
    as.numeric(CROPDMG) * sapply(CROPDMGEXP, calculateMagnitude)) / 10^9
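The same mapping can also be expressed as a vectorized lookup, which avoids calling the function once per row. The sketch below is an alternative under the assumption that the intended mapping is exactly the one in calculateMagnitude (H, K, M, B, single digits as powers of ten, everything else as 1). The names magnitude_lookup and convert_magnitude are hypothetical and not part of the original analysis.

```r
# Hypothetical lookup table for the letter codes (same values as calculateMagnitude)
magnitude_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorized variant: returns one multiplier per input code
convert_magnitude <- function(codes) {
  codes <- toupper(trimws(as.character(codes)))
  multiplier <- rep(1, length(codes))            # default for "", "-", "?", "+", NA
  is_digit <- grepl("^[0-9]$", codes)            # a single digit d means 10^d
  multiplier[is_digit] <- 10^as.numeric(codes[is_digit])
  is_letter <- codes %in% names(magnitude_lookup)
  multiplier[is_letter] <- unname(magnitude_lookup[codes[is_letter]])
  multiplier
}

# Worked example: PROPDMG = 25 with PROPDMGEXP = "K"
loss_usd      <- 25 * convert_magnitude("K")     # 25 * 10^3 dollars
loss_billions <- loss_usd / 10^9
print(loss_usd)
print(loss_billions)
```

With a function of this shape, the property loss column could be computed as PROPDMG * convert_magnitude(PROPDMGEXP) / 10^9 without sapply.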

Statistical Aggregation

publicSafetyRisk <- aggregate(x = list(TOTAL_CASUALTIES = weatherIncidents$FATALITIES + weatherIncidents$INJURIES), 
                              by = list(WEATHER_PHENOMENON = weatherIncidents$EVTYPE), 
                              FUN = sum, na.rm = TRUE)
publicSafetyRisk <- publicSafetyRisk[order(publicSafetyRisk$TOTAL_CASUALTIES, decreasing = TRUE),]
economicImpactAssessment <- aggregate(x = list(TOTAL_FINANCIAL_LOSS = weatherIncidents$PROPERTY_LOSS_BILLIONS + weatherIncidents$AGRICULTURAL_LOSS_BILLIONS), 
                                      by = list(WEATHER_PHENOMENON = weatherIncidents$EVTYPE), 
                                      FUN = sum, na.rm = TRUE)
economicImpactAssessment <- economicImpactAssessment[order(economicImpactAssessment$TOTAL_FINANCIAL_LOSS, decreasing = TRUE),]
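To make the aggregation step concrete, the sketch below runs the same aggregate-then-order pattern on a toy data frame. The rows and the toy and toy_risk names are made up for illustration. Note that FATALITIES and INJURIES are added row by row before summing, so na.rm = TRUE would drop a row's whole total if either value were NA.

```r
# Toy incidents (values are made up) using standardized category names
toy <- data.frame(
  EVTYPE     = c("TORNADO_ACTIVITY", "FLOODING", "TORNADO_ACTIVITY", "FLOODING", "EXTREME_HEAT"),
  FATALITIES = c(1, 0, 2, 1, 3),
  INJURIES   = c(15, 2, 14, 0, 1),
  stringsAsFactors = FALSE
)

# Sum deaths plus injuries within each category
toy_risk <- aggregate(x = list(TOTAL_CASUALTIES = toy$FATALITIES + toy$INJURIES),
                      by = list(WEATHER_PHENOMENON = toy$EVTYPE),
                      FUN = sum, na.rm = TRUE)

# Rank categories from most to fewest casualties
toy_risk <- toy_risk[order(toy_risk$TOTAL_CASUALTIES, decreasing = TRUE), ]
print(toy_risk)
```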

Analytical Findings

Weather Phenomena Threatening Public Safety

safetyRiskChart <- ggplot(head(publicSafetyRisk, 10),
                         aes(x = reorder(WEATHER_PHENOMENON, TOTAL_CASUALTIES), 
                             y = TOTAL_CASUALTIES, 
                             fill = WEATHER_PHENOMENON)) +
                         coord_flip() +
                         geom_bar(stat = "identity", alpha = 0.8) + 
                         xlab("Weather Phenomenon") +
                         ylab("Combined Casualties (Deaths + Injuries)") +
                         theme_minimal() +
                         theme(plot.title = element_text(size = 16, hjust = 0.5),
                               legend.position = "none") +
                         ggtitle("Top 10 Weather Phenomena\nThreatening Public Safety") +
                         scale_fill_brewer(type = "qual", palette = "Set3")
print(safetyRiskChart)

Weather Phenomena Causing Economic Disruption

economicImpactChart <- ggplot(head(economicImpactAssessment, 10),
                             aes(x = reorder(WEATHER_PHENOMENON, TOTAL_FINANCIAL_LOSS), 
                                 y = TOTAL_FINANCIAL_LOSS, 
                                 fill = WEATHER_PHENOMENON)) +
                             coord_flip() +
                             geom_bar(stat = "identity", alpha = 0.8) + 
                             xlab("Weather Phenomenon") +
                             ylab("Combined Financial Losses\n(Billions USD)") +
                             theme_minimal() +
                             theme(plot.title = element_text(size = 16, hjust = 0.5),
                                   legend.position = "none") +
                             ggtitle("Top 10 Weather Phenomena\nCausing Economic Disruption") +
                             scale_fill_brewer(type = "qual", palette = "Spectral")
print(economicImpactChart)

Research Conclusions

The analysis of the NOAA records from 1950 to 2011 supports the following key findings:

  • Primary Threats to Public Safety:

    Tornado activity represents the predominant threat to civilian populations, accounting for the highest combined count of deaths and injuries across the analyzed timeframe.

  • Greatest Sources of Economic Disruption:

    Flooding events constitute the primary driver of financial losses, generating the most substantial combined property and agricultural damage costs.

These findings indicate that emergency preparedness resources should prioritize tornado warning systems and response protocols for public safety, while flood mitigation infrastructure represents the most critical investment for economic protection.