========================================================================================================

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This report presents an analysis that identifies the weather events most hazardous to population health and those with the greatest economic impact in the U.S., based on data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

The storm database covers weather events from 1950 through 2011 and contains, for each event, estimates of the number of fatalities and injuries as well as the cost of damage to property and crops.

The estimates for fatalities and injuries were used to determine weather events with the most harmful impact to population health. Property damage and crop damage cost estimates were used to determine weather events with the greatest economic consequences.

Data Processing

Loading and Preprocessing of Storm Data

library(ggplot2)
library(xtable)
library(knitr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
stormURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm <- "data/storm-data.csv.bz2"

if (!file.exists('data')) {
    dir.create('data')
}

if (!file.exists(storm)) {
    download.file(url = stormURL, destfile = storm)
}

stormData <- read.csv(storm, sep = ",", header = TRUE)

Analyzing the Storm Data

names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Subsetting the Storm Data

For a large data set such as the storm data, it is better to work with a subset that contains only the columns needed for the analysis. The subset includes the following columns:

Variable    Description
EVTYPE      Event type (Flood, Heat, Hurricane, Tornado, …)
FATALITIES  Number of fatalities resulting from event
INJURIES    Number of injuries resulting from event
PROPDMG     Property damage in USD (to be scaled by PROPDMGEXP)
PROPDMGEXP  Unit multiplier for property damage (K, M, or B)
CROPDMG     Crop damage in USD (to be scaled by CROPDMGEXP)
CROPDMGEXP  Unit multiplier for crop damage (K, M, or B)
BGN_DATE    Begin date of the event
END_DATE    End date of the event
STATE       State where the event occurred

stormTidy <- subset(stormData, EVTYPE != '?' & (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0), select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "BGN_DATE", "END_DATE", "STATE"))
summary(stormTidy)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:254632      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   2.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   5.00  
##                     Mean   :  0.0595   Mean   :   0.5519   Mean   :  42.75  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:  25.00  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP          BGN_DATE        
##  Length:254632      Min.   :  0.000   Length:254632      Length:254632     
##  Class :character   1st Qu.:  0.000   Class :character   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character   Mode  :character  
##                     Mean   :  5.411                                        
##                     3rd Qu.:  0.000                                        
##                     Max.   :990.000                                        
##    END_DATE            STATE          
##  Length:254632      Length:254632     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

The tidy storm data set contains 254,632 observations of 10 variables. The summary shows no NA values, although character fields such as END_DATE can still hold empty strings.
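
A quick way to check a claim like this is to count NA values and empty strings per column. The sketch below uses a small made-up data frame (`toy`, with invented values) rather than the real `stormTidy`, so the counts are illustrative only:

```r
# Toy stand-in for stormTidy; the values are made up for illustration.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "HAIL"),
  END_DATE = c("4/18/1950 0:00:00", "", ""),
  PROPDMG  = c(25, 2.5, NA),
  stringsAsFactors = FALSE
)

# Count NA values per column.
na_counts <- colSums(is.na(toy))

# Count empty strings per column (only meaningful for character columns).
empty_counts <- sapply(toy, function(col) {
  if (is.character(col)) sum(!is.na(col) & col == "") else 0L
})

print(na_counts)
print(empty_counts)
```

Running the same two checks on `stormTidy` would show whether any column relies on empty strings instead of NA to mark missing entries.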

Clean the Storm Data

  1. Clean Date Data
Format the date variables for optional reporting or further analysis. Create four new date variables based on the date variables in the tidy dataset:

Variable    Description
DATE_START  Begin date of the event stored as a date type
DATE_END    End date of the event stored as a date type
YEAR        Year the event started
DURATION    Duration (in hours) of the event

stormTidy$DATE_START <- as.Date(stormTidy$BGN_DATE, format = "%m/%d/%Y")
stormTidy$DATE_END <- as.Date(stormTidy$END_DATE, format = "%m/%d/%Y")
stormTidy$YEAR <- as.integer(format(stormTidy$DATE_START, "%Y"))
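# Subtracting Date values yields days, so request the difference in hours explicitly
stormTidy$DURATION <- as.numeric(difftime(stormTidy$DATE_END, stormTidy$DATE_START, units = "hours"))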
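
Subtracting two Date values in R gives a difference in days, so a duration in hours needs an explicit unit. The sketch below works through the date handling on a few illustrative strings in the same format as BGN_DATE and END_DATE. The variable names are hypothetical and the values are not taken from the analysis:

```r
# Illustrative inputs in the same "m/d/Y H:M:S" format as the raw columns.
bgn <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00")
end <- c("4/19/1950 0:00:00", "")

# Only the date part is parsed; the trailing time text is ignored.
date_start <- as.Date(bgn, format = "%m/%d/%Y")
date_end   <- as.Date(end, format = "%m/%d/%Y")  # "" becomes NA

# Year the event started.
year <- as.integer(format(date_start, "%Y"))

# Duration in hours; events with no end date yield NA.
duration_hours <- as.numeric(difftime(date_end, date_start, units = "hours"))

print(year)
print(duration_hours)
```

Because only the date part is parsed, durations computed this way are whole multiples of 24 hours, and an empty END_DATE gives a missing duration.
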
  2. Clean the Economic Data (Converting Exponent Columns)
According to the “National Weather Service Storm Data Documentation”, PROPDMGEXP and CROPDMGEXP are supposed to contain an alphabetical character that signifies magnitude: “K” for thousands, “M” for millions, and “B” for billions. The raw data also contains digits and symbols in these columns, so the function below maps each exponent code to a numeric multiplier (for example, K, M, and B become 1000, 1000000, and 1000000000).
multiplier <- function(exp) {
    exp <- toupper(exp);
    if (exp == "")  return (10^0);
    if (exp == "-") return (10^0);
    if (exp == "?") return (10^0);
    if (exp == "+") return (10^0);
    if (exp == "0") return (10^0);
    if (exp == "1") return (10^1);
    if (exp == "2") return (10^2);
    if (exp == "3") return (10^3);
    if (exp == "4") return (10^4);
    if (exp == "5") return (10^5);
    if (exp == "6") return (10^6);
    if (exp == "7") return (10^7);
    if (exp == "8") return (10^8);
    if (exp == "9") return (10^9);
    if (exp == "H") return (10^2);
    if (exp == "K") return (10^3);
    if (exp == "M") return (10^6);
    if (exp == "B") return (10^9);
    return (NA);
}
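
The same mapping can also be written as a named lookup vector, which handles a whole column in one call instead of one value at a time. This is only a sketch of an alternative. The names `exp_lookup` and `multiplier_vec` are hypothetical and are not used elsewhere in this report:

```r
# Named lookup table mirroring the cases handled by multiplier().
exp_lookup <- c(
  setNames(10^(0:9), as.character(0:9)),  # digits "0".."9" -> 10^0..10^9
  "-" = 1, "?" = 1, "+" = 1,              # symbols mean "no scaling"
  H = 1e2, K = 1e3, M = 1e6, B = 1e9      # hundreds, thousands, millions, billions
)

multiplier_vec <- function(exp) {
  exp <- toupper(exp)
  out <- unname(exp_lookup[exp])          # unknown codes become NA
  # An empty name never matches in a lookup, so handle "" explicitly.
  out[!is.na(exp) & exp == ""] <- 1
  out
}

# Example: 25 with exponent "K" expressed in billions of USD.
print(25 * multiplier_vec("K") / 10^9)
print(multiplier_vec(c("K", "m", "", "5", "x")))
```

With such a helper, the `sapply()` calls could be replaced by a direct call on the exponent column, which tends to be faster on several hundred thousand rows.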

Create New Columns for Property Cost and Crop Cost: PR_COST, CR_COST

# Compute the property damage and crop damage costs (in billions) using sapply
stormTidy$PR_COST <- with(stormTidy, as.numeric(PROPDMG) * sapply(PROPDMGEXP, multiplier))/10^9
stormTidy$CR_COST <- with(stormTidy, as.numeric(CROPDMG) * sapply(CROPDMGEXP, multiplier))/10^9

Data Analysis

The analysis addresses two questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences.

Aggregate the Storm Data

The raw data has been processed and tidied. The remaining step is to create summarized datasets for the desired outputs.
  • Health Impact Summarized Data

healthImpactData <- aggregate(x = list(HEALTH_IMPACT = stormTidy$FATALITIES + stormTidy$INJURIES), by = list(EVENT_TYPE = stormTidy$EVTYPE),FUN = sum, na.rm = TRUE)

healthImpactData <- healthImpactData[order(healthImpactData$HEALTH_IMPACT, decreasing = TRUE),]
  • Damage Cost Summarized Data
damageCostImpactData <- aggregate(x = list(DAMAGE_IMPACT = stormTidy$PR_COST + stormTidy$CR_COST), by = list(EVENT_TYPE = stormTidy$EVTYPE), FUN = sum,na.rm = TRUE)
damageCostImpactData <- damageCostImpactData[order(damageCostImpactData$DAMAGE_IMPACT, decreasing = TRUE),]
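
To make the aggregation step concrete, the sketch below applies the same `aggregate()` and `order()` pattern to a tiny made-up data frame (`toy`). The numbers are invented so that the sums are easy to verify by hand:

```r
# Made-up events; not taken from the storm database.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 1, 0),
  INJURIES   = c(15, 2, 6, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities + injuries within each event type.
toy_health <- aggregate(
  x   = list(HEALTH_IMPACT = toy$FATALITIES + toy$INJURIES),
  by  = list(EVENT_TYPE = toy$EVTYPE),
  FUN = sum, na.rm = TRUE
)

# Sort so the most harmful event type comes first.
toy_health <- toy_health[order(toy_health$HEALTH_IMPACT, decreasing = TRUE), ]
print(toy_health)
```

Sorting keeps the row names that `aggregate()` assigned, which follow the alphabetical order of the groups. This is why the first column of the printed result tables is not consecutive.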

Results

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

library(knitr)
data <- head(healthImpactData, 10)
caption <- "Top 10 US Weather Events that are Most Harmful to Population Health"
kable(data, format = "html", caption = caption, table.attr = 'class="table-bordered"')
Top 10 US Weather Events that are Most Harmful to Population Health
     EVENT_TYPE         HEALTH_IMPACT
406  TORNADO                    96979
60   EXCESSIVE HEAT              8428
422  TSTM WIND                   7461
85   FLOOD                       7259
257  LIGHTNING                   6046
150  HEAT                        3037
72   FLASH FLOOD                 2755
237  ICE STORM                   2064
363  THUNDERSTORM WIND           1621
480  WINTER STORM                1527

HIGraph <- ggplot(head(healthImpactData, 10),
                  aes(x = reorder(EVENT_TYPE, HEALTH_IMPACT), y = HEALTH_IMPACT, fill = EVENT_TYPE)) +
  geom_bar(stat = "identity") +
  xlab("Event Type") +
  ylab("Total Health Impacts [Fatalities + Injuries]") +
  theme(plot.title = element_text(size = 14, hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Weather Events Most Harmful to\nPopulation Health")
print(HIGraph)

2. Across the United States, which types of events have the greatest economic consequences?

data2 <- head(damageCostImpactData, 10)
caption <- "Top 10 Weather Events with Greatest Economic Consequences"
kable(data2, format = "html", caption = caption, table.attr = 'class="table-bordered"')
Top 10 Weather Events with Greatest Economic Consequences
     EVENT_TYPE         DAMAGE_IMPACT
85   FLOOD                 150.319678
223  HURRICANE/TYPHOON      71.913713
406  TORNADO                57.362334
349  STORM SURGE            43.323541
133  HAIL                   18.761222
72   FLASH FLOOD            18.243991
48   DROUGHT                15.018672
214  HURRICANE              14.610229
309  RIVER FLOOD            10.148404
237  ICE STORM               8.967041

DCIGraph <- ggplot(head(damageCostImpactData, 10),
                   aes(x = reorder(EVENT_TYPE, DAMAGE_IMPACT), y = DAMAGE_IMPACT, fill = EVENT_TYPE)) +
  geom_bar(stat = "identity") + 
  xlab("Event Type") +
  ylab("Total Property / Crop Damage Cost\n(in Billions)") +
  theme(plot.title = element_text(size = 14, hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +  # Set angle to 90 degrees
  ggtitle("Top 10 Weather Events with\nGreatest Economic Consequences")
print(DCIGraph)

Inferences

Based on the outputs of the analysis above, the following inferences can be drawn:

a. Which types of weather events are most harmful to population health?

Tornadoes cause by far the greatest number of fatalities and injuries: 96,979 combined, more than ten times the total for the next event type, excessive heat (8,428).

b. Which types of weather events have the greatest economic consequences?

Floods have the greatest economic consequences, with about 150.3 billion USD in combined property and crop damage, roughly twice the total for hurricanes/typhoons (71.9 billion USD).