Title

Severe Weather Events and Their Impact on Population Health and the Economy of the United States: Analysis Project


Background

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.


Data

Label           Source
Data source     Storm Data
Documentation   National Weather Service Storm Data Documentation
FAQ             National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
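The uneven coverage across years can be checked by counting events per year. Below is a minimal sketch, assuming the begin dates are strings in month/day/year format followed by a time; the toy vector bgnDate and the helper eventsPerYear are hypothetical names used only for illustration and are not part of the analysis that follows.

```r
# Toy stand-in for StormData$BGN_DATE (hypothetical values, assumed format)
bgnDate <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
             "11/15/1951 0:00:00", "1/22/2011 0:00:00")

# Hypothetical helper: count recorded events per calendar year
eventsPerYear <- function(dates) {
  # as.Date() parses the leading month/day/year part; trailing characters are ignored
  years <- format(as.Date(as.character(dates), format = "%m/%d/%Y"), "%Y")
  table(years)
}

eventsPerYear(bgnDate)
```

Applied to the full data set, the same count would show how sparse the early years are compared with recent ones.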

Disclaimer: all rights are reserved to the respective owner of the data.


Synopsis

In this project, the main objective is to identify the weather events that have had the greatest impact on population health and on the economy of the United States.

First, the target data set, Storm Data, is downloaded and explored to identify the fields that can be used for the analysis. Data transformations such as dropping fields and calculating health impact and economic loss are then carried out, and the results are saved as new data sets.

Before plotting, the final data set is sorted by population health impact and by economic impact. The graphs are then plotted, and a table is provided alongside each graph as supporting evidence.


Data Processing

0. Import Library

Please install the following packages before proceeding (using “install.packages()”).

library(dplyr)
library(data.table)
library(ggplot2)

1. Download Data

Name the zipfile as StormData.csv.bz2

zipFile <- "StormData.csv.bz2"

Validate the presence of the file, and only download it when StormData.csv.bz2 is absent in the directory

if(!file.exists(zipFile)){
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  zipFile,mode="wb")
}

Load Data into R and save it as StormData

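# stringsAsFactors=TRUE reproduces the Factor columns shown below (R >= 4.0 defaults to FALSE)
StormData <- read.csv(file=zipFile, header=TRUE, stringsAsFactors=TRUE)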

2. Explore Data

2.1 Explore Data Structure

Explore StormData Structure.

str(StormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Our objective is to analyze the impact of weather events on population health and on the economy. The fields below are therefore assumed to be the relevant ones and will be emphasized in the analysis:

  • EVTYPE , Type of weather event
  • FATALITIES , Fatalities caused by weather [Population Health]
  • INJURIES , Injuries caused by weather [Population Health]
  • PROPDMG , Property damage loss in dollars, before applying the magnitude [Economic Consequence]
  • PROPDMGEXP , Property damage loss magnitude [Economic Consequence]
  • CROPDMG , Crop damage loss in dollars, before applying the magnitude [Economic Consequence]
  • CROPDMGEXP , Crop damage loss magnitude [Economic Consequence]

2.2 Subset Data

Subset StormData to the emphasized columns and save it as StormDataEmp.

StormDataEmp <- subset(StormData,select = c(EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP))

2.3 Explore Subset Data

Explore the existence of ‘NA’ in all StormDataEmp columns.

lapply(StormDataEmp, function(x) sum(is.na(x)))
## $EVTYPE
## [1] 0
## 
## $FATALITIES
## [1] 0
## 
## $INJURIES
## [1] 0
## 
## $PROPDMG
## [1] 0
## 
## $PROPDMGEXP
## [1] 0
## 
## $CROPDMG
## [1] 0
## 
## $CROPDMGEXP
## [1] 0

From the output above, we can see that no ‘NA’ is present in StormDataEmp (‘Emp’ stands for Emphasized). Hence, no action to omit ‘NA’ values is required for the following analysis.
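The same check can be written more compactly. Below is a minimal sketch on a toy data frame; the data frame toy and the variables naCounts and anyMissing are hypothetical and only illustrate the idea, they are not used elsewhere in this report.

```r
# Toy data frame with one missing value (hypothetical, for illustration only)
toy <- data.frame(FATALITIES = c(0, 1, NA), INJURIES = c(15, 0, 2))

# One named count of NA values per column
naCounts <- colSums(is.na(toy))
naCounts

# Skipping na.omit() is only safe when every count is zero
anyMissing <- any(naCounts > 0)
anyMissing
```

For StormDataEmp every count is zero, so anyMissing would be FALSE and no rows need to be dropped.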

3. Transform Data

3.1 Transform Prop Damage and Crop Damage

According to the research in How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP, we can assume that the exponents in PROPDMGEXP and CROPDMGEXP are represented as in the table below:

Exponent                Description          Value
H, h                    hundreds             100
K, k                    kilos or thousands   1,000
M, m                    millions             1,000,000
B, b                    billions             1,000,000,000
(+)                     -                    1
(-)                     -                    0
(?)                     -                    0
blank/empty character   -                    0
numeric 0..8            -                    10

Hence, we will prepare a function named getExactLoss that calculates the exact loss value:

getExactLoss <- function(val, exp){
    exp <- getFactorUppercase(exp)
    loss <- productValExp(val,exp)
    return(loss)
}
getFactorUppercase <- function(f){
    # as.character() returns the labels for factors and also works on character vectors
    return(toupper(as.character(f)))
}
productValExp <- function(val,exp){
    product <- 
      mapply(function(x,y){
        if(y=="H"){
          x*100
        }
        else if(y=="K"){
          x*1000
        }
        else if(y=="M"){
          x*1000000
        }
        else if(y=="B"){
          x*1000000000
        }
        else if(y=="+"){
          x *1
        }
        else if(y %in% c("-","?","")){
          0
        }
        else if(y %in% c(0:8)){
          x*10
        }
        else{
          0
        }
      },val,exp)
    return(product)
}
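The if/else chain above can also be expressed as a vectorized lookup, which avoids calling a function once per row. Below is a minimal sketch, assuming the multipliers from the exponent table in this section; expMultiplier and lossFromExp are hypothetical names and are not the functions used in the rest of this report.

```r
# Multipliers assumed from the exponent table; digits 0..8 all map to 10
expMultiplier <- c(H = 100, K = 1e3, M = 1e6, B = 1e9, "+" = 1,
                   "-" = 0, "?" = 0,
                   setNames(rep(10, 9), as.character(0:8)))

# Hypothetical helper: damage value times the multiplier for its exponent code
lossFromExp <- function(val, exp) {
  code <- toupper(as.character(exp))   # accepts factor or character input
  mult <- unname(expMultiplier[code])  # NA for blank or unknown codes
  mult[is.na(mult)] <- 0               # blank or unknown codes count as 0
  val * mult
}

lossFromExp(c(25, 2.5, 3, 7, 4), c("K", "m", "", "5", "+"))
```

For example, a value of 25 with exponent "K" becomes 25,000 dollars, while a value with a blank exponent contributes 0, as in the table.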

3.2 Transform Table

Calculate the exact economicLoss with the getExactLoss function created above, and healthImpact as the sum of FATALITIES and INJURIES.

Then reduce the final table size by selecting only the EVTYPE, economicLoss and healthImpact columns. After that, rename EVTYPE to eventType and save the table as StormDataEmpP. (‘P’ stands for Processed)

propLoss <- getExactLoss(StormDataEmp$PROPDMG,StormDataEmp$PROPDMGEXP)
cropLoss <- getExactLoss(StormDataEmp$CROPDMG,StormDataEmp$CROPDMGEXP)

StormDataEmpP <- 
  StormDataEmp %>%
  mutate(healthImpact=(FATALITIES+INJURIES),
    economicLoss=(propLoss)+(cropLoss)) %>%
  select(eventType=EVTYPE, economicLoss, healthImpact)

4. Aggregate Data

Since both objectives are to determine the most harmful weather event type, we can aggregate StormDataEmpP by grouping on column eventType, and sum on healthImpact and economicLoss accordingly, then save the table as StormDataEmpF. (F stands for final)

StormDataEmpF <- as.data.table(StormDataEmpP)
StormDataEmpF <- StormDataEmpF[,.(totalEconomicLoss=sum(economicLoss),
                                  totalHealthImpact=sum(healthImpact)),by=eventType]
StormDataEmpF <- as.data.frame(StormDataEmpF)
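To make the grouping step concrete, below is a minimal sketch of the same group-and-sum idea on a toy table, using base R's aggregate() in place of data.table. The data frame toy and its values are hypothetical; only the column names follow StormDataEmpP.

```r
# Toy stand-in for StormDataEmpP (hypothetical values)
toy <- data.frame(eventType    = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
                  economicLoss = c(25000, 1e6, 2500, 5e5, 0),
                  healthImpact = c(15, 0, 2, 3, 0))

# Sum both measures within each event type
totals <- aggregate(toy[, c("economicLoss", "healthImpact")],
                    by = list(eventType = toy$eventType), FUN = sum)
names(totals) <- c("eventType", "totalEconomicLoss", "totalHealthImpact")
totals
```

Each event type ends up on one row, with its losses and affected individuals summed across all of its records.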

Result

1. Most Harmful Weather Event on Population Health

1.1 Sort Final Data

Sort StormDataEmpF by totalHealthImpact in descending order and save it as HealthImpact.

HealthImpact <- StormDataEmpF[order(-StormDataEmpF$totalHealthImpact),]
HealthImpact$eventType <- factor(HealthImpact$eventType, 
                                 levels = HealthImpact$eventType)
totalrow <- nrow(HealthImpact)

HealthImpact has a total of 985 rows, one per event type.

To keep the plot readable, only the top 10 weather events are taken into account.

1.2 Health Impact Plot

ggplot(head(HealthImpact,10),
       aes(x = eventType, y = totalHealthImpact, fill = eventType)) + 
  ggtitle("Top 10 Harmful Event on Population Health across United States") +
  xlab("Weather Event") + ylab("Total Individuals") +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90), 
        plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Set3",
                     name="Weather Event")

The bar chart above shows the 10 most harmful weather events by total individuals affected (fatalities plus injuries), in descending order.

From the chart, we can conclude that tornadoes have the most harmful impact on population health relative to other weather events.

1.3 Health Impact Table

head(HealthImpact[,-2],10)
##            eventType totalHealthImpact
## 1            TORNADO             96979
## 99    EXCESSIVE HEAT              8428
## 2          TSTM WIND              7461
## 36             FLOOD              7259
## 15         LIGHTNING              6046
## 27              HEAT              3037
## 20       FLASH FLOOD              2755
## 65         ICE STORM              2064
## 16 THUNDERSTORM WIND              1621
## 8       WINTER STORM              1527

The table above shows the exact values for each weather event in the bar chart.

2. Most Harmful Weather Event on Economic Consequences

2.1 Sort Final Data

Sort StormDataEmpF by totalEconomicLoss in descending order and save it as EconomicImpact.

EconomicImpact <- StormDataEmpF[order(-StormDataEmpF$totalEconomicLoss),]
EconomicImpact$eventType <- factor(EconomicImpact$eventType, 
                                 levels = EconomicImpact$eventType)
totalrow <- nrow(EconomicImpact)

EconomicImpact has a total of 985 rows, one per event type.

To keep the plot readable, only the top 10 weather events are taken into account.

2.2 Economic Impact Plot

ggplot(head(EconomicImpact,10),
       aes(x = eventType, y = totalEconomicLoss, fill = eventType)) + 
  ggtitle("Top 10 Harmful Event on Economic Consequences across United States") +
  xlab("Weather Event") + ylab("Total Loss (Dollar)") +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90), 
        plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Set3",
                     name="Weather Event")

The bar chart above shows the 10 most harmful weather events by total economic loss in dollars, in descending order.

From the chart, we can conclude that floods have the most harmful economic impact relative to other weather events.

2.3 Economic Impact Table

head(EconomicImpact[,-3],10)
##             eventType totalEconomicLoss
## 36              FLOOD      150319678250
## 973 HURRICANE/TYPHOON       71913712800
## 1             TORNADO       57352117607
## 204       STORM SURGE       43323541000
## 3                HAIL       18758224527
## 20        FLASH FLOOD       17562132111
## 194           DROUGHT       15018672000
## 226         HURRICANE       14610229010
## 52        RIVER FLOOD       10148404500
## 65          ICE STORM        8967041810

The table above shows the exact values for each weather event in the bar chart.