Exploring the NOAA Storm Database and Analyzing Population Health and Economic Consequences Brought by Severe Weather in the US

Synopsis

Storms and other severe weather events can cause both public health and economic problems.

Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

By exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and analyzing the population health and economic consequences of severe weather, we found that:

  • For population health, Tornado is the most harmful event type, whether measured by fatalities or by injuries. (Flood ranks 3rd by injuries; Flash Flood ranks 3rd by fatalities.)
  • For economic consequences, Flood is the most harmful event type, measured by the sum of crop and property damages. (Tornado ranks 3rd.)
  • Tornado and Flood are among the most harmful event types with respect to both population health and economic consequences.

Data Processing

1. Download the file and put it in the data folder

if(!file.exists("./data")){dir.create("./data")}
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="./data/Dataset.csv.bz2",method="curl")
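
Because the compressed file is large, it may be worth skipping the download when the file is already present. The following is a minimal sketch, not part of the original analysis: the helper name `download_if_missing` is hypothetical, and its `downloader` argument exists only so the function can be exercised without network access.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
# `downloader` defaults to download.file; it is a parameter purely for testing.
download_if_missing <- function(url, dest, downloader = download.file) {
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (file.exists(dest)) {
    return(invisible(FALSE))  # file already present, nothing downloaded
  }
  downloader(url, destfile = dest, mode = "wb")  # "wb" keeps the archive binary-safe
  invisible(TRUE)             # a download was attempted
}

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Uncomment to fetch the real file:
# download_if_missing(fileUrl, "./data/Dataset.csv.bz2")
```

The call is left commented out so that the sketch can be sourced without touching the network.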

2. Load the data

library(data.table)
Data <- read.csv(bzfile("./data/Dataset.csv.bz2"), stringsAsFactors=FALSE)
Data <- data.table(Data)

3. Clean data

Rename the variables to lowercase for ease of coding.

oldnames<-names(Data)
newnames<-tolower(names(Data))
setnames(Data,oldnames,newnames)
str(Data)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ state__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ bgn_date  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ bgn_time  : chr  "0130" "0145" "1600" "0900" ...
##  $ time_zone : chr  "CST" "CST" "CST" "CST" ...
##  $ county    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ countyname: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ state     : chr  "AL" "AL" "AL" "AL" ...
##  $ evtype    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ bgn_range : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ bgn_azi   : chr  "" "" "" "" ...
##  $ bgn_locati: chr  "" "" "" "" ...
##  $ end_date  : chr  "" "" "" "" ...
##  $ end_time  : chr  "" "" "" "" ...
##  $ county_end: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ countyendn: logi  NA NA NA NA NA NA ...
##  $ end_range : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ end_azi   : chr  "" "" "" "" ...
##  $ end_locati: chr  "" "" "" "" ...
##  $ length    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ width     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ f         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ mag       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ propdmgexp: chr  "K" "K" "K" "K" ...
##  $ cropdmg   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cropdmgexp: chr  "" "" "" "" ...
##  $ wfo       : chr  "" "" "" "" ...
##  $ stateoffic: chr  "" "" "" "" ...
##  $ zonenames : chr  "" "" "" "" ...
##  $ latitude  : num  3040 3042 3340 3458 3412 ...
##  $ longitude : num  8812 8755 8742 8626 8642 ...
##  $ latitude_e: num  3051 0 0 0 0 ...
##  $ longitude_: num  8806 0 0 0 0 ...
##  $ remarks   : chr  "" "" "" "" ...
##  $ refnum    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Find related variables

According to the National Weather Service Storm Data Documentation, the variables related to population health and economic consequences are as follows:

Variable name   Description
evtype          Event type
fatalities      Number of deaths: related to population health
injuries        Number of injuries: related to population health
propdmg         Property damage value, to be scaled by propdmgexp: related to economic consequences
propdmgexp      Magnitude code for property damage (“B”, “M”, “K”, “H”): related to economic consequences
cropdmg         Crop damage value, to be scaled by cropdmgexp: related to economic consequences
cropdmgexp      Magnitude code for crop damage (“B”, “M”, “K”): related to economic consequences

Note: “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions.

Scale the property damage variable propdmg

  • If propdmgexp = B, then multiply propdmg by 1,000,000,000
  • Else if propdmgexp = M, then multiply propdmg by 1,000,000
  • Else if propdmgexp = K, then multiply propdmg by 1,000
  • Else if propdmgexp = H, then multiply propdmg by 100
  • Else propdmg = NA
Data <- Data[, propdmgexp := toupper(propdmgexp)]
Data[, .N, propdmgexp]
##     propdmgexp      N
##  1:          K 424665
##  2:          M  11337
##  3:            465934
##  4:          B     40
##  5:          +      5
##  6:          0    216
##  7:          5     28
##  8:          6      4
##  9:          ?      8
## 10:          4      4
## 11:          2     13
## 12:          3      4
## 13:          H      7
## 14:          7      5
## 15:          -      1
## 16:          1     25
## 17:          8      1
Data<- Data[, propdmg := ifelse(propdmgexp == "B", propdmg * 1E9,
                            ifelse(propdmgexp == "M", propdmg * 1E6, 
                               ifelse(propdmgexp == "K", propdmg * 1E3, 
                                      ifelse(propdmgexp == "H", propdmg * 1E2, NA))))]
summary(Data$propdmg)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## 0.00e+00 0.00e+00 1.00e+03 9.80e+05 1.00e+04 1.15e+11   466248
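
The nested `ifelse()` above could also be written as a lookup in a named vector, which keeps the code-to-multiplier mapping in one place. This is only a sketch of an alternative: the names `multipliers` and `scale_damage` are hypothetical and do not appear in the original analysis.

```r
# Multipliers for the documented exponent codes (H, K, M, B).
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale `value` by the multiplier for `exp_code`.
# Codes without a documented meaning (blank, digits, "+", "-", "?") yield NA,
# which mirrors the final `else` branch of the nested ifelse().
scale_damage <- function(value, exp_code) {
  value * unname(multipliers[toupper(exp_code)])
}

# Toy input: three documented codes, one blank, one digit.
scale_damage(c(25, 2.5, 1, 3, 10), c("K", "m", "B", "", "5"))
```

Indexing a named vector with a name it does not contain (including the empty string) returns NA, so no explicit fallback branch is needed.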

Scale the crop damage variable cropdmg

  • If cropdmgexp = B, then multiply cropdmg by 1,000,000,000
  • Else if cropdmgexp = M, then multiply cropdmg by 1,000,000
  • Else if cropdmgexp = K, then multiply cropdmg by 1,000
  • Else cropdmg = NA
Data <- Data[, cropdmgexp := toupper(cropdmgexp)]
Data[, .N, cropdmgexp]
##    cropdmgexp      N
## 1:            618413
## 2:          M   1995
## 3:          K 281853
## 4:          B      9
## 5:          ?      7
## 6:          0     19
## 7:          2      1
Data <- Data[, cropdmg := ifelse(cropdmgexp == "B", cropdmg * 1E9, 
                             ifelse(cropdmgexp == "M", cropdmg * 1E6, 
                                 ifelse(cropdmgexp == "K", cropdmg * 1E3, NA)))]
summary(Data$cropdmg)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## 0.00e+00 0.00e+00 0.00e+00 1.73e+05 0.00e+00 5.00e+09   618440
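
As a consistency check, the NA counts reported by `summary()` can be reproduced from the frequency tables: every record whose exponent code is not rescaled becomes NA. The sketch below simply re-enters the counts printed above; the vector names are chosen here for illustration.

```r
# Counts copied from the two frequency tables above ("blank" stands for the empty code).
prop_counts <- c(K = 424665, M = 11337, blank = 465934, B = 40, "+" = 5,
                 "0" = 216, "5" = 28, "6" = 4, "?" = 8, "4" = 4, "2" = 13,
                 "3" = 4, H = 7, "7" = 5, "-" = 1, "1" = 25, "8" = 1)
crop_counts <- c(blank = 618413, M = 1995, K = 281853, B = 9,
                 "?" = 7, "0" = 19, "2" = 1)

# Records whose code is not one of the rescaled codes end up as NA.
prop_na <- sum(prop_counts[!names(prop_counts) %in% c("H", "K", "M", "B")])
crop_na <- sum(crop_counts[!names(crop_counts) %in% c("K", "M", "B")])

prop_na           # 466248, the NA count in summary(Data$propdmg)
crop_na           # 618440, the NA count in summary(Data$cropdmg)
sum(prop_counts)  # 902297, the number of observations in the dataset
```

Most of the NA values come from records with a blank exponent code (465,934 for property damage and 618,413 for crop damage).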

Subset storm dataset

tidyData<-subset(Data,select = c('evtype','fatalities','injuries', 'propdmg', 
                                 'propdmgexp', 'cropdmg', 'cropdmgexp'))

Rename variables with descriptive names

oldnames<-names(tidyData)
newnames<-c('eventType','fatalities','injuries', 'propertyDamage', 'propertyDamageLevel', 'cropDamage', 'cropDamageLevel')
setnames(tidyData,oldnames,newnames)
str(tidyData)
## Classes 'data.table' and 'data.frame':   902297 obs. of  7 variables:
##  $ eventType          : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ fatalities         : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries           : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propertyDamage     : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
##  $ propertyDamageLevel: chr  "K" "K" "K" "K" ...
##  $ cropDamage         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cropDamageLevel    : chr  "" "" "" "" ...
##  - attr(*, ".internal.selfref")=<externalptr>

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  1. Subset the dataset to get the top 10 harmful event types evaluated by number of fatalities
fatalitiesData <- aggregate(fatalities ~ eventType, data=tidyData, sum)
fatalitiesData<-fatalitiesData[order(-fatalitiesData$fatalities), ][1:10, ]
fatalitiesData$eventType <- factor(fatalitiesData$eventType, levels = fatalitiesData$eventType)
str(fatalitiesData)
## 'data.frame':    10 obs. of  2 variables:
##  $ eventType : Factor w/ 10 levels "TORNADO","EXCESSIVE HEAT",..: 1 2 3 4 5 6 7 8 9 10
##  $ fatalities: num  5633 1903 978 937 816 ...
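
The same aggregate, sort, and truncate pattern is used again for injuries and for damages, so it could be wrapped in a small function. The following is a sketch only: `top_events` is a hypothetical name, and the toy data frame is invented purely for illustration.

```r
# Hypothetical helper: total `measure` per event type and keep the n largest.
# Uses the same formula interface of aggregate() as the code above.
top_events <- function(data, measure, n = 10) {
  totals <- aggregate(reformulate("eventType", response = measure),
                      data = data, FUN = sum)
  totals <- totals[order(-totals[[measure]]), ]
  totals <- head(totals, n)
  # Fix the factor level order so that plots keep the ranking.
  totals$eventType <- factor(totals$eventType,
                             levels = as.character(totals$eventType))
  totals
}

# Toy data, invented for illustration only.
toy <- data.frame(
  eventType  = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  fatalities = c(3, 1, 4, 0, 2)
)
top_events(toy, "fatalities", n = 2)
```

With the real data, `top_events(tidyData, "fatalities")` would be expected to follow the same steps as the three lines above.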
  2. Plot the top 10 harmful weather events evaluated by fatalities
library(ggplot2)
ggplot(fatalitiesData, aes(x = eventType, y = fatalities)) + 
    geom_bar(stat = "identity", fill = "blue") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Fatalities") +
    ggtitle("Top 10 Harmful Weather Events Evaluated by Fatalities")

Analysis:

From the above plot, Tornado is the most harmful event type evaluated by number of fatalities.

The top 3 harmful weather events evaluated by number of fatalities are Tornado, Excessive Heat, and Flash Flood.

The number of fatalities caused by Tornado (5,633) is far higher than that of any other top 10 event type; Excessive Heat, in second place, accounts for 1,903.

  3. Subset the dataset to get the top 10 harmful event types evaluated by number of injuries
injuriesData <- aggregate(injuries ~ eventType, data=tidyData, sum)
injuriesData<-injuriesData[order(-injuriesData$injuries), ][1:10, ]
injuriesData$eventType <- factor(injuriesData$eventType, levels = injuriesData$eventType)
str(injuriesData)
## 'data.frame':    10 obs. of  2 variables:
##  $ eventType: Factor w/ 10 levels "TORNADO","TSTM WIND",..: 1 2 3 4 5 6 7 8 9 10
##  $ injuries : num  91346 6957 6789 6525 5230 ...
  4. Plot the top 10 harmful weather events evaluated by injuries
ggplot(injuriesData, aes(x = eventType, y = injuries)) + 
    geom_bar(stat = "identity", fill = "blue") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Injuries") + ggtitle("Top 10 Harmful Weather Events Evaluated by Injuries")

Analysis:

From the above plot, Tornado is the most harmful event type evaluated by number of injuries.

The top 3 harmful weather events evaluated by number of injuries are Tornado, TSTM Wind, and Flood.

The number of injuries caused by Tornado (91,346) is far higher than that of any other top 10 event type; TSTM Wind, in second place, accounts for 6,957.

Across the United States, which types of events have the greatest economic consequences?

Economic consequences include Crop Damages and Property Damages.

Crop Damages and Property Damages often occur at the same time.

Therefore, the sum of Crop Damages and Property Damages will be used to evaluate the harmful weather events.

  1. Subset the dataset to get the top 10 harmful event types evaluated by the sum of crop and property damages

damagesData <- aggregate(propertyDamage + cropDamage ~ eventType, data=tidyData, sum)
names(damagesData)<- c('eventType', 'totalDamages'  )
damagesData<-damagesData[order(-damagesData$totalDamages), ][1:10, ]
damagesData$eventType <- factor(damagesData$eventType, levels = damagesData$eventType)
str(damagesData)
## 'data.frame':    10 obs. of  2 variables:
##  $ eventType   : Factor w/ 10 levels "FLOOD","HURRICANE/TYPHOON",..: 1 2 3 4 5 6 7 8 9 10
##  $ totalDamages: num  1.38e+11 2.93e+10 1.65e+10 1.24e+10 1.01e+10 ...
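
One detail of this step is that the formula interface of `aggregate()` omits rows containing NA by default (`na.action = na.omit`). Since the scaling step sets unrecognized exponent codes to NA, only records where both propertyDamage and cropDamage are non-NA contribute to totalDamages. The sketch below shows this behavior on toy data invented for illustration, together with one possible variant that treats NA as zero; that variant is an assumption for comparison, not the method used in this analysis.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  eventType      = c("FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  propertyDamage = c(1000, 2000, 500, 700),
  cropDamage     = c(100, NA, NA, 50)
)

# Default behavior: rows with NA in either damage column are omitted before summing.
dropped <- aggregate(propertyDamage + cropDamage ~ eventType, data = toy, sum)
names(dropped) <- c("eventType", "totalDamages")
dropped  # FLOOD 1100, TORNADO 750

# Possible variant (an assumption, not the original method): treat NA as 0.
na_to_zero <- function(x) ifelse(is.na(x), 0, x)
toy$totalDamages <- na_to_zero(toy$propertyDamage) + na_to_zero(toy$cropDamage)
kept <- aggregate(totalDamages ~ eventType, data = toy, sum)
kept     # FLOOD 3100, TORNADO 1250
```

Which treatment is appropriate depends on whether a blank exponent code is read as "no damage recorded" or as "value unknown".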
  2. Plot the top 10 harmful weather events evaluated by property and crop damages
ggplot(damagesData, aes(x = eventType, y = totalDamages)) + 
    geom_bar(stat = "identity", fill = "blue") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Property & Crop Damages") +
    ggtitle("Top 10 Harmful Weather Events Evaluated \n by Property & Crop Damages ")

Analysis:

From the above plot, Flood is the most harmful event type evaluated by the sum of crop and property damages.

The top 3 harmful weather events evaluated by crop and property damages are Flood, Hurricane/Typhoon, and Tornado.

The combined crop and property damages caused by Flood (about 1.38e+11) are far higher than those of any other top 10 event type; Hurricane/Typhoon, in second place, accounts for about 2.93e+10.