The U.S. NOAA Storm Data: Analysis of Most Harmful and Costly Damages

Synopsis

In this article, we explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records characteristics of major weather events in the US. These characteristics include when and where the events occur, as well as estimates of any health damage (i.e., fatalities and injuries) and economic damage (i.e., property and crop damage).

This analysis addresses the following research questions:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

The report is structured as follows: the Data Processing section loads and pre-processes the data, and the Results section presents our analysis results.

Data Processing

In this section, we load and process the data. The Data subsection loads the raw data, and the Variables subsection selects the variables that are essential for our analysis. The next subsection computes the actual amounts of property and crop damage. Finally, the last subsection groups the damages into two categories (health and economic).

Data

The data for this project comes in the form of a CSV file compressed via the bzip2 algorithm to reduce its size. The following script loads it into a variable named data.

dataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
temp <- tempfile()
# mode = "wb" avoids corrupting the binary (compressed) download on Windows
download.file(dataURL, temp, mode = "wb")
data <- read.csv(temp, stringsAsFactors = FALSE)
unlink(temp)
dim(data)

## [1] 902297     37

As we see, the data has 902297 observations and 37 variables.
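The read.csv() call above can read the compressed file directly because, given a file path, it opens the file through R's file() connection, which detects bzip2 (as well as gzip and xz) compression automatically when reading. The sketch below illustrates this by round-tripping a tiny, made-up data frame through a temporary .bz2 file; the column names and values are illustrative only and are not taken from the storm data.

```r
# Made-up stand-in for a few rows of the storm data
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Write the CSV through a bzip2-compressing connection
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses the file transparently on reading
restored <- read.csv(path, stringsAsFactors = FALSE)
unlink(path)
print(restored)
```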

Variables

Now, let us take a look at the variable names. (Note that we first convert them to lower case.)

names(data) <- tolower(names(data))
names(data)

##  [1] "state__"    "bgn_date"   "bgn_time"   "time_zone"  "county"    
##  [6] "countyname" "state"      "evtype"     "bgn_range"  "bgn_azi"   
## [11] "bgn_locati" "end_date"   "end_time"   "county_end" "countyendn"
## [16] "end_range"  "end_azi"    "end_locati" "length"     "width"     
## [21] "f"          "mag"        "fatalities" "injuries"   "propdmg"   
## [26] "propdmgexp" "cropdmg"    "cropdmgexp" "wfo"        "stateoffic"
## [31] "zonenames"  "latitude"   "longitude"  "latitude_e" "longitude_"
## [36] "remarks"    "refnum"

The variables that we need in this analysis are as follows:

evtype (the type of the event)
fatalities (the number of fatalities caused by the event)
injuries (the number of injuries caused by the event)
propdmg (the base value of the property damage caused by the event)
propdmgexp (the multiplier code, or "exponent", for propdmg)
cropdmg (the base value of the crop damage caused by the event)
cropdmgexp (the multiplier code, or "exponent", for cropdmg)

For readability, we rename the above variables as follows:

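# Rename by column name rather than by column position
names(data)[names(data) == "evtype"] <- "event_type"
names(data)[names(data) == "propdmg"] <- "property_damage"
names(data)[names(data) == "propdmgexp"] <- "property_dmg_exp"
names(data)[names(data) == "cropdmg"] <- "crop_damage"
names(data)[names(data) == "cropdmgexp"] <- "crop_dmg_exp"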

In the following script, we keep only those variables of the data set that we need for our analysis.

library(dplyr)
data <- data %>% 
        select(event_type, fatalities, injuries, property_damage, property_dmg_exp, crop_damage, crop_dmg_exp)
str(data)

## 'data.frame':    902297 obs. of  7 variables:
##  $ event_type      : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ fatalities      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries        : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ property_damage : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ property_dmg_exp: chr  "K" "K" "K" "K" ...
##  $ crop_damage     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ crop_dmg_exp    : chr  "" "" "" "" ...

Real Amount of Economic Damages

The next step is to compute the actual amounts of economic damage, i.e., property and crop damage, in US dollars. Each amount combines a base value with a multiplier code: property_damage with property_dmg_exp, and crop_damage with crop_dmg_exp. In this sense, property_dmg_exp and crop_dmg_exp act as (loose) exponents for the values in property_damage and crop_damage, respectively. Let us first look at the possible values of these codes:

unique(c(unique(data$property_dmg_exp), unique(data$crop_dmg_exp)))

##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8" "k"

According to this article, the possible values of property_dmg_exp and crop_dmg_exp have the following meanings:

H,h = hundreds = 100
K,k = kilos = thousands = 1,000
M,m = millions = 1,000,000
B,b = billions = 1,000,000,000
(+) = 1
(-) = 0
(?) = 0
blank/empty character = 0
numeric 0..8 = 10
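The mapping above can also be expressed as a named lookup vector, which lets R translate a whole column of codes in one step instead of calling a function once per value. The following is a minimal sketch under that idea; the helper name exp_multiplier is hypothetical and it is not the apply_exp function used in this report.

```r
# Hypothetical helper (not the report's apply_exp): map exponent codes to
# multipliers following the table above
exp_multiplier <- function(exponent) {
        lookup <- c(H = 100, K = 1e3, M = 1e6, B = 1e9, "+" = 1,
                    setNames(rep(10, 9), as.character(0:8)))
        # Letter codes are case-insensitive; unmatched codes ("-", "?", "",
        # or anything else) give NA from the lookup and are mapped to 0
        m <- unname(lookup[toupper(exponent)])
        m[is.na(m)] <- 0
        m
}

# Multipliers for K, m, "", +, 5, ?: 1000, 1e6, 0, 1, 10, 0
exp_multiplier(c("K", "m", "", "+", "5", "?"))

# Vectorized use on a column: value times multiplier
c(25, 2.5, 3) * exp_multiplier(c("K", "M", "-"))   # 25000, 2500000, 0
```

With such a helper, prop_dmg could be computed as property_damage * exp_multiplier(property_dmg_exp). Since both approaches encode the same table, the results should agree with the apply_exp() approach, while avoiding a per-row function call.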

The following function, apply_exp, computes the actual amount of damage in US dollars. It takes a value (of either property_damage or crop_damage) together with its exponent code and returns the value scaled by the corresponding multiplier:

apply_exp <- function(value, exponent){
        x <- 0
        if((exponent == "h") || (exponent == "H")){
                x <- value * 100
        }
        else if((exponent == "k") || (exponent == "K")){
                x <- value * 1000
        }
        else if((exponent == "m") || (exponent == "M")){
                x <- value * 1000000
        }
        else if((exponent == "b") || (exponent == "B")){
                x <- value * 1000000000
        }
        else if(exponent == "+"){
                x <- value
        }
        else if(exponent %in% as.character(0:8)){
                x <- value * 10
        }
        else {
                x <- 0
        }
        x
}

The following script adds two new variables to the data set:

prop_dmg, the actual amount of property damage caused by the event, in US dollars
crop_dmg, the actual amount of crop damage caused by the event, in US dollars

It also removes the original variables (i.e., property_damage, crop_damage, property_dmg_exp, and crop_dmg_exp), since we no longer need them.

data <- data %>% 
        mutate(prop_dmg = mapply(apply_exp, property_damage, property_dmg_exp), crop_dmg = mapply(apply_exp, crop_damage, crop_dmg_exp)) %>% 
        select(-c(property_damage, crop_damage, property_dmg_exp, crop_dmg_exp))

sample_n(data, 6)

##        event_type fatalities injuries prop_dmg crop_dmg
## 302744    TORNADO          0        2   150000        0
## 817754       HAIL          0        0        0        0
## 349854       HAIL          0        0        0        0
## 885722       HAIL          0        0        0        0
## 421664  TSTM WIND          0        0   100000        0
## 615162       HAIL          0        0   500000        0

Health and Economic Damages

For a given observation, we can group the damages into two major categories: health damage and economic damage. The former is the number of fatalities plus the number of injuries; the latter is the amount of property damage plus the amount of crop damage. In the following script, we add the two corresponding variables (health_dmg and economic_dmg, respectively) to the data set.

data <- data %>% 
        mutate(health_dmg = fatalities + injuries, economic_dmg = prop_dmg + crop_dmg)
data <- data %>% select(event_type, fatalities, injuries, health_dmg, prop_dmg, crop_dmg, economic_dmg)
sample_n(data, 5)

##               event_type fatalities injuries health_dmg prop_dmg crop_dmg
## 616274         TSTM WIND          0        0          0        0        0
## 127090              HAIL          0        0          0        0        0
## 227251         LIGHTNING          0        0          0        0        0
## 898274 THUNDERSTORM WIND          0        0          0        0        0
## 576909         TSTM WIND          0        0          0    25000        0
##        economic_dmg
## 616274            0
## 127090            0
## 227251            0
## 898274            0
## 576909        25000
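Adding columns with + propagates missing values: if either component of an observation were NA, its total would be NA, and sum(..., na.rm = TRUE) in the later summaries would then silently drop the non-missing part as well. The storm data may not contain such gaps in these columns; as a precaution, rowSums() with na.rm = TRUE treats a missing component as zero. A minimal sketch on made-up counts:

```r
# Made-up counts; the third observation has a missing fatality count
fatalities <- c(1, 0, NA)
injuries   <- c(2, 3, 4)

naive <- fatalities + injuries                                # 3, 3, NA
safe  <- rowSums(cbind(fatalities, injuries), na.rm = TRUE)   # 3, 3, 4
```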

Now, let us split our data into two data sets: data_health and data_economic:

data_health <- data[, c(1:4)]
str(data_health)

## 'data.frame':    902297 obs. of  4 variables:
##  $ event_type: chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ health_dmg: num  15 0 2 2 2 6 1 0 15 0 ...

data_economic <- data[, c(1, 5:7)]
str(data_economic)

## 'data.frame':    902297 obs. of  4 variables:
##  $ event_type  : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ prop_dmg    : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
##  $ crop_dmg    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ economic_dmg: num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...

Now, we are ready to address our analysis questions in the next section.

Results

In this section, we address our main questions:

QUESTION 1: “Across the United States, which types of events are most harmful with respect to population health?”

QUESTION 2: “Across the United States, which types of events have the greatest economic consequences?”

We answer Question 1 using the data set data_health and Question 2 using data_economic, in the following two subsections.

Most Dangerous Event Types

In the following script, we sum the fatalities, injuries, and health damage for each event type. The result is saved into a new data frame named data_health_evn.

data_health_evn <- data_health %>% 
        group_by(event_type) %>% 
        summarize(fatalities = sum(fatalities, na.rm = TRUE), injuries = sum(injuries, na.rm = TRUE), health_damage = sum(health_dmg, na.rm = TRUE))

data_health_evn <- as.data.frame(data_health_evn)
tail(data_health_evn)

##             event_type fatalities injuries health_damage
## 980 WINTER WEATHER/MIX         28       72           100
## 981        WINTERY MIX          0        0             0
## 982         Wintry mix          0        0             0
## 983         Wintry Mix          0        0             0
## 984         WINTRY MIX          1       77            78
## 985                WND          0        0             0
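For readers not using dplyr, the same per-type totals can be computed with base R's aggregate(). The sketch below runs on a few invented rows shaped like data_health; the real call would pass data_health instead. One difference worth hedging: the formula interface drops rows with any NA by default (na.action = na.omit), which is not quite the same as sum(..., na.rm = TRUE).

```r
# A few invented rows in the shape of data_health
toy_health <- data.frame(
        event_type = c("TORNADO", "HAIL", "TORNADO", "HAIL", "FLOOD"),
        fatalities = c(2, 0, 1, 0, 0),
        injuries   = c(10, 1, 4, 0, 0),
        stringsAsFactors = FALSE)
toy_health$health_dmg <- toy_health$fatalities + toy_health$injuries

# Sum each column within each event type (one row per type)
totals <- aggregate(cbind(fatalities, injuries, health_dmg) ~ event_type,
                    data = toy_health, FUN = sum)
print(totals)
```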

Let us take a look at the range of health damages:

range(data_health_evn$health_damage)

## [1]     0 96979

The following script shows what percentage of event types have a total health damage of 0.

mean(data_health_evn$health_damage == 0) * 100

## [1] 77.66497
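The expression above works because mean() coerces TRUE to 1 and FALSE to 0, so the mean of a logical vector is the fraction of TRUE values. A tiny illustration on made-up totals:

```r
x <- c(0, 5, 0, 0, 12)    # made-up totals
is_zero <- x == 0         # TRUE FALSE TRUE TRUE FALSE
share <- mean(is_zero)    # TRUE counts as 1, FALSE as 0: 3/5 = 0.6
share * 100               # 60 (percent)
```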

Since event types with 0 health damage are not of interest, we filter them out of the data set:

data_health_evn <- data_health_evn %>% filter(health_damage != 0)
dim(data_health_evn)

## [1] 220   4

Let us now look at the 10%, 50%, and 90% quantiles of the health damage vector:

bord_health <- quantile(data_health_evn$health_damage, probs = c(0.1, 0.5, 0.9))
bord_health

##   10%   50%   90% 
##   1.0   5.0 463.9

In our view, the event types in the top 10% of total health damage should be considered the most harmful with respect to population health. As shown above, any event type whose health damage is greater than or equal to 464² belongs to this group. In the rest of this subsection, we analyze these event types further.
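A note on the cut-off: by default quantile() uses type 7 interpolation, so the 90% quantile is generally not one of the observed values, and keeping values greater than or equal to it retains roughly the top tenth. For integer-valued totals, x >= q selects exactly the same values as x >= ceiling(q), so rounding q is harmless only when it rounds up, as it does for 463.9. A sketch on the integers 1 to 20:

```r
x <- 1:20                           # stand-in totals (integers)
q90 <- unname(quantile(x, 0.9))     # type 7 interpolation: 18.1

top <- x[x >= q90]                  # 19, 20 -> 2 of 20 values, i.e. 10%

# For integer totals, x >= q90 is the same as x >= ceiling(q90);
# round() would give 18 here and admit one extra value
identical(x[x >= ceiling(q90)], top)   # TRUE
length(x[x >= round(q90)])             # 3
```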

We filter our data set according to this criterion to obtain a new data frame, data_health_evn_worst. We refer to this data set as the worst cases w.r.t. health damage.

data_health_evn_worst <- data_health_evn %>% filter(health_damage >= round(bord_health[3]))
dim(data_health_evn_worst)

## [1] 22  4

Before going further (e.g., with a visual analysis), we assign IDs to the event types. Some event type names are long and would make our plots look awkward, so we add an index column to the data set and distinguish the event types by their indices.

data_health_evn_worst$event_ID <- seq.int(nrow(data_health_evn_worst))
data_health_evn_worst <- data_health_evn_worst[, c(5, 1:4)]
str(data_health_evn_worst)

## 'data.frame':    22 obs. of  5 variables:
##  $ event_ID     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ event_type   : chr  "BLIZZARD" "EXCESSIVE HEAT" "FLASH FLOOD" "FLOOD" ...
##  $ fatalities   : num  101 1903 978 470 62 ...
##  $ injuries     : num  805 6525 1777 6789 734 ...
##  $ health_damage: num  906 8428 2755 7259 796 ...

In the following, we visualize this data set, i.e., the worst cases w.r.t. health damage. In each plot, the blue and red dotted lines indicate the median and the mean, respectively, and the x-axis shows the event IDs.

layout(matrix(c(1, 2, 3, 3), nrow=2, byrow=TRUE))

with(data_health_evn_worst, plot(event_ID, fatalities, main = "Fatalities in Worst Cases", xlab = "Event ID", ylab =  "Fatalities"))
abline(h = median(data_health_evn_worst$fatalities), col = "blue", lwd = 2, lty = 3)
abline(h = mean(data_health_evn_worst$fatalities), col = "red", lwd = 2, lty = 3)

with(data_health_evn_worst, plot(event_ID, injuries, main = "Injuries in Worst Cases", xlab = "Event ID", ylab = "Injuries"))
abline(h = median(data_health_evn_worst$injuries), col = "blue", lwd = 2, lty = 3)
abline(h = mean(data_health_evn_worst$injuries), col = "red", lwd = 2, lty = 3)

with(data_health_evn_worst, plot(event_ID, health_damage, main = "Health Damages in Worst Cases", xlab = "Event ID", ylab =  "fatalities + injuries"))
abline(h = median(data_health_evn_worst$health_damage), col = "blue", lwd = 2, lty = 3)
abline(h = mean(data_health_evn_worst$health_damage), col = "red", lwd = 2, lty = 3)

For reference, the following table shows which event type each event ID in our plots refers to:

data_health_evn_worst %>% select(event_ID, event_type)

##    event_ID         event_type
## 1         1           BLIZZARD
## 2         2     EXCESSIVE HEAT
## 3         3        FLASH FLOOD
## 4         4              FLOOD
## 5         5                FOG
## 6         6               HAIL
## 7         7               HEAT
## 8         8          HEAT WAVE
## 9         9         HEAVY SNOW
## 10       10          HIGH WIND
## 11       11  HURRICANE/TYPHOON
## 12       12          ICE STORM
## 13       13          LIGHTNING
## 14       14        RIP CURRENT
## 15       15       RIP CURRENTS
## 16       16  THUNDERSTORM WIND
## 17       17 THUNDERSTORM WINDS
## 18       18            TORNADO
## 19       19          TSTM WIND
## 20       20   WILD/FOREST FIRE
## 21       21           WILDFIRE
## 22       22       WINTER STORM

The median (blue line), mean (red line), and max of the fatalities in the worst cases are 188, ~599, and 5633, respectively.

The following script extracts those events which have fatalities above the median value of the selected events:

data_fatalities_median <- data_health_evn_worst %>% filter(fatalities > median(fatalities))
data_fatalities_median$event_type

##  [1] "EXCESSIVE HEAT" "FLASH FLOOD"    "FLOOD"          "HEAT"          
##  [5] "HIGH WIND"      "LIGHTNING"      "RIP CURRENT"    "RIP CURRENTS"  
##  [9] "TORNADO"        "TSTM WIND"      "WINTER STORM"

The following script extracts those selected events whose fatalities are above the mean value:

data_fatalities_mean <- data_health_evn_worst %>% filter(fatalities > mean(fatalities))
data_fatalities_mean$event_type

## [1] "EXCESSIVE HEAT" "FLASH FLOOD"    "HEAT"           "LIGHTNING"     
## [5] "TORNADO"

The following script extracts the most harmful event type with respect to the number of fatalities.

data_health_evn_worst[data_health_evn_worst$fatalities == max(data_health_evn_worst$fatalities), ]$event_type

## [1] "TORNADO"
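The median, mean, and maximum filters in this subsection all follow one pattern, so they could be folded into two small helpers. The sketch below is hypothetical (the names types_above and types_at_max are not from the original analysis) and uses base R only; like the == max() comparison above, and unlike which.max(), types_at_max keeps every tied event type.

```r
# Hypothetical helpers; the names and arguments are illustrative
types_above <- function(df, column, stat = median) {
        values <- df[[column]]
        df$event_type[values > stat(values)]
}

types_at_max <- function(df, column) {
        values <- df[[column]]
        df$event_type[values == max(values)]   # keeps all ties
}

# Invented example data
toy <- data.frame(event_type = c("A", "B", "C", "D", "E"),
                  fatalities = c(1, 2, 3, 4, 100),
                  stringsAsFactors = FALSE)

types_above(toy, "fatalities", median)   # median is 3: "D" "E"
types_above(toy, "fatalities", mean)     # mean is 22: "E"
types_at_max(toy, "fatalities")          # "E"
```

Applied to this report, types_above(data_health_evn_worst, "fatalities", mean) should give the same event types as data_fatalities_mean$event_type above.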

The median, mean, and max of the injuries in the selected event types are ~1298, ~6138, and 91346, respectively.

The following script extracts those events which have injuries above the median value of the selected events:

data_injuries_median <- data_health_evn_worst %>% filter(injuries > median(injuries))
data_injuries_median$event_type

##  [1] "EXCESSIVE HEAT"    "FLASH FLOOD"       "FLOOD"            
##  [4] "HAIL"              "HEAT"              "ICE STORM"        
##  [7] "LIGHTNING"         "THUNDERSTORM WIND" "TORNADO"          
## [10] "TSTM WIND"         "WINTER STORM"

The following script extracts those events which have injuries above the mean value of the selected events:

data_injuries_mean <- data_health_evn_worst %>% filter(injuries > mean(injuries))
data_injuries_mean$event_type

## [1] "EXCESSIVE HEAT" "FLOOD"          "TORNADO"        "TSTM WIND"

The following script extracts the most harmful event type with respect to the number of injuries.

data_health_evn_worst[data_health_evn_worst$injuries == max(data_health_evn_worst$injuries), ]$event_type

## [1] "TORNADO"

The median, mean, and max of the health damage (fatalities + injuries) in the worst-case event types are 1380, ~6737, and 96979, respectively.

The following script extracts those events whose health damage is above the median value of the selected events:

data_health_median <- data_health_evn_worst %>% filter(health_damage > median(health_damage))
data_health_median$event_type

##  [1] "EXCESSIVE HEAT"    "FLASH FLOOD"       "FLOOD"            
##  [4] "HEAT"              "HIGH WIND"         "ICE STORM"        
##  [7] "LIGHTNING"         "THUNDERSTORM WIND" "TORNADO"          
## [10] "TSTM WIND"         "WINTER STORM"

The following script extracts those events whose health damage is above the mean value of the selected events:

data_health_mean <- data_health_evn_worst %>% filter(health_damage > mean(health_damage))
data_health_mean$event_type

## [1] "EXCESSIVE HEAT" "FLOOD"          "TORNADO"        "TSTM WIND"

The following script extracts the most harmful event type with respect to the number of injuries plus fatalities.

data_health_evn_worst[data_health_evn_worst$health_damage == max(data_health_evn_worst$health_damage), ]$event_type

## [1] "TORNADO"
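A single maximum does not show how the remaining event types compare, so a ranked list of the largest totals can complement the plots. The following base R sketch is illustrative only; the helper top_n_types and the toy values are hypothetical.

```r
# Hypothetical helper: the n event types with the largest values in `column`
top_n_types <- function(df, column, n = 3) {
        ranked <- df[order(df[[column]], decreasing = TRUE), c("event_type", column)]
        rownames(ranked) <- NULL
        head(ranked, n)
}

# Invented totals
toy <- data.frame(event_type = c("A", "B", "C", "D"),
                  health_damage = c(50, 900, 120, 7),
                  stringsAsFactors = FALSE)

top_n_types(toy, "health_damage", n = 2)   # B (900), then C (120)
```

Called as top_n_types(data_health_evn_worst, "health_damage", n = 5), it would list the five event types with the largest total health damage.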

Most Costly Event Types

In the following script, we sum the property, crop, and economic damage for each event type. The result is saved into a new data frame named data_economic_evn.

data_economic_evn <- data_economic %>% 
        group_by(event_type) %>% 
        summarize(property_damage = sum(prop_dmg, na.rm = TRUE), crop_damage = sum(crop_dmg, na.rm = TRUE), economic_damage = sum(economic_dmg, na.rm = TRUE))

data_economic_evn <- as.data.frame(data_economic_evn)
tail(data_economic_evn)

##             event_type property_damage crop_damage economic_damage
## 980 WINTER WEATHER/MIX         6372000           0         6372000
## 981        WINTERY MIX               0           0               0
## 982         Wintry mix               0           0               0
## 983         Wintry Mix            2500           0            2500
## 984         WINTRY MIX           10000           0           10000
## 985                WND               0           0               0

The following script shows the range of economic damage in US dollars.

range(data_economic_evn$economic_damage)

## [1]            0 150319678250

The following script shows what percentage of event types have a total economic damage of 0.

mean(data_economic_evn$economic_damage == 0) * 100

## [1] 56.35

We are not interested in event types whose economic damage is 0, so we filter them out of our data set:

data_economic_evn <- data_economic_evn %>% filter(economic_damage != 0)
dim(data_economic_evn)

## [1] 430   4

Let us now look at the 10%, 50%, and 90% quantiles of the economic damage vector:

bord_economic <- quantile(data_economic_evn$economic_damage, probs = c(0.1, 0.5, 0.9) )
bord_economic

##       10%       50%       90% 
##      4000    223250 237918499

Again, we consider the event types in the top 10% of total economic damage. As shown above, any event type whose economic damage is greater than or equal to $237,918,499 belongs to this group.

We filter our data set according to this criterion to obtain a new data frame, data_economic_evn_worst. We refer to this data set as the worst cases w.r.t. economic damage.

data_economic_evn_worst <- data_economic_evn %>% filter(economic_damage >= round(bord_economic[3]))
dim(data_economic_evn_worst)

## [1] 43  4

As in the previous subsection, we add an index column to data_economic_evn_worst to distinguish the event types by their indices.

data_economic_evn_worst$event_ID <- seq.int(nrow(data_economic_evn_worst))
data_economic_evn_worst <- data_economic_evn_worst[, c(5, 1:4)]
str(data_economic_evn_worst)

## 'data.frame':    43 obs. of  5 variables:
##  $ event_ID       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ event_type     : chr  "BLIZZARD" "DAMAGING FREEZE" "DROUGHT" "EXCESSIVE HEAT" ...
##  $ property_damage: num  659213950 8000000 1046106000 7753700 67737400 ...
##  $ crop_damage    : num  112060000 262100000 13972566000 492402000 1292973000 ...
##  $ economic_damage: num  771273950 270100000 15018672000 500155700 1360710400 ...

The following figures show the plots for the worst cases w.r.t. economic damage. Again, the blue and red dotted lines indicate the median and the mean, respectively.

layout(matrix(c(1, 2, 3, 3), nrow=2, byrow=TRUE))

with(data_economic_evn_worst, plot(event_ID, property_damage, main = "Property Damages in Worst Cases", xlab = "Event ID", ylab =  "Property Damages ($)"))
abline(h = median(data_economic_evn_worst$property_damage), col = "blue", lwd = 2, lty = 3)
abline(h = mean(data_economic_evn_worst$property_damage), col = "red", lwd = 2, lty = 3)

with(data_economic_evn_worst, plot(event_ID, crop_damage, main = "Crop Damages in Worst Cases", xlab = "Event ID", ylab =  "Crop Damages ($)"))
abline(h = median(data_economic_evn_worst$crop_damage), col = "blue", lwd = 2, lty = 3)
abline(h = mean(data_economic_evn_worst$crop_damage), col = "red", lwd = 2, lty = 3)

with(data_economic_evn_worst, plot(event_ID, economic_damage, main = "Economic (Property + Crop) Damages in Worst Cases", xlab = "Event ID", ylab =  "Economic Damages ($)"))
abline(h = median(data_economic_evn_worst$economic_damage), col = "blue", lwd = 2, lty = 3)
abline(h = mean(data_economic_evn_worst$economic_damage), col = "red", lwd = 2, lty = 3)

For reference, the following table shows which event type each event ID in our plots refers to:

data_economic_evn_worst %>% select(event_ID, event_type)

##    event_ID                 event_type
## 1         1                   BLIZZARD
## 2         2            DAMAGING FREEZE
## 3         3                    DROUGHT
## 4         4             EXCESSIVE HEAT
## 5         5               EXTREME COLD
## 6         6                FLASH FLOOD
## 7         7          FLASH FLOOD/FLOOD
## 8         8             FLASH FLOODING
## 9         9                      FLOOD
## 10       10          FLOOD/FLASH FLOOD
## 11       11                     FREEZE
## 12       12               FROST/FREEZE
## 13       13                       HAIL
## 14       14                  HAILSTORM
## 15       15                       HEAT
## 16       16                 HEAVY RAIN
## 17       17  HEAVY RAIN/SEVERE WEATHER
## 18       18                 HEAVY SNOW
## 19       19                  HIGH WIND
## 20       20                 HIGH WINDS
## 21       21                  HURRICANE
## 22       22             HURRICANE ERIN
## 23       23             HURRICANE OPAL
## 24       24          HURRICANE/TYPHOON
## 25       25                  ICE STORM
## 26       26                  LANDSLIDE
## 27       27                  LIGHTNING
## 28       28                RIVER FLOOD
## 29       29        SEVERE THUNDERSTORM
## 30       30                STORM SURGE
## 31       31           STORM SURGE/TIDE
## 32       32                STRONG WIND
## 33       33          THUNDERSTORM WIND
## 34       34         THUNDERSTORM WINDS
## 35       35                    TORNADO
## 36       36 TORNADOES, TSTM WIND, HAIL
## 37       37             TROPICAL STORM
## 38       38                  TSTM WIND
## 39       39                    TYPHOON
## 40       40                 WILD FIRES
## 41       41           WILD/FOREST FIRE
## 42       42                   WILDFIRE
## 43       43               WINTER STORM

The median (blue line), mean (red line), and max of the property damage in the selected event types are $1,205,360,000, ~$9,888,930,226, and $144,657,709,800, respectively.
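Dollar figures of this size are easier to read with thousands separators and without scientific notation. One base R option is format() with big.mark; a sketch using the median property damage quoted above:

```r
median_prop <- 1205360000   # median property damage quoted above

# Thousands separators, no scientific notation, and a leading dollar sign
label <- paste0("$", format(median_prop, big.mark = ",", scientific = FALSE))
label   # "$1,205,360,000"
```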

The following script extracts those events whose property damage is above the median value of the selected events:

data_property_median <- data_economic_evn_worst %>% filter(property_damage > median(property_damage))
data_property_median$event_type

##  [1] "FLASH FLOOD"                "FLOOD"                     
##  [3] "HAIL"                       "HEAVY RAIN/SEVERE WEATHER" 
##  [5] "HIGH WIND"                  "HURRICANE"                 
##  [7] "HURRICANE OPAL"             "HURRICANE/TYPHOON"         
##  [9] "ICE STORM"                  "RIVER FLOOD"               
## [11] "STORM SURGE"                "STORM SURGE/TIDE"          
## [13] "THUNDERSTORM WIND"          "THUNDERSTORM WINDS"        
## [15] "TORNADO"                    "TORNADOES, TSTM WIND, HAIL"
## [17] "TROPICAL STORM"             "TSTM WIND"                 
## [19] "WILD/FOREST FIRE"           "WILDFIRE"                  
## [21] "WINTER STORM"

The following script extracts those events whose property damage is above the mean value of the selected events:

data_property_mean <- data_economic_evn_worst %>% filter(property_damage > mean(property_damage))
data_property_mean$event_type

## [1] "FLASH FLOOD"       "FLOOD"             "HAIL"             
## [4] "HURRICANE"         "HURRICANE/TYPHOON" "STORM SURGE"      
## [7] "TORNADO"

The following script extracts the most costly event type with respect to property damage.

data_economic_evn_worst[data_economic_evn_worst$property_damage == max(data_economic_evn_worst$property_damage), ]$event_type

## [1] "FLOOD"

The median, mean, and max of the crop damage in the selected event types are ~$190,655,500, ~$1,120,488,179, and $13,972,566,000, respectively.

The following script extracts those events whose crop damage is above the median value of the selected events:

data_crop_median <- data_economic_evn_worst %>% filter(crop_damage > median(crop_damage))
data_crop_median$event_type

##  [1] "DAMAGING FREEZE"   "DROUGHT"           "EXCESSIVE HEAT"   
##  [4] "EXTREME COLD"      "FLASH FLOOD"       "FLOOD"            
##  [7] "FREEZE"            "FROST/FREEZE"      "HAIL"             
## [10] "HEAT"              "HEAVY RAIN"        "HIGH WIND"        
## [13] "HURRICANE"         "HURRICANE/TYPHOON" "ICE STORM"        
## [16] "RIVER FLOOD"       "THUNDERSTORM WIND" "TORNADO"          
## [19] "TROPICAL STORM"    "TSTM WIND"         "WILDFIRE"

The following script extracts those events whose crop damage is above the mean value of the selected events:

data_crop_mean <- data_economic_evn_worst %>% filter(crop_damage > mean(crop_damage))
data_crop_mean$event_type

## [1] "DROUGHT"           "EXTREME COLD"      "FLASH FLOOD"      
## [4] "FLOOD"             "HAIL"              "HURRICANE"        
## [7] "HURRICANE/TYPHOON" "ICE STORM"         "RIVER FLOOD"

The following script extracts the most costly event type with respect to the crop damage.

data_economic_evn_worst[data_economic_evn_worst$crop_damage == max(data_economic_evn_worst$crop_damage), ]$event_type

## [1] "DROUGHT"

The median, mean, and max of the economic damage (property + crop damage) in the selected event types are $1,602,500,000, ~$11,009,418,405, and $150,319,678,250, respectively.

The following script extracts those events whose economic cost is above the median value of the selected events:

data_eco_median <- data_economic_evn_worst %>% filter(economic_damage > median(economic_damage))
data_eco_median$event_type

##  [1] "DROUGHT"                   "FLASH FLOOD"              
##  [3] "FLOOD"                     "HAIL"                     
##  [5] "HEAVY RAIN/SEVERE WEATHER" "HIGH WIND"                
##  [7] "HURRICANE"                 "HURRICANE OPAL"           
##  [9] "HURRICANE/TYPHOON"         "ICE STORM"                
## [11] "RIVER FLOOD"               "STORM SURGE"              
## [13] "STORM SURGE/TIDE"          "THUNDERSTORM WIND"        
## [15] "THUNDERSTORM WINDS"        "TORNADO"                  
## [17] "TROPICAL STORM"            "TSTM WIND"                
## [19] "WILD/FOREST FIRE"          "WILDFIRE"                 
## [21] "WINTER STORM"

The following script extracts those events whose economic cost is above the mean value of the selected events:

data_eco_mean <- data_economic_evn_worst %>% filter(economic_damage > mean(economic_damage))
data_eco_mean$event_type

## [1] "DROUGHT"           "FLASH FLOOD"       "FLOOD"            
## [4] "HAIL"              "HURRICANE"         "HURRICANE/TYPHOON"
## [7] "STORM SURGE"       "TORNADO"

The following script extracts the most costly event type.

data_economic_evn_worst[data_economic_evn_worst$economic_damage == max(data_economic_evn_worst$economic_damage), ]$event_type

## [1] "FLOOD"

¹ This analysis report is the final project of the Reproducible Research (Coursera) course at Johns Hopkins University.
² Note that we rounded the actual value, as only integer values make sense for health damage.

The U.S. NOAA Storm Data: Analysis of Most Harmful and Costly Damages¹

Aliakbar Safilian

October 27, 2018

Synopsis

Data Processing

Data

Variables

Real Amount of Economic Damages

Health and Economic Damages

Results

Most Dangerous Event Types

Most Costly Event Types
