Severe Weather Events From Storm Data Analysis

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This report involves exploring the U.S. National Oceanic and Atmospheric Administration’s storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.This report will determine the events with maximum population hazard (Injuries and fatalaties) as well as the events with maximum economic hazard(property and crop damage).

Downloading and Reading the dataset

From the National Oceanic and Atmospheric Administration website we obtained the data for storms starting from the year 1950 and ending in 2011.

Reading the whole data

The data is read from its csv.bz2 file and stored in the working directory as storm.csv file then read in R using the read,csv command.

#Reading the data

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "C:/Users/Mohsen/Documents/storm.csv.bz2")

storm_data<-read.csv("C:/Users/Mohsen/Documents/storm.csv.bz2")

Exploring the whole data

We first activate the necessary package and viewing the first few rows and its structure.

#Activating the necessary libraries

library(tidyverse)
## -- Attaching packages ----------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts -------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Selecting the necessary columns for our analysis

#To select only the columns associated with population health

population_health<-select(storm_data, EVTYPE , FATALITIES, INJURIES)

summary(population_health)
##                EVTYPE         FATALITIES          INJURIES        
##  HAIL             :288661   Min.   :  0.0000   Min.   :   0.0000  
##  TSTM WIND        :219940   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  THUNDERSTORM WIND: 82563   Median :  0.0000   Median :   0.0000  
##  TORNADO          : 60652   Mean   :  0.0168   Mean   :   0.1557  
##  FLASH FLOOD      : 54277   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  FLOOD            : 25326   Max.   :583.0000   Max.   :1700.0000  
##  (Other)          :170878
head(population_health)
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2
## 6 TORNADO          0        6
#To select only the columns associated with economic hazard

economic_hazard<-select(storm_data, EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)

head(economic_hazard)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO    25.0          K       0           
## 2 TORNADO     2.5          K       0           
## 3 TORNADO    25.0          K       0           
## 4 TORNADO     2.5          K       0           
## 5 TORNADO     2.5          K       0           
## 6 TORNADO     2.5          K       0

We then examine the levels of EXP or expression to transform the numbers to actual numbers

#To examine the levels of PROPDMGEXP

levels(economic_hazard$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
#The documentation said H or h for hundreds, K for thousands, M or m for millions, and B for Billions, other symbols are negligible

economic_filtered_prop <-filter(economic_hazard, economic_hazard$PROPDMGEXP %in% c("B","h","H", "K","m", "M")) %>% select(EVTYPE, PROPDMG, PROPDMGEXP)

head(economic_filtered_prop)
##    EVTYPE PROPDMG PROPDMGEXP
## 1 TORNADO    25.0          K
## 2 TORNADO     2.5          K
## 3 TORNADO    25.0          K
## 4 TORNADO     2.5          K
## 5 TORNADO     2.5          K
## 6 TORNADO     2.5          K
table(economic_filtered_prop$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
##      0      0      0      0      0      0      0      0      0      0 
##      6      7      8      B      h      H      K      m      M 
##      0      0      0     40      1      6 424665      7  11330
#To calculate the actual property damage
economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="K",1000)

economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="B",1000000000)

economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="h"|economic_filtered_prop$PROPDMGEXP=="H",100)

economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="m"|economic_filtered_prop$PROPDMGEXP=="M",1000000)

#To see that all variables are changed
table(economic_filtered_prop$PROPDMGEXP)
## 
##    100   1000  1e+06  1e+09 
##      7 424665  11337     40
str(economic_filtered_prop)
## 'data.frame':    436049 obs. of  3 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "1000" "1000" "1000" "1000" ...
#economic_filtered_prop$PROPDMGEXP is a character vector

economic_filtered_prop$PROPDMGEXP<-as.numeric(economic_filtered_prop$PROPDMGEXP)

str(economic_filtered_prop)
## 'data.frame':    436049 obs. of  3 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: num  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
economic_filtered_prop$prop_damage<-economic_filtered_prop$PROPDMG*economic_filtered_prop$PROPDMGEXP

head(economic_filtered_prop)
##    EVTYPE PROPDMG PROPDMGEXP prop_damage
## 1 TORNADO    25.0       1000       25000
## 2 TORNADO     2.5       1000        2500
## 3 TORNADO    25.0       1000       25000
## 4 TORNADO     2.5       1000        2500
## 5 TORNADO     2.5       1000        2500
## 6 TORNADO     2.5       1000        2500
#To examine the levels of CROPDMGEXP

levels(economic_hazard$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
#The documentation said H or h for hundreds, K for thousands, M or m for millions, and B for Billions, othersymbols are negligible

economic_filtered_crop <-filter(economic_hazard, economic_hazard$CROPDMGEXP %in% c("B","k","K","m","M")) %>% select(EVTYPE, CROPDMG, CROPDMGEXP)

head(economic_filtered_crop)
##                      EVTYPE CROPDMG CROPDMGEXP
## 1 HURRICANE OPAL/HIGH WINDS      10          M
## 2        THUNDERSTORM WINDS     500          K
## 3            HURRICANE ERIN       1          M
## 4            HURRICANE OPAL       4          M
## 5            HURRICANE OPAL      10          m
## 6        THUNDERSTORM WINDS      50          K
#Replace all expressions with their values

economic_filtered_crop$CROPDMGEXP<-replace(as.character(economic_filtered_crop$CROPDMGEXP),economic_filtered_crop$CROPDMGEXP=="K" | economic_filtered_crop$CROPDMGEXP=="k",1000)

table(economic_filtered_crop$CROPDMGEXP)
## 
##   1000      B      m      M 
## 281853      9      1   1994
economic_filtered_crop$CROPDMGEXP<-replace(as.character(economic_filtered_crop$CROPDMGEXP),economic_filtered_crop$CROPDMGEXP=="B",1000000000)

table(economic_filtered_crop$CROPDMGEXP)
## 
##   1000  1e+09      m      M 
## 281853      9      1   1994
economic_filtered_crop$CROPDMGEXP<-replace(as.character(economic_filtered_crop$CROPDMGEXP),economic_filtered_crop$CROPDMGEXP=="m"|economic_filtered_crop$CROPDMGEXP=="M",1000000)

table(economic_filtered_crop$CROPDMGEXP)
## 
##   1000  1e+06  1e+09 
## 281853   1995      9
#Convert economic_filtered_crop$CROPDMGEXP to numeric vector

economic_filtered_crop$CROPDMGEXP<-as.numeric(economic_filtered_crop$CROPDMGEXP)

str(economic_filtered_crop)
## 'data.frame':    283857 obs. of  3 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 410 786 406 409 409 786 786 834 834 812 ...
##  $ CROPDMG   : num  10 500 1 4 10 50 50 5 50 15 ...
##  $ CROPDMGEXP: num  1e+06 1e+03 1e+06 1e+06 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 ...
#To obtain the actual value of crop damage

economic_filtered_crop$crop_damage<-economic_filtered_crop$CROPDMG*economic_filtered_crop$CROPDMGEXP

head(economic_filtered_crop)
##                      EVTYPE CROPDMG CROPDMGEXP crop_damage
## 1 HURRICANE OPAL/HIGH WINDS      10      1e+06       1e+07
## 2        THUNDERSTORM WINDS     500      1e+03       5e+05
## 3            HURRICANE ERIN       1      1e+06       1e+06
## 4            HURRICANE OPAL       4      1e+06       4e+06
## 5            HURRICANE OPAL      10      1e+06       1e+07
## 6        THUNDERSTORM WINDS      50      1e+03       5e+04

Results

Worst events for population health

We view the worst events for injuries and fatalities and then plotting the data

#To view the top 10 worst events regarding to fatalities

population_health %>% group_by(EVTYPE) %>% summarise (max_fatality=max(FATALITIES)) %>% arrange(desc(max_fatality)) %>% head(10)
## # A tibble: 10 x 2
##    EVTYPE                     max_fatality
##    <fct>                             <dbl>
##  1 HEAT                                583
##  2 TORNADO                             158
##  3 EXCESSIVE HEAT                       99
##  4 EXTREME HEAT                         57
##  5 HEAT WAVE                            33
##  6 TSUNAMI                              32
##  7 UNSEASONABLY WARM AND DRY            29
##  8 TORNADOES, TSTM WIND, HAIL           25
##  9 TROPICAL STORM                       22
## 10 FLASH FLOOD                          20
#To view the top 10 worst events regarding to injuiries

population_health %>% group_by(EVTYPE) %>% summarise (max_injury=max(INJURIES)) %>% arrange(desc(max_injury)) %>% head(10)
## # A tibble: 10 x 2
##    EVTYPE            max_injury
##    <fct>                  <dbl>
##  1 TORNADO                 1700
##  2 ICE STORM               1568
##  3 FLOOD                    800
##  4 HURRICANE/TYPHOON        780
##  5 EXCESSIVE HEAT           519
##  6 BLIZZARD                 385
##  7 HEAT                     230
##  8 HEAT WAVE                200
##  9 TROPICAL STORM           200
## 10 HEAVY SNOW               185
#To view the top 10 events regarding injuries and fatalities

maximum_hazard<-population_health %>% group_by(EVTYPE) %>% summarise (max_fatality=max(FATALITIES), max_injury=max(INJURIES)) %>% arrange(desc(max_fatality,max_injury)) %>% head(10)


# To plot the data
data<-gather(maximum_hazard,type,value,-EVTYPE)

ggplot(data, aes(EVTYPE, value)) + geom_bar(aes(fill = type), stat = "identity")+theme(axis.text.x = element_text(angle = 45,hjust = 1))+facet_grid(type~.)+scale_y_log10()+xlab("Event type")+ylab("Health damage value")
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 2 rows containing missing values (geom_bar).

so the worst event for fatality is Heat amd the worst for injuries is Tornado.

Worst events for economic damage

We view the worst events for property and crop damage and then plotting the data

#To determine the top events with maximum property damage
max_prop_damage<-economic_filtered_prop %>% group_by(EVTYPE) %>% summarise(max_propdamage=max(prop_damage)) %>% arrange(desc(max_propdamage)) %>% head(10)

max_prop_damage
## # A tibble: 10 x 2
##    EVTYPE                    max_propdamage
##    <fct>                              <dbl>
##  1 FLOOD                       115000000000
##  2 STORM SURGE                  31300000000
##  3 HURRICANE/TYPHOON            16930000000
##  4 TROPICAL STORM                5150000000
##  5 RIVER FLOOD                   5000000000
##  6 WINTER STORM                  5000000000
##  7 STORM SURGE/TIDE              4000000000
##  8 HURRICANE                     3000000000
##  9 TORNADO                       2800000000
## 10 HEAVY RAIN/SEVERE WEATHER     2500000000
#To determine the top events with maximum crop damage
max_crop_damage<-economic_filtered_crop %>% group_by(EVTYPE) %>% summarise(max_cropdamage=max(crop_damage)) %>% arrange(desc(max_cropdamage)) %>% head(10)

max_crop_damage
## # A tibble: 10 x 2
##    EVTYPE            max_cropdamage
##    <fct>                      <dbl>
##  1 ICE STORM             5000000000
##  2 RIVER FLOOD           5000000000
##  3 HURRICANE/TYPHOON     1510000000
##  4 DROUGHT               1000000000
##  5 EXTREME COLD           596000000
##  6 FLOOD                  500000000
##  7 HURRICANE              500000000
##  8 EXCESSIVE HEAT         492400000
##  9 HEAT                   400000000
## 10 FROST/FREEZE           286000000
max_eco_damage<- full_join(max_prop_damage,max_crop_damage,by="EVTYPE")

max_eco_damage
## # A tibble: 16 x 3
##    EVTYPE                    max_propdamage max_cropdamage
##    <fct>                              <dbl>          <dbl>
##  1 FLOOD                       115000000000      500000000
##  2 STORM SURGE                  31300000000             NA
##  3 HURRICANE/TYPHOON            16930000000     1510000000
##  4 TROPICAL STORM                5150000000             NA
##  5 RIVER FLOOD                   5000000000     5000000000
##  6 WINTER STORM                  5000000000             NA
##  7 STORM SURGE/TIDE              4000000000             NA
##  8 HURRICANE                     3000000000      500000000
##  9 TORNADO                       2800000000             NA
## 10 HEAVY RAIN/SEVERE WEATHER     2500000000             NA
## 11 ICE STORM                             NA     5000000000
## 12 DROUGHT                               NA     1000000000
## 13 EXTREME COLD                          NA      596000000
## 14 EXCESSIVE HEAT                        NA      492400000
## 15 HEAT                                  NA      400000000
## 16 FROST/FREEZE                          NA      286000000
#To plot the results

eco_data<-gather(max_eco_damage,type,value,-EVTYPE)

ggplot(eco_data, aes(EVTYPE, value)) + geom_bar(aes(fill = type), stat = "identity", position="dodge") +theme(axis.text.x = element_text(angle = 45,hjust = 1))+facet_grid(type~.)+scale_y_log10()+xlab("Event type")+ylab("Economic damage value")
## Warning: Removed 12 rows containing missing values (geom_bar).

so Ice storm is the worst cause of crop damage and Flood is the worst cause of property damage.