Exploring the NOAA Storm Database: US Severe Weather Impacts on Health and Economics

Assignment

The objective of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

Background

The NOAA database tracks characteristics of major weather events in the US, including when and where they occur, as well as estimates of fatalities, injuries, and property damage.

Objective

Explore the NOAA Storm Database to help answer important questions about severe weather events.

Questions

Q1: Across the US, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Q2: Across the US, which types of events have the greatest economic consequences?

Load packages

library(tidyverse)
library(skimr)
library(gridExtra)
theme_set(theme_bw())

Loading dataset

storm <- read.csv("repdata_data_StormData.csv", header=TRUE, sep=",")
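
Only seven of the 37 columns are used later in the analysis, so it may be worth skipping the rest at read time. The following is a minimal sketch on a small in-memory CSV, not the real file: `csv_text` and `demo` are illustrative names, and the three-column layout is an assumption made for the example. Setting an entry of `colClasses` to "NULL" tells `read.csv` to drop that column.

```r
# Toy CSV standing in for the real file (assumed layout, for illustration only)
csv_text <- "EVTYPE,FATALITIES,REMARKS\nTORNADO,3,some text\nHAIL,0,\n"

# "NULL" in colClasses skips the corresponding column while reading
demo <- read.csv(text = csv_text, header = TRUE,
                 colClasses = c("character", "numeric", "NULL"))

names(demo)  # only EVTYPE and FATALITIES are kept
```

For the real file, `colClasses` would need one entry per column in file order, which is why this is shown only on toy data.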

Exploratory Data Analysis

dim(storm)
[1] 902297     37
skim(storm)
Data summary
Name storm
Number of rows 902297
Number of columns 37
_______________________
Column type frequency:
character 18
logical 1
numeric 18
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
BGN_DATE 0 1 16 18 0 16335 0
BGN_TIME 0 1 3 11 0 3608 0
TIME_ZONE 0 1 3 3 0 22 0
COUNTYNAME 0 1 0 200 1589 29601 0
STATE 0 1 2 2 0 72 0
EVTYPE 0 1 1 30 0 985 0
BGN_AZI 0 1 0 3 547332 35 0
BGN_LOCATI 0 1 0 21 287743 54429 0
END_DATE 0 1 0 18 243411 6663 0
END_TIME 0 1 0 12 238978 3647 0
END_AZI 0 1 0 3 724837 24 0
END_LOCATI 0 1 0 21 499225 34506 0
PROPDMGEXP 0 1 0 1 465934 19 0
CROPDMGEXP 0 1 0 1 618413 9 0
WFO 0 1 0 3 142069 542 0
STATEOFFIC 0 1 0 45 248769 250 0
ZONENAMES 0 1 0 7226 594029 25112 205988
REMARKS 0 1 0 41276 287433 436774 24658

Variable type: logical

skim_variable n_missing complete_rate mean count
COUNTYENDN 902297 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
STATE__ 0 1.00 31.20 16.57 1 19 30 45.0 95 ▆▇▇▁▁
COUNTY 0 1.00 100.64 107.28 0 31 75 131.0 873 ▇▁▁▁▁
BGN_RANGE 0 1.00 1.48 5.48 0 0 0 1.0 3749 ▇▁▁▁▁
COUNTY_END 0 1.00 0.00 0.00 0 0 0 0.0 0 ▁▁▇▁▁
END_RANGE 0 1.00 0.99 3.37 0 0 0 0.0 925 ▇▁▁▁▁
LENGTH 0 1.00 0.23 4.62 0 0 0 0.0 2315 ▇▁▁▁▁
WIDTH 0 1.00 7.50 61.57 0 0 0 0.0 4400 ▇▁▁▁▁
F 843563 0.07 0.91 1.00 0 0 1 1.0 5 ▇▂▁▁▁
MAG 0 1.00 46.90 61.91 0 0 50 75.0 22000 ▇▁▁▁▁
FATALITIES 0 1.00 0.02 0.77 0 0 0 0.0 583 ▇▁▁▁▁
INJURIES 0 1.00 0.16 5.43 0 0 0 0.0 1700 ▇▁▁▁▁
PROPDMG 0 1.00 12.06 59.48 0 0 0 0.5 5000 ▇▁▁▁▁
CROPDMG 0 1.00 1.53 22.17 0 0 0 0.0 990 ▇▁▁▁▁
LATITUDE 47 1.00 2874.94 1657.65 0 2802 3540 4019.0 9706 ▅▇▆▁▁
LONGITUDE 0 1.00 6939.54 3958.06 -14451 7247 8707 9605.0 17124 ▁▁▂▇▁
LATITUDE_E 40 1.00 1451.61 1858.73 0 0 0 3549.0 9706 ▇▃▂▁▁
LONGITUDE_ 0 1.00 3509.14 4475.68 -14455 0 0 8735.0 106220 ▇▁▁▁▁
REFNUM 0 1.00 451149.00 260470.85 1 225575 451149 676723.0 902297 ▇▇▇▇▇

Analysis

Q1: Across the US, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Selecting variables

columns<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP","CROPDMG","CROPDMGEXP")
storm_df<-storm[columns]
dim(storm_df)
[1] 902297      7
names(storm_df)
[1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
[6] "CROPDMG"    "CROPDMGEXP"

Top 10 events that caused the most fatalities

fatal <- aggregate(FATALITIES ~ EVTYPE, data = storm_df, FUN = sum)
top10_fatal <- fatal[order(-fatal$FATALITIES), ][1:10, ] 
top10_fatal 
            EVTYPE FATALITIES
834        TORNADO       5633
130 EXCESSIVE HEAT       1903
153    FLASH FLOOD        978
275           HEAT        937
464      LIGHTNING        816
856      TSTM WIND        504
170          FLOOD        470
585    RIP CURRENT        368
359      HIGH WIND        248
19       AVALANCHE        224
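
The aggregate-then-sort pattern used here is repeated for other columns, so it could be wrapped in a small helper. This is a sketch in base R on toy data; `top_n_events` and `toy` are hypothetical names that do not appear in the original analysis.

```r
# Hypothetical helper: sum one numeric column per EVTYPE and keep the n largest totals
top_n_events <- function(df, value_col, n = 10) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col
  head(totals[order(-totals[[value_col]]), ], n)
}

# Toy data, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 2, 4, 1),
  INJURIES   = c(10, 0, 5, 2),
  stringsAsFactors = FALSE
)

top_fatal <- top_n_events(toy, "FATALITIES", n = 2)
top_fatal
```

The same call with "INJURIES" as `value_col` would give the injuries ranking.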

Top 10 events that caused the most injuries

injuries <- aggregate(INJURIES ~ EVTYPE, data = storm_df, FUN = sum)
top10_injuries <- injuries[order(-injuries$INJURIES), ][1:10, ] 
top10_injuries 
               EVTYPE INJURIES
834           TORNADO    91346
856         TSTM WIND     6957
170             FLOOD     6789
130    EXCESSIVE HEAT     6525
464         LIGHTNING     5230
275              HEAT     2100
427         ICE STORM     1975
153       FLASH FLOOD     1777
760 THUNDERSTORM WIND     1488
244              HAIL     1361
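
The table lists TSTM WIND and THUNDERSTORM WIND as separate rows, which suggests that some EVTYPE labels overlap. One possible way to merge such variants before aggregating is sketched below on toy data. The cleaning rules (upper-casing, trimming, expanding a leading "TSTM") are assumptions made for illustration, and `clean_evtype` is a hypothetical helper, not part of the original analysis.

```r
# Toy data with label variants, for illustration only
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "Thunderstorm Wind", " THUNDERSTORM WIND", "HAIL"),
  INJURIES = c(5, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Hypothetical cleaner: normalize case and whitespace, expand the TSTM abbreviation
clean_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("\\s+", " ", x)
  sub("^TSTM", "THUNDERSTORM", x)
}

toy$EVTYPE_CLEAN <- clean_evtype(toy$EVTYPE)
merged <- aggregate(INJURIES ~ EVTYPE_CLEAN, data = toy, FUN = sum)
merged
```

Applying rules like these to the full dataset would change the rankings shown above, so the printed tables reflect the raw labels only.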

Plots of the top ten event types by fatalities and by injuries

# Horizontal bar chart of the ten event types with the most fatalities
g1 <- ggplot(top10_fatal, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES, fill = FATALITIES)) +
  geom_col() +
  ggtitle("Top ten fatalities by event") +
  ylab('') +
  xlab('Events') +
  coord_flip()

# Same chart for injuries
g2 <- ggplot(top10_injuries, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES, fill = INJURIES)) +
  geom_col() +
  ggtitle("Top ten injuries by event") +
  ylab('') +
  xlab('Events') +
  coord_flip()

# Stack the two plots in a single figure
grid.arrange(g1, g2)

Q2: Across the US, which types of events have the greatest economic consequences?

This section identifies the weather events with the greatest economic impact on property and crops. The PROPDMGEXP and CROPDMGEXP columns hold magnitude codes for the PROPDMG and CROPDMG amounts (for example, K for thousands, M for millions, B for billions). The code below converts these codes to numeric multipliers and applies them to obtain damage estimates in USD.

tmp_PROPDMG <- plyr::mapvalues(storm_df$PROPDMGEXP,
                          c("K","M","", "B","m","+","0","5","6","?","4","2","3","h","7","H","-","1","8"), 
                          c(1e3,1e6, 1, 1e9,1e6,  1,  1,1e5,1e6,  1,1e4,1e2,1e3,  1,1e7,1e2,  1, 10,1e8))

tmp_CROPDMG <- plyr::mapvalues(storm_df$CROPDMGEXP,
                          c("","M","K","m","B","?","0","k","2"),
                          c( 1,1e6,1e3,1e6,1e9,1,1,1e3,1e2))

storm_df$TOTAL_PROPDMG <- as.numeric(tmp_PROPDMG) * storm_df$PROPDMG
storm_df$TOTAL_CROPDMG <- as.numeric(tmp_CROPDMG) * storm_df$CROPDMG

# Show column names
colnames(storm_df)
[1] "EVTYPE"        "FATALITIES"    "INJURIES"      "PROPDMG"      
[5] "PROPDMGEXP"    "CROPDMG"       "CROPDMGEXP"    "TOTAL_PROPDMG"
[9] "TOTAL_CROPDMG"
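
The exponent-to-multiplier step can also be sketched in base R with a named lookup vector. The example uses toy data. `exp_multiplier` and `toy` are illustrative names, and treating every code missing from the lookup as a multiplier of 1 is an assumption of this sketch (the mapping above assigns powers of ten to the digit codes instead).

```r
# Lookup from magnitude code to multiplier (letter codes only, as an assumption)
exp_multiplier <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9)

# Toy data, for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Codes absent from the lookup (including the empty string) come back as NA
mult <- unname(exp_multiplier[toy$PROPDMGEXP])
mult[is.na(mult)] <- 1  # assumption: unknown or blank code means "no scaling"

toy$TOTAL_PROPDMG <- toy$PROPDMG * mult
toy
```

A lookup vector keeps the code-to-multiplier pairs side by side, which makes inconsistent entries easier to spot than in two parallel vectors.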

Aggregating property and crop damage by event type

storm_df$TOTALDMG <- storm_df$TOTAL_PROPDMG + storm_df$TOTAL_CROPDMG

# Sum damage by event type
prop_damage <- aggregate(TOTAL_PROPDMG ~ EVTYPE, data = storm_df, sum)
crop_damage <- aggregate(TOTAL_CROPDMG ~ EVTYPE, data = storm_df, sum)
total_damage <- aggregate(TOTALDMG ~ EVTYPE, data = storm_df, sum)

# Keep the ten event types with the largest totals
crop_damage <- arrange(crop_damage, desc(TOTAL_CROPDMG), EVTYPE)[1:10, ]
prop_damage <- arrange(prop_damage, desc(TOTAL_PROPDMG), EVTYPE)[1:10, ]
total_damage <- arrange(total_damage, desc(TOTALDMG), EVTYPE)[1:10, ]

# Convert EVTYPE to a factor, with levels in descending order of damage
prop_damage$EVTYPE <- factor(prop_damage$EVTYPE, levels = prop_damage$EVTYPE)
crop_damage$EVTYPE <- factor(crop_damage$EVTYPE, levels = crop_damage$EVTYPE)
total_damage$EVTYPE <- factor(total_damage$EVTYPE, levels = total_damage$EVTYPE)

Total Damages (in USD)

total_damage
              EVTYPE     TOTALDMG
1              FLOOD 150319678257
2  HURRICANE/TYPHOON  71913712800
3            TORNADO  57362333947
4        STORM SURGE  43323541000
5               HAIL  18761221986
6        FLASH FLOOD  18243991079
7            DROUGHT  15018672000
8          HURRICANE  14610229010
9        RIVER FLOOD  10148404500
10         ICE STORM   8967041360
ggplot(total_damage, aes(x = reorder(EVTYPE,TOTALDMG), y = TOTALDMG, fill= TOTALDMG)) + 
  geom_col() + 
  ggtitle('Total Damages in USD') +
  xlab('') +
  ylab('') + 
  theme(legend.position = "none") +
  scale_fill_viridis_c() + 
  coord_flip()
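
Totals of this size are easier to read in billions of USD. A minimal sketch, using the first three rows of the table above, rescales the column. `total_damage_demo` and `BILLIONS` are illustrative names that do not appear in the original analysis.

```r
# First three rows of the total-damage table, copied for illustration
total_damage_demo <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TOTALDMG = c(150319678257, 71913712800, 57362333947),
  stringsAsFactors = FALSE
)

# Rescale to billions of USD, rounded to one decimal place
total_damage_demo$BILLIONS <- round(total_damage_demo$TOTALDMG / 1e9, 1)
total_damage_demo$BILLIONS
# 150.3  71.9  57.4
```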

Property damages (in USD)

prop_damage
              EVTYPE TOTAL_PROPDMG
1              FLOOD  144657709807
2  HURRICANE/TYPHOON   69305840000
3            TORNADO   56947380677
4        STORM SURGE   43323536000
5        FLASH FLOOD   16822673979
6               HAIL   15735267513
7          HURRICANE   11868319010
8     TROPICAL STORM    7703890550
9       WINTER STORM    6688497251
10         HIGH WIND    5270046295
ggplot(prop_damage, aes(x = reorder(EVTYPE,TOTAL_PROPDMG), y = TOTAL_PROPDMG, fill= TOTAL_PROPDMG)) + 
  geom_col() + 
  ggtitle('Total Property Damages in USD') +
  xlab('') +
  ylab('') + 
  theme(legend.position = "none") +
  scale_fill_viridis_c() + 
  coord_flip()

Crop damages (in USD)

crop_damage
              EVTYPE TOTAL_CROPDMG
1            DROUGHT   13972566000
2              FLOOD    5661968450
3        RIVER FLOOD    5029459000
4          ICE STORM    5022113500
5               HAIL    3025954473
6          HURRICANE    2741910000
7  HURRICANE/TYPHOON    2607872800
8        FLASH FLOOD    1421317100
9       EXTREME COLD    1292973000
10      FROST/FREEZE    1094086000
ggplot(crop_damage, aes(x = reorder(EVTYPE,TOTAL_CROPDMG), y = TOTAL_CROPDMG, fill= TOTAL_CROPDMG)) + 
  geom_col() + 
  ggtitle('Total Crop Damages in USD') +
  xlab('') +
  ylab('') + 
  theme(legend.position = "none") +
  scale_fill_viridis_c() + 
  coord_flip()
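
As a sanity check, the total for an event type should equal its property damage plus its crop damage. The sketch below verifies this for three event types that appear in all three tables, with the values copied from the printed output. `check` is an illustrative name.

```r
# Values copied from the printed tables above
check <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "HAIL"),
  PROP   = c(144657709807, 69305840000, 15735267513),
  CROP   = c(5661968450, 2607872800, 3025954473),
  TOTAL  = c(150319678257, 71913712800, 18761221986),
  stringsAsFactors = FALSE
)

# Property plus crop damage should reproduce the reported total
check$MATCHES <- (check$PROP + check$CROP) == check$TOTAL
check[, c("EVTYPE", "MATCHES")]
```

All three rows match, which is consistent with TOTALDMG being computed as the sum of the two components.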

Results

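Question 1: Across the US, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? Tornadoes are by far the most harmful event type, with 5,633 fatalities and 91,346 injuries recorded. Excessive heat ranks second for fatalities (1,903), and TSTM WIND ranks second for injuries (6,957).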

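Question 2: Across the US, which types of events have the greatest economic consequences? Floods have the greatest combined property and crop damage (about 150.3 billion USD), followed by hurricanes/typhoons (about 71.9 billion USD) and tornadoes (about 57.4 billion USD). Floods also lead in property damage, while drought causes the most crop damage (about 14.0 billion USD).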

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Mexico.1252  LC_CTYPE=Spanish_Mexico.1252   
[3] LC_MONETARY=Spanish_Mexico.1252 LC_NUMERIC=C                   
[5] LC_TIME=Spanish_Mexico.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gridExtra_2.3   skimr_2.1.3     forcats_0.5.1   stringr_1.4.0  
 [5] dplyr_1.0.7     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3    
 [9] tibble_3.1.2    ggplot2_3.3.5   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        lubridate_1.7.10  assertthat_0.2.1  digest_0.6.27    
 [5] utf8_1.2.1        R6_2.5.0          cellranger_1.1.0  plyr_1.8.6       
 [9] repr_1.1.3        backports_1.2.1   reprex_2.0.0      evaluate_0.14    
[13] httr_1.4.2        highr_0.9         pillar_1.6.1      rlang_0.4.11     
[17] readxl_1.3.1      rstudioapi_0.13   jquerylib_0.1.4   rmarkdown_2.9    
[21] labeling_0.4.2    munsell_0.5.0     broom_0.7.8       compiler_4.1.0   
[25] modelr_0.1.8      xfun_0.24         pkgconfig_2.0.3   base64enc_0.1-3  
[29] htmltools_0.5.1.1 tidyselect_1.1.1  fansi_0.5.0       viridisLite_0.4.0
[33] crayon_1.4.1      dbplyr_2.1.1      withr_2.4.2       grid_4.1.0       
[37] jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.0   DBI_1.1.1        
[41] magrittr_2.0.1    scales_1.1.1      cli_3.0.0         stringi_1.6.2    
[45] farver_2.1.0      fs_1.5.0          xml2_1.3.2        bslib_0.2.5.1    
[49] ellipsis_0.3.2    generics_0.1.0    vctrs_0.3.8       tools_4.1.0      
[53] glue_1.4.2        hms_1.1.0         yaml_2.2.1        colorspace_2.0-2 
[57] rvest_1.0.0       knitr_1.33        haven_2.4.1       sass_0.4.0