The objective of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
The NOAA database tracks characteristics of major weather events in the US, including when and where they occur, as well as estimates of fatalities, injuries, and property damage.
The analysis addresses two questions:
Q1: Across the US, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Q2: Across the US, which types of events have the greatest economic consequences?
Load packages
library(tidyverse)
library(skimr)
library(gridExtra)
theme_set(theme_bw())
Loading dataset
storm <- read.csv("repdata_data_StormData.csv", header=TRUE, sep=",")
dim(storm)
[1] 902297 37
skim(storm)
| Name | storm |
| Number of rows | 902297 |
| Number of columns | 37 |
| _______________________ | |
| Column type frequency: | |
| character | 18 |
| logical | 1 |
| numeric | 18 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| BGN_DATE | 0 | 1 | 16 | 18 | 0 | 16335 | 0 |
| BGN_TIME | 0 | 1 | 3 | 11 | 0 | 3608 | 0 |
| TIME_ZONE | 0 | 1 | 3 | 3 | 0 | 22 | 0 |
| COUNTYNAME | 0 | 1 | 0 | 200 | 1589 | 29601 | 0 |
| STATE | 0 | 1 | 2 | 2 | 0 | 72 | 0 |
| EVTYPE | 0 | 1 | 1 | 30 | 0 | 985 | 0 |
| BGN_AZI | 0 | 1 | 0 | 3 | 547332 | 35 | 0 |
| BGN_LOCATI | 0 | 1 | 0 | 21 | 287743 | 54429 | 0 |
| END_DATE | 0 | 1 | 0 | 18 | 243411 | 6663 | 0 |
| END_TIME | 0 | 1 | 0 | 12 | 238978 | 3647 | 0 |
| END_AZI | 0 | 1 | 0 | 3 | 724837 | 24 | 0 |
| END_LOCATI | 0 | 1 | 0 | 21 | 499225 | 34506 | 0 |
| PROPDMGEXP | 0 | 1 | 0 | 1 | 465934 | 19 | 0 |
| CROPDMGEXP | 0 | 1 | 0 | 1 | 618413 | 9 | 0 |
| WFO | 0 | 1 | 0 | 3 | 142069 | 542 | 0 |
| STATEOFFIC | 0 | 1 | 0 | 45 | 248769 | 250 | 0 |
| ZONENAMES | 0 | 1 | 0 | 7226 | 594029 | 25112 | 205988 |
| REMARKS | 0 | 1 | 0 | 41276 | 287433 | 436774 | 24658 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| COUNTYENDN | 902297 | 0 | NaN | : |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| STATE__ | 0 | 1.00 | 31.20 | 16.57 | 1 | 19 | 30 | 45.0 | 95 | ▆▇▇▁▁ |
| COUNTY | 0 | 1.00 | 100.64 | 107.28 | 0 | 31 | 75 | 131.0 | 873 | ▇▁▁▁▁ |
| BGN_RANGE | 0 | 1.00 | 1.48 | 5.48 | 0 | 0 | 0 | 1.0 | 3749 | ▇▁▁▁▁ |
| COUNTY_END | 0 | 1.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0.0 | 0 | ▁▁▇▁▁ |
| END_RANGE | 0 | 1.00 | 0.99 | 3.37 | 0 | 0 | 0 | 0.0 | 925 | ▇▁▁▁▁ |
| LENGTH | 0 | 1.00 | 0.23 | 4.62 | 0 | 0 | 0 | 0.0 | 2315 | ▇▁▁▁▁ |
| WIDTH | 0 | 1.00 | 7.50 | 61.57 | 0 | 0 | 0 | 0.0 | 4400 | ▇▁▁▁▁ |
| F | 843563 | 0.07 | 0.91 | 1.00 | 0 | 0 | 1 | 1.0 | 5 | ▇▂▁▁▁ |
| MAG | 0 | 1.00 | 46.90 | 61.91 | 0 | 0 | 50 | 75.0 | 22000 | ▇▁▁▁▁ |
| FATALITIES | 0 | 1.00 | 0.02 | 0.77 | 0 | 0 | 0 | 0.0 | 583 | ▇▁▁▁▁ |
| INJURIES | 0 | 1.00 | 0.16 | 5.43 | 0 | 0 | 0 | 0.0 | 1700 | ▇▁▁▁▁ |
| PROPDMG | 0 | 1.00 | 12.06 | 59.48 | 0 | 0 | 0 | 0.5 | 5000 | ▇▁▁▁▁ |
| CROPDMG | 0 | 1.00 | 1.53 | 22.17 | 0 | 0 | 0 | 0.0 | 990 | ▇▁▁▁▁ |
| LATITUDE | 47 | 1.00 | 2874.94 | 1657.65 | 0 | 2802 | 3540 | 4019.0 | 9706 | ▅▇▆▁▁ |
| LONGITUDE | 0 | 1.00 | 6939.54 | 3958.06 | -14451 | 7247 | 8707 | 9605.0 | 17124 | ▁▁▂▇▁ |
| LATITUDE_E | 40 | 1.00 | 1451.61 | 1858.73 | 0 | 0 | 0 | 3549.0 | 9706 | ▇▃▂▁▁ |
| LONGITUDE_ | 0 | 1.00 | 3509.14 | 4475.68 | -14455 | 0 | 0 | 8735.0 | 106220 | ▇▁▁▁▁ |
| REFNUM | 0 | 1.00 | 451149.00 | 260470.85 | 1 | 225575 | 451149 | 676723.0 | 902297 | ▇▇▇▇▇ |
Selecting variables
columns<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP","CROPDMG","CROPDMGEXP")
storm_df<-storm[columns]
dim(storm_df)
[1] 902297 7
names(storm_df)
[1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
[6] "CROPDMG" "CROPDMGEXP"
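If memory or load time is a concern, the same column selection could in principle be done while reading the file rather than afterwards. The following is a minimal sketch on a toy CSV, not the approach used in this report; the toy data and the names `keep`, `classes`, and `toy_subset` are assumptions made for illustration.

```r
# Toy CSV standing in for repdata_data_StormData.csv; values are illustrative.
toy <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1),
  REMARKS    = c("", "example remark")
)
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

keep <- c("EVTYPE", "FATALITIES")

# Read only the header row to learn the column order.
header <- names(read.csv(tmp, nrows = 1))

# "NULL" tells read.csv to skip a column; NA lets it guess the type.
classes <- ifelse(header %in% keep, NA, "NULL")
toy_subset <- read.csv(tmp, colClasses = classes)
names(toy_subset)
```

Skipped columns are never stored in the resulting data frame, which can matter for wide free-text fields such as REMARKS.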
Top 10 event types by number of fatalities
fatal <- aggregate(FATALITIES ~ EVTYPE, data = storm_df, FUN = sum)
top10_fatal <- fatal[order(-fatal$FATALITIES), ][1:10, ]
top10_fatal
EVTYPE FATALITIES
834 TORNADO 5633
130 EXCESSIVE HEAT 1903
153 FLASH FLOOD 978
275 HEAT 937
464 LIGHTNING 816
856 TSTM WIND 504
170 FLOOD 470
585 RIP CURRENT 368
359 HIGH WIND 248
19 AVALANCHE 224
Top 10 event types by number of injuries
injuries <- aggregate(INJURIES ~ EVTYPE, data = storm_df, FUN = sum)
top10_injuries <- injuries[order(-injuries$INJURIES), ][1:10, ]
top10_injuries
EVTYPE INJURIES
834 TORNADO 91346
856 TSTM WIND 6957
170 FLOOD 6789
130 EXCESSIVE HEAT 6525
464 LIGHTNING 5230
275 HEAT 2100
427 ICE STORM 1975
153 FLASH FLOOD 1777
760 THUNDERSTORM WIND 1488
244 HAIL 1361
Plot of the top ten event types by fatalities and by injuries.
g1 <- ggplot(top10_fatal, aes(y= (FATALITIES), x= reorder(EVTYPE, FATALITIES), fill = FATALITIES)) +
geom_col() +
ggtitle("Top ten fatalities by event") +
ylab('') +
xlab('Events') +
coord_flip()
g2 <- ggplot(top10_injuries, aes(y= (INJURIES), x= reorder(EVTYPE, INJURIES), fill = INJURIES)) +
geom_col() +
ggtitle("Top ten injuries by event") +
ylab('') +
xlab('Events') +
coord_flip()
grid.arrange(g1, g2)
Next, we analyze which weather events have the greatest economic impact on property and crops. The magnitude of each damage estimate is encoded in PROPDMGEXP and CROPDMGEXP (for example, K for thousands, M for millions, B for billions). These codes are converted to numeric multipliers and applied to PROPDMG and CROPDMG to obtain damage amounts in US dollars.
tmp_PROPDMG <- plyr::mapvalues(storm_df$PROPDMGEXP,
c("K","M","", "B","m","+","0","5","6","?","4","2","3","h","7","H","-","1","8"),
c(1e3,1e6, 1, 1e9,1e6, 1, 1,1e5,1e6, 1,1e4,1e2,1e3, 1,1e7,1e2, 1, 10,1e8))
tmp_CROPDMG <- plyr::mapvalues(storm_df$CROPDMGEXP,
c("","M","K","m","B","?","0","k","2"),
c( 1,1e6,1e3,1e6,1e9,1,1,1e3,1e2))
storm_df$TOTAL_PROPDMG <- as.numeric(tmp_PROPDMG) * storm_df$PROPDMG
storm_df$TOTAL_CROPDMG <- as.numeric(tmp_CROPDMG) * storm_df$CROPDMG
# Show column names
colnames(storm_df)
[1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG"
[5] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "TOTAL_PROPDMG"
[9] "TOTAL_CROPDMG"
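As a worked example of the conversion, a PROPDMG of 25 with exponent K corresponds to 25 × 1,000 = 25,000 USD. The sketch below expresses the same idea with a named lookup vector on toy data. It is an alternative illustration, not the code used for the results in this report: `exp_lookup` and `exp_to_multiplier` are hypothetical names, and unlike the mapping above it treats letter codes case-insensitively (so "h" maps to 100) and sends any unrecognized code to 1.

```r
# Multipliers keyed by upper-cased exponent code. Digits 0-8 map to powers
# of ten, mirroring the mapping above; treating "h" like "H" is an
# assumption of this sketch.
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, setNames(10^(0:8), 0:8))

# Hypothetical helper: unknown or empty codes fall back to a multiplier of 1.
exp_to_multiplier <- function(code) {
  mult <- unname(exp_lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Toy rows standing in for storm_df; values are illustrative only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "", "k"),
  stringsAsFactors = FALSE
)
toy$TOTAL_PROPDMG <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

Keeping the multipliers numeric from the start avoids the round trip through character values and `as.numeric()`.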
Aggregating property and crop damage by event type
storm_df$TOTALDMG <- storm_df$TOTAL_PROPDMG + storm_df$TOTAL_CROPDMG
prop_damage <- aggregate(TOTAL_PROPDMG ~ EVTYPE, data=storm_df, sum)
crop_damage <- aggregate(TOTAL_CROPDMG ~ EVTYPE, data=storm_df, sum)
total_damage <- aggregate(TOTALDMG ~ EVTYPE, data=storm_df, sum)
crop_damage <- arrange(crop_damage, desc(TOTAL_CROPDMG), EVTYPE)[1:10, ]
prop_damage <- arrange(prop_damage, desc(TOTAL_PROPDMG), EVTYPE)[1:10, ]
total_damage <- arrange(total_damage, desc(TOTALDMG), EVTYPE)[1:10, ]
# Convert EVTYPE to a factor, keeping the sorted order of the levels
prop_damage$EVTYPE <- factor(prop_damage$EVTYPE, levels = prop_damage$EVTYPE)
crop_damage$EVTYPE <- factor(crop_damage$EVTYPE, levels = crop_damage$EVTYPE)
total_damage$EVTYPE <- factor(total_damage$EVTYPE, levels = total_damage$EVTYPE)
total_damage
EVTYPE TOTALDMG
1 FLOOD 150319678257
2 HURRICANE/TYPHOON 71913712800
3 TORNADO 57362333947
4 STORM SURGE 43323541000
5 HAIL 18761221986
6 FLASH FLOOD 18243991079
7 DROUGHT 15018672000
8 HURRICANE 14610229010
9 RIVER FLOOD 10148404500
10 ICE STORM 8967041360
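Totals of this size are easier to compare when expressed in billions of USD. A minimal sketch, using the top three rows of the table above; the names `top3` and `TOTALDMG_BN` are illustrative assumptions.

```r
# The top three rows of total_damage as printed above.
top3 <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TOTALDMG = c(150319678257, 71913712800, 57362333947)
)

# Rescale to billions of USD and round to one decimal place.
top3$TOTALDMG_BN <- round(top3$TOTALDMG / 1e9, 1)
top3
```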
ggplot(total_damage, aes(x = reorder(EVTYPE,TOTALDMG), y = TOTALDMG, fill= TOTALDMG)) +
geom_col() +
ggtitle('Total Damages in USD') +
xlab('') +
ylab('') +
theme(legend.position = "none") +
scale_fill_viridis_c() +
coord_flip()
prop_damage
EVTYPE TOTAL_PROPDMG
1 FLOOD 144657709807
2 HURRICANE/TYPHOON 69305840000
3 TORNADO 56947380677
4 STORM SURGE 43323536000
5 FLASH FLOOD 16822673979
6 HAIL 15735267513
7 HURRICANE 11868319010
8 TROPICAL STORM 7703890550
9 WINTER STORM 6688497251
10 HIGH WIND 5270046295
ggplot(prop_damage, aes(x = reorder(EVTYPE,TOTAL_PROPDMG), y = TOTAL_PROPDMG, fill= TOTAL_PROPDMG)) +
geom_col() +
ggtitle('Total Property Damages in USD') +
xlab('') +
ylab('') +
theme(legend.position = "none") +
scale_fill_viridis_c() +
coord_flip()
crop_damage
EVTYPE TOTAL_CROPDMG
1 DROUGHT 13972566000
2 FLOOD 5661968450
3 RIVER FLOOD 5029459000
4 ICE STORM 5022113500
5 HAIL 3025954473
6 HURRICANE 2741910000
7 HURRICANE/TYPHOON 2607872800
8 FLASH FLOOD 1421317100
9 EXTREME COLD 1292973000
10 FROST/FREEZE 1094086000
ggplot(crop_damage, aes(x = reorder(EVTYPE,TOTAL_CROPDMG), y = TOTAL_CROPDMG, fill= TOTAL_CROPDMG)) +
geom_col() +
ggtitle('Total Crop Damages in USD') +
xlab('') +
ylab('') +
theme(legend.position = "none") +
scale_fill_viridis_c() +
coord_flip()
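Question 1: Across the US, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Tornadoes are by far the most harmful event type, with 5,633 fatalities and 91,346 injuries. Excessive heat ranks second for fatalities (1,903) and TSTM WIND ranks second for injuries (6,957).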
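Question 2: Across the US, which types of events have the greatest economic consequences?
Floods have the greatest economic consequences, with about 150 billion USD in combined property and crop damage, followed by hurricanes/typhoons (about 72 billion USD) and tornadoes (about 57 billion USD). For crops alone, drought causes the most damage (about 14 billion USD).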
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Mexico.1252 LC_CTYPE=Spanish_Mexico.1252
[3] LC_MONETARY=Spanish_Mexico.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Mexico.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.3 skimr_2.1.3 forcats_0.5.1 stringr_1.4.0
[5] dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
[9] tibble_3.1.2 ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 lubridate_1.7.10 assertthat_0.2.1 digest_0.6.27
[5] utf8_1.2.1 R6_2.5.0 cellranger_1.1.0 plyr_1.8.6
[9] repr_1.1.3 backports_1.2.1 reprex_2.0.0 evaluate_0.14
[13] httr_1.4.2 highr_0.9 pillar_1.6.1 rlang_0.4.11
[17] readxl_1.3.1 rstudioapi_0.13 jquerylib_0.1.4 rmarkdown_2.9
[21] labeling_0.4.2 munsell_0.5.0 broom_0.7.8 compiler_4.1.0
[25] modelr_0.1.8 xfun_0.24 pkgconfig_2.0.3 base64enc_0.1-3
[29] htmltools_0.5.1.1 tidyselect_1.1.1 fansi_0.5.0 viridisLite_0.4.0
[33] crayon_1.4.1 dbplyr_2.1.1 withr_2.4.2 grid_4.1.0
[37] jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1
[41] magrittr_2.0.1 scales_1.1.1 cli_3.0.0 stringi_1.6.2
[45] farver_2.1.0 fs_1.5.0 xml2_1.3.2 bslib_0.2.5.1
[49] ellipsis_0.3.2 generics_0.1.0 vctrs_0.3.8 tools_4.1.0
[53] glue_1.4.2 hms_1.1.0 yaml_2.2.1 colorspace_2.0-2
[57] rvest_1.0.0 knitr_1.33 haven_2.4.1 sass_0.4.0