Storms and other weather events cause public health and economic losses for communities and municipalities. Many such events result in fatalities and injuries as well as property and crop damage. Mitigating such outcomes to the extent possible is the highest priority.
The goal of this study was to identify the most hazardous weather events in terms of population health and economic loss in the U.S., based on data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
The storm database covers weather events in the U.S. from 1950 through 2011. For each weather event it records the number of fatalities and injuries as well as the economic damage to property and crops.
The data for fatalities and injuries were used to determine weather events which were the most harmful to population health. Property damage and crop damage cost data were used to determine which weather events had the greatest economic consequences.
This analysis shows that tornadoes and wind were the most dangerous weather events in terms of deaths and injuries, with tornadoes causing almost 100,000 deaths and injuries during the 61 years covered in this study. In terms of economic damage, floods were the most costly, resulting in over $150 billion in losses.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This report presents an analysis of severe weather events in the United States from 1950 to 2011, focusing on their impacts on both public health and the economy.
This analysis was conducted using RStudio, an integrated development environment (IDE) for R programming. RStudio provides a powerful platform for conducting data analysis, visualization, and reporting using the R programming language.
To conduct the analysis, several R packages were used. These packages provide functions and tools for data manipulation, visualization, and statistical analysis. The following R packages were loaded for this analysis:
if (!require(ggplot2)) {
install.packages('ggplot2')
library(ggplot2)
}
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.2.3
The code snippet above is a common R idiom for loading a package with an additional check that it is installed. Each part works as follows:
if (!require(ggplot2)) { }: require() tries to load the package ggplot2 and returns FALSE if it is not available in the R environment. In that case the condition !require(ggplot2) evaluates to TRUE.
install.packages('ggplot2'): Inside the conditional block, this line installs the ggplot2 package from CRAN (the Comprehensive R Archive Network), which is the official repository for R packages.
library(ggplot2): After installation, this line loads the ggplot2 package so that its functions are accessible in the R script. If ggplot2 was already installed, require() has already loaded it and the conditional block is skipped.
This practice reduces the risk that an R script fails because of a missing package: if the required package is not installed, it is installed automatically. Such checks are generally recommended, especially if you’re sharing your script with others who may not have the required packages installed.
It’s a good practice to include similar checks for other packages you plan to use as well.
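The repeated check can also be wrapped in a small helper. The sketch below is one possible way to do this, not the code used in this report: `ensure_packages` is a hypothetical name, and it uses `requireNamespace()` to test for a package without attaching it first.

```r
# Hypothetical helper (not part of the original analysis): install each
# package only when it is missing, then attach it.
ensure_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE when the package cannot be found
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # character.only = TRUE lets library() accept the name as a string
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Base packages are used here so the example runs without network access.
loaded <- ensure_packages(c("stats", "utils"))
```

For this report the call would presumably list ggplot2, dplyr, lubridate, knitr, and readr instead of the base packages shown here.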
if (!require(dplyr)) {
install.packages('dplyr')
library(dplyr, warn.conflicts = FALSE)
}
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 4.2.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!require(lubridate)) {
install.packages('lubridate')
library(lubridate, warn.conflicts = FALSE)
}
## Loading required package: lubridate
## Loading required package: timechange
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
if (!require(knitr)) {
install.packages('knitr')
library(knitr, warn.conflicts = FALSE)
}
## Loading required package: knitr
## Warning: package 'knitr' was built under R version 4.2.3
if (!require(readr)) {
install.packages('readr')
library(readr, warn.conflicts = FALSE)
}
## Loading required package: readr
## Warning: package 'readr' was built under R version 4.2.3
#
sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readr_2.1.5 knitr_1.43 lubridate_1.9.0 timechange_0.2.0
## [5] dplyr_1.1.4 ggplot2_3.4.4
##
## loaded via a namespace (and not attached):
## [1] bslib_0.4.1 compiler_4.2.2 pillar_1.9.0 jquerylib_0.1.4
## [5] tools_4.2.2 digest_0.6.30 jsonlite_1.8.4 evaluate_0.18
## [9] lifecycle_1.0.3 tibble_3.2.1 gtable_0.3.1 pkgconfig_2.0.3
## [13] rlang_1.1.0 cli_3.4.1 rstudioapi_0.14 yaml_2.3.6
## [17] xfun_0.39 fastmap_1.1.0 withr_2.5.0 hms_1.1.2
## [21] generics_0.1.3 vctrs_0.6.5 sass_0.4.2 grid_4.2.2
## [25] tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.3
## [29] rmarkdown_2.18 tzdb_0.3.0 magrittr_2.0.3 ellipsis_0.3.2
## [33] scales_1.2.1 htmltools_0.5.3 colorspace_2.0-3 utf8_1.2.2
## [37] munsell_0.5.0 cachem_1.0.6
#
Before proceeding with the analysis, the R environment needs to be set up. This involves loading the dataset containing severe weather event data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) database and preparing it for analysis.
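As a side note, R can read a bzip2-compressed CSV through a connection, without writing a decompressed copy to disk. The sketch below illustrates this on a throwaway file with invented rows; `tmp`, `con`, and `toy` are hypothetical names, and the steps this report actually ran are the ones listed after it.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (invented data).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# Read it back through a bzfile() connection; no intermediate .csv is written.
toy <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)

# Remove the temporary file.
unlink(tmp)
```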
##
# Define the filename for the compressed data
stormDat_filename <- 'repdata_data_StormData.csv.bz2'
# Check if the compressed file exists, if not, download it
if (!file.exists(stormDat_filename)) {
fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
download.file(fileURL, stormDat_filename, mode = "wb")
}
# Decompress the file if it exists
if (file.exists(stormDat_filename)) {
stormDat_decompressed <- 'StormData.csv'
if (!file.exists(stormDat_decompressed)) {
# Open a text-mode connection; the name 'bz_con' avoids masking bzfile()
bz_con <- bzfile(stormDat_filename, "rt")
csv_content <- readLines(bz_con)
close(bz_con)
writeLines(csv_content, stormDat_decompressed)
}
}
# Set the working directory
setwd('C:/Users/default user.DESKTOP-JJNHJML/Documents/data')
# Load readr, which provides read_csv()
library(readr)
# Define the path to the CSV file
file_path <- 'C:/Users/default user.DESKTOP-JJNHJML/Documents/data/StormData.csv'
file_path
## [1] "C:/Users/default user.DESKTOP-JJNHJML/Documents/data/StormData.csv"
# Read the data only if 'stmdata' does not already exist in the workspace
if (!exists('stmdata')) {
stmdata <- read_csv(file_path)
}
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl (1): COUNTYENDN
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# See what we have in 'stmdata'
head(stmdata)
## # A tibble: 6 × 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/1950… 0130 CST 97 MOBILE AL TORNA… 0
## 2 1 4/18/1950… 0145 CST 3 BALDWIN AL TORNA… 0
## 3 1 2/20/1951… 1600 CST 57 FAYETTE AL TORNA… 0
## 4 1 6/8/1951 … 0900 CST 89 MADISON AL TORNA… 0
## 5 1 11/15/195… 1500 CST 43 CULLMAN AL TORNA… 0
## 6 1 11/15/195… 2000 CST 77 LAUDERDALE AL TORNA… 0
## # ℹ 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## # END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## # END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <dbl>,
## # MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## # PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## # STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
dim(stmdata)
## [1] 902297 37
str(stmdata)
## spc_tbl_ [902,297 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ STATE__ : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr [1:902297] "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
## $ COUNTY : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr [1:902297] "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr [1:902297] NA NA NA NA ...
## $ BGN_LOCATI: chr [1:902297] NA NA NA NA ...
## $ END_DATE : chr [1:902297] NA NA NA NA ...
## $ END_TIME : chr [1:902297] NA NA NA NA ...
## $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
## $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr [1:902297] NA NA NA NA ...
## $ END_LOCATI: chr [1:902297] NA NA NA NA ...
## $ LENGTH : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
## $ F : num [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
## $ CROPDMG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr [1:902297] NA NA NA NA ...
## $ WFO : chr [1:902297] NA NA NA NA ...
## $ STATEOFFIC: chr [1:902297] NA NA NA NA ...
## $ ZONENAMES : chr [1:902297] NA NA NA NA ...
## $ LATITUDE : num [1:902297] 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
## $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
## $ REMARKS : chr [1:902297] NA NA NA NA ...
## $ REFNUM : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, "spec")=
## .. cols(
## .. STATE__ = col_double(),
## .. BGN_DATE = col_character(),
## .. BGN_TIME = col_character(),
## .. TIME_ZONE = col_character(),
## .. COUNTY = col_double(),
## .. COUNTYNAME = col_character(),
## .. STATE = col_character(),
## .. EVTYPE = col_character(),
## .. BGN_RANGE = col_double(),
## .. BGN_AZI = col_character(),
## .. BGN_LOCATI = col_character(),
## .. END_DATE = col_character(),
## .. END_TIME = col_character(),
## .. COUNTY_END = col_double(),
## .. COUNTYENDN = col_logical(),
## .. END_RANGE = col_double(),
## .. END_AZI = col_character(),
## .. END_LOCATI = col_character(),
## .. LENGTH = col_double(),
## .. WIDTH = col_double(),
## .. F = col_double(),
## .. MAG = col_double(),
## .. FATALITIES = col_double(),
## .. INJURIES = col_double(),
## .. PROPDMG = col_double(),
## .. PROPDMGEXP = col_character(),
## .. CROPDMG = col_double(),
## .. CROPDMGEXP = col_character(),
## .. WFO = col_character(),
## .. STATEOFFIC = col_character(),
## .. ZONENAMES = col_character(),
## .. LATITUDE = col_double(),
## .. LONGITUDE = col_double(),
## .. LATITUDE_E = col_double(),
## .. LONGITUDE_ = col_double(),
## .. REMARKS = col_character(),
## .. REFNUM = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
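BGN_DATE is read as text, so any check of the period covered needs a date conversion first. The sketch below is a minimal illustration using base R, assuming every value follows the month/day/year pattern of the three sample values copied from the str() output above; `raw_dates`, `bgn`, and `years` are hypothetical names.

```r
# Sample values copied from the str() output above.
raw_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "6/8/1951 0:00:00")

# Assumed format: month/day/year followed by a time of day.
bgn <- as.Date(raw_dates, format = "%m/%d/%Y %H:%M:%S")

# Extract the year to check which period the events fall in.
years <- format(bgn, "%Y")
print(range(years))
```

The lubridate package loaded earlier offers parsers for the same task; base R is used here only to keep the example self-contained.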
We see we have a ‘solid’ dataframe named ‘stmdata’ so we can…
The following are the column names (variable names) for the “National Climatic Data Center Storm Events” dataset and their respective definitions. I need to choose the variables from this list that will help me answer the following questions: 1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2. Across the United States, which types of events have the greatest economic consequences?
##### STATE__: State code (e.g., numerical or abbreviation).
##### BGN_DATE: Beginning date of the event.
##### BGN_TIME: Beginning time of the event.
##### TIME_ZONE: Time zone of the event.
##### COUNTY: County code or identifier.
##### COUNTYNAME: Name of the county.
##### STATE: State abbreviation or name.
##### EVTYPE: Type of event (e.g., tornado, flood, hurricane).
##### BGN_RANGE: Range of the beginning location of the event.
##### BGN_AZI: Azimuth of the beginning location of the event.
##### BGN_LOCATI: Location description of the beginning of the event.
##### END_DATE: End date of the event.
##### END_TIME: End time of the event.
##### COUNTY_END: End county code or identifier.
##### COUNTYENDN: End county number.
##### END_RANGE: Range of the end location of the event.
##### END_AZI: Azimuth of the end location of the event.
##### END_LOCATI: Location description of the end of the event.
##### LENGTH: Length of the event’s path (e.g., a tornado track).
##### WIDTH: Width of the event’s path.
##### F: Fujita scale (F-scale) rating for tornado events.
##### MAG: Magnitude of the event (e.g., hail size or wind speed).
##### FATALITIES: Number of fatalities.
##### INJURIES: Number of injuries.
##### PROPDMG: Property damage estimate.
##### PROPDMGEXP: Property damage exponent (e.g., K for thousands, M for millions).
##### CROPDMG: Crop damage estimate.
##### CROPDMGEXP: Crop damage exponent.
##### WFO: Weather Forecast Office responsible for the event.
##### STATEOFFIC: State office issuing the event report.
##### ZONENAMES: Zone names affected by the event.
##### LATITUDE: Latitude of the event location.
##### LONGITUDE: Longitude of the event location.
##### LATITUDE_E: Latitude of the event’s end location.
##### LONGITUDE_: Longitude of the event’s end location.
##### REMARKS: Additional remarks or details about the event.
##### REFNUM: Reference number or identifier
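The exponent columns can be turned into dollar amounts by mapping each code to a multiplier. The sketch below is a minimal illustration on invented rows, not this report's own cleaning step. It assumes that K, M, and B stand for thousands, millions, and billions (B is an assumption beyond the list above) and that a missing or unrecognised exponent means the value is already in dollars. The names `toy`, `exp_multiplier`, and `PROPDMG_USD` are hypothetical.

```r
# Toy rows invented for illustration; they are not taken from the storm data.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", NA),
  stringsAsFactors = FALSE
)

# Assumed mapping from exponent code to multiplier.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier for each row; unmatched codes come back as NA.
mult <- unname(exp_multiplier[toupper(toy$PROPDMGEXP)])

# Assumption: a missing or unrecognised code leaves the amount unchanged.
mult[is.na(mult)] <- 1

toy$PROPDMG_USD <- toy$PROPDMG * mult
print(toy)
```

The same mapping would apply to CROPDMG and CROPDMGEXP. How to treat codes outside K, M, and B is a judgment call that should be checked against the dataset documentation.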
Our assignment: Your (my) data analysis must address the following questions: 1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2. Across the United States, which types of events have the greatest economic consequences?
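One way to approach the first question is to add fatalities and injuries for each event and total them by event type. The sketch below is a minimal illustration on invented rows. `toy`, `CASUALTIES`, and `harm` are hypothetical names, and weighting deaths and injuries equally is an assumption.

```r
# Toy rows invented for illustration; they are not taken from the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 3, 0),
  INJURIES   = c(14, 2, 0, 1, 5),
  stringsAsFactors = FALSE
)

# Combine deaths and injuries into one count per event.
toy$CASUALTIES <- toy$FATALITIES + toy$INJURIES

# Total the counts by event type and sort from most to least harmful.
harm <- aggregate(CASUALTIES ~ EVTYPE, data = toy, FUN = sum)
harm <- harm[order(-harm$CASUALTIES), ]
print(harm)
```

The second question can be handled the same way by summing property and crop damage instead of casualties.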
This report is written for a government or municipal manager who is responsible for preparing for severe weather events and for allocating resources efficiently across event categories. While this report quantifies the harm done by storms, it does not recommend specific follow-up actions.
# Looking more closely at our 'stmdata' dataframe;
## we need to look at the column 'EVTYPE'......
#
## 'EVTYPE' looks like the 'main feature' we need for our analysis.
#
## How many 'unique' values in 'stmdata$EVTYPE' ??
length(unique(stmdata$EVTYPE))
## [1] 977
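Many of these distinct values differ only in letter case or spacing. The sketch below shows one possible normalisation on six labels copied from the EVTYPE listing in this report. The choice of rules (upper-casing, trimming, collapsing repeated spaces) is an assumption, and the names `labels` and `normalised` are hypothetical.

```r
# Labels copied from the EVTYPE listing in this report.
labels <- c("Heavy Rain", "HEAVY RAIN", "Strong Winds",
            "Strong winds", "blowing snow", "BLOWING SNOW")

# Assumed normalisation: collapse runs of whitespace, trim, then upper-case.
normalised <- toupper(trimws(gsub("\\s+", " ", labels)))

# Compare the number of distinct labels before and after.
print(length(unique(labels)))
print(length(unique(normalised)))
```

Misspellings such as "TORNDAO" or "LIGNTNING" would need additional, more deliberate rules.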
#
stmdata_vals <- unique(stmdata$EVTYPE)
print(stmdata_vals)
## [1] "TORNADO" "TSTM WIND"
## [3] "HAIL" "FREEZING RAIN"
## [5] "SNOW" "ICE STORM/FLASH FLOOD"
## [7] "SNOW/ICE" "WINTER STORM"
## [9] "HURRICANE OPAL/HIGH WINDS" "THUNDERSTORM WINDS"
## [11] "RECORD COLD" "HURRICANE ERIN"
## [13] "HURRICANE OPAL" "HEAVY RAIN"
## [15] "LIGHTNING" "THUNDERSTORM WIND"
## [17] "DENSE FOG" "RIP CURRENT"
## [19] "THUNDERSTORM WINS" "FLASH FLOOD"
## [21] "FLASH FLOODING" "HIGH WINDS"
## [23] "FUNNEL CLOUD" "TORNADO F0"
## [25] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS/HAIL"
## [27] "HEAT" "WIND"
## [29] "LIGHTING" "HEAVY RAINS"
## [31] "LIGHTNING AND HEAVY RAIN" "FUNNEL"
## [33] "WALL CLOUD" "FLOODING"
## [35] "THUNDERSTORM WINDS HAIL" "FLOOD"
## [37] "COLD" "HEAVY RAIN/LIGHTNING"
## [39] "FLASH FLOODING/THUNDERSTORM WI" "WALL CLOUD/FUNNEL CLOUD"
## [41] "THUNDERSTORM" "WATERSPOUT"
## [43] "EXTREME COLD" "HAIL 1.75)"
## [45] "LIGHTNING/HEAVY RAIN" "HIGH WIND"
## [47] "BLIZZARD" "BLIZZARD WEATHER"
## [49] "WIND CHILL" "BREAKUP FLOODING"
## [51] "HIGH WIND/BLIZZARD" "RIVER FLOOD"
## [53] "HEAVY SNOW" "FREEZE"
## [55] "COASTAL FLOOD" "HIGH WIND AND HIGH TIDES"
## [57] "HIGH WIND/BLIZZARD/FREEZING RA" "HIGH TIDES"
## [59] "HIGH WIND AND HEAVY SNOW" "RECORD COLD AND HIGH WIND"
## [61] "RECORD HIGH TEMPERATURE" "RECORD HIGH"
## [63] "HIGH WINDS HEAVY RAINS" "HIGH WIND/ BLIZZARD"
## [65] "ICE STORM" "BLIZZARD/HIGH WIND"
## [67] "HIGH WIND/LOW WIND CHILL" "HEAVY SNOW/HIGH"
## [69] "RECORD LOW" "HIGH WINDS AND WIND CHILL"
## [71] "HEAVY SNOW/HIGH WINDS/FREEZING" "LOW TEMPERATURE RECORD"
## [73] "AVALANCHE" "MARINE MISHAP"
## [75] "WIND CHILL/HIGH WIND" "HIGH WIND/WIND CHILL/BLIZZARD"
## [77] "HIGH WIND/WIND CHILL" "HIGH WIND/HEAVY SNOW"
## [79] "HIGH TEMPERATURE RECORD" "FLOOD WATCH/"
## [81] "RECORD HIGH TEMPERATURES" "HIGH WIND/SEAS"
## [83] "HIGH WINDS/HEAVY RAIN" "HIGH SEAS"
## [85] "SEVERE TURBULENCE" "RECORD RAINFALL"
## [87] "RECORD SNOWFALL" "RECORD WARMTH"
## [89] "HEAVY SNOW/WIND" "EXTREME HEAT"
## [91] "WIND DAMAGE" "DUST STORM"
## [93] "APACHE COUNTY" "SLEET"
## [95] "HAIL STORM" "FUNNEL CLOUDS"
## [97] "FLASH FLOODS" "DUST DEVIL"
## [99] "EXCESSIVE HEAT" "THUNDERSTORM WINDS/FUNNEL CLOU"
## [101] "WINTER STORM/HIGH WIND" "WINTER STORM/HIGH WINDS"
## [103] "GUSTY WINDS" "STRONG WINDS"
## [105] "FLOODING/HEAVY RAIN" "SNOW AND WIND"
## [107] "HEAVY SURF COASTAL FLOODING" "HEAVY SURF"
## [109] "HEAVY PRECIPATATION" "URBAN FLOODING"
## [111] "HIGH SURF" "BLOWING DUST"
## [113] "URBAN/SMALL" "WILD FIRES"
## [115] "HIGH" "URBAN/SMALL FLOODING"
## [117] "WATER SPOUT" "HIGH WINDS DUST STORM"
## [119] "WINTER STORM HIGH WINDS" "LOCAL FLOOD"
## [121] "WINTER STORMS" "MUDSLIDES"
## [123] "RAINSTORM" "SEVERE THUNDERSTORM"
## [125] "SEVERE THUNDERSTORMS" "SEVERE THUNDERSTORM WINDS"
## [127] "THUNDERSTORMS WINDS" "DRY MICROBURST"
## [129] "FLOOD/FLASH FLOOD" "FLOOD/RAIN/WINDS"
## [131] "WINDS" "DRY MICROBURST 61"
## [133] "THUNDERSTORMS" "FLASH FLOOD WINDS"
## [135] "URBAN/SMALL STREAM FLOODING" "MICROBURST"
## [137] "STRONG WIND" "HIGH WIND DAMAGE"
## [139] "STREAM FLOODING" "URBAN AND SMALL"
## [141] "HEAVY SNOWPACK" "ICE"
## [143] "FLASH FLOOD/" "DOWNBURST"
## [145] "GUSTNADO AND" "FLOOD/RAIN/WIND"
## [147] "WET MICROBURST" "DOWNBURST WINDS"
## [149] "DRY MICROBURST WINDS" "DRY MIRCOBURST WINDS"
## [151] "DRY MICROBURST 53" "SMALL STREAM URBAN FLOOD"
## [153] "MICROBURST WINDS" "HIGH WINDS 57"
## [155] "DRY MICROBURST 50" "HIGH WINDS 66"
## [157] "HIGH WINDS 76" "HIGH WINDS 63"
## [159] "HIGH WINDS 67" "BLIZZARD/HEAVY SNOW"
## [161] "HEAVY SNOW/HIGH WINDS" "BLOWING SNOW"
## [163] "HIGH WINDS 82" "HIGH WINDS 80"
## [165] "HIGH WINDS 58" "FREEZING DRIZZLE"
## [167] "LIGHTNING THUNDERSTORM WINDSS" "DRY MICROBURST 58"
## [169] "HAIL 75" "HIGH WINDS 73"
## [171] "HIGH WINDS 55" "LIGHT SNOW AND SLEET"
## [173] "URBAN FLOOD" "DRY MICROBURST 84"
## [175] "THUNDERSTORM WINDS 60" "HEAVY RAIN/FLOODING"
## [177] "THUNDERSTORM WINDSS" "TORNADOS"
## [179] "GLAZE" "RECORD HEAT"
## [181] "COASTAL FLOODING" "HEAT WAVE"
## [183] "FIRST SNOW" "FREEZING RAIN AND SLEET"
## [185] "UNSEASONABLY DRY" "UNSEASONABLY WET"
## [187] "WINTRY MIX" "WINTER WEATHER"
## [189] "UNSEASONABLY COLD" "EXTREME/RECORD COLD"
## [191] "RIP CURRENTS HEAVY SURF" "SLEET/RAIN/SNOW"
## [193] "UNSEASONABLY WARM" "DROUGHT"
## [195] "NORMAL PRECIPITATION" "HIGH WINDS/FLOODING"
## [197] "DRY" "RAIN/SNOW"
## [199] "SNOW/RAIN/SLEET" "WATERSPOUT/TORNADO"
## [201] "WATERSPOUTS" "WATERSPOUT TORNADO"
## [203] "URBAN/SMALL STREAM FLOOD" "STORM SURGE"
## [205] "WATERSPOUT-TORNADO" "WATERSPOUT-"
## [207] "TORNADOES, TSTM WIND, HAIL" "TROPICAL STORM ALBERTO"
## [209] "TROPICAL STORM" "TROPICAL STORM GORDON"
## [211] "TROPICAL STORM JERRY" "LIGHTNING THUNDERSTORM WINDS"
## [213] "WAYTERSPOUT" "MINOR FLOODING"
## [215] "LIGHTNING INJURY" "URBAN/SMALL STREAM FLOOD"
## [217] "LIGHTNING AND THUNDERSTORM WIN" "THUNDERSTORM WINDS53"
## [219] "URBAN AND SMALL STREAM FLOOD" "URBAN AND SMALL STREAM"
## [221] "WILDFIRE" "DAMAGING FREEZE"
## [223] "THUNDERSTORM WINDS 13" "SMALL HAIL"
## [225] "HEAVY SNOW/HIGH WIND" "HURRICANE"
## [227] "WILD/FOREST FIRE" "SMALL STREAM FLOODING"
## [229] "MUD SLIDE" "LIGNTNING"
## [231] "FROST" "FREEZING RAIN/SNOW"
## [233] "HIGH WINDS/" "THUNDERSNOW"
## [235] "FLOODS" "EXTREME WIND CHILLS"
## [237] "COOL AND WET" "HEAVY RAIN/SNOW"
## [239] "SMALL STREAM AND URBAN FLOODIN" "SMALL STREAM/URBAN FLOOD"
## [241] "SNOW/SLEET/FREEZING RAIN" "SEVERE COLD"
## [243] "GLAZE ICE" "COLD WAVE"
## [245] "EARLY SNOW" "SMALL STREAM AND URBAN FLOOD"
## [247] "HIGH WINDS" "RURAL FLOOD"
## [249] "SMALL STREAM AND" "MUD SLIDES"
## [251] "HAIL 80" "EXTREME WIND CHILL"
## [253] "COLD AND WET CONDITIONS" "EXCESSIVE WETNESS"
## [255] "GRADIENT WINDS" "HEAVY SNOW/BLOWING SNOW"
## [257] "SLEET/ICE STORM" "THUNDERSTORM WINDS URBAN FLOOD"
## [259] "THUNDERSTORM WINDS SMALL STREA" "ROTATING WALL CLOUD"
## [261] "LARGE WALL CLOUD" "COLD AIR FUNNEL"
## [263] "GUSTNADO" "COLD AIR FUNNELS"
## [265] "BLOWING SNOW- EXTREME WIND CHI" "SNOW AND HEAVY SNOW"
## [267] "GROUND BLIZZARD" "MAJOR FLOOD"
## [269] "SNOW/HEAVY SNOW" "FREEZING RAIN/SLEET"
## [271] "ICE JAM FLOODING" "SNOW- HIGH WIND- WIND CHILL"
## [273] "STREET FLOOD" "COLD AIR TORNADO"
## [275] "SMALL STREAM FLOOD" "FOG"
## [277] "THUNDERSTORM WINDS 2" "FUNNEL CLOUD/HAIL"
## [279] "ICE/SNOW" "TSTM WIND 51"
## [281] "TSTM WIND 50" "TSTM WIND 52"
## [283] "TSTM WIND 55" "HEAVY SNOW/BLIZZARD"
## [285] "THUNDERSTORM WINDS 61" "HAIL 0.75"
## [287] "THUNDERSTORM DAMAGE" "THUNDERTORM WINDS"
## [289] "HAIL 1.00" "HAIL/WINDS"
## [291] "SNOW AND ICE" "WIND STORM"
## [293] "SNOWSTORM" "GRASS FIRES"
## [295] "LAKE FLOOD" "PROLONG COLD"
## [297] "HAIL/WIND" "HAIL 1.75"
## [299] "THUNDERSTORMW 50" "WIND/HAIL"
## [301] "SNOW AND ICE STORM" "URBAN AND SMALL STREAM FLOODIN"
## [303] "THUNDERSTORMS WIND" "THUNDERSTORM WINDS"
## [305] "HEAVY SNOW/SLEET" "AGRICULTURAL FREEZE"
## [307] "DROUGHT/EXCESSIVE HEAT" "TUNDERSTORM WIND"
## [309] "TROPICAL STORM DEAN" "THUNDERTSORM WIND"
## [311] "THUNDERSTORM WINDS/ HAIL" "THUNDERSTORM WIND/LIGHTNING"
## [313] "HEAVY RAIN/SEVERE WEATHER" "THUNDESTORM WINDS"
## [315] "WATERSPOUT/ TORNADO" "LIGHTNING."
## [317] "WARM DRY CONDITIONS" "HURRICANE-GENERATED SWELLS"
## [319] "HEAVY SNOW/ICE STORM" "RIVER AND STREAM FLOOD"
## [321] "HIGH WIND 63" "COASTAL SURGE"
## [323] "HEAVY SNOW AND ICE STORM" "MINOR FLOOD"
## [325] "HIGH WINDS/COASTAL FLOOD" "RAIN"
## [327] "RIVER FLOODING" "SNOW/RAIN"
## [329] "ICE FLOES" "HIGH WAVES"
## [331] "SNOW SQUALLS" "SNOW SQUALL"
## [333] "THUNDERSTORM WIND G50" "LIGHTNING FIRE"
## [335] "BLIZZARD/FREEZING RAIN" "HEAVY LAKE SNOW"
## [337] "HEAVY SNOW/FREEZING RAIN" "LAKE EFFECT SNOW"
## [339] "HEAVY WET SNOW" "DUST DEVIL WATERSPOUT"
## [341] "THUNDERSTORM WINDS/HEAVY RAIN" "THUNDERSTROM WINDS"
## [343] "THUNDERSTORM WINDS LE CEN" "HAIL 225"
## [345] "BLIZZARD AND HEAVY SNOW" "HEAVY SNOW AND ICE"
## [347] "ICE STORM AND SNOW" "HEAVY SNOW ANDBLOWING SNOW"
## [349] "HEAVY SNOW/ICE" "BLIZZARD AND EXTREME WIND CHIL"
## [351] "LOW WIND CHILL" "BLOWING SNOW & EXTREME WIND CH"
## [353] "WATERSPOUT/" "URBAN/SMALL STREAM"
## [355] "TORNADO F3" "FUNNEL CLOUD."
## [357] "TORNDAO" "HAIL 0.88"
## [359] "FLOOD/RIVER FLOOD" "MUD SLIDES URBAN FLOODING"
## [361] "TORNADO F1" "THUNDERSTORM WINDS G"
## [363] "DEEP HAIL" "GLAZE/ICE STORM"
## [365] "HEAVY SNOW/WINTER STORM" "AVALANCE"
## [367] "BLIZZARD/WINTER STORM" "DUST STORM/HIGH WINDS"
## [369] "ICE JAM" "FOREST FIRES"
## [371] "THUNDERSTORM WIND G60" "FROST\\FREEZE"
## [373] "THUNDERSTORM WINDS." "HAIL 88"
## [375] "HAIL 175" "HVY RAIN"
## [377] "HAIL 100" "HAIL 150"
## [379] "HAIL 075" "THUNDERSTORM WIND G55"
## [381] "HAIL 125" "THUNDERSTORM WINDS G60"
## [383] "HARD FREEZE" "HAIL 200"
## [385] "THUNDERSTORM WINDS FUNNEL CLOU" "THUNDERSTORM WINDS 62"
## [387] "WILDFIRES" "RECORD HEAT WAVE"
## [389] "HEAVY SNOW AND HIGH WINDS" "HEAVY SNOW/HIGH WINDS & FLOOD"
## [391] "HAIL FLOODING" "THUNDERSTORM WINDS/FLASH FLOOD"
## [393] "HIGH WIND 70" "WET SNOW"
## [395] "HEAVY RAIN AND FLOOD" "LOCAL FLASH FLOOD"
## [397] "THUNDERSTORM WINDS 53" "FLOOD/FLASH FLOODING"
## [399] "TORNADO/WATERSPOUT" "RAIN AND WIND"
## [401] "THUNDERSTORM WIND 59" "THUNDERSTORM WIND 52"
## [403] "COASTAL/TIDAL FLOOD" "SNOW/ICE STORM"
## [405] "BELOW NORMAL PRECIPITATION" "RIP CURRENTS/HEAVY SURF"
## [407] "FLASH FLOOD/FLOOD" "EXCESSIVE RAIN"
## [409] "RECORD/EXCESSIVE HEAT" "HEAT WAVES"
## [411] "LIGHT SNOW" "THUNDERSTORM WIND 69"
## [413] "HAIL DAMAGE" "LIGHTNING DAMAGE"
## [415] "RECORD TEMPERATURES" "LIGHTNING AND WINDS"
## [417] "FOG AND COLD TEMPERATURES" "OTHER"
## [419] "RECORD SNOW" "SNOW/COLD"
## [421] "FLASH FLOOD FROM ICE JAMS" "TSTM WIND G58"
## [423] "MUDSLIDE" "HEAVY SNOW SQUALLS"
## [425] "HEAVY SNOW/SQUALLS" "HEAVY SNOW-SQUALLS"
## [427] "ICY ROADS" "HEAVY MIX"
## [429] "SNOW FREEZING RAIN" "LACK OF SNOW"
## [431] "SNOW/SLEET" "SNOW/FREEZING RAIN"
## [433] "SNOW DROUGHT" "THUNDERSTORMW WINDS"
## [435] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WIND 65MPH"
## [437] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WIND/AWNING"
## [439] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND TREES"
## [441] "TORRENTIAL RAIN" "TORNADO F2"
## [443] "RIP CURRENTS" "HURRICANE EMILY"
## [445] "HURRICANE GORDON" "HURRICANE FELIX"
## [447] "THUNDERSTORM WIND 59 MPH" "THUNDERSTORM WINDS 63 MPH"
## [449] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM DAMAGE TO"
## [451] "THUNDERSTORM WIND 65 MPH" "FLASH FLOOD - HEAVY RAIN"
## [453] "THUNDERSTORM WIND." "FLASH FLOOD/ STREET"
## [455] "THUNDERSTORM WIND 59 MPH." "HEAVY SNOW FREEZING RAIN"
## [457] "DAM FAILURE" "THUNDERSTORM HAIL"
## [459] "HAIL 088" "THUNDERSTORM WINDSHAIL"
## [461] "LIGHTNING WAUSEON" "THUDERSTORM WINDS"
## [463] "ICE AND SNOW" "RECORD COLD/FROST"
## [465] "STORM FORCE WINDS" "FREEZING RAIN AND SNOW"
## [467] "FREEZING RAIN SLEET AND" "SOUTHEAST"
## [469] "HEAVY SNOW & ICE" "FREEZING DRIZZLE AND FREEZING"
## [471] "THUNDERSTORM WINDS AND" "HAIL/ICY ROADS"
## [473] "FLASH FLOOD/HEAVY RAIN" "HEAVY RAIN; URBAN FLOOD WINDS;"
## [475] "HEAVY PRECIPITATION" "TSTM WIND DAMAGE"
## [477] "HIGH WATER" "FLOOD FLASH"
## [479] "RAIN/WIND" "THUNDERSTORM WINDS 50"
## [481] "THUNDERSTORM WIND G52" "FLOOD FLOOD/FLASH"
## [483] "THUNDERSTORM WINDS 52" "SNOW SHOWERS"
## [485] "THUNDERSTORM WIND G51" "HEAT WAVE DROUGHT"
## [487] "HEAVY SNOW/BLIZZARD/AVALANCHE" "RECORD SNOW/COLD"
## [489] "WET WEATHER" "UNSEASONABLY WARM AND DRY"
## [491] "FREEZING RAIN SLEET AND LIGHT" "RECORD/EXCESSIVE RAINFALL"
## [493] "TIDAL FLOOD" "BEACH EROSIN"
## [495] "THUNDERSTORM WIND G61" "FLOOD/FLASH"
## [497] "LOW TEMPERATURE" "SLEET & FREEZING RAIN"
## [499] "HEAVY RAINS/FLOODING" "THUNDERESTORM WINDS"
## [501] "THUNDERSTORM WINDS/FLOODING" "THUNDEERSTORM WINDS"
## [503] "HIGHWAY FLOODING" "THUNDERSTORM W INDS"
## [505] "HYPOTHERMIA" "FLASH FLOOD/ FLOOD"
## [507] "THUNDERSTORM WIND 50" "THUNERSTORM WINDS"
## [509] "HEAVY RAIN/MUDSLIDES/FLOOD" "MUD/ROCK SLIDE"
## [511] "HIGH WINDS/COLD" "BEACH EROSION/COASTAL FLOOD"
## [513] "COLD/WINDS" "SNOW/ BITTER COLD"
## [515] "THUNDERSTORM WIND 56" "SNOW SLEET"
## [517] "DRY HOT WEATHER" "COLD WEATHER"
## [519] "RAPIDLY RISING WATER" "HAIL ALOFT"
## [521] "EARLY FREEZE" "ICE/STRONG WINDS"
## [523] "EXTREME WIND CHILL/BLOWING SNO" "SNOW/HIGH WINDS"
## [525] "HIGH WINDS/SNOW" "EARLY FROST"
## [527] "SNOWMELT FLOODING" "HEAVY SNOW AND STRONG WINDS"
## [529] "SNOW ACCUMULATION" "BLOWING SNOW/EXTREME WIND CHIL"
## [531] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [533] "TORNADOES" "THUNDERSTORM WIND/HAIL"
## [535] "FLASH FLOODING/FLOOD" "HAIL 275"
## [537] "HAIL 450" "FLASH FLOOODING"
## [539] "EXCESSIVE RAINFALL" "THUNDERSTORMW"
## [541] "HAILSTORM" "TSTM WINDS"
## [543] "BEACH FLOOD" "HAILSTORMS"
## [545] "TSTMW" "FUNNELS"
## [547] "TSTM WIND 65)" "THUNDERSTORM WINDS/ FLOOD"
## [549] "HEAVY RAINFALL" "HEAT/DROUGHT"
## [551] "HEAT DROUGHT" "NEAR RECORD SNOW"
## [553] "LANDSLIDE" "HIGH WIND AND SEAS"
## [555] "THUNDERSTORMWINDS" "THUNDERSTORM WINDS HEAVY RAIN"
## [557] "SLEET/SNOW" "EXCESSIVE"
## [559] "SNOW/SLEET/RAIN" "WILD/FOREST FIRES"
## [561] "HEAVY SEAS" "DUSTSTORM"
## [563] "FLOOD & HEAVY RAIN" "?"
## [565] "THUNDERSTROM WIND" "FLOOD/FLASHFLOOD"
## [567] "SNOW AND COLD" "HOT PATTERN"
## [569] "PROLONG COLD/SNOW" "BRUSH FIRES"
## [571] "SNOW\\COLD" "WINTER MIX"
## [573] "EXCESSIVE PRECIPITATION" "SNOWFALL RECORD"
## [575] "HOT/DRY PATTERN" "DRY PATTERN"
## [577] "MILD/DRY PATTERN" "MILD PATTERN"
## [579] "LANDSLIDES" "HEAVY SHOWERS"
## [581] "HEAVY SNOW AND" "HIGH WIND 48"
## [583] "LAKE-EFFECT SNOW" "BRUSH FIRE"
## [585] "WATERSPOUT FUNNEL CLOUD" "URBAN SMALL STREAM FLOOD"
## [587] "SAHARAN DUST" "HEAVY SHOWER"
## [589] "URBAN FLOOD LANDSLIDE" "HEAVY SWELLS"
## [591] "URBAN SMALL" "URBAN FLOODS"
## [593] "SMALL STREAM" "HEAVY RAIN/URBAN FLOOD"
## [595] "FLASH FLOOD/LANDSLIDE" "LANDSLIDE/URBAN FLOOD"
## [597] "HEAVY RAIN/SMALL STREAM URBAN" "FLASH FLOOD LANDSLIDES"
## [599] "EXTREME WINDCHILL" "URBAN/SML STREAM FLD"
## [601] "TSTM WIND/HAIL" "Other"
## [603] "Record dry month" "Temperature record"
## [605] "Minor Flooding" "Ice jam flood (minor"
## [607] "High Wind" "Tstm Wind"
## [609] "ROUGH SURF" "Wind"
## [611] "Heavy Surf" "Dust Devil"
## [613] "Wind Damage" "Marine Accident"
## [615] "Snow" "Freeze"
## [617] "Snow Squalls" "Coastal Flooding"
## [619] "Heavy Rain" "Strong Wind"
## [621] "COASTAL STORM" "COASTALFLOOD"
## [623] "Erosion/Cstl Flood" "Heavy Rain and Wind"
## [625] "Light Snow/Flurries" "Wet Month"
## [627] "Wet Year" "Tidal Flooding"
## [629] "River Flooding" "Damaging Freeze"
## [631] "Beach Erosion" "Hot and Dry"
## [633] "Flood/Flash Flood" "Icy Roads"
## [635] "High Surf" "Heavy Rain/High Surf"
## [637] "Thunderstorm Wind" "Rain Damage"
## [639] "Unseasonable Cold" "Early Frost"
## [641] "Wintry Mix" "blowing snow"
## [643] "STREET FLOODING" "Record Cold"
## [645] "Extreme Cold" "Ice Fog"
## [647] "Excessive Cold" "Torrential Rainfall"
## [649] "Freezing Rain" "Landslump"
## [651] "Late-season Snowfall" "Hurricane Edouard"
## [653] "Coastal Storm" "Flood"
## [655] "HEAVY RAIN/WIND" "TIDAL FLOODING"
## [657] "Winter Weather" "Snow squalls"
## [659] "Strong Winds" "Strong winds"
## [661] "RECORD WARM TEMPS." "Ice/Snow"
## [663] "Mudslide" "Glaze"
## [665] "Extended Cold" "Snow Accumulation"
## [667] "Freezing Fog" "Drifting Snow"
## [669] "Whirlwind" "Heavy snow shower"
## [671] "Heavy rain" "LATE SNOW"
## [673] "Record May Snow" "Record Winter Snow"
## [675] "Heavy Precipitation" "Record temperature"
## [677] "Light snow" "Late Season Snowfall"
## [679] "Gusty Wind" "small hail"
## [681] "Light Snow" "MIXED PRECIP"
## [683] "Black Ice" "Mudslides"
## [685] "Gradient wind" "Snow and Ice"
## [687] "Freezing Spray" "Summary Jan 17"
## [689] "Summary of March 14" "Summary of March 23"
## [691] "Summary of March 24" "Summary of April 3rd"
## [693] "Summary of April 12" "Summary of April 13"
## [695] "Summary of April 21" "Summary August 11"
## [697] "Summary of April 27" "Summary of May 9-10"
## [699] "Summary of May 10" "Summary of May 13"
## [701] "Summary of May 14" "Summary of May 22 am"
## [703] "Summary of May 22 pm" "Heatburst"
## [705] "Summary of May 26 am" "Summary of May 26 pm"
## [707] "Metro Storm, May 26" "Summary of May 31 am"
## [709] "Summary of May 31 pm" "Summary of June 3"
## [711] "Summary of June 4" "Summary June 5-6"
## [713] "Summary June 6" "Summary of June 11"
## [715] "Summary of June 12" "Summary of June 13"
## [717] "Summary of June 15" "Summary of June 16"
## [719] "Summary June 18-19" "Summary of June 23"
## [721] "Summary of June 24" "Summary of June 30"
## [723] "Summary of July 2" "Summary of July 3"
## [725] "Summary of July 11" "Summary of July 22"
## [727] "Summary July 23-24" "Summary of July 26"
## [729] "Summary of July 29" "Summary of August 1"
## [731] "Summary August 2-3" "Summary August 7"
## [733] "Summary August 9" "Summary August 10"
## [735] "Summary August 17" "Summary August 21"
## [737] "Summary August 28" "Summary September 4"
## [739] "Summary September 20" "Summary September 23"
## [741] "Summary Sept. 25-26" "Summary: Oct. 20-21"
## [743] "Summary: October 31" "Summary: Nov. 6-7"
## [745] "Summary: Nov. 16" "Microburst"
## [747] "wet micoburst" "Hail(0.75)"
## [749] "Funnel Cloud" "Urban Flooding"
## [751] "No Severe Weather" "Urban flood"
## [753] "Urban Flood" "Cold"
## [755] "Summary of May 22" "Summary of June 6"
## [757] "Summary August 4" "Summary of June 10"
## [759] "Summary of June 18" "Summary September 3"
## [761] "Summary: Sept. 18" "Coastal Flood"
## [763] "coastal flooding" "Small Hail"
## [765] "Record Temperatures" "Light Snowfall"
## [767] "Freezing Drizzle" "Gusty wind/rain"
## [769] "GUSTY WIND/HVY RAIN" "Blowing Snow"
## [771] "Early snowfall" "Monthly Snowfall"
## [773] "Record Heat" "Seasonal Snowfall"
## [775] "Monthly Rainfall" "Cold Temperature"
## [777] "Sml Stream Fld" "Heat Wave"
## [779] "MUDSLIDE/LANDSLIDE" "Saharan Dust"
## [781] "Volcanic Ash" "Volcanic Ash Plume"
## [783] "Thundersnow shower" "NONE"
## [785] "COLD AND SNOW" "DAM BREAK"
## [787] "TSTM WIND (G45)" "SLEET/FREEZING RAIN"
## [789] "BLACK ICE" "BLOW-OUT TIDES"
## [791] "UNSEASONABLY COOL" "TSTM HEAVY RAIN"
## [793] "Gusty Winds" "GUSTY WIND"
## [795] "TSTM WIND 40" "TSTM WIND 45"
## [797] "TSTM WIND (41)" "TSTM WIND (G40)"
## [799] "TSTM WND" "Wintry mix"
## [801] "Frost" "Frost/Freeze"
## [803] "RAIN (HEAVY)" "Record Warmth"
## [805] "Prolong Cold" "Cold and Frost"
## [807] "URBAN/SML STREAM FLDG" "STRONG WIND GUST"
## [809] "LATE FREEZE" "BLOW-OUT TIDE"
## [811] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [813] "Lake Effect Snow" "Mixed Precipitation"
## [815] "Record High" "COASTALSTORM"
## [817] "Snow and sleet" "Freezing rain"
## [819] "Gusty winds" "Blizzard Summary"
## [821] "SUMMARY OF MARCH 24-25" "SUMMARY OF MARCH 27"
## [823] "SUMMARY OF MARCH 29" "GRADIENT WIND"
## [825] "Icestorm/Blizzard" "Flood/Strong Wind"
## [827] "TSTM WIND AND LIGHTNING" "gradient wind"
## [829] "Freezing drizzle" "Mountain Snows"
## [831] "URBAN/SMALL STRM FLDG" "Heavy surf and wind"
## [833] "Mild and Dry Pattern" "COLD AND FROST"
## [835] "TYPHOON" "HIGH SWELLS"
## [837] "HIGH SWELLS" "VOLCANIC ASH"
## [839] "DRY SPELL" "BEACH EROSION"
## [841] "UNSEASONAL RAIN" "EARLY RAIN"
## [843] "PROLONGED RAIN" "WINTERY MIX"
## [845] "COASTAL FLOODING/EROSION" "HOT SPELL"
## [847] "UNSEASONABLY HOT" "TSTM WIND (G45)"
## [849] "HIGH WIND (G40)" "TSTM WIND (G35)"
## [851] "DRY WEATHER" "ABNORMAL WARMTH"
## [853] "UNUSUAL WARMTH" "WAKE LOW WIND"
## [855] "MONTHLY RAINFALL" "COLD TEMPERATURES"
## [857] "COLD WIND CHILL TEMPERATURES" "MODERATE SNOW"
## [859] "MODERATE SNOWFALL" "URBAN/STREET FLOODING"
## [861] "COASTAL EROSION" "UNUSUAL/RECORD WARMTH"
## [863] "BITTER WIND CHILL" "BITTER WIND CHILL TEMPERATURES"
## [865] "SEICHE" "TSTM"
## [867] "COASTAL FLOODING/EROSION" "UNSEASONABLY WARM YEAR"
## [869] "HYPERTHERMIA/EXPOSURE" "ROCK SLIDE"
## [871] "ICE PELLETS" "PATCHY DENSE FOG"
## [873] "RECORD COOL" "RECORD WARM"
## [875] "HOT WEATHER" "RECORD TEMPERATURE"
## [877] "TROPICAL DEPRESSION" "VOLCANIC ERUPTION"
## [879] "COOL SPELL" "WIND ADVISORY"
## [881] "GUSTY WIND/HAIL" "RED FLAG FIRE WX"
## [883] "FIRST FROST" "EXCESSIVELY DRY"
## [885] "SNOW AND SLEET" "LIGHT SNOW/FREEZING PRECIP"
## [887] "VOG" "MONTHLY PRECIPITATION"
## [889] "MONTHLY TEMPERATURE" "RECORD DRYNESS"
## [891] "EXTREME WINDCHILL TEMPERATURES" "MIXED PRECIPITATION"
## [893] "DRY CONDITIONS" "REMNANTS OF FLOYD"
## [895] "EARLY SNOWFALL" "FREEZING FOG"
## [897] "LANDSPOUT" "DRIEST MONTH"
## [899] "RECORD COLD" "LATE SEASON HAIL"
## [901] "EXCESSIVE SNOW" "DRYNESS"
## [903] "FLOOD/FLASH/FLOOD" "WIND AND WAVE"
## [905] "LIGHT FREEZING RAIN" "MONTHLY SNOWFALL"
## [907] "RECORD PRECIPITATION" "ICE ROADS"
## [909] "ROUGH SEAS" "UNSEASONABLY WARM/WET"
## [911] "UNSEASONABLY COOL & WET" "UNUSUALLY WARM"
## [913] "TSTM WIND G45" "NON SEVERE HAIL"
## [915] "NON-SEVERE WIND DAMAGE" "UNUSUALLY COLD"
## [917] "WARM WEATHER" "LANDSLUMP"
## [919] "THUNDERSTORM WIND (G40)" "UNSEASONABLY WARM & WET"
## [921] "LOCALLY HEAVY RAIN" "WIND GUSTS"
## [923] "UNSEASONAL LOW TEMP" "HIGH SURF ADVISORY"
## [925] "LATE SEASON SNOW" "GUSTY LAKE WIND"
## [927] "ABNORMALLY DRY" "WINTER WEATHER MIX"
## [929] "RED FLAG CRITERIA" "WND"
## [931] "CSTL FLOODING/EROSION" "SMOKE"
## [933] "SNOW ADVISORY" "EXTREMELY WET"
## [935] "UNUSUALLY LATE SNOW" "VERY DRY"
## [937] "RECORD LOW RAINFALL" "ROGUE WAVE"
## [939] "PROLONG WARMTH" "ACCUMULATED SNOWFALL"
## [941] "FALLING SNOW/ICE" "DUST DEVEL"
## [943] "NON-TSTM WIND" "NON TSTM WIND"
## [945] "GUSTY THUNDERSTORM WINDS" "PATCHY ICE"
## [947] "HEAVY RAIN EFFECTS" "EXCESSIVE HEAT/DROUGHT"
## [949] "NORTHERN LIGHTS" "MARINE TSTM WIND"
## [951] "HAZARDOUS SURF" "FROST/FREEZE"
## [953] "WINTER WEATHER/MIX" "ASTRONOMICAL HIGH TIDE"
## [955] "WHIRLWIND" "VERY WARM"
## [957] "ABNORMALLY WET" "TORNADO DEBRIS"
## [959] "EXTREME COLD/WIND CHILL" "ICE ON ROAD"
## [961] "DROWNING" "GUSTY THUNDERSTORM WIND"
## [963] "MARINE HAIL" "HIGH SURF ADVISORIES"
## [965] "HURRICANE/TYPHOON" "HEAVY SURF/HIGH SURF"
## [967] "SLEET STORM" "STORM SURGE/TIDE"
## [969] "COLD/WIND CHILL" "MARINE HIGH WIND"
## [971] "TSUNAMI" "DENSE SMOKE"
## [973] "LAKESHORE FLOOD" "MARINE THUNDERSTORM WIND"
## [975] "MARINE STRONG WIND" "ASTRONOMICAL LOW TIDE"
## [977] "VOLCANIC ASHFALL"
#
## See if the 'EVTYPE' variable is a factor variable or has some fixed levels which could
## lead to confusion;
#
if (is.factor(stmdata$EVTYPE)) {
  unique_values <- levels(stmdata$EVTYPE)
  print(unique_values)
} else {
  print("EVTYPE column is not a factor variable.")
}
## [1] "EVTYPE column is not a factor variable."
#
## the listing above shows 977 distinct event types ('EVTYPE'), and the EVTYPE column is not a factor variable.
#
## besides 'EVTYPE', we also need the columns
## 'FATALITIES' and 'INJURIES' to answer QUES 1; it also seems logical
## to include 'STATE', 'COUNTYNAME', 'BGN_DATE' and 'REFNUM'
#
## we'll work with the 'dplyr' package for 'manipulating' our 'stmdata' dataframe
#
library(dplyr)
head(stmdata)
## # A tibble: 6 × 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/1950… 0130 CST 97 MOBILE AL TORNA… 0
## 2 1 4/18/1950… 0145 CST 3 BALDWIN AL TORNA… 0
## 3 1 2/20/1951… 1600 CST 57 FAYETTE AL TORNA… 0
## 4 1 6/8/1951 … 0900 CST 89 MADISON AL TORNA… 0
## 5 1 11/15/195… 1500 CST 43 CULLMAN AL TORNA… 0
## 6 1 11/15/195… 2000 CST 77 LAUDERDALE AL TORNA… 0
## # ℹ 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## # END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## # END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <dbl>,
## # MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## # PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## # STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
dim(stmdata)
## [1] 902297 37
#View(stmdata)
str(stmdata)
## spc_tbl_ [902,297 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ STATE__ : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr [1:902297] "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
## $ COUNTY : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr [1:902297] "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr [1:902297] NA NA NA NA ...
## $ BGN_LOCATI: chr [1:902297] NA NA NA NA ...
## $ END_DATE : chr [1:902297] NA NA NA NA ...
## $ END_TIME : chr [1:902297] NA NA NA NA ...
## $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
## $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr [1:902297] NA NA NA NA ...
## $ END_LOCATI: chr [1:902297] NA NA NA NA ...
## $ LENGTH : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
## $ F : num [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
## $ CROPDMG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr [1:902297] NA NA NA NA ...
## $ WFO : chr [1:902297] NA NA NA NA ...
## $ STATEOFFIC: chr [1:902297] NA NA NA NA ...
## $ ZONENAMES : chr [1:902297] NA NA NA NA ...
## $ LATITUDE : num [1:902297] 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
## $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
## $ REMARKS : chr [1:902297] NA NA NA NA ...
## $ REFNUM : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, "spec")=
## .. cols(
## .. STATE__ = col_double(),
## .. BGN_DATE = col_character(),
## .. BGN_TIME = col_character(),
## .. TIME_ZONE = col_character(),
## .. COUNTY = col_double(),
## .. COUNTYNAME = col_character(),
## .. STATE = col_character(),
## .. EVTYPE = col_character(),
## .. BGN_RANGE = col_double(),
## .. BGN_AZI = col_character(),
## .. BGN_LOCATI = col_character(),
## .. END_DATE = col_character(),
## .. END_TIME = col_character(),
## .. COUNTY_END = col_double(),
## .. COUNTYENDN = col_logical(),
## .. END_RANGE = col_double(),
## .. END_AZI = col_character(),
## .. END_LOCATI = col_character(),
## .. LENGTH = col_double(),
## .. WIDTH = col_double(),
## .. F = col_double(),
## .. MAG = col_double(),
## .. FATALITIES = col_double(),
## .. INJURIES = col_double(),
## .. PROPDMG = col_double(),
## .. PROPDMGEXP = col_character(),
## .. CROPDMG = col_double(),
## .. CROPDMGEXP = col_character(),
## .. WFO = col_character(),
## .. STATEOFFIC = col_character(),
## .. ZONENAMES = col_character(),
## .. LATITUDE = col_double(),
## .. LONGITUDE = col_double(),
## .. LATITUDE_E = col_double(),
## .. LONGITUDE_ = col_double(),
## .. REMARKS = col_character(),
## .. REFNUM = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
#
##
## Now let's see if 'stmdata' has NA's :
#
sum(is.na(stmdata))
## [1] 6875981
#
### so we have a lot of NA's (~6.9 million) that we need to be aware of;
#
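A single total does not show which columns the missing values sit in. As a minimal sketch, run here on a small made-up data frame called `toy` (an assumption for illustration, not the storm data), `colSums(is.na(...))` gives a per-column count:

```r
# Toy stand-in for 'stmdata' (hypothetical values, for illustration only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  CROPDMGEXP = c(NA, "K", NA),
  REMARKS    = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Count the missing values in each column, largest count first
na_per_column <- sort(colSums(is.na(toy)), decreasing = TRUE)
print(na_per_column)
```

Applied to the real data, `colSums(is.na(stmdata))` would show whether the NA's fall in the columns needed for the two questions or mostly in free-text columns that this analysis does not use.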
To answer the two questions posed regarding harmful events with respect to population health and economic consequences, we need to focus on certain variables from the given list. Here are the variables we should consider and why:
Question 1: Across the United States, which types of events are most harmful with respect to population health?
EVTYPE (Type of Event): This variable indicates the type of event, such as tornado, flood, hurricane, etc. Analyzing this variable will allow us to identify which types of events are associated with the most significant harm to population health.
REFNUM (Reference Number): ‘REFNUM’ acts as an audit trail, making it possible to trace any data transformation or modification back to the original record. This supports reproducibility and documentation of the analysis workflow.
STATE (State): Keeping the state allows population health outcomes to be broken down and compared by geographic region or jurisdiction.
FATALITIES (Number of Fatalities): This variable provides the count of fatalities resulting from each event. It directly measures the impact of the event on human lives and is crucial for assessing the severity of health consequences.
INJURIES (Number of Injuries): Similar to fatalities, this variable provides the count of injuries resulting from each event. It helps quantify the extent of non-fatal health impacts caused by the event.
Question 2: Across the United States, which types of events have the greatest economic consequences?
EVTYPE (Type of Event): Similar to question 1, analyzing this variable will help us identify which types of events are associated with the greatest economic consequences.
REFNUM (Reference Number): As for question 1, ‘REFNUM’ acts as an audit trail, making it possible to trace any data transformation or modification back to the original record.
STATE (State): Keeping the state allows economic losses to be broken down and compared by geographic region or jurisdiction.
PROPDMG (Property Damage Estimate): This variable provides an estimate of the property damage caused by each event. It directly measures the economic impact of the event on property and infrastructure.
PROPDMGEXP (Property Damage Exponent): This variable specifies the exponent used to interpret the property damage estimate (e.g., K for thousands, M for millions). It helps us accurately interpret the magnitude of property damage.
CROPDMG (Crop Damage Estimate): This variable provides an estimate of the crop damage caused by each event. It measures the economic impact on agricultural production and related industries.
CROPDMGEXP (Crop Damage Exponent): Similar to PROPDMGEXP, this variable specifies the exponent used to interpret the crop damage estimate.
By focusing on these variables, we can assess which types of events have the most significant impact on population health (Question 1) and which types of events have the greatest economic consequences (Question 2) across the United States. These variables directly measure the severity of health impacts, loss of life, and economic damage caused by different types of weather events.
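As a minimal sketch of how these variables might be combined for Question 1, the example below totals fatalities and injuries per event type. It runs on a small made-up tibble called `toy`, and the column name `casualties` is a hypothetical helper; neither comes from the storm data.

```r
library(dplyr)

# Toy rows standing in for the storm data (hypothetical values)
toy <- tibble(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES   = c(14, 2, 0, 1)
)

# Total fatalities and injuries per event type, most harmful first
health_by_event <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES, na.rm = TRUE),
    injuries   = sum(INJURIES, na.rm = TRUE),
    .groups    = "drop"
  ) %>%
  mutate(casualties = fatalities + injuries) %>%
  arrange(desc(casualties))

print(health_by_event)
```

The same group-and-sum pattern would apply to Question 2 once the damage estimates have been scaled by their exponent columns.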
In my experience, the ‘features’ ‘PROPDMGEXP’ and ‘CROPDMGEXP’ are potential headaches for data analysts, because manually entered exponent codes (10 ^ x) are prone to confusion and error. Therefore, I will begin by inspecting what those columns actually contain.
## PROPDMGEXP is type 'character';
##
## find unique values in the 'PROPDMGEXP' column;
unique_char_pexp <- unique(stmdata$PROPDMGEXP)
#
print(unique_char_pexp)
## [1] "K" "M" NA "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
## CROPDMGEXP is also type 'character';
##
## find unique values in the 'CROPDMGEXP' column;
unique_char_cexp <- unique(stmdata$CROPDMGEXP)
#
print(unique_char_cexp)
## [1] NA "M" "K" "m" "B" "?" "0" "k" "2"
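The exponent codes are a mix of letters, digits and symbols, so they need a rule before they can scale the damage estimates. The sketch below shows one possible rule. The function name `exp_to_multiplier` and the whole mapping are assumptions made for illustration (in particular the treatment of digits as powers of ten and of `+`, `-`, `?` and NA as "no scaling"), and other analysts use different conventions.

```r
# Convert a damage exponent code into a numeric multiplier.
# ASSUMPTIONS (not taken from the report or from NOAA documentation):
#   H/h = 1e2, K/k = 1e3, M/m = 1e6, B/b = 1e9
#   a single digit d means 10^d
#   NA, "+", "-" and "?" leave the estimate unscaled (multiplier 1)
exp_to_multiplier <- function(exp_code) {
  code   <- toupper(exp_code)
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult   <- unname(lookup[code])   # NA wherever the code is not H, K, M or B

  # Single digits are read as powers of ten
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  # Everything still unmatched is left unscaled
  mult[is.na(mult)] <- 1
  mult
}

codes <- c("K", "m", "B", "h", "2", "?", NA)
print(exp_to_multiplier(codes))
```

With such a helper, a dollar amount could be computed as `PROPDMG * exp_to_multiplier(PROPDMGEXP)`, and likewise for the crop columns. Whichever mapping is chosen, it is worth stating it in the report, because the ambiguous codes can change the totals.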
#
## After manually inspecting the unique values in the 'EVTYPE' column,
## it's evident that not all values strictly represent storm events.
## Entries such as 'Summary of June 12', '?', and 'HIGH' are among those
## that need attention. To streamline the dataset for analysis and improve
## computational performance, we'll first drop the observations in which
## 'FATALITIES', 'INJURIES', 'PROPDMG' and 'CROPDMG' are all 0, since they
## contribute nothing to either question. Afterwards we'll take a subset of
## the variables required for our analysis. This approach will help us focus
## on relevant storm event data (while also optimizing computational efficiency).
#
#
library(dplyr)
# Remove rows where all specified columns are 0
stmdata_filtered <- stmdata %>%
  filter(FATALITIES != 0 | INJURIES != 0 | PROPDMG != 0 | CROPDMG != 0)
# View the filtered dataframe
head(stmdata_filtered)
## # A tibble: 6 × 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/1950… 0130 CST 97 MOBILE AL TORNA… 0
## 2 1 4/18/1950… 0145 CST 3 BALDWIN AL TORNA… 0
## 3 1 2/20/1951… 1600 CST 57 FAYETTE AL TORNA… 0
## 4 1 6/8/1951 … 0900 CST 89 MADISON AL TORNA… 0
## 5 1 11/15/195… 1500 CST 43 CULLMAN AL TORNA… 0
## 6 1 11/15/195… 2000 CST 77 LAUDERDALE AL TORNA… 0
## # ℹ 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## # END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## # END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <dbl>,
## # MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## # PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## # STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
dim(stmdata_filtered)
## [1] 254633 37
#View(stmdata_filtered)
#
We see that after performing the ‘filter’ operation on ‘stmdata’ we have reduced the dataframe from 902,297 rows (observations) to 254,633 rows, i.e., roughly 72% of the rows had ‘0’ for all of the ‘features’ (columns) of interest for this analysis: ‘FATALITIES’, ‘INJURIES’, ‘PROPDMG’ and ‘CROPDMG’. The number of distinct ‘EVTYPE’ values drops as well, from 977 to 485.
#
library(readr)
#
write_csv(stmdata_filtered, "stmdata_filterd.csv")
#
###
## we see we now have reduced our 'progress dataframe' i.e., 'stmdata_filtered' to 254,633 x 37;
#
## now let's select only our 'columns' of interest for this assignment;
#
## to simplify, let's work on a dataframe, call it 'df_1', to answer QUES 1;
##
#
################## Save as CSV DONE ############
## Build the dataframe 'df_1':
df_1 <- stmdata_filtered %>%
  select(REFNUM, STATE, COUNTYNAME, BGN_DATE, EVTYPE, FATALITIES, INJURIES)
#
head(df_1)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE FATALITIES INJURIES
## <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1 AL MOBILE 4/18/1950 0:00:00 TORNADO 0 15
## 2 2 AL BALDWIN 4/18/1950 0:00:00 TORNADO 0 0
## 3 3 AL FAYETTE 2/20/1951 0:00:00 TORNADO 0 2
## 4 4 AL MADISON 6/8/1951 0:00:00 TORNADO 0 2
## 5 5 AL CULLMAN 11/15/1951 0:00:00 TORNADO 0 2
## 6 6 AL LAUDERDALE 11/15/1951 0:00:00 TORNADO 0 6
#View(df_1)
#
## get rid of the '0:00:00's.......
library(lubridate)
# Assuming 'BGN_DATE' is currently in character format
df_1 <- df_1 %>%
  mutate(BGN_DATE = mdy_hms(BGN_DATE),   # Convert to date-time format
         BGN_DATE = as.Date(BGN_DATE))   # Convert back to date-only format
# Now 'BGN_DATE' should contain only dates without the '0:00:00' time portion
head(df_1)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE FATALITIES INJURIES
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl>
## 1 1 AL MOBILE 1950-04-18 TORNADO 0 15
## 2 2 AL BALDWIN 1950-04-18 TORNADO 0 0
## 3 3 AL FAYETTE 1951-02-20 TORNADO 0 2
## 4 4 AL MADISON 1951-06-08 TORNADO 0 2
## 5 5 AL CULLMAN 1951-11-15 TORNADO 0 2
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 0 6
dim(df_1)
## [1] 254633 7
#View(df_1)
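The date conversion can also be done in one step without an extra package. This is only a sketch of an alternative, shown on two sample strings copied from the output above; the names `raw_dates` and `parsed` are made up for illustration.

```r
# Two date strings in the same layout as 'BGN_DATE'
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# as.Date() reads only as much of each string as the format needs,
# so the trailing " 0:00:00" is ignored
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
print(parsed)
```

Either route ends with a date-only column; the lubridate version has the advantage of failing more visibly if a string does not contain a time part.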
#
## %%%%%%%%%%% beautiful !!!
#
## check for NA's
sum(is.na(df_1))
## [1] 7
##
#
## the df_1 dataframe has 254633 observations, 7 features and 7 NA values
#
## find the number of unique values of EVTYPE in df_1:
length(unique(df_1$EVTYPE))
## [1] 485
#
## there are 485 types of events...
#
## let's list all types of events
unique(df_1$EVTYPE)
## [1] "TORNADO" "TSTM WIND"
## [3] "HAIL" "ICE STORM/FLASH FLOOD"
## [5] "WINTER STORM" "HURRICANE OPAL/HIGH WINDS"
## [7] "THUNDERSTORM WINDS" "HURRICANE ERIN"
## [9] "HURRICANE OPAL" "HEAVY RAIN"
## [11] "LIGHTNING" "THUNDERSTORM WIND"
## [13] "DENSE FOG" "RIP CURRENT"
## [15] "THUNDERSTORM WINS" "FLASH FLOODING"
## [17] "FLASH FLOOD" "TORNADO F0"
## [19] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS/HAIL"
## [21] "HEAT" "HIGH WINDS"
## [23] "WIND" "HEAVY RAINS"
## [25] "LIGHTNING AND HEAVY RAIN" "THUNDERSTORM WINDS HAIL"
## [27] "COLD" "HEAVY RAIN/LIGHTNING"
## [29] "FLASH FLOODING/THUNDERSTORM WI" "FLOODING"
## [31] "WATERSPOUT" "EXTREME COLD"
## [33] "LIGHTNING/HEAVY RAIN" "BREAKUP FLOODING"
## [35] "HIGH WIND" "FREEZE"
## [37] "RIVER FLOOD" "HIGH WINDS HEAVY RAINS"
## [39] "AVALANCHE" "MARINE MISHAP"
## [41] "HIGH TIDES" "HIGH WIND/SEAS"
## [43] "HIGH WINDS/HEAVY RAIN" "HIGH SEAS"
## [45] "COASTAL FLOOD" "SEVERE TURBULENCE"
## [47] "RECORD RAINFALL" "HEAVY SNOW"
## [49] "HEAVY SNOW/WIND" "DUST STORM"
## [51] "FLOOD" "APACHE COUNTY"
## [53] "SLEET" "DUST DEVIL"
## [55] "ICE STORM" "EXCESSIVE HEAT"
## [57] "THUNDERSTORM WINDS/FUNNEL CLOU" "GUSTY WINDS"
## [59] "FLOODING/HEAVY RAIN" "HEAVY SURF COASTAL FLOODING"
## [61] "HIGH SURF" "WILD FIRES"
## [63] "HIGH" "WINTER STORM HIGH WINDS"
## [65] "WINTER STORMS" "MUDSLIDES"
## [67] "RAINSTORM" "SEVERE THUNDERSTORM"
## [69] "SEVERE THUNDERSTORMS" "SEVERE THUNDERSTORM WINDS"
## [71] "THUNDERSTORMS WINDS" "FLOOD/FLASH FLOOD"
## [73] "FLOOD/RAIN/WINDS" "THUNDERSTORMS"
## [75] "FLASH FLOOD WINDS" "WINDS"
## [77] "FUNNEL CLOUD" "HIGH WIND DAMAGE"
## [79] "STRONG WIND" "HEAVY SNOWPACK"
## [81] "FLASH FLOOD/" "HEAVY SURF"
## [83] "DRY MIRCOBURST WINDS" "DRY MICROBURST"
## [85] "URBAN FLOOD" "THUNDERSTORM WINDSS"
## [87] "MICROBURST WINDS" "HEAT WAVE"
## [89] "UNSEASONABLY WARM" "COASTAL FLOODING"
## [91] "STRONG WINDS" "BLIZZARD"
## [93] "WATERSPOUT/TORNADO" "WATERSPOUT TORNADO"
## [95] "STORM SURGE" "URBAN/SMALL STREAM FLOOD"
## [97] "WATERSPOUT-" "TORNADOES, TSTM WIND, HAIL"
## [99] "TROPICAL STORM ALBERTO" "TROPICAL STORM"
## [101] "TROPICAL STORM GORDON" "TROPICAL STORM JERRY"
## [103] "LIGHTNING THUNDERSTORM WINDS" "URBAN FLOODING"
## [105] "MINOR FLOODING" "WATERSPOUT-TORNADO"
## [107] "LIGHTNING INJURY" "LIGHTNING AND THUNDERSTORM WIN"
## [109] "FLASH FLOODS" "THUNDERSTORM WINDS53"
## [111] "WILDFIRE" "DAMAGING FREEZE"
## [113] "THUNDERSTORM WINDS 13" "HURRICANE"
## [115] "SNOW" "LIGNTNING"
## [117] "FROST" "FREEZING RAIN/SNOW"
## [119] "HIGH WINDS/" "THUNDERSNOW"
## [121] "FLOODS" "COOL AND WET"
## [123] "HEAVY RAIN/SNOW" "GLAZE ICE"
## [125] "MUD SLIDE" "HIGH WINDS"
## [127] "RURAL FLOOD" "MUD SLIDES"
## [129] "EXTREME HEAT" "DROUGHT"
## [131] "COLD AND WET CONDITIONS" "EXCESSIVE WETNESS"
## [133] "SLEET/ICE STORM" "GUSTNADO"
## [135] "FREEZING RAIN" "SNOW AND HEAVY SNOW"
## [137] "GROUND BLIZZARD" "EXTREME WIND CHILL"
## [139] "MAJOR FLOOD" "SNOW/HEAVY SNOW"
## [141] "FREEZING RAIN/SLEET" "ICE JAM FLOODING"
## [143] "COLD AIR TORNADO" "WIND DAMAGE"
## [145] "FOG" "TSTM WIND 55"
## [147] "SMALL STREAM FLOOD" "THUNDERTORM WINDS"
## [149] "HAIL/WINDS" "SNOW AND ICE"
## [151] "WIND STORM" "GRASS FIRES"
## [153] "LAKE FLOOD" "HAIL/WIND"
## [155] "WIND/HAIL" "ICE"
## [157] "SNOW AND ICE STORM" "THUNDERSTORM WINDS"
## [159] "WINTER WEATHER" "DROUGHT/EXCESSIVE HEAT"
## [161] "THUNDERSTORMS WIND" "TUNDERSTORM WIND"
## [163] "URBAN AND SMALL STREAM FLOODIN" "THUNDERSTORM WIND/LIGHTNING"
## [165] "HEAVY RAIN/SEVERE WEATHER" "THUNDERSTORM"
## [167] "WATERSPOUT/ TORNADO" "LIGHTNING."
## [169] "HURRICANE-GENERATED SWELLS" "RIVER AND STREAM FLOOD"
## [171] "HIGH WINDS/COASTAL FLOOD" "RAIN"
## [173] "RIVER FLOODING" "ICE FLOES"
## [175] "THUNDERSTORM WIND G50" "LIGHTNING FIRE"
## [177] "HEAVY LAKE SNOW" "RECORD COLD"
## [179] "HEAVY SNOW/FREEZING RAIN" "COLD WAVE"
## [181] "DUST DEVIL WATERSPOUT" "TORNADO F3"
## [183] "TORNDAO" "FLOOD/RIVER FLOOD"
## [185] "MUD SLIDES URBAN FLOODING" "TORNADO F1"
## [187] "GLAZE/ICE STORM" "GLAZE"
## [189] "HEAVY SNOW/WINTER STORM" "MICROBURST"
## [191] "AVALANCE" "BLIZZARD/WINTER STORM"
## [193] "DUST STORM/HIGH WINDS" "ICE JAM"
## [195] "FOREST FIRES" "FROST\\FREEZE"
## [197] "THUNDERSTORM WINDS." "HVY RAIN"
## [199] "HAIL 150" "HAIL 075"
## [201] "HAIL 100" "THUNDERSTORM WIND G55"
## [203] "HAIL 125" "THUNDERSTORM WIND G60"
## [205] "THUNDERSTORM WINDS G60" "HARD FREEZE"
## [207] "HAIL 200" "HEAVY SNOW AND HIGH WINDS"
## [209] "HEAVY SNOW/HIGH WINDS & FLOOD" "HEAVY RAIN AND FLOOD"
## [211] "RIP CURRENTS/HEAVY SURF" "URBAN AND SMALL"
## [213] "WILDFIRES" "FOG AND COLD TEMPERATURES"
## [215] "SNOW/COLD" "FLASH FLOOD FROM ICE JAMS"
## [217] "TSTM WIND G58" "MUDSLIDE"
## [219] "HEAVY SNOW SQUALLS" "SNOW SQUALL"
## [221] "SNOW/ICE STORM" "HEAVY SNOW/SQUALLS"
## [223] "HEAVY SNOW-SQUALLS" "ICY ROADS"
## [225] "HEAVY MIX" "SNOW FREEZING RAIN"
## [227] "SNOW/SLEET" "SNOW/FREEZING RAIN"
## [229] "SNOW SQUALLS" "SNOW/SLEET/FREEZING RAIN"
## [231] "RECORD SNOW" "HAIL 0.75"
## [233] "RECORD HEAT" "THUNDERSTORM WIND 65MPH"
## [235] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WIND/AWNING"
## [237] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND TREES"
## [239] "TORNADO F2" "RIP CURRENTS"
## [241] "HURRICANE EMILY" "COASTAL SURGE"
## [243] "HURRICANE GORDON" "HURRICANE FELIX"
## [245] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WINDS 63 MPH"
## [247] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM DAMAGE TO"
## [249] "THUNDERSTORM WIND 65 MPH" "FLASH FLOOD - HEAVY RAIN"
## [251] "THUNDERSTORM WIND." "FLASH FLOOD/ STREET"
## [253] "BLOWING SNOW" "HEAVY SNOW/BLIZZARD"
## [255] "THUNDERSTORM HAIL" "THUNDERSTORM WINDSHAIL"
## [257] "LIGHTNING WAUSEON" "THUDERSTORM WINDS"
## [259] "ICE AND SNOW" "STORM FORCE WINDS"
## [261] "HEAVY SNOW/ICE" "LIGHTING"
## [263] "HIGH WIND/HEAVY SNOW" "THUNDERSTORM WINDS AND"
## [265] "HEAVY PRECIPITATION" "HIGH WIND/BLIZZARD"
## [267] "TSTM WIND DAMAGE" "FLOOD FLASH"
## [269] "RAIN/WIND" "SNOW/ICE"
## [271] "HAIL 75" "HEAT WAVE DROUGHT"
## [273] "HEAVY SNOW/BLIZZARD/AVALANCHE" "HEAT WAVES"
## [275] "UNSEASONABLY WARM AND DRY" "UNSEASONABLY COLD"
## [277] "RECORD/EXCESSIVE HEAT" "THUNDERSTORM WIND G52"
## [279] "HIGH WAVES" "FLASH FLOOD/FLOOD"
## [281] "FLOOD/FLASH" "LOW TEMPERATURE"
## [283] "HEAVY RAINS/FLOODING" "THUNDERESTORM WINDS"
## [285] "THUNDERSTORM WINDS/FLOODING" "HYPOTHERMIA"
## [287] "THUNDEERSTORM WINDS" "THUNERSTORM WINDS"
## [289] "HIGH WINDS/COLD" "COLD/WINDS"
## [291] "SNOW/ BITTER COLD" "COLD WEATHER"
## [293] "RAPIDLY RISING WATER" "WILD/FOREST FIRE"
## [295] "ICE/STRONG WINDS" "SNOW/HIGH WINDS"
## [297] "HIGH WINDS/SNOW" "SNOWMELT FLOODING"
## [299] "HEAVY SNOW AND STRONG WINDS" "SNOW ACCUMULATION"
## [301] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [303] "TORNADOES" "THUNDERSTORM WIND/HAIL"
## [305] "FREEZING DRIZZLE" "HAIL 175"
## [307] "FLASH FLOODING/FLOOD" "HAIL 275"
## [309] "HAIL 450" "EXCESSIVE RAINFALL"
## [311] "THUNDERSTORMW" "HAILSTORM"
## [313] "TSTM WINDS" "TSTMW"
## [315] "TSTM WIND 65)" "TROPICAL STORM DEAN"
## [317] "THUNDERSTORM WINDS/ FLOOD" "LANDSLIDE"
## [319] "HIGH WIND AND SEAS" "THUNDERSTORMWINDS"
## [321] "WILD/FOREST FIRES" "HEAVY SEAS"
## [323] "HAIL DAMAGE" "FLOOD & HEAVY RAIN"
## [325] "?" "THUNDERSTROM WIND"
## [327] "FLOOD/FLASHFLOOD" "HIGH WATER"
## [329] "HIGH WIND 48" "LANDSLIDES"
## [331] "URBAN/SMALL STREAM" "BRUSH FIRE"
## [333] "HEAVY SHOWER" "HEAVY SWELLS"
## [335] "URBAN SMALL" "URBAN FLOODS"
## [337] "FLASH FLOOD/LANDSLIDE" "HEAVY RAIN/SMALL STREAM URBAN"
## [339] "FLASH FLOOD LANDSLIDES" "TSTM WIND/HAIL"
## [341] "Other" "Ice jam flood (minor"
## [343] "Tstm Wind" "URBAN/SML STREAM FLD"
## [345] "ROUGH SURF" "Heavy Surf"
## [347] "Dust Devil" "Marine Accident"
## [349] "Freeze" "Strong Wind"
## [351] "COASTAL STORM" "Erosion/Cstl Flood"
## [353] "River Flooding" "Damaging Freeze"
## [355] "Beach Erosion" "High Surf"
## [357] "Heavy Rain/High Surf" "Unseasonable Cold"
## [359] "Early Frost" "Wintry Mix"
## [361] "Extreme Cold" "Coastal Flooding"
## [363] "Torrential Rainfall" "Landslump"
## [365] "Hurricane Edouard" "Coastal Storm"
## [367] "TIDAL FLOODING" "Tidal Flooding"
## [369] "Strong Winds" "EXTREME WINDCHILL"
## [371] "Glaze" "Extended Cold"
## [373] "Whirlwind" "Heavy snow shower"
## [375] "Light snow" "Light Snow"
## [377] "MIXED PRECIP" "Freezing Spray"
## [379] "DOWNBURST" "Mudslides"
## [381] "Microburst" "Mudslide"
## [383] "Cold" "Coastal Flood"
## [385] "Snow Squalls" "Wind Damage"
## [387] "Light Snowfall" "Freezing Drizzle"
## [389] "Gusty wind/rain" "GUSTY WIND/HVY RAIN"
## [391] "Wind" "Cold Temperature"
## [393] "Heat Wave" "Snow"
## [395] "COLD AND SNOW" "RAIN/SNOW"
## [397] "TSTM WIND (G45)" "Gusty Winds"
## [399] "GUSTY WIND" "TSTM WIND 40"
## [401] "TSTM WIND 45" "TSTM WIND (41)"
## [403] "TSTM WIND (G40)" "Frost/Freeze"
## [405] "AGRICULTURAL FREEZE" "OTHER"
## [407] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [409] "Lake Effect Snow" "Freezing Rain"
## [411] "Mixed Precipitation" "BLACK ICE"
## [413] "COASTALSTORM" "LIGHT SNOW"
## [415] "DAM BREAK" "Gusty winds"
## [417] "blowing snow" "GRADIENT WIND"
## [419] "TSTM WIND AND LIGHTNING" "gradient wind"
## [421] "Gradient wind" "Freezing drizzle"
## [423] "WET MICROBURST" "Heavy surf and wind"
## [425] "TYPHOON" "HIGH SWELLS"
## [427] "SMALL HAIL" "UNSEASONAL RAIN"
## [429] "COASTAL FLOODING/EROSION" "TSTM WIND (G45)"
## [431] "HIGH WIND (G40)" "TSTM WIND (G35)"
## [433] "COASTAL EROSION" "SEICHE"
## [435] "COASTAL FLOODING/EROSION" "HYPERTHERMIA/EXPOSURE"
## [437] "WINTRY MIX" "ROCK SLIDE"
## [439] "GUSTY WIND/HAIL" "LANDSPOUT"
## [441] "EXCESSIVE SNOW" "LAKE EFFECT SNOW"
## [443] "FLOOD/FLASH/FLOOD" "MIXED PRECIPITATION"
## [445] "WIND AND WAVE" "LIGHT FREEZING RAIN"
## [447] "ICE ROADS" "ROUGH SEAS"
## [449] "TSTM WIND G45" "NON-SEVERE WIND DAMAGE"
## [451] "WARM WEATHER" "THUNDERSTORM WIND (G40)"
## [453] "LATE SEASON SNOW" "WINTER WEATHER MIX"
## [455] "ROGUE WAVE" "FALLING SNOW/ICE"
## [457] "NON-TSTM WIND" "NON TSTM WIND"
## [459] "BLOWING DUST" "VOLCANIC ASH"
## [461] "HIGH SURF ADVISORY" "HAZARDOUS SURF"
## [463] "WHIRLWIND" "ICE ON ROAD"
## [465] "DROWNING" "EXTREME COLD/WIND CHILL"
## [467] "MARINE TSTM WIND" "HURRICANE/TYPHOON"
## [469] "WINTER WEATHER/MIX" "FROST/FREEZE"
## [471] "ASTRONOMICAL HIGH TIDE" "HEAVY SURF/HIGH SURF"
## [473] "TROPICAL DEPRESSION" "LAKE-EFFECT SNOW"
## [475] "MARINE HIGH WIND" "TSUNAMI"
## [477] "STORM SURGE/TIDE" "COLD/WIND CHILL"
## [479] "LAKESHORE FLOOD" "MARINE THUNDERSTORM WIND"
## [481] "MARINE STRONG WIND" "ASTRONOMICAL LOW TIDE"
## [483] "DENSE SMOKE" "MARINE HAIL"
## [485] "FREEZING FOG"
#
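After manually inspecting the unique values in the ‘EVTYPE’ column, most of them appear to be legitimate ‘storm events’, although a few entries (for example ‘?’, ‘HIGH’, ‘APACHE COUNTY’ and ‘OTHER’) are not event types. Several ‘EVTYPE’ values also differ only in letter case (for example ‘Light Snow’ and ‘LIGHT SNOW’). Converting everything to uppercase would merge those duplicates, but let’s proceed without doing that yet.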
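Should the case question be taken up later, a small normalisation step is enough to merge labels that differ only in case or spacing. The sketch below is one way to do it; the helper name `normalize_evtype` is hypothetical, and the sample labels are typed in by hand rather than read from `df_1`.

```r
# Normalise event-type labels: upper case, trimmed, single internal spaces
normalize_evtype <- function(x) {
  x <- toupper(x)           # "Light Snow" becomes "LIGHT SNOW"
  x <- trimws(x)            # drop leading and trailing spaces
  gsub("\\s+", " ", x)      # collapse repeated internal whitespace
}

labels <- c("Light Snow", "LIGHT SNOW", "Tstm Wind",
            " TSTM  WIND", "gradient wind", "GRADIENT WIND")
cleaned <- normalize_evtype(labels)

length(unique(labels))    # 6 distinct labels before normalising
length(unique(cleaned))   # 3 distinct labels afterwards
```

This does not fix misspellings such as ‘TORNDAO’ or merge synonyms such as ‘TSTM WIND’ and ‘THUNDERSTORM WIND’; those would need an explicit mapping.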
#
### Generate a frequency table for EVTYPE
evtype_counts <- table(df_1$EVTYPE)
#evtype_counts
# Sort the frequency table in descending order
evtype_counts_sorted <- sort(evtype_counts, decreasing = TRUE)
evtype_counts_sorted
##
## TSTM WIND THUNDERSTORM WIND
## 63236 43655
## TORNADO HAIL
## 39944 26130
## FLASH FLOOD LIGHTNING
## 20968 13293
## THUNDERSTORM WINDS FLOOD
## 12086 10175
## HIGH WIND STRONG WIND
## 5522 3370
## WINTER STORM HEAVY SNOW
## 1508 1342
## HEAVY RAIN WILDFIRE
## 1105 857
## ICE STORM URBAN/SML STREAM FLD
## 708 702
## EXCESSIVE HEAT HIGH WINDS
## 698 657
## TSTM WIND/HAIL TROPICAL STORM
## 441 416
## WINTER WEATHER RIP CURRENT
## 407 400
## WILD/FOREST FIRE FLASH FLOODING
## 388 302
## FLOOD/FLASH FLOOD AVALANCHE
## 279 268
## DROUGHT BLIZZARD
## 266 253
## RIP CURRENTS HEAT
## 241 215
## EXTREME COLD LAKE-EFFECT SNOW
## 197 194
## LANDSLIDE STORM SURGE
## 193 177
## COASTAL FLOOD URBAN FLOOD
## 164 139
## WINTER WEATHER/MIX HIGH SURF
## 139 130
## HURRICANE LIGHT SNOW
## 129 119
## FROST/FREEZE EXTREME COLD/WIND CHILL
## 116 111
## MARINE TSTM WIND FOG
## 109 107
## RIVER FLOOD DUST STORM
## 107 103
## COLD/WIND CHILL DUST DEVIL
## 90 89
## WIND URBAN FLOODING
## 83 80
## DRY MICROBURST DENSE FOG
## 78 74
## HURRICANE/TYPHOON FLOODING
## 72 58
## COASTAL FLOODING SNOW
## 55 52
## HEAVY SURF/HIGH SURF STRONG WINDS
## 50 50
## STORM SURGE/TIDE WATERSPOUT
## 47 47
## MARINE STRONG WIND THUNDERSTORM WINDS HAIL
## 46 40
## TSTM WIND (G45) FREEZING RAIN
## 37 35
## TROPICAL DEPRESSION HEAT WAVE
## 35 34
## THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND
## 34 33
## OTHER COLD
## 33 32
## HEAVY SURF EXCESSIVE SNOW
## 30 25
## ICE ICY ROADS
## 24 22
## LIGHT FREEZING RAIN THUNDERSTORM WINDS/HAIL
## 22 22
## GUSTY WINDS HEAVY SNOW SQUALLS
## 21 21
## Light Snow HEAVY RAINS
## 21 20
## EXTREME WINDCHILL GLAZE
## 19 19
## MARINE HIGH WIND THUNDERSTORM
## 19 19
## WINDS EXTREME HEAT
## 18 17
## FLASH FLOOD/FLOOD FREEZE
## 16 16
## SNOW SQUALL HEAVY SNOW-SQUALLS
## 16 15
## MIXED PRECIPITATION TSUNAMI
## 15 14
## FLASH FLOODS FUNNEL CLOUD
## 13 13
## GUSTY WIND RIVER FLOODING
## 13 13
## SMALL HAIL DROUGHT/EXCESSIVE HEAT
## 11 10
## Gusty Winds SNOW FREEZING RAIN
## 10 10
## SNOW SQUALLS HEAVY RAINS/FLOODING
## 10 9
## SEICHE TSTM WIND (G40)
## 9 9
## TYPHOON URBAN/SMALL STREAM FLOOD
## 9 9
## ASTRONOMICAL HIGH TIDE FLASH FLOODING/FLOOD
## 8 8
## HEAVY MIX HIGH SEAS
## 8 8
## HURRICANE OPAL WIND DAMAGE
## 8 8
## FREEZING FOG FROST
## 7 7
## HURRICANE ERIN LOW TEMPERATURE
## 7 7
## SEVERE THUNDERSTORM UNSEASONABLY WARM
## 7 7
## WATERSPOUT- Coastal Flooding
## 7 6
## Cold Dust Devil
## 6 6
## High Surf MIXED PRECIP
## 6 6
## RAIN THUNDERSTORM WINDS LIGHTNING
## 6 6
## THUNDERSTORMS WINDS WATERSPOUT/TORNADO
## 6 6
## FLASH FLOOD FROM ICE JAMS FLOOD/RAIN/WINDS
## 5 5
## GUSTNADO HIGH WINDS/COLD
## 5 5
## ICE JAM FLOODING LAKESHORE FLOOD
## 5 5
## RECORD COLD River Flooding
## 5 5
## SNOW/SLEET TORNADO F0
## 5 5
## TSTM WINDS Coastal Flood
## 5 4
## COLD WEATHER DAMAGING FREEZE
## 4 4
## FREEZING DRIZZLE FREEZING RAIN/SNOW
## 4 4
## HIGH WATER MUDSLIDE
## 4 4
## THUNDERSTORM WIND/ TREES TORNADO F1
## 4 4
## UNSEASONABLY COLD WILD FIRES
## 4 4
## AGRICULTURAL FREEZE BRUSH FIRE
## 3 3
## COASTAL FLOODING/EROSION COASTAL STORM
## 3 3
## FLOODS Gradient wind
## 3 3
## HAIL 275 HAILSTORM
## 3 3
## HEAVY SNOW/ICE HIGH WINDS/SNOW
## 3 3
## HURRICANE-GENERATED SWELLS Hypothermia/Exposure
## 3 3
## HYPOTHERMIA/EXPOSURE LAKE EFFECT SNOW
## 3 3
## LANDSLIDES MAJOR FLOOD
## 3 3
## MICROBURST Mixed Precipitation
## 3 3
## MUD SLIDE ROUGH SEAS
## 3 3
## SEVERE THUNDERSTORM WINDS SNOW AND ICE
## 3 3
## THUNDERSTORM WINDS THUNDERSTORMS WIND
## 3 3
## THUNDERTORM WINDS TIDAL FLOODING
## 3 3
## WET MICROBURST WILDFIRES
## 3 3
## WINTRY MIX ASTRONOMICAL LOW TIDE
## 3 2
## Cold Temperature DAM BREAK
## 2 2
## Damaging Freeze Erosion/Cstl Flood
## 2 2
## Extreme Cold FALLING SNOW/ICE
## 2 2
## FLASH FLOOD - HEAVY RAIN FLOOD & HEAVY RAIN
## 2 2
## Freeze Freezing Drizzle
## 2 2
## Glaze GLAZE ICE
## 2 2
## gradient wind GROUND BLIZZARD
## 2 2
## HAIL 175 HAIL/WIND
## 2 2
## HAIL/WINDS HARD FREEZE
## 2 2
## HEAVY SEAS HEAVY SNOW/SQUALLS
## 2 2
## Heavy Surf HIGH SWELLS
## 2 2
## HIGH WIND (G40) HIGH WIND DAMAGE
## 2 2
## ICE FLOES LANDSPOUT
## 2 2
## MARINE HAIL MARINE MISHAP
## 2 2
## MINOR FLOODING Mudslide
## 2 2
## RAIN/SNOW RECORD HEAT
## 2 2
## RECORD SNOW RIP CURRENTS/HEAVY SURF
## 2 2
## RIVER AND STREAM FLOOD ROUGH SURF
## 2 2
## SEVERE THUNDERSTORMS Snow
## 2 2
## SNOW AND HEAVY SNOW Snow Squalls
## 2 2
## SNOW/COLD SNOW/FREEZING RAIN
## 2 2
## SNOW/HIGH WINDS SNOW/ICE STORM
## 2 2
## SNOW/SLEET/FREEZING RAIN SNOWMELT FLOODING
## 2 2
## Strong Wind Strong Winds
## 2 2
## THUNDERSTORM WIND 60 MPH THUNDERSTORM WINDS/ FLOOD
## 2 2
## TORNADO F2 TORNADO F3
## 2 2
## TORNADOES TROPICAL STORM JERRY
## 2 2
## TSTM WIND 55 UNSEASONAL RAIN
## 2 2
## URBAN FLOODS VOLCANIC ASH
## 2 2
## Whirlwind WINTER WEATHER MIX
## 2 2
## ? APACHE COUNTY
## 1 1
## AVALANCE Beach Erosion
## 1 1
## BLACK ICE BLIZZARD/WINTER STORM
## 1 1
## BLOWING DUST blowing snow
## 1 1
## BLOWING SNOW BREAKUP FLOODING
## 1 1
## COASTAL FLOODING/EROSION COASTAL EROSION
## 1 1
## Coastal Storm COASTAL SURGE
## 1 1
## COASTALSTORM COLD AIR TORNADO
## 1 1
## COLD AND SNOW COLD AND WET CONDITIONS
## 1 1
## COLD WAVE COLD/WINDS
## 1 1
## COOL AND WET DENSE SMOKE
## 1 1
## DOWNBURST DROWNING
## 1 1
## DRY MIRCOBURST WINDS DUST DEVIL WATERSPOUT
## 1 1
## DUST STORM/HIGH WINDS Early Frost
## 1 1
## EXCESSIVE RAINFALL EXCESSIVE WETNESS
## 1 1
## Extended Cold EXTREME WIND CHILL
## 1 1
## FLASH FLOOD LANDSLIDES FLASH FLOOD WINDS
## 1 1
## FLASH FLOOD/ FLASH FLOOD/ STREET
## 1 1
## FLASH FLOOD/LANDSLIDE FLASH FLOODING/THUNDERSTORM WI
## 1 1
## FLOOD FLASH FLOOD/FLASH
## 1 1
## FLOOD/FLASH/FLOOD FLOOD/FLASHFLOOD
## 1 1
## FLOOD/RIVER FLOOD FLOODING/HEAVY RAIN
## 1 1
## FOG AND COLD TEMPERATURES FOREST FIRES
## 1 1
## Freezing drizzle Freezing Rain
## 1 1
## FREEZING RAIN/SLEET Freezing Spray
## 1 1
## Frost/Freeze FROST\\FREEZE
## 1 1
## GLAZE/ICE STORM GRADIENT WIND
## 1 1
## GRASS FIRES GUSTY WIND/HAIL
## 1 1
## GUSTY WIND/HVY RAIN Gusty wind/rain
## 1 1
## Gusty winds HAIL 0.75
## 1 1
## HAIL 075 HAIL 100
## 1 1
## HAIL 125 HAIL 150
## 1 1
## HAIL 200 HAIL 450
## 1 1
## HAIL 75 HAIL DAMAGE
## 1 1
## HAZARDOUS SURF Heat Wave
## 1 1
## HEAT WAVE DROUGHT HEAT WAVES
## 1 1
## HEAVY LAKE SNOW HEAVY PRECIPITATION
## 1 1
## HEAVY RAIN AND FLOOD Heavy Rain/High Surf
## 1 1
## HEAVY RAIN/LIGHTNING HEAVY RAIN/SEVERE WEATHER
## 1 1
## HEAVY RAIN/SMALL STREAM URBAN HEAVY RAIN/SNOW
## 1 1
## HEAVY SHOWER HEAVY SNOW AND HIGH WINDS
## 1 1
## HEAVY SNOW AND STRONG WINDS Heavy snow shower
## 1 1
## HEAVY SNOW/BLIZZARD HEAVY SNOW/BLIZZARD/AVALANCHE
## 1 1
## HEAVY SNOW/FREEZING RAIN HEAVY SNOW/HIGH WINDS & FLOOD
## 1 1
## HEAVY SNOW/WIND HEAVY SNOW/WINTER STORM
## 1 1
## HEAVY SNOWPACK Heavy surf and wind
## 1 1
## HEAVY SURF COASTAL FLOODING HEAVY SWELLS
## 1 1
## HIGH HIGH WINDS
## 1 1
## HIGH SURF ADVISORY HIGH TIDES
## 1 1
## HIGH WAVES HIGH WIND 48
## 1 1
## HIGH WIND AND SEAS HIGH WIND/BLIZZARD
## 1 1
## HIGH WIND/HEAVY SNOW HIGH WIND/SEAS
## 1 1
## HIGH WINDS HEAVY RAINS HIGH WINDS/
## 1 1
## HIGH WINDS/COASTAL FLOOD HIGH WINDS/HEAVY RAIN
## 1 1
## Hurricane Edouard HURRICANE EMILY
## 1 1
## HURRICANE FELIX HURRICANE GORDON
## 1 1
## HURRICANE OPAL/HIGH WINDS HVY RAIN
## 1 1
## HYPERTHERMIA/EXPOSURE HYPOTHERMIA
## 1 1
## ICE AND SNOW ICE JAM
## 1 1
## Ice jam flood (minor ICE ON ROAD
## 1 1
## ICE ROADS ICE STORM/FLASH FLOOD
## 1 1
## ICE/STRONG WINDS Lake Effect Snow
## 1 1
## LAKE FLOOD Landslump
## 1 1
## LATE SEASON SNOW Light snow
## 1 1
## Light Snowfall LIGHTING
## 1 1
## LIGHTNING WAUSEON LIGHTNING AND HEAVY RAIN
## 1 1
## LIGHTNING AND THUNDERSTORM WIN LIGHTNING FIRE
## 1 1
## LIGHTNING INJURY LIGHTNING THUNDERSTORM WINDS
## 1 1
## LIGHTNING. LIGHTNING/HEAVY RAIN
## 1 1
## LIGNTNING Marine Accident
## 1 1
## Microburst MICROBURST WINDS
## 1 1
## MUD SLIDES MUD SLIDES URBAN FLOODING
## 1 1
## Mudslides MUDSLIDES
## 1 1
## NON-SEVERE WIND DAMAGE NON-TSTM WIND
## 1 1
## NON TSTM WIND Other
## 1 1
## RAIN/WIND RAINSTORM
## 1 1
## RAPIDLY RISING WATER RECORD RAINFALL
## 1 1
## RECORD/EXCESSIVE HEAT ROCK SLIDE
## 1 1
## ROGUE WAVE RURAL FLOOD
## 1 1
## SEVERE TURBULENCE SLEET
## 1 1
## SLEET/ICE STORM SMALL STREAM FLOOD
## 1 1
## SNOW ACCUMULATION SNOW AND ICE STORM
## 1 1
## SNOW/ BITTER COLD SNOW/ ICE
## 1 1
## SNOW/BLOWING SNOW SNOW/HEAVY SNOW
## 1 1
## SNOW/ICE STORM FORCE WINDS
## 1 1
## THUDERSTORM WINDS THUNDEERSTORM WINDS
## 1 1
## THUNDERESTORM WINDS THUNDERSNOW
## 1 1
## THUNDERSTORM DAMAGE TO THUNDERSTORM HAIL
## 1 1
## THUNDERSTORM WIND (G40) THUNDERSTORM WIND 65 MPH
## 1 1
## THUNDERSTORM WIND 65MPH THUNDERSTORM WIND 98 MPH
## 1 1
## THUNDERSTORM WIND G50 THUNDERSTORM WIND G52
## 1 1
## THUNDERSTORM WIND G55 THUNDERSTORM WIND G60
## 1 1
## THUNDERSTORM WIND TREES THUNDERSTORM WIND.
## 1 1
## THUNDERSTORM WIND/ TREE THUNDERSTORM WIND/AWNING
## 1 1
## THUNDERSTORM WIND/HAIL THUNDERSTORM WIND/LIGHTNING
## 1 1
## THUNDERSTORM WINDS 13 THUNDERSTORM WINDS 63 MPH
## 1 1
## THUNDERSTORM WINDS AND THUNDERSTORM WINDS G60
## 1 1
## THUNDERSTORM WINDS. THUNDERSTORM WINDS/FLOODING
## 1 1
## THUNDERSTORM WINDS/FUNNEL CLOU THUNDERSTORM WINDS53
## 1 1
## THUNDERSTORM WINDSHAIL THUNDERSTORM WINS
## 1 1
## THUNDERSTORMS THUNDERSTORMW
## 1 1
## THUNDERSTORMWINDS THUNDERSTROM WIND
## 1 1
## THUNERSTORM WINDS Tidal Flooding
## 1 1
## TORNADOES, TSTM WIND, HAIL TORNDAO
## 1 1
## Torrential Rainfall TROPICAL STORM ALBERTO
## 1 1
## TROPICAL STORM DEAN TROPICAL STORM GORDON
## 1 1
## Tstm Wind TSTM WIND (G45)
## 1 1
## TSTM WIND (41) TSTM WIND (G35)
## 1 1
## TSTM WIND 40 TSTM WIND 45
## 1 1
## TSTM WIND 65) TSTM WIND AND LIGHTNING
## 1 1
## TSTM WIND DAMAGE TSTM WIND G45
## 1 1
## TSTM WIND G58 TSTMW
## 1 1
## TUNDERSTORM WIND Unseasonable Cold
## 1 1
## UNSEASONABLY WARM AND DRY URBAN AND SMALL
## 1 1
## URBAN AND SMALL STREAM FLOODIN URBAN SMALL
## 1 1
## URBAN/SMALL STREAM WARM WEATHER
## 1 1
## WATERSPOUT-TORNADO WATERSPOUT TORNADO
## 1 1
## WATERSPOUT/ TORNADO WHIRLWIND
## 1 1
## WILD/FOREST FIRES Wind
## 1 1
## WIND AND WAVE Wind Damage
## 1 1
## WIND STORM WIND/HAIL
## 1 1
## WINTER STORM HIGH WINDS WINTER STORMS
## 1 1
## Wintry Mix
## 1
#
## By observation of 'evtype_counts_sorted' above we see a rapid 'drop-off'
## in the frequency of many of these event types, from tens of thousands
## down to 100 or fewer.
#
## Now let's drop all EVTYPEs with a frequency of 10 or less
# (that is, keep only the elements with counts greater than 10)
#
evtype_counts_sorted_filtered <- evtype_counts_sorted[evtype_counts_sorted > 10]
#
# Print the filtered frequency table
print(evtype_counts_sorted_filtered)
##
## TSTM WIND THUNDERSTORM WIND TORNADO
## 63236 43655 39944
## HAIL FLASH FLOOD LIGHTNING
## 26130 20968 13293
## THUNDERSTORM WINDS FLOOD HIGH WIND
## 12086 10175 5522
## STRONG WIND WINTER STORM HEAVY SNOW
## 3370 1508 1342
## HEAVY RAIN WILDFIRE ICE STORM
## 1105 857 708
## URBAN/SML STREAM FLD EXCESSIVE HEAT HIGH WINDS
## 702 698 657
## TSTM WIND/HAIL TROPICAL STORM WINTER WEATHER
## 441 416 407
## RIP CURRENT WILD/FOREST FIRE FLASH FLOODING
## 400 388 302
## FLOOD/FLASH FLOOD AVALANCHE DROUGHT
## 279 268 266
## BLIZZARD RIP CURRENTS HEAT
## 253 241 215
## EXTREME COLD LAKE-EFFECT SNOW LANDSLIDE
## 197 194 193
## STORM SURGE COASTAL FLOOD URBAN FLOOD
## 177 164 139
## WINTER WEATHER/MIX HIGH SURF HURRICANE
## 139 130 129
## LIGHT SNOW FROST/FREEZE EXTREME COLD/WIND CHILL
## 119 116 111
## MARINE TSTM WIND FOG RIVER FLOOD
## 109 107 107
## DUST STORM COLD/WIND CHILL DUST DEVIL
## 103 90 89
## WIND URBAN FLOODING DRY MICROBURST
## 83 80 78
## DENSE FOG HURRICANE/TYPHOON FLOODING
## 74 72 58
## COASTAL FLOODING SNOW HEAVY SURF/HIGH SURF
## 55 52 50
## STRONG WINDS STORM SURGE/TIDE WATERSPOUT
## 50 47 47
## MARINE STRONG WIND THUNDERSTORM WINDS HAIL TSTM WIND (G45)
## 46 40 37
## FREEZING RAIN TROPICAL DEPRESSION HEAT WAVE
## 35 35 34
## THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND OTHER
## 34 33 33
## COLD HEAVY SURF EXCESSIVE SNOW
## 32 30 25
## ICE ICY ROADS LIGHT FREEZING RAIN
## 24 22 22
## THUNDERSTORM WINDS/HAIL GUSTY WINDS HEAVY SNOW SQUALLS
## 22 21 21
## Light Snow HEAVY RAINS EXTREME WINDCHILL
## 21 20 19
## GLAZE MARINE HIGH WIND THUNDERSTORM
## 19 19 19
## WINDS EXTREME HEAT FLASH FLOOD/FLOOD
## 18 17 16
## FREEZE SNOW SQUALL HEAVY SNOW-SQUALLS
## 16 16 15
## MIXED PRECIPITATION TSUNAMI FLASH FLOODS
## 15 14 13
## FUNNEL CLOUD GUSTY WIND RIVER FLOODING
## 13 13 13
## SMALL HAIL
## 11
#
# Get the number of elements in the filtered frequency table
num_elements <- length(evtype_counts_sorted_filtered)
#
# Print the number of elements
print(num_elements)
## [1] 97
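The thresholding step can be tried on a toy table to see exactly which counts survive. This is a minimal sketch with made-up data; `toy_events`, `toy_counts` and `toy_kept` are hypothetical names, not objects from the analysis.

```r
# Made-up event labels: 12 tornadoes, 11 hail, 10 floods, 1 fog
toy_events <- c(rep("TORNADO", 12), rep("HAIL", 11), rep("FLOOD", 10), "FOG")
toy_counts <- sort(table(toy_events), decreasing = TRUE)

# Strictly greater than 10: FLOOD (exactly 10) and FOG (1) are dropped
toy_kept <- toy_counts[toy_counts > 10]
names(toy_kept)

# Taking the first length(toy_kept) entries of the sorted table gives the same labels
identical(names(head(toy_counts, length(toy_kept))), names(toy_kept))
```

Because the table is sorted in descending order, filtering by threshold and taking the first N entries select the same labels, provided N is the number of entries above the threshold.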
#
## We have 97 EVTYPEs with a frequency greater than 10;
# so I choose to extract the top 97 most frequent EVTYPEs
top_97_evtype <- head(evtype_counts_sorted, 97)
# Print the top 97 most frequent EVTYPEs
print(top_97_evtype)
##
## TSTM WIND THUNDERSTORM WIND TORNADO
## 63236 43655 39944
## HAIL FLASH FLOOD LIGHTNING
## 26130 20968 13293
## THUNDERSTORM WINDS FLOOD HIGH WIND
## 12086 10175 5522
## STRONG WIND WINTER STORM HEAVY SNOW
## 3370 1508 1342
## HEAVY RAIN WILDFIRE ICE STORM
## 1105 857 708
## URBAN/SML STREAM FLD EXCESSIVE HEAT HIGH WINDS
## 702 698 657
## TSTM WIND/HAIL TROPICAL STORM WINTER WEATHER
## 441 416 407
## RIP CURRENT WILD/FOREST FIRE FLASH FLOODING
## 400 388 302
## FLOOD/FLASH FLOOD AVALANCHE DROUGHT
## 279 268 266
## BLIZZARD RIP CURRENTS HEAT
## 253 241 215
## EXTREME COLD LAKE-EFFECT SNOW LANDSLIDE
## 197 194 193
## STORM SURGE COASTAL FLOOD URBAN FLOOD
## 177 164 139
## WINTER WEATHER/MIX HIGH SURF HURRICANE
## 139 130 129
## LIGHT SNOW FROST/FREEZE EXTREME COLD/WIND CHILL
## 119 116 111
## MARINE TSTM WIND FOG RIVER FLOOD
## 109 107 107
## DUST STORM COLD/WIND CHILL DUST DEVIL
## 103 90 89
## WIND URBAN FLOODING DRY MICROBURST
## 83 80 78
## DENSE FOG HURRICANE/TYPHOON FLOODING
## 74 72 58
## COASTAL FLOODING SNOW HEAVY SURF/HIGH SURF
## 55 52 50
## STRONG WINDS STORM SURGE/TIDE WATERSPOUT
## 50 47 47
## MARINE STRONG WIND THUNDERSTORM WINDS HAIL TSTM WIND (G45)
## 46 40 37
## FREEZING RAIN TROPICAL DEPRESSION HEAT WAVE
## 35 35 34
## THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND OTHER
## 34 33 33
## COLD HEAVY SURF EXCESSIVE SNOW
## 32 30 25
## ICE ICY ROADS LIGHT FREEZING RAIN
## 24 22 22
## THUNDERSTORM WINDS/HAIL GUSTY WINDS HEAVY SNOW SQUALLS
## 22 21 21
## Light Snow HEAVY RAINS EXTREME WINDCHILL
## 21 20 19
## GLAZE MARINE HIGH WIND THUNDERSTORM
## 19 19 19
## WINDS EXTREME HEAT FLASH FLOOD/FLOOD
## 18 17 16
## FREEZE SNOW SQUALL HEAVY SNOW-SQUALLS
## 16 16 15
## MIXED PRECIPITATION TSUNAMI FLASH FLOODS
## 15 14 13
## FUNNEL CLOUD GUSTY WIND RIVER FLOODING
## 13 13 13
## SMALL HAIL
## 11
#
## NOTE: as an aside, we see by inspection that of the EVTYPEs remaining,
## only one ('Light Snow') is not fully uppercase.
##
## To reduce the df_1 dataframe to only the 97 most frequently
## occurring EVTYPEs identified in evtype_counts_sorted_filtered,
## we use the filter() function from the dplyr package.
## Here it is:
#
library(dplyr)
# Get the names of the 97 most frequent EVTYPEs
# 1st, Convert the frequency table to a dataframe
evtype_counts_df <- as.data.frame(evtype_counts_sorted_filtered)
evtype_counts_df
## Var1 Freq
## 1 TSTM WIND 63236
## 2 THUNDERSTORM WIND 43655
## 3 TORNADO 39944
## 4 HAIL 26130
## 5 FLASH FLOOD 20968
## 6 LIGHTNING 13293
## 7 THUNDERSTORM WINDS 12086
## 8 FLOOD 10175
## 9 HIGH WIND 5522
## 10 STRONG WIND 3370
## 11 WINTER STORM 1508
## 12 HEAVY SNOW 1342
## 13 HEAVY RAIN 1105
## 14 WILDFIRE 857
## 15 ICE STORM 708
## 16 URBAN/SML STREAM FLD 702
## 17 EXCESSIVE HEAT 698
## 18 HIGH WINDS 657
## 19 TSTM WIND/HAIL 441
## 20 TROPICAL STORM 416
## 21 WINTER WEATHER 407
## 22 RIP CURRENT 400
## 23 WILD/FOREST FIRE 388
## 24 FLASH FLOODING 302
## 25 FLOOD/FLASH FLOOD 279
## 26 AVALANCHE 268
## 27 DROUGHT 266
## 28 BLIZZARD 253
## 29 RIP CURRENTS 241
## 30 HEAT 215
## 31 EXTREME COLD 197
## 32 LAKE-EFFECT SNOW 194
## 33 LANDSLIDE 193
## 34 STORM SURGE 177
## 35 COASTAL FLOOD 164
## 36 URBAN FLOOD 139
## 37 WINTER WEATHER/MIX 139
## 38 HIGH SURF 130
## 39 HURRICANE 129
## 40 LIGHT SNOW 119
## 41 FROST/FREEZE 116
## 42 EXTREME COLD/WIND CHILL 111
## 43 MARINE TSTM WIND 109
## 44 FOG 107
## 45 RIVER FLOOD 107
## 46 DUST STORM 103
## 47 COLD/WIND CHILL 90
## 48 DUST DEVIL 89
## 49 WIND 83
## 50 URBAN FLOODING 80
## 51 DRY MICROBURST 78
## 52 DENSE FOG 74
## 53 HURRICANE/TYPHOON 72
## 54 FLOODING 58
## 55 COASTAL FLOODING 55
## 56 SNOW 52
## 57 HEAVY SURF/HIGH SURF 50
## 58 STRONG WINDS 50
## 59 STORM SURGE/TIDE 47
## 60 WATERSPOUT 47
## 61 MARINE STRONG WIND 46
## 62 THUNDERSTORM WINDS HAIL 40
## 63 TSTM WIND (G45) 37
## 64 FREEZING RAIN 35
## 65 TROPICAL DEPRESSION 35
## 66 HEAT WAVE 34
## 67 THUNDERSTORM WINDSS 34
## 68 MARINE THUNDERSTORM WIND 33
## 69 OTHER 33
## 70 COLD 32
## 71 HEAVY SURF 30
## 72 EXCESSIVE SNOW 25
## 73 ICE 24
## 74 ICY ROADS 22
## 75 LIGHT FREEZING RAIN 22
## 76 THUNDERSTORM WINDS/HAIL 22
## 77 GUSTY WINDS 21
## 78 HEAVY SNOW SQUALLS 21
## 79 Light Snow 21
## 80 HEAVY RAINS 20
## 81 EXTREME WINDCHILL 19
## 82 GLAZE 19
## 83 MARINE HIGH WIND 19
## 84 THUNDERSTORM 19
## 85 WINDS 18
## 86 EXTREME HEAT 17
## 87 FLASH FLOOD/FLOOD 16
## 88 FREEZE 16
## 89 SNOW SQUALL 16
## 90 HEAVY SNOW-SQUALLS 15
## 91 MIXED PRECIPITATION 15
## 92 TSUNAMI 14
## 93 FLASH FLOODS 13
## 94 FUNNEL CLOUD 13
## 95 GUSTY WIND 13
## 96 RIVER FLOODING 13
## 97 SMALL HAIL 11
str(evtype_counts_df)
## 'data.frame': 97 obs. of 2 variables:
## $ Var1: Factor w/ 97 levels "TSTM WIND","THUNDERSTORM WIND",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Freq: int 63236 43655 39944 26130 20968 13293 12086 10175 5522 3370 ...
# Get the names of the 97 most frequent EVTYPEs
top_97_evtypes <- evtype_counts_df$Var1
top_97_evtypes
## [1] TSTM WIND THUNDERSTORM WIND TORNADO
## [4] HAIL FLASH FLOOD LIGHTNING
## [7] THUNDERSTORM WINDS FLOOD HIGH WIND
## [10] STRONG WIND WINTER STORM HEAVY SNOW
## [13] HEAVY RAIN WILDFIRE ICE STORM
## [16] URBAN/SML STREAM FLD EXCESSIVE HEAT HIGH WINDS
## [19] TSTM WIND/HAIL TROPICAL STORM WINTER WEATHER
## [22] RIP CURRENT WILD/FOREST FIRE FLASH FLOODING
## [25] FLOOD/FLASH FLOOD AVALANCHE DROUGHT
## [28] BLIZZARD RIP CURRENTS HEAT
## [31] EXTREME COLD LAKE-EFFECT SNOW LANDSLIDE
## [34] STORM SURGE COASTAL FLOOD URBAN FLOOD
## [37] WINTER WEATHER/MIX HIGH SURF HURRICANE
## [40] LIGHT SNOW FROST/FREEZE EXTREME COLD/WIND CHILL
## [43] MARINE TSTM WIND FOG RIVER FLOOD
## [46] DUST STORM COLD/WIND CHILL DUST DEVIL
## [49] WIND URBAN FLOODING DRY MICROBURST
## [52] DENSE FOG HURRICANE/TYPHOON FLOODING
## [55] COASTAL FLOODING SNOW HEAVY SURF/HIGH SURF
## [58] STRONG WINDS STORM SURGE/TIDE WATERSPOUT
## [61] MARINE STRONG WIND THUNDERSTORM WINDS HAIL TSTM WIND (G45)
## [64] FREEZING RAIN TROPICAL DEPRESSION HEAT WAVE
## [67] THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND OTHER
## [70] COLD HEAVY SURF EXCESSIVE SNOW
## [73] ICE ICY ROADS LIGHT FREEZING RAIN
## [76] THUNDERSTORM WINDS/HAIL GUSTY WINDS HEAVY SNOW SQUALLS
## [79] Light Snow HEAVY RAINS EXTREME WINDCHILL
## [82] GLAZE MARINE HIGH WIND THUNDERSTORM
## [85] WINDS EXTREME HEAT FLASH FLOOD/FLOOD
## [88] FREEZE SNOW SQUALL HEAVY SNOW-SQUALLS
## [91] MIXED PRECIPITATION TSUNAMI FLASH FLOODS
## [94] FUNNEL CLOUD GUSTY WIND RIVER FLOODING
## [97] SMALL HAIL
## 97 Levels: TSTM WIND THUNDERSTORM WIND TORNADO HAIL FLASH FLOOD ... SMALL HAIL
# Filter df_1 to include only the top 97 EVTYPEs
df_2 <- df_1 %>%
filter(EVTYPE %in% top_97_evtypes)
head(df_2)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE FATALITIES INJURIES
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl>
## 1 1 AL MOBILE 1950-04-18 TORNADO 0 15
## 2 2 AL BALDWIN 1950-04-18 TORNADO 0 0
## 3 3 AL FAYETTE 1951-02-20 TORNADO 0 2
## 4 4 AL MADISON 1951-06-08 TORNADO 0 2
## 5 5 AL CULLMAN 1951-11-15 TORNADO 0 2
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 0 6
dim(df_2)
## [1] 253844 7
dim(df_1)
## [1] 254633 7
#View(df_2)
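Note that `top_97_evtypes` is a factor while `EVTYPE` is a character column. The sketch below, using hypothetical toy vectors, suggests why the filter still works: `%in%` is built on `match()`, which converts factors to character before comparing. The comparison is case-sensitive.

```r
labels_factor <- factor(c("TORNADO", "HAIL"))         # stands in for evtype_counts_df$Var1
evtype_chr    <- c("TORNADO", "FOG", "HAIL", "hail")  # stands in for df_1$EVTYPE

# Factor levels are compared as character strings; "hail" does not match "HAIL"
evtype_chr %in% labels_factor
evtype_chr %in% as.character(labels_factor)  # explicit conversion, same result
```

Writing `as.character()` explicitly makes the intent clearer to a reader, even though the result is the same.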
#
## Remove rows from df_2 with FATALITIES and INJURIES both equal to '0'......
#
df_2 <- df_2 %>%
filter(FATALITIES !=0 | INJURIES !=0)
head(df_2)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE FATALITIES INJURIES
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl>
## 1 1 AL MOBILE 1950-04-18 TORNADO 0 15
## 2 3 AL FAYETTE 1951-02-20 TORNADO 0 2
## 3 4 AL MADISON 1951-06-08 TORNADO 0 2
## 4 5 AL CULLMAN 1951-11-15 TORNADO 0 2
## 5 6 AL LAUDERDALE 1951-11-15 TORNADO 0 6
## 6 7 AL BLOUNT 1951-11-16 TORNADO 0 1
dim(df_2)
## [1] 21728 7
#View(df_2)
#
## We see our 'df_2' 'reduced' to 21,728 x 7....
#
## We now aggregate the FATALITIES and INJURIES for each EVTYPE in the
## dataframe 'df_2'. We use the group_by() and summarize() functions
## from the dplyr package:
#
# Group by EVTYPE and summarize FATALITIES and INJURIES
df_3 <- df_2 %>%
group_by(EVTYPE) %>%
summarize(total_fatalities = sum(FATALITIES),
total_injuries = sum(INJURIES))
head(df_3)
## # A tibble: 6 × 3
## EVTYPE total_fatalities total_injuries
## <chr> <dbl> <dbl>
## 1 AVALANCHE 224 170
## 2 BLIZZARD 101 805
## 3 COASTAL FLOOD 3 2
## 4 COASTAL FLOODING 1 0
## 5 COLD 35 48
## 6 COLD/WIND CHILL 95 12
dim(df_3)
## [1] 86 3
#View(df_3)
## This code groups the dataframe 'df_2' by the 'EVTYPE' column and
## then calculates the total number of fatalities and injuries for
## each EVTYPE using the sum() function within the summarize() function.
## The resulting dataframe 'df_3' contains the aggregated information
## for each EVTYPE, including the total number of fatalities and injuries.
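The grouped totals can be cross-checked with base R's `aggregate()`. This is a sketch on a hypothetical three-row data frame (`toy_health`), not a replacement for the dplyr pipeline used in the analysis.

```r
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  FATALITIES = c(1, 0, 2),
  INJURIES   = c(15, 2, 0)
)

# Sum both columns within each EVTYPE; rows come back ordered by EVTYPE
toy_health_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                               data = toy_health, FUN = sum)
toy_health_totals
```

Running the same call on `df_2` and comparing a few rows against `df_3` is a cheap sanity check on the aggregation.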
#
## REVIEW: Our assignment is to find which types of events (as indicated
## in the EVTYPE variable) are most harmful with respect to
## population health.
#
## By inspection of the View of df_3, we see many EVTYPEs whose totals are
## close to zero. They can be said to not be among 'the most harmful'. So I
## choose to remove these from our df_3 dataframe.
#
# Keep rows where total_fatalities or total_injuries is greater than 5
df_3 <- df_3 %>%
filter(total_fatalities > 5 | total_injuries > 5)
##
#View(df_3)
dim(df_3)
## [1] 68 3
## This code filters the dataframe 'df_3' to exclude rows where both
## 'total_fatalities' and 'total_injuries' are 5 or less. After executing
## this code, the resulting dataframe 'df_3' contains only the rows
## where either 'total_fatalities' or 'total_injuries' is greater than 5.
#
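The effect of the `|` (OR) condition is easy to misread, so here is a small base-R sketch on a hypothetical data frame (`toy_or`). A row is dropped only when both totals are 5 or less.

```r
toy_or <- data.frame(
  EVTYPE           = c("A", "B", "C"),
  total_fatalities = c(10, 2, 0),
  total_injuries   = c(0, 8, 3)
)

# Keep a row when EITHER total exceeds 5; only "C" fails both tests
toy_or_kept <- toy_or[toy_or$total_fatalities > 5 | toy_or$total_injuries > 5, ]
toy_or_kept$EVTYPE
```

Replacing `|` with `&` would instead require both totals to exceed 5, which is a much stricter cut.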
##### df_3 IS OUR STARTING POINT %%%%%%%%%%%%%%%%%%%
#
##
#View(df_3)
#
From df_3 we also see several EVTYPE storm types that are ‘the same’, such as ‘FLASH FLOOD’, ‘FLASH FLOOD/FLOOD’, ‘FLASH FLOODING’, ‘FLOOD’, ‘FLOODING’, etc. We need to combine these. For example, the regular expression ‘.*FLOOD.*’ matches any string that contains the substring “FLOOD” anywhere within it, with any characters (or none) before and after it.
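To make the matching behaviour concrete, the sketch below runs `grepl()` on a few labels taken from the listings above. The vector name `evtype_labels` is hypothetical.

```r
evtype_labels <- c("FLASH FLOOD", "FLOODING", "Coastal Flooding", "HAIL")

grepl(".*FLOOD.*", evtype_labels)                      # case-sensitive match
grepl(".*FLOOD.*", evtype_labels, ignore.case = TRUE)  # also catches mixed case
grepl("FLOOD", evtype_labels, fixed = TRUE)            # plain substring test
```

Since `grepl()` already looks for the pattern anywhere in the string, the leading and trailing `.*` are optional for matching; the first and third calls return the same result.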
library(dplyr)
## Define the list of storm types
#
storm_types <- c('.*TORNADO.*', '.*HEAT.*', '.*FLOOD.*', '.*SNOW.*', '.*WIND.*',
'.*COLD.*', '.*SURF.*', '.*ICE.*', '.*RIP.*',
'.*LIGHTNING.*', '.*WINTER.*', '.*FIRE.*', '.*HURRICANE.*',
'.*HAIL.*', '.*SURGE.*')
#
storm_types
## [1] ".*TORNADO.*" ".*HEAT.*" ".*FLOOD.*" ".*SNOW.*"
## [5] ".*WIND.*" ".*COLD.*" ".*SURF.*" ".*ICE.*"
## [9] ".*RIP.*" ".*LIGHTNING.*" ".*WINTER.*" ".*FIRE.*"
## [13] ".*HURRICANE.*" ".*HAIL.*" ".*SURGE.*"
#
# Create an empty list to store the resulting dataframes
df_list <- list()
#
# Iterate over each storm type - using a 'for loop'...We start with our
## 'df_3' dataframe formed above...
#
for (i in seq_along(storm_types)) {
# Consolidate EVTYPEs containing the current storm type into a
## single category
df <- if (i == 1) df_3 else df_list[[i - 1]]
df_new <- df %>%
mutate(EVTYPE = ifelse(grepl(storm_types[i], EVTYPE, ignore.case = TRUE),
toupper(storm_types[i]), EVTYPE)) %>%
group_by(EVTYPE) %>%
summarize(total_fatalities = sum(total_fatalities),
total_injuries = sum(total_injuries))
# Store the resulting dataframe in the list
df_list[[i]] <- df_new
# Print dimensions and view the dataframe (optional)
cat("Dimensions of df_", i, ": ", dim(df_new), "\n")
# View(df_new)
}
## Dimensions of df_ 1 : 68 3
## Dimensions of df_ 2 : 65 3
## Dimensions of df_ 3 : 60 3
## Dimensions of df_ 4 : 58 3
## Dimensions of df_ 5 : 43 3
## Dimensions of df_ 6 : 42 3
## Dimensions of df_ 7 : 40 3
## Dimensions of df_ 8 : 39 3
## Dimensions of df_ 9 : 38 3
## Dimensions of df_ 10 : 38 3
## Dimensions of df_ 11 : 36 3
## Dimensions of df_ 12 : 35 3
## Dimensions of df_ 13 : 34 3
## Dimensions of df_ 14 : 33 3
## Dimensions of df_ 15 : 32 3
# The final dataframe after all iterations
final_df <- df_list[[length(df_list)]]
dim(final_df)
## [1] 32 3
#View(final_df)
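Because the loop applies the patterns one after another, a label that contains two keywords is assigned to whichever keyword comes first in `storm_types`. The base-R sketch below illustrates this first-match rule; `first_keyword()` and `keywords_demo` are hypothetical names, and the keyword order is copied from the first six entries of `storm_types`.

```r
keywords_demo <- c("TORNADO", "HEAT", "FLOOD", "SNOW", "WIND", "COLD")

# Return the first keyword found in a label, or the label itself if none match
first_keyword <- function(label, keys) {
  hit <- keys[vapply(keys, grepl, logical(1), x = label, fixed = TRUE)]
  if (length(hit) > 0) hit[1] else label
}

vapply(c("COLD/WIND CHILL", "EXTREME COLD", "AVALANCHE"),
       first_keyword, character(1), keys = keywords_demo, USE.NAMES = FALSE)
```

Here 'COLD/WIND CHILL' lands in WIND rather than COLD because WIND is listed first, so the order of `storm_types` is part of the analysis, not just a formatting detail.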
#
We need to get rid of the ‘.*’ symbols from ‘final_df$EVTYPE’.
final_df$EVTYPE
## [1] ".*COLD.*" ".*FIRE.*" ".*FLOOD.*"
## [4] ".*HAIL.*" ".*HEAT.*" ".*HURRICANE.*"
## [7] ".*ICE.*" ".*LIGHTNING.*" ".*RIP.*"
## [10] ".*SNOW.*" ".*SURF.*" ".*SURGE.*"
## [13] ".*TORNADO.*" ".*WIND.*" ".*WINTER.*"
## [16] "AVALANCHE" "BLIZZARD" "DENSE FOG"
## [19] "DRY MICROBURST" "DUST DEVIL" "DUST STORM"
## [22] "FOG" "FREEZING RAIN" "GLAZE"
## [25] "HEAVY RAIN" "ICY ROADS" "LANDSLIDE"
## [28] "THUNDERSTORM" "TROPICAL STORM" "TSUNAMI"
## [31] "URBAN/SML STREAM FLD" "WATERSPOUT"
#
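## We use gsub to extract only the keyword without surrounding symbols.
## The pattern is built with paste0() so that no stray commas or line
## breaks end up inside the regular expression.
#
keywords <- c("TORNADO", "HEAT", "FLOOD", "SNOW", "WIND", "COLD", "SURF", "ICE",
"RIP", "LIGHTNING", "WINTER", "FIRE", "HAIL", "SURGE", "HURRICANE")
keyword_pattern <- paste0(".*?(", paste(keywords, collapse = "|"), ").*")
final_df$EVTYPE <- gsub(keyword_pattern, "\\1", final_df$EVTYPE)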
final_df$EVTYPE
## [1] "COLD" "FIRE" "FLOOD"
## [4] "HAIL" "HEAT" "HURRICANE"
## [7] "ICE" "LIGHTNING" "RIP"
## [10] "SNOW" "SURF" "SURGE"
## [13] "TORNADO" "WIND" "WINTER"
## [16] "AVALANCHE" "BLIZZARD" "DENSE FOG"
## [19] "DRY MICROBURST" "DUST DEVIL" "DUST STORM"
## [22] "FOG" "FREEZING RAIN" "GLAZE"
## [25] "HEAVY RAIN" "ICY ROADS" "LANDSLIDE"
## [28] "THUNDERSTORM" "TROPICAL STORM" "TSUNAMI"
## [31] "URBAN/SML STREAM FLD" "WATERSPOUT"
# Print the head of the final dataframe to verify changes
head(final_df)
## # A tibble: 6 × 3
## EVTYPE total_fatalities total_injuries
## <chr> <dbl> <dbl>
## 1 COLD 195 279
## 2 FIRE 87 1456
## 3 FLOOD 1504 8591
## 4 HAIL 15 1371
## 5 HEAT 3108 9089
## 6 HURRICANE 125 1321
dim(final_df)
## [1] 32 3
#View(final_df)
#
## 'final_df' is the 'boiled-down' dataframe we want;
#
## now 'sum' and 'sort'...
# Sum total_fatalities and total_injuries and create a new column
#
final_df <- final_df %>%
mutate(Population_Deaths_Injuries = total_fatalities + total_injuries)
# Sort the dataframe in descending order based on Population_Deaths_Injuries
#
final_df <- final_df %>%
arrange(desc(Population_Deaths_Injuries))
#
dim(final_df)
## [1] 32 4
#View(final_df)
df_popula_health <- final_df
dim(df_popula_health)
## [1] 32 4
#View(df_popula_health)
#
## Make a 'table'- display the df_popula_health dataframe as a table
#
library(knitr)
#
## select only the 'EVTYPE' and 'Population_Deaths_Injuries' columns
selected_cols <- df_popula_health[ , c('EVTYPE', 'Population_Deaths_Injuries')]
selected_cols
## # A tibble: 32 × 2
## EVTYPE Population_Deaths_Injuries
## <chr> <dbl>
## 1 TORNADO 96979
## 2 WIND 12775
## 3 HEAT 12197
## 4 FLOOD 10095
## 5 LIGHTNING 6046
## 6 ICE 2207
## 7 WINTER 2058
## 8 FIRE 1543
## 9 HURRICANE 1446
## 10 HAIL 1386
## # ℹ 22 more rows
#
kable(selected_cols,
col.names = c("Storm/Weather Events", "Deaths/Injuries"),
caption = "Storm/Weather Events that are Most Harmful With Respect
to Population Health")
| Storm/Weather Events | Deaths/Injuries |
|---|---|
| TORNADO | 96979 |
| WIND | 12775 |
| HEAT | 12197 |
| FLOOD | 10095 |
| LIGHTNING | 6046 |
| ICE | 2207 |
| WINTER | 2058 |
| FIRE | 1543 |
| HURRICANE | 1446 |
| HAIL | 1386 |
| SNOW | 1219 |
| RIP | 1101 |
| BLIZZARD | 906 |
| FOG | 796 |
| COLD | 474 |
| DUST STORM | 462 |
| TROPICAL STORM | 398 |
| AVALANCHE | 394 |
| SURF | 390 |
| DENSE FOG | 360 |
| HEAVY RAIN | 349 |
| GLAZE | 223 |
| TSUNAMI | 162 |
| URBAN/SML STREAM FLD | 107 |
| LANDSLIDE | 90 |
| SURGE | 67 |
| DUST DEVIL | 44 |
| ICY ROADS | 36 |
| WATERSPOUT | 32 |
| DRY MICROBURST | 31 |
| FREEZING RAIN | 30 |
| THUNDERSTORM | 13 |
## This will generate a nicely formatted table with the specified column names
## and a title as a caption. The dataframe 'df_popula_health' is already sorted
## in descending order based on the 'Population_Deaths_Injuries' column,
## as per the previous step.
dim(df_popula_health)
## [1] 32 4
#
By inspection of the ‘Storm/Weather Events Table’ we can see that the ‘top’ 10 to 15 types of events dominate. Where to cut off the number of events displayed in the plot (e.g., ggplot) depends on factors such as readability and the focus of the analysis. With too many events, the labels may overlap or become unreadable, so it is often a good idea to limit the number of events shown.
As a general guideline, you might consider displaying the top N events with the highest number of deaths/injuries, where N is a reasonable number that allows for clear visualization without overcrowding the plot. You can choose N based on the number of events you believe are significant or commonly recognized.
For example, you might decide to display the top 10 or top 15 events, as these are likely to be the most impactful and relevant for your analysis. However, you should adjust this number based on your specific requirements and preferences.
Once you decide on the number of events to include, you can filter the dataframe to keep only the top N events based on the ‘Deaths/Injuries’ column before creating the ggplot.
Here we filter the dataframe to keep only the top 15 events:
#
library(dplyr)
library(ggplot2)
# Filter the dataframe to keep only the top 15 events
top_events <- selected_cols %>%
arrange(desc(Population_Deaths_Injuries)) %>%
head(15)
top_events
## # A tibble: 15 × 2
## EVTYPE Population_Deaths_Injuries
## <chr> <dbl>
## 1 TORNADO 96979
## 2 WIND 12775
## 3 HEAT 12197
## 4 FLOOD 10095
## 5 LIGHTNING 6046
## 6 ICE 2207
## 7 WINTER 2058
## 8 FIRE 1543
## 9 HURRICANE 1446
## 10 HAIL 1386
## 11 SNOW 1219
## 12 RIP 1101
## 13 BLIZZARD 906
## 14 FOG 796
## 15 COLD 474
#
# Create ggplot
ggplot(top_events, aes(x = EVTYPE, y = Population_Deaths_Injuries)) +
geom_bar(stat = "identity", fill = 'blue') +
labs(title = "Top 15 Most Harmful Deaths/Injuries Storm/Weather Events in USA",
x = "Storm/Weather Events - Years 1950 thru 2011",
y = "Deaths and Injuries") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
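With a character column on the x axis, ggplot2 places the bars in alphabetical order of EVTYPE, not in order of harm. One possible variation, sketched below on a hypothetical three-row data frame (`toy_plot`, with totals copied from the table above), uses `reorder()` so the tallest bar comes first. It is offered as an option, not as the plot used in this report.

```r
library(ggplot2)

toy_plot <- data.frame(
  EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
  Population_Deaths_Injuries = c(10095, 12197, 96979)
)

# reorder() sorts the factor levels by the negated totals (largest total first)
ggplot(toy_plot, aes(x = reorder(EVTYPE, -Population_Deaths_Injuries),
                     y = Population_Deaths_Injuries)) +
  geom_col(fill = "blue") +
  labs(x = "Storm/Weather Events", y = "Deaths and Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")` and is used here only for brevity.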
#
%%%%%%%%%%%%%%% &&&&&&&&&&&&&&&& ****************
By focusing on the following variables from our original ‘StormData.csv’ file from NOAA, we can assess which types of events have the greatest economic consequences (Question 2) across the United States. These variables directly measure the severity of economic damage caused by different types of weather events.
EVTYPE (Type of Event): Similar to question 1, analyzing this variable will help us identify which types of events are associated with the greatest economic consequences.
REFNUM (Reference Number): ‘REFNUM’ acts as an audit trail, enabling data transformations or modifications to be traced back to the original dataset. This is particularly useful for reproducibility and documentation, and it supports transparency and collaboration in the analysis workflow.
STATE (State): Including the state allows for a nuanced exploration of economic outcomes, facilitates comparative analysis, and makes the findings more applicable to specific geographic regions or jurisdictions.
PROPDMG (Property Damage Estimate): This variable provides an estimate of the property damage caused by each event. It directly measures the economic impact of the event on property and infrastructure.
PROPDMGEXP (Property Damage Exponent): This variable specifies the exponent used to interpret the property damage estimate (e.g., K for thousands, M for millions). It helps us accurately interpret the magnitude of property damage.
CROPDMG (Crop Damage Estimate): This variable provides an estimate of the crop damage caused by each event. It measures the economic impact on agricultural production and related industries.
CROPDMGEXP (Crop Damage Exponent): Similar to PROPDMGEXP, this variable specifies the exponent used to interpret the crop damage estimate.
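The way an exponent code scales a damage estimate can be sketched as follows. This is an illustration under stated assumptions, not the conversion used later in this report: the helper `exp_multiplier()` and the data frame `toy_dmg` are hypothetical, 'B' for billions is assumed in addition to the K and M codes described above, and any missing or unrecognised code is treated as a multiplier of 1.

```r
# Map an exponent code to a numeric multiplier.
# Assumption: unknown or missing codes fall back to 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

toy_dmg <- data.frame(PROPDMG    = c(25, 2.5, 1, 3),
                      PROPDMGEXP = c("K", "M", "B", NA))

# Damage in dollars = estimate * multiplier implied by the exponent code
toy_dmg$prop_usd <- toy_dmg$PROPDMG * exp_multiplier(toy_dmg$PROPDMGEXP)
toy_dmg$prop_usd
```

The same helper would apply to CROPDMG and CROPDMGEXP. How to treat codes outside K, M and B is a judgment call that should be checked against the NOAA documentation.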
# We use the dataframe 'stmdata_filtered' from above as our starting 'point'...
# We will use the 'same pattern' of reducing dataframes 'stmdata_filtered' and
# subsequent 'df_11' as we did above for Question 1.....
#
df_11 <- stmdata_filtered %>%
select(REFNUM, STATE, COUNTYNAME, BGN_DATE, EVTYPE, PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP)
#
head(df_11)
## # A tibble: 6 × 9
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 1 AL MOBILE 4/18/195… TORNA… 25 K 0 <NA>
## 2 2 AL BALDWIN 4/18/195… TORNA… 2.5 K 0 <NA>
## 3 3 AL FAYETTE 2/20/195… TORNA… 25 K 0 <NA>
## 4 4 AL MADISON 6/8/1951… TORNA… 2.5 K 0 <NA>
## 5 5 AL CULLMAN 11/15/19… TORNA… 2.5 K 0 <NA>
## 6 6 AL LAUDERDALE 11/15/19… TORNA… 2.5 K 0 <NA>
#View(df_11)
#
#
## ########## df_11 is our STARTING POINT %%%%%%%%%%
#
#
## get rid of the '0:00:00's.......
library(lubridate)
# Assuming 'BGN_DATE' is currently in character format
df_11 <- df_11 %>%
mutate(BGN_DATE = mdy_hms(BGN_DATE), # Convert to date-time format
BGN_DATE = as.Date(BGN_DATE)) # Convert back to date-only format
# Now 'BGN_DATE' should contain only dates without the '0:00:00' time portion
head(df_11)
## # A tibble: 6 × 9
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP CROPDMG
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr> <dbl>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 K 0
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 K 0
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 K 0
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 K 0
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 K 0
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 K 0
## # ℹ 1 more variable: CROPDMGEXP <chr>
dim(df_11)
## [1] 254633 9
#View(df_11)
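#
# Illustrative sketch (toy input, not the storm data): the two-step
# conversion above applied to a single string, next to a base-R route that
# is assumed to behave the same for strings of this "m/d/Y H:M:S" shape.
library(lubridate)
toy_date <- "4/18/1950 0:00:00"
# Parse as a date-time, then drop the time portion
as.Date(mdy_hms(toy_date))
# Parse only the date part; the trailing time is ignored by the format
as.Date(toy_date, format = "%m/%d/%Y")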
#
## %%%%%%%%%%% beautiful !!!
#
## check for NA's
sum(is.na(df_11))
## [1] 164256
## many NA's
#
## the df_11 dataframe has 254633 observations, 9 features and 164256 NA values....
#
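# Illustrative sketch (toy data, not df_11): sum(is.na(...)) gives a single
# total for the whole data frame, while colSums(is.na(...)) breaks the total
# down by column, which shows where the missing values are concentrated.
toy_na <- data.frame(PROPDMGEXP = c("K", NA, "M"),
                     CROPDMGEXP = c(NA, NA, "K"))
sum(is.na(toy_na))      # total number of NA values
colSums(is.na(toy_na))  # number of NA values in each column
#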
## find the number of unique values of EVTYPE in df_11:
length(unique(df_11$EVTYPE))
## [1] 485
#
## there are 485 types of events...
#
## let's list all types of events
unique(df_11$EVTYPE)
## [1] "TORNADO" "TSTM WIND"
## [3] "HAIL" "ICE STORM/FLASH FLOOD"
## [5] "WINTER STORM" "HURRICANE OPAL/HIGH WINDS"
## [7] "THUNDERSTORM WINDS" "HURRICANE ERIN"
## [9] "HURRICANE OPAL" "HEAVY RAIN"
## [11] "LIGHTNING" "THUNDERSTORM WIND"
## [13] "DENSE FOG" "RIP CURRENT"
## [15] "THUNDERSTORM WINS" "FLASH FLOODING"
## [17] "FLASH FLOOD" "TORNADO F0"
## [19] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS/HAIL"
## [21] "HEAT" "HIGH WINDS"
## [23] "WIND" "HEAVY RAINS"
## [25] "LIGHTNING AND HEAVY RAIN" "THUNDERSTORM WINDS HAIL"
## [27] "COLD" "HEAVY RAIN/LIGHTNING"
## [29] "FLASH FLOODING/THUNDERSTORM WI" "FLOODING"
## [31] "WATERSPOUT" "EXTREME COLD"
## [33] "LIGHTNING/HEAVY RAIN" "BREAKUP FLOODING"
## [35] "HIGH WIND" "FREEZE"
## [37] "RIVER FLOOD" "HIGH WINDS HEAVY RAINS"
## [39] "AVALANCHE" "MARINE MISHAP"
## [41] "HIGH TIDES" "HIGH WIND/SEAS"
## [43] "HIGH WINDS/HEAVY RAIN" "HIGH SEAS"
## [45] "COASTAL FLOOD" "SEVERE TURBULENCE"
## [47] "RECORD RAINFALL" "HEAVY SNOW"
## [49] "HEAVY SNOW/WIND" "DUST STORM"
## [51] "FLOOD" "APACHE COUNTY"
## [53] "SLEET" "DUST DEVIL"
## [55] "ICE STORM" "EXCESSIVE HEAT"
## [57] "THUNDERSTORM WINDS/FUNNEL CLOU" "GUSTY WINDS"
## [59] "FLOODING/HEAVY RAIN" "HEAVY SURF COASTAL FLOODING"
## [61] "HIGH SURF" "WILD FIRES"
## [63] "HIGH" "WINTER STORM HIGH WINDS"
## [65] "WINTER STORMS" "MUDSLIDES"
## [67] "RAINSTORM" "SEVERE THUNDERSTORM"
## [69] "SEVERE THUNDERSTORMS" "SEVERE THUNDERSTORM WINDS"
## [71] "THUNDERSTORMS WINDS" "FLOOD/FLASH FLOOD"
## [73] "FLOOD/RAIN/WINDS" "THUNDERSTORMS"
## [75] "FLASH FLOOD WINDS" "WINDS"
## [77] "FUNNEL CLOUD" "HIGH WIND DAMAGE"
## [79] "STRONG WIND" "HEAVY SNOWPACK"
## [81] "FLASH FLOOD/" "HEAVY SURF"
## [83] "DRY MIRCOBURST WINDS" "DRY MICROBURST"
## [85] "URBAN FLOOD" "THUNDERSTORM WINDSS"
## [87] "MICROBURST WINDS" "HEAT WAVE"
## [89] "UNSEASONABLY WARM" "COASTAL FLOODING"
## [91] "STRONG WINDS" "BLIZZARD"
## [93] "WATERSPOUT/TORNADO" "WATERSPOUT TORNADO"
## [95] "STORM SURGE" "URBAN/SMALL STREAM FLOOD"
## [97] "WATERSPOUT-" "TORNADOES, TSTM WIND, HAIL"
## [99] "TROPICAL STORM ALBERTO" "TROPICAL STORM"
## [101] "TROPICAL STORM GORDON" "TROPICAL STORM JERRY"
## [103] "LIGHTNING THUNDERSTORM WINDS" "URBAN FLOODING"
## [105] "MINOR FLOODING" "WATERSPOUT-TORNADO"
## [107] "LIGHTNING INJURY" "LIGHTNING AND THUNDERSTORM WIN"
## [109] "FLASH FLOODS" "THUNDERSTORM WINDS53"
## [111] "WILDFIRE" "DAMAGING FREEZE"
## [113] "THUNDERSTORM WINDS 13" "HURRICANE"
## [115] "SNOW" "LIGNTNING"
## [117] "FROST" "FREEZING RAIN/SNOW"
## [119] "HIGH WINDS/" "THUNDERSNOW"
## [121] "FLOODS" "COOL AND WET"
## [123] "HEAVY RAIN/SNOW" "GLAZE ICE"
## [125] "MUD SLIDE" "HIGH WINDS"
## [127] "RURAL FLOOD" "MUD SLIDES"
## [129] "EXTREME HEAT" "DROUGHT"
## [131] "COLD AND WET CONDITIONS" "EXCESSIVE WETNESS"
## [133] "SLEET/ICE STORM" "GUSTNADO"
## [135] "FREEZING RAIN" "SNOW AND HEAVY SNOW"
## [137] "GROUND BLIZZARD" "EXTREME WIND CHILL"
## [139] "MAJOR FLOOD" "SNOW/HEAVY SNOW"
## [141] "FREEZING RAIN/SLEET" "ICE JAM FLOODING"
## [143] "COLD AIR TORNADO" "WIND DAMAGE"
## [145] "FOG" "TSTM WIND 55"
## [147] "SMALL STREAM FLOOD" "THUNDERTORM WINDS"
## [149] "HAIL/WINDS" "SNOW AND ICE"
## [151] "WIND STORM" "GRASS FIRES"
## [153] "LAKE FLOOD" "HAIL/WIND"
## [155] "WIND/HAIL" "ICE"
## [157] "SNOW AND ICE STORM" "THUNDERSTORM WINDS"
## [159] "WINTER WEATHER" "DROUGHT/EXCESSIVE HEAT"
## [161] "THUNDERSTORMS WIND" "TUNDERSTORM WIND"
## [163] "URBAN AND SMALL STREAM FLOODIN" "THUNDERSTORM WIND/LIGHTNING"
## [165] "HEAVY RAIN/SEVERE WEATHER" "THUNDERSTORM"
## [167] "WATERSPOUT/ TORNADO" "LIGHTNING."
## [169] "HURRICANE-GENERATED SWELLS" "RIVER AND STREAM FLOOD"
## [171] "HIGH WINDS/COASTAL FLOOD" "RAIN"
## [173] "RIVER FLOODING" "ICE FLOES"
## [175] "THUNDERSTORM WIND G50" "LIGHTNING FIRE"
## [177] "HEAVY LAKE SNOW" "RECORD COLD"
## [179] "HEAVY SNOW/FREEZING RAIN" "COLD WAVE"
## [181] "DUST DEVIL WATERSPOUT" "TORNADO F3"
## [183] "TORNDAO" "FLOOD/RIVER FLOOD"
## [185] "MUD SLIDES URBAN FLOODING" "TORNADO F1"
## [187] "GLAZE/ICE STORM" "GLAZE"
## [189] "HEAVY SNOW/WINTER STORM" "MICROBURST"
## [191] "AVALANCE" "BLIZZARD/WINTER STORM"
## [193] "DUST STORM/HIGH WINDS" "ICE JAM"
## [195] "FOREST FIRES" "FROST\\FREEZE"
## [197] "THUNDERSTORM WINDS." "HVY RAIN"
## [199] "HAIL 150" "HAIL 075"
## [201] "HAIL 100" "THUNDERSTORM WIND G55"
## [203] "HAIL 125" "THUNDERSTORM WIND G60"
## [205] "THUNDERSTORM WINDS G60" "HARD FREEZE"
## [207] "HAIL 200" "HEAVY SNOW AND HIGH WINDS"
## [209] "HEAVY SNOW/HIGH WINDS & FLOOD" "HEAVY RAIN AND FLOOD"
## [211] "RIP CURRENTS/HEAVY SURF" "URBAN AND SMALL"
## [213] "WILDFIRES" "FOG AND COLD TEMPERATURES"
## [215] "SNOW/COLD" "FLASH FLOOD FROM ICE JAMS"
## [217] "TSTM WIND G58" "MUDSLIDE"
## [219] "HEAVY SNOW SQUALLS" "SNOW SQUALL"
## [221] "SNOW/ICE STORM" "HEAVY SNOW/SQUALLS"
## [223] "HEAVY SNOW-SQUALLS" "ICY ROADS"
## [225] "HEAVY MIX" "SNOW FREEZING RAIN"
## [227] "SNOW/SLEET" "SNOW/FREEZING RAIN"
## [229] "SNOW SQUALLS" "SNOW/SLEET/FREEZING RAIN"
## [231] "RECORD SNOW" "HAIL 0.75"
## [233] "RECORD HEAT" "THUNDERSTORM WIND 65MPH"
## [235] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WIND/AWNING"
## [237] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND TREES"
## [239] "TORNADO F2" "RIP CURRENTS"
## [241] "HURRICANE EMILY" "COASTAL SURGE"
## [243] "HURRICANE GORDON" "HURRICANE FELIX"
## [245] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WINDS 63 MPH"
## [247] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM DAMAGE TO"
## [249] "THUNDERSTORM WIND 65 MPH" "FLASH FLOOD - HEAVY RAIN"
## [251] "THUNDERSTORM WIND." "FLASH FLOOD/ STREET"
## [253] "BLOWING SNOW" "HEAVY SNOW/BLIZZARD"
## [255] "THUNDERSTORM HAIL" "THUNDERSTORM WINDSHAIL"
## [257] "LIGHTNING WAUSEON" "THUDERSTORM WINDS"
## [259] "ICE AND SNOW" "STORM FORCE WINDS"
## [261] "HEAVY SNOW/ICE" "LIGHTING"
## [263] "HIGH WIND/HEAVY SNOW" "THUNDERSTORM WINDS AND"
## [265] "HEAVY PRECIPITATION" "HIGH WIND/BLIZZARD"
## [267] "TSTM WIND DAMAGE" "FLOOD FLASH"
## [269] "RAIN/WIND" "SNOW/ICE"
## [271] "HAIL 75" "HEAT WAVE DROUGHT"
## [273] "HEAVY SNOW/BLIZZARD/AVALANCHE" "HEAT WAVES"
## [275] "UNSEASONABLY WARM AND DRY" "UNSEASONABLY COLD"
## [277] "RECORD/EXCESSIVE HEAT" "THUNDERSTORM WIND G52"
## [279] "HIGH WAVES" "FLASH FLOOD/FLOOD"
## [281] "FLOOD/FLASH" "LOW TEMPERATURE"
## [283] "HEAVY RAINS/FLOODING" "THUNDERESTORM WINDS"
## [285] "THUNDERSTORM WINDS/FLOODING" "HYPOTHERMIA"
## [287] "THUNDEERSTORM WINDS" "THUNERSTORM WINDS"
## [289] "HIGH WINDS/COLD" "COLD/WINDS"
## [291] "SNOW/ BITTER COLD" "COLD WEATHER"
## [293] "RAPIDLY RISING WATER" "WILD/FOREST FIRE"
## [295] "ICE/STRONG WINDS" "SNOW/HIGH WINDS"
## [297] "HIGH WINDS/SNOW" "SNOWMELT FLOODING"
## [299] "HEAVY SNOW AND STRONG WINDS" "SNOW ACCUMULATION"
## [301] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [303] "TORNADOES" "THUNDERSTORM WIND/HAIL"
## [305] "FREEZING DRIZZLE" "HAIL 175"
## [307] "FLASH FLOODING/FLOOD" "HAIL 275"
## [309] "HAIL 450" "EXCESSIVE RAINFALL"
## [311] "THUNDERSTORMW" "HAILSTORM"
## [313] "TSTM WINDS" "TSTMW"
## [315] "TSTM WIND 65)" "TROPICAL STORM DEAN"
## [317] "THUNDERSTORM WINDS/ FLOOD" "LANDSLIDE"
## [319] "HIGH WIND AND SEAS" "THUNDERSTORMWINDS"
## [321] "WILD/FOREST FIRES" "HEAVY SEAS"
## [323] "HAIL DAMAGE" "FLOOD & HEAVY RAIN"
## [325] "?" "THUNDERSTROM WIND"
## [327] "FLOOD/FLASHFLOOD" "HIGH WATER"
## [329] "HIGH WIND 48" "LANDSLIDES"
## [331] "URBAN/SMALL STREAM" "BRUSH FIRE"
## [333] "HEAVY SHOWER" "HEAVY SWELLS"
## [335] "URBAN SMALL" "URBAN FLOODS"
## [337] "FLASH FLOOD/LANDSLIDE" "HEAVY RAIN/SMALL STREAM URBAN"
## [339] "FLASH FLOOD LANDSLIDES" "TSTM WIND/HAIL"
## [341] "Other" "Ice jam flood (minor"
## [343] "Tstm Wind" "URBAN/SML STREAM FLD"
## [345] "ROUGH SURF" "Heavy Surf"
## [347] "Dust Devil" "Marine Accident"
## [349] "Freeze" "Strong Wind"
## [351] "COASTAL STORM" "Erosion/Cstl Flood"
## [353] "River Flooding" "Damaging Freeze"
## [355] "Beach Erosion" "High Surf"
## [357] "Heavy Rain/High Surf" "Unseasonable Cold"
## [359] "Early Frost" "Wintry Mix"
## [361] "Extreme Cold" "Coastal Flooding"
## [363] "Torrential Rainfall" "Landslump"
## [365] "Hurricane Edouard" "Coastal Storm"
## [367] "TIDAL FLOODING" "Tidal Flooding"
## [369] "Strong Winds" "EXTREME WINDCHILL"
## [371] "Glaze" "Extended Cold"
## [373] "Whirlwind" "Heavy snow shower"
## [375] "Light snow" "Light Snow"
## [377] "MIXED PRECIP" "Freezing Spray"
## [379] "DOWNBURST" "Mudslides"
## [381] "Microburst" "Mudslide"
## [383] "Cold" "Coastal Flood"
## [385] "Snow Squalls" "Wind Damage"
## [387] "Light Snowfall" "Freezing Drizzle"
## [389] "Gusty wind/rain" "GUSTY WIND/HVY RAIN"
## [391] "Wind" "Cold Temperature"
## [393] "Heat Wave" "Snow"
## [395] "COLD AND SNOW" "RAIN/SNOW"
## [397] "TSTM WIND (G45)" "Gusty Winds"
## [399] "GUSTY WIND" "TSTM WIND 40"
## [401] "TSTM WIND 45" "TSTM WIND (41)"
## [403] "TSTM WIND (G40)" "Frost/Freeze"
## [405] "AGRICULTURAL FREEZE" "OTHER"
## [407] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [409] "Lake Effect Snow" "Freezing Rain"
## [411] "Mixed Precipitation" "BLACK ICE"
## [413] "COASTALSTORM" "LIGHT SNOW"
## [415] "DAM BREAK" "Gusty winds"
## [417] "blowing snow" "GRADIENT WIND"
## [419] "TSTM WIND AND LIGHTNING" "gradient wind"
## [421] "Gradient wind" "Freezing drizzle"
## [423] "WET MICROBURST" "Heavy surf and wind"
## [425] "TYPHOON" "HIGH SWELLS"
## [427] "SMALL HAIL" "UNSEASONAL RAIN"
## [429] "COASTAL FLOODING/EROSION" "TSTM WIND (G45)"
## [431] "HIGH WIND (G40)" "TSTM WIND (G35)"
## [433] "COASTAL EROSION" "SEICHE"
## [435] "COASTAL FLOODING/EROSION" "HYPERTHERMIA/EXPOSURE"
## [437] "WINTRY MIX" "ROCK SLIDE"
## [439] "GUSTY WIND/HAIL" "LANDSPOUT"
## [441] "EXCESSIVE SNOW" "LAKE EFFECT SNOW"
## [443] "FLOOD/FLASH/FLOOD" "MIXED PRECIPITATION"
## [445] "WIND AND WAVE" "LIGHT FREEZING RAIN"
## [447] "ICE ROADS" "ROUGH SEAS"
## [449] "TSTM WIND G45" "NON-SEVERE WIND DAMAGE"
## [451] "WARM WEATHER" "THUNDERSTORM WIND (G40)"
## [453] "LATE SEASON SNOW" "WINTER WEATHER MIX"
## [455] "ROGUE WAVE" "FALLING SNOW/ICE"
## [457] "NON-TSTM WIND" "NON TSTM WIND"
## [459] "BLOWING DUST" "VOLCANIC ASH"
## [461] "HIGH SURF ADVISORY" "HAZARDOUS SURF"
## [463] "WHIRLWIND" "ICE ON ROAD"
## [465] "DROWNING" "EXTREME COLD/WIND CHILL"
## [467] "MARINE TSTM WIND" "HURRICANE/TYPHOON"
## [469] "WINTER WEATHER/MIX" "FROST/FREEZE"
## [471] "ASTRONOMICAL HIGH TIDE" "HEAVY SURF/HIGH SURF"
## [473] "TROPICAL DEPRESSION" "LAKE-EFFECT SNOW"
## [475] "MARINE HIGH WIND" "TSUNAMI"
## [477] "STORM SURGE/TIDE" "COLD/WIND CHILL"
## [479] "LAKESHORE FLOOD" "MARINE THUNDERSTORM WIND"
## [481] "MARINE STRONG WIND" "ASTRONOMICAL LOW TIDE"
## [483] "DENSE SMOKE" "MARINE HAIL"
## [485] "FREEZING FOG"
#
## After manually inspecting the unique values in the 'EVTYPE' column,
## most of these appear to be legitimate storm events, although a few
## entries (e.g. '?', 'APACHE COUNTY', 'OTHER') are not event types.
#
## Several EVTYPE values are in lower or mixed case. Should they be
## converted to uppercase? Let's proceed without doing that yet...
#
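# Illustrative sketch (toy vector, not df_11): if the case were normalized,
# labels that differ only in letter case or surrounding spaces would collapse
# into one. This is not applied to the analysis here.
toy_evtype <- c("Light Snow", "LIGHT SNOW", "Light snow",
                " TSTM WIND", "TSTM WIND")
# Number of distinct raw labels
length(unique(toy_evtype))
# Number of distinct labels after trimming spaces and upper-casing
length(unique(toupper(trimws(toy_evtype))))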
#
### Generate frequency table for EVTYPE
evtype_counts <- table(df_11$EVTYPE)
#evtype_counts
# Sort the frequency table in descending order
evtype_counts_sorted <- sort(evtype_counts, decreasing = TRUE)
evtype_counts_sorted
##
## TSTM WIND THUNDERSTORM WIND
## 63236 43655
## TORNADO HAIL
## 39944 26130
## FLASH FLOOD LIGHTNING
## 20968 13293
## THUNDERSTORM WINDS FLOOD
## 12086 10175
## HIGH WIND STRONG WIND
## 5522 3370
## WINTER STORM HEAVY SNOW
## 1508 1342
## HEAVY RAIN WILDFIRE
## 1105 857
## ICE STORM URBAN/SML STREAM FLD
## 708 702
## EXCESSIVE HEAT HIGH WINDS
## 698 657
## TSTM WIND/HAIL TROPICAL STORM
## 441 416
## WINTER WEATHER RIP CURRENT
## 407 400
## WILD/FOREST FIRE FLASH FLOODING
## 388 302
## FLOOD/FLASH FLOOD AVALANCHE
## 279 268
## DROUGHT BLIZZARD
## 266 253
## RIP CURRENTS HEAT
## 241 215
## EXTREME COLD LAKE-EFFECT SNOW
## 197 194
## LANDSLIDE STORM SURGE
## 193 177
## COASTAL FLOOD URBAN FLOOD
## 164 139
## WINTER WEATHER/MIX HIGH SURF
## 139 130
## HURRICANE LIGHT SNOW
## 129 119
## FROST/FREEZE EXTREME COLD/WIND CHILL
## 116 111
## MARINE TSTM WIND FOG
## 109 107
## RIVER FLOOD DUST STORM
## 107 103
## COLD/WIND CHILL DUST DEVIL
## 90 89
## WIND URBAN FLOODING
## 83 80
## DRY MICROBURST DENSE FOG
## 78 74
## HURRICANE/TYPHOON FLOODING
## 72 58
## COASTAL FLOODING SNOW
## 55 52
## HEAVY SURF/HIGH SURF STRONG WINDS
## 50 50
## STORM SURGE/TIDE WATERSPOUT
## 47 47
## MARINE STRONG WIND THUNDERSTORM WINDS HAIL
## 46 40
## TSTM WIND (G45) FREEZING RAIN
## 37 35
## TROPICAL DEPRESSION HEAT WAVE
## 35 34
## THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND
## 34 33
## OTHER COLD
## 33 32
## HEAVY SURF EXCESSIVE SNOW
## 30 25
## ICE ICY ROADS
## 24 22
## LIGHT FREEZING RAIN THUNDERSTORM WINDS/HAIL
## 22 22
## GUSTY WINDS HEAVY SNOW SQUALLS
## 21 21
## Light Snow HEAVY RAINS
## 21 20
## EXTREME WINDCHILL GLAZE
## 19 19
## MARINE HIGH WIND THUNDERSTORM
## 19 19
## WINDS EXTREME HEAT
## 18 17
## FLASH FLOOD/FLOOD FREEZE
## 16 16
## SNOW SQUALL HEAVY SNOW-SQUALLS
## 16 15
## MIXED PRECIPITATION TSUNAMI
## 15 14
## FLASH FLOODS FUNNEL CLOUD
## 13 13
## GUSTY WIND RIVER FLOODING
## 13 13
## SMALL HAIL DROUGHT/EXCESSIVE HEAT
## 11 10
## Gusty Winds SNOW FREEZING RAIN
## 10 10
## SNOW SQUALLS HEAVY RAINS/FLOODING
## 10 9
## SEICHE TSTM WIND (G40)
## 9 9
## TYPHOON URBAN/SMALL STREAM FLOOD
## 9 9
## ASTRONOMICAL HIGH TIDE FLASH FLOODING/FLOOD
## 8 8
## HEAVY MIX HIGH SEAS
## 8 8
## HURRICANE OPAL WIND DAMAGE
## 8 8
## FREEZING FOG FROST
## 7 7
## HURRICANE ERIN LOW TEMPERATURE
## 7 7
## SEVERE THUNDERSTORM UNSEASONABLY WARM
## 7 7
## WATERSPOUT- Coastal Flooding
## 7 6
## Cold Dust Devil
## 6 6
## High Surf MIXED PRECIP
## 6 6
## RAIN THUNDERSTORM WINDS LIGHTNING
## 6 6
## THUNDERSTORMS WINDS WATERSPOUT/TORNADO
## 6 6
## FLASH FLOOD FROM ICE JAMS FLOOD/RAIN/WINDS
## 5 5
## GUSTNADO HIGH WINDS/COLD
## 5 5
## ICE JAM FLOODING LAKESHORE FLOOD
## 5 5
## RECORD COLD River Flooding
## 5 5
## SNOW/SLEET TORNADO F0
## 5 5
## TSTM WINDS Coastal Flood
## 5 4
## COLD WEATHER DAMAGING FREEZE
## 4 4
## FREEZING DRIZZLE FREEZING RAIN/SNOW
## 4 4
## HIGH WATER MUDSLIDE
## 4 4
## THUNDERSTORM WIND/ TREES TORNADO F1
## 4 4
## UNSEASONABLY COLD WILD FIRES
## 4 4
## AGRICULTURAL FREEZE BRUSH FIRE
## 3 3
## COASTAL FLOODING/EROSION COASTAL STORM
## 3 3
## FLOODS Gradient wind
## 3 3
## HAIL 275 HAILSTORM
## 3 3
## HEAVY SNOW/ICE HIGH WINDS/SNOW
## 3 3
## HURRICANE-GENERATED SWELLS Hypothermia/Exposure
## 3 3
## HYPOTHERMIA/EXPOSURE LAKE EFFECT SNOW
## 3 3
## LANDSLIDES MAJOR FLOOD
## 3 3
## MICROBURST Mixed Precipitation
## 3 3
## MUD SLIDE ROUGH SEAS
## 3 3
## SEVERE THUNDERSTORM WINDS SNOW AND ICE
## 3 3
## THUNDERSTORM WINDS THUNDERSTORMS WIND
## 3 3
## THUNDERTORM WINDS TIDAL FLOODING
## 3 3
## WET MICROBURST WILDFIRES
## 3 3
## WINTRY MIX ASTRONOMICAL LOW TIDE
## 3 2
## Cold Temperature DAM BREAK
## 2 2
## Damaging Freeze Erosion/Cstl Flood
## 2 2
## Extreme Cold FALLING SNOW/ICE
## 2 2
## FLASH FLOOD - HEAVY RAIN FLOOD & HEAVY RAIN
## 2 2
## Freeze Freezing Drizzle
## 2 2
## Glaze GLAZE ICE
## 2 2
## gradient wind GROUND BLIZZARD
## 2 2
## HAIL 175 HAIL/WIND
## 2 2
## HAIL/WINDS HARD FREEZE
## 2 2
## HEAVY SEAS HEAVY SNOW/SQUALLS
## 2 2
## Heavy Surf HIGH SWELLS
## 2 2
## HIGH WIND (G40) HIGH WIND DAMAGE
## 2 2
## ICE FLOES LANDSPOUT
## 2 2
## MARINE HAIL MARINE MISHAP
## 2 2
## MINOR FLOODING Mudslide
## 2 2
## RAIN/SNOW RECORD HEAT
## 2 2
## RECORD SNOW RIP CURRENTS/HEAVY SURF
## 2 2
## RIVER AND STREAM FLOOD ROUGH SURF
## 2 2
## SEVERE THUNDERSTORMS Snow
## 2 2
## SNOW AND HEAVY SNOW Snow Squalls
## 2 2
## SNOW/COLD SNOW/FREEZING RAIN
## 2 2
## SNOW/HIGH WINDS SNOW/ICE STORM
## 2 2
## SNOW/SLEET/FREEZING RAIN SNOWMELT FLOODING
## 2 2
## Strong Wind Strong Winds
## 2 2
## THUNDERSTORM WIND 60 MPH THUNDERSTORM WINDS/ FLOOD
## 2 2
## TORNADO F2 TORNADO F3
## 2 2
## TORNADOES TROPICAL STORM JERRY
## 2 2
## TSTM WIND 55 UNSEASONAL RAIN
## 2 2
## URBAN FLOODS VOLCANIC ASH
## 2 2
## Whirlwind WINTER WEATHER MIX
## 2 2
## ? APACHE COUNTY
## 1 1
## AVALANCE Beach Erosion
## 1 1
## BLACK ICE BLIZZARD/WINTER STORM
## 1 1
## BLOWING DUST blowing snow
## 1 1
## BLOWING SNOW BREAKUP FLOODING
## 1 1
## COASTAL FLOODING/EROSION COASTAL EROSION
## 1 1
## Coastal Storm COASTAL SURGE
## 1 1
## COASTALSTORM COLD AIR TORNADO
## 1 1
## COLD AND SNOW COLD AND WET CONDITIONS
## 1 1
## COLD WAVE COLD/WINDS
## 1 1
## COOL AND WET DENSE SMOKE
## 1 1
## DOWNBURST DROWNING
## 1 1
## DRY MIRCOBURST WINDS DUST DEVIL WATERSPOUT
## 1 1
## DUST STORM/HIGH WINDS Early Frost
## 1 1
## EXCESSIVE RAINFALL EXCESSIVE WETNESS
## 1 1
## Extended Cold EXTREME WIND CHILL
## 1 1
## FLASH FLOOD LANDSLIDES FLASH FLOOD WINDS
## 1 1
## FLASH FLOOD/ FLASH FLOOD/ STREET
## 1 1
## FLASH FLOOD/LANDSLIDE FLASH FLOODING/THUNDERSTORM WI
## 1 1
## FLOOD FLASH FLOOD/FLASH
## 1 1
## FLOOD/FLASH/FLOOD FLOOD/FLASHFLOOD
## 1 1
## FLOOD/RIVER FLOOD FLOODING/HEAVY RAIN
## 1 1
## FOG AND COLD TEMPERATURES FOREST FIRES
## 1 1
## Freezing drizzle Freezing Rain
## 1 1
## FREEZING RAIN/SLEET Freezing Spray
## 1 1
## Frost/Freeze FROST\\FREEZE
## 1 1
## GLAZE/ICE STORM GRADIENT WIND
## 1 1
## GRASS FIRES GUSTY WIND/HAIL
## 1 1
## GUSTY WIND/HVY RAIN Gusty wind/rain
## 1 1
## Gusty winds HAIL 0.75
## 1 1
## HAIL 075 HAIL 100
## 1 1
## HAIL 125 HAIL 150
## 1 1
## HAIL 200 HAIL 450
## 1 1
## HAIL 75 HAIL DAMAGE
## 1 1
## HAZARDOUS SURF Heat Wave
## 1 1
## HEAT WAVE DROUGHT HEAT WAVES
## 1 1
## HEAVY LAKE SNOW HEAVY PRECIPITATION
## 1 1
## HEAVY RAIN AND FLOOD Heavy Rain/High Surf
## 1 1
## HEAVY RAIN/LIGHTNING HEAVY RAIN/SEVERE WEATHER
## 1 1
## HEAVY RAIN/SMALL STREAM URBAN HEAVY RAIN/SNOW
## 1 1
## HEAVY SHOWER HEAVY SNOW AND HIGH WINDS
## 1 1
## HEAVY SNOW AND STRONG WINDS Heavy snow shower
## 1 1
## HEAVY SNOW/BLIZZARD HEAVY SNOW/BLIZZARD/AVALANCHE
## 1 1
## HEAVY SNOW/FREEZING RAIN HEAVY SNOW/HIGH WINDS & FLOOD
## 1 1
## HEAVY SNOW/WIND HEAVY SNOW/WINTER STORM
## 1 1
## HEAVY SNOWPACK Heavy surf and wind
## 1 1
## HEAVY SURF COASTAL FLOODING HEAVY SWELLS
## 1 1
## HIGH HIGH WINDS
## 1 1
## HIGH SURF ADVISORY HIGH TIDES
## 1 1
## HIGH WAVES HIGH WIND 48
## 1 1
## HIGH WIND AND SEAS HIGH WIND/BLIZZARD
## 1 1
## HIGH WIND/HEAVY SNOW HIGH WIND/SEAS
## 1 1
## HIGH WINDS HEAVY RAINS HIGH WINDS/
## 1 1
## HIGH WINDS/COASTAL FLOOD HIGH WINDS/HEAVY RAIN
## 1 1
## Hurricane Edouard HURRICANE EMILY
## 1 1
## HURRICANE FELIX HURRICANE GORDON
## 1 1
## HURRICANE OPAL/HIGH WINDS HVY RAIN
## 1 1
## HYPERTHERMIA/EXPOSURE HYPOTHERMIA
## 1 1
## ICE AND SNOW ICE JAM
## 1 1
## Ice jam flood (minor ICE ON ROAD
## 1 1
## ICE ROADS ICE STORM/FLASH FLOOD
## 1 1
## ICE/STRONG WINDS Lake Effect Snow
## 1 1
## LAKE FLOOD Landslump
## 1 1
## LATE SEASON SNOW Light snow
## 1 1
## Light Snowfall LIGHTING
## 1 1
## LIGHTNING WAUSEON LIGHTNING AND HEAVY RAIN
## 1 1
## LIGHTNING AND THUNDERSTORM WIN LIGHTNING FIRE
## 1 1
## LIGHTNING INJURY LIGHTNING THUNDERSTORM WINDS
## 1 1
## LIGHTNING. LIGHTNING/HEAVY RAIN
## 1 1
## LIGNTNING Marine Accident
## 1 1
## Microburst MICROBURST WINDS
## 1 1
## MUD SLIDES MUD SLIDES URBAN FLOODING
## 1 1
## Mudslides MUDSLIDES
## 1 1
## NON-SEVERE WIND DAMAGE NON-TSTM WIND
## 1 1
## NON TSTM WIND Other
## 1 1
## RAIN/WIND RAINSTORM
## 1 1
## RAPIDLY RISING WATER RECORD RAINFALL
## 1 1
## RECORD/EXCESSIVE HEAT ROCK SLIDE
## 1 1
## ROGUE WAVE RURAL FLOOD
## 1 1
## SEVERE TURBULENCE SLEET
## 1 1
## SLEET/ICE STORM SMALL STREAM FLOOD
## 1 1
## SNOW ACCUMULATION SNOW AND ICE STORM
## 1 1
## SNOW/ BITTER COLD SNOW/ ICE
## 1 1
## SNOW/BLOWING SNOW SNOW/HEAVY SNOW
## 1 1
## SNOW/ICE STORM FORCE WINDS
## 1 1
## THUDERSTORM WINDS THUNDEERSTORM WINDS
## 1 1
## THUNDERESTORM WINDS THUNDERSNOW
## 1 1
## THUNDERSTORM DAMAGE TO THUNDERSTORM HAIL
## 1 1
## THUNDERSTORM WIND (G40) THUNDERSTORM WIND 65 MPH
## 1 1
## THUNDERSTORM WIND 65MPH THUNDERSTORM WIND 98 MPH
## 1 1
## THUNDERSTORM WIND G50 THUNDERSTORM WIND G52
## 1 1
## THUNDERSTORM WIND G55 THUNDERSTORM WIND G60
## 1 1
## THUNDERSTORM WIND TREES THUNDERSTORM WIND.
## 1 1
## THUNDERSTORM WIND/ TREE THUNDERSTORM WIND/AWNING
## 1 1
## THUNDERSTORM WIND/HAIL THUNDERSTORM WIND/LIGHTNING
## 1 1
## THUNDERSTORM WINDS 13 THUNDERSTORM WINDS 63 MPH
## 1 1
## THUNDERSTORM WINDS AND THUNDERSTORM WINDS G60
## 1 1
## THUNDERSTORM WINDS. THUNDERSTORM WINDS/FLOODING
## 1 1
## THUNDERSTORM WINDS/FUNNEL CLOU THUNDERSTORM WINDS53
## 1 1
## THUNDERSTORM WINDSHAIL THUNDERSTORM WINS
## 1 1
## THUNDERSTORMS THUNDERSTORMW
## 1 1
## THUNDERSTORMWINDS THUNDERSTROM WIND
## 1 1
## THUNERSTORM WINDS Tidal Flooding
## 1 1
## TORNADOES, TSTM WIND, HAIL TORNDAO
## 1 1
## Torrential Rainfall TROPICAL STORM ALBERTO
## 1 1
## TROPICAL STORM DEAN TROPICAL STORM GORDON
## 1 1
## Tstm Wind TSTM WIND (G45)
## 1 1
## TSTM WIND (41) TSTM WIND (G35)
## 1 1
## TSTM WIND 40 TSTM WIND 45
## 1 1
## TSTM WIND 65) TSTM WIND AND LIGHTNING
## 1 1
## TSTM WIND DAMAGE TSTM WIND G45
## 1 1
## TSTM WIND G58 TSTMW
## 1 1
## TUNDERSTORM WIND Unseasonable Cold
## 1 1
## UNSEASONABLY WARM AND DRY URBAN AND SMALL
## 1 1
## URBAN AND SMALL STREAM FLOODIN URBAN SMALL
## 1 1
## URBAN/SMALL STREAM WARM WEATHER
## 1 1
## WATERSPOUT-TORNADO WATERSPOUT TORNADO
## 1 1
## WATERSPOUT/ TORNADO WHIRLWIND
## 1 1
## WILD/FOREST FIRES Wind
## 1 1
## WIND AND WAVE Wind Damage
## 1 1
## WIND STORM WIND/HAIL
## 1 1
## WINTER STORM HIGH WINDS WINTER STORMS
## 1 1
## Wintry Mix
## 1
#
## From the 'evtype_counts_sorted' above we see a rapid 'drop-off'
## in orders of magnitude for the frequency of many of these types
## of storms from tens-of-thousands to 100 or less.
#
#
## now let's drop all EVTYPEs with a frequency of 10 or less;
# i.e. keep only the elements with counts greater than 10
#
evtype_counts_sorted_filtered <- evtype_counts_sorted[evtype_counts_sorted > 10]
#
# Print the filtered frequency table
print(evtype_counts_sorted_filtered)
##
## TSTM WIND THUNDERSTORM WIND TORNADO
## 63236 43655 39944
## HAIL FLASH FLOOD LIGHTNING
## 26130 20968 13293
## THUNDERSTORM WINDS FLOOD HIGH WIND
## 12086 10175 5522
## STRONG WIND WINTER STORM HEAVY SNOW
## 3370 1508 1342
## HEAVY RAIN WILDFIRE ICE STORM
## 1105 857 708
## URBAN/SML STREAM FLD EXCESSIVE HEAT HIGH WINDS
## 702 698 657
## TSTM WIND/HAIL TROPICAL STORM WINTER WEATHER
## 441 416 407
## RIP CURRENT WILD/FOREST FIRE FLASH FLOODING
## 400 388 302
## FLOOD/FLASH FLOOD AVALANCHE DROUGHT
## 279 268 266
## BLIZZARD RIP CURRENTS HEAT
## 253 241 215
## EXTREME COLD LAKE-EFFECT SNOW LANDSLIDE
## 197 194 193
## STORM SURGE COASTAL FLOOD URBAN FLOOD
## 177 164 139
## WINTER WEATHER/MIX HIGH SURF HURRICANE
## 139 130 129
## LIGHT SNOW FROST/FREEZE EXTREME COLD/WIND CHILL
## 119 116 111
## MARINE TSTM WIND FOG RIVER FLOOD
## 109 107 107
## DUST STORM COLD/WIND CHILL DUST DEVIL
## 103 90 89
## WIND URBAN FLOODING DRY MICROBURST
## 83 80 78
## DENSE FOG HURRICANE/TYPHOON FLOODING
## 74 72 58
## COASTAL FLOODING SNOW HEAVY SURF/HIGH SURF
## 55 52 50
## STRONG WINDS STORM SURGE/TIDE WATERSPOUT
## 50 47 47
## MARINE STRONG WIND THUNDERSTORM WINDS HAIL TSTM WIND (G45)
## 46 40 37
## FREEZING RAIN TROPICAL DEPRESSION HEAT WAVE
## 35 35 34
## THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND OTHER
## 34 33 33
## COLD HEAVY SURF EXCESSIVE SNOW
## 32 30 25
## ICE ICY ROADS LIGHT FREEZING RAIN
## 24 22 22
## THUNDERSTORM WINDS/HAIL GUSTY WINDS HEAVY SNOW SQUALLS
## 22 21 21
## Light Snow HEAVY RAINS EXTREME WINDCHILL
## 21 20 19
## GLAZE MARINE HIGH WIND THUNDERSTORM
## 19 19 19
## WINDS EXTREME HEAT FLASH FLOOD/FLOOD
## 18 17 16
## FREEZE SNOW SQUALL HEAVY SNOW-SQUALLS
## 16 16 15
## MIXED PRECIPITATION TSUNAMI FLASH FLOODS
## 15 14 13
## FUNNEL CLOUD GUSTY WIND RIVER FLOODING
## 13 13 13
## SMALL HAIL
## 11
#
# Get the number of elements in the filtered frequency table
num_elements <- length(evtype_counts_sorted_filtered)
#
# Print the number of elements
print(num_elements)
## [1] 97
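#
# Illustrative sketch (toy counts, not the storm data): applying a frequency
# cutoff to a sorted table and checking what share of the rows it keeps.
# The event labels and counts below are made up for the example.
toy_counts <- sort(table(c(rep("TORNADO", 50), rep("HAIL", 30),
                           rep("FOG", 12), rep("SEICHE", 5),
                           rep("OTHER", 3))),
                   decreasing = TRUE)
# Keep only the event types that occur more than 10 times
toy_keep <- toy_counts[toy_counts > 10]
names(toy_keep)
# Share of all rows that belong to the retained event types
sum(toy_keep) / sum(toy_counts)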
#
## We have 97 EVTYPEs with a frequency greater than 10;
# SO, I choose to extract the top 97 most frequent EVTYPEs
top_97_evtype <- head(evtype_counts_sorted, 97)
# Print the top 97 most frequent EVTYPEs
print(top_97_evtype)
##
## TSTM WIND THUNDERSTORM WIND TORNADO
## 63236 43655 39944
## HAIL FLASH FLOOD LIGHTNING
## 26130 20968 13293
## THUNDERSTORM WINDS FLOOD HIGH WIND
## 12086 10175 5522
## STRONG WIND WINTER STORM HEAVY SNOW
## 3370 1508 1342
## HEAVY RAIN WILDFIRE ICE STORM
## 1105 857 708
## URBAN/SML STREAM FLD EXCESSIVE HEAT HIGH WINDS
## 702 698 657
## TSTM WIND/HAIL TROPICAL STORM WINTER WEATHER
## 441 416 407
## RIP CURRENT WILD/FOREST FIRE FLASH FLOODING
## 400 388 302
## FLOOD/FLASH FLOOD AVALANCHE DROUGHT
## 279 268 266
## BLIZZARD RIP CURRENTS HEAT
## 253 241 215
## EXTREME COLD LAKE-EFFECT SNOW LANDSLIDE
## 197 194 193
## STORM SURGE COASTAL FLOOD URBAN FLOOD
## 177 164 139
## WINTER WEATHER/MIX HIGH SURF HURRICANE
## 139 130 129
## LIGHT SNOW FROST/FREEZE EXTREME COLD/WIND CHILL
## 119 116 111
## MARINE TSTM WIND FOG RIVER FLOOD
## 109 107 107
## DUST STORM COLD/WIND CHILL DUST DEVIL
## 103 90 89
## WIND URBAN FLOODING DRY MICROBURST
## 83 80 78
## DENSE FOG HURRICANE/TYPHOON FLOODING
## 74 72 58
## COASTAL FLOODING SNOW HEAVY SURF/HIGH SURF
## 55 52 50
## STRONG WINDS STORM SURGE/TIDE WATERSPOUT
## 50 47 47
## MARINE STRONG WIND THUNDERSTORM WINDS HAIL TSTM WIND (G45)
## 46 40 37
## FREEZING RAIN TROPICAL DEPRESSION HEAT WAVE
## 35 35 34
## THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND OTHER
## 34 33 33
## COLD HEAVY SURF EXCESSIVE SNOW
## 32 30 25
## ICE ICY ROADS LIGHT FREEZING RAIN
## 24 22 22
## THUNDERSTORM WINDS/HAIL GUSTY WINDS HEAVY SNOW SQUALLS
## 22 21 21
## Light Snow HEAVY RAINS EXTREME WINDCHILL
## 21 20 19
## GLAZE MARINE HIGH WIND THUNDERSTORM
## 19 19 19
## WINDS EXTREME HEAT FLASH FLOOD/FLOOD
## 18 17 16
## FREEZE SNOW SQUALL HEAVY SNOW-SQUALLS
## 16 16 15
## MIXED PRECIPITATION TSUNAMI FLASH FLOODS
## 15 14 13
## FUNNEL CLOUD GUSTY WIND RIVER FLOODING
## 13 13 13
## SMALL HAIL
## 11
#
## NOTE: as an aside, we see by inspection that of the EVTYPEs remaining,
## only one ('Light Snow') is not fully uppercase.
##
## To reduce the df_11 dataframe to only include the 97 most frequently
## occurring EVTYPEs identified in the evtype_counts_sorted_filtered,
## we use the filter() function from the dplyr package.
## Here it is:
#
library(dplyr)
# Get the names of the 97 most frequent EVTYPEs
# 1st, Convert the frequency table to a dataframe
evtype_counts_df <- as.data.frame(evtype_counts_sorted_filtered)
evtype_counts_df
## Var1 Freq
## 1 TSTM WIND 63236
## 2 THUNDERSTORM WIND 43655
## 3 TORNADO 39944
## 4 HAIL 26130
## 5 FLASH FLOOD 20968
## 6 LIGHTNING 13293
## 7 THUNDERSTORM WINDS 12086
## 8 FLOOD 10175
## 9 HIGH WIND 5522
## 10 STRONG WIND 3370
## 11 WINTER STORM 1508
## 12 HEAVY SNOW 1342
## 13 HEAVY RAIN 1105
## 14 WILDFIRE 857
## 15 ICE STORM 708
## 16 URBAN/SML STREAM FLD 702
## 17 EXCESSIVE HEAT 698
## 18 HIGH WINDS 657
## 19 TSTM WIND/HAIL 441
## 20 TROPICAL STORM 416
## 21 WINTER WEATHER 407
## 22 RIP CURRENT 400
## 23 WILD/FOREST FIRE 388
## 24 FLASH FLOODING 302
## 25 FLOOD/FLASH FLOOD 279
## 26 AVALANCHE 268
## 27 DROUGHT 266
## 28 BLIZZARD 253
## 29 RIP CURRENTS 241
## 30 HEAT 215
## 31 EXTREME COLD 197
## 32 LAKE-EFFECT SNOW 194
## 33 LANDSLIDE 193
## 34 STORM SURGE 177
## 35 COASTAL FLOOD 164
## 36 URBAN FLOOD 139
## 37 WINTER WEATHER/MIX 139
## 38 HIGH SURF 130
## 39 HURRICANE 129
## 40 LIGHT SNOW 119
## 41 FROST/FREEZE 116
## 42 EXTREME COLD/WIND CHILL 111
## 43 MARINE TSTM WIND 109
## 44 FOG 107
## 45 RIVER FLOOD 107
## 46 DUST STORM 103
## 47 COLD/WIND CHILL 90
## 48 DUST DEVIL 89
## 49 WIND 83
## 50 URBAN FLOODING 80
## 51 DRY MICROBURST 78
## 52 DENSE FOG 74
## 53 HURRICANE/TYPHOON 72
## 54 FLOODING 58
## 55 COASTAL FLOODING 55
## 56 SNOW 52
## 57 HEAVY SURF/HIGH SURF 50
## 58 STRONG WINDS 50
## 59 STORM SURGE/TIDE 47
## 60 WATERSPOUT 47
## 61 MARINE STRONG WIND 46
## 62 THUNDERSTORM WINDS HAIL 40
## 63 TSTM WIND (G45) 37
## 64 FREEZING RAIN 35
## 65 TROPICAL DEPRESSION 35
## 66 HEAT WAVE 34
## 67 THUNDERSTORM WINDSS 34
## 68 MARINE THUNDERSTORM WIND 33
## 69 OTHER 33
## 70 COLD 32
## 71 HEAVY SURF 30
## 72 EXCESSIVE SNOW 25
## 73 ICE 24
## 74 ICY ROADS 22
## 75 LIGHT FREEZING RAIN 22
## 76 THUNDERSTORM WINDS/HAIL 22
## 77 GUSTY WINDS 21
## 78 HEAVY SNOW SQUALLS 21
## 79 Light Snow 21
## 80 HEAVY RAINS 20
## 81 EXTREME WINDCHILL 19
## 82 GLAZE 19
## 83 MARINE HIGH WIND 19
## 84 THUNDERSTORM 19
## 85 WINDS 18
## 86 EXTREME HEAT 17
## 87 FLASH FLOOD/FLOOD 16
## 88 FREEZE 16
## 89 SNOW SQUALL 16
## 90 HEAVY SNOW-SQUALLS 15
## 91 MIXED PRECIPITATION 15
## 92 TSUNAMI 14
## 93 FLASH FLOODS 13
## 94 FUNNEL CLOUD 13
## 95 GUSTY WIND 13
## 96 RIVER FLOODING 13
## 97 SMALL HAIL 11
str(evtype_counts_df)
## 'data.frame': 97 obs. of 2 variables:
## $ Var1: Factor w/ 97 levels "TSTM WIND","THUNDERSTORM WIND",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Freq: int 63236 43655 39944 26130 20968 13293 12086 10175 5522 3370 ...
# Get the names of the 97 most frequent EVTYPEs
top_97_evtypes <- evtype_counts_df$Var1
top_97_evtypes
## [1] TSTM WIND THUNDERSTORM WIND TORNADO
## [4] HAIL FLASH FLOOD LIGHTNING
## [7] THUNDERSTORM WINDS FLOOD HIGH WIND
## [10] STRONG WIND WINTER STORM HEAVY SNOW
## [13] HEAVY RAIN WILDFIRE ICE STORM
## [16] URBAN/SML STREAM FLD EXCESSIVE HEAT HIGH WINDS
## [19] TSTM WIND/HAIL TROPICAL STORM WINTER WEATHER
## [22] RIP CURRENT WILD/FOREST FIRE FLASH FLOODING
## [25] FLOOD/FLASH FLOOD AVALANCHE DROUGHT
## [28] BLIZZARD RIP CURRENTS HEAT
## [31] EXTREME COLD LAKE-EFFECT SNOW LANDSLIDE
## [34] STORM SURGE COASTAL FLOOD URBAN FLOOD
## [37] WINTER WEATHER/MIX HIGH SURF HURRICANE
## [40] LIGHT SNOW FROST/FREEZE EXTREME COLD/WIND CHILL
## [43] MARINE TSTM WIND FOG RIVER FLOOD
## [46] DUST STORM COLD/WIND CHILL DUST DEVIL
## [49] WIND URBAN FLOODING DRY MICROBURST
## [52] DENSE FOG HURRICANE/TYPHOON FLOODING
## [55] COASTAL FLOODING SNOW HEAVY SURF/HIGH SURF
## [58] STRONG WINDS STORM SURGE/TIDE WATERSPOUT
## [61] MARINE STRONG WIND THUNDERSTORM WINDS HAIL TSTM WIND (G45)
## [64] FREEZING RAIN TROPICAL DEPRESSION HEAT WAVE
## [67] THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND OTHER
## [70] COLD HEAVY SURF EXCESSIVE SNOW
## [73] ICE ICY ROADS LIGHT FREEZING RAIN
## [76] THUNDERSTORM WINDS/HAIL GUSTY WINDS HEAVY SNOW SQUALLS
## [79] Light Snow HEAVY RAINS EXTREME WINDCHILL
## [82] GLAZE MARINE HIGH WIND THUNDERSTORM
## [85] WINDS EXTREME HEAT FLASH FLOOD/FLOOD
## [88] FREEZE SNOW SQUALL HEAVY SNOW-SQUALLS
## [91] MIXED PRECIPITATION TSUNAMI FLASH FLOODS
## [94] FUNNEL CLOUD GUSTY WIND RIVER FLOODING
## [97] SMALL HAIL
## 97 Levels: TSTM WIND THUNDERSTORM WIND TORNADO HAIL FLASH FLOOD ... SMALL HAIL
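#
# Illustrative sketch (toy values): 'top_97_evtypes' is a factor, while
# EVTYPE is a character column. %in% compares a factor by its labels (it is
# converted to character for matching), so either form can be used for
# filtering; as.character() just makes the intent explicit.
toy_levels <- factor(c("HAIL", "FOG"))
toy_values <- c("HAIL", "WIND", "FOG")
toy_values %in% toy_levels
toy_values %in% as.character(toy_levels)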
# Filter df_11 to include only the top 97 EVTYPEs
df_22 <- df_11 %>%
filter(EVTYPE %in% top_97_evtypes)
head(df_22)
## # A tibble: 6 × 9
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP CROPDMG
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr> <dbl>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 K 0
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 K 0
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 K 0
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 K 0
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 K 0
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 K 0
## # ℹ 1 more variable: CROPDMGEXP <chr>
dim(df_22)
## [1] 253844 9
dim(df_11)
## [1] 254633 9
#View(df_22)
#
# Subset the dataframe to include only the specified columns
#
df_22_reduced <- subset(df_22, select = c("REFNUM", "STATE", "COUNTYNAME", "BGN_DATE", "EVTYPE", "PROPDMG", "PROPDMGEXP"))
# Print the head of the reduced dataframe
head(df_22_reduced)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 K
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 K
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 K
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 K
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 K
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 K
dim(df_22_reduced)
## [1] 253844 7
#View(df_22_reduced)
#
## Remove rows from df_22_reduced where 'PROPDMG' is 0 or NA, or where
## 'PROPDMGEXP' is blank or NA, using subset() with logical conditions.
#
# Drop rows with zero/NA PROPDMG or blank/NA PROPDMGEXP
df_22_reduced_filtered <- subset(df_22_reduced,
                                 !(PROPDMG == 0 | is.na(PROPDMG)
                                   | PROPDMGEXP == "" | is.na(PROPDMGEXP)))
#
# Print the head of the filtered dataframe
head(df_22_reduced_filtered)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 K
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 K
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 K
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 K
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 K
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 K
dim(df_22_reduced_filtered)
## [1] 238502 7
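A comparison with NA returns NA rather than TRUE or FALSE, so the is.na() terms are what make the filter condition definite for rows with missing values. The following is a minimal sketch on a toy data frame (the values are hypothetical, not taken from the NOAA file) that exercises each part of the condition:

```r
# Toy data: hypothetical values chosen to exercise each filter condition
toy <- data.frame(
  PROPDMG    = c(25, 0, NA, 2.5, 10),
  PROPDMGEXP = c("K", "K", "M", "", NA),
  stringsAsFactors = FALSE
)

# Drop rows with zero/NA damage or blank/NA exponent symbol
toy_filtered <- subset(toy,
                       !(PROPDMG == 0 | is.na(PROPDMG) |
                           PROPDMGEXP == "" | is.na(PROPDMGEXP)))

print(toy_filtered)
```

Only the first toy row has both a non-zero damage value and a usable exponent symbol, so it is the only row expected to survive.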
#
## Let's deal with 'PROPDMGEXP' ;
str(df_22_reduced_filtered)
## tibble [238,502 × 7] (S3: tbl_df/tbl/data.frame)
## $ REFNUM : num [1:238502] 1 2 3 4 5 6 7 8 9 10 ...
## $ STATE : chr [1:238502] "AL" "AL" "AL" "AL" ...
## $ COUNTYNAME: chr [1:238502] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ BGN_DATE : Date[1:238502], format: "1950-04-18" "1950-04-18" ...
## $ EVTYPE : chr [1:238502] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ PROPDMG : num [1:238502] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr [1:238502] "K" "K" "K" "K" ...
#
## PROPDMGEXP is of type 'character';
##
## find the unique values in the 'PROPDMGEXP' column;
unique_char <- unique(df_22_reduced_filtered$PROPDMGEXP)
#
print(unique_char)
## [1] "K" "M" "B" "+" "0" "5" "m" "6" "4" "2" "7" "3" "H" "-"
#
## Most of these symbols can be read as exponents or multipliers for the
## 'PROPDMG' values, but two of them, '+' and '-', have no such meaning.
## Let's get rid of those rows.
#
## To remove rows that have '+' or '-' in the 'PROPDMGEXP' column, we use
## the subset() function with a logical condition to filter the
## dataframe:
#
# Filter out rows with '+' and '-' characters in PROPDMGEXP column
df_22_reduced_filtered_1 <- subset(df_22_reduced_filtered, !(PROPDMGEXP == "+" |
PROPDMGEXP == "-"))
# Print the head of the filtered dataframe
head(df_22_reduced_filtered_1)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 K
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 K
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 K
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 K
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 K
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 K
unique_char <- unique(df_22_reduced_filtered_1$PROPDMGEXP)
#
print(unique_char)
## [1] "K" "M" "B" "0" "5" "m" "6" "4" "2" "7" "3" "H"
#
# we got rid of the '+' and '-' rows...
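An exclusion list written with %in% is one alternative to chaining == comparisons, and it scales more easily if further symbols need to be dropped later. This is a minimal sketch on a hypothetical vector of symbols; `bad_symbols` is an illustrative name, not part of the analysis above:

```r
# Toy vector of exponent symbols (hypothetical sample)
exp_symbols <- c("K", "M", "+", "B", "-", "0")

# Symbols assumed to carry no usable magnitude
bad_symbols <- c("+", "-")

# Keep only entries whose symbol is not in the exclusion set
kept <- exp_symbols[!(exp_symbols %in% bad_symbols)]
print(kept)
```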
##
## The name 'df_22_reduced_filtered_1' is cumbersome;
## let's use 'df_22_rf' instead.
df_22_rf <- df_22_reduced_filtered_1
head(df_22_rf)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 K
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 K
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 K
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 K
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 K
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 K
dim(df_22_rf)
## [1] 238498 7
#View(df_22_rf)
#
Now that the meaningless symbols are gone, we convert the remaining symbols in the ‘PROPDMGEXP’ column to their corresponding multiplier values. We create a mapping between the symbols and their corresponding powers of ten, then use that mapping to perform the conversion. Here’s how we do it:
# Define a mapping between symbols and their corresponding powers
multiplier_mapping <- c("K" = 10^3, "M" = 10^6, "B" = 10^9, "0" = 10^0, "5" = 10^5,
                        "m" = 10^3,  # lowercase "m" is treated as thousands here; use 10^6 if it denotes millions
                        "6" = 10^6, "4" = 10^4, "2" = 10^2, "7" = 10^7, "3" = 10^3,
                        "H" = 10^2)
# Replace symbols in PROPDMGEXP column with their corresponding multipliers
df_22_rf$PROPDMGEXP <- multiplier_mapping[as.character(df_22_rf$PROPDMGEXP)]
# Print the head of the dataframe to verify changes
head(df_22_rf)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 1000
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 1000
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 1000
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 1000
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 1000
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 1000
dim(df_22_rf)
## [1] 238498 7
#View(df_22_rf)
#
## To make sure we got all types of '10^x' in PROPDMGEXP;
#
uniq_multi <- unique(df_22_rf$PROPDMGEXP)
print(uniq_multi)
## [1] 1e+03 1e+06 1e+09 1e+00 1e+05 1e+04 1e+02 1e+07
#
## looks OK...
#
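Indexing a named vector by a character vector is what performs the symbol-to-multiplier conversion. The sketch below uses a shortened, hypothetical mapping and a deliberately unmapped symbol ("?") to show that a symbol missing from the mapping comes back as NA, which is worth counting before any multiplication:

```r
# Hypothetical subset of the symbol-to-multiplier mapping
toy_mapping <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# Toy symbols, including one ("?") that is deliberately absent from the mapping
toy_symbols <- c("K", "B", "?", "M")

# Indexing a named vector by a character vector looks up each name in turn;
# names that are not present come back as NA
toy_multiplier <- unname(toy_mapping[toy_symbols])
print(toy_multiplier)

# Count of symbols that failed to map
n_unmapped <- sum(is.na(toy_multiplier))
print(n_unmapped)
```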
## Now add a new column 'PROPDMGTOT' to the 'df_22_rf' dataframe by
## multiplying each value in 'PROPDMG' by its corresponding multiplier
## in 'PROPDMGEXP' (element-wise multiplication):
# Add a new column 'PROPDMGTOT' to df_22_rf
df_22_rf$PROPDMGTOT <- df_22_rf$PROPDMG * df_22_rf$PROPDMGEXP
# Print the head of the dataframe to verify changes
head(df_22_rf)
## # A tibble: 6 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP PROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 1 AL MOBILE 1950-04-18 TORNADO 25 1000 25000
## 2 2 AL BALDWIN 1950-04-18 TORNADO 2.5 1000 2500
## 3 3 AL FAYETTE 1951-02-20 TORNADO 25 1000 25000
## 4 4 AL MADISON 1951-06-08 TORNADO 2.5 1000 2500
## 5 5 AL CULLMAN 1951-11-15 TORNADO 2.5 1000 2500
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 2.5 1000 2500
dim(df_22_rf)
## [1] 238498 8
#View(df_22_rf)
#
## Now we need to 'sort' 'df_22_rf' using 'PROPDMGTOT' to find the
## 'dominant' storm types...w.r.t. property damage...
df_sort <- df_22_rf[order(df_22_rf$PROPDMGTOT, decreasing = TRUE), ]
print(df_sort)
## # A tibble: 238,498 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP PROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 605943 CA NAPA 2006-01-01 FLOOD 115 1000000000 1.15e11
## 2 577616 LA LAZ040 - 059 - … 2005-08-29 STORM… 31.3 1000000000 3.13e10
## 3 577615 LA LAZ034>040 - 04… 2005-08-28 HURRI… 16.9 1000000000 1.69e10
## 4 581535 MS MSZ080>082 2005-08-29 STORM… 11.3 1000000000 1.13e10
## 5 569288 FL FLZ068>069 - 07… 2005-10-24 HURRI… 10 1000000000 1 e10
## 6 581533 MS MSZ068>071 - 07… 2005-08-28 HURRI… 7.35 1000000000 7.35e 9
## 7 581537 MS MSZ018>019 - 02… 2005-08-29 HURRI… 5.88 1000000000 5.88e 9
## 8 529299 FL FLZ055 - 060>06… 2004-08-13 HURRI… 5.42 1000000000 5.42e 9
## 9 444407 TX TXZ163>164 - 17… 2001-06-05 TROPI… 5.15 1000000000 5.15e 9
## 10 187564 AL ALZ001>018 1993-03-12 WINTE… 5 1000000000 5 e 9
## # ℹ 238,488 more rows
head(df_sort)
## # A tibble: 6 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP PROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 605943 CA NAPA 2006-01-01 FLOOD 115 1000000000 1.15e11
## 2 577616 LA LAZ040 - 059 - 0… 2005-08-29 STORM… 31.3 1000000000 3.13e10
## 3 577615 LA LAZ034>040 - 046… 2005-08-28 HURRI… 16.9 1000000000 1.69e10
## 4 581535 MS MSZ080>082 2005-08-29 STORM… 11.3 1000000000 1.13e10
## 5 569288 FL FLZ068>069 - 072… 2005-10-24 HURRI… 10 1000000000 1 e10
## 6 581533 MS MSZ068>071 - 077… 2005-08-28 HURRI… 7.35 1000000000 7.35e 9
#View(df_sort)
#
Because the values at the top of the sort are well above $1,000,000,000, we reduce the sorted dataframe to only those rows where ‘PROPDMGTOT’ is greater than $100,000:
#
df_22_rf_reduced <- subset(df_sort, PROPDMGTOT > 100000)
head(df_22_rf_reduced)
## # A tibble: 6 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE PROPDMG PROPDMGEXP PROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 605943 CA NAPA 2006-01-01 FLOOD 115 1000000000 1.15e11
## 2 577616 LA LAZ040 - 059 - 0… 2005-08-29 STORM… 31.3 1000000000 3.13e10
## 3 577615 LA LAZ034>040 - 046… 2005-08-28 HURRI… 16.9 1000000000 1.69e10
## 4 581535 MS MSZ080>082 2005-08-29 STORM… 11.3 1000000000 1.13e10
## 5 569288 FL FLZ068>069 - 072… 2005-10-24 HURRI… 10 1000000000 1 e10
## 6 581533 MS MSZ068>071 - 077… 2005-08-28 HURRI… 7.35 1000000000 7.35e 9
#View(df_22_rf_reduced)
dim(df_22_rf_reduced)
## [1] 34118 8
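The cutoff removes most rows (238,498 down to 34,118), so it can be useful to check how much of the dollar total it keeps. The sketch below does this on hypothetical per-event totals; note that the strict `>` comparison excludes an event of exactly $100,000:

```r
# Toy per-event damage totals in dollars (hypothetical values)
toy_tot <- c(5e9, 2.5e6, 150000, 100000, 50000, 2500)

# Same strict cutoff as above: keep events above $100,000
cutoff <- 100000
kept <- toy_tot[toy_tot > cutoff]

# Fraction of the total dollar damage that survives the cutoff
share_kept <- sum(kept) / sum(toy_tot)
print(length(kept))
print(round(share_kept, 4))
```

Applying the same ratio to `PROPDMGTOT` in the real data would show whether the small events can be neglected safely; that figure is not computed in this report.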
#
## We have 34,118 rows remaining. Let's pick out the storm types in EVTYPE
## and 'consolidate' them into the 'repeated main types'....
df_33 <- df_22_rf_reduced
dim(df_33)
## [1] 34118 8
#View(df_33)
#
## We see our 'df_33' is reduced to 34,118 x 8.
#
## We now aggregate PROPDMGTOT for each EVTYPE in the dataframe 'df_33',
## using the group_by() and summarize() functions from the dplyr package:
#
# Group by EVTYPE and summarize PROPDMGTOT
df_A <- df_33 %>%
group_by(EVTYPE) %>%
summarize(PROPDMG = sum(PROPDMGTOT))
head(df_A)
## # A tibble: 6 × 2
## EVTYPE PROPDMG
## <chr> <dbl>
## 1 AVALANCHE 3125000
## 2 BLIZZARD 653215950
## 3 COASTAL FLOOD 234750000
## 4 COASTAL FLOODING 125800000
## 5 COLD 500000
## 6 COLD/WIND CHILL 1400000
dim(df_A)
## [1] 81 2
#View(df_A)
#
This code groups the dataframe ‘df_33’ by the ‘EVTYPE’ column and then calculates the total property damage for each EVTYPE using the sum() function within the summarize() function. The resulting dataframe ‘df_A’ contains one row per EVTYPE with its total property damage in dollars, and its dimensions (81 x 2) are far more manageable.
From ‘df_A’ we also see that several EVTYPE storm types are effectively the same, such as ‘FLASH FLOOD’, ‘FLASH FLOOD/FLOOD’, ‘FLASH FLOODING’, ‘FLOOD’ and ‘FLOODING’. We need to combine these; for example, all of them are consolidated into the single EVTYPE ‘FLOOD’.
By inspection of the ‘df_A’ dataframe we can determine the storm types that we need to focus on for the major property damage EVTYPEs.
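The consolidation idea can be sketched on a toy table before applying it to every storm type. The dollar values below are hypothetical, and base R's aggregate() stands in for the dplyr group_by()/summarize() pair used elsewhere in this report:

```r
# Toy aggregated table (hypothetical dollar values)
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLASH FLOOD", "FLASH FLOOD/FLOOD", "TORNADO"),
  PROPDMG = c(100, 40, 10, 75),
  stringsAsFactors = FALSE
)

# Relabel every EVTYPE containing "FLOOD" as plain "FLOOD"
toy$EVTYPE <- ifelse(grepl("FLOOD", toy$EVTYPE, ignore.case = TRUE),
                     "FLOOD", toy$EVTYPE)

# Re-aggregate so each consolidated label appears once
toy_sum <- aggregate(PROPDMG ~ EVTYPE, data = toy, FUN = sum)
print(toy_sum)
```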
library(dplyr)
#
## Define the list of storm types...
#
storm_types <- c('.*TORNADO.*', '.*FLOOD.*', '.*SNOW.*', '.*WIND.*',
'.*HEAT.*', '.*SURF.*', '.*ICE.*', '.*LIGHTNING.*',
'.*WINTER.*', '.*FIRE.*', '.*HURRICANE.*', '.*HAIL.*',
'.*SURGE.*', '.*DROUGHT.*', '.*RAIN.*')
#
#
storm_types
## [1] ".*TORNADO.*" ".*FLOOD.*" ".*SNOW.*" ".*WIND.*"
## [5] ".*HEAT.*" ".*SURF.*" ".*ICE.*" ".*LIGHTNING.*"
## [9] ".*WINTER.*" ".*FIRE.*" ".*HURRICANE.*" ".*HAIL.*"
## [13] ".*SURGE.*" ".*DROUGHT.*" ".*RAIN.*"
# Create an empty list to store the resulting dataframes
df_list <- list()
df_list
## list()
# Iterate over each storm type
for (i in seq_along(storm_types)) {
  # Start from df_A on the first pass, then from the previous result
  df <- if (i == 1) df_A else df_list[[i - 1]]
  # Consolidate EVTYPEs containing the current storm type into a single
  # category, labelled (for now) with the pattern itself
  df_new <- df %>%
    mutate(EVTYPE = ifelse(grepl(storm_types[i], EVTYPE, ignore.case = TRUE),
                           toupper(storm_types[i]), EVTYPE)) %>%
    group_by(EVTYPE) %>%
    summarize(PROPDMG = sum(PROPDMG))
  # Store the resulting dataframe in the list
  df_list[[i]] <- df_new
  # Print the dimensions of the result
  cat("Dimensions of df_", i, ": ", dim(df_new), "\n")
  # View(df_new)
}
## Dimensions of df_ 1 : 81 2
## Dimensions of df_ 2 : 69 2
## Dimensions of df_ 3 : 65 2
## Dimensions of df_ 4 : 46 2
## Dimensions of df_ 5 : 44 2
## Dimensions of df_ 6 : 42 2
## Dimensions of df_ 7 : 41 2
## Dimensions of df_ 8 : 41 2
## Dimensions of df_ 9 : 39 2
## Dimensions of df_ 10 : 38 2
## Dimensions of df_ 11 : 37 2
## Dimensions of df_ 12 : 37 2
## Dimensions of df_ 13 : 36 2
## Dimensions of df_ 14 : 36 2
## Dimensions of df_ 15 : 34 2
# The final dataframe after all iterations
final_df1 <- df_list[[length(df_list)]]
final_df1
## # A tibble: 34 × 2
## EVTYPE PROPDMG
## <chr> <dbl>
## 1 .*DROUGHT.* 1045373000
## 2 .*FIRE.* 7748149130
## 3 .*FLOOD.* 167280844220
## 4 .*HAIL.* 15442153000
## 5 .*HEAT.* 19329000
## 6 .*HURRICANE.* 81173079010
## 7 .*ICE.* 3945879380
## 8 .*LIGHTNING.* 709814400
## 9 .*RAIN.* 695455000
## 10 .*SNOW.* 955754900
## # ℹ 24 more rows
dim(final_df1)
## [1] 34 2
#View(final_df1)
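Because the patterns are applied one after another, an event name containing two keywords is assigned to whichever keyword comes first in `storm_types`; for example, 'TSTM WIND/HAIL' is counted under WIND, not HAIL. The sketch below makes that first-match-wins rule explicit and writes clean labels directly. The function name `consolidate` and the shortened keyword list are hypothetical, chosen only for illustration:

```r
# Keywords in priority order (a shortened, hypothetical version of storm_types)
keywords <- c("TORNADO", "FLOOD", "WIND", "HAIL")

# Return the first keyword found in each event name, or the name itself
consolidate <- function(evtype, keywords) {
  out <- evtype
  assigned <- rep(FALSE, length(evtype))
  for (kw in keywords) {
    # Only relabel names that no earlier keyword has claimed
    hit <- !assigned & grepl(kw, evtype, ignore.case = TRUE)
    out[hit] <- kw
    assigned <- assigned | hit
  }
  out
}

toy_types <- c("TSTM WIND/HAIL", "HAIL", "FLASH FLOOD", "FOG")
toy_labels <- consolidate(toy_types, keywords)
print(toy_labels)
```

Reordering `keywords` changes which category a mixed event lands in, so the order is effectively an analysis decision.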
#
## get rid of the '.*' symbols from 'final_df1$EVTYPE'...
#
final_df1$EVTYPE
## [1] ".*DROUGHT.*" ".*FIRE.*" ".*FLOOD.*"
## [4] ".*HAIL.*" ".*HEAT.*" ".*HURRICANE.*"
## [7] ".*ICE.*" ".*LIGHTNING.*" ".*RAIN.*"
## [10] ".*SNOW.*" ".*SURF.*" ".*SURGE.*"
## [13] ".*TORNADO.*" ".*WIND.*" ".*WINTER.*"
## [16] "AVALANCHE" "BLIZZARD" "COLD"
## [19] "DENSE FOG" "DRY MICROBURST" "DUST STORM"
## [22] "EXTREME COLD" "FOG" "FREEZE"
## [25] "FROST/FREEZE" "GLAZE" "LANDSLIDE"
## [28] "MIXED PRECIPITATION" "THUNDERSTORM" "TROPICAL DEPRESSION"
## [31] "TROPICAL STORM" "TSUNAMI" "URBAN/SML STREAM FLD"
## [34] "WATERSPOUT"
#
## inspect the first 20 labels of 'final_df1$EVTYPE' before stripping the '.*' symbols...
#
head(final_df1$EVTYPE, 20)
## [1] ".*DROUGHT.*" ".*FIRE.*" ".*FLOOD.*" ".*HAIL.*"
## [5] ".*HEAT.*" ".*HURRICANE.*" ".*ICE.*" ".*LIGHTNING.*"
## [9] ".*RAIN.*" ".*SNOW.*" ".*SURF.*" ".*SURGE.*"
## [13] ".*TORNADO.*" ".*WIND.*" ".*WINTER.*" "AVALANCHE"
## [17] "BLIZZARD" "COLD" "DENSE FOG" "DRY MICROBURST"
#
## Use gsub() to keep only the keyword, without the surrounding '.*' symbols
#
final_df1$EVTYPE <- gsub(paste0(".*?(TORNADO|HEAT|FLOOD|SNOW|WIND|",
                                "SURF|ICE|LIGHTNING|",
                                "WINTER|FIRE|HAIL|SURGE|HURRICANE|DROUGHT|RAIN).*"),
                         "\\1", final_df1$EVTYPE)
final_df1$EVTYPE
## [1] "DROUGHT" "FIRE" "FLOOD"
## [4] "HAIL" "HEAT" "HURRICANE"
## [7] "ICE" "LIGHTNING" "RAIN"
## [10] "SNOW" "SURF" "SURGE"
## [13] "TORNADO" "WIND" "WINTER"
## [16] "AVALANCHE" "BLIZZARD" "COLD"
## [19] "DENSE FOG" "DRY MICROBURST" "DUST STORM"
## [22] "EXTREME COLD" "FOG" "FREEZE"
## [25] "FROST/FREEZE" "GLAZE" "LANDSLIDE"
## [28] "MIXED PRECIPITATION" "THUNDERSTORM" "TROPICAL DEPRESSION"
## [31] "TROPICAL STORM" "TSUNAMI" "URBAN/SML STREAM FLD"
## [34] "WATERSPOUT"
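Since the unwanted characters are always the literal two-character sequence '.*', a fixed-string replacement is a simpler way to get the same labels. This is a sketch of that alternative on a few sample labels, not the call used above:

```r
# Labels as they look after the consolidation loop (sample values)
labels <- c(".*FLOOD.*", ".*TORNADO.*", "AVALANCHE", "FROST/FREEZE")

# fixed = TRUE treats ".*" as literal characters rather than a regex
clean <- gsub(".*", "", labels, fixed = TRUE)
print(clean)
```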
# Print the head of the final dataframe to verify changes
head(final_df1)
## # A tibble: 6 × 2
## EVTYPE PROPDMG
## <chr> <dbl>
## 1 DROUGHT 1045373000
## 2 FIRE 7748149130
## 3 FLOOD 167280844220
## 4 HAIL 15442153000
## 5 HEAT 19329000
## 6 HURRICANE 81173079010
dim(final_df1)
## [1] 34 2
#View(final_df1)
#
## 'final_df1' is the 'boiled-down' dataframe we want;
## now sort it.
#
# Sort the dataframe 'final_df1' in descending order of total property damage...
#
final_df1 <- final_df1 %>%
arrange(desc(PROPDMG))
#
dim(final_df1)
## [1] 34 2
#View(final_df1)
#
#
## 'final_df1' is the 'boiled-down' dataframe we want for property damage;
#
## #
## At this point, to help us remember our assignment goal, let's rename
## 'final_df1'...
#
df_prop_damage <- final_df1
dim(df_prop_damage)
## [1] 34 2
#View(df_prop_damage)
#
## Make sure the 'df_prop_damage' column PROPDMG is in
## descending order...
#
df_prop_damage_desc <- df_prop_damage %>%
arrange(desc(PROPDMG))
df_prop_damage_desc
## # A tibble: 34 × 2
## EVTYPE PROPDMG
## <chr> <dbl>
## 1 FLOOD 167280844220
## 2 HURRICANE 81173079010
## 3 TORNADO 56429502790
## 4 SURGE 47960122000
## 5 HAIL 15442153000
## 6 WIND 14463704710
## 7 FIRE 7748149130
## 8 TROPICAL STORM 7697140000
## 9 WINTER 6677580900
## 10 ICE 3945879380
## # ℹ 24 more rows
#
## Make a 'table'- display the df_prop_damage dataframe as a table
#
library(knitr)
#
## select only the 'EVTYPE' and 'PROPDMG' columns
selected_cols <- df_prop_damage_desc[ , c('EVTYPE', 'PROPDMG')]
selected_cols
## # A tibble: 34 × 2
## EVTYPE PROPDMG
## <chr> <dbl>
## 1 FLOOD 167280844220
## 2 HURRICANE 81173079010
## 3 TORNADO 56429502790
## 4 SURGE 47960122000
## 5 HAIL 15442153000
## 6 WIND 14463704710
## 7 FIRE 7748149130
## 8 TROPICAL STORM 7697140000
## 9 WINTER 6677580900
## 10 ICE 3945879380
## # ℹ 24 more rows
#
kable(selected_cols,
      col.names = c("Storm/Weather Events", "USA Property Damage, $"),
      caption = "Storm/Weather Events that are Most Harmful With Respect to USA Property Damage $ Cost")
| Storm/Weather Events | USA Property Damage, $ |
|---|---|
| FLOOD | 167280844220 |
| HURRICANE | 81173079010 |
| TORNADO | 56429502790 |
| SURGE | 47960122000 |
| HAIL | 15442153000 |
| WIND | 14463704710 |
| FIRE | 7748149130 |
| TROPICAL STORM | 7697140000 |
| WINTER | 6677580900 |
| ICE | 3945879380 |
| DROUGHT | 1045373000 |
| SNOW | 955754900 |
| LIGHTNING | 709814400 |
| RAIN | 695455000 |
| BLIZZARD | 653215950 |
| LANDSLIDE | 321251000 |
| TSUNAMI | 143854000 |
| SURF | 98825000 |
| EXTREME COLD | 66650000 |
| URBAN/SML STREAM FLD | 42967000 |
| HEAT | 19329000 |
| FOG | 11610000 |
| FROST/FREEZE | 8870000 |
| WATERSPOUT | 8700000 |
| DENSE FOG | 7785000 |
| DRY MICROBURST | 5620000 |
| AVALANCHE | 3125000 |
| DUST STORM | 2940000 |
| TROPICAL DEPRESSION | 1150000 |
| GLAZE | 650000 |
| COLD | 500000 |
| THUNDERSTORM | 500000 |
| FREEZE | 200000 |
| MIXED PRECIPITATION | 135000 |
## This generates a formatted table with the specified column names and a
## caption. The dataframe 'df_prop_damage_desc' is already sorted in
## descending order of 'PROPDMG' (shown as 'USA Property Damage, $'),
## as per the previous step.
dim(df_prop_damage_desc)
## [1] 34 2
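Twelve-digit dollar totals are hard to compare at a glance. One option, sketched here with two totals copied from the table, is to add thousands separators or to report the figures in billions of dollars. The variable names are illustrative only:

```r
# Two totals copied from the property damage table
damage <- c(FLOOD = 167280844220, HURRICANE = 81173079010)

# Thousands separators make the magnitudes easier to compare
with_commas <- format(damage, big.mark = ",", scientific = FALSE, trim = TRUE)
print(with_commas)

# Alternatively, report in billions of dollars, rounded to one decimal place
in_billions <- round(damage / 1e9, 1)
print(in_billions)
```

A formatted character column of this kind could be passed to kable() in place of the raw numeric column.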
#
By inspection of the ‘Storm/Weather Events Table’ we can see that the top 10 to 15 types of events dominate. Where to cut off the number of events displayed in the plot (e.g., with ggplot) depends on factors such as readability and the focus of the analysis. With too many events, the labels may overlap or become unreadable, so it is often a good idea to limit the number of events shown.
As a general guideline, display the top N events with the highest property damage, where N is small enough to allow clear visualization without overcrowding the plot. N can be chosen based on the number of events considered significant or commonly recognized.
For example, the top 10 or top 15 events are likely to be the most impactful and relevant for the analysis, although this number should be adjusted to the specific requirements.
Once the number of events is decided, the dataframe is filtered to keep only the top N events based on the ‘USA Property Damage’ column before creating the ggplot.
#
## Here's an example of how you can filter the dataframe to keep only the
## top 12 events:
#
library(dplyr)
library(ggplot2)
#
# NOW PREPARE THE PLOT....
##
## From the Storm Damage table, we see that quite a few of the storm events
## are 'minor', e.g., HEAT, LANDSLIDE, LIGHTNING, MIXED PRECIPITATION,
## RAIN, SURF, THUNDERSTORM, TROPICAL DEPRESSION, TSUNAMI, SMALL
## STREAM FLOOD, WATERSPOUT.
##
## We, therefore, focus on the larger impacts resulting in sizeable
## property damage costs. We chose to look at just the 'top 12'....
#
#
#
# Filter the dataframe to keep only the top 12 events
top_events <- selected_cols %>%
head(12)
top_events
## # A tibble: 12 × 2
## EVTYPE PROPDMG
## <chr> <dbl>
## 1 FLOOD 167280844220
## 2 HURRICANE 81173079010
## 3 TORNADO 56429502790
## 4 SURGE 47960122000
## 5 HAIL 15442153000
## 6 WIND 14463704710
## 7 FIRE 7748149130
## 8 TROPICAL STORM 7697140000
## 9 WINTER 6677580900
## 10 ICE 3945879380
## 11 DROUGHT 1045373000
## 12 SNOW 955754900
#
# Create ggplot
ggplot(top_events, aes(x = EVTYPE, y = PROPDMG)) +
geom_bar(stat = "identity", fill = 'red') +
labs(title = "Top 12 Most Harmful/Costly Property Damage Storm/Weather Events in USA",
x = "Storm/Weather Events - Years 1950 thru 2011",
y = "Property Damage Cost $") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
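ggplot2 places a character x variable in alphabetical order, so the bars do not follow the damage ranking unless the ordering is set explicitly. The sketch below shows one way to do that by fixing the factor levels. It uses three rows copied from the table, the data frame name `toy_events` is hypothetical, and geom_col() is used as the shorthand for geom_bar(stat = "identity"):

```r
# Three rows copied from the property damage table, in arbitrary order
toy_events <- data.frame(
  EVTYPE  = c("HAIL", "FLOOD", "TORNADO"),
  PROPDMG = c(15442153000, 167280844220, 56429502790),
  stringsAsFactors = FALSE
)

# Order the factor levels by decreasing damage so bars follow the ranking
toy_events$EVTYPE <- factor(
  toy_events$EVTYPE,
  levels = toy_events$EVTYPE[order(toy_events$PROPDMG, decreasing = TRUE)]
)
print(levels(toy_events$EVTYPE))

# Build the plot object only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy_events,
                       ggplot2::aes(x = EVTYPE, y = PROPDMG / 1e9)) +
    ggplot2::geom_col(fill = "red") +
    ggplot2::labs(x = "Storm/Weather Events",
                  y = "Property Damage (billions of $)")
}
```

Dividing by 1e9 on the y axis is a presentation choice that avoids scientific notation on the tick labels.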
#
#
## 'df_22', which we formed earlier in 'chunk 15' (above), has both 'PROPDMG' and 'CROPDMG' features (columns).
##
## We chose to examine 'property damage' and 'crop damage' separately,
## and we have just completed 'property damage'.
#
## So we now leave out the property damage columns 'PROPDMG' and 'PROPDMGEXP'
## and subset the dataframe to include only the columns needed for 'CROPDMG'...
#
df_22_reduced1 <- subset(df_22, select = c("REFNUM", "STATE", "COUNTYNAME", "BGN_DATE", "EVTYPE",
"CROPDMG", "CROPDMGEXP"))
# Print the head of the reduced dataframe
head(df_22_reduced1)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr>
## 1 1 AL MOBILE 1950-04-18 TORNADO 0 <NA>
## 2 2 AL BALDWIN 1950-04-18 TORNADO 0 <NA>
## 3 3 AL FAYETTE 1951-02-20 TORNADO 0 <NA>
## 4 4 AL MADISON 1951-06-08 TORNADO 0 <NA>
## 5 5 AL CULLMAN 1951-11-15 TORNADO 0 <NA>
## 6 6 AL LAUDERDALE 1951-11-15 TORNADO 0 <NA>
dim(df_22_reduced1)
## [1] 253844 7
#View(df_22_reduced1)
#
We remove rows from df_22_reduced1 where ‘CROPDMG’ is 0 or NA, or where ‘CROPDMGEXP’ is blank or NA, using the subset() function with logical conditions:
# Drop rows with zero/NA CROPDMG or blank/NA CROPDMGEXP
df_22_reduced_filtered1 <- subset(df_22_reduced1,
                                  !(CROPDMG == 0 | is.na(CROPDMG)
                                    | CROPDMGEXP == "" | is.na(CROPDMGEXP)))
#
# Print the head of the filtered dataframe
head(df_22_reduced_filtered1)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr>
## 1 188271 AL LIMESTONE 1994-06-26 THUNDERSTORM WINDS 500 K
## 2 187640 AL BIBB 1994-03-24 THUNDERSTORM WINDS 50 K
## 3 187641 AL BIBB 1994-03-24 THUNDERSTORM WINDS 50 K
## 4 187667 AL BLOUNT 1994-11-28 TORNADO 5 K
## 5 187710 AL CHAMBERS 1994-03-24 TORNADO 50 K
## 6 187737 AL CHEROKEE 1995-07-09 THUNDERSTORM WINDS/HAIL 15 K
dim(df_22_reduced_filtered1)
## [1] 21960 7
#
## Let's deal with 'CROPDMGEXP' ;
str(df_22_reduced_filtered1)
## tibble [21,960 × 7] (S3: tbl_df/tbl/data.frame)
## $ REFNUM : num [1:21960] 188271 187640 187641 187667 187710 ...
## $ STATE : chr [1:21960] "AL" "AL" "AL" "AL" ...
## $ COUNTYNAME: chr [1:21960] "LIMESTONE" "BIBB" "BIBB" "BLOUNT" ...
## $ BGN_DATE : Date[1:21960], format: "1994-06-26" "1994-03-24" ...
## $ EVTYPE : chr [1:21960] "THUNDERSTORM WINDS" "THUNDERSTORM WINDS" "THUNDERSTORM WINDS" "TORNADO" ...
## $ CROPDMG : num [1:21960] 500 50 50 5 50 15 5 50 50 0.5 ...
## $ CROPDMGEXP: chr [1:21960] "K" "K" "K" "K" ...
#
## CROPDMGEXP is of type 'character';
##
## find the unique values in the 'CROPDMGEXP' column;
unique_char <- unique(df_22_reduced_filtered1$CROPDMGEXP)
#
print(unique_char)
## [1] "K" "M" "B" "k" "0"
##
## We see that all of these symbols can be read as multipliers for the
## 'CROPDMG' values.
#
#
## We need to make 'df_22_reduced_filtered1' name less cumbersome;
## let's use 'df_22_rf1'
df_22_rf1 <- df_22_reduced_filtered1
head(df_22_rf1)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <chr>
## 1 188271 AL LIMESTONE 1994-06-26 THUNDERSTORM WINDS 500 K
## 2 187640 AL BIBB 1994-03-24 THUNDERSTORM WINDS 50 K
## 3 187641 AL BIBB 1994-03-24 THUNDERSTORM WINDS 50 K
## 4 187667 AL BLOUNT 1994-11-28 TORNADO 5 K
## 5 187710 AL CHAMBERS 1994-03-24 TORNADO 50 K
## 6 187737 AL CHEROKEE 1995-07-09 THUNDERSTORM WINDS/HAIL 15 K
dim(df_22_rf1)
## [1] 21960 7
#View(df_22_rf1)
#
Now that we have found no ‘weird’ symbols, we convert the good symbols in the ‘CROPDMGEXP’ column to their corresponding multiplier values. We create a mapping between the symbols and their corresponding powers, then use that mapping to perform the conversion. Here’s how we do it:
# Define a mapping between symbols and their corresponding powers
multiplier_mapping <- c("K" = 10^3, "M" = 10^6, "B" = 10^9, "0" = 10^0,
"k" = 10^3)
#
#
# Replace symbols in CROPDMGEXP column with their corresponding multiplier
df_22_rf1$CROPDMGEXP <- multiplier_mapping[as.character(df_22_rf1$CROPDMGEXP)]
# Print the head of the dataframe to verify changes
head(df_22_rf1)
## # A tibble: 6 × 7
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl>
## 1 188271 AL LIMESTONE 1994-06-26 THUNDERSTORM WINDS 500 1000
## 2 187640 AL BIBB 1994-03-24 THUNDERSTORM WINDS 50 1000
## 3 187641 AL BIBB 1994-03-24 THUNDERSTORM WINDS 50 1000
## 4 187667 AL BLOUNT 1994-11-28 TORNADO 5 1000
## 5 187710 AL CHAMBERS 1994-03-24 TORNADO 50 1000
## 6 187737 AL CHEROKEE 1995-07-09 THUNDERSTORM WINDS/HAIL 15 1000
dim(df_22_rf1)
## [1] 21960 7
#View(df_22_rf1)
#
## To make sure we got all types of '10^x' in CROPDMGEXP;
#
uniq_multi <- unique(df_22_rf1$CROPDMGEXP)
print(uniq_multi)
## [1] 1e+03 1e+06 1e+09 1e+00
#
## looks OK...
#
## Now add a new column 'CROPDMGTOT' to the 'df_22_rf1' dataframe by
## multiplying each value in 'CROPDMG' by its corresponding multiplier
## in 'CROPDMGEXP' (element-wise multiplication):
# Add a new column 'CROPDMGTOT' to df_22_rf1
df_22_rf1$CROPDMGTOT <- df_22_rf1$CROPDMG * df_22_rf1$CROPDMGEXP
# Print the head of the dataframe to verify changes
head(df_22_rf1)
## # A tibble: 6 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP CROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 188271 AL LIMESTONE 1994-06-26 THUNDERSTORM… 500 1000 500000
## 2 187640 AL BIBB 1994-03-24 THUNDERSTORM… 50 1000 50000
## 3 187641 AL BIBB 1994-03-24 THUNDERSTORM… 50 1000 50000
## 4 187667 AL BLOUNT 1994-11-28 TORNADO 5 1000 5000
## 5 187710 AL CHAMBERS 1994-03-24 TORNADO 50 1000 50000
## 6 187737 AL CHEROKEE 1995-07-09 THUNDERSTORM… 15 1000 15000
dim(df_22_rf1)
## [1] 21960 8
#View(df_22_rf1)
#
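The crop-damage conversion repeats the property-damage steps almost line for line. A small helper function is one way to express the conversion once. In the sketch below, `damage_dollars` is a hypothetical name, the mapping covers only the crop symbols seen above, and upper-casing the symbol is an assumption that 'k' and 'K' mean the same thing:

```r
# Hypothetical helper: convert a damage value and its exponent symbol to dollars
damage_dollars <- function(value, symbol) {
  # Multipliers for the crop-damage symbols; "k" is handled via toupper()
  mapping <- c("K" = 1e3, "M" = 1e6, "B" = 1e9, "0" = 1)
  multiplier <- unname(mapping[toupper(symbol)])
  value * multiplier
}

# Toy inputs (hypothetical values)
toy_value  <- c(500, 5, 1.51, 2)
toy_symbol <- c("K", "k", "B", "0")
toy_total  <- damage_dollars(toy_value, toy_symbol)
print(toy_total)
```

Any symbol outside the mapping yields NA, so the result can be checked with is.na() before totals are summed.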
## Now we need to 'sort' 'df_22_rf1' using 'CROPDMGTOT' to find the
## 'dominant' storm types...w.r.t. crop damage...
df_sort <- df_22_rf1[order(df_22_rf1$CROPDMGTOT, decreasing = TRUE), ]
print(df_sort)
## # A tibble: 21,960 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP CROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 198375 IL ADAMS, CALHOUN … 1993-08-31 RIVER… 5 1000000000 5000000000
## 2 211887 MS MSZ001 - 023 - … 1994-02-09 ICE S… 5 1000000000 5000000000
## 3 581537 MS MSZ018>019 - 02… 2005-08-29 HURRI… 1.51 1000000000 1510000000
## 4 639314 TX TXZ091>095 - 10… 2006-01-01 DROUG… 1 1000000000 1000000000
## 5 312976 CA CAZ020>021 1998-12-20 EXTRE… 596 1000000 596000000
## 6 422598 IA IAZ004>007 - 01… 2001-08-01 DROUG… 579. 1000000 578850000
## 7 410125 TX TXZ021>044 2000-11-01 DROUG… 515 1000000 515000000
## 8 201230 IA IAZ004>011 - 01… 1995-08-01 DROUG… 0.5 1000000000 500000000
## 9 336988 OK OKZ049 - 053 - … 1998-07-06 DROUG… 500 1000000 500000000
## 10 366653 NC NCZ007>011 - 02… 1999-09-15 HURRI… 500 1000000 500000000
## # ℹ 21,950 more rows
head(df_sort)
## # A tibble: 6 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP CROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 198375 IL ADAMS, CALHOUN A… 1993-08-31 RIVER… 5 1000000000 5000000000
## 2 211887 MS MSZ001 - 023 - 0… 1994-02-09 ICE S… 5 1000000000 5000000000
## 3 581537 MS MSZ018>019 - 025… 2005-08-29 HURRI… 1.51 1000000000 1510000000
## 4 639314 TX TXZ091>095 - 100… 2006-01-01 DROUG… 1 1000000000 1000000000
## 5 312976 CA CAZ020>021 1998-12-20 EXTRE… 596 1000000 596000000
## 6 422598 IA IAZ004>007 - 015… 2001-08-01 DROUG… 579. 1000000 578850000
#View(df_sort)
#
## Because the numbers at the 'top of the sort' are well above 1,000,000,000;
## Let's reduce the 'df_22_rf1' dataframe to show only rows
## where 'CROPDMGTOT' is greater than $100,000...
#
df_22_rf_reduced <- subset(df_sort, CROPDMGTOT > 100000)
head(df_22_rf_reduced)
## # A tibble: 6 × 8
## REFNUM STATE COUNTYNAME BGN_DATE EVTYPE CROPDMG CROPDMGEXP CROPDMGTOT
## <dbl> <chr> <chr> <date> <chr> <dbl> <dbl> <dbl>
## 1 198375 IL ADAMS, CALHOUN A… 1993-08-31 RIVER… 5 1000000000 5000000000
## 2 211887 MS MSZ001 - 023 - 0… 1994-02-09 ICE S… 5 1000000000 5000000000
## 3 581537 MS MSZ018>019 - 025… 2005-08-29 HURRI… 1.51 1000000000 1510000000
## 4 639314 TX TXZ091>095 - 100… 2006-01-01 DROUG… 1 1000000000 1000000000
## 5 312976 CA CAZ020>021 1998-12-20 EXTRE… 596 1000000 596000000
## 6 422598 IA IAZ004>007 - 015… 2001-08-01 DROUG… 579. 1000000 578850000
#View(df_22_rf_reduced)
dim(df_22_rf_reduced)
## [1] 4620 8
#
## we have 4,620 rows remaining. Let's pick out the storm types in EVTYPE
## and 'consolidate' them into the 'repeated main types'....
df_333 <- df_22_rf_reduced
dim(df_333)
## [1] 4620 8
#View(df_333)
#
## #
## We see our 'df_333' 'reduced' to 4,620 x 8....
#
## We now aggregate the CROPDMGTOT for each EVTYPE in the
## dataframe 'df_333', We use the group_by() and summarize() functions
## from the dplyr package:
#
# Group by EVTYPE and summarize CROPDMG
df_AA <- df_333 %>%
group_by(EVTYPE) %>%
summarize(CROPDMG = sum(CROPDMGTOT))
head(df_AA)
## # A tibble: 6 × 2
## EVTYPE CROPDMG
## <chr> <dbl>
## 1 BLIZZARD 112000000
## 2 COLD/WIND CHILL 500000
## 3 DROUGHT 13971130000
## 4 DUST STORM 3000000
## 5 EXCESSIVE HEAT 492400000
## 6 EXTREME COLD 1291724000
dim(df_AA)
## [1] 48 2
#View(df_AA)
#
This code groups the dataframe ‘df_333’ by the ‘EVTYPE’ column and then calculates the total crop damage for each EVTYPE using the sum() function within the summarize() function. The resulting dataframe ‘df_AA’ contains one row per EVTYPE with its total crop damage in dollars, giving a manageable 48 x 2 tibble with the desired crop damage information.
## By inspection of the 'df_AA' dataframe we can determine the storm types
## to focus on for the major crop damage EVTYPEs. From 'df_AA' we also see
## several EVTYPE storm types that are effectively the same, such as
## 'FLASH FLOOD', 'FLASH FLOOD/FLOOD', 'FLASH FLOODING', 'FLOOD', 'FLOODING',
## etc. We need to combine these, and we use regular expressions to do so.
##
## The regular expression .*FLOOD.* matches any string that contains the
## substring "FLOOD", with any characters (or none) before and after it.
#
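With grepl(), a pattern only has to match somewhere in the string, so the leading and trailing '.*' do not change which event names are selected. The sketch below checks this on a few sample event names; it is an illustration of the matching rule, not a change to the analysis:

```r
# Sample event names taken from the EVTYPE list
ev <- c("FLASH FLOOD", "FLOOD/FLASH FLOOD", "Light Snow", "TORNADO")

# The pattern with wildcards and the bare keyword select the same names
with_wildcards    <- grepl(".*FLOOD.*", ev, ignore.case = TRUE)
without_wildcards <- grepl("FLOOD", ev, ignore.case = TRUE)

print(with_wildcards)
print(identical(with_wildcards, without_wildcards))
```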
library(dplyr)
#
## Define the list of storm types...
#
storm_types <- c('.*TORNADO.*', '.*FLOOD.*', '.*SNOW.*', '.*WIND.*',
'.*HEAT.*', '.*SURF.*', '.*ICE.*', '.*LIGHTNING.*',
'.*WINTER.*', '.*FIRE.*', '.*HURRICANE.*', '.*HAIL.*',
'.*SURGE.*', '.*DROUGHT.*', '.*RAIN.*', '.*FREEZE.*')
#
#
storm_types
## [1] ".*TORNADO.*" ".*FLOOD.*" ".*SNOW.*" ".*WIND.*"
## [5] ".*HEAT.*" ".*SURF.*" ".*ICE.*" ".*LIGHTNING.*"
## [9] ".*WINTER.*" ".*FIRE.*" ".*HURRICANE.*" ".*HAIL.*"
## [13] ".*SURGE.*" ".*DROUGHT.*" ".*RAIN.*" ".*FREEZE.*"
# Create an empty list to store the resulting dataframes
df_list <- list()
df_list
## list()
# Iterate over each storm type
for (i in seq_along(storm_types)) {
  # Start from df_AA on the first pass, then from the previous result
  df <- if (i == 1) df_AA else df_list[[i - 1]]
  # Consolidate EVTYPEs containing the current storm type into a single
  # category, labelled (for now) with the pattern itself
  df_new <- df %>%
    mutate(EVTYPE = ifelse(grepl(storm_types[i], EVTYPE, ignore.case = TRUE),
                           toupper(storm_types[i]), EVTYPE)) %>%
    group_by(EVTYPE) %>%
    summarize(CROPDMG = sum(CROPDMG))
  # Store the resulting dataframe in the list
  df_list[[i]] <- df_new
  # Print the dimensions of the result
  cat("Dimensions of df_", i, ": ", dim(df_new), "\n")
  # View(df_new)
}
## Dimensions of df_ 1 : 48 2
## Dimensions of df_ 2 : 43 2
## Dimensions of df_ 3 : 43 2
## Dimensions of df_ 4 : 31 2
## Dimensions of df_ 5 : 28 2
## Dimensions of df_ 6 : 28 2
## Dimensions of df_ 7 : 28 2
## Dimensions of df_ 8 : 28 2
## Dimensions of df_ 9 : 27 2
## Dimensions of df_ 10 : 26 2
## Dimensions of df_ 11 : 25 2
## Dimensions of df_ 12 : 24 2
## Dimensions of df_ 13 : 24 2
## Dimensions of df_ 14 : 24 2
## Dimensions of df_ 15 : 23 2
## Dimensions of df_ 16 : 22 2
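The loop above re-groups the dataframe once per storm type. As an alternative sketch (not the method used in this report), the labels can be consolidated first and the totals computed once. The helper `consolidate_evtype()` and the example labels and damage values below are hypothetical; the keyword order mirrors `storm_types` so that the first matching keyword wins, as it does in the loop.

```r
# Keywords in the same priority order as storm_types, without the '.*' wrappers
keywords <- c("TORNADO", "FLOOD", "SNOW", "WIND", "HEAT", "SURF", "ICE",
              "LIGHTNING", "WINTER", "FIRE", "HURRICANE", "HAIL",
              "SURGE", "DROUGHT", "RAIN", "FREEZE")

# Hypothetical helper: return the first keyword found in each label,
# or the original label when no keyword matches
consolidate_evtype <- function(evtype, keywords) {
  out <- evtype
  assigned <- rep(FALSE, length(evtype))
  for (kw in keywords) {
    hit <- !assigned & grepl(kw, evtype, ignore.case = TRUE)
    out[hit] <- kw
    assigned <- assigned | hit
  }
  out
}

# Made-up labels and damage values, for illustration only
example_labels <- c("FLASH FLOOD", "Flooding", "THUNDERSTORM WIND",
                    "WINTER STORM", "BLIZZARD")
toy_events <- data.frame(EVTYPE = example_labels,
                         CROPDMG = c(10, 5, 7, 3, 1))

# Relabel once, then aggregate once
toy_events$EVTYPE <- consolidate_evtype(toy_events$EVTYPE, keywords)
toy_event_totals <- aggregate(CROPDMG ~ EVTYPE, data = toy_events, FUN = sum)
toy_event_totals
```

Because the consolidated labels are plain keywords, this variant needs no later clean-up of '.*' symbols.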
# The final dataframe after all iterations
final_df11 <- df_list[[length(df_list)]]
final_df11
## # A tibble: 22 × 2
## EVTYPE CROPDMG
## <chr> <dbl>
## 1 .*DROUGHT.* 13971130000
## 2 .*FIRE.* 400213900
## 3 .*FLOOD.* 12151233000
## 4 .*FREEZE.* 1539524000
## 5 .*HAIL.* 2871470200
## 6 .*HEAT.* 904050000
## 7 .*HURRICANE.* 5349405800
## 8 .*ICE.* 5021450000
## 9 .*LIGHTNING.* 10500000
## 10 .*RAIN.* 791358000
## # ℹ 12 more rows
dim(final_df11)
## [1] 22 2
#View(final_df11)
#
## get rid of the '.*' symbols from 'final_df11$EVTYPE'...
#
final_df11$EVTYPE
## [1] ".*DROUGHT.*" ".*FIRE.*" ".*FLOOD.*"
## [4] ".*FREEZE.*" ".*HAIL.*" ".*HEAT.*"
## [7] ".*HURRICANE.*" ".*ICE.*" ".*LIGHTNING.*"
## [10] ".*RAIN.*" ".*SNOW.*" ".*SURGE.*"
## [13] ".*TORNADO.*" ".*WIND.*" ".*WINTER.*"
## [16] "BLIZZARD" "DUST STORM" "EXTREME COLD"
## [19] "LANDSLIDE" "THUNDERSTORM" "TROPICAL STORM"
## [22] "URBAN/SML STREAM FLD"
#
## Preview the first 20 labels before cleaning...
#
head(final_df11$EVTYPE, 20)
## [1] ".*DROUGHT.*" ".*FIRE.*" ".*FLOOD.*" ".*FREEZE.*"
## [5] ".*HAIL.*" ".*HEAT.*" ".*HURRICANE.*" ".*ICE.*"
## [9] ".*LIGHTNING.*" ".*RAIN.*" ".*SNOW.*" ".*SURGE.*"
## [13] ".*TORNADO.*" ".*WIND.*" ".*WINTER.*" "BLIZZARD"
## [17] "DUST STORM" "EXTREME COLD" "LANDSLIDE" "THUNDERSTORM"
#
## # Use gsub to extract only the keyword without surrounding symbols
#
final_df11$EVTYPE <- gsub(paste0(".*?(TORNADO|HEAT|FLOOD|SNOW|WIND|",
                                 "ICE|LIGHTNING|FREEZE|",
                                 "WINTER|FIRE|HAIL|SURGE|HURRICANE|DROUGHT|RAIN).*"),
                          "\\1", final_df11$EVTYPE)
final_df11$EVTYPE
## [1] "DROUGHT" "FIRE" "FLOOD"
## [4] "FREEZE" "HAIL" "HEAT"
## [7] "HURRICANE" "ICE" "LIGHTNING"
## [10] "RAIN" "SNOW" "SURGE"
## [13] "TORNADO" "WIND" "WINTER"
## [16] "BLIZZARD" "DUST STORM" "EXTREME COLD"
## [19] "LANDSLIDE" "THUNDERSTORM" "TROPICAL STORM"
## [22] "URBAN/SML STREAM FLD"
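Since each consolidated label is simply a keyword wrapped in a literal '.*' on both sides, a fixed-string replacement is one possible shortcut. This is a sketch of an alternative rather than the approach used above; the vector `labels` holds a few values copied from the output for illustration.

```r
# A few labels copied from the output above
labels <- c(".*DROUGHT.*", ".*FLOOD.*", "BLIZZARD", "URBAN/SML STREAM FLD")

# fixed = TRUE treats ".*" as two literal characters, not as a regex,
# so only the wrapper symbols are removed
clean_labels <- gsub(".*", "", labels, fixed = TRUE)
clean_labels
```

Labels without the wrapper, such as 'BLIZZARD', pass through unchanged, and the keyword list does not have to be repeated.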
# Print the head of the final dataframe to verify changes
head(final_df11)
## # A tibble: 6 × 2
## EVTYPE CROPDMG
## <chr> <dbl>
## 1 DROUGHT 13971130000
## 2 FIRE 400213900
## 3 FLOOD 12151233000
## 4 FREEZE 1539524000
## 5 HAIL 2871470200
## 6 HEAT 904050000
dim(final_df11)
## [1] 22 2
#View(final_df11)
#
## 'final_df11' is the 'boiled-down' dataframe we want for crop damage;
## now sort it in descending order of total crop damage...
#
final_df11 <- final_df11 %>%
  arrange(desc(CROPDMG))
#
#dim(final_df11)
#View(final_df11)
#
## At this point, to help us remember our assignment goal, let's rename
## 'final_df11'...
#
df_crop_damage <- final_df11
#dim(df_crop_damage)
#View(df_crop_damage)
#
## 'df_crop_damage' is already in descending CROPDMG order from the sort
## above; arrange() is applied again here only so that the '_desc' name
## is explicit about the ordering...
#
df_crop_damage_desc <- df_crop_damage %>%
  arrange(desc(CROPDMG))
df_crop_damage_desc
## # A tibble: 22 × 2
## EVTYPE CROPDMG
## <chr> <dbl>
## 1 DROUGHT 13971130000
## 2 FLOOD 12151233000
## 3 HURRICANE 5349405800
## 4 ICE 5021450000
## 5 HAIL 2871470200
## 6 WIND 1909443700
## 7 FREEZE 1539524000
## 8 EXTREME COLD 1291724000
## 9 HEAT 904050000
## 10 RAIN 791358000
## # ℹ 12 more rows
#
## Make a 'table'- display the df_crop_damage dataframe as a table
#
library(knitr)
#
## select only the 'EVTYPE' and 'CROPDMG' columns
selected_cols <- df_crop_damage_desc[ , c('EVTYPE', 'CROPDMG')]
selected_cols
## # A tibble: 22 × 2
## EVTYPE CROPDMG
## <chr> <dbl>
## 1 DROUGHT 13971130000
## 2 FLOOD 12151233000
## 3 HURRICANE 5349405800
## 4 ICE 5021450000
## 5 HAIL 2871470200
## 6 WIND 1909443700
## 7 FREEZE 1539524000
## 8 EXTREME COLD 1291724000
## 9 HEAT 904050000
## 10 RAIN 791358000
## # ℹ 12 more rows
#
kable(selected_cols,
      col.names = c("Storm/Weather Events", "USA Crop Damage, $"),
      caption = paste("Storm/Weather Events that are Most Harmful/Costly With Respect",
                      "to USA Crop Damage, $"))
| Storm/Weather Events | USA Crop Damage, $ |
|---|---|
| DROUGHT | 13971130000 |
| FLOOD | 12151233000 |
| HURRICANE | 5349405800 |
| ICE | 5021450000 |
| HAIL | 2871470200 |
| WIND | 1909443700 |
| FREEZE | 1539524000 |
| EXTREME COLD | 1291724000 |
| HEAT | 904050000 |
| RAIN | 791358000 |
| TROPICAL STORM | 677325000 |
| FIRE | 400213900 |
| TORNADO | 388976900 |
| SNOW | 134400000 |
| BLIZZARD | 112000000 |
| WINTER | 41510000 |
| LANDSLIDE | 20000000 |
| LIGHTNING | 10500000 |
| URBAN/SML STREAM FLD | 6542000 |
| DUST STORM | 3000000 |
| THUNDERSTORM | 1000000 |
| SURGE | 750000 |
## This generates a nicely formatted table with the specified column names
## and a caption. The dataframe 'df_crop_damage_desc' was already sorted
## in descending order of the 'CROPDMG' column (shown as 'USA Crop Damage, $')
## in the previous step.
dim(df_crop_damage_desc)
## [1] 22 2
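The raw dollar totals in the table are long and hard to compare at a glance. The sketch below shows two possible ways to make them more readable, using the first three rows of the table as sample data. The name `crop_top3` and the derived column `CROPDMG_BN` are illustrative assumptions; `format.args` is a documented `kable()` argument whose contents are passed on to `format()`.

```r
library(knitr)

# First three rows of the crop damage table above
crop_top3 <- data.frame(
  EVTYPE  = c("DROUGHT", "FLOOD", "HURRICANE"),
  CROPDMG = c(13971130000, 12151233000, 5349405800)
)

# Option 1: thousands separators via format.args
tbl <- kable(crop_top3,
             col.names = c("Storm/Weather Events", "USA Crop Damage, $"),
             format.args = list(big.mark = ","))
tbl

# Option 2: report the damage in billions of dollars
crop_top3$CROPDMG_BN <- round(crop_top3$CROPDMG / 1e9, 2)
crop_top3
```

Either option leaves the underlying CROPDMG values untouched, so the sorted order and any later calculations are unaffected.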
#
By inspection of the ‘Storm/Weather Events Table’ we can see that the ‘top’ 10 to 15 types of events dominate. Where to cut off the number of events displayed in the plot (e.g., a ggplot bar chart) depends on readability and the focus of the analysis: with too many events, the axis labels overlap or become unreadable, so it is usually better to limit the number of events shown.
As a general guideline, display the top N events with the highest crop damage, where N is small enough to keep the plot clear and uncrowded. N can be chosen based on how many events are significant or commonly recognized.
For example, the top 10 or top 15 events are likely to be the most impactful and relevant, although this number should be adjusted to the requirements of the analysis.
Once N is decided, the dataframe can be filtered to keep only the top N events based on the ‘CROPDMG’ column (shown in the table as ‘USA Crop Damage, $’) before creating the ggplot.
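One way to make the choice of N less subjective is to look at how much of the total crop damage the top N event types account for. The sketch below uses the 22 CROPDMG totals from the table; the 0.95 threshold and the names `crop_dmg`, `cum_share`, and `n_needed` are illustrative assumptions, not part of the original analysis.

```r
# CROPDMG totals copied from the table above, already in descending order
crop_dmg <- c(13971130000, 12151233000, 5349405800, 5021450000, 2871470200,
              1909443700, 1539524000, 1291724000, 904050000, 791358000,
              677325000, 400213900, 388976900, 134400000, 112000000,
              41510000, 20000000, 10500000, 6542000, 3000000, 1000000, 750000)

# Share of total crop damage covered by the top N event types
cum_share <- cumsum(crop_dmg) / sum(crop_dmg)

# Smallest N whose cumulative share reaches the chosen threshold
# (0.95 is an assumed, adjustable value)
n_needed <- which(cum_share >= 0.95)[1]

round(cum_share[c(5, 10, 12)], 3)
n_needed
```

A cut-off picked this way can then be weighed against readability, as discussed above.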
In the ‘Storm/Weather Events Table’, we see that quite a few of the 22 storm events are ‘minor’ in terms of crop damage, e.g., LANDSLIDE, LIGHTNING, SNOW, THUNDERSTORM. We therefore focus on the larger impacts resulting in sizeable crop damage costs, and chose to look at just the ‘top 12’.
The following code filters the dataframe to keep only the top 12 events:
library(dplyr)
library(ggplot2)
##
#
# NOW PREPARE THE PLOT...
# Filter the dataframe to keep only the top 12 events
top_events <- selected_cols %>%
head(12)
top_events
## # A tibble: 12 × 2
## EVTYPE CROPDMG
## <chr> <dbl>
## 1 DROUGHT 13971130000
## 2 FLOOD 12151233000
## 3 HURRICANE 5349405800
## 4 ICE 5021450000
## 5 HAIL 2871470200
## 6 WIND 1909443700
## 7 FREEZE 1539524000
## 8 EXTREME COLD 1291724000
## 9 HEAT 904050000
## 10 RAIN 791358000
## 11 TROPICAL STORM 677325000
## 12 FIRE 400213900
#
# Create ggplot
ggplot(top_events, aes(x = EVTYPE, y = CROPDMG)) +
geom_bar(stat = "identity", fill = 'orange') +
labs(title = "Top 12 Most Harmful/Costly Crop Damage Storm/Weather Events in USA, $",
x = "Storm/Weather Events - Years 1950 thru 2011",
y = "Crop Damage Cost $") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
#
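Because EVTYPE is a character column, ggplot2 places the bars in alphabetical order rather than by damage. The sketch below shows one possible way to order the bars from largest to smallest with `reorder()` and to plot in billions of dollars. It uses the top five rows of the table as sample data; the names `top5` and `EVTYPE_ordered` are illustrative assumptions.

```r
# Top five rows from the crop damage table above
top5 <- data.frame(
  EVTYPE  = c("DROUGHT", "FLOOD", "HURRICANE", "ICE", "HAIL"),
  CROPDMG = c(13971130000, 12151233000, 5349405800, 5021450000, 2871470200)
)

# reorder() builds a factor whose levels follow descending crop damage
top5$EVTYPE_ordered <- reorder(top5$EVTYPE, -top5$CROPDMG)
levels(top5$EVTYPE_ordered)

# Draw the plot only when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top5, aes(x = EVTYPE_ordered, y = CROPDMG / 1e9)) +
    geom_col(fill = "orange") +
    labs(x = "Storm/Weather Events",
         y = "Crop Damage (billions of $)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  print(p)
}
```

Here `geom_col()` is the shorthand for `geom_bar(stat = "identity")`, so the bar heights are still the CROPDMG values themselves.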