Introduction

Growing up in North Carolina, I vividly remember looking out the window at thunderstorms and seeing lightning and pouring rain. As I get older, I don't notice these storms as often. In this project, I will explore the reporting of severe thunderstorms and answer the question: why do I remember more thunderstorms as a child?

I will use storm events data collected by the National Oceanic and Atmospheric Administration (NOAA) to explore how storms have changed over time (https://www.ncdc.noaa.gov/stormevents/ftp.jsp). The data were entered by NOAA's National Weather Service (NWS). One caveat of this data set is that it only covers significant weather phenomena: events that cause loss of life, injuries, significant property damage, and/or disruption to commerce. Milder thunderstorms and rain showers are therefore excluded from this analysis, which lets me focus on the most noticeable storms in each area.
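
As an aside, the yearly details files can be pulled straight from NOAA's bulk CSV directory, which is linked from the page above. This is a minimal sketch only: I am assuming the current NCEI directory layout, and the trailing cYYYYMMDD stamp in each file name varies per file, so the exact name below is hypothetical and should be checked against the directory listing.

# Hypothetical example: download one year's details file.
# The cYYYYMMDD stamp differs per file; check the directory listing first.
base_url <- "https://www.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/"
file_name <- "StormEvents_details-ftp_v1.0_d1999_c20220425.csv.gz"  # assumed name
download.file(paste0(base_url, file_name), destfile = file_name, mode = "wb")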

The key questions to answer include the following:

- Have there been more or fewer storms over time?
- Does the number of storms change significantly across the East Coast?
- What types of storms are being reported in different counties?

Load Libraries

For this project, I will use the tidyverse, plotly, and ggthemes libraries to import, clean, analyze, and plot data.

library(tidyverse)
library(plotly)
library(ggthemes)

Import Data

The NOAA Storm Events data set is very large, split by year and into separate details, fatalities, and locations files. I will focus on the details file for each documented storm event and limit the analysis to 1990 through the present to cover my lifespan. The number of rows in the CSV files grows quickly, so to manage memory during import I grouped the data by decade (1990s, 2000s, 2010s). I also set the column types during import; since all three decades share the same layout, I define the column specification once and reuse it.

# Column specification shared by every details file. Several sparsely
# populated fields (SOURCE, MAGNITUDE_TYPE, FLOOD_CAUSE, CATEGORY,
# TOR_OTHER_*, EVENT_NARRATIVE) are empty in the 1990s files and are
# read as logical.
storm_cols <- cols(
    BEGIN_YEARMONTH = col_double(), BEGIN_DAY = col_double(), BEGIN_TIME = col_double(),
    END_YEARMONTH = col_double(), END_DAY = col_double(), END_TIME = col_double(),
    EPISODE_ID = col_double(), EVENT_ID = col_double(), STATE = col_character(),
    STATE_FIPS = col_double(), YEAR = col_double(), MONTH_NAME = col_character(),
    EVENT_TYPE = col_character(), CZ_TYPE = col_character(), CZ_FIPS = col_double(),
    CZ_NAME = col_character(), WFO = col_character(), BEGIN_DATE_TIME = col_character(),
    CZ_TIMEZONE = col_character(), END_DATE_TIME = col_character(), INJURIES_DIRECT = col_double(),
    INJURIES_INDIRECT = col_double(), DEATHS_DIRECT = col_double(), DEATHS_INDIRECT = col_double(),
    DAMAGE_PROPERTY = col_character(), DAMAGE_CROPS = col_character(), SOURCE = col_logical(),
    MAGNITUDE = col_double(), MAGNITUDE_TYPE = col_logical(), FLOOD_CAUSE = col_logical(),
    CATEGORY = col_logical(), TOR_F_SCALE = col_character(), TOR_LENGTH = col_double(),
    TOR_WIDTH = col_double(), TOR_OTHER_WFO = col_logical(), TOR_OTHER_CZ_STATE = col_logical(),
    TOR_OTHER_CZ_FIPS = col_logical(), TOR_OTHER_CZ_NAME = col_logical(), BEGIN_RANGE = col_double(),
    BEGIN_AZIMUTH = col_character(), BEGIN_LOCATION = col_character(), END_RANGE = col_double(),
    END_AZIMUTH = col_character(), END_LOCATION = col_character(), BEGIN_LAT = col_double(),
    BEGIN_LON = col_double(), END_LAT = col_double(), END_LON = col_double(),
    EPISODE_NARRATIVE = col_character(), EVENT_NARRATIVE = col_logical(), DATA_SOURCE = col_character())

# Read each decade's details files; the dYYYY token in the file names
# identifies the year of record.
tbl_90s <- list.files(pattern = "details.*d199.*\\.csv$") %>%
    map_df(~ read_delim(., delim = ",", col_types = storm_cols, trim_ws = TRUE))
tbl_00s <- list.files(pattern = "details.*d200.*\\.csv$") %>%
    map_df(~ read_delim(., delim = ",", col_types = storm_cols, trim_ws = TRUE))
tbl_10s <- list.files(pattern = "details.*d201.*\\.csv$") %>%
    map_df(~ read_delim(., delim = ",", col_types = storm_cols, trim_ws = TRUE))

Data Cleanup

Before cleaning the data, I used summary() to quickly assess each column and the type of values it contains.

summary(tbl_90s)
##  BEGIN_YEARMONTH    BEGIN_DAY       BEGIN_TIME   END_YEARMONTH   
##  Min.   :199001   Min.   : 1.00   Min.   :   0   Min.   :199001  
##  1st Qu.:199505   1st Qu.: 7.00   1st Qu.: 900   1st Qu.:199505  
##  Median :199701   Median :15.00   Median :1540   Median :199701  
##  Mean   :199614   Mean   :15.11   Mean   :1357   Mean   :199614  
##  3rd Qu.:199806   3rd Qu.:23.00   3rd Qu.:1840   3rd Qu.:199806  
##  Max.   :199912   Max.   :31.00   Max.   :2359   Max.   :199912  
##                                                                  
##     END_DAY         END_TIME      EPISODE_ID         EVENT_ID       
##  Min.   : 1.00   Min.   :   0   Min.   :1000003   Min.   : 5535309  
##  1st Qu.: 8.00   1st Qu.:1200   1st Qu.:2043300   1st Qu.: 5603796  
##  Median :16.00   Median :1630   Median :2068252   Median : 5671697  
##  Mean   :16.09   Mean   :1500   Mean   :2012393   Mean   : 7022320  
##  3rd Qu.:24.00   3rd Qu.:1925   3rd Qu.:2084476   3rd Qu.:10063536  
##  Max.   :31.00   Max.   :2359   Max.   :2414717   Max.   :10358522  
##                                 NA's   :81747                       
##     STATE             STATE_FIPS         YEAR       MONTH_NAME       
##  Length:269655      Min.   : 1.00   Min.   :1990   Length:269655     
##  Class :character   1st Qu.:19.00   1st Qu.:1995   Class :character  
##  Mode  :character   Median :31.00   Median :1997   Mode  :character  
##                     Mean   :30.99   Mean   :1996                     
##                     3rd Qu.:45.00   3rd Qu.:1998                     
##                     Max.   :99.00   Max.   :1999                     
##                                                                      
##   EVENT_TYPE          CZ_TYPE             CZ_FIPS         CZ_NAME         
##  Length:269655      Length:269655      Min.   :  0.00   Length:269655     
##  Class :character   Class :character   1st Qu.: 25.00   Class :character  
##  Mode  :character   Mode  :character   Median : 63.00   Mode  :character  
##                                        Mean   : 86.55                     
##                                        3rd Qu.:115.00                     
##                                        Max.   :840.00                     
##                                                                           
##      WFO            BEGIN_DATE_TIME    CZ_TIMEZONE        END_DATE_TIME     
##  Length:269655      Length:269655      Length:269655      Length:269655     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  INJURIES_DIRECT    INJURIES_INDIRECT DEATHS_DIRECT      DEATHS_INDIRECT
##  Min.   :  0.0000   Min.   :0         Min.   : 0.00000   Min.   :0      
##  1st Qu.:  0.0000   1st Qu.:0         1st Qu.: 0.00000   1st Qu.:0      
##  Median :  0.0000   Median :0         Median : 0.00000   Median :0      
##  Mean   :  0.1176   Mean   :0         Mean   : 0.01167   Mean   :0      
##  3rd Qu.:  0.0000   3rd Qu.:0         3rd Qu.: 0.00000   3rd Qu.:0      
##  Max.   :800.0000   Max.   :0         Max.   :93.00000   Max.   :0      
##                                                                         
##  DAMAGE_PROPERTY    DAMAGE_CROPS        SOURCE          MAGNITUDE      
##  Length:269655      Length:269655      Mode:logical   Min.   :   0.00  
##  Class :character   Class :character   NA's:269655    1st Qu.:   0.75  
##  Mode  :character   Mode  :character                  Median :   1.00  
##                                                       Mean   :  16.18  
##                                                       3rd Qu.:  50.00  
##                                                       Max.   :1000.00  
##                                                       NA's   :111225   
##  MAGNITUDE_TYPE FLOOD_CAUSE    CATEGORY       TOR_F_SCALE       
##  Mode:logical   Mode:logical   Mode:logical   Length:269655     
##  NA's:269655    NA's:269655    NA's:269655    Class :character  
##                                               Mode  :character  
##                                                                 
##                                                                 
##                                                                 
##                                                                 
##    TOR_LENGTH        TOR_WIDTH      TOR_OTHER_WFO  TOR_OTHER_CZ_STATE
##  Min.   :   0.00   Min.   :   0.0   Mode:logical   Mode:logical      
##  1st Qu.:   0.00   1st Qu.:   0.0   NA's:269655    NA's:269655       
##  Median :   0.00   Median :   0.0                                    
##  Mean   :   0.43   Mean   :  13.2                                    
##  3rd Qu.:   0.00   3rd Qu.:   0.0                                    
##  Max.   :2315.00   Max.   :2640.0                                    
##  NA's   :182436    NA's   :182436                                    
##  TOR_OTHER_CZ_FIPS TOR_OTHER_CZ_NAME  BEGIN_RANGE     BEGIN_AZIMUTH     
##  Mode:logical      Mode:logical      Min.   :   0.0   Length:269655     
##  NA's:269655       NA's:269655       1st Qu.:   0.0   Class :character  
##                                      Median :   0.0   Mode  :character  
##                                      Mean   :   2.5                     
##                                      3rd Qu.:   4.0                     
##                                      Max.   :3749.0                     
##                                      NA's   :152280                     
##  BEGIN_LOCATION       END_RANGE      END_AZIMUTH        END_LOCATION      
##  Length:269655      Min.   :  0.00   Length:269655      Length:269655     
##  Class :character   1st Qu.:  0.00   Class :character   Class :character  
##  Mode  :character   Median :  0.00   Mode  :character   Mode  :character  
##                     Mean   :  1.89                                        
##                     3rd Qu.:  2.00                                        
##                     Max.   :925.00                                        
##                     NA's   :152665                                        
##    BEGIN_LAT        BEGIN_LON          END_LAT          END_LON       
##  Min.   :17.70    Min.   :-159.72   Min.   :17.70    Min.   :-159.72  
##  1st Qu.:33.93    1st Qu.: -97.77   1st Qu.:34.18    1st Qu.: -97.75  
##  Median :37.17    Median : -92.58   Median :37.28    Median : -92.40  
##  Mean   :37.48    Mean   : -91.46   Mean   :37.61    Mean   : -91.44  
##  3rd Qu.:40.92    3rd Qu.: -84.35   3rd Qu.:41.03    3rd Qu.: -84.30  
##  Max.   :49.18    Max.   : -12.18   Max.   :49.18    Max.   :  81.51  
##  NA's   :139813   NA's   :139813    NA's   :177144   NA's   :177144   
##  EPISODE_NARRATIVE  EVENT_NARRATIVE DATA_SOURCE       
##  Length:269655      Mode:logical    Length:269655     
##  Class :character   NA's:269655     Class :character  
##  Mode  :character                   Mode  :character  
##                                                       
##                                                       
##                                                       
## 

From this assessment, I decided to keep the columns relating to the year, month, and day (BEGIN_YEARMONTH, BEGIN_DAY, END_YEARMONTH, END_DAY, YEAR, MONTH_NAME). I also kept information about the location of each event, such as county versus marine (CZ_NAME, CZ_TYPE, STATE, BEGIN_LAT, BEGIN_LON). The NOAA EVENT_ID number may be useful for distinguishing individual storms, the EVENT_TYPE column will let me filter down to thunderstorms, and the death and injury counts may serve as a measure of storm severity. Finally, I kept FLOOD_CAUSE as an additional factor to pair with EVENT_TYPE.

I cleaned each decade's data frame separately before combining them into one large data frame. I used the CZ_TYPE column to remove marine storms, leaving only storms in counties and NWS zones, since I have never lived at sea or on the coast to experience marine storm events. I added a column that converts the month name to a numeric value to make plotting easier, and I combined the direct and indirect injury and death columns because I do not plan to distinguish between them. Finally, I renamed CZ_NAME to county_loc_name because that was the only non-obvious column title. Since the same steps apply to every decade, I wrapped them in a helper function.

# Shared cleanup: drop marine events, combine direct/indirect casualties,
# add a numeric month, keep the selected columns, and rename CZ_NAME.
clean_storms <- function(tbl) {
    tbl %>%
        filter(CZ_TYPE != "M") %>%
        mutate(injuries = INJURIES_DIRECT + INJURIES_INDIRECT,
               deaths = DEATHS_DIRECT + DEATHS_INDIRECT,
               month_num = match(MONTH_NAME, month.name)) %>%
        select(BEGIN_YEARMONTH, BEGIN_DAY, END_YEARMONTH, END_DAY, EVENT_ID,
               STATE, YEAR, MONTH_NAME, month_num, EVENT_TYPE, CZ_NAME,
               injuries, deaths, FLOOD_CAUSE, BEGIN_LAT, BEGIN_LON) %>%
        rename(county_loc_name = CZ_NAME)
}

df_90s <- clean_storms(tbl_90s)
df_00s <- clean_storms(tbl_00s)
df_10s <- clean_storms(tbl_10s)

Now that all of the smaller decade data frames have been cleaned and formatted, I can combine them into a final data frame for analysis.

df_full <- bind_rows(df_90s, df_00s, df_10s)
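
As a quick sanity check (a minimal sketch), I can confirm that the combined frame spans the intended years and that no rows were lost in the bind:

# Verify the year range and that the row counts add up across decades.
range(df_full$YEAR)
nrow(df_full) == nrow(df_90s) + nrow(df_00s) + nrow(df_10s)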

Statistical Analysis

I wanted to first analyze the frequency of storm events over time and, out of curiosity, for each month. This was accomplished using histograms.

s1 <- ggplot(df_full, aes(x = YEAR)) +
    geom_histogram(bins = 30, col = "black") +
    ggtitle("Number of Storm Events per Year") + 
    ylab("Total Number of Storm Events") +
    xlab("Year") + 
    theme_fivethirtyeight() + 
    theme(axis.title = element_text())

s2 <- ggplot(df_full, aes(x = month_num)) +
    geom_histogram(bins = 12, col = "black") +
    theme_fivethirtyeight() +
    ggtitle("Number of Storm Events per Month") + 
    ylab("Total Number of Storm Events") +
    xlab("Month") +
    scale_x_continuous(breaks = c(3, 6, 9)) +
    theme(axis.title = element_text())

s1

s2

The histograms show a dramatic increase in storm events in the late 1990s that stays consistent through 2019. It is unclear whether this is due to an increase in data reporting and collection or whether global warming has changed severe storm patterns. The most storms occur during the summer months, which is expected given hurricane season and warmer air.
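
To put a number on that jump, a simple sketch counting events per year shows the size of the late-1990s step directly:

# Events per year with percent change; a sharp step between adjacent
# years points to a reporting change rather than a gradual climate signal.
df_full %>%
    count(YEAR) %>%
    arrange(YEAR) %>%
    mutate(pct_change = 100 * (n - lag(n)) / lag(n))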

Exploratory Analysis

To understand how best to visualize the data and address how thunderstorms have changed over time, I need to look at how the data are broken up. First I looked at the states included and noticed the list is broader than the 50 states, including regions like Lake Superior and the Virgin Islands. I decided to limit the analysis to the East Coast states because those are the only areas I've lived in.

occurrences <- table(df_full$STATE)
occurrences
## 
##              ALABAMA               ALASKA       AMERICAN SAMOA 
##                29754                 7707                  401 
##              ARIZONA             ARKANSAS       ATLANTIC NORTH 
##                10564                35468                 7962 
##       ATLANTIC SOUTH           CALIFORNIA             COLORADO 
##                 4996                25640                36424 
##          CONNECTICUT             DELAWARE DISTRICT OF COLUMBIA 
##                 5322                 3875                  748 
##            E PACIFIC              FLORIDA              GEORGIA 
##                  158                28733                41877 
##                 GUAM       GULF OF ALASKA       GULF OF MEXICO 
##                  433                   21                 8784 
##               HAWAII        HAWAII WATERS                IDAHO 
##                12774                   24                 8073 
##             ILLINOIS              INDIANA                 IOWA 
##                46209                31031                52325 
##               KANSAS             KENTUCKY            LAKE ERIE 
##                69525                41257                  485 
##           LAKE HURON        LAKE MICHIGAN         LAKE ONTARIO 
##                  485                 1645                  155 
##        LAKE ST CLAIR        LAKE SUPERIOR            LOUISIANA 
##                  316                  787                20608 
##                MAINE             MARYLAND        MASSACHUSETTS 
##                12342                18560                12626 
##             MICHIGAN            MINNESOTA          MISSISSIPPI 
##                27133                40088                28154 
##             MISSOURI              MONTANA             NEBRASKA 
##                52835                25541                44334 
##               NEVADA        NEW HAMPSHIRE           NEW JERSEY 
##                 5355                 6584                21369 
##           NEW MEXICO             NEW YORK       NORTH CAROLINA 
##                16180                40613                41293 
##         NORTH DAKOTA                 OHIO             OKLAHOMA 
##                23545                37903                54827 
##               OREGON         PENNSYLVANIA          PUERTO RICO 
##                 9161                38686                 4333 
##         RHODE ISLAND       SOUTH CAROLINA         SOUTH DAKOTA 
##                 2020                24570                37781 
##        ST LAWRENCE R            TENNESSEE                TEXAS 
##                   10                33107               110194 
##                 UTAH              VERMONT       VIRGIN ISLANDS 
##                 8492                 8368                  355 
##             VIRGINIA           WASHINGTON        WEST VIRGINIA 
##                42635                 7691                20394 
##            WISCONSIN              WYOMING 
##                33921                18650
east_coast <- c("FLORIDA", "GEORGIA", "SOUTH CAROLINA", "NORTH CAROLINA",
                "VIRGINIA", "MARYLAND", "DELAWARE", "NEW JERSEY", "NEW YORK",
                "CONNECTICUT", "RHODE ISLAND", "MASSACHUSETTS",
                "NEW HAMPSHIRE", "MAINE")
df_east <- df_full %>%
    filter(STATE %in% east_coast) 

p1 <- df_east %>%
    ggplot(aes(x = STATE)) +
    geom_bar(stat = "count") + 
    coord_flip() +
    ggtitle("Number of Storm Events per East Coast State") + 
    xlab("East Coast State") +
    ylab("Number of Storm Events") + 
    theme_fivethirtyeight() + 
    theme(axis.title = element_text())
p1

This reveals that North Carolina has experienced more storms than any other East Coast state, which suggests I did grow up in a region where I would experience many storms. Additionally, Maryland has experienced less than half the number of storm events of North Carolina. This suggests I moved to a region with fewer storms, decreasing my likelihood of noticing a severe storm event.
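
The underlying counts back this up; this sketch computes each state's total relative to North Carolina's:

# Event totals by state, expressed as a fraction of North Carolina's count.
df_east %>%
    count(STATE, sort = TRUE) %>%
    mutate(ratio_to_nc = n / n[STATE == "NORTH CAROLINA"])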

Next, I looked at how the events are classified by the NOAA NWS. I converted the event types to factors, listed the levels, and picked out those associated with thunderstorms.

events <- as.factor(df_east$EVENT_TYPE)
levels(events)
##  [1] "Astronomical Low Tide"         "Avalanche"                    
##  [3] "Blizzard"                      "Coastal Flood"                
##  [5] "Cold/Wind Chill"               "Debris Flow"                  
##  [7] "Dense Fog"                     "Dense Smoke"                  
##  [9] "Drought"                       "Dust Devil"                   
## [11] "Dust Storm"                    "Excessive Heat"               
## [13] "Extreme Cold/Wind Chill"       "Flash Flood"                  
## [15] "Flood"                         "Freezing Fog"                 
## [17] "Frost/Freeze"                  "Funnel Cloud"                 
## [19] "Hail"                          "Heat"                         
## [21] "Heavy Rain"                    "Heavy Snow"                   
## [23] "High Surf"                     "High Wind"                    
## [25] "Hurricane"                     "Hurricane (Typhoon)"          
## [27] "Ice Storm"                     "Lake-Effect Snow"             
## [29] "Lakeshore Flood"               "Landslide"                    
## [31] "Lightning"                     "Rip Current"                  
## [33] "Seiche"                        "Sleet"                        
## [35] "Storm Surge/Tide"              "Strong Wind"                  
## [37] "Thunderstorm Wind"             "THUNDERSTORM WIND/ TREE"      
## [39] "THUNDERSTORM WIND/ TREES"      "THUNDERSTORM WINDS HEAVY RAIN"
## [41] "THUNDERSTORM WINDS LIGHTNING"  "THUNDERSTORM WINDS/FLOODING"  
## [43] "Tornado"                       "TORNADO/WATERSPOUT"           
## [45] "TORNADOES, TSTM WIND, HAIL"    "Tropical Depression"          
## [47] "Tropical Storm"                "Tsunami"                      
## [49] "Waterspout"                    "Wildfire"                     
## [51] "Winter Storm"                  "Winter Weather"
thunderstorms <- c("Flash Flood", "Heavy Rain", "Lightning", "Thunderstorm Wind", "THUNDERSTORM WIND/ TREE", 
                   "THUNDERSTORM WIND/ TREES", "THUNDERSTORM WINDS HEAVY RAIN", "THUNDERSTORM WINDS LIGHTNING",
                   "THUNDERSTORM WINDS/FLOODING", "Tropical Storm")

p2 <- df_east %>%
    filter(EVENT_TYPE %in% thunderstorms) %>%
    ggplot(aes(x = EVENT_TYPE)) +
    geom_bar(stat = "count") + 
    coord_flip() + 
    ggtitle("Storm Event Types") + 
    ylab("Total Count") +
    xlab("Thunderstorm Related Event") + 
    theme_fivethirtyeight() + 
    theme(axis.title = element_text())
p2

A few of the combination event types do not seem relevant given their very low number of occurrences, as the tally below shows.
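
# Count each thunderstorm-related event type in the East Coast subset;
# the all-caps legacy combination types appear only a handful of times.
df_east %>%
    filter(EVENT_TYPE %in% thunderstorms) %>%
    count(EVENT_TYPE, sort = TRUE)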

Has the number of severe storm events changed during my lifetime based on location?

To further address my research question, I want to look at the two states I've spent the most time in, North Carolina and Maryland, and compare the number of storm events in each state per year using a line chart (year versus number of storms).

df_east_thunder <- df_east %>%
    filter(EVENT_TYPE %in% thunderstorms)

fig1 <- df_east_thunder %>%
    filter(STATE == "MARYLAND" | STATE == "NORTH CAROLINA") %>%
    group_by(STATE, YEAR) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = YEAR, y = n, color = STATE)) +
    geom_line() + 
    ggtitle("Storm Event in NC and MD") + 
    ylab("Number of Thunderstorm Events") +
    xlab("Year") + 
    theme_fivethirtyeight() + 
    theme(axis.title = element_text())

fig1

This plot shows that the upward trend in thunderstorm events over time is consistent between North Carolina and Maryland, even though North Carolina continues to have more storm events. This suggests that moving to Maryland reduced my exposure to thunderstorms.

Next, I want to explore the types of storm events reported in North Carolina and Maryland over the years. I also narrowed the thunderstorm classification to the event types with a substantial number of records.

fig2 <- df_east_thunder %>%
    filter(STATE %in% c("NORTH CAROLINA", "MARYLAND")) %>%
    filter(EVENT_TYPE %in% c("Flash Flood", "Heavy Rain", "Lightning", "Thunderstorm Wind", "Tropical Storm")) %>%
    group_by(STATE, EVENT_TYPE, YEAR, .drop = FALSE) %>%
    summarise(n = n()) %>%
    plot_ly(
        x = ~EVENT_TYPE, 
        y = ~n, 
        color = ~STATE, 
        frame = ~YEAR, 
        text = ~STATE, 
        type = 'bar'
  )

fig2 <- fig2 %>% layout(title = "Different Thunderstorm Events over Time",
         yaxis = list(title = "Number of Thunderstorm Events"),
         legend = list(title="State"))

fig2

This plot allows for a more granular analysis and shows that North Carolina consistently experiences more of each thunderstorm event type over time, with Thunderstorm Wind and Flash Flood being the most common. Another question that arises from this plot is why more event types were included later and how that affects our understanding of the older data. Was lightning not reported in the early 1990s, or was it grouped with another event type?
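
One way to probe this reporting question is to find the first year each event type appears in the East Coast subset. A minimal sketch:

# First and last year each event type was reported; types that only
# appear after the mid-1990s suggest a change in reporting practice.
df_east_thunder %>%
    group_by(EVENT_TYPE) %>%
    summarise(first_year = min(YEAR), last_year = max(YEAR), n = n())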

For the final analysis, I made a new data frame with county-level data. I focused on the four counties I've spent the most time in: Guilford, Orange, and Durham Counties in North Carolina and Montgomery County in Maryland.

# Restrict each county name to its state; & binds tighter than |, so
# explicit parentheses keep the intent clear.
df_counties <- df_east_thunder %>%
    filter((county_loc_name %in% c("GUILFORD", "ORANGE", "DURHAM") &
            STATE == "NORTH CAROLINA") |
           (county_loc_name == "MONTGOMERY" & STATE == "MARYLAND"))

I then made a plot from this county data frame to see whether county-level location can also explain my impression that I've seen fewer thunderstorms as I've gotten older.

fig3 <- df_counties %>%
    group_by(county_loc_name, YEAR) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = YEAR, y = n, color = county_loc_name)) +
    geom_line() +
    scale_x_continuous(breaks = c(1990, 2000, 2010)) +
    facet_wrap(~county_loc_name, ncol = 2) +
    ggtitle("Storm Events in Different Counties") + 
    ylab("Number of Thunderstorm Related Events") +
    xlab("Year") + 
    theme_fivethirtyeight() + 
    theme(legend.position = "none", axis.title = element_text())

fig3

This analysis shows that Montgomery County, the last place I moved to, actually has the highest number of events after 2017, and that Guilford County, the first place I lived, had a similar number of storms to the other locations in the 1990s.
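
The post-2017 comparison can be checked directly from the county data frame built above (a minimal sketch):

# Thunderstorm-related events per county since 2017.
df_counties %>%
    filter(YEAR >= 2017) %>%
    count(county_loc_name, sort = TRUE)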

Conclusion

The data showed that North Carolina has more severe thunderstorms than Maryland, suggesting that my move to Maryland could explain the decrease in storms I noticed. This is supported by weather historians, who show that North Carolina has had more thunderstorms than Maryland (https://www.wunderground.com/blog/weatherhistorian/thunderstorms-the-stormiest-places-in-the-usa-and-the-world.html). The county-level data, however, showed that I moved to a county with a higher number of severe storms in recent years, suggesting that I should be noticing more storms outside my window. My best explanation is a busier schedule as an adult and being distracted by work.