Growing up in North Carolina, I vividly remember looking out the window at thunderstorms and seeing lightning and pouring rain. As I have gotten older, I don’t notice these storms as often. In this project, I will explore reporting of severe thunderstorms and answer the question: why do I remember more thunderstorms from my childhood?
I will use storm events data collected by the National Oceanic and Atmospheric Administration (NOAA) to explore how storms have changed over time (https://www.ncdc.noaa.gov/stormevents/ftp.jsp). This data was entered by NOAA’s National Weather Service (NWS). One caveat of this data set is that it only covers significant weather phenomena, such as events that cause loss of life, injuries, significant property damage, and/or disruption to commerce. This means that milder thunderstorms and rain showers will not be included in this analysis, which lets me focus on the most noticeable storms in different areas.
The key questions to answer include the following:
- Have there been more or fewer storms over time?
- Does the number of storms change significantly across the East Coast?
- What types of storms are being reported in different counties?
For this project, I will use the tidyverse, plotly, and ggthemes libraries to import, clean, analyze, and plot data.
library(tidyverse)
library(plotly)
library(ggthemes)
The NOAA Storm Events data set is very large and is split by year into details, fatalities, and locations files. I will focus on the details files for each documented storm event and limit the analysis to 1990 through the present to cover my lifespan. The number of rows in the provided CSV files grows quickly, so to manage memory during import I grouped the data by decade (1990s, 2000s, 2010s). I also set the column types during import.
tbl_90s <- list.files(pattern = "*.*details.*d199.*\\.csv$") %>%
map_df(~ read_delim(., delim = ",", col_types = cols(
BEGIN_YEARMONTH = col_double(), BEGIN_DAY = col_double(), BEGIN_TIME = col_double(),
END_YEARMONTH = col_double(), END_DAY = col_double(), END_TIME = col_double(),
EPISODE_ID = col_double(), EVENT_ID = col_double(), STATE = col_character(),
STATE_FIPS = col_double(), YEAR = col_double(), MONTH_NAME = col_character(),
EVENT_TYPE = col_character(), CZ_TYPE = col_character(), CZ_FIPS = col_double(),
CZ_NAME = col_character(), WFO = col_character(), BEGIN_DATE_TIME = col_character(),
CZ_TIMEZONE = col_character(), END_DATE_TIME = col_character(), INJURIES_DIRECT = col_double(),
INJURIES_INDIRECT = col_double(), DEATHS_DIRECT = col_double(), DEATHS_INDIRECT = col_double(),
DAMAGE_PROPERTY = col_character(), DAMAGE_CROPS = col_character(), SOURCE = col_logical(),
MAGNITUDE = col_double(), MAGNITUDE_TYPE = col_logical(), FLOOD_CAUSE = col_logical(),
CATEGORY = col_logical(), TOR_F_SCALE = col_character(), TOR_LENGTH = col_double(),
TOR_WIDTH = col_double(), TOR_OTHER_WFO = col_logical(), TOR_OTHER_CZ_STATE = col_logical(),
TOR_OTHER_CZ_FIPS = col_logical(), TOR_OTHER_CZ_NAME = col_logical(), BEGIN_RANGE = col_double(),
BEGIN_AZIMUTH = col_character(), BEGIN_LOCATION = col_character(), END_RANGE = col_double(),
END_AZIMUTH = col_character(), END_LOCATION = col_character(), BEGIN_LAT = col_double(),
BEGIN_LON = col_double(), END_LAT = col_double(), END_LON = col_double(),
EPISODE_NARRATIVE = col_character(), EVENT_NARRATIVE = col_logical(), DATA_SOURCE = col_character()),
trim_ws = TRUE))
tbl_00s <- list.files(pattern = "*.*details.*d200.*\\.csv$") %>%
map_df(~ read_delim(., delim = ",", col_types = cols(
BEGIN_YEARMONTH = col_double(), BEGIN_DAY = col_double(), BEGIN_TIME = col_double(),
END_YEARMONTH = col_double(), END_DAY = col_double(), END_TIME = col_double(),
EPISODE_ID = col_double(), EVENT_ID = col_double(), STATE = col_character(),
STATE_FIPS = col_double(), YEAR = col_double(), MONTH_NAME = col_character(),
EVENT_TYPE = col_character(), CZ_TYPE = col_character(), CZ_FIPS = col_double(),
CZ_NAME = col_character(), WFO = col_character(), BEGIN_DATE_TIME = col_character(),
CZ_TIMEZONE = col_character(), END_DATE_TIME = col_character(), INJURIES_DIRECT = col_double(),
INJURIES_INDIRECT = col_double(), DEATHS_DIRECT = col_double(), DEATHS_INDIRECT = col_double(),
DAMAGE_PROPERTY = col_character(), DAMAGE_CROPS = col_character(), SOURCE = col_character(),
MAGNITUDE = col_double(), MAGNITUDE_TYPE = col_character(), FLOOD_CAUSE = col_character(),
CATEGORY = col_logical(), TOR_F_SCALE = col_character(), TOR_LENGTH = col_double(),
TOR_WIDTH = col_double(), TOR_OTHER_WFO = col_logical(), TOR_OTHER_CZ_STATE = col_logical(),
TOR_OTHER_CZ_FIPS = col_logical(), TOR_OTHER_CZ_NAME = col_logical(), BEGIN_RANGE = col_double(),
BEGIN_AZIMUTH = col_character(), BEGIN_LOCATION = col_character(), END_RANGE = col_double(),
END_AZIMUTH = col_character(), END_LOCATION = col_character(), BEGIN_LAT = col_double(),
BEGIN_LON = col_double(), END_LAT = col_double(), END_LON = col_double(),
EPISODE_NARRATIVE = col_character(), EVENT_NARRATIVE = col_logical(), DATA_SOURCE = col_character()),
trim_ws = TRUE))
tbl_10s <- list.files(pattern = "*.*details.*d201.*\\.csv$") %>%
map_df(~ read_delim(., delim = ",", col_types = cols(
BEGIN_YEARMONTH = col_double(), BEGIN_DAY = col_double(), BEGIN_TIME = col_double(),
END_YEARMONTH = col_double(), END_DAY = col_double(), END_TIME = col_double(),
EPISODE_ID = col_double(), EVENT_ID = col_double(), STATE = col_character(),
STATE_FIPS = col_double(), YEAR = col_double(), MONTH_NAME = col_character(),
EVENT_TYPE = col_character(), CZ_TYPE = col_character(), CZ_FIPS = col_double(),
CZ_NAME = col_character(), WFO = col_character(), BEGIN_DATE_TIME = col_character(),
CZ_TIMEZONE = col_character(), END_DATE_TIME = col_character(), INJURIES_DIRECT = col_double(),
INJURIES_INDIRECT = col_double(), DEATHS_DIRECT = col_double(), DEATHS_INDIRECT = col_double(),
DAMAGE_PROPERTY = col_character(), DAMAGE_CROPS = col_character(), SOURCE = col_character(),
MAGNITUDE = col_double(), MAGNITUDE_TYPE = col_character(), FLOOD_CAUSE = col_character(),
CATEGORY = col_logical(), TOR_F_SCALE = col_character(), TOR_LENGTH = col_double(),
TOR_WIDTH = col_double(), TOR_OTHER_WFO = col_logical(), TOR_OTHER_CZ_STATE = col_logical(),
TOR_OTHER_CZ_FIPS = col_logical(), TOR_OTHER_CZ_NAME = col_logical(), BEGIN_RANGE = col_double(),
BEGIN_AZIMUTH = col_character(), BEGIN_LOCATION = col_character(), END_RANGE = col_double(),
END_AZIMUTH = col_character(), END_LOCATION = col_character(), BEGIN_LAT = col_double(),
BEGIN_LON = col_double(), END_LAT = col_double(), END_LON = col_double(),
EPISODE_NARRATIVE = col_character(), EVENT_NARRATIVE = col_logical(), DATA_SOURCE = col_character()),
trim_ws = TRUE))
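The three import blocks above differ only in the file-name pattern, so the column specification could be defined once and reused. The following is a minimal sketch of that idea, not the code used for this analysis: `storm_cols`, `read_decade`, and the demo file names are hypothetical, and only three columns are typed explicitly (everything else falls back to character).

```r
library(readr)
library(purrr)

# Hypothetical shared column spec; only a few columns are typed for brevity,
# and every other column is read as character.
storm_cols <- cols(
  YEAR = col_double(),
  STATE = col_character(),
  EVENT_TYPE = col_character(),
  .default = col_character()
)

# Hypothetical helper: read all details files whose year starts with
# `decade_prefix` (for example "199") from `dir` and stack them.
read_decade <- function(decade_prefix, dir = ".") {
  files <- list.files(dir,
                      pattern = paste0("details.*d", decade_prefix, ".*\\.csv$"),
                      full.names = TRUE)
  map_df(files, ~ read_csv(.x, col_types = storm_cols, trim_ws = TRUE))
}

# Demo with two tiny stand-in files written to a temporary directory
demo_dir <- tempfile("storms")
dir.create(demo_dir)
writeLines(c("YEAR,STATE,EVENT_TYPE", "1990,MARYLAND,Hail"),
           file.path(demo_dir, "StormEvents_details_d1990_demo.csv"))
writeLines(c("YEAR,STATE,EVENT_TYPE", "1991,NORTH CAROLINA,Tornado"),
           file.path(demo_dir, "StormEvents_details_d1991_demo.csv"))

demo_90s <- read_decade("199", dir = demo_dir)
demo_90s
```

With a helper like this, each decade would be a single call such as `read_decade("200")`, and a change to a column type would only need to be made in one place.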
Before cleaning the data, I used a summary to quickly assess each of the columns and what type of values they contain.
summary(tbl_90s)
## BEGIN_YEARMONTH BEGIN_DAY BEGIN_TIME END_YEARMONTH
## Min. :199001 Min. : 1.00 Min. : 0 Min. :199001
## 1st Qu.:199505 1st Qu.: 7.00 1st Qu.: 900 1st Qu.:199505
## Median :199701 Median :15.00 Median :1540 Median :199701
## Mean :199614 Mean :15.11 Mean :1357 Mean :199614
## 3rd Qu.:199806 3rd Qu.:23.00 3rd Qu.:1840 3rd Qu.:199806
## Max. :199912 Max. :31.00 Max. :2359 Max. :199912
##
## END_DAY END_TIME EPISODE_ID EVENT_ID
## Min. : 1.00 Min. : 0 Min. :1000003 Min. : 5535309
## 1st Qu.: 8.00 1st Qu.:1200 1st Qu.:2043300 1st Qu.: 5603796
## Median :16.00 Median :1630 Median :2068252 Median : 5671697
## Mean :16.09 Mean :1500 Mean :2012393 Mean : 7022320
## 3rd Qu.:24.00 3rd Qu.:1925 3rd Qu.:2084476 3rd Qu.:10063536
## Max. :31.00 Max. :2359 Max. :2414717 Max. :10358522
## NA's :81747
## STATE STATE_FIPS YEAR MONTH_NAME
## Length:269655 Min. : 1.00 Min. :1990 Length:269655
## Class :character 1st Qu.:19.00 1st Qu.:1995 Class :character
## Mode :character Median :31.00 Median :1997 Mode :character
## Mean :30.99 Mean :1996
## 3rd Qu.:45.00 3rd Qu.:1998
## Max. :99.00 Max. :1999
##
## EVENT_TYPE CZ_TYPE CZ_FIPS CZ_NAME
## Length:269655 Length:269655 Min. : 0.00 Length:269655
## Class :character Class :character 1st Qu.: 25.00 Class :character
## Mode :character Mode :character Median : 63.00 Mode :character
## Mean : 86.55
## 3rd Qu.:115.00
## Max. :840.00
##
## WFO BEGIN_DATE_TIME CZ_TIMEZONE END_DATE_TIME
## Length:269655 Length:269655 Length:269655 Length:269655
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## INJURIES_DIRECT INJURIES_INDIRECT DEATHS_DIRECT DEATHS_INDIRECT
## Min. : 0.0000 Min. :0 Min. : 0.00000 Min. :0
## 1st Qu.: 0.0000 1st Qu.:0 1st Qu.: 0.00000 1st Qu.:0
## Median : 0.0000 Median :0 Median : 0.00000 Median :0
## Mean : 0.1176 Mean :0 Mean : 0.01167 Mean :0
## 3rd Qu.: 0.0000 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.:0
## Max. :800.0000 Max. :0 Max. :93.00000 Max. :0
##
## DAMAGE_PROPERTY DAMAGE_CROPS SOURCE MAGNITUDE
## Length:269655 Length:269655 Mode:logical Min. : 0.00
## Class :character Class :character NA's:269655 1st Qu.: 0.75
## Mode :character Mode :character Median : 1.00
## Mean : 16.18
## 3rd Qu.: 50.00
## Max. :1000.00
## NA's :111225
## MAGNITUDE_TYPE FLOOD_CAUSE CATEGORY TOR_F_SCALE
## Mode:logical Mode:logical Mode:logical Length:269655
## NA's:269655 NA's:269655 NA's:269655 Class :character
## Mode :character
##
##
##
##
## TOR_LENGTH TOR_WIDTH TOR_OTHER_WFO TOR_OTHER_CZ_STATE
## Min. : 0.00 Min. : 0.0 Mode:logical Mode:logical
## 1st Qu.: 0.00 1st Qu.: 0.0 NA's:269655 NA's:269655
## Median : 0.00 Median : 0.0
## Mean : 0.43 Mean : 13.2
## 3rd Qu.: 0.00 3rd Qu.: 0.0
## Max. :2315.00 Max. :2640.0
## NA's :182436 NA's :182436
## TOR_OTHER_CZ_FIPS TOR_OTHER_CZ_NAME BEGIN_RANGE BEGIN_AZIMUTH
## Mode:logical Mode:logical Min. : 0.0 Length:269655
## NA's:269655 NA's:269655 1st Qu.: 0.0 Class :character
## Median : 0.0 Mode :character
## Mean : 2.5
## 3rd Qu.: 4.0
## Max. :3749.0
## NA's :152280
## BEGIN_LOCATION END_RANGE END_AZIMUTH END_LOCATION
## Length:269655 Min. : 0.00 Length:269655 Length:269655
## Class :character 1st Qu.: 0.00 Class :character Class :character
## Mode :character Median : 0.00 Mode :character Mode :character
## Mean : 1.89
## 3rd Qu.: 2.00
## Max. :925.00
## NA's :152665
## BEGIN_LAT BEGIN_LON END_LAT END_LON
## Min. :17.70 Min. :-159.72 Min. :17.70 Min. :-159.72
## 1st Qu.:33.93 1st Qu.: -97.77 1st Qu.:34.18 1st Qu.: -97.75
## Median :37.17 Median : -92.58 Median :37.28 Median : -92.40
## Mean :37.48 Mean : -91.46 Mean :37.61 Mean : -91.44
## 3rd Qu.:40.92 3rd Qu.: -84.35 3rd Qu.:41.03 3rd Qu.: -84.30
## Max. :49.18 Max. : -12.18 Max. :49.18 Max. : 81.51
## NA's :139813 NA's :139813 NA's :177144 NA's :177144
## EPISODE_NARRATIVE EVENT_NARRATIVE DATA_SOURCE
## Length:269655 Mode:logical Length:269655
## Class :character NA's:269655 Class :character
## Mode :character Mode :character
##
##
##
##
From this assessment, I decided to keep the columns relating to the year, month, and day (BEGIN_YEARMONTH, BEGIN_DAY, END_YEARMONTH, END_DAY, YEAR, MONTH_NAME). I also kept information about the location of each event, such as county versus marine (CZ_NAME, CZ_TYPE, STATE, BEGIN_LAT, BEGIN_LON). I decided the NOAA EVENT_ID number might be useful for distinguishing different storms. The EVENT_TYPE column will allow me to filter down to thunderstorms, and the information regarding deaths and injuries might be a good measure of storm severity. The final column I kept was FLOOD_CAUSE, as an additional factor to pair with EVENT_TYPE.
I kept the data frames separated by decade because I didn’t want to combine them into a giant data frame yet. I also used the CZ_TYPE column to remove marine storms and leave only storms in counties and NWS zones. This is because I never lived at sea or on the coast to experience marine storm events. I added a column that changed the month name to a numeric value to make plotting easier. I also combined the injury and death columns because I do not plan to distinguish between direct and indirect events due to the storms. Finally, I renamed CZ_NAME to county_loc_name because that was the only non-obvious column title.
df_90s <- tbl_90s %>%
filter(CZ_TYPE != "M") %>%
mutate(injuries = INJURIES_DIRECT + INJURIES_INDIRECT,
deaths = DEATHS_DIRECT + DEATHS_INDIRECT,
month_num = match(MONTH_NAME, month.name)) %>%
select(BEGIN_YEARMONTH, BEGIN_DAY, END_YEARMONTH, END_DAY, EVENT_ID,
STATE, YEAR, MONTH_NAME, month_num, EVENT_TYPE, CZ_NAME,
injuries, deaths, FLOOD_CAUSE, BEGIN_LAT, BEGIN_LON) %>%
rename(county_loc_name = CZ_NAME)
df_00s <- tbl_00s %>%
filter(CZ_TYPE != "M") %>%
mutate(injuries = INJURIES_DIRECT + INJURIES_INDIRECT,
deaths = DEATHS_DIRECT + DEATHS_INDIRECT,
month_num = match(MONTH_NAME, month.name)) %>%
select(BEGIN_YEARMONTH, BEGIN_DAY, END_YEARMONTH, END_DAY, EVENT_ID,
STATE, YEAR, MONTH_NAME, month_num, EVENT_TYPE, CZ_NAME,
injuries, deaths, FLOOD_CAUSE, BEGIN_LAT, BEGIN_LON) %>%
rename(county_loc_name = CZ_NAME)
df_10s <- tbl_10s %>%
filter(CZ_TYPE != "M") %>%
mutate(injuries = INJURIES_DIRECT + INJURIES_INDIRECT,
deaths = DEATHS_DIRECT + DEATHS_INDIRECT,
month_num = match(MONTH_NAME, month.name)) %>%
select(BEGIN_YEARMONTH, BEGIN_DAY, END_YEARMONTH, END_DAY, EVENT_ID,
STATE, YEAR, MONTH_NAME, month_num, EVENT_TYPE, CZ_NAME,
injuries, deaths, FLOOD_CAUSE, BEGIN_LAT, BEGIN_LON) %>%
rename(county_loc_name = CZ_NAME)
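Because the same cleaning steps are applied to each decade, they could also be wrapped in a function. This is a sketch under assumptions: `clean_decade` is a hypothetical name, the `select()` step is left out for brevity, and the toy rows are made up to show what the filter and the month lookup do.

```r
library(dplyr)

# Hypothetical helper wrapping the cleaning steps described above
# (the select() of kept columns is omitted here for brevity)
clean_decade <- function(tbl) {
  tbl %>%
    filter(CZ_TYPE != "M") %>%                       # drop marine events
    mutate(injuries = INJURIES_DIRECT + INJURIES_INDIRECT,
           deaths = DEATHS_DIRECT + DEATHS_INDIRECT,
           month_num = match(MONTH_NAME, month.name)) %>%  # "June" -> 6
    rename(county_loc_name = CZ_NAME)
}

# Made-up rows for illustration only (not NOAA values)
toy <- tibble(
  CZ_TYPE = c("C", "M", "Z"),
  CZ_NAME = c("GUILFORD", "COASTAL WATERS", "ORANGE"),
  MONTH_NAME = c("June", "July", "December"),
  INJURIES_DIRECT = c(1, 0, 0), INJURIES_INDIRECT = c(2, 0, 0),
  DEATHS_DIRECT = c(0, 0, 1), DEATHS_INDIRECT = c(0, 0, 0)
)

toy_clean <- clean_decade(toy)
toy_clean
```

The marine row is removed, and `match()` against the built-in `month.name` vector returns the month number for the remaining rows.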
Now that all of the smaller decade data frames have been cleaned and formatted, I can combine them into a final data frame to analyze.
df_full <- rbind(df_90s, df_00s, df_10s)
I wanted to first analyze the frequency of storm events over time and, out of curiosity, for each month. This was accomplished using histograms.
s1 <- ggplot(df_full, aes(x = YEAR)) +
geom_histogram(bins = 30, col = "black") +
ggtitle("Number of Storm Events per Year") +
ylab("Total Number of Storm Events") +
xlab("Year") +
theme_fivethirtyeight() +
theme(axis.title = element_text())
s2 <- ggplot(df_full, aes(x = month_num)) +
geom_histogram(bins = 12, col = "black") +
theme_fivethirtyeight() +
ggtitle("Number of Storm Events per Month") +
ylab("Total Number of Storm Events") +
xlab("Month") +
scale_x_continuous(breaks = c(3, 6, 9)) +
theme(axis.title = element_text())
s1
s2
The histograms show a dramatic increase in storm events in the late 1990s that stays consistent through 2019. It is unclear whether this is due to an increase in data reporting and collection or whether global warming has changed severe storm patterns. Most storms occur during the summer months, which is expected given hurricane season and warmer air.
To understand how to best visualize the data and address how thunderstorms have changed over time, I need to look at how the data is broken up. First I looked at the states included and noticed that it was broader than the 50 states, including regions like Lake Superior or the Virgin Islands. I decided to limit the analysis to the East Coast states of the US because those are the only areas I’ve lived in.
occurrences <- table(df_full$STATE)
occurrences
##
## ALABAMA ALASKA AMERICAN SAMOA
## 29754 7707 401
## ARIZONA ARKANSAS ATLANTIC NORTH
## 10564 35468 7962
## ATLANTIC SOUTH CALIFORNIA COLORADO
## 4996 25640 36424
## CONNECTICUT DELAWARE DISTRICT OF COLUMBIA
## 5322 3875 748
## E PACIFIC FLORIDA GEORGIA
## 158 28733 41877
## GUAM GULF OF ALASKA GULF OF MEXICO
## 433 21 8784
## HAWAII HAWAII WATERS IDAHO
## 12774 24 8073
## ILLINOIS INDIANA IOWA
## 46209 31031 52325
## KANSAS KENTUCKY LAKE ERIE
## 69525 41257 485
## LAKE HURON LAKE MICHIGAN LAKE ONTARIO
## 485 1645 155
## LAKE ST CLAIR LAKE SUPERIOR LOUISIANA
## 316 787 20608
## MAINE MARYLAND MASSACHUSETTS
## 12342 18560 12626
## MICHIGAN MINNESOTA MISSISSIPPI
## 27133 40088 28154
## MISSOURI MONTANA NEBRASKA
## 52835 25541 44334
## NEVADA NEW HAMPSHIRE NEW JERSEY
## 5355 6584 21369
## NEW MEXICO NEW YORK NORTH CAROLINA
## 16180 40613 41293
## NORTH DAKOTA OHIO OKLAHOMA
## 23545 37903 54827
## OREGON PENNSYLVANIA PUERTO RICO
## 9161 38686 4333
## RHODE ISLAND SOUTH CAROLINA SOUTH DAKOTA
## 2020 24570 37781
## ST LAWRENCE R TENNESSEE TEXAS
## 10 33107 110194
## UTAH VERMONT VIRGIN ISLANDS
## 8492 8368 355
## VIRGINIA WASHINGTON WEST VIRGINIA
## 42635 7691 20394
## WISCONSIN WYOMING
## 33921 18650
east_coast <- c("FLORIDA", "GEORGIA", "SOUTH CAROLINA", "NORTH CAROLINA", "VIRGINIA", "MARYLAND", "DELAWARE", "NEW JERSEY", "NEW YORK", "CONNECTICUT", "RHODE ISLAND", "MASSACHUSETTS", "NEW HAMPSHIRE", "MAINE")
df_east <- df_full %>%
filter(STATE %in% east_coast)
p1 <- df_east %>%
ggplot(aes(x = STATE)) +
geom_bar(stat = "count") +
coord_flip() +
ggtitle("Number of Storm Events per East Coast State") +
xlab("East Coast State") +
ylab("Number of Storm Events") +
theme_fivethirtyeight() +
theme(axis.title = element_text())
p1
This reveals that North Carolina has experienced among the most storm events of the East Coast states (41,293 in the table above, just behind Virginia and Georgia), which suggests I did grow up in a region where I might experience many storms. Additionally, Maryland has experienced fewer than half as many storm events as North Carolina. This suggests that I moved to a region with fewer storms, decreasing my likelihood of noticing a severe storm event.
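As a quick check of the state comparison, the counts can be compared directly. The numbers below are copied from the state table printed earlier; the vector name `state_counts` is just for illustration.

```r
# Event counts copied from the state table printed above
state_counts <- c("VIRGINIA" = 42635, "GEORGIA" = 41877,
                  "NORTH CAROLINA" = 41293, "NEW YORK" = 40613,
                  "MARYLAND" = 18560)

# Rank the states from most to fewest events
sort(state_counts, decreasing = TRUE)

# Maryland's count as a fraction of North Carolina's
md_to_nc <- state_counts[["MARYLAND"]] / state_counts[["NORTH CAROLINA"]]
round(md_to_nc, 2)  # about 0.45
```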
Next, I looked at how the events are classified by NOAA NWS. I converted the event types to factors and counted the occurrence of each level associated with thunderstorms.
events <- as.factor(df_east$EVENT_TYPE)
levels(events)
## [1] "Astronomical Low Tide" "Avalanche"
## [3] "Blizzard" "Coastal Flood"
## [5] "Cold/Wind Chill" "Debris Flow"
## [7] "Dense Fog" "Dense Smoke"
## [9] "Drought" "Dust Devil"
## [11] "Dust Storm" "Excessive Heat"
## [13] "Extreme Cold/Wind Chill" "Flash Flood"
## [15] "Flood" "Freezing Fog"
## [17] "Frost/Freeze" "Funnel Cloud"
## [19] "Hail" "Heat"
## [21] "Heavy Rain" "Heavy Snow"
## [23] "High Surf" "High Wind"
## [25] "Hurricane" "Hurricane (Typhoon)"
## [27] "Ice Storm" "Lake-Effect Snow"
## [29] "Lakeshore Flood" "Landslide"
## [31] "Lightning" "Rip Current"
## [33] "Seiche" "Sleet"
## [35] "Storm Surge/Tide" "Strong Wind"
## [37] "Thunderstorm Wind" "THUNDERSTORM WIND/ TREE"
## [39] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WINDS HEAVY RAIN"
## [41] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS/FLOODING"
## [43] "Tornado" "TORNADO/WATERSPOUT"
## [45] "TORNADOES, TSTM WIND, HAIL" "Tropical Depression"
## [47] "Tropical Storm" "Tsunami"
## [49] "Waterspout" "Wildfire"
## [51] "Winter Storm" "Winter Weather"
thunderstorms <- c("Flash Flood", "Heavy Rain", "Lightning", "Thunderstorm Wind", "THUNDERSTORM WIND/ TREE",
"THUNDERSTORM WIND/ TREES", "THUNDERSTORM WINDS HEAVY RAIN", "THUNDERSTORM WINDS LIGHTNING",
"THUNDERSTORM WINDS/FLOODING", "Tropical Storm")
p2 <- df_east %>%
filter(EVENT_TYPE %in% thunderstorms) %>%
ggplot(aes(x = EVENT_TYPE)) +
geom_bar(stat = "count") +
coord_flip() +
ggtitle("Storm Event Types") +
xlab("Thunderstorm Related Events") +
ylab("Total Count") +
theme_fivethirtyeight() +
theme(axis.title = element_text())
p2
There are a few combination event types that do not seem relevant due to the very low number of occurrences.
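One option for the rare combination types, which I did not apply in this analysis, would be to fold them into the main "Thunderstorm Wind" category. The sketch below assumes that any event type starting with "thunderstorm wind" (ignoring case) belongs together; the `event_group` column and the toy rows are hypothetical.

```r
library(dplyr)
library(stringr)

# A few event type labels taken from the levels listed above
toy_events <- tibble(
  EVENT_TYPE = c("Thunderstorm Wind", "THUNDERSTORM WIND/ TREES",
                 "THUNDERSTORM WINDS LIGHTNING", "Lightning", "Flash Flood")
)

# Assumption: labels starting with "thunderstorm wind" are one category
toy_events <- toy_events %>%
  mutate(event_group = if_else(
    str_detect(EVENT_TYPE, regex("^thunderstorm wind", ignore_case = TRUE)),
    "Thunderstorm Wind",
    EVENT_TYPE
  ))

count(toy_events, event_group)
```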
To further address my research question, I want to look at the two states that I’ve spent the most time in, North Carolina and Maryland. I want to compare the number of storm events in each state by year, using a line chart of year versus number of storms.
df_east_thunder <- df_east %>%
filter(EVENT_TYPE %in% thunderstorms)
fig1 <- df_east_thunder %>%
filter(STATE == "MARYLAND" | STATE == "NORTH CAROLINA") %>%
group_by(STATE, YEAR) %>%
summarise(n = n()) %>%
ggplot(aes(x = YEAR, y = n, color = STATE)) +
geom_line() +
ggtitle("Storm Events in NC and MD") +
ylab("Number of Thunderstorm Events") +
xlab("Year") +
theme_fivethirtyeight() +
theme(axis.title = element_text())
fig1
This plot shows that the upward trend in thunderstorm events over time is similar in North Carolina and Maryland, although North Carolina continues to have more storm events. This suggests that moving to Maryland reduced my exposure to thunderstorms.
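To put a number on the gap between the two lines, the yearly counts could be reshaped so that each state is a column and then divided. This is only a sketch: the counts below are made up for illustration and are not NOAA values, and `nc_per_md` is a hypothetical column name.

```r
library(dplyr)
library(tidyr)

# Made-up yearly counts, for illustration only (not NOAA values)
toy_counts <- tibble(
  STATE = rep(c("MARYLAND", "NORTH CAROLINA"), each = 3),
  YEAR = rep(c(1995, 2005, 2015), times = 2),
  n = c(10, 40, 50, 30, 100, 120)
)

# One row per year, one column per state, then the NC-to-MD ratio
toy_ratio <- toy_counts %>%
  pivot_wider(names_from = STATE, values_from = n) %>%
  mutate(nc_per_md = `NORTH CAROLINA` / MARYLAND)

toy_ratio
```

A ratio that stays roughly flat over the years would indicate that both states follow the same trend, while a changing ratio would indicate that one state is diverging from the other.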
Next, I want to explore storm events reported in North Carolina and Maryland over the years. I also narrowed the thunderstorm classification to the event types that have a substantial amount of data.
fig2 <- df_east_thunder %>%
filter(STATE %in% c("NORTH CAROLINA", "MARYLAND")) %>%
filter(EVENT_TYPE %in% c("Flash Flood", "Heavy Rain", "Lightning", "Thunderstorm Wind", "Tropical Storm")) %>%
group_by(STATE, EVENT_TYPE, YEAR, .drop = FALSE) %>%
summarise(n = n()) %>%
plot_ly(
x = ~EVENT_TYPE,
y = ~n,
color = ~STATE,
frame = ~YEAR,
text = ~STATE,
type = 'bar'
)
fig2 <- fig2 %>% layout(title = "Different Thunderstorm Events over Time",
yaxis = list(title = "Number of Thunderstorm Events"),
legend = list(title="State"))
fig2
This plot allows for a more granular analysis and also shows that North Carolina consistently experiences more of each thunderstorm event over time, with Thunderstorm Wind and Flash Flood being the most common. Another question that arises from this plot is why more event types were included later and how that affects our understanding of older data. Was lightning not reported in the early 1990s, or was it grouped with another event type?
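One way to start answering that question would be to find the first year in which each event type appears. The sketch below uses made-up rows, not NOAA values; applied to the real data, the same pipeline would run on `df_east_thunder`.

```r
library(dplyr)

# Made-up rows for illustration only (not NOAA values)
toy <- tibble(
  EVENT_TYPE = c("Thunderstorm Wind", "Thunderstorm Wind",
                 "Lightning", "Lightning", "Flash Flood"),
  YEAR = c(1990, 1996, 1996, 2001, 1997)
)

# First year each event type shows up, and how many events it has in total
first_seen <- toy %>%
  group_by(EVENT_TYPE) %>%
  summarise(first_year = min(YEAR), n_events = n()) %>%
  arrange(first_year)

first_seen
```

If several event types share the same first year, that would point toward a change in how events were recorded rather than a change in the weather itself.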
For the final analysis, I made a new data frame that looks at county-level data. I focused on the four counties I’ve spent the most time in: Guilford, Orange, and Durham Counties in North Carolina and Montgomery County in Maryland.
test <- df_east_thunder %>%
# Pair each county with its state explicitly so same-named counties elsewhere are excluded
filter((STATE == "NORTH CAROLINA" & county_loc_name %in% c("GUILFORD", "ORANGE", "DURHAM")) |
(STATE == "MARYLAND" & county_loc_name == "MONTGOMERY"))
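A note on combining conditions in a filter like this one: in R, `&` is evaluated before `|`, so a state condition written without parentheses only applies to the county name directly next to it. Several East Coast states have a county with the same name (Orange and Montgomery both occur in more than one state), so grouping each county with its state in parentheses makes the intent explicit. The toy rows below are made up to illustrate this.

```r
library(dplyr)

# & binds more tightly than |
x <- TRUE; y <- FALSE; z <- FALSE
x | y & z      # parsed as x | (y & z), which is TRUE
(x | y) & z    # different grouping, which is FALSE

# Made-up rows: the same county names in different states
toy <- tibble(
  STATE = c("NORTH CAROLINA", "NEW YORK", "MARYLAND", "GEORGIA"),
  county_loc_name = c("ORANGE", "ORANGE", "MONTGOMERY", "MONTGOMERY")
)

# Each county is paired with its state inside parentheses
picked <- toy %>%
  filter((STATE == "NORTH CAROLINA" & county_loc_name == "ORANGE") |
         (STATE == "MARYLAND" & county_loc_name == "MONTGOMERY"))

picked
```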
I then made a plot to see whether differences between locations at the county level can also explain my impression that I’ve seen fewer thunderstorms as I’ve gotten older.
fig3 <- df_east_thunder %>%
# Pair each county with its state explicitly so same-named counties elsewhere are excluded
filter((STATE == "NORTH CAROLINA" & county_loc_name %in% c("GUILFORD", "ORANGE", "DURHAM")) |
(STATE == "MARYLAND" & county_loc_name == "MONTGOMERY")) %>%
group_by(county_loc_name, YEAR) %>%
summarise(n = n()) %>%
ggplot(aes(x = YEAR, y = n, color = county_loc_name)) +
geom_line() +
scale_x_continuous(breaks = c(1990, 2000, 2010)) +
facet_wrap(~county_loc_name, ncol=2) +
ggtitle("Storm Events in Different Counties") +
ylab("Number of Thunderstorm Related Events") +
xlab("Year") +
theme_fivethirtyeight() +
theme(legend.position="none", axis.title = element_text())
fig3
This analysis shows that Montgomery County, the last location I moved to, actually has the highest number of events after 2017, and that Guilford County, the first location I lived in, had a similar number of storms to the other locations in the 1990s.
Overall, the state-level data showed that North Carolina has more severe thunderstorm events than Maryland, suggesting that my move to Maryland could explain the decrease in storms that I noticed. This is supported by weather historians who show that North Carolina has had more thunderstorms than Maryland (https://www.wunderground.com/blog/weatherhistorian/thunderstorms-the-stormiest-places-in-the-usa-and-the-world.html). However, the county-level data showed that I moved to a region with a higher number of severe storms, suggesting that I should be noticing more storms outside my window. My only explanation for this is a busier schedule as an adult and being distracted by work.