The sudden onset of a worldwide pandemic has not been easy for anyone, so feelings of anxiety and depression, among other negative emotions, are not unexpected. Whether COVID-19 has affected you personally or you are watching from the sidelines, there is a lot to deal with. For this reason, I decided to take a closer look at how the rate at which people report symptoms of anxiety and depression has changed over the past year. According to the CDC’s website, the Household Pulse Survey began in April of 2020. The data are broken down by race/ethnicity, age, sex, education, disability status, and state. As a student at the University of New Haven, I decided to focus on comparing symptoms of anxiety or depression in Connecticut with those in the United States as a whole. To visualize this comparison, I used line and bar charts. I also included maps using the latest data at the time, from July 5th of 2021, to see how the country as a whole was faring.

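According to the CDC, these values were calculated by asking questions from the Patient Health Questionnaire-2 (PHQ-2) and the Generalized Anxiety Disorder 2-item (GAD-2) scale. The questions asked how often the respondent had experienced any of the following over the last seven days: having little interest or pleasure in doing things; feeling down, depressed, or hopeless; feeling nervous, anxious, or on edge; and not being able to stop or control worrying.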

The respondents were given a scale to rate how often these symptoms occurred, from 0 (“not at all”) to 3 (“nearly every day”). From there, “the two responses for each scale are added together” (“Mental Health - Household Pulse Survey”). The exact calculations for the final values were not provided.
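
The addition step quoted above can be sketched in R. This is only an illustration of summing the two item responses for each scale: the data frame, its column names, and the responses are hypothetical, and the sketch does not attempt to reproduce how the published survey values are derived from these scores.

```r
# Hypothetical responses; each item is rated 0 ("not at all") to 3 ("nearly every day").
responses <- data.frame(
  respondent = c("A", "B", "C"),
  phq_item1  = c(0, 2, 3),
  phq_item2  = c(1, 1, 3),
  gad_item1  = c(0, 3, 2),
  gad_item2  = c(0, 2, 1)
)

# Add the two responses for each scale, giving a score between 0 and 6.
responses$phq2_score <- responses$phq_item1 + responses$phq_item2
responses$gad2_score <- responses$gad_item1 + responses$gad_item2

responses[, c("respondent", "phq2_score", "gad2_score")]
```

Because each item ranges from 0 to 3, each two-item score can only fall between 0 and 6.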

library(dplyr)
library(ggplot2)
library(hrbrthemes)
library(extrafont)
library(maps)
library(viridis)
library(rvest)
library(leaflet)
anxdep <- read.csv("anxdep.csv")
anxdep <- anxdep%>%
  mutate(Date = forcats::fct_relevel(Date, "2019", "April 23-May 4, 2020", "May 7-May 12, 2020", "May 14-May 19, 2020", "May 21-May 26, 2020"))
line_chart2 <- ggplot(anxdep,
                      aes(x = Date,
                          y = Value, group = 1)) +
  geom_line(color="#A8A7A7", size = 1) +
  geom_point(color="#808080") +
  theme_ipsum() +
  theme(text=element_text(family="Verdana", size=12)) +
  ggtitle("Symptoms of Anxiety or Depression (2019 - 2020)", subtitle = "Source: Household Pulse Survey, U.S. Census Bureau")
line_chart2

anxiety <- read.csv("anxiety.csv")
anxiety <- anxiety%>%
  mutate(Date = forcats::fct_relevel(Date, "2019", "April 23-May 4, 2020", "May 7-May 12, 2020", "May 14-May 19, 2020", "May 21-May 26, 2020"))
line_chart3 <- ggplot(anxiety,
                      aes(x = Date,
                          y = Value, group = 1)) +
  geom_line(color="#CC527A", size = 1) +
  geom_point(color="#808080") +
  theme_ipsum() +
  theme(text=element_text(family="Verdana", size=12)) +
  ggtitle("Symptoms of Anxiety (2019 - 2020)", subtitle = "Source: Household Pulse Survey, U.S. Census Bureau")
line_chart3

depression <- read.csv("depression.csv")
depression <- depression%>%
  mutate(Date = forcats::fct_relevel(Date, "2019", "April 23-May 4, 2020", "May 7-May 12, 2020", "May 14-May 19, 2020", "May 21-May 26, 2020"))
line_chart4 <- ggplot(depression,
                      aes(x = Date,
                          y = Value, group = 1)) +
  geom_line(color="#E8175D", size = 1) +
  geom_point(color="#808080") +
  theme_ipsum() +
  theme(text=element_text(family="Verdana", size=12)) +
  ggtitle("Symptoms of Depression (2019 - 2020)", subtitle = "Source: Household Pulse Survey, U.S. Census Bureau")
line_chart4

These line graphs were adapted from Table 1 in “U.S. Census Bureau-Assessed Prevalence of Anxiety and Depressive Symptoms in 2019 and during the 2020 COVID-19 Pandemic” (Twenge and Joiner). Prior to May of 2020, the survey data was not collected in the same manner. For this reason, the 2019 data point covers January through June. Because the dates are text labels rather than numbers, ggplot2 treated each one as its own group and I ran into the error “geom_path: Each group consists of only one observation”, which I learned could be solved by including group = 1 in the aes area of the code (“Ggplot2 Line Chart”). Another issue I ran into time and again was that the dates were not being plotted chronologically. This was another easy fix: mutating and doing a factor re-level before running the ggplot code (“Trying to Manually Reorder a Bar Chart”). In the code itself, I used the date variable as the x and the value as the y (Holtz, “Line Chart with R and ggplot2”). It is clear that, as a whole, there was a large increase in feelings of both depression and anxiety from 2019 to 2020. Throughout the month of May, however, the values changed very little in all three charts.
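
The two fixes described above can be seen together in a small sketch. The values below are made up for illustration and are not the survey estimates. Setting the factor levels to the order of appearance is used here as a shorter stand-in for the explicit fct_relevel() call, and it assumes the rows are already in chronological order.

```r
library(ggplot2)

# Made-up values for illustration only; not the survey estimates.
toy <- data.frame(
  Date  = c("2019", "April 23-May 4, 2020", "May 7-May 12, 2020", "May 14-May 19, 2020"),
  Value = c(10, 30, 28, 29)
)

# Without this step the labels sort alphabetically, which puts "May 14" before "May 7".
# Fixing the levels to the order of appearance keeps the x-axis chronological.
toy$Date <- factor(toy$Date, levels = unique(toy$Date))

# group = 1 tells geom_line() to connect all rows as a single series;
# on a discrete x-axis each label would otherwise form its own one-point group.
p <- ggplot(toy, aes(x = Date, y = Value, group = 1)) +
  geom_line() +
  geom_point()
p
```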

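# Read the survey data and keep two untouched copies for the maps further down
d <- read.csv("MentalHealthData.csv")
copy <- data.frame(d)
copy1 <- data.frame(d)

# Parse the period end date, then keep only the combined indicator for the U.S. and Connecticut
d$Time.Period.End.Date <- as.Date(d$Time.Period.End.Date, format = "%m/%d/%Y")
d <- rename(d, Date = Time.Period.End.Date)
d <- d %>% filter(Subgroup %in% c("United States", "Connecticut"))
d <- d %>% filter(Indicator == "Symptoms of Anxiety Disorder or Depressive Disorder")
# Drop three rows by position; these indices depend on the row order of this particular CSV
d <- d[-c(25, 44, 57), ]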
line_chart <- ggplot(d,
                     aes(x = Date,
                         y = Value,
                         colour = State,
                         group = State)) +
  geom_line(aes(color = State),
            size = 1) +
  geom_point() +
  scale_color_manual(values = c("#90a58a", "#d17b4a")) +
  theme_ipsum() +
  theme(text=element_text(family="Verdana", size=12)) +
  ggtitle("Symptoms of Anxiety or Depression (2020 - 2021)", subtitle = "Source: Household Pulse Survey, U.S. Census Bureau") 
line_chart

I used this line chart to compare the values for Connecticut and the United States from April 2020 to July 2021. The date is on the x-axis and the value variable on the y-axis; the state variable is mapped to both the color and the group, and again further down for the lines themselves. I also added points to make it clearer where each data point falls relative to the others. There were some large differences between the U.S. and CT at certain points during the past year, and Connecticut reached lower values than the U.S. as a whole ever did. Toward the end, both groups drop noticeably. This could be related to the rollout of the new vaccine, with people feeling more comfortable and relieved.
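
As a minimal sketch of the mapping described above, the example below uses made-up numbers and maps the state variable to color and group once, in the top-level aes(), so that both the line and point layers inherit it. Naming the entries of the color vector is an optional addition of this sketch; it pins each color to a specific group rather than relying on alphabetical order.

```r
library(ggplot2)

# Made-up numbers for illustration only; not the survey estimates.
toy <- data.frame(
  Date  = as.Date(rep(c("2020-05-05", "2020-05-12", "2020-05-19"), times = 2)),
  State = rep(c("Connecticut", "United States"), each = 3),
  Value = c(33, 31, 30, 36, 34, 34)
)

# colour and group are mapped once here and inherited by geom_line() and geom_point().
p <- ggplot(toy, aes(x = Date, y = Value, colour = State, group = State)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = c("Connecticut" = "#90a58a", "United States" = "#d17b4a"))
p
```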

d <- d%>%
  mutate(Time.Period.Label = forcats::fct_relevel(Time.Period.Label, "Apr 23 - May 5", "May 7 - May 12", "May 14 - May 19", "May 21 - May 26", "May 28 - June 2", "June 4 - June 9", "June 11 - June 16", "June 18 - June 23", "June 25 - June 30", "July 2 - July 7", "July 9 - July 14", "July 16 - July 21", "Aug 19 - Aug 31", "Sep 2 - Sep 14", "Sep 16 - Sep 28", "Sep 30 - Oct 12", "Oct 14 - Oct 26", "Oct 28 - Nov 9", "Nov 11 - Nov 23", "Nov 25 - Dec 7", "Dec 9 - Dec 21", "Jan 6 - Jan 18", "Jan 20 - Feb 1", "Feb 3 - Feb 15", "Feb 17 - Mar 1", "Mar 3 - Mar 15", "Mar 17 - Mar 29", "Apr 14 - Apr 26", "Apr 28 - May 10", "May 12 - May 24", "May 26 - Jun 7", "Jun 9 - Jun 21", "Jun 23 - Jul 5"))
bar_chart <- ggplot(d, aes(x = Time.Period.Label, y = Value, fill = factor(State))) +
  geom_bar(stat = "identity", position = "dodge", color = "grey40")+
  scale_fill_manual(values = c("#90a58a", "#d17b4a")) +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=90, hjust=1))+
  ggtitle("Symptoms of Anxiety or Depression (2020 - 2021)", subtitle = "Source: Household Pulse Survey, U.S. Census Bureau") +
  xlab("Time Period")
bar_chart

My intention for the bar chart was to make a side-by-side comparison of the survey results from every two weeks. Once again, I included Connecticut as well as the United States. To start, I had to mutate and do a factor re-level because the dates were not in their proper order. I then used the time period of the survey as the x variable, the value as the y, and the state variable as the fill. This guarantees that the two groups are different colors and can be easily told apart. Since there is so much data, the bar chart looks a bit cluttered. While it is not ideal, rotating the x-axis labels by 90 degrees makes them easier to read. It appears that Connecticut was at its lowest point from May 26th to June 7th of 2021, while the U.S. was at its lowest from June 23rd to July 5th of 2021. Both Connecticut and the United States have had their fair share of ups and downs. These could correspond to what was in the news about the virus, as well as to respondents’ personal dealings with it.
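
Typing every period label by hand is one way to do the re-level. A possible alternative, sketched below on made-up rows, is to derive the label order from the period end date. The column names mirror the ones used above, but the rows are hypothetical, and the approach assumes each label belongs to exactly one date.

```r
# Made-up rows for illustration; deliberately out of chronological order.
toy <- data.frame(
  Time.Period.Label = c("May 7 - May 12", "Apr 23 - May 5", "Jan 6 - Jan 18", "Dec 9 - Dec 21"),
  Date = as.Date(c("2020-05-12", "2020-05-05", "2021-01-18", "2020-12-21"))
)

# Sort the labels by end date, then use that order as the factor levels.
ordered_labels <- unique(toy$Time.Period.Label[order(toy$Date)])
toy$Time.Period.Label <- factor(toy$Time.Period.Label, levels = ordered_labels)

levels(toy$Time.Period.Label)
```

If the same label text appeared in two different years, the labels would need the year added before this ordering could tell them apart.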

copy <- copy%>%filter(Indicator=="Symptoms of Anxiety Disorder or Depressive Disorder")
copy <- copy%>%filter(Time.Period.Label=="Jun 23 - Jul 5")
copy <- copy%>%filter(Group=="By State") 

my_state_map <- map_data("state")
states <- unique(my_state_map$region)
View(states)

# One hand-entered survey value per region, in the same order as `states`
values <- data.frame(
  "region"=states, 
  "value"=c(35.9, 33.4, 26.4, 30.3, 27.9, 26.8, 25.8, 28.0, 26.6, 30.5, 28.9, 24.0, 28.8, 
            25.3, 27.5, 34.3, 29.8, 23.7, 30.4, 26.5, 29.0, 21.6, 33.0, 30.0, 26.9, 25.5, 
            35.6, 29.9, 21.9, 36.6, 26.7, 31.2, 27.1, 31.2, 37.5, 36.0, 25.8, 26.2, 32.0, 
            25.2, 29.8, 31.6, 28.1, 25.5, 25.9, 28.3, 34.1, 24.8, 27.5))

map_data_combined <- left_join(my_state_map, values, by="region")
View(map_data_combined)
ggplot(data=map_data_combined, mapping=aes(x=long, y=lat, group=group, fill=value))+
  geom_polygon(color="black") +
  scale_fill_viridis(option = "rocket") +
  labs(title = "Symptoms of Anxiety or Depressive Disorder in the U.S.",
       subtitle = "Household Pulse Survey, U.S. Census Bureau",
       caption = "According to the latest survey results from July 5th of 2021") +
  theme_ipsum()

url <- "https://developers.google.com/public-data/docs/canonical/states_csv" 
page <- read_html(url) 
table <- html_table(page, fill = TRUE) 
table
## [[1]]
## # A tibble: 52 x 4
##    state latitude longitude name                
##    <chr>    <dbl>     <dbl> <chr>               
##  1 AK        63.6    -154.  Alaska              
##  2 AL        32.3     -86.9 Alabama             
##  3 AR        35.2     -91.8 Arkansas            
##  4 AZ        34.0    -111.  Arizona             
##  5 CA        36.8    -119.  California          
##  6 CO        39.6    -106.  Colorado            
##  7 CT        41.6     -73.1 Connecticut         
##  8 DC        38.9     -77.0 District of Columbia
##  9 DE        38.9     -75.5 Delaware            
## 10 FL        27.7     -81.5 Florida             
## # ... with 42 more rows
table1 = data.frame(text = table)

table1 <- rename(table1,state=text.state)
table1 <- rename(table1,latitude=text.latitude)
table1 <- rename(table1,longitude=text.longitude)
table1 <- rename(table1,region=text.name)
table1 <- table1[-c(40), ]

copy1 <- copy1%>%filter(Indicator=="Symptoms of Anxiety Disorder or Depressive Disorder")
copy1 <- copy1%>%filter(Time.Period.Label=="Jun 23 - Jul 5")
copy1 <- copy1%>%filter(Group=="By State") 
copy1 <- rename(copy1,region=State)

data_combined <- left_join(table1, copy1, by="region")
View(data_combined)

mybins <- seq(20, 40, by=4)
mypalette <- colorBin( palette="RdPu", domain=data_combined$Value, na.color="transparent", bins=mybins)

mytext <- paste(
  "State: ", data_combined$region, "<br/>", 
  "Value: ", data_combined$Value) %>%
  lapply(htmltools::HTML)
m <- leaflet(data_combined) %>% 
  addTiles()  %>% 
  addProviderTiles("Stamen.TonerHybrid") %>%
  addCircleMarkers(~longitude, ~latitude, 
                   fillColor = ~mypalette(Value), fillOpacity = 0.9, color="white", radius=10, stroke=FALSE,
                   label = mytext,
                   labelOptions = labelOptions( style = list("font-weight" = "normal", padding = "3px 8px"), textsize = "13px", direction = "auto")
  ) %>%
  addLegend(pal = mypalette, values = ~Value, opacity = 0.9, title = "Value", position = "bottomright") %>%
  addLayersControl(
    options = layersControlOptions(collapsed = FALSE)) %>%
  htmlwidgets::onRender("
        function() {
            $('.leaflet-control-layers-overlays').prepend('<label style=\"text-align:left\">Symptoms of Anxiety or Depression <br/>Survey results from July 5, 2021</label>');
        }
    ")

m 

Using the data from July 5th, which is the last group from phase 3.1, I created maps using both the ggplot2 and leaflet libraries. For the basic map, I started by using a left join to put my state map data and the survey values together. In the map’s code, I used the values as the fill in the aes section. I then applied a color scale to those values, although the result is the opposite of what I intended: the lightest color marks the highest value, and vice versa.

I followed a similar process when creating the map using leaflet, although that was a bit more complicated. I struggled to find a dataset of state coordinates that would work the way I wanted, so I pulled a table from Google’s Dataset Publishing Language website. I once again used a left join to combine my data with the corresponding states. Then I created bins for the color scale, keeping my minimum and maximum values in mind. My final step before creating the actual map was to use the paste and lapply functions to create text that I would later pass to the map code. This text is used as the label that displays the state and its value (Holtz, “Interactive Choropleth Map”). For this map, I used circle markers with a color scale for an easier visual of which states had the highest symptom value (Holtz, “Interactive Bubble Map”). I struggled with figuring out how to add a title and subtitle because a title is not a standard option in leaflet as it is in ggplot. I learned that piping in a layers control and adding HTML to it works (“Add Title to Layers Control Box”).

In the end, the state with the highest symptom value is Oklahoma (37.5), while Minnesota is at the other end of the scale (21.6). It is possible that this has changed in the most recent update.
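
The join and the inverted color scale mentioned above can be sketched on a tiny example. The three-row data frame is a hypothetical stand-in for the filtered survey rows, and a bar chart stands in for the map so the sketch stays self-contained. The direction = -1 argument of scale_fill_viridis() is one way to flip the palette so that the darkest color marks the highest value.

```r
library(dplyr)
library(ggplot2)
library(viridis)

# Hypothetical stand-in for the filtered survey rows; three states only.
survey <- data.frame(
  State = c("Connecticut", "Minnesota", "Oklahoma"),
  Value = c(26.8, 21.6, 37.5)
)

# map_data("state") names its regions in lowercase, so lowercase the join key.
values <- survey %>%
  transmute(region = tolower(State), value = Value)

# Stand-in for the region column of the map data.
regions <- data.frame(region = c("connecticut", "minnesota", "oklahoma"))
joined <- left_join(regions, values, by = "region")

# direction = -1 reverses the palette: high values become dark, low values light.
p <- ggplot(joined, aes(x = region, y = value, fill = value)) +
  geom_col() +
  scale_fill_viridis(option = "rocket", direction = -1)
p
```

Joining on a lowercased state name would also avoid typing the values in by hand, since the order of the rows no longer matters.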

The pandemic has not been easy on anyone – and these visualizations show what a tough time U.S. citizens have been having. From living in isolation to working on the front line, and everything in between, it has really put a damper on our mental health. For many of us, this could be the first time we are experiencing something so life-changing. The Household Pulse Survey continues to be updated every two weeks, and it will be interesting to see how our mental health progresses as a country.

Acknowledgement

I would like to thank my previous Data Visualization and Communication professor, Gazi M. Duman, Ph.D., for his extremely helpful guidance on completing this project.

Data Sources

“Early Release of Selected Mental Health Estimates Based on Data from the January–June 2019.” Centers for Disease Control and Prevention, National Center for Health Statistics, http://www.cdc.gov/nchs/data/nhis/earlyrelease/ERmentalhealth-508.pdf.

“Mental Health - Household Pulse Survey - Covid-19.” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 14 July 2021, http://www.cdc.gov/nchs/covid19/pulse/mental-health.htm.

“States.csv | Dataset Publishing Language | Google Developers.” Google, Google, 20 Jan. 2012, http://developers.google.com/public-data/docs/canonical/states_csv.

Twenge, Jean M, and Thomas E Joiner. “U.S. Census Bureau-Assessed Prevalence of Anxiety and Depressive Symptoms in 2019 and during the 2020 COVID-19 Pandemic.” PubMed Central, Wiley Public Health Emergency Collection, 15 July 2020, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7405486/#.

Other Sources

“Add Title to Layers Control Box in Leaflet Using R.” Stack Overflow, http://www.stackoverflow.com/questions/52413381/add-title-to-layers-control-box-in-leaflet-using-r/62686095#62686095.

Dervieux, Christophe. “How to Hide Message, Warning, Results.” GitHub, http://www.github.com/yihui/rmarkdown-cookbook/issues/12.

“Ggplot2 Line Chart Gives ‘geom_path: Each Group Consist of Only One Observation. Do You Need to Adjust the GROUP Aesthetic?’” Stack Overflow, http://www.stackoverflow.com/questions/27082601/ggplot2-line-chart-gives-geom-path-each-group-consist-of-only-one-observation.

Holtz, Yan. “Choropleth Map with R and ggplot2.” The R Graph Gallery, http://www.r-graph-gallery.com/327-chloropleth-map-from-geojson-with-ggplot2.html.

Holtz, Yan. “Interactive Bubble Map with R and Leaflet.” The R Graph Gallery, http://www.r-graph-gallery.com/19-map-leafletr.html.

Holtz, Yan. “Interactive Choropleth Map with R and Leaflet.” The R Graph Gallery, http://www.r-graph-gallery.com/183-choropleth-map-with-leaflet.html.

Holtz, Yan. “Line Chart with R and ggplot2.” The R Graph Gallery, http://www.r-graph-gallery.com/line-chart-ggplot2.html.

“How to Hide Code in Rmarkdown, with Option to See It.” Stack Overflow, http://www.stackoverflow.com/questions/14127321/how-to-hide-code-in-rmarkdown-with-option-to-see-it.

“Trying to Manually Reorder a Bar Chart in R.” Stack Overflow, http://www.stackoverflow.com/questions/61275146/trying-to-manually-reorder-a-bar-chart-in-r.