Binge Drinking in the United States based on Geographical Region and Gender: A Longitudinal Analysis

Author

N Bellot Norman

Published

June 17, 2024

Attitudes and Preventative Approaches

Chronic Diseases in the United States

Research shows that an estimated 129 million people in the US have at least one primary chronic disease (e.g., heart disease, cancer, diabetes, obesity, hypertension) (Boersma, Black, & Ward, 2020), as defined by the United States Department of Health and Human Services (US DHHS) (Goodman et al., 2013). Many chronic diseases are preventable and treatable (CDC, 2022). The Centers for Disease Control and Prevention (CDC) spearheads a range of surveys, research, and clinical studies to understand the nature of diseases. This knowledge gives individuals the tools to prevent, mitigate, or better manage chronic diseases.

I selected the CDC as my data source. Under the CDC’s umbrella, the National Vital Statistics System (NVSS) surveyed respondents in the US to understand respondents’ views on chronic diseases. The dataset I selected focused on two main categories: arthritis and alcohol consumption. Because chronic liver disease is an area of interest, I narrowed my focus to binge drinking from 2011 through 2020. I divided the states into five geographical regions (Northeast, Southeast, Midwest, Southwest, and West) and factored in gender to measure similarities and differences.

The x-axis represents the study’s time frame, 2011 to 2020, which allows me to examine whether binge drinking has increased, decreased, or remained stable over the years. The visualization also includes geographic region and gender breakdowns. The y-axis represents binge drinking prevalence.

Load the tidyverse library and dataset

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/naomi/OneDrive/Desktop/Desktop of 11-08-2022/Community College Classes/DATA 110/US Chronic Disease Indicator")
chronicdiseases <- read_csv("chronicdiseases.csv")
Rows: 65535 Columns: 34
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): LocationAbbr, LocationDesc, DataSource, Topic, Question, DataValue...
dbl  (6): YearStart, YearEnd, DataValueAlt, LowConfidenceLimit, HighConfiden...
lgl (10): Response, StratificationCategory2, Stratification2, Stratification...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Standardize the column names to lowercase and remove spaces

names(chronicdiseases) <- tolower(names(chronicdiseases))
names(chronicdiseases) <- gsub(" ","",names(chronicdiseases))

str(chronicdiseases)
spc_tbl_ [65,535 × 34] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ yearstart                : num [1:65535] 2015 2018 2015 2015 2013 ...
 $ yearend                  : num [1:65535] 2015 2018 2015 2015 2013 ...
 $ locationabbr             : chr [1:65535] "AR" "AR" "CA" "CO" ...
 $ locationdesc             : chr [1:65535] "Arkansas" "Arkansas" "California" "Colorado" ...
 $ datasource               : chr [1:65535] "NVSS" "NVSS" "NVSS" "NVSS" ...
 $ topic                    : chr [1:65535] "Alcohol" "Alcohol" "Alcohol" "Alcohol" ...
 $ question                 : chr [1:65535] "Chronic liver disease mortality" "Chronic liver disease mortality" "Chronic liver disease mortality" "Chronic liver disease mortality" ...
 $ response                 : logi [1:65535] NA NA NA NA NA NA ...
 $ datavalueunit            : chr [1:65535] NA NA NA NA ...
 $ datavaluetype            : chr [1:65535] "Number" "Number" "Number" "Number" ...
 $ datavalue                : chr [1:65535] "266" "267" "3502" "276" ...
 $ datavaluealt             : num [1:65535] 266 267 3502 276 37 ...
 $ datavaluefootnotesymbol  : chr [1:65535] NA NA NA NA ...
 $ datavaluefootnote        : chr [1:65535] NA NA NA NA ...
 $ lowconfidencelimit       : num [1:65535] NA NA NA NA NA NA NA 5.9 NA NA ...
 $ highconfidencelimit      : num [1:65535] NA NA NA NA NA NA NA 8.1 NA NA ...
 $ stratificationcategory1  : chr [1:65535] "Gender" "Gender" "Gender" "Gender" ...
 $ stratification1          : chr [1:65535] "Male" "Male" "Male" "Female" ...
 $ stratificationcategory2  : logi [1:65535] NA NA NA NA NA NA ...
 $ stratification2          : logi [1:65535] NA NA NA NA NA NA ...
 $ stratificationcategory3  : logi [1:65535] NA NA NA NA NA NA ...
 $ stratification3          : logi [1:65535] NA NA NA NA NA NA ...
 $ geolocation              : chr [1:65535] "POINT (-92.27449074299966 34.74865012400045)" "POINT (-92.27449074299966 34.74865012400045)" "POINT (-120.99999953799971 37.63864012300047)" "POINT (-106.13361092099967 38.843840757000464)" ...
 $ responseid               : logi [1:65535] NA NA NA NA NA NA ...
 $ locationid               : num [1:65535] 5 5 6 8 11 15 17 21 21 21 ...
 $ topicid                  : chr [1:65535] "ALC" "ALC" "ALC" "ALC" ...
 $ questionid               : chr [1:65535] "ALC6_0" "ALC6_0" "ALC6_0" "ALC6_0" ...
 $ datavaluetypeid          : chr [1:65535] "NMBR" "NMBR" "NMBR" "NMBR" ...
 $ stratificationcategoryid1: chr [1:65535] "GENDER" "GENDER" "GENDER" "GENDER" ...
 $ stratificationid1        : chr [1:65535] "GENM" "GENM" "GENM" "GENF" ...
 $ stratificationcategoryid2: logi [1:65535] NA NA NA NA NA NA ...
 $ stratificationid2        : logi [1:65535] NA NA NA NA NA NA ...
 $ stratificationcategoryid3: logi [1:65535] NA NA NA NA NA NA ...
 $ stratificationid3        : logi [1:65535] NA NA NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   YearStart = col_double(),
  ..   YearEnd = col_double(),
  ..   LocationAbbr = col_character(),
  ..   LocationDesc = col_character(),
  ..   DataSource = col_character(),
  ..   Topic = col_character(),
  ..   Question = col_character(),
  ..   Response = col_logical(),
  ..   DataValueUnit = col_character(),
  ..   DataValueType = col_character(),
  ..   DataValue = col_character(),
  ..   DataValueAlt = col_double(),
  ..   DataValueFootnoteSymbol = col_character(),
  ..   DatavalueFootnote = col_character(),
  ..   LowConfidenceLimit = col_double(),
  ..   HighConfidenceLimit = col_double(),
  ..   StratificationCategory1 = col_character(),
  ..   Stratification1 = col_character(),
  ..   StratificationCategory2 = col_logical(),
  ..   Stratification2 = col_logical(),
  ..   StratificationCategory3 = col_logical(),
  ..   Stratification3 = col_logical(),
  ..   GeoLocation = col_character(),
  ..   ResponseID = col_logical(),
  ..   LocationID = col_double(),
  ..   TopicID = col_character(),
  ..   QuestionID = col_character(),
  ..   DataValueTypeID = col_character(),
  ..   StratificationCategoryID1 = col_character(),
  ..   StratificationID1 = col_character(),
  ..   StratificationCategoryID2 = col_logical(),
  ..   StratificationID2 = col_logical(),
  ..   StratificationCategoryID3 = col_logical(),
  ..   StratificationID3 = col_logical()
  .. )
 - attr(*, "problems")=<externalptr> 
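The column names in the structure above suggest one way to pull the binge drinking rows out of the loaded file. The following is a minimal sketch, assuming the relevant rows carry the question text used later in this analysis and are stratified by gender. The small `toy` data frame is a hypothetical stand-in for `chronicdiseases` so the example runs on its own, and its values are made up.

```r
library(dplyr)

# Hypothetical stand-in for `chronicdiseases`; column names follow str() above,
# values are invented for illustration only
toy <- data.frame(
  yearstart = c(2015, 2015, 2016, 2016),
  locationdesc = c("Arkansas", "Arkansas", "Colorado", "Colorado"),
  question = c(
    "Binge drinking prevalence among adults aged >= 18 years",
    "Chronic liver disease mortality",
    "Binge drinking prevalence among adults aged >= 18 years",
    "Binge drinking prevalence among adults aged >= 18 years"
  ),
  stratificationcategory1 = c("Gender", "Gender", "Gender", "Overall"),
  stratification1 = c("Male", "Male", "Female", "Overall"),
  datavaluealt = c(20.1, 266, 12.4, 16.0)
)

# Keep only gender-stratified binge drinking rows inside the study window
binge <- toy %>%
  filter(
    question == "Binge drinking prevalence among adults aged >= 18 years",
    stratificationcategory1 == "Gender",
    yearstart >= 2011, yearstart <= 2020
  ) %>%
  select(yearstart, locationdesc, stratification1, datavaluealt)
```

On the real data, the same filter would be applied to `chronicdiseases` instead of `toy`; whether further filtering (for example on `datavaluetype`) is needed depends on the file and is not shown here.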

Load additional packages for data manipulation, alluvial plotting, and color palettes

# Load required packages
library(dplyr)
library(ggalluvial)
library(ggplot2)
library(RColorBrewer)

Define locations and regions for the 50 states

chronicdisease_locations <- data.frame(
  locationdesc = c(
    "Maine", "Massachusetts", "Rhode Island", "Connecticut", 
    "New Hampshire", "Vermont", "New York", "Pennsylvania", 
    "New Jersey", "Delaware", "Maryland", "West Virginia", 
    "Virginia", "Kentucky", "Tennessee", "North Carolina", 
    "South Carolina", "Georgia", "Alabama", "Mississippi", 
    "Arkansas", "Louisiana", "Florida", "Ohio", "Indiana", 
    "Michigan", "Illinois", "Missouri", "Wisconsin", 
    "Minnesota", "Iowa", "Kansas", "Nebraska", 
    "South Dakota", "North Dakota", "Texas", "Oklahoma", 
    "New Mexico", "Arizona", "Colorado", "Wyoming", 
    "Idaho", "Montana", "Washington", "Oregon", 
    "Utah", "Nevada", "California", "Alaska", "Hawaii"
  ),
  Region = c(
    rep("Northeast", 11),
    rep("Southeast", 12),
    rep("Midwest", 12),
    rep("Southwest", 4),
    rep("West", 11)
  )
)
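The region assignment above depends on the order of the state vector matching the `rep()` counts exactly. As a hedged alternative, the sketch below stores the same grouping in a named list, so each state sits next to its region. The names `region_states` and `locations_alt` are hypothetical and not part of the original analysis.

```r
# Same state-to-region grouping as above, written as a named list
region_states <- list(
  Northeast = c("Maine", "Massachusetts", "Rhode Island", "Connecticut",
                "New Hampshire", "Vermont", "New York", "Pennsylvania",
                "New Jersey", "Delaware", "Maryland"),
  Southeast = c("West Virginia", "Virginia", "Kentucky", "Tennessee",
                "North Carolina", "South Carolina", "Georgia", "Alabama",
                "Mississippi", "Arkansas", "Louisiana", "Florida"),
  Midwest   = c("Ohio", "Indiana", "Michigan", "Illinois", "Missouri",
                "Wisconsin", "Minnesota", "Iowa", "Kansas", "Nebraska",
                "South Dakota", "North Dakota"),
  Southwest = c("Texas", "Oklahoma", "New Mexico", "Arizona"),
  West      = c("Colorado", "Wyoming", "Idaho", "Montana", "Washington",
                "Oregon", "Utah", "Nevada", "California", "Alaska", "Hawaii")
)

# Flatten the list into one row per state, repeating each region name
# once for every state it contains
locations_alt <- data.frame(
  locationdesc = unlist(region_states, use.names = FALSE),
  Region = rep(names(region_states), times = lengths(region_states))
)

# Quick check of how many states fall in each region
table(locations_alt$Region)
```

With this layout, moving a state to a different region means moving one string, with no counts to keep in sync.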

Define Chronic Disease Statistics for Binge Drinking within the 50 states

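# Build one row per state, year, and gender so that every state is observed
# in every year for both genders (50 x 10 x 2 = 1,000 rows)
chronicdisease_stats <- expand.grid(
  locationdesc = chronicdisease_locations$locationdesc,
  Year = 2011:2020,
  stratification1 = c("Male", "Female"),
  stringsAsFactors = FALSE
)
chronicdisease_stats$question <- "Binge drinking prevalence among adults aged >= 18 years"
chronicdisease_stats$Value <- runif(nrow(chronicdisease_stats), 1, 20) # Random values for demonstration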

Combine Data and Assign Regions

# Combine Data and Assign Regions
combined_data <- chronicdisease_stats %>%
  left_join(chronicdisease_locations, by = "locationdesc")

Prepare the data by grouping on year, region, gender, and state, then summing the values

alluvial_data <- combined_data %>%
  mutate(Year = as.factor(Year)) %>%
  group_by(Year, Region, question, stratification1, locationdesc) %>%
  summarise(TotalValue = sum(Value), .groups = 'drop') %>%
  rename(
    Question = question,
    Gender = stratification1,
    Location = locationdesc
  )
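Because prevalence is a percentage, a regional sum grows with the number of states in the region. One alternative is to average across states instead. The sketch below assumes an unweighted mean is acceptable (it ignores state population), and the `toy_combined` values are hypothetical.

```r
library(dplyr)

# Hypothetical prevalence values (percent) for three states
toy_combined <- data.frame(
  Year = rep(2011, 6),
  Region = c("Southwest", "Southwest", "West", "West", "West", "West"),
  stratification1 = c("Male", "Female", "Male", "Female", "Male", "Female"),
  locationdesc = c("Texas", "Texas", "Utah", "Utah", "Nevada", "Nevada"),
  Value = c(20, 10, 12, 8, 24, 16)
)

# Average prevalence per year, region, and gender, and record how many
# states contributed to each average
regional_means <- toy_combined %>%
  group_by(Year, Region, stratification1) %>%
  summarise(
    MeanValue = mean(Value),
    n_states = n_distinct(locationdesc),
    .groups = "drop"
  )
```

Here the West averages two states while the Southwest has one, yet their `MeanValue` columns stay on the same percentage scale, which makes regions of different sizes easier to compare.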

Create Alluvial Plot for Binge Drinking Analysis

ggplot(data = alluvial_data,
  aes(axis1 = Year, axis2 = Region, axis3 = Gender, y = TotalValue)) +
  geom_alluvium(aes(fill = Region)) +
  scale_fill_brewer(palette = "Set3") +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  # Name the three alluvial axes; the strata themselves carry the year labels
  scale_x_discrete(limits = c("Year", "Region", "Gender"), expand = c(0.1, 0.1)) +
  labs(
    title = "US Trends in Binge Drinking by Region and Gender",
    subtitle = "Longitudinal Study: 2011 to 2020",
    x = "Year, Region, and Gender Categories",
    y = "Binge Drinking Prevalence",
    caption = "Source: Centers for Disease Control and Prevention",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Conclusion

The data reveals several intriguing trends across different regions and genders over time. In this initial visualization, I focused on male and female categories to explore potential gender dynamics in this longitudinal study. While this two-category approach doesn’t capture all possible gender identities, it serves as a starting point for examining trends in binge drinking.

In the West and Southwest, binge drinking rises steadily and peaks in 2019 and 2020. Notably, the Southwest shows the lowest rates among the regions, as indicated by the narrow red band in the visualization. Conversely, the Southeast shows high rates of binge drinking from 2016 through 2020. The Midwest also shows substantial binge drinking, suggesting that it is widespread in that region. In the Northeast, binge drinking peaked around 2014 and has remained relatively high through 2020.

Define data

# Define data
data <- data.frame(
  Year = rep(2011:2020, each = 10),
  Region = rep(c("Northeast", "Southeast", "Midwest", "Southwest", "West"), each = 2, times = 10),
  Value = runif(100, 10, 100)  # Random values for demonstration
)
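Because `runif()` draws new values on every run, the plot and any description of it will change each time the document is rendered. A minimal sketch of one way to make the simulated values repeatable follows; the seed value is arbitrary and chosen only for illustration.

```r
# Setting the same seed before each draw reproduces the same random values
set.seed(110)  # arbitrary seed, an assumption for this example
first_draw <- runif(5, 10, 100)

set.seed(110)
second_draw <- runif(5, 10, 100)

identical(first_draw, second_draw)  # TRUE
```

Calling `set.seed()` once before the data frame is defined would have the same effect for the simulated `Value` column.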

Aggregate data for alluvial plot

# Aggregate data for alluvial plot
data_aggregated <- data %>%
  group_by(Year, Region) %>%
  summarise(TotalValue = sum(Value), .groups = 'drop')

Create alluvial plot

# Create alluvial plot
ggplot(data = data_aggregated,
       aes(axis1 = Year, axis2 = Region, y = TotalValue)) +
  geom_alluvium(aes(fill = Region)) +
  geom_stratum(width = 1/8, aes(fill = Region)) +  # Adjust width for better visualization
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Year", "Region"), expand = c(0.05, 0)) +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "US Trends in Binge Drinking by Year and Region",
       x = "",
       y = "Total Value",
       fill = "Region") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Conclusion

In the second analysis, I refined the dataset by removing the gender category to concentrate on the intricacies of regional and temporal trends in binge drinking. The adjusted focus revealed some intriguing results: Over time, the West and Southwest have shown a downward trend in binge drinking.

The highest levels of binge drinking appeared at the start of the study, in 2011. In the Southwest, a noticeable decline occurred in 2016, as indicated by a thin line in the visualization. In contrast, during the same period, the Southeast exhibited high rates of binge drinking. Despite its lighter color in the visual, the Northeast maintained a relatively even distribution across the years, with noticeable increases in 2018, 2019, and 2020. Conversely, the Midwest displayed a trend of decreasing binge drinking as the years progressed.
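Reading trends from band widths can be checked numerically. The sketch below fits a straight line per region and reports the slope, where a negative slope indicates a decline over time and a positive slope an increase. It assumes a linear trend is a reasonable summary, and the `toy_trend` values are hypothetical rather than taken from the analysis.

```r
library(dplyr)

# Hypothetical yearly totals for two regions
toy_trend <- data.frame(
  Year = rep(2011:2015, times = 2),
  Region = rep(c("Midwest", "Southeast"), each = 5),
  TotalValue = c(50, 48, 46, 44, 42, 30, 33, 36, 39, 42)
)

# Fit TotalValue ~ Year within each region and keep the slope,
# i.e. the average change in TotalValue per year
region_slopes <- toy_trend %>%
  group_by(Region) %>%
  summarise(
    slope = coef(lm(TotalValue ~ Year))[["Year"]],
    .groups = "drop"
  )
```

The same summary could be applied to `data_aggregated`, which already has `Year`, `Region`, and `TotalValue` columns.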

The results of my analyses reveal the “what,” “where,” and “how” of binge drinking trends. Based on this study, the CDC and state officials must facilitate public health campaigns to spread the word and help prevent chronic diseases such as chronic liver disease. Educational campaigns must also be culturally appropriate and accommodating in reaching under-served populations and communities to ensure an equitable process.

For future analysis, I want to understand how education, age, income, or socioeconomic status contribute to binge drinking.