1 Introduction

Excessive alcohol consumption is a major public health issue in the United States, contributing to preventable mortality and reduced life expectancy. Alcohol-related harm has been widely documented in epidemiological and forensic literature, with estimates of more than 3 million alcohol-related deaths globally each year (World Health Organization, 2018; Li et al., 2017). In the United States specifically, excessive alcohol consumption has been linked to approximately 88,000 deaths annually, highlighting its significance as a population health concern (Stahre et al., 2014).

This project uses United States per-capita ethanol consumption data to explore how alcohol consumption varies over time, across regions, and between beverage types (beer, wine, and spirits). It also evaluates whether beverage-specific consumption can be used to predict total alcohol consumption and whether state-level drinking patterns differ based on dominant beverage type.

The aim of this analysis is to provide a clearer understanding of how alcohol consumption patterns shift across geography and time in the United States. By highlighting structural patterns in consumption, it is intended to support broader public health interpretation that may be relevant for prevention strategies and policy development.

2 Methods

This study uses alcohol consumption data from the United States (Shahrukh, 2023), structured at national, regional, and state levels. The dataset includes per-capita ethanol consumption for beer, wine, spirits, and total alcohol for the years 1977 to 2023. Ethanol is the type of alcohol present in alcoholic beverages, so expressing consumption as gallons of ethanol per capita provides a standardized measure of pure alcohol intake that is comparable across beverage types.
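
As a rough illustration of why ethanol volume works as a common unit, the sketch below converts beverage servings to ethanol volume and expresses a per-capita figure in US standard drinks. The serving sizes and alcohol-by-volume (ABV) values are the typical figures from the NIAAA standard drink definition (National Institute on Alcohol Abuse and Alcoholism, 2024), not values taken from the dataset, and the names `serving_oz`, `abv`, and `gallons_to_drinks` are hypothetical.

```r
# Typical US serving sizes (fluid ounces) and ABV; assumed illustrative values,
# not taken from the dataset
serving_oz <- c(beer = 12, wine = 5, spirits = 1.5)
abv        <- c(beer = 0.05, wine = 0.12, spirits = 0.40)

# Ethanol per serving in fluid ounces: each comes to roughly 0.6
ethanol_oz <- serving_oz * abv
ethanol_oz

# Convert gallons of ethanol per capita to standard drinks per year
# (1 US gallon = 128 US fluid ounces; about 0.6 fl oz of ethanol per drink)
gallons_to_drinks <- function(gallons) gallons * 128 / 0.6

# National mean total consumption reported in Table 1
gallons_to_drinks(2.386808)
```

Under these assumptions, the national mean of about 2.39 gallons of ethanol per capita corresponds to roughly 509 standard drinks per capita per year.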

The data were imported into R and analysed using the dplyr, ggplot2, and knitr packages. The dataset was then divided into three subsets (national, regional, and state-level) to allow analysis at different geographic scales. This step was necessary to distinguish broad national trends from more localized variation across regions and states.

To assess how alcohol consumption changes over time on a national level, visual trend analysis (scatter plots and smoothing lines), Pearson correlation, and linear regression models were used. These methods were selected because they allow both visual interpretation of trends and quantification of the strength and direction of relationships over time.

Regional differences were examined using descriptive statistics and box plots to compare distributions across groups. A one-way ANOVA was then applied to test whether observed differences in mean consumption between regions were statistically significant, followed by Tukey post-hoc tests to identify specific group differences.

To evaluate changes in beverage preferences, separate correlation and linear regression models were fitted for beer, wine, and spirits. This allowed comparison of how each beverage type contributes to overall trends in alcohol consumption.

Predictive modelling was used to assess whether beverage-specific consumption could estimate total alcohol consumption. An 80/20 train-test split was used to evaluate model performance. Model accuracy was assessed using RMSE and out-of-sample R² to determine predictive reliability.
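
To make the two metrics concrete, the sketch below computes RMSE and out-of-sample R² for a small made-up set of observed and predicted values. The helper names `rmse_fn` and `r2_fn` and the toy numbers are hypothetical; only the formulas are meant to mirror those used in the analysis.

```r
# RMSE: square root of the mean squared prediction error
rmse_fn <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Out-of-sample R²: 1 - SSE / SST, with SST measured around the mean of the
# held-out observations
r2_fn <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Toy held-out values (made up for illustration)
obs  <- c(1, 2, 3)
pred <- c(1, 2, 4)

rmse_fn(obs, pred)  # sqrt(1/3), about 0.577
r2_fn(obs, pred)    # 1 - 1/2 = 0.5
```

Because this R² is computed on held-out data, it can be negative when a model predicts worse than simply using the test-set mean.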

Finally, states were classified by dominant beverage type, defined as the beverage category (beer, wine, or spirits) with the highest average per-capita consumption over time. This classification was used to assess whether differences in dominant consumption patterns correspond to differences in total alcohol intake.

3 Results

Results begin with data preparation and then follow the five research questions in order. Each section includes descriptive statistics, visualizations, and statistical tests where appropriate to assess patterns in alcohol consumption across time, region, and beverage type.

3.1 Data set preparation

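# Load the packages used throughout the analysis
library(dplyr)
library(ggplot2)

# Labels that identify regional aggregate rows rather than individual states
region_names <- c("Midwest Region", "South Region", "Northeast Region", "West Region")

# National totals
national_data <- alcohol_data[alcohol_data$state_name == "Us Total", ]

# Regional aggregates
region_data <- alcohol_data[alcohol_data$state_name %in% region_names, ]

# Individual states: rows that are neither the national total nor a region
state_data <- alcohol_data[!(alcohol_data$state_name %in% c("Us Total", region_names)), ]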

The dataset was filtered into national, regional, and state-level subsets so that each research question could be analysed at the appropriate geographic scale.

3.2 Question 1: Has total alcohol consumption changed over time in the United States?

3.2.1 Descriptive statistics

summary(national_data$ethanol_all_drinks_gallons_per_capita)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.150   2.245   2.330   2.387   2.525   2.760
overall_summary <- data.frame(
  Mean = mean(national_data$ethanol_all_drinks_gallons_per_capita),
  SD = sd(national_data$ethanol_all_drinks_gallons_per_capita),
  Min = min(national_data$ethanol_all_drinks_gallons_per_capita),
  Max = max(national_data$ethanol_all_drinks_gallons_per_capita)
)

knitr::kable(overall_summary,
             caption = "Table 1. Summary statistics for US alcohol consumption.")
Table 1. Summary statistics for US alcohol consumption.
Mean      SD         Min   Max
2.386808  0.1838845  2.15  2.76

Table 1 summarises the distribution of national per-capita ethanol consumption over the study period. Mean consumption was 2.39 gallons per capita (SD = 0.18), with values ranging from 2.15 to 2.76 gallons.

3.2.2 Distribution of total alcohol consumption

Figure 1. Distribution of total alcohol consumption per capita in the United States.

ggplot(national_data, aes(x = ethanol_all_drinks_gallons_per_capita)) +
  geom_histogram(bins = 20, fill = "orange", color = "black") +
  theme_minimal() +
  labs(
    x = "Total ethanol (gallons per capita)",
    y = "Frequency"
  )

Figure 1 shows how national per-capita consumption is distributed across the observed range of values. Frequency is the number of study years in which total consumption fell within a given bin.

3.2.3 Relationship between year and alcohol consumption

Figure 2. National alcohol consumption in gallons per capita for each year of the study period.

ggplot(national_data, aes(x = year, y = ethanol_all_drinks_gallons_per_capita)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Total ethanol (gallons per capita)"
  )

Figure 3. Linear trend in total alcohol consumption over time in the United States.

ggplot(national_data, aes(x = year, y = ethanol_all_drinks_gallons_per_capita)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

The fitted line suggests a general decline in consumption across the study period (Figure 3).

3.2.4 Correlation analysis

cor(national_data$year,
    national_data$ethanol_all_drinks_gallons_per_capita,
    use = "complete.obs")
## [1] -0.4645642

A Pearson correlation was used to assess the relationship between year and total alcohol consumption. The results indicate a moderate negative relationship (r = -0.46), suggesting that consumption has decreased over time.

3.2.5 Linear regression model

model1 <- lm(ethanol_all_drinks_gallons_per_capita ~ year,
             data = national_data)

summary(model1)
## 
## Call:
## lm(formula = ethanol_all_drinks_gallons_per_capita ~ year, data = national_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26796 -0.13977  0.02042  0.12385  0.28403 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14.84749    3.54085   4.193 0.000127 ***
## year        -0.00623    0.00177  -3.519 0.001003 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1646 on 45 degrees of freedom
## Multiple R-squared:  0.2158, Adjusted R-squared:  0.1984 
## F-statistic: 12.38 on 1 and 45 DF,  p-value: 0.001003

A simple linear regression model was used to estimate the effect of year on total alcohol consumption. The results indicate that year is a statistically significant predictor of consumption (p = 0.001), with a negative coefficient (β = -0.0062). This suggests a gradual decline in per-capita alcohol consumption over time.
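
As a back-of-the-envelope reading of the fitted line, the sketch below plugs the rounded coefficients from the output above into the regression equation. Because the coefficients are rounded, the results are approximate, and the variable names are illustrative only.

```r
# Rounded coefficients copied from summary(model1)
intercept <- 14.84749
slope     <- -0.00623

# Fitted per-capita consumption at the start and end of the study period
fitted_1977 <- intercept + slope * 1977
fitted_2023 <- intercept + slope * 2023

# Implied change over the 46 years between 1977 and 2023
total_change <- slope * (2023 - 1977)

round(c(fitted_1977, fitted_2023, total_change), 2)  # about 2.53, 2.24, -0.29
```

On this reading, the fitted line implies a decline of roughly 0.29 gallons of ethanol per capita across the study period, which is modest relative to the mean of about 2.39 gallons.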

However, the model explains a limited proportion of variance (R² = 0.2158, or about 22%), indicating that additional factors likely contribute to changes in consumption patterns.
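
In a simple linear regression with a single predictor, R² equals the square of the Pearson correlation between the predictor and the outcome, which gives a quick consistency check between the correlation and regression results. The sketch below uses only the correlation value printed in Section 3.2.4; the variable names are illustrative.

```r
# Pearson correlation between year and total consumption (Section 3.2.4)
r <- -0.4645642

# Squared correlation; should match Multiple R-squared from summary(model1)
r_squared <- r^2
round(r_squared, 4)  # 0.2158
```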

3.2.6 Model diagnostics

shapiro.test(residuals(model1))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(model1)
## W = 0.95664, p-value = 0.07918
par(mfrow = c(2, 2))
plot(model1)

A Shapiro-Wilk test was conducted to assess the normality of the model residuals. The test was not statistically significant (W = 0.957, p = 0.079), indicating that the residuals do not significantly deviate from a normal distribution. Diagnostic plots were then used to assess the remaining model assumptions. They suggest that the relationship between year and alcohol consumption is not strictly linear. Instead, the trend appears to fluctuate over time, with periods of increase and decrease.

3.2.7 Interpretation of findings for Question 1.

Overall, the analysis suggests that total alcohol consumption in the United States has declined moderately over time, as supported by both correlation and regression results.

However, the relationship is not perfectly linear. Visual inspection of the data suggests periods of increase in earlier decades, decline in the 1990s, and partial increases in more recent years.

3.3 Question 2: Do regions differ in total alcohol consumption?

3.3.1 Descriptive statistics by region.

Regional summary statistics were calculated to compare mean alcohol consumption across U.S. regions.

Table 2. Summary statistics of total alcohol consumption by US region.

region_summary <- region_data %>%
  group_by(state_name) %>%
  summarise(
    mean_total = mean(ethanol_all_drinks_gallons_per_capita, na.rm = TRUE),
    sd_total = sd(ethanol_all_drinks_gallons_per_capita, na.rm = TRUE),
    n = n()
  )

knitr::kable(region_summary,
             caption = "Table 2. Summary statistics of alcohol consumption by region")
Table 2. Summary statistics of alcohol consumption by region
state_name        mean_total  sd_total   n
Midwest Region    2.346170    0.1452218  47
Northeast Region  2.394043    0.2275738  47
South Region      2.290000    0.1271938  47
West Region       2.594468    0.3153660  47

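Table 2 shows that mean consumption was highest in the West (2.59 gallons per capita) and lowest in the South (2.29 gallons per capita). The West also showed the greatest variability (SD = 0.32). Each region contributes 47 annual observations.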

3.3.2 Visual comparison of regions

Figure 4. Distribution of total alcohol consumption across US regions.

ggplot(region_data, aes(x = state_name, y = ethanol_all_drinks_gallons_per_capita)) +
  geom_boxplot(fill = "lightyellow") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    x = "Region",
    y = "Total ethanol (gallons per capita)"
  )

The West region appears to have higher median consumption and greater variability compared to other regions (Figure 4).

3.3.3 One-way ANOVA

anova_model <- aov(ethanol_all_drinks_gallons_per_capita ~ state_name,
                   data = region_data)

summary(anova_model)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## state_name    3  2.477  0.8256   17.52 4.77e-10 ***
## Residuals   184  8.672  0.0471                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A one-way ANOVA was conducted to test whether mean alcohol consumption differs across U.S. regions.

The results indicate that there is a statistically significant difference between at least two regions (p < 0.001).
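
The F value in the ANOVA table can be reproduced by hand from the sums of squares and degrees of freedom, which makes explicit what the test compares. The sketch below uses the rounded values printed above, so it agrees with the reported F only up to rounding; the variable names are illustrative.

```r
# Values copied from summary(anova_model)
ss_between <- 2.477; df_between <- 3
ss_within  <- 8.672; df_within  <- 184

# Mean squares: variation between regions vs. variation within regions
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value under the F distribution
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(f_value, 2)  # about 17.52
p_value < 0.001    # TRUE
```

The degrees of freedom follow from the design: four regions give 4 - 1 = 3 between-group degrees of freedom, and 4 × 47 = 188 observations give 188 - 4 = 184 residual degrees of freedom.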

3.3.4 Post-hoc Tukey test and test for normality

TukeyHSD(anova_model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = ethanol_all_drinks_gallons_per_capita ~ state_name, data = region_data)
## 
## $state_name
##                                        diff         lwr        upr     p adj
## Northeast Region-Midwest Region  0.04787234 -0.06823322 0.16397790 0.7088266
## South Region-Midwest Region     -0.05617021 -0.17227578 0.05993535 0.5931770
## West Region-Midwest Region       0.24829787  0.13219231 0.36440344 0.0000006
## South Region-Northeast Region   -0.10404255 -0.22014812 0.01206301 0.0964794
## West Region-Northeast Region     0.20042553  0.08431997 0.31653110 0.0000782
## West Region-South Region         0.30446809  0.18836252 0.42057365 0.0000000
shapiro.test(residuals(anova_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(anova_model)
## W = 0.95286, p-value = 6.796e-06

The Tukey comparisons show that the West region has significantly higher alcohol consumption than the Midwest, Northeast, and South regions (all adjusted p < 0.001). However, no statistically significant differences were observed among the Midwest, Northeast, and South regions. In addition, the Shapiro-Wilk test indicated that the residuals of the ANOVA model deviate significantly from a normal distribution (p = 6.796e-06), so the ANOVA results should be interpreted with some caution.

3.3.5 Interpretation of findings for Question 2.

Regional differences in alcohol consumption are statistically significant, especially with regard to the West region, which differs from each of the other three regions.

3.4 Question 3: Is the consumption of beer, wine, and spirits changing over time?

3.4.1 Beer consumption over time.

Figure 5. Beer consumption per capita over time in the United States.

ggplot(national_data, aes(x = year, y = ethanol_beer_gallons_per_capita)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Beer consumption (gallons per capita)"
  )

Beer consumption shows a strong downward trend over time, suggesting a long-term decline (Figure 5).

3.4.2 Wine consumption over time.

Figure 6. Wine consumption per capita over time in the United States.

ggplot(national_data, aes(x = year, y = ethanol_wine_gallons_per_capita)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Wine consumption (gallons per capita)"
  )

Wine consumption shows an overall upward trend, indicating increasing popularity over time, although there was a dip in consumption in the 1990s (Figure 6).

3.4.3 Spirits consumption over time.

Figure 7. Spirits consumption per capita over time in the United States.

ggplot(national_data, aes(x = year, y = ethanol_spirit_gallons_per_capita)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Spirits consumption (gallons per capita)"
  )

Spirits consumption shows no strong visible linear trend over time (Figure 7).

3.4.4 Correlation analysis.

cor_beer <- cor(national_data$year, national_data$ethanol_beer_gallons_per_capita, use = "complete.obs")
cor_wine <- cor(national_data$year, national_data$ethanol_wine_gallons_per_capita, use = "complete.obs")
cor_spirits <- cor(national_data$year, national_data$ethanol_spirit_gallons_per_capita, use = "complete.obs")

cor_beer
## [1] -0.957177
cor_wine
## [1] 0.6961335
cor_spirits
## [1] -0.1142831

The correlation analysis shows:

  1. Beer has a very strong negative relationship with year (r = -0.96).
  2. Wine has a moderate positive relationship with year (r = 0.70).
  3. Spirits show a weak negative relationship with year (r = -0.11).

3.4.5 Linear regression models.

beer_model <- lm(ethanol_beer_gallons_per_capita ~ year, data = national_data)
wine_model <- lm(ethanol_wine_gallons_per_capita ~ year, data = national_data)
spirit_model <- lm(ethanol_spirit_gallons_per_capita ~ year, data = national_data)

summary(beer_model)
## 
## Call:
## lm(formula = ethanol_beer_gallons_per_capita ~ year, data = national_data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.121436 -0.014583  0.004625  0.022002  0.047387 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.4201110  0.6855023   23.95   <2e-16 ***
## year        -0.0076018  0.0003427  -22.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03187 on 45 degrees of freedom
## Multiple R-squared:  0.9162, Adjusted R-squared:  0.9143 
## F-statistic: 491.9 on 1 and 45 DF,  p-value: < 2.2e-16
summary(wine_model)
## 
## Call:
## lm(formula = ethanol_wine_gallons_per_capita ~ year, data = national_data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.065555 -0.032827  0.001078  0.030648  0.065407 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.8792414  0.8056693  -6.056 2.58e-07 ***
## year         0.0026203  0.0004028   6.505 5.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03746 on 45 degrees of freedom
## Multiple R-squared:  0.4846, Adjusted R-squared:  0.4731 
## F-statistic: 42.31 on 1 and 45 DF,  p-value: 5.547e-08
summary(spirit_model)
## 
## Call:
## lm(formula = ethanol_spirit_gallons_per_capita ~ year, data = national_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19149 -0.11589 -0.03236  0.10894  0.29989 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.222303   3.128830   1.030    0.309
## year        -0.001207   0.001564  -0.772    0.444
## 
## Residual standard error: 0.1455 on 45 degrees of freedom
## Multiple R-squared:  0.01306,    Adjusted R-squared:  -0.008871 
## F-statistic: 0.5955 on 1 and 45 DF,  p-value: 0.4443

These models confirm the patterns observed in the plots and correlations. Beer consumption shows a statistically significant decline over time (p < 0.001), while wine consumption shows a significant increase over time (p < 0.001). Spirits consumption, on the other hand, is not significantly associated with year (p = 0.444).

3.4.6 Interpretation of Question 3.

Beverage-specific trends suggest clear shifts in alcohol preferences over time in the United States.

3.5 Question 4: Can total alcohol consumption be predicted using individual beverage types?

This section evaluates whether total alcohol consumption in U.S. states can be predicted using only beer, wine, or spirits consumption. A full model including all beverage types is also included for comparison.

3.5.1 Data splitting.

set.seed(123)

# Draw 80% of row indices at random for the training set
sample_index <- sample(seq_len(nrow(state_data)), size = floor(0.8 * nrow(state_data)))

train_data <- state_data[sample_index, ]
test_data <- state_data[-sample_index, ]

The state-level data were split into training (80%) and testing (20%) sets so that model generalisability could be evaluated on out-of-sample observations.

Four linear regression models were constructed:

model_beer <- lm(ethanol_all_drinks_gallons_per_capita ~ ethanol_beer_gallons_per_capita,
                 data = train_data)

model_wine <- lm(ethanol_all_drinks_gallons_per_capita ~ ethanol_wine_gallons_per_capita,
                 data = train_data)

model_spirits <- lm(ethanol_all_drinks_gallons_per_capita ~ ethanol_spirit_gallons_per_capita,
                    data = train_data)

model_full <- lm(ethanol_all_drinks_gallons_per_capita ~ 
                   ethanol_beer_gallons_per_capita +
                   ethanol_wine_gallons_per_capita +
                   ethanol_spirit_gallons_per_capita,
                 data = train_data)

3.5.2 Model evaluation.

Predictions were generated on the test set and evaluated using RMSE and R².

# Observed totals in the held-out test set
actual <- test_data$ethanol_all_drinks_gallons_per_capita

# RMSE: square root of the mean squared prediction error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Out-of-sample R²: 1 - SSE / SST, with SST taken around the test-set mean
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

pred_beer <- predict(model_beer, newdata = test_data)
pred_wine <- predict(model_wine, newdata = test_data)
pred_spirits <- predict(model_spirits, newdata = test_data)
pred_full <- predict(model_full, newdata = test_data)

rmse_beer <- rmse(actual, pred_beer)
rmse_wine <- rmse(actual, pred_wine)
rmse_spirits <- rmse(actual, pred_spirits)
rmse_full <- rmse(actual, pred_full)

r2_beer <- r2(actual, pred_beer)
r2_wine <- r2(actual, pred_wine)
r2_spirits <- r2(actual, pred_spirits)
r2_full <- r2(actual, pred_full)

3.5.3 Model comparison.

Table 4. Model performance comparison for predicting total alcohol consumption (test set).

model_comparison <- data.frame(
  Model = c("Beer only", "Wine only", "Spirits only", "Full model"),
  RMSE = c(rmse_beer, rmse_wine, rmse_spirits, rmse_full),
  R2 = c(r2_beer, r2_wine, r2_spirits, r2_full)
)

knitr::kable(model_comparison,
             caption = "Table 4. Predictive performance of alcohol consumption models (test set)")
Table 4. Predictive performance of alcohol consumption models (test set)
Model         RMSE       R2
Beer only     0.4867822  0.4331973
Wine only     0.4499380  0.5157519
Spirits only  0.2388452  0.8635431
Full model    0.0058068  0.9999193

Table 4 compares model performance using RMSE (error) and R² (explained variance). Lower RMSE and higher R² indicate better predictive performance.

3.5.4 Results of the predictive models and interpretation of Question 4.

The predictive performance of the models varies depending on which beverage type is used as the predictor. The spirits-only model performs best of the single-predictor models (R² = 0.86), wine shows a moderate level of predictive ability (R² = 0.52), and beer performs weakest (R² = 0.43). When all three beverage types are included together, the full model is almost perfectly accurate (R² > 0.999). This is expected because total alcohol consumption is directly made up of beer, wine, and spirits, so the variables are inherently linked rather than independent. These results suggest that spirits are the most informative single predictor of total alcohol consumption, while beer contributes the least information when considered on its own.
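
The near-perfect fit of the full model can be illustrated with synthetic data. In the sketch below, the beverage values are randomly generated rather than drawn from the dataset, and the total is defined as their sum, with every column rounded to two decimals. This rests on the assumption that the small residual error in the real data comes mainly from rounding, which the analysis itself does not verify.

```r
set.seed(1)
n <- 200

# Synthetic, unrounded per-capita ethanol values (made up for illustration)
beer_raw    <- runif(n, 0.8, 1.6)
wine_raw    <- runif(n, 0.1, 0.7)
spirits_raw <- runif(n, 0.4, 1.5)

# Each published column is rounded separately, including the total
sim <- data.frame(
  beer    = round(beer_raw, 2),
  wine    = round(wine_raw, 2),
  spirits = round(spirits_raw, 2),
  total   = round(beer_raw + wine_raw + spirits_raw, 2)
)

# Regressing the total on its own components recovers slopes close to 1
fit <- lm(total ~ beer + wine + spirits, data = sim)
round(coef(fit), 2)
summary(fit)$r.squared
```

In this simulation the slopes are close to 1 and R² is close to 1, with only rounding noise left in the residuals. This is consistent with, but does not prove, the interpretation that the full model mostly recovers an accounting identity rather than an empirical relationship.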

3.6 Question 5: Do states with different dominant alcohol types differ in total alcohol consumption?

This section examines whether states classified by their dominant beverage type differ in their average total alcohol consumption. The dominant beverage type is defined as the category (beer, wine, or spirits) with the highest average per-capita consumption within each state over the study period.

3.6.1 State-level summary and classifications.

state_summary <- state_data %>%
  group_by(state_name) %>%
  summarise(
    mean_beer = mean(ethanol_beer_gallons_per_capita, na.rm = TRUE),
    mean_wine = mean(ethanol_wine_gallons_per_capita, na.rm = TRUE),
    mean_spirits = mean(ethanol_spirit_gallons_per_capita, na.rm = TRUE),
    mean_total = mean(ethanol_all_drinks_gallons_per_capita, na.rm = TRUE)
  )

state_summary$dominant_beverage <- apply(
  state_summary[, c("mean_beer", "mean_wine", "mean_spirits")],
  1,
  function(x) {
    c("Beer", "Wine", "Spirits")[which.max(x)]
  }
)

States were classified according to their dominant beverage type based on average consumption levels.
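
The classification rule can be checked on a small made-up table. The sketch below uses three hypothetical states and base R's `max.col()` as an alternative to the row-wise `apply()` call above; it is an illustrative equivalent under those assumptions, not the code used in the analysis.

```r
# Hypothetical state-level means (not taken from the dataset)
toy_summary <- data.frame(
  state_name   = c("State A", "State B", "State C"),
  mean_beer    = c(1.20, 0.90, 0.80),
  mean_wine    = c(0.30, 0.40, 1.00),
  mean_spirits = c(0.70, 1.10, 0.60)
)

beverages <- c("Beer", "Wine", "Spirits")
means <- as.matrix(toy_summary[, c("mean_beer", "mean_wine", "mean_spirits")])

# Column index of each row's maximum; ties go to the first column,
# matching the behaviour of which.max()
toy_summary$dominant_beverage <- beverages[max.col(means, ties.method = "first")]

toy_summary[, c("state_name", "dominant_beverage")]
```

Here State A is classified as beer-dominant, State B as spirits-dominant, and State C as wine-dominant. Note that the rule looks only at which mean is largest, so a state with nearly equal beer and spirits means is still assigned to a single group.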

3.6.2 Distribution of dominant beverage types.

Table 5. Number of US states by dominant beverage type

knitr::kable(table(state_summary$dominant_beverage),
             col.names = c("Dominant beverage", "Number of states"),
             caption = "Table 5. Number of states by dominant beverage type")
Table 5. Number of states by dominant beverage type
Dominant beverage  Number of states
Beer               47
Spirits            4
state_classification <- state_summary[, c("state_name", "dominant_beverage")]

Table 6. Classification of US states by dominant beverage type

knitr::kable(state_classification,
             caption = "Table 6. State classification by dominant beverage type")
Table 6. State classification by dominant beverage type
state_name            dominant_beverage
Alabama               Beer
Alaska                Beer
Arizona               Beer
Arkansas              Beer
California            Beer
Colorado              Beer
Connecticut           Spirits
Delaware              Spirits
District Of Columbia  Spirits
Florida               Beer
Georgia               Beer
Hawaii                Beer
Idaho                 Beer
Illinois              Beer
Indiana               Beer
Iowa                  Beer
Kansas                Beer
Kentucky              Beer
Louisiana             Beer
Maine                 Beer
Maryland              Beer
Massachusetts         Beer
Michigan              Beer
Minnesota             Beer
Mississippi           Beer
Missouri              Beer
Montana               Beer
Nebraska              Beer
Nevada                Beer
New Hampshire         Spirits
New Jersey            Beer
New Mexico            Beer
New York              Beer
North Carolina        Beer
North Dakota          Beer
Ohio                  Beer
Oklahoma              Beer
Oregon                Beer
Pennsylvania          Beer
Rhode Island          Beer
South Carolina        Beer
South Dakota          Beer
Tennessee             Beer
Texas                 Beer
Utah                  Beer
Vermont               Beer
Virginia              Beer
Washington            Beer
West Virginia         Beer
Wisconsin             Beer
Wyoming               Beer

Table 5 shows the distribution of states across dominant beverage categories. No states were found to be classified as wine dominant.

Table 6 shows how each state was classified based on the beverage type with the highest mean per-capita consumption over time.

3.6.3 Comparison of total consumption across groups.

Figure 8. Mean total alcohol consumption by dominant beverage type across US states.

ggplot(state_summary, aes(x = dominant_beverage, y = mean_total)) +
  geom_boxplot(fill = "purple") +
  theme_minimal() +
  labs(
    x = "Dominant Beverage Type",
    y = "Mean total alcohol consumption"
  )

Figure 8 compares total alcohol consumption across states grouped by dominant beverage type. Spirits-dominant states appear to have higher average total consumption than beer-dominant states.

3.6.4 One-way ANOVA

anova_q5 <- aov(mean_total ~ dominant_beverage, data = state_summary)
summary(anova_q5)
##                   Df Sum Sq Mean Sq F value   Pr(>F)    
## dominant_beverage  1  5.053   5.053   21.72 2.46e-05 ***
## Residuals         49 11.402   0.233                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A one-way ANOVA was conducted to test whether mean total alcohol consumption differs across dominant beverage groups. The results indicate a statistically significant difference between groups (p < 0.001).

3.6.5 Post-hoc Tukey test and test for normality.

TukeyHSD(anova_q5)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mean_total ~ dominant_beverage, data = state_summary)
## 
## $dominant_beverage
##                  diff       lwr      upr    p adj
## Spirits-Beer 1.170813 0.6659116 1.675714 2.46e-05
shapiro.test(residuals(anova_q5))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(anova_q5)
## W = 0.96107, p-value = 0.09264

Post-hoc comparisons show that spirits-dominant states have significantly higher total alcohol consumption than beer-dominant states, by about 1.17 gallons per capita on average (adjusted p < 0.001). No comparisons involving wine-dominant states are possible, as no states fall into this category. A Shapiro-Wilk test of the model residuals was not statistically significant (p = 0.09264), indicating no significant deviation from a normal distribution.

3.6.6 Interpretation of Question 5.

States were primarily classified into beer-dominant and spirits-dominant groups, with no states identified as wine-dominant. This suggests that wine is not the primary beverage type in any state when averaged over time.

The analysis also shows that spirits-dominant states tend to have higher total alcohol consumption compared to beer-dominant states, and this difference is statistically significant.

However, this result should be interpreted with caution because the spirits-dominant group contains very few states, which limits how generalisable the finding is across the broader population.

4 Discussion

The results show that total alcohol consumption in the United States has generally declined over time, although the trend is not strictly linear. This suggests that consumption patterns fluctuate across different periods.

There are also clear regional differences in consumption, with the West region showing significantly higher average alcohol intake compared to other US regions. These differences may reflect underlying cultural or demographic variation, but the analysis does not include additional variables such as income, policy differences, or population structure, which limits interpretation.

Beverage-specific trends indicate a shift in alcohol preferences over time. Beer consumption has decreased substantially, wine consumption has increased, and spirits consumption has remained relatively stable. This suggests that changes in total alcohol consumption are partly driven by changing beverage preferences rather than uniform changes across all alcohol types. Additionally, predictive modelling shows that spirits consumption is the strongest single predictor of total alcohol intake, while beer is the weakest.

At the state level, spirits-dominant states tend to have higher total consumption than beer-dominant states, but this result is limited by the small number of spirits-dominant states and the absence of wine-dominant states.

5 Recommendations

The findings of this analysis suggest several implications for public health interpretation and potential policy focus. Specifically, the observed decline in beer consumption alongside increasing wine consumption suggests a shift in beverage preference rather than a uniform reduction in alcohol use. Public health messaging may benefit from reflecting these changing patterns, rather than focusing solely on overall consumption. Further, the consistently higher consumption observed in certain regions, particularly the West, suggests that alcohol-related interventions may need to be tailored geographically, since national-level strategies may not fully capture local risk profiles. Finally, the relatively strong association between spirits consumption and total alcohol intake suggests that spirits may serve as a useful indicator for identifying higher-consumption populations.

Future research could benefit from incorporating sociocultural context alongside consumption data.

6 Literature Cited

Li, R., Hu, L., Hu, L., Zhang, X., Phipps, R., Fowler, D. R., Chen, F., & Li, L. (2017). Evaluation of Acute Alcohol Intoxication as the Primary Cause of Death: A Diagnostic Challenge for Forensic Pathologists. Journal of Forensic Sciences, 62(5), 1213–1219. https://doi.org/10.1111/1556-4029.13412

National Institute on Alcohol Abuse and Alcoholism. (2024, December). What is a standard drink? https://www.niaaa.nih.gov/alcohols-effects-health/what-standard-drink

Shahrukh, S. I. (2023). US Alcohol Consumption by State 1977–2023. Kaggle. https://www.kaggle.com/datasets/sanaijlalshahrukh/us-alcohol-consumption-by-state-19772023

Stahre, M., Roeber, J., Kanny, D., Brewer, R. D., & Zhang, X. (2014). Contribution of Excessive Alcohol Consumption to Deaths and Years of Potential Life Lost in the United States. Preventing Chronic Disease, 11. https://doi.org/10.5888/pcd11.130293

World Health Organization. (2018, September 27). Global status report on alcohol and health 2018. https://www.who.int/publications/i/item/9789241565639