Excessive alcohol consumption is a major public health issue in the United States, contributing to preventable mortality and reduced life expectancy. Alcohol-related harm has been widely documented in the epidemiological and forensic literature, with estimates of more than 3 million alcohol-related deaths globally each year (World Health Organization, 2018; Li et al., 2017). In the United States specifically, excessive alcohol consumption has been linked to approximately 88,000 deaths annually, highlighting its significance as a population health concern (Stahre et al., 2014).
This project uses United States per-capita ethanol consumption data to explore how alcohol consumption varies over time, across regions, and between beverage types (beer, wine, and spirits). It also evaluates whether beverage-specific consumption can be used to predict total alcohol consumption and whether state-level drinking patterns differ based on dominant beverage type.
The aim of this analysis is therefore to clarify how alcohol consumption patterns shift across geography and time in the United States. By highlighting structural patterns in consumption, it is intended to support public health interpretation and to inform prevention strategies and policy development.
This study uses alcohol consumption data from the United States, structured at national, regional, and state levels. The dataset includes per-capita ethanol consumption for beer, wine, spirits, and total alcohol between 1977 and 2023. Ethanol is the type of alcohol present in alcoholic beverages, so expressing consumption as gallons of ethanol per capita provides a standardized measure of pure alcohol intake that is comparable across beverage types.
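To give the unit some intuition, gallons of ethanol can be converted into US standard drinks. The sketch below is not part of the original analysis. It assumes 128 US fluid ounces per gallon and 0.6 fluid ounces of pure ethanol per standard drink (the NIAAA definition); the function name `gallons_to_standard_drinks` and the example value of 2.39 gallons are illustrative.

```r
# Convert gallons of pure ethanol into US standard drinks.
# Assumptions: 128 US fl oz per gallon; 0.6 fl oz of ethanol per standard drink (NIAAA).
gallons_to_standard_drinks <- function(gallons_ethanol,
                                       oz_per_gallon = 128,
                                       oz_ethanol_per_drink = 0.6) {
  gallons_ethanol * oz_per_gallon / oz_ethanol_per_drink
}

# Illustrative value only: roughly the national mean reported in this analysis
drinks_per_year <- gallons_to_standard_drinks(2.39)
drinks_per_day <- drinks_per_year / 365

round(c(per_year = drinks_per_year, per_day = drinks_per_day), 1)
```

Under these assumptions, 2.39 gallons of ethanol per capita corresponds to about 510 standard drinks per person per year, or roughly 1.4 per day.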
The data were imported into R and analysed using the dplyr, ggplot2, and knitr packages. The dataset was then divided into three subsets (national, regional, and state-level) to allow analysis at different geographic scales. This step was necessary to distinguish broad national trends from more localized variation across regions and states.
To assess how alcohol consumption changes over time on a national level, visual trend analysis (scatter plots and smoothing lines), Pearson correlation, and linear regression models were used. These methods were selected because they allow both visual interpretation of trends and quantification of the strength and direction of relationships over time.
Regional differences were examined using descriptive statistics and box plots to compare distributions across groups. A one-way ANOVA was then applied to test whether observed differences in mean consumption between regions were statistically significant, followed by Tukey post-hoc tests to identify specific group differences.
To evaluate changes in beverage preferences, separate correlation and linear regression models were fitted for beer, wine, and spirits. This allowed comparison of how each beverage type contributes to overall trends in alcohol consumption.
Predictive modelling was used to assess whether beverage-specific consumption could estimate total alcohol consumption. An 80/20 train-test split was used to evaluate model performance. Model accuracy was assessed using RMSE and out-of-sample R² to determine predictive reliability.
Finally, states were classified by dominant beverage type, defined as the beverage category (beer, wine, or spirits) with the highest average per-capita consumption over time. This classification was used to assess whether differences in dominant consumption patterns correspond to differences in total alcohol intake.
Results are presented in six parts: data preparation, followed by one section for each of the five research questions. Each section includes descriptive statistics, visualizations, and statistical tests where appropriate to assess patterns in alcohol consumption across time, region, and beverage type.
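library(dplyr)    # provides %>%, group_by() and summarise()
library(ggplot2)  # provides ggplot() and the geom_*() layers

# Split the dataset by geographic level using the labels stored in state_name
# (alcohol_data is assumed to have been imported already)
national_data <- alcohol_data[alcohol_data$state_name == "Us Total", ]
region_data <- alcohol_data[alcohol_data$state_name %in%
  c("Midwest Region", "South Region", "Northeast Region", "West Region"), ]
state_data <- alcohol_data[!(alcohol_data$state_name %in%
  c("Us Total", "Midwest Region", "South Region", "Northeast Region", "West Region")), ]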
The dataset was filtered into national, regional, and state-level subsets so that each geographic scale could be analysed separately, beginning with national trends in total U.S. alcohol consumption over time.
summary(national_data$ethanol_all_drinks_gallons_per_capita)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.150 2.245 2.330 2.387 2.525 2.760
overall_summary <- data.frame(
Mean = mean(national_data$ethanol_all_drinks_gallons_per_capita),
SD = sd(national_data$ethanol_all_drinks_gallons_per_capita),
Min = min(national_data$ethanol_all_drinks_gallons_per_capita),
Max = max(national_data$ethanol_all_drinks_gallons_per_capita)
)
knitr::kable(overall_summary,
caption = "Table 1. Summary statistics for US alcohol consumption.")
| Mean | SD | Min | Max |
|---|---|---|---|
| 2.386808 | 0.1838845 | 2.15 | 2.76 |
The summary statistics provide an overview of the distribution of national ethanol consumption in the United States over the observed period. Mean consumption was 2.39 gallons per capita (SD = 0.18), with values ranging from 2.15 to 2.76 (Table 1).
Figure 1. Distribution of total alcohol consumption per capita in the United States.
ggplot(national_data, aes(x = ethanol_all_drinks_gallons_per_capita)) +
geom_histogram(bins = 20, fill = "orange", color = "black") +
theme_minimal() +
labs(
x = "Total ethanol (gallons per capita)",
y = "Frequency"
)
The distribution shows variation across the observed range of values (Figure 1). The frequency on the y-axis is the number of years in the national data whose total per-capita ethanol consumption falls within each bin.
Figure 2. National alcohol consumption in gallons per capita for each year of the study period.
ggplot(national_data, aes(x = year, y = ethanol_all_drinks_gallons_per_capita)) +
geom_point(alpha = 0.7) +
theme_minimal() +
labs(
x = "Year",
y = "Total ethanol (gallons per capita)"
)
Figure 3. Linear trend in total alcohol consumption over time in the United States.
ggplot(national_data, aes(x = year, y = ethanol_all_drinks_gallons_per_capita)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE) +
theme_minimal()
The model suggests a general decline in consumption across the study period (Figure 3).
cor(national_data$year,
national_data$ethanol_all_drinks_gallons_per_capita,
use = "complete.obs")
## [1] -0.4645642
A Pearson correlation was used to assess the relationship between year and total alcohol consumption. The results indicate a moderate negative relationship (r = -0.46), suggesting that consumption has decreased over time.
model1 <- lm(ethanol_all_drinks_gallons_per_capita ~ year,
data = national_data)
summary(model1)
##
## Call:
## lm(formula = ethanol_all_drinks_gallons_per_capita ~ year, data = national_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.26796 -0.13977 0.02042 0.12385 0.28403
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.84749 3.54085 4.193 0.000127 ***
## year -0.00623 0.00177 -3.519 0.001003 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1646 on 45 degrees of freedom
## Multiple R-squared: 0.2158, Adjusted R-squared: 0.1984
## F-statistic: 12.38 on 1 and 45 DF, p-value: 0.001003
A simple linear regression model was used to estimate the effect of year on total alcohol consumption. The results indicate that year is a statistically significant predictor of consumption (p = 0.001), with a negative coefficient (β = -0.0062). This suggests a gradual decline in per-capita alcohol consumption over time.
However, the model explains only a limited proportion of the variance (R² = 0.2158, about 22%), indicating that additional factors likely contribute to changes in consumption patterns.
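In a simple linear regression with a single predictor, R² equals the square of the Pearson correlation, so the correlation and regression results are two views of the same relationship: (-0.4646)² ≈ 0.2158, which matches the Multiple R-squared in the regression output. The sketch below checks this identity on simulated data; `sim_year` and `sim_total` are hypothetical stand-ins, not the study data.

```r
# Simulated stand-in data (hypothetical; not the study dataset)
set.seed(1)
sim_year <- 1977:2023
sim_total <- 2.6 - 0.006 * (sim_year - 1977) + rnorm(length(sim_year), sd = 0.15)

r <- cor(sim_year, sim_total)
fit <- lm(sim_total ~ sim_year)
r_squared <- summary(fit)$r.squared

# With one predictor, r^2 and R^2 agree up to floating-point error
all.equal(r^2, r_squared)

# The same arithmetic applied to the correlation reported above
reported_r <- -0.4645642
round(reported_r^2, 4)
```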
shapiro.test(residuals(model1))
##
## Shapiro-Wilk normality test
##
## data: residuals(model1)
## W = 0.95664, p-value = 0.07918
par(mfrow = c(2, 2))
plot(model1)
A Shapiro-Wilk test was conducted to assess the normality of the model residuals. The test was not statistically significant (W = 0.957, p = 0.079), indicating that the residuals do not deviate significantly from a normal distribution. Diagnostic plots were then used to assess the remaining model assumptions. They suggest that the relationship between year and alcohol consumption is not strictly linear. Instead, the trend appears to fluctuate over time, with periods of increase and decrease.
Overall, the analysis suggests that total alcohol consumption in the United States has declined moderately over time, as supported by both correlation and regression results.
However, the relationship is not perfectly linear. Visual inspection of the data suggests periods of increase in earlier decades, decline in the 1990s, and partial increases in more recent years.
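One way to make this kind of non-linearity visible is to compare the straight-line fit with a flexible smoother. The sketch below is not part of the original analysis: it uses a synthetic series (`sim_total`, hypothetical) that falls and then rises again, fits both `lm()` and base R's `loess()`, and compares their in-sample residual error. The `span = 0.5` setting is an arbitrary choice for illustration.

```r
# Synthetic series (hypothetical; not the study data)
set.seed(42)
sim <- data.frame(year = 1977:2023)
sim$sim_total <- 2.4 + 0.2 * cos(2 * pi * (sim$year - 1977) / 46) +
  rnorm(nrow(sim), sd = 0.02)

linear_fit <- lm(sim_total ~ year, data = sim)
smooth_fit <- loess(sim_total ~ year, data = sim, span = 0.5)

# In-sample root mean squared residual for each fit
rmse_linear <- sqrt(mean(residuals(linear_fit)^2))
rmse_smooth <- sqrt(mean(residuals(smooth_fit)^2))
c(linear = rmse_linear, loess = rmse_smooth)

# Overlay both fits on the data: dashed line = linear, solid line = loess
plot(sim$year, sim$sim_total, xlab = "Year", ylab = "Simulated total ethanol")
abline(linear_fit, lty = 2)
lines(sim$year, fitted(smooth_fit))
```

A smoother will usually have lower in-sample error than a straight line, so a large gap between the two is best read as evidence of curvature rather than as evidence of a better predictive model.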
Regional summary statistics were calculated to compare mean alcohol consumption across U.S. regions.
Table 2. Summary statistics of total alcohol consumption by US region.
region_summary <- region_data %>%
group_by(state_name) %>%
summarise(
mean_total = mean(ethanol_all_drinks_gallons_per_capita, na.rm = TRUE),
sd_total = sd(ethanol_all_drinks_gallons_per_capita, na.rm = TRUE),
n = n()
)
knitr::kable(region_summary,
caption = "Table 2. Summary statistics of alcohol consumption by region")
| state_name | mean_total | sd_total | n |
|---|---|---|---|
| Midwest Region | 2.346170 | 0.1452218 | 47 |
| Northeast Region | 2.394043 | 0.2275738 | 47 |
| South Region | 2.290000 | 0.1271938 | 47 |
| West Region | 2.594468 | 0.3153660 | 47 |
Table 2 shows differences in mean alcohol consumption across regions, along with variability and sample sizes.
Figure 4. Distribution of total alcohol consumption across US regions.
ggplot(region_data, aes(x = state_name, y = ethanol_all_drinks_gallons_per_capita)) +
geom_boxplot(fill = "lightyellow") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = "Region",
y = "Total ethanol (gallons per capita)"
)
The West region appears to have higher median consumption and greater variability compared to other regions (Figure 4).
anova_model <- aov(ethanol_all_drinks_gallons_per_capita ~ state_name,
data = region_data)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## state_name 3 2.477 0.8256 17.52 4.77e-10 ***
## Residuals 184 8.672 0.0471
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A one-way ANOVA was conducted to test whether mean alcohol consumption differs across U.S. regions.
The results indicate that there is a statistically significant difference between at least two regions (p < 0.001).
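Statistical significance does not by itself indicate how large the regional effect is. As a rough worked example, eta squared can be computed directly from the sums of squares printed in the ANOVA table; the variable names below are illustrative, and this effect size was not reported in the original analysis.

```r
# Sums of squares copied from the ANOVA table above
ss_between <- 2.477  # state_name (region)
ss_within <- 8.672   # residuals

# Eta squared: share of total variation in consumption associated with region
eta_squared <- ss_between / (ss_between + ss_within)
round(eta_squared, 3)
```

This gives a value of about 0.22, meaning region accounts for roughly 22% of the variation in the regional yearly observations.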
TukeyHSD(anova_model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = ethanol_all_drinks_gallons_per_capita ~ state_name, data = region_data)
##
## $state_name
## diff lwr upr p adj
## Northeast Region-Midwest Region 0.04787234 -0.06823322 0.16397790 0.7088266
## South Region-Midwest Region -0.05617021 -0.17227578 0.05993535 0.5931770
## West Region-Midwest Region 0.24829787 0.13219231 0.36440344 0.0000006
## South Region-Northeast Region -0.10404255 -0.22014812 0.01206301 0.0964794
## West Region-Northeast Region 0.20042553 0.08431997 0.31653110 0.0000782
## West Region-South Region 0.30446809 0.18836252 0.42057365 0.0000000
shapiro.test(residuals(anova_model))
##
## Shapiro-Wilk normality test
##
## data: residuals(anova_model)
## W = 0.95286, p-value = 6.796e-06
The Tukey post-hoc comparisons show that the West region has significantly higher alcohol consumption than the Midwest, Northeast, and South regions (all adjusted p < 0.001). However, no statistically significant differences were observed among the Midwest, Northeast, and South regions. The Shapiro-Wilk test indicated that the residuals of the ANOVA model deviate significantly from a normal distribution (p < 0.001), so the ANOVA results should be interpreted with some caution.
Regional differences in alcohol consumption are statistically significant, particularly with regard to the West region.
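Because the ANOVA residuals depart from normality, one commonly used robustness check is a rank-based Kruskal-Wallis test, which does not assume normally distributed residuals. The sketch below was not run in the original analysis. It uses deterministic synthetic data (`sim_regions`, hypothetical) built from normal quantiles around assumed means and standard deviations loosely based on Table 2; `make_group` is an illustrative helper.

```r
# Deterministic synthetic values: n evenly spaced normal quantiles per group
make_group <- function(mean, sd, n = 47) qnorm(ppoints(n), mean = mean, sd = sd)

# Hypothetical stand-in for region_data (assumed means and SDs, not the real values)
sim_regions <- data.frame(
  region = factor(rep(c("Midwest", "Northeast", "South", "West"), each = 47)),
  total = c(make_group(2.35, 0.15), make_group(2.39, 0.23),
            make_group(2.29, 0.13), make_group(2.59, 0.32))
)

# Rank-based test of whether all regions share the same distribution
kw_result <- kruskal.test(total ~ region, data = sim_regions)
kw_result
```

With the real data, the equivalent call would use `ethanol_all_drinks_gallons_per_capita ~ state_name` on `region_data`. Agreement between this test and the ANOVA would support the conclusion about the West region.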
Figure 5. Beer consumption per capita over time in the United States.
ggplot(national_data, aes(x = year, y = ethanol_beer_gallons_per_capita)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
theme_minimal() +
labs(
x = "Year",
y = "Beer consumption (gallons per capita)"
)
Beer consumption shows a strong downward trend over time, suggesting a long-term decline (Figure 5).
Figure 6. Wine consumption per capita over time in the United States.
ggplot(national_data, aes(x = year, y = ethanol_wine_gallons_per_capita)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
theme_minimal() +
labs(
x = "Year",
y = "Wine consumption (gallons per capita)"
)
Wine consumption shows an overall upward trend, indicating increasing popularity over time, although there was a dip in consumption in the 1990s (Figure 6).
Figure 7. Spirits consumption per capita over time in the United States.
ggplot(national_data, aes(x = year, y = ethanol_spirit_gallons_per_capita)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
theme_minimal() +
labs(
x = "Year",
y = "Spirits consumption (gallons per capita)"
)
Spirits consumption shows no strong visible linear trend over time.
cor_beer <- cor(national_data$year, national_data$ethanol_beer_gallons_per_capita, use = "complete.obs")
cor_wine <- cor(national_data$year, national_data$ethanol_wine_gallons_per_capita, use = "complete.obs")
cor_spirits <- cor(national_data$year, national_data$ethanol_spirit_gallons_per_capita, use = "complete.obs")
cor_beer
## [1] -0.957177
cor_wine
## [1] 0.6961335
cor_spirits
## [1] -0.1142831
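The correlation analysis shows a strong negative relationship between year and beer consumption (r = -0.96), a moderately strong positive relationship for wine (r = 0.70), and a weak negative relationship for spirits (r = -0.11).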
beer_model <- lm(ethanol_beer_gallons_per_capita ~ year, data = national_data)
wine_model <- lm(ethanol_wine_gallons_per_capita ~ year, data = national_data)
spirit_model <- lm(ethanol_spirit_gallons_per_capita ~ year, data = national_data)
summary(beer_model)
##
## Call:
## lm(formula = ethanol_beer_gallons_per_capita ~ year, data = national_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121436 -0.014583 0.004625 0.022002 0.047387
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.4201110 0.6855023 23.95 <2e-16 ***
## year -0.0076018 0.0003427 -22.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03187 on 45 degrees of freedom
## Multiple R-squared: 0.9162, Adjusted R-squared: 0.9143
## F-statistic: 491.9 on 1 and 45 DF, p-value: < 2.2e-16
summary(wine_model)
##
## Call:
## lm(formula = ethanol_wine_gallons_per_capita ~ year, data = national_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.065555 -0.032827 0.001078 0.030648 0.065407
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.8792414 0.8056693 -6.056 2.58e-07 ***
## year 0.0026203 0.0004028 6.505 5.55e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03746 on 45 degrees of freedom
## Multiple R-squared: 0.4846, Adjusted R-squared: 0.4731
## F-statistic: 42.31 on 1 and 45 DF, p-value: 5.547e-08
summary(spirit_model)
##
## Call:
## lm(formula = ethanol_spirit_gallons_per_capita ~ year, data = national_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.19149 -0.11589 -0.03236 0.10894 0.29989
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.222303 3.128830 1.030 0.309
## year -0.001207 0.001564 -0.772 0.444
##
## Residual standard error: 0.1455 on 45 degrees of freedom
## Multiple R-squared: 0.01306, Adjusted R-squared: -0.008871
## F-statistic: 0.5955 on 1 and 45 DF, p-value: 0.4443
These models confirm the patterns observed in the plots and correlations. Beer consumption shows a statistically significant decline over time (p < 0.001), while wine consumption shows a significant increase over time (p < 0.001). Spirits consumption, on the other hand, is not significantly associated with year (p = 0.444).
Beverage-specific trends suggest clear shifts in alcohol preferences over time in the United States.
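The fitted slopes can be put on a more interpretable scale by multiplying them by the 46 years between 1977 and 2023. The worked arithmetic below copies the coefficients from the regression output above; the variable names are illustrative. It also checks that the three beverage slopes approximately sum to the slope of the total-consumption model (-0.00623), which is expected if total ethanol is, up to rounding, the sum of the three beverage series.

```r
# Slopes (gallons of ethanol per capita per year) copied from the models above
slopes <- c(beer = -0.0076018, wine = 0.0026203, spirits = -0.001207)

# Implied change over the 46 years from 1977 to 2023
round(slopes * 46, 3)

# Sum of the beverage slopes versus the slope of the total-consumption model
sum(slopes)          # about -0.00619
total_slope <- -0.00623
sum(slopes) - total_slope
```

Over the study period this corresponds to a fitted decline of about 0.35 gallons for beer, an increase of about 0.12 gallons for wine, and a small, non-significant decline of about 0.06 gallons for spirits.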
This section evaluates whether total alcohol consumption in U.S. states can be predicted using only beer, wine, or spirits consumption. A full model including all beverage types is also included for comparison.
set.seed(123)
# Randomly select 80% of the state-level rows for training
sample_index <- sample(seq_len(nrow(state_data)), floor(0.8 * nrow(state_data)))
train_data <- state_data[sample_index, ]
test_data <- state_data[-sample_index, ]
The state-level dataset was split into training (80%) and testing (20%) sets so that model performance could be evaluated on data not used for fitting.
Four linear regression models were constructed:
model_beer <- lm(ethanol_all_drinks_gallons_per_capita ~ ethanol_beer_gallons_per_capita,
data = train_data)
model_wine <- lm(ethanol_all_drinks_gallons_per_capita ~ ethanol_wine_gallons_per_capita,
data = train_data)
model_spirits <- lm(ethanol_all_drinks_gallons_per_capita ~ ethanol_spirit_gallons_per_capita,
data = train_data)
model_full <- lm(ethanol_all_drinks_gallons_per_capita ~
ethanol_beer_gallons_per_capita +
ethanol_wine_gallons_per_capita +
ethanol_spirit_gallons_per_capita,
data = train_data)
Predictions were generated on the test set and evaluated using RMSE and R².
pred_beer <- predict(model_beer, newdata = test_data)
pred_wine <- predict(model_wine, newdata = test_data)
pred_spirits <- predict(model_spirits, newdata = test_data)
pred_full <- predict(model_full, newdata = test_data)
rmse_beer <- sqrt(mean((test_data$ethanol_all_drinks_gallons_per_capita - pred_beer)^2))
rmse_wine <- sqrt(mean((test_data$ethanol_all_drinks_gallons_per_capita - pred_wine)^2))
rmse_spirits <- sqrt(mean((test_data$ethanol_all_drinks_gallons_per_capita - pred_spirits)^2))
rmse_full <- sqrt(mean((test_data$ethanol_all_drinks_gallons_per_capita - pred_full)^2))
r2_beer <- 1 - sum((test_data$ethanol_all_drinks_gallons_per_capita - pred_beer)^2) /
sum((test_data$ethanol_all_drinks_gallons_per_capita - mean(test_data$ethanol_all_drinks_gallons_per_capita))^2)
r2_wine <- 1 - sum((test_data$ethanol_all_drinks_gallons_per_capita - pred_wine)^2) /
sum((test_data$ethanol_all_drinks_gallons_per_capita - mean(test_data$ethanol_all_drinks_gallons_per_capita))^2)
r2_spirits <- 1 - sum((test_data$ethanol_all_drinks_gallons_per_capita - pred_spirits)^2) /
sum((test_data$ethanol_all_drinks_gallons_per_capita - mean(test_data$ethanol_all_drinks_gallons_per_capita))^2)
r2_full <- 1 - sum((test_data$ethanol_all_drinks_gallons_per_capita - pred_full)^2) /
sum((test_data$ethanol_all_drinks_gallons_per_capita - mean(test_data$ethanol_all_drinks_gallons_per_capita))^2)
Table 4. Model performance comparison for predicting total alcohol consumption (test set).
model_comparison <- data.frame(
Model = c("Beer only", "Wine only", "Spirits only", "Full model"),
RMSE = c(rmse_beer, rmse_wine, rmse_spirits, rmse_full),
R2 = c(r2_beer, r2_wine, r2_spirits, r2_full)
)
knitr::kable(model_comparison,
caption = "Table 4. Predictive performance of alcohol consumption models (test set)")
| Model | RMSE | R2 |
|---|---|---|
| Beer only | 0.4867822 | 0.4331973 |
| Wine only | 0.4499380 | 0.5157519 |
| Spirits only | 0.2388452 | 0.8635431 |
| Full model | 0.0058068 | 0.9999193 |
Table 4 compares model performance using RMSE (error) and R² (explained variance). Lower RMSE and higher R² indicate better predictive performance.
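The two metrics can be written as small helper functions, which makes their definitions explicit. This is a minimal sketch: the names `rmse` and `r2_out_of_sample` and the toy vectors are illustrative and do not appear in the original analysis, although the formulas match the calculations used for Table 4.

```r
# Root mean squared error: typical size of a prediction error, in the outcome's units
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Out-of-sample R^2: 1 minus the ratio of the model's squared error to the
# squared error of always predicting the mean of the observed values
r2_out_of_sample <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Toy values (hypothetical) to show the calculation
actual <- c(2, 3, 4)
predicted <- c(2.1, 2.9, 4.2)

rmse(actual, predicted)              # sqrt(0.02), about 0.141
r2_out_of_sample(actual, predicted)  # 1 - 0.06 / 2 = 0.97
```

Unlike the in-sample R² reported by `summary()`, this out-of-sample version can be negative when a model predicts worse than the mean of the test data.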
The predictive performance of the models varies depending on which beverage type is used as the predictor. The spirits-only model performs best of the single-predictor models (R² = 0.86, RMSE = 0.24), wine shows moderate predictive ability (R² = 0.52), and beer performs worst (R² = 0.43). When all three beverage types are included together, the full model predicts total consumption almost perfectly (R² > 0.999). This is expected because total alcohol consumption is directly made up of beer, wine, and spirits, so the variables are inherently linked rather than independent. This suggests that spirits are the most informative single predictor of total alcohol consumption, while beer contributes the least information when considered on its own.
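The near-perfect fit of the full model can be reproduced on simulated data in which the total is simply the sum of its parts. In the sketch below, all values and ranges are hypothetical, and rounding each series to two decimals is an assumption about how such data might be reported rather than a documented property of the dataset.

```r
# Simulated beverage values (hypothetical ranges; not the study data)
set.seed(7)
n <- 200
beer_raw <- runif(n, 0.8, 1.6)
wine_raw <- runif(n, 0.1, 0.8)
spirits_raw <- runif(n, 0.4, 1.5)

# Each series, including the total, is rounded to two decimals before modelling
sim <- data.frame(
  beer = round(beer_raw, 2),
  wine = round(wine_raw, 2),
  spirits = round(spirits_raw, 2),
  total = round(beer_raw + wine_raw + spirits_raw, 2)
)

full_fit <- lm(total ~ beer + wine + spirits, data = sim)
coef(full_fit)               # slopes close to 1, intercept close to 0
summary(full_fit)$r.squared  # very close to 1
sigma(full_fit)              # residual error on the scale of the rounding step
```

In this simulation the only source of error is rounding, so the residual error is a few thousandths of a gallon. That is similar in scale to the full model's RMSE in Table 4, although whether rounding is the actual source of the remaining error there is an assumption.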
This section examines whether states classified by their dominant beverage type (beer, wine, or spirits) differ in their average total alcohol consumption. The dominant beverage type is defined as the alcohol category (beer, wine, or spirits) with the highest average per-capita consumption within each state over the study period.
state_summary <- state_data %>%
group_by(state_name) %>%
summarise(
mean_beer = mean(ethanol_beer_gallons_per_capita, na.rm = TRUE),
mean_wine = mean(ethanol_wine_gallons_per_capita, na.rm = TRUE),
mean_spirits = mean(ethanol_spirit_gallons_per_capita, na.rm = TRUE),
mean_total = mean(ethanol_all_drinks_gallons_per_capita, na.rm = TRUE)
)
state_summary$dominant_beverage <- apply(
state_summary[, c("mean_beer", "mean_wine", "mean_spirits")],
1,
function(x) {
c("Beer", "Wine", "Spirits")[which.max(x)]
}
)
States were classified according to their dominant beverage type based on average consumption levels.
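The classification rule can be illustrated on a small example. The sketch below uses two hypothetical states with made-up means and applies base R's `max.col()`, a vectorised alternative to the row-wise `which.max()` call; with `ties.method = "first"` both approaches resolve ties in favour of the first column, here beer.

```r
# Hypothetical state means (illustrative values only)
toy <- data.frame(
  state_name = c("State A", "State B"),
  mean_beer = c(1.20, 1.10),
  mean_wine = c(0.40, 0.60),
  mean_spirits = c(0.80, 1.50)
)

beverages <- c("Beer", "Wine", "Spirits")
beverage_means <- as.matrix(toy[, c("mean_beer", "mean_wine", "mean_spirits")])

# max.col() returns, for each row, the index of the column with the largest value
toy$dominant_beverage <- beverages[max.col(beverage_means, ties.method = "first")]
toy
```

State A is classified as beer-dominant and State B as spirits-dominant, because those columns hold the largest mean in each row.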
Table 5. Number of US states by dominant beverage type
knitr::kable(as.data.frame(table(state_summary$dominant_beverage)),
  col.names = c("Dominant beverage", "Number of states"),
  caption = "Table 5. Number of states by dominant beverage type")
| Dominant beverage | Number of states |
|---|---|
| Beer | 47 |
| Spirits | 4 |
state_classification <- state_summary[, c("state_name", "dominant_beverage")]
Table 6. Classification of US states by dominant beverage type
knitr::kable(state_classification,
caption = "Table 6. State classification by dominant beverage type")
| state_name | dominant_beverage |
|---|---|
| Alabama | Beer |
| Alaska | Beer |
| Arizona | Beer |
| Arkansas | Beer |
| California | Beer |
| Colorado | Beer |
| Connecticut | Spirits |
| Delaware | Spirits |
| District Of Columbia | Spirits |
| Florida | Beer |
| Georgia | Beer |
| Hawaii | Beer |
| Idaho | Beer |
| Illinois | Beer |
| Indiana | Beer |
| Iowa | Beer |
| Kansas | Beer |
| Kentucky | Beer |
| Louisiana | Beer |
| Maine | Beer |
| Maryland | Beer |
| Massachusetts | Beer |
| Michigan | Beer |
| Minnesota | Beer |
| Mississippi | Beer |
| Missouri | Beer |
| Montana | Beer |
| Nebraska | Beer |
| Nevada | Beer |
| New Hampshire | Spirits |
| New Jersey | Beer |
| New Mexico | Beer |
| New York | Beer |
| North Carolina | Beer |
| North Dakota | Beer |
| Ohio | Beer |
| Oklahoma | Beer |
| Oregon | Beer |
| Pennsylvania | Beer |
| Rhode Island | Beer |
| South Carolina | Beer |
| South Dakota | Beer |
| Tennessee | Beer |
| Texas | Beer |
| Utah | Beer |
| Vermont | Beer |
| Virginia | Beer |
| Washington | Beer |
| West Virginia | Beer |
| Wisconsin | Beer |
| Wyoming | Beer |
Table 5 shows the distribution of states across dominant beverage categories: 47 were classified as beer-dominant and 4 as spirits-dominant (the 51 units include the District of Columbia). No states were classified as wine-dominant.
Table 6 shows how each state was classified based on the beverage type with the highest mean per-capita consumption over time.
Figure 8. Mean total alcohol consumption by dominant beverage type across US states.
ggplot(state_summary, aes(x = dominant_beverage, y = mean_total)) +
geom_boxplot(fill = "purple") +
theme_minimal() +
labs(
x = "Dominant Beverage Type",
y = "Mean total alcohol consumption"
)
Figure 8 compares mean total alcohol consumption across states grouped by dominant beverage type. Spirits-dominant states appear to have higher average total consumption than beer-dominant states.
anova_q5 <- aov(mean_total ~ dominant_beverage, data = state_summary)
summary(anova_q5)
## Df Sum Sq Mean Sq F value Pr(>F)
## dominant_beverage 1 5.053 5.053 21.72 2.46e-05 ***
## Residuals 49 11.402 0.233
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A one-way ANOVA was conducted to test whether mean total alcohol consumption differs across dominant beverage groups. The results indicate a statistically significant difference between groups (p < 0.001).
TukeyHSD(anova_q5)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mean_total ~ dominant_beverage, data = state_summary)
##
## $dominant_beverage
## diff lwr upr p adj
## Spirits-Beer 1.170813 0.6659116 1.675714 2.46e-05
shapiro.test(residuals(anova_q5))
##
## Shapiro-Wilk normality test
##
## data: residuals(anova_q5)
## W = 0.96107, p-value = 0.09264
Post-hoc comparisons show that spirits-dominant states have significantly higher total alcohol consumption than beer-dominant states. No comparisons involving wine-dominant states are possible, as no states fall into this category. A Shapiro-Wilk test on the model residuals was not statistically significant (W = 0.961, p = 0.093), providing no evidence that the residuals deviate from a normal distribution.
States were primarily classified into beer-dominant and spirits-dominant groups, with no states identified as wine-dominant. This suggests that wine is not the primary beverage type in any state when averaged over time.
The analysis also shows that spirits-dominant states tend to have higher total alcohol consumption compared to beer-dominant states, and this difference is statistically significant.
However, this result should be interpreted with caution because the spirits-dominant group contains very few states, which limits how generalisable the finding is across the broader population.
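With only two groups of very unequal size (47 versus 4), one option is to complement the ANOVA with Welch's t-test, which does not assume equal variances across groups. The sketch below was not part of the original analysis. It uses deterministic synthetic data (`sim_states`, hypothetical) with assumed group means and standard deviations; only the group sizes follow Table 5.

```r
# Deterministic synthetic values: n evenly spaced normal quantiles per group
make_group <- function(mean, sd, n) qnorm(ppoints(n), mean = mean, sd = sd)

# Hypothetical state means; the group means and SDs are assumptions
sim_states <- data.frame(
  dominant_beverage = factor(rep(c("Beer", "Spirits"), times = c(47, 4))),
  mean_total = c(make_group(2.30, 0.45, 47), make_group(3.47, 0.60, 4))
)

# t.test() uses Welch's unequal-variance version by default
welch_result <- t.test(mean_total ~ dominant_beverage, data = sim_states)
welch_result$method
welch_result$estimate   # group means
welch_result$conf.int   # confidence interval for the difference in means
```

With the real data the same call would be applied to `state_summary`. A wide confidence interval would reflect how little information four states provide about the spirits-dominant group.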
The results show that total alcohol consumption in the United States has generally declined over time, although the trend is not strictly linear. This suggests that consumption patterns fluctuate across different periods.
There are also clear regional differences in consumption, with the West region showing significantly higher average alcohol intake compared to other US regions. These differences may reflect underlying cultural or demographic variation, but the analysis does not include additional variables such as income, policy differences, or population structure, which limits interpretation.
Beverage-specific trends indicate a shift in alcohol preferences over time. Beer consumption has decreased substantially, wine consumption has increased, and spirits consumption has remained relatively stable. This suggests that changes in total alcohol consumption are partly driven by changing beverage preferences rather than uniform changes across all alcohol types. Additionally, predictive modelling shows that spirits consumption is the strongest single predictor of total alcohol intake, while beer is the weakest.
At the state level, spirits-dominant states tend to have higher total consumption than beer-dominant states, but this result is limited by the small number of spirits-dominant states and the absence of wine-dominant states.
The findings of this analysis suggest several implications for public health interpretation and potential policy focus. Specifically, the observed decline in beer consumption alongside increasing wine consumption suggests a shift in beverage preference rather than a uniform reduction in alcohol use. Public health messaging may benefit from reflecting these changing patterns, rather than focusing solely on overall consumption. Further, the consistently higher consumption observed in certain regions, particularly the West, suggests that alcohol-related interventions may need to be tailored geographically. Regional variation indicates that national-level strategies may not fully capture local risk profiles. Finally, the relatively strong association between spirits consumption and total alcohol intake suggests that spirits may serve as a useful indicator for identifying higher-consumption populations.
Future research could benefit from incorporating sociocultural context alongside consumption data.
Li, R., Hu, L., Hu, L., Zhang, X., Phipps, R., Fowler, D. R., Chen, F., & Li, L. (2017). Evaluation of Acute Alcohol Intoxication as the Primary Cause of Death: A Diagnostic Challenge for Forensic Pathologists. Journal of Forensic Sciences, 62(5), 1213–1219. https://doi.org/10.1111/1556-4029.13412
National Institute on Alcohol Abuse and Alcoholism. (2024, December). What is a standard drink? https://www.niaaa.nih.gov/alcohols-effects-health/what-standard-drink
Shahrukh, S. I. (2023). US Alcohol Consumption by State 1977–2023. Kaggle. https://www.kaggle.com/datasets/sanaijlalshahrukh/us-alcohol-consumption-by-state-19772023
Stahre, M., Roeber, J., Kanny, D., Brewer, R. D., & Zhang, X. (2014). Contribution of Excessive Alcohol Consumption to Deaths and Years of Potential Life Lost in the United States. Preventing Chronic Disease, 11. https://doi.org/10.5888/pcd11.130293
World Health Organization. (2018, September 27). Global status report on alcohol and health 2018. https://www.who.int/publications/i/item/9789241565639