Introduction

This descriptive analysis explores global alcohol consumption trends using data from the World Health Organization (WHO) Global Health Observatory. The dataset measures per capita alcohol consumption in liters of pure alcohol across countries worldwide, spanning 2001 through 2021. Over this 21-year window, the world experienced economic booms, recessions, shifting public health policies, and, most significantly, the COVID-19 pandemic. The data has been structured and visualized to provide general descriptive analytics of alcohol consumption across the world and across two decades.

# Libraries
library(data.table)
library(dplyr)
library(ggplot2)
library(scales)
library(ggrepel)
library(RColorBrewer)
library(ggthemes)
library(leaflet)
library(htmlwidgets)
library(plotly)

# Read in data, dropping columns not needed for analysis
df <- fread("who_alcohol_total_per_capita_2000_2022_clean.csv", na.strings = c("NA", ""),
            drop = c("sex", "IndicatorCode", "lower_ci", "upper_ci", "ci_width"))

Dataset & Preparation

The visualizations use three columns. The first, ‘country’, is character data giving the name of the country for each entry. The second, ‘year’, is the observation year stored as an integer. The third, ‘alcohol_liters_per_capita’, is a numeric value giving per capita alcohol consumption in liters of pure alcohol.

The original dataset spanned the years 2000 to 2022; however, the values recorded for 2000 were identical to those for 2001, and the values recorded for 2022 were identical to those for 2021. The years 2000 and 2022 were therefore excluded, and the resulting dataset contains 3,948 observations across 188 countries and 21 years (2001–2021).
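The duplicate-year observation can be checked programmatically. The following is a minimal base R sketch on invented toy data, not the WHO file itself; the helper name `years_identical` is hypothetical, and the column names are assumed to match those described above.

```r
# Toy data standing in for the WHO file (values are invented for illustration)
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(c(2000, 2001, 2021, 2022), times = 2),
  alcohol_liters_per_capita = c(5.1, 5.1, 6.3, 6.3, 9.0, 9.0, 8.2, 8.2)
)

# Hypothetical helper: TRUE when every country has the same value in both years
years_identical <- function(data, year_a, year_b) {
  a <- data[data$year == year_a, c("country", "alcohol_liters_per_capita")]
  b <- data[data$year == year_b, c("country", "alcohol_liters_per_capita")]
  both <- merge(a, b, by = "country", suffixes = c("_a", "_b"))
  nrow(both) > 0 &&
    isTRUE(all.equal(both$alcohol_liters_per_capita_a,
                     both$alcohol_liters_per_capita_b))
}

dup_start <- years_identical(toy, 2000, 2001)   # first two years
dup_end   <- years_identical(toy, 2021, 2022)   # last two years
not_dup   <- years_identical(toy, 2001, 2021)   # control pair that should differ

# Expected row count after dropping 2000 and 2022: countries x years
expected_rows <- 188 * length(2001:2021)
```

On the real data, the same helper could be applied to `df` before the 2000 and 2022 rows are removed; `expected_rows` reproduces the 3,948 observations stated above.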

# 2000 and 2001 have identical values across all columns (duplicate data carried forward)
# 2021 and 2022 have identical values across all columns (duplicate data carried forward)
# Removing 2000 and 2022 to eliminate duplicate data points

#keep only rows where year is neither 2000 nor 2022
#(a logical filter stays correct even if neither year is present, unlike negative row indexing)
df <- df[!(df$year %in% c(2000, 2022)),]

#identify the most recent year in the data set (2021)
max_year <- max(df$year)

Findings

The findings are based on a limited set of data; however, the descriptive analysis is intended to depict multiple relationships within the data collected. The visualizations begin with a global view of total consumption by year, then identify which countries lead the world in drinking and map their geographic concentration. United States alcohol consumption is then brought in as a point of comparison, and finally the U.S. is placed side by side with the top 10 alcohol-consuming nations.

Total Alcohol Consumption by Year

The chart below aggregates per capita consumption across all countries for each year, providing a global view of worldwide drinking trends.

Global consumption held remarkably steady from 2001 through the mid-2010s, fluctuating within a narrow band. The most dramatic shift occurs in the final years of the dataset, where a visible decline in 2020 and 2021 coincides with the COVID-19 pandemic. Worldwide lockdowns, the closure of bars and restaurants, supply chain disruptions, and the economic downturn all potentially contributed to reduced alcohol intake across many nations.

#structure data to aggregate total alcohol consumption across all countries for each year

#create data frame with total consumption summed across all countries by year
hist_df <- df %>%
  select(year, alcohol_liters_per_capita) %>%
  group_by(year) %>%
  summarise(total = sum(alcohol_liters_per_capita), .groups = 'keep') %>%
  data.frame()

#display total alcohol consumption per year in a bar chart
chart1 <- ggplot(hist_df, aes(x = year, y = total)) +
  geom_bar(color = "black", fill = "blue", stat = "identity") +
  labs(title = "Total Alcohol Consumption by Year", x = "Year", y = "Total Alcohol (Liters per Capita)") +
  scale_y_continuous(labels = comma) +
  geom_text(aes(label = scales::comma(round(total, 0))), vjust = -0.5, size = 3) +
  theme(plot.title = element_text(hjust = 0.5))

#add labels for all the years in the data set to x axis, space them, and angle at 45 degrees
x_axis_labels <- min(hist_df$year):max(hist_df$year)
chart1 <- chart1 + scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
chart1
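Because the yearly total above is a sum of per capita figures, it is not weighted by population. Assuming every country reports a value in every year, dividing the total by the number of countries gives an unweighted mean that follows the same year-to-year pattern on a more interpretable scale. The sketch below illustrates this on invented toy data in base R; it is not part of the original analysis.

```r
# Toy data (invented values): three countries, two years
toy <- data.frame(
  country = rep(c("A", "B", "C"), times = 2),
  year    = rep(c(2019, 2020), each = 3),
  alcohol_liters_per_capita = c(10, 6, 2, 9, 5, 1)
)

# Sum of per capita values by year, as in the bar chart
yearly <- aggregate(alcohol_liters_per_capita ~ year, data = toy, FUN = sum)
names(yearly)[2] <- "total"

# Number of reporting countries per year, then the unweighted mean
yearly$n_countries <- as.vector(table(toy$year)[as.character(yearly$year)])
yearly$mean_per_country <- yearly$total / yearly$n_countries

# Year-over-year percentage change is the same for both measures
# when the number of countries is constant
pct_change_total <- diff(yearly$total) / head(yearly$total, -1) * 100
pct_change_mean  <- diff(yearly$mean_per_country) /
  head(yearly$mean_per_country, -1) * 100
```

If the number of reporting countries varied between years, the two measures would diverge, and the mean would be the safer basis for comparing years.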

Top 5 Countries Over Time

The chart below tracks the five countries with the highest average per capita alcohol consumption across the entire 2001–2021 period. The legend is ordered by each country’s consumption level in 2001, to visually track the change in the rankings across the two decades.

Several trends stand out in this visualization. Romania ranked as the world’s top alcohol-consuming country for most of the period, peaking at nearly 19.5 liters per person in the early 2000s. Although its consumption has decreased over time, Romania still held the highest rate in 2021 at over 17 liters. In contrast, Burkina Faso stayed between roughly 17 and 18 liters for almost twenty years and then dropped sharply after 2019, from more than 17 liters to about 7.5 liters by 2021. This was one of the steepest declines in the dataset. Although the lowest levels coincide with the pandemic, Burkina Faso had already been on a gradual downward trend since 2010.

Czechia remained steady at around 14 liters throughout, with little year-to-year volatility. Georgia showed the strongest upward trajectory, climbing from roughly 12 liters in 2001 to over 15 liters by 2021. Lithuania displayed the most volatile pattern, rising sharply until 2016, declining afterward, and trending upward again from 2019.

#calculate the average alcohol consumption per country across all years
agg_total <- df %>%
  select(country, alcohol_liters_per_capita) %>%
  group_by(country) %>%
  summarise(avg = mean(alcohol_liters_per_capita), .groups = 'keep') %>%
  data.frame()

#sort by average consumption in descending order
agg_total <- agg_total[order(agg_total$avg, decreasing = TRUE),]

#create top5_countries variable to group the top 5 countries with highest average alcohol consumption
top5_countries <- agg_total$country[1:5]


#new data frame filtered to only top 5 countries with country, year, and consumption columns
top5_df <- df %>%
  filter(country %in% top5_countries) %>%
  select(country, year, alcohol_liters_per_capita) %>%
  data.frame()

#set legend order based on 2001 consumption values from highest to lowest
country_order <- top5_df[top5_df$year == 2001,]
country_order <- country_order[order(country_order$alcohol_liters_per_capita, decreasing = TRUE),]
country_order <- country_order$country

#sets the level, which sets the order for the legend to match 2001 values highest to lowest
top5_df$country <- factor(top5_df$country, levels = country_order)

#create variable for x axis year labels
x_axis_labels <- min(top5_df$year):max(top5_df$year)

#multiple line chart with data points, year labels angled at 45 degrees, Set1 color palette
ggplot(top5_df, aes(x = year, y = alcohol_liters_per_capita, group = country)) +
  geom_line(aes(color = country), linewidth = 2) +
  labs(title = "Top 5 Countries Alcohol Consumption by Year (2001-2021)",
       x = "Year",
       y = "Alcohol (Liters per Capita)",
       color = "Country") +
  scale_y_continuous(labels = comma) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_point(shape = 21, size = 3, color = 'black', fill = 'white') +
  scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels, minor_breaks = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_color_brewer(palette = "Set1")
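Statements about which country rose or fell the most can be backed by a small table of first-to-last-year changes. The following base R sketch uses invented toy data with placeholder country names; the suffixed column names it produces are illustrative rather than part of the original analysis.

```r
# Toy data (invented values): one rising, one falling, one flat series
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(c(2001, 2011, 2021), times = 3),
  alcohol_liters_per_capita = c(12, 14, 15,
                                18, 17, 8,
                                14, 14, 14)
)

first_year <- min(toy$year)
last_year  <- max(toy$year)

# Pair each country's first-year value with its last-year value
start <- toy[toy$year == first_year, c("country", "alcohol_liters_per_capita")]
end   <- toy[toy$year == last_year,  c("country", "alcohol_liters_per_capita")]
change <- merge(start, end, by = "country", suffixes = c("_start", "_end"))

# Absolute and percentage change over the whole period
change$abs_change <- change$alcohol_liters_per_capita_end -
  change$alcohol_liters_per_capita_start
change$pct_change <- change$abs_change /
  change$alcohol_liters_per_capita_start * 100

# Largest increase first, largest decrease last
change <- change[order(change$abs_change, decreasing = TRUE), ]
```

Applied to `top5_df`, the same steps would rank the five countries by their net change between 2001 and 2021.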

Top 5 Countries Map (2021)

The interactive map below pinpoints the five highest alcohol-consuming countries for the most recent year in the dataset (2021), showing where they sit geographically. Each marker’s popup gives the country’s name, consumption level, and year, and the marker size is proportional to per capita consumption.

A geographic pattern is apparent: in 2021, the five countries with the highest alcohol consumption are all located in Eastern Europe and the Caucasus region. Romania tops the list with more than 17 liters per person, followed by Georgia, Latvia, Czechia, and Lithuania. The regional clustering may reflect cultural traditions tied to local production of alcohol such as wine and vodka.

#filter data to only include the most recent year (2021)
map_df <- df[df$year == max_year,]

#sort by alcohol consumption in descending order
map_df <- map_df[order(map_df$alcohol_liters_per_capita, decreasing = TRUE),]

#select the top 5 countries with highest alcohol consumption
map_top5 <- data.frame(map_df[1:5,])

#identify approximate latitude and longitude for the top 5 countries
Romania <- c(45.9432, 24.9668)
Georgia <- c(42.3154, 43.3569)
Latvia <- c(56.8796, 24.6032)
Czechia <- c(49.8175, 15.4730)
Lithuania <- c(55.1694, 23.8813)

#create data frame for latitude and longitude coordinates
gps_df <- data.frame(rbind(Romania, Georgia, Latvia, Czechia, Lithuania))

#rename the columns
colnames(gps_df) <- c("Lat", "Long")

#add country names from row names into a column
gps_df$country <- row.names(gps_df)

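#lat and long are already numeric after rbind; as.numeric is kept as a safeguard
gps_df$Lat <- as.numeric(gps_df$Lat)
gps_df$Long <- as.numeric(gps_df$Long)

#add alcohol consumption values to gps_df, matched by country name rather than row position
gps_df$alcohol <- map_top5$alcohol_liters_per_capita[match(gps_df$country, map_top5$country)]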

#map with formatted popups showing country, consumption, and year; circle size proportional to consumption
m <- leaflet() %>%
  addProviderTiles(providers$OpenStreetMap) %>%
  setView(lng = 28.0, lat = 50.0, zoom = 4) %>%
  addCircles(
    lng = gps_df$Long,
    lat = gps_df$Lat,
    opacity = 1,
    color = "red",
    fillColor = "blue",
    fillOpacity = 0.5,
    popup = paste(gps_df$country, "<br>",
                  "Alcohol Consumption:", round(gps_df$alcohol, 2), "liters per capita", "<br>",
                  "Year:", max_year),
    label = paste(gps_df$country, "-", round(gps_df$alcohol, 2), "L/capita"),
    radius = gps_df$alcohol * 5000) %>%
  addLegend(position = "bottomright",
            colors = "red",
            labels = "Alcohol Consumption",
            title = paste0("Top 5 Countries (", max_year, ")"),
            opacity = 1)
m

United States Over Time

The line chart below isolates the United States, revealing a clear and consistent upward trend in alcohol consumption from 2001 to 2021. The highest and lowest data points are highlighted in red with their values labeled for emphasis.

U.S. consumption started at its lowest recorded level, 8.81 liters per capita, in 2001 and climbed over two decades to reach 9.81 liters by 2021, an increase of approximately 11% across the dataset. The upward trend is steady, with the only significant declines occurring during the Great Recession.

The steepest rise began in 2019 and continued into the pandemic. Notably, consumption reached its highest recorded level of 9.81 liters in 2021, the final year of the dataset. This is in stark contrast to the “Total Alcohol Consumption by Year” bar chart, where global consumption was at its lowest in the dataset.
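The size of the U.S. increase can be checked directly from the two endpoint values quoted above. This is a small worked calculation in base R; `pct_change` is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper: percentage change from a start value to an end value
pct_change <- function(start, end) (end - start) / start * 100

us_2001 <- 8.81   # lowest recorded U.S. value (liters per capita)
us_2021 <- 9.81   # highest recorded U.S. value (liters per capita)

# (9.81 - 8.81) / 8.81 * 100, a little over 11 percent
us_increase <- pct_change(us_2001, us_2021)
```

The result is slightly above 11%, so the one-liter rise corresponds to a somewhat larger relative increase than a round 10%.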

#filter data set to only include United States
us_df <- df[df$country == "United States",]

#sort by year in ascending order
us_df <- us_df[order(us_df$year),]

#identify the high and low data points for United States consumption
hi_low <- us_df %>%
  filter(alcohol_liters_per_capita == min(alcohol_liters_per_capita) | alcohol_liters_per_capita == max(alcohol_liters_per_capita)) %>%
  data.frame()

#create variable for x axis year labels
x_axis_labels <- min(us_df$year):max(us_df$year)

#line chart with high and low points highlighted in red and labeled
ggplot(us_df, aes(x = year, y = alcohol_liters_per_capita)) +
  geom_line(color = 'black', linewidth = 1) +
  geom_point(shape = 21, size = 4, color = 'red', fill = 'white') +
  labs(x = "Year",
       y = "Alcohol (Liters per Capita)",
       title = "United States Alcohol Consumption (2001-2021)",
       caption = "Source: WHO Global Health Observatory") +
  scale_y_continuous(labels = comma) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels, minor_breaks = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_point(data = hi_low, aes(x = year, y = alcohol_liters_per_capita), shape = 21, size = 4, fill = 'red', color = 'red') +
  geom_label_repel(aes(label = ifelse(alcohol_liters_per_capita == max(alcohol_liters_per_capita) | alcohol_liters_per_capita == min(alcohol_liters_per_capita), round(alcohol_liters_per_capita, 2), "")),
                   box.padding = 1,
                   point.padding = 2,
                   size = 4,
                   color = 'Grey50', segment.color = 'black')

Heatmap: Top 10 vs USA

The heatmap below places the top 10 alcohol-consuming countries alongside the United States across all 21 years to show how the United States compares with the countries that have the highest alcohol consumption. Darker red tiles indicate higher consumption; lighter tiles indicate lower levels. Each tile is labeled with the value in liters per capita, rounded to one decimal place.

Romania occupies the darkest red row across nearly every year, reinforcing its position as the world’s leading consumer throughout the time frame of the dataset. The United States row at the bottom is noticeably lighter than every other row on the chart, a visual confirmation that American drinking operates on a fundamentally lower scale than the global top tier.

At roughly 9–10 liters per capita, the U.S. consumes about half of what Romania does and sits meaningfully below even the 10th-ranked country. The gap is consistent and persistent across all 21 years. There is no year in which the U.S. approaches the consumption levels of even the lower-ranked nations in this comparison.

#create data frame for countries by average consumption across all years
heat_avg <- df %>%
  select(country, alcohol_liters_per_capita) %>%
  group_by(country) %>%
  summarise(avg = mean(alcohol_liters_per_capita), .groups = 'keep') %>%
  data.frame()

#sort by average consumption in descending order
heat_avg <- heat_avg[order(heat_avg$avg, decreasing = TRUE),]

#select top 10 country names with highest average alcohol consumption
heat_top10 <- heat_avg$country[1:10]

#combine top 10 countries with United States into a single list for a total of 11 countries
heat_countries <- c(heat_top10, "United States")

#new data frame filtered to top 10 countries and the US with consumption by country and year
heat_df <- df %>%
  filter(country %in% heat_countries) %>%
  select(country, year, alcohol_liters_per_capita) %>%
  group_by(country, year) %>%
  summarise(n = sum(alcohol_liters_per_capita), .groups = 'keep') %>%
  data.frame()

#change year from numeric to factor (discrete) for heatmap x axis
heat_df$year <- as.factor(heat_df$year)

#creates variable mylevels for ordering countries, top 10 first then United States
mylevels <- c(heat_top10, "United States")
heat_df$country <- factor(heat_df$country, levels = mylevels)

#variable to create segments in 2 liter increments for the legend
breaks <- c(seq(0, max(heat_df$n), by = 2))

#heatmap showing top 10 countries and us with consumption by country and year
g <- ggplot(heat_df, aes(x = year, y = country, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = round(n, 1)), size = 2.5) +
  coord_equal(ratio = 1) +
  labs(title = "Heatmap: Alcohol Consumption by Country vs USA (2001-2021)",
       x = "Year",
       y = "Country",
       fill = "Liters per Capita") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_discrete(limits = rev(levels(heat_df$country))) +
  scale_fill_continuous(low = "white", high = "red", breaks = breaks, labels = comma) +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(color = "black"))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

#convert ggplot heatmap to interactive plotly, hover shows consumption, year, and country
gg <- ggplotly(g, tooltip = c("n", "year", "country"), height = 500) %>%
  style(hoverlabel = list(bgcolor = "white"))
gg
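The statement that the U.S. never approaches the lower-ranked countries in the comparison can be tested by computing, for each year, the gap between the U.S. value and the lowest value among the other countries. The sketch below does this in base R on invented toy data; the country labels `Top1` and `Top2` are placeholders, not real entries.

```r
# Toy data (invented values): two comparison countries plus the United States
toy <- data.frame(
  country = rep(c("Top1", "Top2", "United States"), each = 3),
  year    = rep(2019:2021, times = 3),
  alcohol_liters_per_capita = c(17.5, 17.2, 17.0,
                                12.0, 11.8, 11.5,
                                9.6, 9.7, 9.8)
)

reference <- "United States"
others <- toy[toy$country != reference, ]
us     <- toy[toy$country == reference, c("year", "alcohol_liters_per_capita")]
names(us)[2] <- "us_value"

# Lowest value among the comparison countries in each year
lowest_other <- aggregate(alcohol_liters_per_capita ~ year, data = others, FUN = min)
names(lowest_other)[2] <- "lowest_other"

# Yearly gap between the lowest comparison country and the U.S.
gap <- merge(lowest_other, us, by = "year")
gap$difference <- gap$lowest_other - gap$us_value

# TRUE only if the U.S. is below every comparison country in every year
always_below <- all(gap$difference > 0)
```

Run against the columns of `heat_df`, with the consumption column renamed to match, a positive `difference` in every year would support the claim, and the smallest value would show how close the U.S. came.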

Conclusion

The visualizations created as part of the descriptive analysis of WHO alcohol consumption data from 2001 to 2021 present three notable observations.

First, Eastern Europe dominates global alcohol consumption. Romania, Czechia, Latvia, Lithuania, and Georgia consistently rank among the world’s heaviest-drinking nations, with per capita levels often exceeding 14 liters of pure alcohol annually.

Second, the United States follows a steady, persistent upward trend. Over 21 years, U.S. consumption rose from 8.81 to 9.81 liters per capita. While this still places the U.S. well below the top 10 globally (roughly half of Romania’s consumption), the sustained climb, especially during the pandemic, raises some concern about the trajectory the U.S. is on.

Lastly, COVID-19 coincided with divergent shifts in drinking behavior across the world. Global alcohol consumption declined measurably during the pandemic; however, the United States moved in the opposite direction, reaching its highest recorded level in 2021. This divergence may reflect the shift to at-home drinking, expanded delivery services, and pandemic-era stress. The contrast between the global and American patterns during COVID-19 is one of the most notable findings in this dataset.