Final Project


Introduction

Tobacco use is one of the most persistent public health challenges in the United States. Despite decades of anti-smoking campaigns, educational efforts, and policy changes, millions of adults still use tobacco in its many forms — not just cigarettes, but also cigars, pipe tobacco, snuff, and chewing tobacco. While much attention is often given to cigarette smoking, other forms of tobacco use remain widespread and sometimes overlooked in national conversations.

I chose this topic because of a personal connection: my dad used to chew tobacco. I used to tease him and say that chewing tobacco was such an “old man thing” to do, but he always insisted that he still knew plenty of people who used it — and not just older men. That stuck with me. It made me realize how different types of tobacco have continued to be used by different groups of people, even in a time when smoking rates have declined. I’m proud to say my dad has now been tobacco-free for five years, but our conversations about his past use got me curious about what kinds of tobacco products are still being used today, and in what quantities.

For this project, I analyzed a dataset from the Centers for Disease Control and Prevention (CDC), which contains national-level data on adult tobacco consumption in the U.S. from 2000 to the present. The dataset breaks consumption down by product type (such as cigarettes, cigars, and smokeless tobacco) and reports both total volume and per capita consumption. Quantitative variables include “Total” (total tobacco consumed) and “Total Per Capita”; categorical variables include “Submeasure” (the specific tobacco product), “Topic” (such as “Combustible” or “Noncombustible”), and “Data Value Unit” (e.g., pounds or cigarette equivalents).

Although the dataset doesn’t include a detailed ReadMe file, it comes directly from the CDC and appears to be compiled from national tobacco sales and consumption data. Because of this, I feel confident using it to explore trends in the types of tobacco products still being used by adults in the U.S. today.

In this project, I focus on data from the year 2020 to explore which tobacco products are still being consumed the most, how usage differs between combustible and noncombustible forms, and how total usage relates to per capita exposure. By doing this, I hope to better understand how tobacco use in the U.S. is changing — and which products are sticking around, even when they seem “old school” at first glance.

Background

Tobacco use in the United States has declined steadily over the past several decades. Cigarette smoking in particular fell from about 42% of adults in 1965 to just under 13% in recent years (CDC, 2022). However, the use of other tobacco products such as cigars, snuff, and chewing tobacco has persisted, particularly among adult men, rural populations, and certain regions of the U.S. According to the U.S. Department of Health and Human Services, smokeless tobacco products like chewing tobacco and snuff are still used by more than 5 million adults, many of whom may not view these products as being as harmful as cigarettes. Research from the National Institute on Drug Abuse also shows that many people who quit smoking cigarettes switch to other forms of tobacco instead of quitting altogether. These trends help explain why certain “old school” tobacco products, like the kind my dad used to use, are still prevalent today.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(readr)
library(plotly) 

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
tobacco <- read_csv("adult_tobacco_consumption_2000_present_cdc.csv")
Rows: 312 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): LocationAbbrev, LocationDesc, Topic, Measure, Submeasure, Data Valu...
dbl (8): Year, Population, Domestic, Imports, Total, Domestic Per Capita, Im...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Keep only the 2020 rows and the columns used in the plots and model
tobacco_2020 <- tobacco %>%
  filter(Year == 2020) %>%
  select(Topic, Measure, Submeasure, `Data Value Unit`, Total, `Total Per Capita`)
colnames(tobacco)
 [1] "Year"                "LocationAbbrev"      "LocationDesc"       
 [4] "Population"          "Topic"               "Measure"            
 [7] "Submeasure"          "Data Value Unit"     "Domestic"           
[10] "Imports"             "Total"               "Domestic Per Capita"
[13] "Imports Per Capita"  "Total Per Capita"   
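
Before plotting, it can help to check which units appear under each Topic, since the Total column mixes pounds and cigarette equivalents. The sketch below is only an illustration on a small made-up table: the product names, unit labels, and the `n_products` column name are hypothetical, not values from the CDC file.

```r
library(dplyr)

# Hypothetical stand-in for tobacco_2020; names and units are made up
toy <- tibble(
  Topic = c("Combustible", "Combustible", "Noncombustible", "Noncombustible"),
  Submeasure = c("Product A", "Product B", "Product C", "Product D"),
  `Data Value Unit` = c("Cigarette equivalents", "Pounds", "Pounds", "Pounds")
)

# Count how many rows fall under each Topic / unit combination
unit_counts <- toy %>%
  count(Topic, `Data Value Unit`, name = "n_products")

print(unit_counts)
```

Running the same `count()` call on `tobacco_2020` would show whether any single Topic mixes units, which matters when bars from different units share an axis.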
ggplot(tobacco_2020, aes(x = reorder(Submeasure, -`Total Per Capita`), 
                         y = `Total Per Capita`, 
                         fill = Topic)) +
  geom_col() +
  labs(title = "Per Capita Tobacco Use by Product Type (2020)",
       x = "Tobacco Product", y = "Per Capita Use",
       caption = "Source: CDC Adult Tobacco Consumption Data") +
  theme_light() +
  scale_fill_brewer(palette = "Set2") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Per capita tobacco consumption by product type in 2020. Cigarettes still dominate, but smokeless tobacco and cigars maintain a noticeable share. Source: CDC Adult Tobacco Consumption Data.

ggplot(tobacco_2020, aes(x = reorder(Submeasure, -Total), 
                         y = Total, 
                         fill = `Data Value Unit`)) + 
  geom_col() +
  labs(title = "Total Tobacco Volume by Product Type (2020)",
       x = "Tobacco Product", y = "Total Volume",
       caption = "Source: CDC Adult Tobacco Consumption Data") +
  theme_classic() +
  scale_fill_viridis_d() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Total tobacco volume consumed in 2020 by product type, colored by unit (pounds or cigarette equivalents). Because the two units share one axis, bar heights are directly comparable only within the same unit. Combustible products, especially cigarettes, account for the bulk of total tobacco consumption. Source: CDC.
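
One possible alternative to this chart is to give each unit its own panel with `facet_wrap()`, so that pounds and cigarette equivalents each get a separate axis. The sketch below uses a small made-up data frame rather than the CDC file; the product names, the `Unit` column, and all values are hypothetical.

```r
library(ggplot2)

# Hypothetical rows standing in for tobacco_2020; all values are made up
toy <- data.frame(
  Submeasure = c("Product A", "Product B", "Product C", "Product D"),
  Unit       = c("Cigarette equivalents", "Cigarette equivalents",
                 "Pounds", "Pounds"),
  Total      = c(2.0e11, 1.0e10, 1.2e8, 2.0e7)
)

# One panel per unit; scales = "free" gives each panel its own x and y axes
p <- ggplot(toy, aes(x = reorder(Submeasure, -Total), y = Total)) +
  geom_col() +
  facet_wrap(~ Unit, scales = "free") +
  labs(x = "Tobacco Product", y = "Total Volume") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```

With the real data, the faceting variable would be `Data Value Unit` (written in backticks) instead of the hypothetical `Unit`.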

p <- ggplot(tobacco_2020, aes(x = Total, 
                              y = `Total Per Capita`, 
                              color = Topic, 
                              label = Submeasure)) +
  geom_point(size = 4) +
  labs(title = "Per Capita vs Total Tobacco Use (2020)",
       x = "Total Volume", 
       y = "Per Capita Use",
       caption = "Source: CDC Adult Tobacco Consumption Data") +
  theme_light() +
  scale_color_brewer(palette = "Set1")

ggplotly(p)

Scatterplot of total volume against per capita tobacco consumption in 2020. Because per capita use is total volume divided by the same national population for every product, the points fall almost exactly on a straight line, so products with high overall use necessarily have high per capita use. Source: CDC.

model <- lm(`Total Per Capita` ~ Total, data = tobacco_2020)
summary(model)

Call:
lm(formula = `Total Per Capita` ~ Total, data = tobacco_2020)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66174 -0.06400 -0.06327  0.06805  0.55513 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.396e-02  1.084e-01    0.59    0.567    
Total       3.889e-09  1.129e-12 3445.79   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3514 on 11 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 1.187e+07 on 1 and 11 DF,  p-value: < 2.2e-16
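
One caveat when reading this output: within a single year every row shares the same population, so if the per capita column is computed as total volume divided by the Population column (an assumption, though the fit strongly suggests it), an R-squared of 1 is expected by construction rather than a finding about tobacco use. The fitted slope of 3.889e-09 is about 1 / 257 million, which would be the population implied by the model. The sketch below illustrates the identity on made-up numbers; the product names, totals, and population value are hypothetical, not taken from the CDC file.

```r
# Hypothetical toy data standing in for tobacco_2020: three products that
# share one population figure, as all rows from a single year do.
toy <- data.frame(
  Submeasure = c("Product A", "Product B", "Product C"),
  Total      = c(2.0e11, 1.0e10, 5.0e7),
  Population = rep(2.5e8, 3)   # assumed value, not the CDC figure
)

# Per capita use defined as total volume divided by population
toy$Total_Per_Capita <- toy$Total / toy$Population

# Regress per capita use on total volume, as in the model above
fit <- lm(Total_Per_Capita ~ Total, data = toy)
slope <- unname(coef(fit)["Total"])

# Squared correlation equals R-squared for a one-predictor linear model
r2 <- cor(toy$Total, toy$Total_Per_Capita)^2

print(c(slope = slope, one_over_population = 1 / 2.5e8, r_squared = r2))
```

On the real data, the same check would compare `Total / Population` with `Total Per Capita` for the 2020 rows of `tobacco`. Small differences, like the residuals in the summary above, could come from rounding in the published per capita values, though that is also an assumption.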

Conclusion

This project gave me a chance to dive deeper into adult tobacco use in the U.S., and I ended up learning a lot more than I expected. I had always thought of tobacco use today as mostly cigarette smoking, and cigarettes do still dominate, but other products like cigars and smokeless tobacco are still used by millions of people. I found it especially interesting that products like chewing tobacco, which I thought were outdated, are still being consumed regularly. It made me realize that tobacco use is not just about trends, but also about culture, habit, and personal preference.

One pattern I noticed was that cigarettes had by far the highest total volume, yet products like snuff still showed measurable per capita use. Because the per capita figures are averaged over the whole adult population, they cannot show how use is concentrated within smaller groups. Still, it suggests that a product doesn’t have to be popular across the whole population to have a big impact on specific subgroups.

There were a few things I wanted to explore further, but couldn’t due to limitations in the data. For example, the dataset only covered national-level information, so I wasn’t able to compare across states or regions. I also wished I had access to age, gender, or race-based breakdowns to explore who is using what and why. If I had more time or data, I would have loved to look at how tobacco use differs between younger and older adults, or between urban and rural areas.

Overall, I’m really glad I chose this topic. It felt meaningful because of my dad’s story, and it helped me better understand the broader landscape of tobacco use in America today.

Works Cited

Centers for Disease Control and Prevention. (2022). Current Cigarette Smoking Among Adults in the United States. https://www.cdc.gov/tobacco/data_statistics/fact_sheets/adult_data/cig_smoking/index.htm

U.S. Department of Health and Human Services. (2020). Smokeless Tobacco: Health Effects. https://www.nidcr.nih.gov

National Institute on Drug Abuse. (2021). Tobacco, Nicotine, and E-Cigarettes Research Report. https://nida.nih.gov/publications/research-reports/tobacco-nicotine-e-cigarettes