Adult Tobacco Use

Author

Wilfried Bilong

Adult Tobacco Use in the United States

Background

I’m using the Adult Tobacco Use data set for this project. This data set shows figures for adult tobacco use in the United States over a period of 20 years (2000 - 2020). It looks at the population using tobacco in the U.S. from 2000 to 2020, and includes other variables like the type of tobacco being used, the domestic imports Per Capita and so on.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(readr)
library(RColorBrewer)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(highcharter)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use
adultTobaccoUseUS_cdc <- read_csv("Desktop/Data 110/adultTobaccoUseUS.cdc.csv")
Rows: 273 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): LocationAbbrev, LocationDesc, Topic, Measure, Submeasure, Data Valu...
dbl (2): Year, Imports Per Capita
num (6): Population, Domestic, Imports, Total, Domestic Per Capita, Total Pe...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(adultTobaccoUseUS_cdc, aes(x = Year, y = Population, color = Measure)) +
  geom_line() +
  geom_point() +
  theme_classic(base_size = 12) +
  labs(title = "Population Using Tobacco Over the Years", caption = "Source = CDC")

This is a simple data visualization showing the increase in tobacco use overtime through the years. This also shows that in modern times smokeless tobacco seems to be the most used kind.

Graph_1 <- 
  ggplot(adultTobaccoUseUS_cdc, aes(x = Year, y = Domestic, color = Measure)) +  theme_light(base_size = 12) + 
  labs(title = "Domestic purchase of Tobacco over the Years", caption = "Source = CDC")
Graph_2 <- Graph_1 +
  geom_bar(aes(x = Year, y = Domestic, color = Measure), 
           position = "dodge", stat = "identity")

Graph_2

While in the first graph we saw that the population of those using tobacco over the years slowly but surely rose, assumption being people in general were buying more, yet here we see a different story. From 2000 to 2020 the domestic purchase (within the U.S.) of tobacco as a whole seems to have been decreasing over the years. It may help to take a close look at the specific measure of tobacco to see if we can answer any questions.

Steps Towards Linear Regression

Lin_Reg <-
  ggplot(adultTobaccoUseUS_cdc, aes(x = `Domestic Per Capita`, y = `Imports Per Capita`)) +
   theme_light(base_size = 12) +
   labs(title = "Domestic vs Imports Per Capita", x = "Domestic", y = "Imports", caption = "Source = CDC") + 
   geom_point(aes(color = Year)) 
Lin_Reg

Lin_Reg2 <- 
  Lin_Reg + geom_smooth(color = "purple")
Lin_Reg2
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

From this graph we find a relatively positve relationship between domestic and imports per capita. In a simple sense this is probably showing that the the tobacco bought domestically was originally imported. The numbers tell us that the more a person seems to buy Domestically the more they become acquainted with tobacco. My theory is that as people get used to the product the more they want authentic product. One other interesting find supporting my previous claim is that the highest correlation between domestic and imports per capita can be found in 2020.

cor(adultTobaccoUseUS_cdc$`Domestic Per Capita`, adultTobaccoUseUS_cdc$`Imports Per Capita`)
[1] 0.8866531

The correlation is almost perfectly linear. (Like I alluded to) The variables have an incredibly strong association to each other.

Equation <- lm(`Imports Per Capita` ~ `Domestic Per Capita`, data = adultTobaccoUseUS_cdc)
summary(Equation)

Call:
lm(formula = `Imports Per Capita` ~ `Domestic Per Capita`, data = adultTobaccoUseUS_cdc)

Residuals:
    Min      1Q  Median      3Q     Max 
-33.149  -3.350  -3.344  -0.192  86.106 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           3.343500   0.783354   4.268 2.73e-05 ***
`Domestic Per Capita` 0.044007   0.001394  31.564  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.84 on 271 degrees of freedom
Multiple R-squared:  0.7862,    Adjusted R-squared:  0.7854 
F-statistic: 996.3 on 1 and 271 DF,  p-value: < 2.2e-16

IPC stands for Imports Per Capita and DPC stands for Domestic Per Capita.

IPC = 0.044(DPC) + 3.344

These two variables have a very strong correlation that is almost linear. The Point of all this is to help us understand the trends we find between 2000 and 2020. There is a reason why people seem to have decreased their use of tobacco over the years and it’s important not to allow population to deceive us. One theory I can postulate and look deeper into is that in the last 5 years a specific category of adults have been using tobacco and especially those that are imported from outside rather than grown, produced, or purchased here domestically.

Data Visualization

highchart() |> 
  hc_add_series(data = adultTobaccoUseUS_cdc, 
                type = "Line", hcaes(x = Year, 
                                     y = `Total Per Capita`, 
                                     group = `Imports Per Capita`)) |> 
  hc_xAxis(title = list(text = "Year")) |> 
  hc_yAxis(title = list(text = "TPC")) |>
  hc_legend(title = list(text = "IPC"))