I chose this dataset because it provides a rich source of data on a topic that is both timely and important. Tobacco use is a major public health concern and the leading cause of preventable disease, disability, and death in the United States. Nearly 40 million U.S. adults still smoke cigarettes, and 3.08 million middle and high school students use at least one tobacco product, including e-cigarettes.

I got this dataset from the Centers for Disease Control and Prevention (CDC): https://www.cdc.gov/statesystem/featured-datasets/index.html. It highlights trends in adult total and per capita consumption of both combustible and noncombustible tobacco from 2000 to the present. There are 15 variables and 273 observations in this dataset. According to a CDC fact sheet, in 2020 more than 2 in every 100 adults aged 18 or older reported current use of smokeless tobacco products, which represents 5.7 million adults (https://www.cdc.gov/tobacco/data_statistics/fact_sheets/smokeless/use_us/index.htm#adult-national). I am going to look at the relationship between total tobacco consumption per capita and different measures related to tobacco use in the US in 2022.

1. Year: This is a numerical variable which represents the year in which the data was collected.

2. LocationAbbrev: This categorical variable gives the abbreviation of the location where the data was collected, which is the US.

3. LocationDesc: This categorical variable describes the location; all of the data is national.

4. Population: This numerical variable is the U.S. adult population for the year. It is used as the denominator for the per capita values (209,786,736 in 2000).

5. Topic: This categorical variable indicates whether the product is combustible or noncombustible tobacco.

6. Measure: This categorical variable indicates the type of tobacco product, such as cigarettes, cigars, loose tobacco, smokeless tobacco, or all combustibles.

7. Submeasure: This categorical variable is a more detailed subcategory of Measure (for example, Chewing Tobacco, Small Cigars, or Total Loose Tobacco).

8. Data.Value.Unit: This categorical variable gives the unit in which the amounts are reported, such as pounds, cigarettes, cigars, or cigarette equivalents.

9. Domestic: This numerical variable shows the amount of tobacco produced in the country.

10. Imports: This numerical variable shows the amount of tobacco imported into the country.

11. Total: This numerical variable is the combined domestic and imported amount of tobacco (Domestic + Imports).

12. Domestic.Per.Capita: This numerical variable is the domestic amount per person for the year.

13. Imports.Per.Capita: This numerical variable is the imported amount per person for the year.

14. Total.Per.Capita: This numerical variable is the total amount per person for the year, including both domestic and imported values.
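The per capita columns appear to be the matching amount divided by Population. The quick check below uses the 2000 "Cigarette Removals" row of the dataset and reproduces the reported per capita values. It is a sketch under that assumption: the numbers are consistent with it, but the CDC documentation is the authority on how the columns are defined.

```r
# Values copied from the 2000 "Cigarette Removals" row of adultTobaccoUseUS.csv
population <- 209786736
domestic   <- 423250355675
imports    <- 12319663000
total      <- domestic + imports   # Total = Domestic + Imports = 435,570,018,675

# Assumption: each per capita column is the amount divided by Population, rounded
per_capita <- round(c(domestic = domestic, imports = imports, total = total) / population)
print(per_capita)  # domestic 2018, imports 59, total 2076
```

These match the Domestic.Per.Capita (2,018), Imports.Per.Capita (59) and Total.Per.Capita (2,076) values that the dataset reports for that row.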

Loading packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tmap)
## Warning: package 'tmap' was built under R version 4.2.3
library(tmaptools)
## Warning: package 'tmaptools' was built under R version 4.2.3
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.2.3
library(sf)
## Warning: package 'sf' was built under R version 4.2.3
## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(leaflet.extras)
## Warning: package 'leaflet.extras' was built under R version 4.2.3
library(dplyr)
library(rio)
## Warning: package 'rio' was built under R version 4.2.3
library(sp)
## Warning: package 'sp' was built under R version 4.2.3
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:rio':
## 
##     export
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Set working directory and load the dataset

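# Point this at the folder that contains adultTobaccoUseUS.csv on your machine
setwd("C:/Users/amani/OneDrive/Desktop/Data110")
adultTobaccoUseUS <- read.csv("adultTobaccoUseUS.csv")
# Preview the first six rows
head(adultTobaccoUseUS)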
##   Year LocationAbbrev LocationDesc  Population                  Topic
## 1 2000             US     National 209,786,736 Noncombustible Tobacco
## 2 2000             US     National 209,786,736    Combustible Tobacco
## 3 2000             US     National 209,786,736    Combustible Tobacco
## 4 2000             US     National 209,786,736    Combustible Tobacco
## 5 2000             US     National 209,786,736    Combustible Tobacco
## 6 2000             US     National 209,786,736    Combustible Tobacco
##             Measure          Submeasure       Data.Value.Unit        Domestic
## 1 Smokeless Tobacco     Chewing Tobacco                Pounds      45,502,156
## 2        Cigarettes  Cigarette Removals            Cigarettes 423,250,355,675
## 3            Cigars        Total Cigars                Cigars   5,612,867,329
## 4     Loose Tobacco Total Loose Tobacco Cigarette Equivalents   8,291,276,800
## 5     Loose Tobacco Total Loose Tobacco                Pounds      16,841,656
## 6            Cigars        Small Cigars                Cigars   2,243,135,044
##          Imports           Total Domestic.Per.Capita Imports.Per.Capita
## 1         91,965      45,594,121               0.217                  0
## 2 12,319,663,000 435,570,018,675               2,018                 59
## 3    548,243,000   6,161,110,329                  27                  3
## 4    702,741,662   8,994,018,462                  40                  3
## 5      1,427,444      18,269,100                   0                  0
## 6     36,049,000   2,279,184,044                  11                  0
##   Total.Per.Capita
## 1            0.217
## 2            2,076
## 3               29
## 4               43
## 5                0
## 6               11
# Reuse the data already loaded under a shorter name instead of reading the file again
df <- adultTobaccoUseUS
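Values such as "209,786,736" contain thousands separators, so read.csv() stores Population and the amount columns as text rather than numbers. They need converting before any arithmetic or numeric plotting. The minimal sketch below uses readr::parse_number() on a few values copied from the head() output. Wrapping the input in as.character() is a precaution, assuming that some per capita columns with no commas may already have been read as numbers.

```r
library(dplyr)
library(readr)

# A few values copied from head(adultTobaccoUseUS); read.csv() keeps them as character
sample_rows <- data.frame(
  Population       = c("209,786,736", "209,786,736"),
  Total            = c("45,594,121", "435,570,018,675"),
  Total.Per.Capita = c("0.217", "2,076")
)

# parse_number() drops the thousands separators and returns doubles;
# as.character() lets the same call handle columns that are already numeric
sample_rows <- sample_rows %>%
  mutate(across(c(Population, Total, Total.Per.Capita),
                ~ parse_number(as.character(.x))))

str(sample_rows)
```

On the full data, the same mutate() would be applied to df with the columns `c(Population, Domestic:Total.Per.Capita)`, which leaves the text columns such as Topic and Measure untouched.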

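The share of observations for combustible and noncombustible tobacco

This graph shows what percentage of all observations in the dataset belongs to each submeasure within each topic. Pipe tobacco, roll-your-own tobacco, and total loose tobacco have the tallest bars. Each bar counts rows rather than amounts of tobacco, so the graph shows how often each submeasure appears in the data, not which product is used the most.

# geom_bar() counts rows; after_stat(count / sum(count)) turns those counts
# into each bar's share of all observations
ggplot(df, aes(x = Topic, fill = Submeasure)) +
  geom_bar(aes(y = after_stat(count / sum(count))), position = "dodge", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Topic", y = "Percentage of observations",
       title = "Share of observations by tobacco submeasure")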
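A bar chart built with geom_bar() counts rows. To compare how much of each product is consumed, you need the amount columns instead. The comparison also only makes sense within a single Data.Value.Unit, because pounds, cigars, and cigarette equivalents are not comparable.

The minimal sketch below uses the year-2000 rows from the head() output, with Total.Per.Capita already converted to numbers. On the full data, the same steps would follow a filter on Year and the numeric conversion of the per capita columns.

```r
library(dplyr)

# Year-2000 rows copied from head(adultTobaccoUseUS), Total.Per.Capita as numbers
year2000 <- data.frame(
  Submeasure       = c("Chewing Tobacco", "Cigarette Removals", "Total Cigars",
                       "Total Loose Tobacco", "Total Loose Tobacco", "Small Cigars"),
  Data.Value.Unit  = c("Pounds", "Cigarettes", "Cigars",
                       "Cigarette Equivalents", "Pounds", "Cigars"),
  Total.Per.Capita = c(0.217, 2076, 29, 43, 0, 11)
)

# Sort by unit first, then by per capita consumption (highest first) within each unit
ranked <- year2000 %>%
  group_by(Data.Value.Unit) %>%
  arrange(desc(Total.Per.Capita), .by_group = TRUE) %>%
  ungroup()

print(ranked)
```

Rows labelled "Total ..." appear to include their subcategories (Total Cigars is larger than Small Cigars), so adding submeasures together would likely double count.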

Use dplyr's filter() to keep only the rows where Year is 2020 or later, and store the result as Recent_data.

Recent_data <- adultTobaccoUseUS %>% 
    filter(Year >= 2020)
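The package messages earlier show that filter() is exported by stats, dplyr and plotly. Whichever version is found first on the search path is the one that runs. Calling dplyr::filter() explicitly removes that ambiguity. The small sketch below uses a hypothetical data frame that holds only a Year column, so no tobacco values are involved.

```r
# Hypothetical stand-in for adultTobaccoUseUS: only the Year column matters here
years <- data.frame(Year = 2018:2022)

# Namespace-qualified call, so stats::filter() cannot be picked up by mistake
recent <- dplyr::filter(years, Year >= 2020)

print(recent$Year)  # 2020 2021 2022
```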