Introduction

Research Question

Is there a pattern over the years (2002-2018) in alcohol and tobacco use among young adults (18-25) in Maryland?

The data set I am using is titled “Drugs,” which comes from the CORGIS Dataset Project (https://corgis-edu.github.io/corgis/csv/drugs/). It includes data about substance use and abuse, such as alcohol, tobacco, marijuana, and cocaine, across different U.S. states and age groups. The data originates from the National Survey on Drug Use and Health (NSDUH), collected from 2002 to 2018. Each record represents a specific U.S. state and year, providing both the total number of users (in thousands) and a rate for each substance and age group. The rates in the data are larger than 100 (for example, 510 for alcohol use among 18-25 year olds in Alabama in 2002), so they appear to be counts per 1,000 people rather than percentages. In this data set, there are 867 observations and 53 variables.

For this project, I focused on four specific columns from the data set:

Year, State, Rates.Alcohol.Use Past Month.18-25, and Rates.Tobacco.Use Past Month.18-25

The Rates.Alcohol.Use Past Month.18-25 and Rates.Tobacco.Use Past Month.18-25 columns show the rate at which young adults (ages 18-25) in each state reported using alcohol or tobacco in the past month. Year runs from 2002 to 2018, and State identifies the U.S. state. By comparing these four variables, I can explore whether there is a pattern in alcohol and tobacco use among young adults (18-25) in Maryland over the years.
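The values printed later in this report for these two columns (for example 510 and 598 for alcohol) are larger than 100, so they are probably rates per 1,000 people rather than percentages. A minimal sketch of the conversion under that assumption, which the data source would need to confirm; the helper name `per_thousand_to_percent` is hypothetical:

```r
# Hypothetical helper: converts a rate per 1,000 people to a percentage.
# ASSUMPTION: the "Rates" columns are per-1,000 values; this report does not confirm it.
per_thousand_to_percent <- function(rate_per_thousand) {
  rate_per_thousand / 10
}

# First two alcohol values (ages 18-25), as rounded in the str() output
alcohol_rates <- c(510, 598)
alcohol_percent <- per_thousand_to_percent(alcohol_rates)
alcohol_percent
```

Under that reading, 510 would correspond to 51% of 18-25 year olds reporting alcohol use in the past month.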

Data Analysis

For this project, I wanted to see if there is a pattern in alcohol and tobacco use among young adults (18-25) in Maryland over the years. I started by cleaning my data set to make sure it was easy to work with. I fixed the column names, removed rows with missing values, and kept only the four variables I needed: state, year, and the past-month rates of alcohol use and tobacco use for the 18-25 age group.

Then I grouped the data set by state and year and summarized the average alcohol use and tobacco use, keeping only Maryland. Next, I calculated the maximum of these averages for alcohol use and tobacco use and identified the year in which each maximum occurred. Finally, I produced a summary table for alcohol and tobacco use and drew a line-and-point visualization.

Load the libraries

library(tidyverse)

Load the data set

drugs <- read_csv("drugs.csv")

Look at the data types and the first 6 rows

str(drugs)
## spc_tbl_ [867 × 53] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ State                                            : chr [1:867] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ Year                                             : num [1:867] 2002 2002 2002 2002 2002 ...
##  $ Population.12-17                                 : num [1:867] 380805 69400 485521 232986 3140739 ...
##  $ Population.18-25                                 : num [1:867] 499453 62791 602265 302029 3919577 ...
##  $ Population.26+                                   : num [1:867] 2812905 368460 3329482 1687337 21392421 ...
##  $ Totals.Alcohol.Use Disorder Past Year.12-17      : num [1:867] 18 4 36 14 173 26 16 4 1 73 ...
##  $ Totals.Alcohol.Use Disorder Past Year.18-25      : num [1:867] 68 12 117 53 581 102 61 16 12 266 ...
##  $ Totals.Alcohol.Use Disorder Past Year.26+        : num [1:867] 138 27 258 101 1298 ...
##  $ Rates.Alcohol.Use Disorder Past Year.12-17       : num [1:867] 48.3 61.5 73.8 61.5 55.1 ...
##  $ Rates.Alcohol.Use Disorder Past Year.18-25       : num [1:867] 136 188 194 176 148 ...
##  $ Rates.Alcohol.Use Disorder Past Year.26+         : num [1:867] 49.1 73.7 77.5 59.8 60.7 ...
##  $ Totals.Alcohol.Use Past Month.12-17              : num [1:867] 57 11 91 39 480 80 56 12 4 234 ...
##  $ Totals.Alcohol.Use Past Month.18-25              : num [1:867] 254 38 352 162 2176 ...
##  $ Totals.Alcohol.Use Past Month.26+                : num [1:867] 1048 206 1774 664 11847 ...
##  $ Rates.Alcohol.Use Past Month.12-17               : num [1:867] 150 159 187 167 153 ...
##  $ Rates.Alcohol.Use Past Month.18-25               : num [1:867] 510 598 585 538 555 ...
##  $ Rates.Alcohol.Use Past Month.26+                 : num [1:867] 373 559 533 393 554 ...
##  $ Totals.Tobacco.Cigarette Past Month.12-17        : num [1:867] 52 9 62 37 235 53 40 9 2 165 ...
##  $ Totals.Tobacco.Cigarette Past Month.18-25        : num [1:867] 196 28 234 154 1245 ...
##  $ Totals.Tobacco.Cigarette Past Month.26+          : num [1:867] 728 92 919 539 4028 ...
##  $ Rates.Tobacco.Cigarette Past Month.12-17         : num [1:867] 136.9 132.5 128.4 160.5 74.8 ...
##  $ Rates.Tobacco.Cigarette Past Month.18-25         : num [1:867] 392 440 388 510 318 ...
##  $ Rates.Tobacco.Cigarette Past Month.26+           : num [1:867] 259 250 276 319 188 ...
##  $ Totals.Illicit Drugs.Cocaine Used Past Year.12-17: num [1:867] 6 2 16 4 53 10 5 1 0 26 ...
##  $ Totals.Illicit Drugs.Cocaine Used Past Year.18-25: num [1:867] 27 5 51 18 259 51 23 7 3 108 ...
##  $ Totals.Illicit Drugs.Cocaine Used Past Year.26+  : num [1:867] 49 5 86 26 410 83 33 11 14 220 ...
##  $ Rates.Illicit Drugs.Cocaine Used Past Year.12-17 : num [1:867] 16.6 24.4 32.4 18.6 17 ...
##  $ Rates.Illicit Drugs.Cocaine Used Past Year.18-25 : num [1:867] 54.9 83.7 85.3 60.5 66.1 ...
##  $ Rates.Illicit Drugs.Cocaine Used Past Year.26+   : num [1:867] 17.5 13.8 25.7 15.2 19.2 ...
##  $ Totals.Marijuana.New Users.12-17                 : num [1:867] 20 4 25 13 158 23 20 4 2 77 ...
##  $ Totals.Marijuana.New Users.18-25                 : num [1:867] 18 2 18 10 126 16 11 3 3 52 ...
##  $ Totals.Marijuana.New Users.26+                   : num [1:867] 2 0 3 1 17 3 2 0 0 8 ...
##  $ Rates.Marijuana.New Users.12-17                  : num [1:867] 59.7 77.7 64.2 66.7 58.8 ...
##  $ Rates.Marijuana.New Users.18-25                  : num [1:867] 62.3 84.2 60.1 71.1 62.7 ...
##  $ Rates.Marijuana.New Users.26+                    : num [1:867] 0.914 1.625 1.364 0.991 1.437 ...
##  $ Totals.Marijuana.Used Past Month.12-17           : num [1:867] 24 8 38 19 241 38 27 6 2 115 ...
##  $ Totals.Marijuana.Used Past Month.18-25           : num [1:867] 62 15 91 50 631 107 76 19 18 279 ...
##  $ Totals.Marijuana.Used Past Month.26+             : num [1:867] 73 26 122 57 978 168 94 20 26 526 ...
##  $ Rates.Marijuana.Used Past Month.12-17            : num [1:867] 63.7 110.8 77.4 79.7 76.6 ...
##  $ Rates.Marijuana.Used Past Month.18-25            : num [1:867] 125 240 152 165 161 ...
##  $ Rates.Marijuana.Used Past Month.26+              : num [1:867] 26 71.4 36.7 33.9 45.7 ...
##  $ Totals.Marijuana.Used Past Year.12-17            : num [1:867] 49 13 82 37 443 75 52 12 5 217 ...
##  $ Totals.Marijuana.Used Past Year.18-25            : num [1:867] 119 24 166 87 1109 ...
##  $ Totals.Marijuana.Used Past Year.26+              : num [1:867] 141 46 215 104 1670 300 158 36 40 900 ...
##  $ Rates.Marijuana.Used Past Year.12-17             : num [1:867] 128 189 170 158 141 ...
##  $ Rates.Marijuana.Used Past Year.18-25             : num [1:867] 238 389 275 289 283 ...
##  $ Rates.Marijuana.Used Past Year.26+               : num [1:867] 50.3 124.6 64.6 61.5 78.1 ...
##  $ Totals.Tobacco.Use Past Month.12-17              : num [1:867] 63 11 73 46 290 67 45 11 3 193 ...
##  $ Totals.Tobacco.Use Past Month.18-25              : num [1:867] 226 30 240 169 1377 ...
##  $ Totals.Tobacco.Use Past Month.26+                : num [1:867] 930 112 1032 660 4721 ...
##  $ Rates.Tobacco.Use Past Month.12-17               : num [1:867] 166.6 163.9 151.1 195.7 92.2 ...
##  $ Rates.Tobacco.Use Past Month.18-25               : num [1:867] 452 484 398 559 351 ...
##  $ Rates.Tobacco.Use Past Month.26+                 : num [1:867] 331 304 310 391 221 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   State = col_character(),
##   ..   Year = col_double(),
##   ..   `Population.12-17` = col_double(),
##   ..   `Population.18-25` = col_double(),
##   ..   `Population.26+` = col_double(),
##   ..   `Totals.Alcohol.Use Disorder Past Year.12-17` = col_double(),
##   ..   `Totals.Alcohol.Use Disorder Past Year.18-25` = col_double(),
##   ..   `Totals.Alcohol.Use Disorder Past Year.26+` = col_double(),
##   ..   `Rates.Alcohol.Use Disorder Past Year.12-17` = col_double(),
##   ..   `Rates.Alcohol.Use Disorder Past Year.18-25` = col_double(),
##   ..   `Rates.Alcohol.Use Disorder Past Year.26+` = col_double(),
##   ..   `Totals.Alcohol.Use Past Month.12-17` = col_double(),
##   ..   `Totals.Alcohol.Use Past Month.18-25` = col_double(),
##   ..   `Totals.Alcohol.Use Past Month.26+` = col_double(),
##   ..   `Rates.Alcohol.Use Past Month.12-17` = col_double(),
##   ..   `Rates.Alcohol.Use Past Month.18-25` = col_double(),
##   ..   `Rates.Alcohol.Use Past Month.26+` = col_double(),
##   ..   `Totals.Tobacco.Cigarette Past Month.12-17` = col_double(),
##   ..   `Totals.Tobacco.Cigarette Past Month.18-25` = col_double(),
##   ..   `Totals.Tobacco.Cigarette Past Month.26+` = col_double(),
##   ..   `Rates.Tobacco.Cigarette Past Month.12-17` = col_double(),
##   ..   `Rates.Tobacco.Cigarette Past Month.18-25` = col_double(),
##   ..   `Rates.Tobacco.Cigarette Past Month.26+` = col_double(),
##   ..   `Totals.Illicit Drugs.Cocaine Used Past Year.12-17` = col_double(),
##   ..   `Totals.Illicit Drugs.Cocaine Used Past Year.18-25` = col_double(),
##   ..   `Totals.Illicit Drugs.Cocaine Used Past Year.26+` = col_double(),
##   ..   `Rates.Illicit Drugs.Cocaine Used Past Year.12-17` = col_double(),
##   ..   `Rates.Illicit Drugs.Cocaine Used Past Year.18-25` = col_double(),
##   ..   `Rates.Illicit Drugs.Cocaine Used Past Year.26+` = col_double(),
##   ..   `Totals.Marijuana.New Users.12-17` = col_double(),
##   ..   `Totals.Marijuana.New Users.18-25` = col_double(),
##   ..   `Totals.Marijuana.New Users.26+` = col_double(),
##   ..   `Rates.Marijuana.New Users.12-17` = col_double(),
##   ..   `Rates.Marijuana.New Users.18-25` = col_double(),
##   ..   `Rates.Marijuana.New Users.26+` = col_double(),
##   ..   `Totals.Marijuana.Used Past Month.12-17` = col_double(),
##   ..   `Totals.Marijuana.Used Past Month.18-25` = col_double(),
##   ..   `Totals.Marijuana.Used Past Month.26+` = col_double(),
##   ..   `Rates.Marijuana.Used Past Month.12-17` = col_double(),
##   ..   `Rates.Marijuana.Used Past Month.18-25` = col_double(),
##   ..   `Rates.Marijuana.Used Past Month.26+` = col_double(),
##   ..   `Totals.Marijuana.Used Past Year.12-17` = col_double(),
##   ..   `Totals.Marijuana.Used Past Year.18-25` = col_double(),
##   ..   `Totals.Marijuana.Used Past Year.26+` = col_double(),
##   ..   `Rates.Marijuana.Used Past Year.12-17` = col_double(),
##   ..   `Rates.Marijuana.Used Past Year.18-25` = col_double(),
##   ..   `Rates.Marijuana.Used Past Year.26+` = col_double(),
##   ..   `Totals.Tobacco.Use Past Month.12-17` = col_double(),
##   ..   `Totals.Tobacco.Use Past Month.18-25` = col_double(),
##   ..   `Totals.Tobacco.Use Past Month.26+` = col_double(),
##   ..   `Rates.Tobacco.Use Past Month.12-17` = col_double(),
##   ..   `Rates.Tobacco.Use Past Month.18-25` = col_double(),
##   ..   `Rates.Tobacco.Use Past Month.26+` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(drugs)
## # A tibble: 6 × 53
##   State       Year `Population.12-17` `Population.18-25` `Population.26+`
##   <chr>      <dbl>              <dbl>              <dbl>            <dbl>
## 1 Alabama     2002             380805             499453          2812905
## 2 Alaska      2002              69400              62791           368460
## 3 Arizona     2002             485521             602265          3329482
## 4 Arkansas    2002             232986             302029          1687337
## 5 California  2002            3140739            3919577         21392421
## 6 Colorado    2002             385648             493921          2798960
## # ℹ 48 more variables: `Totals.Alcohol.Use Disorder Past Year.12-17` <dbl>,
## #   `Totals.Alcohol.Use Disorder Past Year.18-25` <dbl>,
## #   `Totals.Alcohol.Use Disorder Past Year.26+` <dbl>,
## #   `Rates.Alcohol.Use Disorder Past Year.12-17` <dbl>,
## #   `Rates.Alcohol.Use Disorder Past Year.18-25` <dbl>,
## #   `Rates.Alcohol.Use Disorder Past Year.26+` <dbl>,
## #   `Totals.Alcohol.Use Past Month.12-17` <dbl>, …

Cleaning data set

names(drugs) <- gsub("[.]","_", names(drugs)) ## replace every "." with "_" in the column names
names(drugs) <- gsub("_$","", names(drugs)) ## drop a trailing underscore, if there is one
names(drugs) <- tolower(names(drugs)) ## convert the column names to lowercase

head(drugs) ## verify the new names (spaces and hyphens are still present)
## # A tibble: 6 × 53
##   state       year `population_12-17` `population_18-25` `population_26+`
##   <chr>      <dbl>              <dbl>              <dbl>            <dbl>
## 1 Alabama     2002             380805             499453          2812905
## 2 Alaska      2002              69400              62791           368460
## 3 Arizona     2002             485521             602265          3329482
## 4 Arkansas    2002             232986             302029          1687337
## 5 California  2002            3140739            3919577         21392421
## 6 Colorado    2002             385648             493921          2798960
## # ℹ 48 more variables: `totals_alcohol_use disorder past year_12-17` <dbl>,
## #   `totals_alcohol_use disorder past year_18-25` <dbl>,
## #   `totals_alcohol_use disorder past year_26+` <dbl>,
## #   `rates_alcohol_use disorder past year_12-17` <dbl>,
## #   `rates_alcohol_use disorder past year_18-25` <dbl>,
## #   `rates_alcohol_use disorder past year_26+` <dbl>,
## #   `totals_alcohol_use past month_12-17` <dbl>, …
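The cleaned names above still contain spaces, hyphens, and `+`, which is why they have to be wrapped in backticks later on. As a hedged alternative (not what this report does), the sketch below turns every run of non-alphanumeric characters into a single underscore; `raw_names` holds three names copied from the `str()` output, and the variable names are illustrative:

```r
# Column names copied from the str() output above
raw_names <- c("State", "Population.26+", "Rates.Alcohol.Use Past Month.18-25")

clean_names <- tolower(raw_names)
clean_names <- gsub("[^a-z0-9]+", "_", clean_names)  # any run of other characters -> "_"
clean_names <- gsub("^_|_$", "", clean_names)        # drop a leading or trailing "_"

clean_names
# gives: state, population_26, rates_alcohol_use_past_month_18_25
```

With names like these, the later steps could refer to `rates_alcohol_use_past_month_18_25` without backticks. The rest of this report keeps the original cleaned names.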

Handling NAs

young_adults <- drugs |>
  filter(!is.na(`rates_alcohol_use past month_18-25`) & !is.na(`rates_tobacco_use past month_18-25`))

No rows were dropped: young_adults has the same number of observations as drugs, so neither of the two columns contains NAs.
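Comparing row counts works, but missing values can also be counted directly. The sketch below uses a toy data frame with made-up values (not the real data) and short illustrative column names:

```r
# Toy data frame with made-up values, only to illustrate the check
toy <- data.frame(
  alcohol_rate = c(510, NA, 585),
  tobacco_rate = c(452, 484, 398)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy))
complete_rows
```

On the real data, a result of zero for both columns would confirm that the filter step removes nothing.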

Selecting the columns I need for my research question and taking the mean of alcohol use and tobacco use

young_adults_selected <- young_adults |>
  group_by(state, year) |>
  summarise(
    alcohol_rate = mean(`rates_alcohol_use past month_18-25`),
    tobacco_rate = mean(`rates_tobacco_use past month_18-25`)
  ) |>
  filter(state == "Maryland")
## `summarise()` has grouped output by 'state'. You can override using the
## `.groups` argument.
young_adults_selected
## # A tibble: 17 × 4
## # Groups:   state [1]
##    state     year alcohol_rate tobacco_rate
##    <chr>    <dbl>        <dbl>        <dbl>
##  1 Maryland  2002         642.         420.
##  2 Maryland  2003         641.         419.
##  3 Maryland  2004         629.         400.
##  4 Maryland  2005         630.         395.
##  5 Maryland  2006         659.         386.
##  6 Maryland  2007         659.         367.
##  7 Maryland  2008         617.         357.
##  8 Maryland  2009         591.         359.
##  9 Maryland  2010         643.         359.
## 10 Maryland  2011         661.         363.
## 11 Maryland  2012         649.         351.
## 12 Maryland  2013         634.         327.
## 13 Maryland  2014         626.         340.
## 14 Maryland  2015         623.         320.
## 15 Maryland  2016         591.         261.
## 16 Maryland  2017         597.         245.
## 17 Maryland  2018         574.         243.
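The same table can be built by filtering to Maryland first, so only one state is grouped and summarized, and by passing `.groups = "drop"` to avoid the grouping message. This is a sketch on a toy tibble whose values are rounded from the output in this report; the short column names `alcohol` and `tobacco` stand in for the backticked originals:

```r
library(dplyr)

# Toy subset; values rounded from the str() output and the Maryland table
toy <- tibble(
  state   = c("Alabama", "Maryland", "Maryland"),
  year    = c(2002, 2002, 2003),
  alcohol = c(510, 642, 641),
  tobacco = c(452, 420, 419)
)

maryland <- toy |>
  filter(state == "Maryland") |>   # keep one state before summarizing
  group_by(year) |>
  summarise(
    alcohol_rate = mean(alcohol),
    tobacco_rate = mean(tobacco),
    .groups = "drop"               # return an ungrouped tibble
  )

maryland
```

Because each state and year appears only once in this data set, the mean of each group is just the original value. The summarise step matters only if a state-year pair has several rows.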

Calculating the maximum rate

max_alcohol <- max(young_adults_selected$alcohol_rate)
max_tobacco <- max(young_adults_selected$tobacco_rate)

max_alcohol
## [1] 660.849
max_tobacco
## [1] 420.454

Finding the year in which the maximum alcohol and tobacco use occurred

max_year_alcohol <- young_adults_selected$year[which(young_adults_selected$alcohol_rate == max_alcohol)]
max_year_tobacco <- young_adults_selected$year[which(young_adults_selected$tobacco_rate == max_tobacco)]

max_year_alcohol
## [1] 2011
max_year_tobacco
## [1] 2002
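The peak year can also be found in one step with `which.max()`, which returns the position of the first maximum. The sketch below uses the rounded values printed in the Maryland table above, so it only approximates the real columns; the data frame name `maryland` is illustrative:

```r
# Rounded values copied from the Maryland table above
maryland <- data.frame(
  year = 2002:2018,
  alcohol_rate = c(642, 641, 629, 630, 659, 659, 617, 591, 643,
                   661, 649, 634, 626, 623, 591, 597, 574),
  tobacco_rate = c(420, 419, 400, 395, 386, 367, 357, 359, 359,
                   363, 351, 327, 340, 320, 261, 245, 243)
)

# which.max() gives the row index of the (first) largest value
peak_year_alcohol <- maryland$year[which.max(maryland$alcohol_rate)]
peak_year_tobacco <- maryland$year[which.max(maryland$tobacco_rate)]

peak_year_alcohol  # 2011
peak_year_tobacco  # 2002
```

If two years tied for the maximum, `which.max()` would report only the earlier one, whereas the `which(... == max)` form used above returns all of them.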

Summary Table

young_adults_summary<- summary(young_adults_selected[c("alcohol_rate","tobacco_rate")])

young_adults_summary
##   alcohol_rate    tobacco_rate  
##  Min.   :574.4   Min.   :243.0  
##  1st Qu.:617.4   1st Qu.:326.6  
##  Median :629.6   Median :359.2  
##  Mean   :627.4   Mean   :347.8  
##  3rd Qu.:643.2   3rd Qu.:386.2  
##  Max.   :660.8   Max.   :420.5
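The summary table describes the spread of each series but not its direction over time. One hedged way to quantify the pattern is a least-squares slope per series. The vectors below are the rounded values printed in the Maryland table above, the variable names are illustrative, and a straight-line fit is a simplification that is not part of the original analysis:

```r
# Rounded values copied from the Maryland table above
years   <- 2002:2018
alcohol <- c(642, 641, 629, 630, 659, 659, 617, 591, 643,
             661, 649, 634, 626, 623, 591, 597, 574)
tobacco <- c(420, 419, 400, 395, 386, 367, 357, 359, 359,
             363, 351, 327, 340, 320, 261, 245, 243)

# Slope of a simple linear fit: average change in the rate per year
alcohol_slope <- coef(lm(alcohol ~ years))[["years"]]
tobacco_slope <- coef(lm(tobacco ~ years))[["years"]]

alcohol_slope
tobacco_slope

# Correlation between the two series across years
cor(alcohol, tobacco)
```

A negative slope means the rate fell on average over the period; comparing the two slopes shows which substance declined faster.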

Visualization

ggplot(young_adults_selected, aes(x = year)) +
  # linewidth replaces size for lines (size is deprecated for lines since ggplot2 3.4.0)
  geom_line(aes(y = alcohol_rate, color = "Alcohol Use"), linewidth = 1.2) +
  geom_line(aes(y = tobacco_rate, color = "Tobacco Use"), linewidth = 1.2) +
  geom_point(aes(y = alcohol_rate, color = "Alcohol Use"), size = 2) +
  geom_point(aes(y = tobacco_rate, color = "Tobacco Use"), size = 2) +
  labs(title = "Alcohol and Tobacco Use Among Young Adults (18-25) in Maryland",
       x = "Year", y = "Use Rate (Past Month, Ages 18-25)", color = "Substance") +
  theme_light() +
  scale_color_brewer(palette = "Set1")
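The plot above adds one layer per substance. A common alternative is to reshape the table to long format so a single `geom_line()` and `geom_point()` cover both series. This is a sketch on three years of rounded values from the Maryland table; the names `maryland_wide`, `maryland_long`, `substance`, and `rate` are illustrative:

```r
library(tidyr)
library(ggplot2)

# Three years of rounded values copied from the Maryland table above
maryland_wide <- data.frame(
  year = c(2002, 2010, 2018),
  alcohol_rate = c(642, 643, 574),
  tobacco_rate = c(420, 359, 243)
)

# One row per year and substance instead of one column per substance
maryland_long <- pivot_longer(
  maryland_wide,
  cols = c(alcohol_rate, tobacco_rate),
  names_to = "substance",
  values_to = "rate"
)

p <- ggplot(maryland_long, aes(x = year, y = rate, color = substance)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(x = "Year", y = "Use Rate (Past Month, Ages 18-25)", color = "Substance") +
  theme_light()

p
```

The long format keeps the color legend tied to a real column, so a third substance would need no new layers.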

Conclusion

From my analysis, I found that alcohol use among young adults (ages 18-25) in Maryland peaked in 2011 (about 661), only slightly above 2006 and 2007 (about 659 each), while tobacco use was highest in 2002 (about 420) and had fallen to about 243 by 2018. These results are consistent with what was happening socially and politically during those times.

In 2002, tobacco use was still common among young adults because anti-smoking laws and public awareness campaigns were just starting to take effect. Maryland did not fully implement strict smoking regulations until later in the 2000s. Programs like the Maryland Tobacco Quitline and national campaigns by the CDC gained traction a few years afterward, which likely helped reduce smoking rates after 2002.

In 2011, alcohol use reached its highest point among young adults. Around this time, binge drinking (drinking large amounts of alcohol in a short period) was a major public health concern. According to the CDC, binge drinking was especially common among 18- to 25-year-olds, particularly in college settings. The years following the 2008 economic recession were also stressful for many young people, with unemployment and lifestyle changes possibly leading to more alcohol consumption as a coping behavior or social outlet.

Overall, these findings suggest how social events, laws, and cultural behaviors can influence substance use. In the future, I would like to explore whether these patterns changed after 2018 or compare Maryland to other states to see if similar trends appear.

Citations

Summary table: https://www.statology.org/summary-function-in-r/

Alcohol use: https://www.cdc.gov/mmwr/preview/mmwrhtml/su6203a13.htm#:~:text=In%202011%2C%20the%20overall%20prevalence,during%20the%20past%2030%20days.

Tobacco use: https://www.lung.org/research/sotc/tobacco-timeline

Maryland Tobacco Quitline: https://health.maryland.gov/phpa/ohpetup/pages/tob_quit.aspx

Visualization: https://www.geeksforgeeks.org/r-language/how-to-connect-paired-points-with-lines-in-scatterplot-in-ggplot2-in-r/