Is there a pattern over the years (2002-2018) in alcohol and tobacco use among young adults (18-25) in Maryland?
The data set I am using is titled “Drugs” and comes from the CORGIS website (https://corgis-edu.github.io/corgis/csv/drugs/). It contains data on substance use and abuse, including alcohol, tobacco, marijuana, and cocaine, across U.S. states and age groups. The data originate from the National Survey on Drug Use and Health (NSDUH) and cover 2002 to 2018. Each record represents one U.S. state in one year and gives both the total number of users (in thousands) and the rate of use (per 1,000 people in the age group) for each substance and age group. The data set has 867 observations and 53 variables.
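The rate columns appear to be expressed per 1,000 people in the age group rather than as percentages, since many values exceed 100. As a rough consistency check, the sketch below uses the first Alabama (2002) row from the str() output further down: Totals.Alcohol.Use Past Month.18-25 = 254 (thousand), Population.18-25 = 499,453, and a displayed rate of about 510. The helper name rate_per_1000() is hypothetical, and the estimate will not match exactly because the totals are rounded to the nearest thousand.

```r
# Hypothetical helper: turn a count reported in thousands into a
# rate per 1,000 people, given the population of the age group.
rate_per_1000 <- function(total_thousands, population) {
  (total_thousands * 1000) / population * 1000
}

# First Alabama (2002) row as displayed by str(drugs)
alabama_total <- 254     # Totals.Alcohol.Use Past Month.18-25, in thousands
alabama_pop   <- 499453  # Population.18-25
alabama_rate  <- 510     # Rates.Alcohol.Use Past Month.18-25, as displayed

estimate <- rate_per_1000(alabama_total, alabama_pop)
round(estimate, 1)       # about 508.6, close to the displayed 510
alabama_rate / 10        # a rate per 1,000 divided by 10 is a percentage (51%)
```

Under this reading, a Maryland value such as 642 means roughly 64.2% of 18-25 year olds reported past-month use.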
For this project, I focused on four specific columns from the data set:
Year, State, Rates.Alcohol.Use Past Month.18-25, and Rates.Tobacco.Use Past Month.18-25
The Rates.Alcohol.Use Past Month.18-25 and Rates.Tobacco.Use Past Month.18-25 columns give the number of young adults (ages 18-25) per 1,000 in each state who reported using alcohol or tobacco in the past month; for example, a value of 642 corresponds to about 64.2%. Year runs from 2002 to 2018, and State identifies the U.S. state. Using these four variables, I explore whether there is a pattern in alcohol and tobacco use among young adults in Maryland over time.
To answer this question, I first cleaned the data set to make it easier to work with: I standardized the column names, removed rows with missing values in the two rate columns, and worked with only the four variables I needed (the past-month alcohol and tobacco use rates for the 18-25 age group, plus state and year).
Next, I grouped the data by state and year, summarized the alcohol and tobacco rates, and kept only Maryland. Because the data set appears to contain one row per state per year, each "average" is simply the reported rate for that year. I then found the maximum alcohol and tobacco rates and the years in which they occurred. Finally, I produced a summary table of both rates and a line-and-point plot of each over time.
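library(tidyverse)              # loads readr, dplyr, ggplot2, and related packages
drugs <- read_csv("drugs.csv")  # reads drugs.csv from the working directory
str(drugs)                      # inspect column names and types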
## spc_tbl_ [867 × 53] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ State : chr [1:867] "Alabama" "Alaska" "Arizona" "Arkansas" ...
## $ Year : num [1:867] 2002 2002 2002 2002 2002 ...
## $ Population.12-17 : num [1:867] 380805 69400 485521 232986 3140739 ...
## $ Population.18-25 : num [1:867] 499453 62791 602265 302029 3919577 ...
## $ Population.26+ : num [1:867] 2812905 368460 3329482 1687337 21392421 ...
## $ Totals.Alcohol.Use Disorder Past Year.12-17 : num [1:867] 18 4 36 14 173 26 16 4 1 73 ...
## $ Totals.Alcohol.Use Disorder Past Year.18-25 : num [1:867] 68 12 117 53 581 102 61 16 12 266 ...
## $ Totals.Alcohol.Use Disorder Past Year.26+ : num [1:867] 138 27 258 101 1298 ...
## $ Rates.Alcohol.Use Disorder Past Year.12-17 : num [1:867] 48.3 61.5 73.8 61.5 55.1 ...
## $ Rates.Alcohol.Use Disorder Past Year.18-25 : num [1:867] 136 188 194 176 148 ...
## $ Rates.Alcohol.Use Disorder Past Year.26+ : num [1:867] 49.1 73.7 77.5 59.8 60.7 ...
## $ Totals.Alcohol.Use Past Month.12-17 : num [1:867] 57 11 91 39 480 80 56 12 4 234 ...
## $ Totals.Alcohol.Use Past Month.18-25 : num [1:867] 254 38 352 162 2176 ...
## $ Totals.Alcohol.Use Past Month.26+ : num [1:867] 1048 206 1774 664 11847 ...
## $ Rates.Alcohol.Use Past Month.12-17 : num [1:867] 150 159 187 167 153 ...
## $ Rates.Alcohol.Use Past Month.18-25 : num [1:867] 510 598 585 538 555 ...
## $ Rates.Alcohol.Use Past Month.26+ : num [1:867] 373 559 533 393 554 ...
## $ Totals.Tobacco.Cigarette Past Month.12-17 : num [1:867] 52 9 62 37 235 53 40 9 2 165 ...
## $ Totals.Tobacco.Cigarette Past Month.18-25 : num [1:867] 196 28 234 154 1245 ...
## $ Totals.Tobacco.Cigarette Past Month.26+ : num [1:867] 728 92 919 539 4028 ...
## $ Rates.Tobacco.Cigarette Past Month.12-17 : num [1:867] 136.9 132.5 128.4 160.5 74.8 ...
## $ Rates.Tobacco.Cigarette Past Month.18-25 : num [1:867] 392 440 388 510 318 ...
## $ Rates.Tobacco.Cigarette Past Month.26+ : num [1:867] 259 250 276 319 188 ...
## $ Totals.Illicit Drugs.Cocaine Used Past Year.12-17: num [1:867] 6 2 16 4 53 10 5 1 0 26 ...
## $ Totals.Illicit Drugs.Cocaine Used Past Year.18-25: num [1:867] 27 5 51 18 259 51 23 7 3 108 ...
## $ Totals.Illicit Drugs.Cocaine Used Past Year.26+ : num [1:867] 49 5 86 26 410 83 33 11 14 220 ...
## $ Rates.Illicit Drugs.Cocaine Used Past Year.12-17 : num [1:867] 16.6 24.4 32.4 18.6 17 ...
## $ Rates.Illicit Drugs.Cocaine Used Past Year.18-25 : num [1:867] 54.9 83.7 85.3 60.5 66.1 ...
## $ Rates.Illicit Drugs.Cocaine Used Past Year.26+ : num [1:867] 17.5 13.8 25.7 15.2 19.2 ...
## $ Totals.Marijuana.New Users.12-17 : num [1:867] 20 4 25 13 158 23 20 4 2 77 ...
## $ Totals.Marijuana.New Users.18-25 : num [1:867] 18 2 18 10 126 16 11 3 3 52 ...
## $ Totals.Marijuana.New Users.26+ : num [1:867] 2 0 3 1 17 3 2 0 0 8 ...
## $ Rates.Marijuana.New Users.12-17 : num [1:867] 59.7 77.7 64.2 66.7 58.8 ...
## $ Rates.Marijuana.New Users.18-25 : num [1:867] 62.3 84.2 60.1 71.1 62.7 ...
## $ Rates.Marijuana.New Users.26+ : num [1:867] 0.914 1.625 1.364 0.991 1.437 ...
## $ Totals.Marijuana.Used Past Month.12-17 : num [1:867] 24 8 38 19 241 38 27 6 2 115 ...
## $ Totals.Marijuana.Used Past Month.18-25 : num [1:867] 62 15 91 50 631 107 76 19 18 279 ...
## $ Totals.Marijuana.Used Past Month.26+ : num [1:867] 73 26 122 57 978 168 94 20 26 526 ...
## $ Rates.Marijuana.Used Past Month.12-17 : num [1:867] 63.7 110.8 77.4 79.7 76.6 ...
## $ Rates.Marijuana.Used Past Month.18-25 : num [1:867] 125 240 152 165 161 ...
## $ Rates.Marijuana.Used Past Month.26+ : num [1:867] 26 71.4 36.7 33.9 45.7 ...
## $ Totals.Marijuana.Used Past Year.12-17 : num [1:867] 49 13 82 37 443 75 52 12 5 217 ...
## $ Totals.Marijuana.Used Past Year.18-25 : num [1:867] 119 24 166 87 1109 ...
## $ Totals.Marijuana.Used Past Year.26+ : num [1:867] 141 46 215 104 1670 300 158 36 40 900 ...
## $ Rates.Marijuana.Used Past Year.12-17 : num [1:867] 128 189 170 158 141 ...
## $ Rates.Marijuana.Used Past Year.18-25 : num [1:867] 238 389 275 289 283 ...
## $ Rates.Marijuana.Used Past Year.26+ : num [1:867] 50.3 124.6 64.6 61.5 78.1 ...
## $ Totals.Tobacco.Use Past Month.12-17 : num [1:867] 63 11 73 46 290 67 45 11 3 193 ...
## $ Totals.Tobacco.Use Past Month.18-25 : num [1:867] 226 30 240 169 1377 ...
## $ Totals.Tobacco.Use Past Month.26+ : num [1:867] 930 112 1032 660 4721 ...
## $ Rates.Tobacco.Use Past Month.12-17 : num [1:867] 166.6 163.9 151.1 195.7 92.2 ...
## $ Rates.Tobacco.Use Past Month.18-25 : num [1:867] 452 484 398 559 351 ...
## $ Rates.Tobacco.Use Past Month.26+ : num [1:867] 331 304 310 391 221 ...
## - attr(*, "spec")=
## .. cols(
## .. State = col_character(),
## .. Year = col_double(),
## .. `Population.12-17` = col_double(),
## .. `Population.18-25` = col_double(),
## .. `Population.26+` = col_double(),
## .. `Totals.Alcohol.Use Disorder Past Year.12-17` = col_double(),
## .. `Totals.Alcohol.Use Disorder Past Year.18-25` = col_double(),
## .. `Totals.Alcohol.Use Disorder Past Year.26+` = col_double(),
## .. `Rates.Alcohol.Use Disorder Past Year.12-17` = col_double(),
## .. `Rates.Alcohol.Use Disorder Past Year.18-25` = col_double(),
## .. `Rates.Alcohol.Use Disorder Past Year.26+` = col_double(),
## .. `Totals.Alcohol.Use Past Month.12-17` = col_double(),
## .. `Totals.Alcohol.Use Past Month.18-25` = col_double(),
## .. `Totals.Alcohol.Use Past Month.26+` = col_double(),
## .. `Rates.Alcohol.Use Past Month.12-17` = col_double(),
## .. `Rates.Alcohol.Use Past Month.18-25` = col_double(),
## .. `Rates.Alcohol.Use Past Month.26+` = col_double(),
## .. `Totals.Tobacco.Cigarette Past Month.12-17` = col_double(),
## .. `Totals.Tobacco.Cigarette Past Month.18-25` = col_double(),
## .. `Totals.Tobacco.Cigarette Past Month.26+` = col_double(),
## .. `Rates.Tobacco.Cigarette Past Month.12-17` = col_double(),
## .. `Rates.Tobacco.Cigarette Past Month.18-25` = col_double(),
## .. `Rates.Tobacco.Cigarette Past Month.26+` = col_double(),
## .. `Totals.Illicit Drugs.Cocaine Used Past Year.12-17` = col_double(),
## .. `Totals.Illicit Drugs.Cocaine Used Past Year.18-25` = col_double(),
## .. `Totals.Illicit Drugs.Cocaine Used Past Year.26+` = col_double(),
## .. `Rates.Illicit Drugs.Cocaine Used Past Year.12-17` = col_double(),
## .. `Rates.Illicit Drugs.Cocaine Used Past Year.18-25` = col_double(),
## .. `Rates.Illicit Drugs.Cocaine Used Past Year.26+` = col_double(),
## .. `Totals.Marijuana.New Users.12-17` = col_double(),
## .. `Totals.Marijuana.New Users.18-25` = col_double(),
## .. `Totals.Marijuana.New Users.26+` = col_double(),
## .. `Rates.Marijuana.New Users.12-17` = col_double(),
## .. `Rates.Marijuana.New Users.18-25` = col_double(),
## .. `Rates.Marijuana.New Users.26+` = col_double(),
## .. `Totals.Marijuana.Used Past Month.12-17` = col_double(),
## .. `Totals.Marijuana.Used Past Month.18-25` = col_double(),
## .. `Totals.Marijuana.Used Past Month.26+` = col_double(),
## .. `Rates.Marijuana.Used Past Month.12-17` = col_double(),
## .. `Rates.Marijuana.Used Past Month.18-25` = col_double(),
## .. `Rates.Marijuana.Used Past Month.26+` = col_double(),
## .. `Totals.Marijuana.Used Past Year.12-17` = col_double(),
## .. `Totals.Marijuana.Used Past Year.18-25` = col_double(),
## .. `Totals.Marijuana.Used Past Year.26+` = col_double(),
## .. `Rates.Marijuana.Used Past Year.12-17` = col_double(),
## .. `Rates.Marijuana.Used Past Year.18-25` = col_double(),
## .. `Rates.Marijuana.Used Past Year.26+` = col_double(),
## .. `Totals.Tobacco.Use Past Month.12-17` = col_double(),
## .. `Totals.Tobacco.Use Past Month.18-25` = col_double(),
## .. `Totals.Tobacco.Use Past Month.26+` = col_double(),
## .. `Rates.Tobacco.Use Past Month.12-17` = col_double(),
## .. `Rates.Tobacco.Use Past Month.18-25` = col_double(),
## .. `Rates.Tobacco.Use Past Month.26+` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
head(drugs)
## # A tibble: 6 × 53
## State Year `Population.12-17` `Population.18-25` `Population.26+`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Alabama 2002 380805 499453 2812905
## 2 Alaska 2002 69400 62791 368460
## 3 Arizona 2002 485521 602265 3329482
## 4 Arkansas 2002 232986 302029 1687337
## 5 California 2002 3140739 3919577 21392421
## 6 Colorado 2002 385648 493921 2798960
## # ℹ 48 more variables: `Totals.Alcohol.Use Disorder Past Year.12-17` <dbl>,
## # `Totals.Alcohol.Use Disorder Past Year.18-25` <dbl>,
## # `Totals.Alcohol.Use Disorder Past Year.26+` <dbl>,
## # `Rates.Alcohol.Use Disorder Past Year.12-17` <dbl>,
## # `Rates.Alcohol.Use Disorder Past Year.18-25` <dbl>,
## # `Rates.Alcohol.Use Disorder Past Year.26+` <dbl>,
## # `Totals.Alcohol.Use Past Month.12-17` <dbl>, …
names(drugs) <- gsub("[.]", "_", names(drugs))  # replace every "." in the column names with "_"
names(drugs) <- gsub("_$", "", names(drugs))     # drop a trailing "_", if there is one
names(drugs) <- tolower(names(drugs))            # convert column names to lowercase
head(drugs)                                      # verify; spaces and hyphens remain, so names still need backticks
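The three renaming steps can be checked without loading drugs.csv by bundling them into a function and applying it to a few of the original column names. This is a minimal sketch; clean_col_names() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: the same three renaming steps applied to a character vector
clean_col_names <- function(x) {
  x <- gsub("[.]", "_", x)  # every "." becomes "_"
  x <- gsub("_$", "", x)    # drop a trailing "_", if any
  tolower(x)                # lowercase everything
}

original <- c("State", "Year",
              "Rates.Alcohol.Use Past Month.18-25",
              "Rates.Tobacco.Use Past Month.18-25")
clean_col_names(original)
# returns "state", "year", "rates_alcohol_use past month_18-25",
#         "rates_tobacco_use past month_18-25"
```

Spaces and hyphens survive this cleaning, which is why the later code still wraps these names in backticks.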
## # A tibble: 6 × 53
## state year `population_12-17` `population_18-25` `population_26+`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Alabama 2002 380805 499453 2812905
## 2 Alaska 2002 69400 62791 368460
## 3 Arizona 2002 485521 602265 3329482
## 4 Arkansas 2002 232986 302029 1687337
## 5 California 2002 3140739 3919577 21392421
## 6 Colorado 2002 385648 493921 2798960
## # ℹ 48 more variables: `totals_alcohol_use disorder past year_12-17` <dbl>,
## # `totals_alcohol_use disorder past year_18-25` <dbl>,
## # `totals_alcohol_use disorder past year_26+` <dbl>,
## # `rates_alcohol_use disorder past year_12-17` <dbl>,
## # `rates_alcohol_use disorder past year_18-25` <dbl>,
## # `rates_alcohol_use disorder past year_26+` <dbl>,
## # `totals_alcohol_use past month_12-17` <dbl>, …
young_adults <- drugs |>
filter(!is.na(`rates_alcohol_use past month_18-25`) & !is.na(`rates_tobacco_use past month_18-25`))
Because young_adults has the same number of rows as drugs (867), neither rate column contains missing values, so this filter removes nothing.
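Comparing row counts works, but missing values can also be counted directly per column. The sketch below uses a hypothetical helper, count_missing(), and a toy data frame with made-up values only to show the shape of the output.

```r
# Hypothetical helper: number of NA values in each selected column
count_missing <- function(df, cols) {
  colSums(is.na(df[cols]))
}

# Toy data frame with made-up values, for illustration only
toy <- data.frame(
  alcohol = c(640, NA, 630),
  tobacco = c(420, 410, 400)
)

count_missing(toy, c("alcohol", "tobacco"))  # alcohol 1, tobacco 0
```

On the real data, count_missing(drugs, c("rates_alcohol_use past month_18-25", "rates_tobacco_use past month_18-25")) should return 0 for both columns, given that the filter above dropped no rows.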
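young_adults_selected <- young_adults |>
  group_by(state, year) |>
  # each state-year appears to have a single row (867 = 51 x 17),
  # so each mean() just returns that row's rate
  summarise(alcohol_rate = mean(`rates_alcohol_use past month_18-25`),
            tobacco_rate = mean(`rates_tobacco_use past month_18-25`)) |>
  filter(state == "Maryland")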
## `summarise()` has grouped output by 'state'. You can override using the
## `.groups` argument.
young_adults_selected
## # A tibble: 17 × 4
## # Groups: state [1]
## state year alcohol_rate tobacco_rate
## <chr> <dbl> <dbl> <dbl>
## 1 Maryland 2002 642. 420.
## 2 Maryland 2003 641. 419.
## 3 Maryland 2004 629. 400.
## 4 Maryland 2005 630. 395.
## 5 Maryland 2006 659. 386.
## 6 Maryland 2007 659. 367.
## 7 Maryland 2008 617. 357.
## 8 Maryland 2009 591. 359.
## 9 Maryland 2010 643. 359.
## 10 Maryland 2011 661. 363.
## 11 Maryland 2012 649. 351.
## 12 Maryland 2013 634. 327.
## 13 Maryland 2014 626. 340.
## 14 Maryland 2015 623. 320.
## 15 Maryland 2016 591. 261.
## 16 Maryland 2017 597. 245.
## 17 Maryland 2018 574. 243.
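The summarise() step only acts as a pass-through if no state-year combination repeats. A minimal sketch of that check, using a hypothetical helper one_row_per_key() and made-up example rows:

```r
# Hypothetical helper: TRUE when no combination of the key columns repeats
one_row_per_key <- function(df, keys) {
  anyDuplicated(df[keys]) == 0
}

# Made-up example rows, for illustration only
ok  <- data.frame(state = c("Maryland", "Maryland", "Virginia"),
                  year  = c(2002, 2003, 2002))
dup <- data.frame(state = c("Maryland", "Maryland"),
                  year  = c(2002, 2002))

one_row_per_key(ok,  c("state", "year"))  # TRUE
one_row_per_key(dup, c("state", "year"))  # FALSE
```

If one_row_per_key(drugs, c("state", "year")) returns TRUE, filtering to Maryland first and selecting the two rate columns would give the same table without the grouping step.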
max_alcohol <- max(young_adults_selected$alcohol_rate)  # highest Maryland alcohol rate
max_tobacco <- max(young_adults_selected$tobacco_rate)  # highest Maryland tobacco rate
max_alcohol
## [1] 660.849
max_tobacco
## [1] 420.454
max_year_alcohol <- young_adults_selected$year[which(young_adults_selected$alcohol_rate == max_alcohol)]
max_year_tobacco <- young_adults_selected$year[which(young_adults_selected$tobacco_rate == max_tobacco)]
max_year_alcohol
## [1] 2011
max_year_tobacco
## [1] 2002
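The same peak years can be found in one step with which.max(), and which.min() gives the lowest years as well. This sketch uses the Maryland values as displayed (rounded) in the table above, so it does not need the data file; extreme_years() is a hypothetical helper.

```r
# Maryland values as displayed (rounded) in the young_adults_selected table
years   <- 2002:2018
alcohol <- c(642, 641, 629, 630, 659, 659, 617, 591, 643,
             661, 649, 634, 626, 623, 591, 597, 574)
tobacco <- c(420, 419, 400, 395, 386, 367, 357, 359, 359,
             363, 351, 327, 340, 320, 261, 245, 243)

# Hypothetical helper: years of the highest and lowest value.
# which.max() and which.min() return the first position when there is a tie.
extreme_years <- function(years, values) {
  c(peak = years[which.max(values)], low = years[which.min(values)])
}

extreme_years(years, alcohol)  # peak 2011, low 2018
extreme_years(years, tobacco)  # peak 2002, low 2018
```

Unlike values[which(values == max(values))], which.max() always returns a single position, which avoids multiple years being returned if two years tie.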
young_adults_summary <- summary(young_adults_selected[c("alcohol_rate", "tobacco_rate")])
young_adults_summary
## alcohol_rate tobacco_rate
## Min. :574.4 Min. :243.0
## 1st Qu.:617.4 1st Qu.:326.6
## Median :629.6 Median :359.2
## Mean :627.4 Mean :347.8
## 3rd Qu.:643.2 3rd Qu.:386.2
## Max. :660.8 Max. :420.5
ggplot(young_adults_selected, aes(x = year)) +
  geom_line(aes(y = alcohol_rate, color = "Alcohol Use"), linewidth = 1.2) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_line(aes(y = tobacco_rate, color = "Tobacco Use"), linewidth = 1.2) +
  geom_point(aes(y = alcohol_rate, color = "Alcohol Use"), size = 2) +
  geom_point(aes(y = tobacco_rate, color = "Tobacco Use"), size = 2) +
  labs(title = "Alcohol and Tobacco Use Among Young Adults (18-25) in Maryland",
       x = "Year", y = "Past-Month Use (rate per 1,000, ages 18-25)", color = "Substance") +
  theme_light() +
  scale_color_brewer(palette = "Set1")
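The summary table and peak years describe the two series but do not directly say whether they move together. A minimal sketch that puts numbers on the pattern: percent change from 2002 to 2018, the average yearly change from a simple linear fit, and the Pearson correlation between the two series. It uses the rounded Maryland values displayed in the table above, so results are approximate; with 17 yearly points and no survey standard errors in this data set, they are descriptive only.

```r
# Maryland values as displayed (rounded) in the young_adults_selected table
years   <- 2002:2018
alcohol <- c(642, 641, 629, 630, 659, 659, 617, 591, 643,
             661, 649, 634, 626, 623, 591, 597, 574)
tobacco <- c(420, 419, 400, 395, 386, 367, 357, 359, 359,
             363, 351, 327, 340, 320, 261, 245, 243)

# Percent change from the first to the last year
pct_change <- function(x) 100 * (x[length(x)] - x[1]) / x[1]
pct_change(alcohol)  # about -10.6
pct_change(tobacco)  # about -42.1

# Average change per year (rate per 1,000 per year) from a linear fit
coef(lm(alcohol ~ years))[["years"]]  # about -2.9
coef(lm(tobacco ~ years))[["years"]]  # about -10.2

# Pearson correlation between the two yearly series
cor(alcohol, tobacco)  # roughly 0.7
```

The positive correlation mostly reflects both series drifting downward over the same period, so it should not be read as evidence that one behavior drives the other.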
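My analysis shows that past-month alcohol use among young adults (ages 18-25) in Maryland peaked in 2011 (about 661 per 1,000, only slightly above the roughly 659 per 1,000 in 2006 and 2007), while tobacco use was highest in 2002 (about 420 per 1,000). The two series follow different patterns: tobacco use declined almost steadily, to about 243 per 1,000 in 2018, whereas alcohol use stayed within a narrower band (roughly 574 to 661 per 1,000) and fell mainly after 2015. These results are consistent with the social and policy changes of those years.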
In 2002, tobacco use was still common among young adults, likely in part because anti-smoking laws and public awareness campaigns were only beginning to take effect. Maryland did not fully implement strict smoking regulations until later in the 2000s. Programs such as the Maryland Tobacco Quitline and national CDC campaigns gained traction a few years later, which likely helped reduce smoking rates after 2002.
Alcohol use among young adults reached its highest point in 2011. Around this time, binge drinking (consuming a large amount of alcohol in a short period) was a major public health concern; according to the CDC, it was especially common among young adults, particularly in college settings. The years following the 2008 recession were also stressful for many young people, and unemployment and lifestyle changes may have led some to drink more as a coping behavior or social outlet.
Overall, these findings suggest that social events, laws, and cultural behaviors can influence substance use. In the future, I would like to explore whether these patterns changed after 2018, or compare Maryland with other states to see whether similar trends appear.
https://www.statology.org/summary-function-in-r/ <- I used this to get the summary table
https://www.cdc.gov/mmwr/preview/mmwrhtml/su6203a13.htm#:~:text=In%202011%2C%20the%20overall%20prevalence,during%20the%20past%2030%20days. <- Alcohol use
https://www.lung.org/research/sotc/tobacco-timeline <- Tobacco use
https://health.maryland.gov/phpa/ohpetup/pages/tob_quit.aspx <- Maryland Tobacco Quitline
https://www.geeksforgeeks.org/r-language/how-to-connect-paired-points-with-lines-in-scatterplot-in-ggplot2-in-r/ <- For the visualization