Data Source

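Data are drawn from the National Health Interview Survey (NHIS), an annual household survey conducted by the National Center for Health Statistics, part of the Centers for Disease Control and Prevention. The extract used here covers survey years 1997 through 2016. The analysis explores smoking behavior in the general population.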

Load Packages

Load the packages needed to (1) import, (2) manipulate, and (3) visualize the data.

library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)  # not gapminder
## Warning: package 'ggplot2' was built under R version 4.0.5

Import Data

Import the data into R and preview the first rows and the column names.

mydata <- read.csv("C:/Users/12055/Documents/R/RPubs/NHIS Smoking Behavior.csv")

# Preview the data
head(mydata)
##   year Behav_EverSmokeCigs_B Behav_CigsPerDay_N MentalHealth_MentalIllnessK6_C
## 1 1997                     0                  0                       Low Risk
## 2 1997                     0                  0                           <NA>
## 3 1997                     1                  5                       Low Risk
## 4 1997                     0                  0                       Low Risk
## 5 1997                     0                  0                       Low Risk
## 6 1997                     1                  0                            MMD
names(mydata)
## [1] "year"                           "Behav_EverSmokeCigs_B"         
## [3] "Behav_CigsPerDay_N"             "MentalHealth_MentalIllnessK6_C"

Calculate Avg Daily Cigarettes

# Overall mean of cigarettes per day across all respondents
mydata %>%
  select(Behav_CigsPerDay_N, year) %>%
  rename(NumCigs = Behav_CigsPerDay_N) %>%
  filter(year > 1997) %>%  # keep 1998 onward
  summarize(AvgNumCigs = mean(NumCigs))
##   AvgNumCigs
## 1   2.708063
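
The preview rows show respondents recorded with zero cigarettes, so the mean above is taken over everyone, not only over smokers. The sketch below uses a small made-up data frame (`toy` is hypothetical and is not NHIS data) to illustrate two things that can shift a mean like this one: missing values and zero counts. It is an illustration under those assumptions, not a re-analysis of the survey.

```r
# Hypothetical toy data, NOT taken from the NHIS file
toy <- data.frame(
  year    = c(1998, 1998, 1999, 1999),
  NumCigs = c(0, 10, NA, 4)
)

# mean() returns NA as soon as one value is missing
mean_with_na <- mean(toy$NumCigs)

# na.rm = TRUE drops missing values before averaging
mean_no_na <- mean(toy$NumCigs, na.rm = TRUE)

# Restricting to rows with at least one cigarette gives a smokers-only mean
mean_smokers <- mean(toy$NumCigs[toy$NumCigs > 0], na.rm = TRUE)

print(c(with_na = mean_with_na, no_na = mean_no_na, smokers = mean_smokers))
```

The same `na.rm = TRUE` argument can be passed to `mean()` inside `summarize()`. Since the output above is a number rather than `NA`, the filtered `NumCigs` column evidently contains no missing values.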

Calculate Avg Daily Cigarettes by Year

# Mean cigarettes per day for each survey year
mydata %>%
  select(Behav_CigsPerDay_N, year) %>%
  rename(NumCigs = Behav_CigsPerDay_N) %>%
  filter(year > 1997) %>%  # keep 1998 onward
  group_by(year) %>%       # one summary row per year
  summarize(AvgNumCigs = mean(NumCigs))
## # A tibble: 19 x 2
##     year AvgNumCigs
##    <int>      <dbl>
##  1  1998       3.83
##  2  1999       3.57
##  3  2000       3.48
##  4  2001       3.43
##  5  2002       3.29
##  6  2003       3.09
##  7  2004       2.92
##  8  2005       2.88
##  9  2006       2.74
## 10  2007       2.47
## 11  2008       2.67
## 12  2009       2.49
## 13  2010       2.32
## 14  2011       2.32
## 15  2012       2.23
## 16  2013       2.08
## 17  2014       1.99
## 18  2015       1.87
## 19  2016       1.95

Interpretation

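Averaged across all respondents, including those recorded with zero cigarettes, the number of cigarettes smoked daily declined every year from 1998 (3.83) to 2007 (2.47). It rose in 2008 (2.67), then resumed a general decline to 1.87 in 2015 before ticking up to 1.95 in 2016.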
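
One way to check a year-by-year claim like this is to compute the change from the previous year. The sketch below retypes the rounded yearly averages from the table above into a data frame (`yearly`, `Change`, and `not_declining` are names chosen here for illustration) and lists the years in which the average did not fall.

```r
# Yearly averages copied from the table above (rounded to two decimals)
yearly <- data.frame(
  year = 1998:2016,
  AvgNumCigs = c(3.83, 3.57, 3.48, 3.43, 3.29, 3.09, 2.92, 2.88, 2.74, 2.47,
                 2.67, 2.49, 2.32, 2.32, 2.23, 2.08, 1.99, 1.87, 1.95)
)

# Change from the previous year; the first year has no predecessor, so NA
yearly$Change <- c(NA, diff(yearly$AvgNumCigs))

# Years in which the average stayed flat or rose (Change >= 0)
not_declining <- yearly$year[!is.na(yearly$Change) & yearly$Change >= 0]
print(not_declining)
```

Because the inputs are rounded, a flagged year whose change is exactly zero may simply be a tie at two decimals. Running the same step on the unrounded `summarize()` result would settle such cases.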

Visualization

# Line chart of the yearly averages, colored by the average itself
mydata %>%
  select(Behav_CigsPerDay_N, year) %>%
  rename(NumCigs = Behav_CigsPerDay_N) %>%
  filter(year > 1997) %>%  # keep 1998 onward
  group_by(year) %>%
  summarize(AvgNumCigs = mean(NumCigs)) %>%
  ggplot() +
  geom_line(aes(x = year, y = AvgNumCigs, color = AvgNumCigs))