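The data are drawn from the National Health Interview Survey (NHIS), an annual household survey conducted by the National Center for Health Statistics, part of the Centers for Disease Control and Prevention. The file used here begins with the 1997 survey year. This analysis explores the smoking behaviors of the general population.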
Load the packages necessary to (1) import, (2) manipulate, and (3) visualize data.
library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # not gapminder
## Warning: package 'ggplot2' was built under R version 4.0.5
Import the data into R and preview it.
mydata <- read.csv("C:/Users/12055/Documents/R/RPubs/NHIS Smoking Behavior.csv")
#preview data
head(mydata)
## year Behav_EverSmokeCigs_B Behav_CigsPerDay_N MentalHealth_MentalIllnessK6_C
## 1 1997 0 0 Low Risk
## 2 1997 0 0 <NA>
## 3 1997 1 5 Low Risk
## 4 1997 0 0 Low Risk
## 5 1997 0 0 Low Risk
## 6 1997 1 0 MMD
names(mydata)
## [1] "year" "Behav_EverSmokeCigs_B"
## [3] "Behav_CigsPerDay_N" "MentalHealth_MentalIllnessK6_C"
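The readr package is loaded above, but the import itself uses base R's read.csv(). As a minimal sketch of the readr equivalent, the snippet below writes a small stand-in file and reads it with read_csv(). The toy rows and the temporary path are assumptions made for illustration. Only the column names are taken from the names(mydata) output.

```r
library(readr)

# Hypothetical toy file standing in for "NHIS Smoking Behavior.csv".
# Column names are copied from names(mydata); the rows are made up.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "year,Behav_EverSmokeCigs_B,Behav_CigsPerDay_N,MentalHealth_MentalIllnessK6_C",
  "1997,0,0,Low Risk",
  "1997,1,5,Low Risk",
  "1998,1,0,MMD",
  "1998,0,0,NA"
), toy_path)

# Declaring column types up front avoids type guessing and the
# column-specification message that read_csv() otherwise prints.
toy <- read_csv(
  toy_path,
  col_types = cols(
    year = col_integer(),
    Behav_EverSmokeCigs_B = col_integer(),
    Behav_CigsPerDay_N = col_double(),
    MentalHealth_MentalIllnessK6_C = col_character()
  )
)

head(toy)
```

read_csv() returns a tibble rather than a plain data frame, so printed output would look slightly different from the head(mydata) preview.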
mydata %>%
  select(Behav_CigsPerDay_N, year) %>%
  rename(NumCigs = Behav_CigsPerDay_N) %>%
  filter(year > 1997) %>%
  summarize(AvgNumCigs = mean(NumCigs))
## AvgNumCigs
## 1 2.708063
mydata %>%
  select(Behav_CigsPerDay_N, year) %>%
  rename(NumCigs = Behav_CigsPerDay_N) %>%
  filter(year > 1997) %>%
  group_by(year) %>%
  summarize(AvgNumCigs = mean(NumCigs))
## # A tibble: 19 x 2
## year AvgNumCigs
## <int> <dbl>
## 1 1998 3.83
## 2 1999 3.57
## 3 2000 3.48
## 4 2001 3.43
## 5 2002 3.29
## 6 2003 3.09
## 7 2004 2.92
## 8 2005 2.88
## 9 2006 2.74
## 10 2007 2.47
## 11 2008 2.67
## 12 2009 2.49
## 13 2010 2.32
## 14 2011 2.32
## 15 2012 2.23
## 16 2013 2.08
## 17 2014 1.99
## 18 2015 1.87
## 19 2016 1.95
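The average number of cigarettes smoked daily declined every year from 1998 (3.83) to 2007 (2.47). After a brief rise in 2008 (2.67), the general decline resumed, reaching a low of 1.87 in 2015 before ticking up slightly to 1.95 in 2016.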
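The yearly averages above include respondents who reported zero cigarettes per day (the data preview shows several such rows). A falling average could therefore reflect fewer smokers, lighter smoking among those who smoke, or both. The sketch below separates the two. The toy data frame is hypothetical, and the summary columns AvgAll, AvgSmokers, and ShareSmoking are names introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy rows; in the report the real data frame is `mydata`.
toy <- data.frame(
  year = c(1998, 1998, 1998, 1999, 1999, 1999),
  Behav_CigsPerDay_N = c(0, 10, 20, 0, 0, 12)
)

by_year <- toy %>%
  rename(NumCigs = Behav_CigsPerDay_N) %>%
  group_by(year) %>%
  summarize(
    # Mean over everyone, zeros included (what the report computes)
    AvgAll = mean(NumCigs, na.rm = TRUE),
    # Mean over respondents reporting at least one cigarette per day
    AvgSmokers = mean(NumCigs[NumCigs > 0], na.rm = TRUE),
    # Share of respondents reporting at least one cigarette per day
    ShareSmoking = mean(NumCigs > 0, na.rm = TRUE)
  )

by_year
```

On the real data, replacing `toy` with `mydata` (and keeping the `filter(year > 1997)` step) would show whether the decline comes mainly from ShareSmoking or from AvgSmokers.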
mydata %>%
  select(Behav_CigsPerDay_N, year) %>%
  rename(NumCigs = Behav_CigsPerDay_N) %>%
  filter(year > 1997) %>%
  group_by(year) %>%
  summarize(AvgNumCigs = mean(NumCigs)) %>%
  ggplot() +
  geom_line(aes(x = year, y = AvgNumCigs, color = AvgNumCigs))
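The line chart could also carry axis labels and a title so that it stands alone. The following is one possible sketch, not the report's own plot. It uses the first five yearly averages copied from the summary table, and the label text is an assumption.

```r
library(ggplot2)

# Yearly averages copied from the summary table (1998 to 2002 only).
yearly <- data.frame(
  year = 1998:2002,
  AvgNumCigs = c(3.83, 3.57, 3.48, 3.43, 3.29)
)

p <- ggplot(yearly, aes(x = year, y = AvgNumCigs)) +
  geom_line() +
  geom_point() +  # mark each survey year on the line
  labs(
    x = "Survey year",
    y = "Average cigarettes per day",
    title = "Average daily cigarette consumption, NHIS"
  )

print(p)
```

This version drops the color aesthetic because color would encode the same value already shown on the y-axis.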