Questions:
1. What are the potential ways that one might exit a cohort before the end of the study date?
Some ways that participants might exit a cohort before the end of the study include: death, cessation of smoking, and emigration.
2. Why do we include the person-time contributed by individuals who are lost to follow-up?
We include the person-time contributed by individuals who are lost to follow-up in order accurately represent the dynamic nature of study populations in rate calculations.
3. Import the data.
# setup
setwd("/Users/shmoops/Desktop/UNC 2019-2020/FALL/EPID 600/")
# read in data
library(readxl)
data = read_xlsx("Data+and+Codebook+-+anti-smoking+advertisement+evaluation+data.xlsx")
# making headers lower case
colnames(data) = tolower(colnames(data))
# frequency of cohort demographic variables
library(dplyr)
a = tableone::CreateTableOne(vars = colnames(data), data = data, factorVars = c("sex", "race", "ethnicity", "ses", "education", "advert exposure", "smoking cessation", "ltf"))
# printing table one summary
print(summary(a))
##
## ### Summary of continuous variables ###
##
## strata: Overall
## n miss p.miss mean sd median p25 p75 min max skew kurt
## id 1000 0 0 500 289 500 251 750 1 1000 0.0 -1.20
## age 1000 0 0 47 13 48 39 57 5 78 -0.3 -0.61
## person days 1000 0 0 454 130 540 293 540 14 624 -1.1 -0.07
##
## =======================================================================================
##
## ### Summary of categorical variables ###
##
## strata: Overall
## var n miss p.miss level freq percent cum.percent
## sex 1000 0 0.0 0 355 35.5 35.5
## 1 645 64.5 100.0
##
## race 1000 0 0.0 1 685 68.5 68.5
## 2 230 23.0 91.5
## 3 10 1.0 92.5
## 4 25 2.5 95.0
## 5 50 5.0 100.0
##
## ethnicity 1000 0 0.0 0 775 77.5 77.5
## 1 225 22.5 100.0
##
## ses 1000 0 0.0 0 715 71.5 71.5
## 1 285 28.5 100.0
##
## education 1000 0 0.0 1 175 17.5 17.5
## 2 465 46.5 64.0
## 3 359 35.9 99.9
## 4 1 0.1 100.0
##
## advert exposure 1000 1 0.1 0 730 73.1 73.1
## 1 269 26.9 100.0
##
## smoking cessation 1000 0 0.0 0 912 91.2 91.2
## 1 88 8.8 100.0
##
## ltf 1000 0 0.0 0 681 68.1 68.1
## 1 319 31.9 100.0
##
## NULL
5. Using your data set, calculate the “risk” of smoking cessation in the cohort, both including and excluding those lost to follow up.
a) Including subjects that were lost to follow up:
# total cohort
(sum(data$`smoking cessation`)/nrow(data))*100
## [1] 8.8
b) Excluding those lost to follow up:
# Drop those who end dates are less than the max end date
followup_pop = data %>% filter(ltf == 0)
# total cohort
(sum(followup_pop$`smoking cessation`)/nrow(followup_pop))*100
## [1] 12.92217
6. Calculate the “rate” of smoking cessation in the total cohort.
# total cohort
(sum(data$`smoking cessation`)/(sum(data$`person days`))/365.25)
## [1] 5.309268e-07
7. Compare your calculated risk estimates
a) including and excluding those lost with each other
By excluding those lost to follow up from the risk calculation, the “risk” of smoking cessation in the population of interest increases (from 8.8% to 12.9% in the total cohort). This happens because those lost to follow up are not able to report smoking cessation, so they artificially decrease the risk of smoking cessation by making the denominator larger than it should be in the first calculation.
b) with your calculated rate. How are the estimates different from each other?
Compared to the risk calculations, the rate value is much smaller because its denominator represents the cumulative participitory time of all people in the study.
8. Calculate the risk for both those individuals that saw the advertisement and for those that did not. Include those lost to follow up in your calculations.
# risk in exposed pop (saw advertisments)
exposed_pop = data %>% filter(`advert exposure` == 1)
(sum(exposed_pop$`smoking cessation`)/nrow(exposed_pop))*100
## [1] 11.89591
# risk unexposed pop (did not see advertisement)
unexposed_pop = data %>% filter(`advert exposure` == 0)
(sum(unexposed_pop$`smoking cessation`)/nrow(unexposed_pop))*100
## [1] 7.671233
What would you do with the individual with missing data?
I will exclude the individual with missing data
data = data%>% filter(!is.na(`advert exposure`))
Does the advertisement seem to help increase smoking cessation?
The advertisement does seem to help increase smoking cessation.
9. What happens to the risk estimates and your conclusions in the previous question when you exclude those lost to follow up? With respect to smoking cessation, how were those that stayed in the study different from those that were lost to follow up?
# risk in exposed pop (saw advertisments)
exposed_pop = followup_pop %>% filter(`advert exposure` == 1)
(sum(exposed_pop$`smoking cessation`)/nrow(exposed_pop))*100
## [1] 18.82353
# risk unexposed pop (did not see advertisement)
unexposed_pop = followup_pop %>% filter(`advert exposure` == 0)
(sum(unexposed_pop$`smoking cessation`)/nrow(unexposed_pop))*100
## [1] 10.98039
The difference between the risk of smoking cessation in the exposed versus unexposed populations increases after excluding those lost to followup.
10. Calculate the rates of smoking cessation for the advertisement exposed and the non exposed. What would you conclude about the relationship between advertisement viewing and smoking cessation now?
# Redifining follow up pop
followup_pop = data %>% filter(ltf == 0)
# rate in exposed pop (saw advertisments)
exposed_pop = followup_pop %>% filter(`advert exposure` == 1)
(sum(exposed_pop$`smoking cessation`)/(sum(exposed_pop$`person days`))/365.25)
## [1] 9.560474e-07
# rate in unexposed pop (did not see advertisement)
unexposed_pop = followup_pop %>% filter(`advert exposure` == 0)
(sum(unexposed_pop$`smoking cessation`)/(sum(unexposed_pop$`person days`))/365.25)
## [1] 5.584744e-07
I would conclude that advertisement viewing has a direct relationship with smoking cessation based off of the differenes in cessation rates between the exposed and unexposed populations.