Past research has linked cigarette smoking to poorer health outcomes across a range of domains, including cancer risk, birth defects, lung diseases, and other morbidities. However, less attention has been paid to the specific cognitive effects of smoking, particularly recall of information learned while or just after smoking. This is a valuable area of research given that about 20% of the US population smokes, and smokers commonly smoke multiple cigarettes every day, often between or during routine daily tasks. Understanding the effects of smoking on memory is important for supporting the health of aging smokers and for informing the development of prevention and tobacco-cessation programs. Using a quasi-experimental design, this research addresses the following question: Does smoking cigarettes during or shortly before a reading task result in more recall errors among smokers than among non-smokers completing the same task? It was hypothesized that (1) non-smokers would have a mean recall score of 45 and (2) non-smokers would have fewer recall errors than people who actively smoked during a reading-recall task.
Study Population
Participants included 87 healthy adults from the Boston-metropolitan area, recruited using flyers placed at several large area recreational centers. Of the 87 participants, 58 were self-identified regular smokers who smoked at least 1 pack of cigarettes per day for the past 12 months. The 29 non-smokers reported never having smoked cigarettes.
The participants were divided into 3 groups: Group A (non-smokers) and Groups B and C (regular smokers). Regular smokers were randomized into Groups B and C using computer-generated random assignment; because smoking status itself could not be randomized, the overall design is quasi-experimental. All participants completed informed consent prior to beginning the study. One participant’s score was removed from Group A after data collection and prior to analysis due to incorrect completion of the post-intervention measure. The study was approved by the Institutional Review Board of Brandeis University.
Interventions
Participants in all groups read the same 1-page passage and were asked to recall it after 60 minutes had elapsed. Group A was composed of non-smokers. Group B was composed of regular smokers who did not smoke on the day of the reading task before completing it (smoker control group). Group C was composed of smokers who were asked to smoke within 20 minutes before, or while, completing the reading task.
Measures
After 60 minutes had elapsed, all participants completed the Reading Recall Task-Short Form (RRT-SF), which asks about details of the passage. The task was administered by a trained psychologist from Brandeis University. Scores for the RRT-SF were calculated as the number of errors made, on a continuous scale. The minimum possible score is 0, meaning no errors in reading recall. The maximum possible score is 85, meaning all reading recall questions were answered incorrectly.
Analysis
One observation was removed from Group A (n=28 instead of the original n=29) prior to analysis because the post-intervention measurement was completed incorrectly, resulting in an invalid, negative score. No other data points were removed prior to analysis. The final sample size was n=86 overall (Group A n=28, Group B n=29, Group C n=29).
In this quasi-experimental design, the independent variable was smoking status (group) and the dependent variable was reading recall score (number of errors). There were two hypotheses in this study. Hypothesis 1 stated that Group A would have a mean recall score of 45 (μGroupA=μGroupAHypothesized, or μGroupA=45). Hypothesis 2 stated that Group A would have fewer recall errors than Group C; it was tested against the non-directional alternative μGroupA≠μGroupC. The statistical analysis for Hypothesis 2 was based on the statistical null hypothesis that μGroupA=μGroupC.
The alpha level for all hypothesis testing in this study is .05; this value was established prior to data collection. If the p-value > .05, the null hypothesis is not rejected (described in this report as results supporting the null). If the p-value < .05, the null hypothesis is rejected and the alternative hypothesis is supported instead. The alpha level is the threshold against which the p-value is compared to decide whether the observed test statistic is likely or unlikely given the null hypothesis. The probability of making a Type I error is “capped” by the alpha level of .05.
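The decision rule above can be sketched in a few lines of R. This is only an illustration: `decide` is a hypothetical helper that is not part of the original analysis, and the two p-values are the ones reported in the Results sections of this report.

```r
# Decision rule sketch: compare an observed p-value with the preset alpha.
# `decide` is a hypothetical helper, not part of the original analysis.
alpha <- 0.05

decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

# p-values as reported in the Results sections of this report
p_values <- c(hypothesis1 = 0.07803, hypothesis2 = 0.047663)

# Apply the rule to each hypothesis; the names are carried through
decisions <- sapply(p_values, decide, alpha = alpha)
print(decisions)
```

Fixing alpha before looking at the data, as was done here, is what keeps the Type I error rate capped at the stated level.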
A one-sample T-Test with α=.05 was conducted to test Hypothesis 1. First, a QQ plot evaluation and Shapiro-Wilk Normality Test were conducted to check the normality of the data (see appendix for complete normality testing codes, tables, and graphs). The one-sample T-Test was appropriate because only the hypothesized population mean was known (the population SD was not), so the test uses Group A’s sample mean, SD, and sample size to calculate the standard error of the sampling distribution of sample means, and from it the T-statistic on n-1 degrees of freedom, to determine significance.
Since the scores did not depart detectably from normality (as indicated by additional Shapiro-Wilk Normality Tests and QQ plot evaluations), a non-directional two-independent-sample T-Test was used to test Hypothesis 2 with α=.05. This was an appropriate test because the groups were almost perfectly balanced (n=28 for Group A, n=29 for Group C), and because this test is suitable for comparing independent groups in different conditions in quasi-experimental and experimental designs (this was a quasi-experimental design with randomized assignment of smokers between Groups B and C). This test was also appropriate because the population SD was not known, so the T-Test uses the sample SDs to estimate the population SD (this introduces more uncertainty into the SE used in the overall calculation). Because the design is nearly perfectly balanced and the two sample SDs are similar, the resulting T-statistic is nearly the same under either the homogeneity or the heterogeneity of variance assumption. That said, this test was conducted assuming homogeneity of variance. The test was non-directional because of the exploratory nature of this study; the goal was to establish whether there is, in fact, any performance difference between smokers and non-smokers in this task. A one-tailed T-Test was not used because the direction in which scores would trend was not treated as known in advance, and because a one-tailed test would have been less “strict”: it places the full 5% of probability in one tail rather than 2.5% in each tail, which makes it easier to reach statistical significance in the chosen direction.
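As a rough check on the claim that the homogeneity assumption matters little here, the sketch below recomputes the T-statistic both ways from the rounded summary statistics in the descriptive table (not from the raw data, so the last decimals may differ slightly from the reported output). The variable names are illustrative only.

```r
# Summary statistics taken from the descriptive table in this report
m_a <- 39.224; sd_a <- 16.686; n_a <- 28   # Group A (non-smokers)
m_c <- 48.199; sd_c <- 16.760; n_c <- 29   # Group C (smoked before/during task)

diff_means <- m_a - m_c

# Pooled (equal-variance) standard error, weighting each variance by its df
var_pooled <- ((n_a - 1) * sd_a^2 + (n_c - 1) * sd_c^2) / (n_a + n_c - 2)
se_pooled  <- sqrt(var_pooled * (1 / n_a + 1 / n_c))
t_pooled   <- diff_means / se_pooled

# Welch (unequal-variance) standard error
se_welch <- sqrt(sd_a^2 / n_a + sd_c^2 / n_c)
t_welch  <- diff_means / se_welch

# Two-tailed vs one-tailed critical values at alpha = .05
df_pooled <- n_a + n_c - 2
crit_two  <- qt(1 - 0.05 / 2, df_pooled)  # 2.5% in each tail
crit_one  <- qt(1 - 0.05, df_pooled)      # 5% in one tail

print(c(t_pooled = t_pooled, t_welch = t_welch,
        crit_two = crit_two, crit_one = crit_one))
```

The one-tailed critical value is smaller than the two-tailed one, which is the sense in which a one-tailed test is less strict.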
Descriptive statistics by group, a boxplot to graphically depict the differences between groups, p-values, standard error, t-statistics, and degrees of freedom will be reported as part of the determination of significance.
The descriptive statistics and boxplots reveal that participants in groups A (non-smokers) and B (regular smokers who did not smoke prior to task) had fewer recall errors on average (39.224 and 41.382, respectively) than participants in group C (smoked during or just prior to the reading task) (48.199). Participants in Group A made fewer recall errors on average than participants in groups B and C. However, Group A had the widest range of scores and Group C the smallest, as shown by the boxplot whiskers and descriptive statistics: Group A: min=0.304, max=71.357; Group B: min=6.919, max=72.842; Group C: min=22.084, max=79.994. Group A had the second-largest SD of the three groups (16.686) but the largest CV (.425), as compared to Group B (SD=15.587, CV=.377) and Group C (SD=16.76, CV=.348).
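The range and CV comparisons above can be reproduced from the table values alone. The following is a minimal sketch that assumes the rounded table values (the data frame `desc` is introduced here for illustration and is not part of the original code).

```r
# Values copied from the descriptive statistics table in this report
desc <- data.frame(
  group = c("A", "B", "C"),
  min   = c(0.304, 6.919, 22.084),
  max   = c(71.357, 72.842, 79.994),
  mean  = c(39.224, 41.382, 48.199),
  sd    = c(16.686, 15.587, 16.760)
)

# Range = max - min; CV = SD / mean (spread relative to the group's mean)
desc$range <- desc$max - desc$min
desc$cv    <- round(desc$sd / desc$mean, 3)

print(desc)
```

Because the CV scales the SD by the mean, Group A can have the largest CV without having the largest SD.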
Group A’s mean (39.224) and median (39.002) are similar, but the QQ plots show a few scores that could be considered outliers. For that reason, the median (39.002) may be a better measure of central tendency, since the median is more resistant to outliers than the mean. However, since the Shapiro-Wilk normality tests of each group’s scores and of the overall scores did not indicate a departure from normality, either the mean or the median could be used as a measure of central tendency. A table of descriptive statistics for recall score by group and boxplots are reported below for further detail about each group’s descriptive statistics.
While the mean is a good estimate of the average performance of the sample from Group A, it may not be a reliable estimate of the true population mean performance. The standard error (SE) of the mean estimates how much sample means would vary from sample to sample, and it is calculated by dividing the standard deviation by the square root of n. Group A’s SE is 3.1534 (16.686/√28), which is important to consider when trying to estimate the true population mean performance. Since the SE is small relative to the mean (Group A=39.224), I consider Group A’s mean to be a reasonably reliable estimate of the true population mean performance.
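One way to make the SE more concrete is to turn it into an interval around the mean. The sketch below is an illustration only: it uses the rounded summary statistics for Group A, and the 95% confidence interval is an addition that was not part of the original analysis.

```r
# Summary statistics for Group A from the descriptive table
m_a <- 39.224; sd_a <- 16.686; n_a <- 28

# Standard error of the mean: SD divided by the square root of n
se_a <- sd_a / sqrt(n_a)

# 95% confidence interval for the mean, using the t distribution (df = n - 1)
t_crit <- qt(0.975, df = n_a - 1)
ci_a <- c(lower = m_a - t_crit * se_a, upper = m_a + t_crit * se_a)

print(round(c(se = se_a, ci_a), 3))
```

A narrower interval would indicate a more precise estimate of the population mean; a larger sample would shrink the SE and therefore the interval.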
While Group A overall had fewer recall errors (and appeared to have the “best performance”), the relatively large spread between data points as evidenced by the measures of variance (SD and CV) and the overlap in scores with groups B and C (see similarities in boxplot) suggest that participants in Group A might not always outperform participants in groups B and C; this could be due to other factors that influence participants’ recall ability (e.g. perhaps age, cognitive health, primary language) that were not accounted for in this study, or due to chance. Since descriptive statistics alone cannot determine statistical significance, hypothesis testing was used to determine if the differences in scores between groups A and C are statistically significant.
Results: Hypothesis 1
Hypothesis 1 stated that Group A would have a recall score of 45 (μGroupA=μGroupAHypothesized, or μGroupA=45). A one-sample T-Test showed that there was no statistically significant difference between Group A’s mean and the hypothesized mean of 45.
First, a QQ plot evaluation and Shapiro-Wilk Normality Test were conducted to check the normality of the data (the Shapiro-Wilk test did not reject the null hypothesis of normality, so the data were treated as normal). The one-sample T-Test was then appropriate given that only the hypothesized population mean was known, and this test allows analysis to make use of Group A’s actual mean and degrees of freedom, as well as Group A’s SD and sample size in calculating the standard error of the sampling distribution of sample means to obtain the T-Statistic.
Next, the one-sample T-Test with α=.05 was conducted. It showed that there was no statistically significant difference between Group A’s mean and the hypothesized mean of 45. The T-statistic = -1.8318, degrees of freedom = 27, and p-value = 0.07803. Because the p-value .078 > alpha level .05, the null hypothesis was not rejected. In other words, if the statistical null hypothesis is true, the probability of observing a T-statistic at least as extreme as -1.8318 (in either direction from the center of the T-Distribution), given 27 degrees of freedom, is 0.07803, or about 7.8%.
The one-sample T-Test was an appropriate test given that only the hypothesized population mean (45) was known, and this test allowed analysis to use Group A’s actual mean (39.224) and degrees of freedom (27), as well as Group A’s SD (16.686) and sample size (28) in calculating the standard error of the sampling distribution of sample means (3.1534) to obtain the T-Statistic (-1.8318) and determine significance (p-value 0.07803 > alpha level .05).
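The calculation described above can be retraced from the summary statistics alone. The following sketch assumes the rounded values reported in this section, so its results should agree with the reported T-statistic and p-value only up to rounding.

```r
# Summary statistics for Group A and the hypothesized mean
m_a  <- 39.224   # sample mean
sd_a <- 16.686   # sample SD
n_a  <- 28       # sample size
mu_0 <- 45       # hypothesized population mean

# Standard error of the sampling distribution of sample means
se_a <- sd_a / sqrt(n_a)

# T-statistic and degrees of freedom for a one-sample test
t_stat <- (m_a - mu_0) / se_a
df_a   <- n_a - 1

# Two-tailed p-value: probability of a T at least this extreme in either tail
p_two <- 2 * pt(abs(t_stat), df_a, lower.tail = FALSE)

print(round(c(se = se_a, t = t_stat, df = df_a, p = p_two), 4))
```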
Since the null hypothesis was not rejected, it is possible that a Type II error was committed. This could be due to a small sample size, which limits statistical power. Future studies could reduce the chances of this type of error by increasing the sample size. Another source of uncertainty is the use of the sample’s SD to estimate the unknown population SD when calculating the SE. Further investigation is needed to understand whether the similarity between Group A’s actual and hypothesized recall scores found in this study reflects a true absence of difference or a Type II error.
Results: Hypothesis 2
Hypothesis 2 stated that Group A would have fewer recall errors than Group C (H0 (null): μGroupA=μGroupC, or μGroupA−μGroupC=0; H1 (alternative): μGroupA≠μGroupC). After the data’s normality was checked using QQ plot evaluation and the Shapiro-Wilk Normality Test (the test did not reject the null hypothesis of normality, so the data were treated as normal), a non-directional two-independent-sample T-Test was used to test Hypothesis 2 with α=.05. As shown in the results table, analysis indicated there was a statistically significant difference in recall error scores between groups A and C (t statistic=-2.0257, df=55, p=.048), with estimated mean recall scores of 39.224 in Group A and 48.199 in Group C. Because the p-value of .048 is less than the alpha level of .05, results failed to support the null hypothesis that there is no difference in recall errors between Group A and Group C. In other words, if the statistical null hypothesis is true, the probability of observing a T-statistic at least as extreme as -2.0257 (in either direction from the center of the T-Distribution), given 55 degrees of freedom, is 0.048, or about 4.8%. Given this small probability, the results support the alternative hypothesis, meaning there is a statistically significant difference in recall scores between Group A and Group C.
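To show how the reported T-statistic, p-value, and 95% confidence interval fit together, the sketch below rebuilds them from the rounded group summaries under the equal-variance assumption used in this study. It does not use the raw data, so small differences from the printed output in the later decimals are expected.

```r
# Summary statistics from the descriptive table
m_a <- 39.224; sd_a <- 16.686; n_a <- 28   # Group A
m_c <- 48.199; sd_c <- 16.760; n_c <- 29   # Group C

# Pooled variance and standard error of the difference in means
df_ac      <- n_a + n_c - 2
var_pooled <- ((n_a - 1) * sd_a^2 + (n_c - 1) * sd_c^2) / df_ac
se_diff    <- sqrt(var_pooled * (1 / n_a + 1 / n_c))

# T-statistic and two-tailed p-value
diff_means <- m_a - m_c
t_stat     <- diff_means / se_diff
p_two      <- 2 * pt(abs(t_stat), df_ac, lower.tail = FALSE)

# 95% confidence interval for the difference in means (A minus C)
ci_diff <- diff_means + c(-1, 1) * qt(0.975, df_ac) * se_diff

print(round(c(t = t_stat, df = df_ac, p = p_two,
              ci_lower = ci_diff[1], ci_upper = ci_diff[2]), 4))
```

An interval that excludes zero corresponds to a two-tailed p-value below .05, which is why the two summaries lead to the same decision.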
The results from hypothesis testing suggest that smoking during/shortly before reading may increase the number of errors in recall from a reading passage as compared to the performance of non-smokers. Since the null hypothesis was rejected, it is possible that a Type I error was made. In addition, the observed difference could be due to factors not controlled for among the study participants (e.g. age, IQ, primary language, cognitive health, morbidities, literacy level, etc.) rather than to smoking itself. Further investigation is needed to understand if other factors contribute to the variance in recall scores found in this study.
Future research should replicate the present study in different age cohorts and control for other possibly influential factors to provide further context to our results, which suggest that smoking during/shortly before reading may increase the number of recall errors as compared with nonsmokers completing the same task.
##
## Descriptive Statistics for Recall Score by Group:
##        descriptivesGroupA descriptivesGroupB descriptivesGroupC
## n                  28.000             29.000             29.000
## min                 0.304              6.919             22.084
## max                71.357             72.842             79.994
## median             39.002             39.002             47.331
## mean               39.224             41.382             48.199
## SD                 16.686             15.587             16.760
## CV                  0.425              0.377              0.348
##
## Shapiro-Wilk normality test
##
## data: H1$score
## W = 0.98582, p-value = 0.47155
##
## Shapiro-Wilk normality test
##
## data: GroupA$score
## W = 0.983934, p-value = 0.93177
##
## Shapiro-Wilk normality test
##
## data: GroupC$score
## W = 0.961122, p-value = 0.35023
##
## Two Sample t-test
##
## data: H1[H1$group == "A", "score"] and H1[H1$group == "C", "score"]
## t = -2.02565, df = 55, p-value = 0.047663
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -17.855452595 -0.095751459
## sample estimates:
## mean of x mean of y
## 39.223526 48.199128
##
## T-Test calculation step-by-step results:
## M.diff SE t.stat df p.obs
## equal.var -8.975602 4.4307859 -2.0257359 55 0.04765439
##
## Shapiro-Wilk normality test
##
## data: GroupB$score
## W = 0.981224, p-value = 0.86803
##
## Two Sample t-test
##
## data: H1[H1$group == "B", "score"] and H1[H1$group == "C", "score"]
## t = -1.60394, df = 56, p-value = 0.11435
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -15.3312518 1.6971136
## sample estimates:
## mean of x mean of y
## 41.382059 48.199128
#Generate descriptive statistics
Psy210a_hw1data_F2022<-read.csv('Psy210a_hw1data_F2022.csv') #read imported data
H1 <- subset(Psy210a_hw1data_F2022, score >= 0) #name the data H1 and remove scores less than 0
#sum(is.na(H1$score)) ##confirmed no missing values
#Make subsets
GroupA<-subset(H1, group =="A")
GroupB<-subset(H1, group =="B")
GroupC<-subset(H1, group =="C")
CVgroupA=sd(GroupA$score)/mean(GroupA$score)
CVgroupB=sd(GroupB$score)/mean(GroupB$score)
CVgroupC=sd(GroupC$score)/mean(GroupC$score)
descriptivesGroupA <- c(n=length(GroupA$score),min=min(GroupA$score), max=max(GroupA$score), median=median(GroupA$score), mean=mean(GroupA$score), SD=sd(GroupA$score),CV=CVgroupA)
descriptivesGroupB <- c(n=length(GroupB$score),min=min(GroupB$score), max=max(GroupB$score), median=median(GroupB$score),mean=mean(GroupB$score), SD=sd(GroupB$score), CV=CVgroupB) #median taken from Group B's own scores
descriptivesGroupC <- c(n=length(GroupC$score),min=min(GroupC$score), max=max(GroupC$score), median=median(GroupC$score),mean=mean(GroupC$score), SD=sd(GroupC$score), CV=CVgroupC)
#present descriptive stats as table
DescriptiveStats<-round(cbind(descriptivesGroupA,descriptivesGroupB, descriptivesGroupC),3)
cat('\nDescriptive Statistics for Recall Score by Group:\n')
DescriptiveStats
#Side-by-side boxplot of score by group
boxplot(H1$score ~ H1$group,
horizontal = FALSE,
main = "Boxplot depicting number of recall errors by group",
ylab = 'Number of recall errors',
xlab = 'Group',
frame.plot = FALSE,
col=c('cyan', 'green', 'pink'))
#Testing normality of recall scores with QQ plots
qqnorm(H1$score) #All groups
qqline(H1$score, col='black')
qqnorm(GroupA$score) #GroupA
qqline(GroupA$score, col='yellow')
qqnorm(GroupB$score)#Group B
qqline(GroupB$score, col='red')
qqnorm(GroupC$score) #Group C
qqline(GroupC$score, col='cyan')
#6A, 6B Group A's mean (39.224) and median (39.002) are similar, but QQ plots show that the data has a few scores that could be considered outliers. For that reason, median (39.002) may be a better measure of central tendency given that the measurement of the median is more resistant to outliers than the mean is. However, since the Shapiro-Wilk normality test showed that the data is normally distributed, either mean or median could be used as a measure of central tendency.
#6b While mean is a good estimate of the average performance of the sample from Group A, it may not be a reliable estimate of the true population mean performance. Standard error represents how accurately a sample represents a population, and it is calculated by dividing the standard deviation by the square root of n. Group A's SE is 3.1534, which is important to consider when trying to estimate the true population mean performance. Since the mean is fairly large (Group A=39.224), I consider this SE to be smaller by comparison, which indicates that Group A's mean may be a reliable estimate of the true population mean performance.
#6c
#H0 (null)-> #μGroupAHypothesized=μGroupAActual, or μGroupAHypothesized−μGroupAActual = 0 #H1(alternative)->μGroupAHypothesized≠μGroupAActual
##μGroupAHypothesized=45
#GroupAActual=39.224
#6d
####Normality checks
###QQ plots show that the overall data and Group A data have a few outliers. Since a visual evaluation of the QQ plot called the data's normality into question, the Shapiro-Wilk Normality test was conducted to evaluate normality.
### Shapiro-Wilk normality test showed support for the null, so the data is normally distributed.
#data: GroupA$score
#W = 0.983934, p-value = 0.93177
shapiro.test(H1$score)
shapiro.test(GroupA$score)
##Hypothesis testing using one-sample T-Test calculating individual statistics
#####Statistics by group
M1<-mean(GroupA$score)
HM2<-45 #hypothesized mean
n1<-length(GroupA$score)
seQ6<-sd(GroupA$score)/sqrt(n1) #estimated SE of sampling distribution of sample means (n1=28)
#Standard error is 3.1534
##Difference of means
M.diff2<-M1-HM2
T.StatisticQ6<-M.diff2/seQ6
#T-statistic= -1.831824
dfQ6<-n1-1 #Degrees of Freedom= 27 because for one-sample T-Test, df=n-1
p.obsQ6<-2*pt(abs(T.StatisticQ6), dfQ6, lower.tail=FALSE) ##p-value
#p-value=0.07803185
##Hypothesis testing using one-sample T-Test function
Q6d.t<-t.test(H1[H1$group=='A', "score"], mu=45) #var.equal applies only to two-sample tests, so it is omitted
#t = -1.8318, df = 27, p-value = 0.07803
#6d
#A one-sample T-Test showed that there was no statistically significant difference between Group A's mean and the hypothesized mean of 45. The T-statistic = -1.8318, degrees of freedom = 27, and p-value = 0.07803. Because the p-value .078 > alpha level .05, the null hypothesis was supported. In other words, if the statistical null hypothesis is true, the probability that we would observe a T-statistic of/more extreme than -1.8318 away from the center of the T-Distribution is 0.07803, or about 7.8%.
#6e Comment on the uncertainties, if any, related to your answer in 6d.
#Since the null hypothesis was supported, it is possible that a Type II error was committed. This could be due to a small sample size, which results in not enough power in statistical tests. Future studies could reduce the chances of this type of error by increasing the sample size.
#7a
##H0 (null)-> #μGroupA=μGroupC, or μGroupA−μGroupC = 0
##H1(alternative)->μGroupA≠μGroupC
#7b The alpha level is .05. If the p-value from the 2 independent sample T-Test > .05, support the null. If p-value < .05, fail to support the null. The alpha level is the probability value used to help decide whether the test statistic based on the observed data (p-value) is likely or unlikely given the null hypothesis. The probability of making a Type I error is "capped" by the alpha level of .05.
#7c
##Normality checks
#QQ plots show that the overall data has a few outliers. This is also true of Group A and Group C data, although Group A overall seems closer to normal than Group C. Since a visual evaluation called the data's normality into question, the Shapiro-Wilk Normality test was conducted to evaluate normality.
## Shapiro-Wilk normality test showed support for the null, so the data is normally distributed.
#data: GroupA$score
#W = 0.983934, p-value = 0.93177
shapiro.test(H1$score)
shapiro.test(GroupA$score)
shapiro.test(GroupC$score)
##Hypothesis testing using 2 independent samples T-Test
t.test(H1[H1$group=='A', "score"],H1[H1$group=='C',"score"], var.equal = TRUE)
####Step-by-step
##Make subsets by group
GroupA<-subset(H1, group =="A")
GroupB<-subset(H1, group =="B")
GroupC<-subset(H1, group =="C")
##Statistics by group
M1<-mean(GroupA$score)
M2<-mean(GroupC$score)
Var1<-var(GroupA$score)
Var2<-var(GroupC$score)
n1<-length(GroupA$score)
n2<-length(GroupC$score)
##Difference of means
M.diff<-M1-M2
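##pooled SD: weight each group's variance by its degrees of freedom
##(the unweighted sqrt((Var1+Var2)/2) is exact only when n1 == n2; here n1=28 and n2=29)
s.pooled<-sqrt(((n1-1)*Var1+(n2-1)*Var2)/(n1+n2-2))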
##Calculate standard error
SE<-s.pooled*sqrt(1/n1+1/n2)
##T-statistic
t.stat<-M.diff/SE
##Degrees of freedom assuming homogeneity of variance
df<-n1+n2-2
p.obs<-2*pt(abs(t.stat), df, lower.tail=FALSE) ##assuming equal variances
results<-rbind(equal.var=c('M.diff'=M.diff, 'SE'=SE, 't.stat'=t.stat, 'df'=df, 'p.obs'=p.obs))
#results table
cat('\nT-Test calculation step-by-step results:\n')
results
# M.diff SE t.stat df p.obs
#equal.var -8.975602 4.430786 -2.025736 55 0.04765439
#As shown in the results table, there is a statistically significant difference in recall errors between groups A and C (t(55)=-2.025736, p=.048), with estimated recall errors in Group A=39.224 and Group C=48.199. Because p-value=.048 and p<.05 alpha level, we fail to support the null hypothesis that there is no statistically significant difference in the recall errors between Group A and Group C.
#7d Since we failed to support the null, it is possible that there was a Type I error. Another uncertainty related to the 2 Sample Independent T Test is the use of the sample SDs to estimate the unknown population SD, which was used to calculate the SE.
#The results from hypothesis testing suggest that smoking during/shortly before reading may increase the number of errors in recall from a reading passage as compared to the performance of non-smokers. Further investigation is needed to understand if other factors contribute to the variance in recall scores found in this study.