For my research question, I chose to investigate whether a 100 mg caffeine pill has an effect on mental focus, as measured by a 10-minute attention test. The population parameter of interest is the effect that caffeine has on mental focus and productivity. The studies I read found that caffeine has a positive effect on productivity once confounding variables are controlled for. After this initial research, I expected caffeine to improve scores on the attention test.

The observational units in the study were the Islanders that I sampled; I took a random sample of 36 Islanders, 12 from one city on each of the three islands, all over the age of 18. The explanatory variable was the type of tablet administered: a sugar tablet for the control group or a 100 mg caffeine tablet for the experimental group. After taking the tablet, each participant took a 10-minute attention test in which they had to press a key on a keyboard each time the letter Z appeared on the screen. The response variable was their score on the attention test, which was measured on 5 separate occasions. Since the test counts the number of times a participant missed the letter Z, a higher score indicates worse attention. I initially anticipated that the caffeine tablet would improve scores on the attention tests, but the scores of the subjects in the experimental group turned out to be similar to those of the subjects in the control group. So rather than showing a positive effect of caffeine, the data showed no clear effect.

The variables on the plot are the group (C for control or T for treatment) and the average score on the attention test over the 5 trials. In the boxplot, there is a lot of overlap between the two groups, which suggests that the explanatory variable had little effect on the response variable and that the two are not strongly associated.

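library(readr)
library(mosaic)  # makes bwplot(), favstats(), stat(), pval(), do(), shuffle() and diffmean() available
Attention_Data <- read_csv("Final project data - Sheet1-2.csv")
# Side-by-side boxplots of the 5-trial average score for each group (C or T)
bwplot(average ~ Type, vertical = TRUE, data = Attention_Data)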

favstats(average ~ Type, data = Attention_Data)

The mean of the control group was 7.48, compared with 12.44 for the treatment group. Because the score counts missed letters, the higher treatment mean is the opposite of the effect I anticipated. The treatment group also had a much higher standard deviation than the control group, likely due to a few extreme outliers.

Stated more precisely, the parameter of interest is the difference between the population mean attention-test score of Islanders given a sugar tablet and that of Islanders given a caffeine tablet.

Null hypothesis: there is no association between consuming caffeine and mental focus, so the population mean scores for the control and experimental groups are the same.

\[H_0: \mu_{sugar} = \mu_{caffeine}\]

Alternative hypothesis: there is an association between consuming caffeine and mental focus, so the population mean scores for the control and experimental groups are different.

\[H_A: \mu_{sugar} \neq \mu_{caffeine}\]

A type I error is when we reject the null hypothesis when it’s true; in this case, it would be if we rejected the hypothesis that caffeine had no effect on focus, when that is actually the case. A type II error is when we fail to reject the null hypothesis when it is false; in this study, it would be when we fail to reject the hypothesis that caffeine has no effect on focus, when in reality, caffeine does have an effect on focus.

To find the participants for the study, I took a random sample from each of the 3 islands. For each island, I found 12 subjects by randomly selecting a house and then a member of the household (excluding Islanders under the age of 18). Because I took a random sample, I can generalize the results beyond the sample. Since I only sampled Islanders over the age of 18, I can generalize only to Islanders in that age range.

t.statistic<-stat(t.test(average ~ Type, data = Attention_Data))
cat("standardized statistic t is",round(t.statistic,2))
## standardized statistic t is -1.06

The t-statistic is -1.06, but the validity conditions for a theory-based t-test are not satisfied. For those conditions to hold, the quantitative variable needs to have a symmetric distribution in both groups, or each group must have at least 20 observations without strong skewness. With only 36 subjects in total, the two groups cannot both reach 20 observations, and the treatment group contains outliers.
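One way to check these conditions is to tabulate each group's size alongside a rough skewness indicator. The sketch below uses base R only; `check_t_conditions` is a hypothetical helper written for illustration, the 20-observation cutoff is the guideline quoted above, and the small `demo` data frame is made up (it is not the study data).

```r
# Hypothetical helper (not part of mosaic): summarise each group's size and a
# rough skewness indicator so the theory-based conditions can be inspected.
check_t_conditions <- function(values, groups, min_n = 20) {
  by_group <- split(values, groups)
  sizes <- vapply(by_group, length, integer(1))
  data.frame(
    group = names(by_group),
    n = sizes,
    # Pearson's second skewness coefficient: 3 * (mean - median) / sd
    skew = vapply(by_group,
                  function(x) 3 * (mean(x) - median(x)) / sd(x),
                  numeric(1)),
    meets_min_n = sizes >= min_n,
    row.names = NULL
  )
}

# Made-up illustration data (NOT the Islander measurements).
demo <- data.frame(
  average = c(2, 3, 4, 5, 6, 1, 2, 3, 4, 30),
  Type = rep(c("C", "T"), each = 5)
)
check_t_conditions(demo$average, demo$Type)
```

With the study data, the equivalent call would be `check_t_conditions(Attention_Data$average, Attention_Data$Type)`. A skew value far from zero, or a group with fewer than 20 observations, signals that the simulation-based approach is the safer choice.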

two.sided.p.value<-pval(t.test(average ~ Type, data = Attention_Data))
cat("the two-sided p-value is",round(two.sided.p.value,5))
## the two-sided p-value is 0.29963

The p-value of 0.30 is the probability of obtaining a result at least as extreme as the one actually observed, assuming that the null hypothesis is true. Since the p-value is greater than 0.10, there is little to no evidence against the null hypothesis, so the null is plausible. Failing to reject the null hypothesis does not prove that there is no association; it means only that the data do not provide evidence of one. In the context of the study, we do not have evidence of an association between caffeine consumption and mental focus.

diff(rev(mean(average ~ Type, data = Attention_Data)))
##         C 
## -4.966667
set.rseed(498) 
Attention.null <- do(1000) * diffmean(shuffle(average) ~ Type, data = Attention_Data)
dotPlot(~ diffmean,
        data = Attention.null, 
        main="Simulated null distribution of the difference in sample means",
        xlab="difference in sample means",
        width = 1,
        cex = 1,
        groups = (diffmean <= -4.967|diffmean >= 4.967))

p_value<-prop(~(diffmean <= -4.967|diffmean >= 4.967), data = Attention.null)
cat("two-sided p-value is",p_value)
## two-sided p-value is 0.352

Using the simulation-based approach, I found a p-value of 0.352, which leads to the same conclusion as the theory-based approach: we cannot reject the null hypothesis.
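The shuffling idea behind the simulation can also be sketched in base R, which makes each step explicit. This is only an illustration under stated assumptions: `perm_test_diffmean` is a hypothetical helper (it is not mosaic's implementation), it assumes exactly two groups, and the `demo` data frame is made up rather than taken from the study.

```r
# Hypothetical helper: two-sided permutation test for a difference in group means.
# Assumes exactly two groups; a base-R sketch, not mosaic's implementation.
perm_test_diffmean <- function(values, groups, reps = 1000, seed = 498) {
  set.seed(seed)  # note: this resets the global random number generator
  groups <- factor(groups)
  stopifnot(nlevels(groups) == 2)
  # Difference in means: second factor level minus first factor level
  diff_means <- function(v) {
    m <- tapply(v, groups, mean)
    unname(m[2] - m[1])
  }
  observed <- diff_means(values)
  # Shuffle the response to break any link with the group labels
  null_diffs <- replicate(reps, diff_means(sample(values)))
  list(
    observed = observed,
    # Proportion of shuffled differences at least as extreme as the observed one
    p_value = mean(abs(null_diffs) >= abs(observed))
  )
}

# Made-up illustration data (NOT the Islander measurements).
demo <- data.frame(
  average = c(5, 7, 6, 8, 9, 6, 10, 7, 12, 30),
  Type = rep(c("C", "T"), each = 5)
)
result <- perm_test_diffmean(demo$average, demo$Type)
result$observed
result$p_value
```

Because the test is two-sided, the order in which the two group means are subtracted does not change the p-value. The seed here will not necessarily reproduce the shuffles generated above, so a p-value from this sketch on the study data may differ slightly from 0.352.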

confint(t.test(average ~ Type, data = Attention_Data))

This interval tells us that we are 95% confident that the difference in population mean scores between the control and experimental groups is between -14.68 and 4.75. Since this interval includes zero, we reach the same conclusion as above: we fail to reject the null hypothesis, because a difference of zero is plausible. This does not show that there is no association, only that the data are consistent with none.
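As a quick consistency check, a t-based interval is symmetric around the point estimate, so its midpoint should match the observed difference in sample means (control minus treatment) reported earlier:

\[\frac{-14.68 + 4.75}{2} = -4.965 \approx \bar{x}_{sugar} - \bar{x}_{caffeine} = -4.97\]

\[\text{margin of error} = \frac{4.75 - (-14.68)}{2} = 9.715\]

The interval is therefore roughly -4.97 ± 9.72. The margin of error is about twice the size of the observed difference, which is why zero falls well inside the interval.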

I learned from this study that it is much easier to conduct a study on the people of the Islands, but the results might not necessarily be accurate. The research I read beforehand found that caffeine has a positive effect on mental focus, but in my study on the Islanders I found no evidence of an association between the two variables. The data did not behave as I expected; I expected to find a statistically significant difference between the mean scores of the two groups, but the difference I observed was not significant. Although my sample is representative of the larger population of adult Islanders, it was not large enough to satisfy the validity conditions for a theory-based approach. This is something that could be improved in a future study. Next time, I would alter the study so that the subjects take a larger dose of caffeine to see whether the effect is different, or have each subject take part in both the control and experimental conditions so that I can compare the difference within each individual. A similar question could be investigated by giving caffeine to individuals taking a different mental test: what is the effect of caffeine on memory rather than focus? What things other than caffeine might be used to improve focus?

Tom M. McLellan, John A. Caldwell, Harris R. Lieberman, A review of caffeine’s effects on cognitive, physical and occupational performance, Neuroscience & Biobehavioral Reviews, Volume 71, 2016, Pages 294-312, ISSN 0149-7634, https://doi.org/10.1016/j.neubiorev.2016.09.001

Frances O’Callaghan, Olav Muurlink & Natasha Reid (2018) Effects of caffeine on sleep quality and daytime functioning, Risk Management and Healthcare Policy, 11, 263-271, DOI: 10.2147/RMHP.S156404, https://www.tandfonline.com/doi/full/10.2147/RMHP.S156404