Load some potentially useful packages:
library(ggplot2)
library(dplyr)
Question 1 (this is a very conceptual question, which we
will go over together as a class)
Consider the following toy example:
You wish to decide whether a certain coin is fair or not. In
this situation, we have:
\(H_0\): Coin is fair (chance of heads = 50%) \(~~vs.~~\) \(H_a\): Coin is NOT fair (chance of heads \(\neq\) 50%)
(a) To answer the above question, suppose I only allow you to flip
the coin 3 times: it lands “heads” on all 3 flips. What is the chance of
this data under the null hypothesis (i.e., a p-value)?
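0.125, since \(0.5^3 = 0.125\). (If three tails is counted as equally extreme, the two-sided p-value is 0.25.)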
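As a quick check of the arithmetic in (a), the sketch below computes the chance of three heads from a fair coin and, for comparison, the two-sided p-value from R's exact binomial test. The variable names are illustrative only.

```r
# Chance of 3 heads in 3 flips of a fair coin
p_three_heads <- 0.5^3

# Exact two-sided binomial test: also counts 3 tails as "as extreme"
p_two_sided <- binom.test(x = 3, n = 3, p = 0.5)$p.value

p_three_heads
p_two_sided
```

Either way, the value is above 5%, so the conclusion in (b) does not depend on which version is used.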
(b) If you are using \(\alpha\) =
5%, what do you conclude?
I would not reject the null hypothesis, because the p-value is greater than \(\alpha\) = 5%.
(c) Try to explain in your own words why the answer to (b) is
slightly disconcerting/perplexing.
The data were as extreme as they could possibly have been, and yet we
cannot reject the null hypothesis. The test leads to a conclusion that
we don’t necessarily believe is true.
(d) General Question: Whenever we do not reject the
null hypothesis, what are the two possible reasons for this conclusion?
Think back to this summary table:
We fail to reject either because the null hypothesis is true or because
we have made a Type 2 error.

(e) Which of these two reasons might seem more plausible in our
example? Why?
I believe that a Type 2 error is more plausible than the null being
true: the sample size is very small, so the power is low and the chance
of a Type 2 error is relatively high.
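To make the low-power argument concrete, here is a minimal sketch that computes the power of an exact two-sided binomial test at \(\alpha\) = 5%. The helper `exact_power` and the assumed true chance of heads (0.9) are hypothetical choices for illustration, not part of the assignment.

```r
# Power of the exact two-sided binomial test of H0: p = 0.5
# (hypothetical helper; p_true is an assumed "truly biased" coin)
exact_power <- function(n, p_true, alpha = 0.05) {
  heads <- 0:n
  # p-value for every possible number of heads
  p_values <- sapply(heads, function(x) binom.test(x, n, p = 0.5)$p.value)
  reject <- p_values <= alpha
  # chance of landing in the rejection region when the coin is biased
  sum(dbinom(heads[reject], size = n, prob = p_true))
}

power_n3 <- exact_power(n = 3, p_true = 0.9)
power_n100 <- exact_power(n = 100, p_true = 0.9)
```

With only 3 flips, no outcome has a p-value at or below 5%, so the test can never reject and its power is zero no matter how biased the coin is.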
(f) Consider the same scenario, but suppose we flip the coin 1000
times: it lands “heads” on 502 flips. Without calculating a
p-value, what do you think you will conclude regarding the coin?
I would not reject the null hypothesis, because 502 heads is very close
to the 500 we would expect from a fair coin.
(g) Look back at Question 1(d). Which of the two reasons seems more
plausible now? Carefully explain why.
Because the sample size is much larger, the null being true seems more
plausible: the power is higher (at least against any sizable bias), so
the chance of a Type 2 error is much smaller. In addition, these results
are far less extreme than in the original example.
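Although the question asks for a conclusion without a p-value, the intuition can be checked afterwards with R's exact binomial test. This is only a sketch; the object name `result_1000` is illustrative.

```r
# Exact binomial test for 502 heads in 1000 flips of a possibly unfair coin
result_1000 <- binom.test(x = 502, n = 1000, p = 0.5)

result_1000$p.value   # well above 0.05, so we do not reject
result_1000$conf.int  # a narrow interval that contains 0.5
```

The narrow confidence interval is what makes "the null is true, or very nearly so" the more plausible explanation here.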
(h) Finally, consider a situation where we flip the coin 20 times.
It lands “heads” on 2 flips. Without calculating a p-value,
what do you think you will conclude regarding the coin?
I would reject the null hypothesis.
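As with the previous case, the guess can be verified after the fact. The sketch below runs the exact binomial test for 2 heads in 20 flips; `result_20` is an illustrative name.

```r
# Exact two-sided binomial test for 2 heads in 20 flips
result_20 <- binom.test(x = 2, n = 20, p = 0.5)

result_20$p.value  # about 0.0004, far below 0.05
```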
(i) What are the two possible reasons for this conclusion? Again,
think back to that summary table shown in (d).
We rejected either because the coin really is biased or because we made
a Type 1 error.
(j) For the situation in (h), is having low statistical power a
concern? Briefly explain why.
No. A Type 2 error can only occur when we fail to reject the null
hypothesis. Because we rejected it here, a Type 2 error is impossible,
so low power is not a concern for this conclusion.
(k) In your own words, provide a summary of when and why power is
important.
Power matters most when you do not reject the null hypothesis. Knowing
that the power is high gives more confidence in a non-rejection, because
a sample size large enough to detect the effect makes the chance of a
Type 2 error low.
Question 2
For each situation below, select the correct option (of the two
presented):
(a) Power will be higher when:
The true slope = 0.1 \(~~vs.~~\) The true slope = 4
Power will be higher when the true slope is equal to 4
(b) Power will be higher when:
Sample size = 100 \(~~vs.~~\) Sample size = 1000
Power will be higher when sample size is equal to 1000
(c) Type 1 error:
Depends on sample size \(~~vs.~~\) Doesn’t depend on sample
size
Type 1 error does not depend on sample size.
(d) Type 2 error:
Depends on sample size \(~~vs.~~\) Doesn’t depend on sample
size
Type 2 error depends on sample size: power depends on sample size, and
the chance of a Type 2 error is 1 − power.
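The answers in this question can be illustrated numerically. The sketch below uses `power.t.test` from base R's stats package as a stand-in: it is a two-sample t-test rather than a regression slope, and the standard deviation of 5 and the effect size of 1 in the sample-size comparison are assumptions made only for illustration.

```r
# Hypothetical helper: power of a two-sample t-test with an assumed sd of 5
power_for <- function(n, delta) {
  power.t.test(n = n, delta = delta, sd = 5, sig.level = 0.05)$power
}

# (a) Larger true effect -> higher power (same n)
power_small_effect <- power_for(n = 100, delta = 0.1)
power_large_effect <- power_for(n = 100, delta = 4)

# (b) and (d) Larger sample -> higher power, so lower chance of Type 2 error
power_small_n <- power_for(n = 100, delta = 1)
power_large_n <- power_for(n = 1000, delta = 1)
type2_small_n <- 1 - power_small_n
type2_large_n <- 1 - power_large_n
```

In every call `sig.level` stays at 0.05, which mirrors the answer to (c): the Type 1 error rate is set by \(\alpha\), not by the sample size.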
Question 3
Suppose we’re performing the following hypothesis test:
\[H_0: ~true~mean = 30 ~~~~vs.~~~~ H_a: ~true~mean > 30\]
with \(\alpha\) = 0.05 and \(n\) = 50 data points. Also, suppose that
the standard deviation of the response variable is 8. In this case, we
would reject \(H_0\) if
\[\Bigg(\frac{\overline{Y} - 30}{\frac{8}{\sqrt{50}}}\Bigg) ~~>~~ \text{qnorm(0.95, mean=0, sd=1)} ~~\approx~~ 1.645\]
(a) Equivalently, for what values of \(\overline{Y}\) would we reject \(H_0\)?
qnorm(0.95,mean=0,sd=1)
## [1] 1.644854
We would reject the null hypothesis if \(\overline{Y}\) is greater than \(30 + 1.645 \times \frac{8}{\sqrt{50}} \approx 31.86\).
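The threshold follows from solving the rejection rule for \(\overline{Y}\). A minimal sketch of that arithmetic, with illustrative variable names:

```r
# Standard error of the sample mean: sd / sqrt(n)
se <- 8 / sqrt(50)

# Reject H0 when Y-bar exceeds 30 + z_{0.95} * se
threshold <- 30 + qnorm(0.95) * se
threshold
```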
(b) Find the chance that \(\overline{Y} > 31.86\) when \(H_0\) is true.
1-pnorm(31.86,mean=30,sd=8/sqrt(50))
## [1] 0.0500857
The chance is about 0.0501.
(c) What does the value you found in (b) represent?
It represents \(\alpha\), the chance of a Type 1 error: if the null
hypothesis is true, there is about a 5% chance that \(\overline{Y}\)
lands above 31.86 and we wrongly reject. It is not a p-value, because it
is fixed by the rejection threshold rather than computed from observed
data. (The value is 5.009% rather than exactly 5% only because the
threshold was rounded to 31.86.)
(d) Now, find the chance that \(\overline{Y} > 31.86\) when \(true~mean\) = 32 (i.e., one possible
instance of \(H_a\) being true).
1-pnorm(31.86,mean=32,sd=8/sqrt(50))
## [1] 0.5492409
The chance that the sample mean is greater than 31.86 when the true
mean is equal to 32 is about 0.549.
(e) What is the name of the quantity you found in (d)?
It is the power of the test when the true mean is equal to 32: the
chance of correctly rejecting the null hypothesis.
(f) How would the chance you calculated in (d) change if \(true~mean\) = 34?
It would increase, to roughly 0.97.
(g) Briefly explain in your own words why your answer to (f) makes
sense?
If the true mean increases but the threshold for rejecting the null
hypothesis stays the same, the sampling distribution of \(\overline{Y}\)
shifts to the right, so the area to the right of the threshold
increases. Since that area is the probability of rejecting the null
hypothesis, the power increases.
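The pattern across (b), (d), and (f) can be seen in one place by treating the rejection probability as a function of the true mean. The sketch below uses the unrounded threshold, so its values differ slightly from those computed with 31.86; the function name `power_at` is illustrative.

```r
se <- 8 / sqrt(50)
threshold <- 30 + qnorm(0.95) * se

# Chance that Y-bar exceeds the threshold for a given true mean
power_at <- function(true_mean) {
  pnorm(threshold, mean = true_mean, sd = se, lower.tail = FALSE)
}

true_means <- c(30, 31, 32, 33, 34)
power_curve <- power_at(true_means)
data.frame(true_mean = true_means, chance_of_rejecting = power_curve)
```

At a true mean of 30 the rejection probability is \(\alpha\) (the Type 1 error rate); for every true mean above 30 it is the power, and it rises as the true mean moves away from 30.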
Question 4
Provide a brief response in your own words for each of the following
conceptual questions:
(a) What is the difference between statistical significance
and practical significance?
(b) There is a trade-off between Type 1 and Type 2 errors. If \(\alpha\), the chance of Type 1 error, is
decreased, then \(\beta\), the chance of
Type 2 error, will increase. Why is this the case?
(c) If a study is under-powered (i.e., has low statistical
power) in the sense that its sample size is small, which of the possible
conclusions from a hypothesis test is more concerning? Briefly
explain.
(d) High statistical power is generally a good thing, but explain
how it might lead to potentially misleading results?
Question 5
In a recent article entitled, “When Researchers State Goals for
Clinical Trials in Advance, Success Rates Plunge” (published in The
Chronicle of Higher Education) we find the following passage:
Apparently, requiring scientists to state their objectives ahead
of time makes a big difference.
Around 2000, the U.S. government ordered researchers conducting
clinical trials with federal money to announce ahead of time which
medical question they were hoping to answer.
Before then, 57 percent of large-budget trials for
cardiovascular disease attributed a positive effect to a drug or dietary
supplement, according to a study published on Wednesday. After the new
requirement, the success rate dropped to just 8 percent, the study
found.
Discuss how this passage relates to content from our course.
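Before 2000, researchers were not required to state their objective
ahead of time, so they could change their research question in the
middle of a study, or test many outcomes and report whichever one came
out significant. This is an example of multiple hypothesis testing: the
more tests that are run, the greater the chance that at least one is a
false positive (a Type 1 error).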
Question 6
Consider the following passage from the article, Bacon Causes
Cancer? Sort of. Not Really. Ish.:
The scientific evidence linking both processed meat and tobacco
to certain types of cancer is strong. In that sense, both are
carcinogens. But smoking increases your relative risk of lung cancer by
2,500 percent; eating two slices of bacon a day increases your relative
risk for colorectal cancer by 18 percent. Given the frequency of
colorectal cancer, that means your risk of getting colorectal cancer
over your life goes from about 5 percent to 6 percent and, well, YBMMV.
(Your bacon mileage may vary.) “If this is the level of risk you’re
running your life on, then you don’t really have much to worry about,”
says Alfred Neugut, an oncologist and cancer epidemiologist at
Columbia.
Discuss how this passage relates to content from our course.
The passage contrasts statistical significance with practical
significance. The evidence linking both tobacco and processed meat to
cancer is strong, but the magnitudes are very different: smoking
increases the relative risk of lung cancer by about 2,500 percent, while
two slices of bacon a day increase the relative risk of colorectal
cancer by 18 percent, which moves the lifetime risk only from about 5
percent to about 6 percent. Looking only at whether an effect exists (or
at its sign) and overlooking its size leads to misleading
interpretations.
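The step from an 18 percent relative increase to "about 5 percent to 6 percent" is simple arithmetic, sketched below with the figures quoted in the passage. The variable names are illustrative.

```r
baseline_risk <- 0.05      # lifetime colorectal cancer risk quoted in the passage
relative_increase <- 0.18  # 18 percent relative increase from daily bacon

risk_with_bacon <- baseline_risk * (1 + relative_increase)
absolute_increase <- risk_with_bacon - baseline_risk

risk_with_bacon    # 0.059, i.e. about 6 percent
absolute_increase  # 0.009, i.e. under one percentage point
```

A relative increase that sounds large can correspond to a small absolute change when the baseline risk is low.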
Question 7
Researchers asked 740 pregnant women to record what they ate before
pregnancy. Of the 132 individual foods tracked, consumption of breakfast
cereal was significantly linked with the occurrence of baby boys (using
\(\alpha\) = 5%).
(a) Explain in your own words why we might be skeptical of such a
finding.
These results are questionable because the researchers effectively
performed 132 different tests. The chance of at least one Type 1 error
is therefore not 5%; if the tests are independent, it is \(1-(1-0.05)^{132} \approx 0.999\).
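The formula can be evaluated directly. The sketch below assumes the 132 tests are independent, which is a simplification, and uses illustrative variable names.

```r
alpha <- 0.05
n_tests <- 132

# Chance of at least one false positive if every null hypothesis is true
fwer <- 1 - (1 - alpha)^n_tests

# Expected number of false positives across all tests
expected_false_positives <- alpha * n_tests

fwer
expected_false_positives
```

With several false positives expected by chance alone, a single "significant" food out of 132 is not surprising.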
(b) What is a step that could be taken to guard against the concern
raised in (a)?
Use a Bonferroni correction: if \(\alpha\) for each individual test is
reduced to 0.05/132 \(\approx\) 0.04%, the chance of at least one Type 1
error across all 132 tests is held to about 5%.
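One common way to shrink the per-test \(\alpha\) is a Bonferroni correction, which base R provides through `p.adjust`. The sketch below uses simulated p-values (uniform draws, as they would be if every null hypothesis were true), not the study's data, and again assumes independent tests.

```r
set.seed(1)
n_tests <- 132

# Simulated p-values under 132 true null hypotheses (not real study data)
p_raw <- runif(n_tests)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

n_sig_raw <- sum(p_raw < 0.05)
n_sig_bonf <- sum(p_bonf < 0.05)

# Family-wise error rate when each test uses alpha / 132
fwer_bonf <- 1 - (1 - 0.05 / n_tests)^n_tests
```

Comparing adjusted p-values to 0.05 is equivalent to comparing the raw p-values to 0.05/132.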
Question 8
(a) What message(s) do you think the author is trying to portray
with this graphic?
Question 9
A hypothesis test was carried out using 12 data points. It yielded a
p-value of 14% where \(\alpha\) = 5%.
What would your thought process be in interpreting this result?
In interpreting this result, the first thing I would consider is which
type of error is plausible. Since the p-value (14%) is greater than
\(\alpha\) (5%), we do not reject the null hypothesis, so the only error
we could be making is a Type 2 error. We are not given the power, but
with only 12 data points it is probably low, which means the chance of a
Type 2 error is relatively high. The result is therefore not
statistically significant, but it should not be read as evidence that
the null hypothesis is true: the test may simply have been too small to
detect a real effect. If I were conducting the test, I would use a
bigger sample size.
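To get a rough sense of how low the power might be with 12 data points, the sketch below uses `power.t.test`. The question does not say which test was run or how large the true effect is, so the one-sample t-test and the effect of half a standard deviation are assumptions chosen purely for illustration.

```r
# Power with n = 12 under an assumed effect of 0.5 standard deviations
power_n12 <- power.t.test(n = 12, delta = 0.5, sd = 1,
                          sig.level = 0.05, type = "one.sample")$power

# Sample size needed for 80% power under the same assumptions
n_needed <- power.t.test(power = 0.8, delta = 0.5, sd = 1,
                         sig.level = 0.05, type = "one.sample")$n

power_n12
ceiling(n_needed)
```

Under these assumptions the power is well below the conventional 80%, which supports reading the non-rejection cautiously.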