Load some potentially useful packages:

library(ggplot2)

library(dplyr)


Question 1 (this is a very conceptual question, which we will go over together as a class)

Consider the following toy example:

You wish to decide whether a certain coin is fair or not. In this situation, we have:

\(H_0\): Coin is fair (chance of heads = 50%) \(~~vs.~~\)

\(H_a\): Coin is NOT fair (chance of heads \(\neq\) 50%)


(a) To answer the above question, suppose I only allow you to flip the coin 3 times: it lands “heads” on all 3 flips. What is the chance of this data under the null hypothesis (i.e., a p-value)?

0.125
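
As a quick sanity check of that number (a small sketch: the chance of 3 heads in 3 flips of a fair coin):

0.5^3
## [1] 0.125

dbinom(3, size = 3, prob = 0.5)
## [1] 0.125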


(b) If you are using \(\alpha\) = 5%, what do you conclude?

I would not reject the null hypothesis


(c) Try to explain in your own words why the answer to (b) is slightly disconcerting/perplexing.

The data were the most extreme they possibly could have been (all heads), and yet we still cannot reject the null hypothesis. This leads to a statistical conclusion that we don't necessarily believe.


(d) General Question: Whenever we do not reject the null hypothesis, what are the two possible reasons for this conclusion? Think back to this summary table:

We are not rejecting either because the null hypothesis is actually true, or because we have made a Type 2 error (the null is false, but we failed to reject it).



(e) Which of these two reasons might seem more plausible in our example? Why?

I believe that a Type 2 error is more plausible than the null being true, because the sample size is so small that the power is low. This in turn means that the chance of a Type 2 error is relatively high.


(f) Consider the same scenario, but suppose we flip the coin 1000 times: it lands “heads” on 502 flips. Without calculating a p-value, what do you think you will conclude regarding the coin?

I would not reject the null hypothesis, since 502 heads in 1000 flips is almost exactly what a fair coin would produce.

(g) Look back at Question 1(d). Which of the two reasons seems more plausible now? Carefully explain why.

Because the sample size is much larger, the null being true seems more plausible: the power is higher, so the chance of a Type 2 error is much smaller. In addition, these results (502 heads out of 1000) are far less extreme than in the original example.


(h) Finally, consider a situation where we flip the coin 20 times. It lands “heads” on 2 flips. Without calculating a p-value, what do you think you will conclude regarding the coin?

I would reject the null hypothesis, since 2 heads in 20 flips is far from the 10 heads we would expect from a fair coin.


(i) What are the two possible reasons for this conclusion? Again, think back to that summary table shown in (d).

We rejected either because the coin really is biased, or because the coin is fair and we made a Type 1 error.


(j) For the situation in (h), is having low statistical power a concern? Briefly explain why.

No. Because we rejected the null hypothesis, a Type 2 error is no longer possible; the only error we could have made is a Type 1 error, whose probability is controlled at \(\alpha\) regardless of the power.


(k) In your own words, provide a summary of when and why power is important.

Power is mainly a concern when you do not reject the null hypothesis. Knowing that the power is high gives you more comfort in that non-rejection, because it tells you the sample size was large enough that the chance of a Type 2 error is low.



Question 2

For each situation below, select the correct option (of the two presented):

(a) Power will be higher when:

The true slope = 0.1 \(~~vs.~~\) The true slope = 4

Power will be higher when the true slope is equal to 4

(b) Power will be higher when:

Sample size = 100 \(~~vs.~~\) Sample size = 1000

Power will be higher when sample size is equal to 1000

(c) Type 1 error:

Depends on sample size \(~~vs.~~\) Doesn’t depend on sample size

The Type 1 error rate does not depend on sample size; it is fixed in advance at \(\alpha\).

(d) Type 2 error:

Depends on sample size \(~~vs.~~\) Doesn’t depend on sample size

Because power depends on sample size, and power and the probability of a Type 2 error are complementary (power \(= 1 - \beta\)), the Type 2 error rate does depend on sample size (see the quick check below).
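
As a quick numerical check of (a), (b), and (d), here is a minimal sketch using base R's power.t.test with hypothetical values of the effect size (delta) and standard deviation, standing in for the regression-slope setting of (a):

power.t.test(n = 100,  delta = 0.5, sd = 2, sig.level = 0.05)$power  # smaller sample size
power.t.test(n = 1000, delta = 0.5, sd = 2, sig.level = 0.05)$power  # larger sample size: higher power
power.t.test(n = 100,  delta = 2,   sd = 2, sig.level = 0.05)$power  # larger effect: higher power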

Question 3

Suppose we’re performing the following hypothesis test:

\[H_0: ~true~mean = 30 ~~~~vs.~~~~ H_a: ~true~mean > 30\]

with \(\alpha\) = 0.05 and \(n\) = 50 data points. Also, suppose that the standard deviation of the response variable is 8. In this case, we would reject \(H_0\) if

\[\Bigg(\frac{\overline{Y} - 30}{\frac{8}{\sqrt{50}}}\Bigg) ~~>~~ qnorm(0.95,mean=0,sd=1) ~~\approx~~ 1.645\]

(a) Equivalently, for what values of \(\overline{Y}\) would we reject \(H_0\)?

qnorm(0.95,mean=0,sd=1)
## [1] 1.644854

We would reject the null hypothesis if \(\overline{Y} > 30 + 1.645 \times \frac{8}{\sqrt{50}} \approx 31.861\).
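
As a quick check of that cutoff (a small sketch, solving the rejection rule above for \(\overline{Y}\)):

30 + qnorm(0.95, mean = 0, sd = 1) * 8/sqrt(50)
## [1] 31.86094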


(b) Find the chance that \(\overline{Y} > 31.86\) when \(H_0\) is true.

1-pnorm(31.86,mean=30,sd=8/sqrt(50))
## [1] 0.0500857

The chance is approximately 0.0501, i.e., essentially \(\alpha\) = 0.05 (it is not exactly 0.05 only because 31.86 is a rounded version of the cutoff 31.861).


(c) What does the value you found in (b) represent?

This value is essentially \(\alpha\), the significance level of the test: when the null hypothesis is true, there is about a 5% chance that \(\overline{Y}\) lands above the rejection cutoff, i.e., about a 5% chance of committing a Type 1 error.


(d) Now, find the chance that \(\overline{Y} > 31.86\) when \(true~mean\) = 32 (i.e., one possible instance of \(H_a\) being true).

1-pnorm(31.86,mean=32,sd=8/sqrt(50))
## [1] 0.5492409

The chance that \(\overline{Y}\) exceeds 31.86 when the true mean equals 32 is approximately 0.549.


(e) What is the name of the quantity you found in (d)?

It is the power of the test against the alternative that the true mean equals 32, i.e., the probability of (correctly) rejecting \(H_0\) when the true mean is 32.


(f) How would the chance you calculated in (d) change if \(true~mean\) = 34?

It would increase substantially, since the distribution of \(\overline{Y}\) would be centered even further above the cutoff.
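
As a quick check (a small sketch repeating the calculation from (d), but with a true mean of 34):

1-pnorm(31.86, mean = 34, sd = 8/sqrt(50))
# approximately 0.97, much larger than the 0.549 found in (d)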


(g) Briefly explain in your own words why your answer to (f) makes sense?

If the true mean increases but the rejection threshold stays the same, the sampling distribution of \(\overline{Y}\) shifts to the right, so more of its area lies beyond the threshold. That area is the probability of rejecting the null hypothesis (the power), so the power increases.
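
To visualize this (a small sketch reusing the cutoff of 31.86 and \(sd = 8/\sqrt{50}\) from above, with ggplot2 already loaded at the top), we can plot the rejection probability against a range of possible true means:

power_df <- data.frame(true_mean = seq(30, 35, by = 0.1))
power_df$power <- 1 - pnorm(31.86, mean = power_df$true_mean, sd = 8/sqrt(50))

ggplot(power_df, aes(x = true_mean, y = power)) +
  geom_line() +
  geom_hline(yintercept = 0.05, linetype = "dashed") +  # roughly alpha when the true mean is 30
  labs(x = "True mean", y = "Chance of rejecting H0")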



Question 4

Provide a brief response in your own words for each of the following conceptual questions:

(a) What is the difference between statistical significance and practical significance?


(b) There is a trade-off between Type 1 and Type 2 errors. If \(\alpha\), the chance of Type 1 error is decreased, \(\beta\), the chance of Type 2 error, will increase. Why is this the case?


(c) If a study is under-powered (i.e., has low statistical power) in the sense that its sample size is small, which of the possible conclusions from a hypothesis test is more concerning? Briefly explain.


(d) High statistical power is generally a good thing, but explain how it might lead to potentially misleading results?


(e) Suppose we are interested in performing many hypothesis tests, using \(\alpha\) = 5% for each of them. Why is this not advisable?



Question 5

In a recent article entitled, “When Researchers State Goals for Clinical Trials in Advance, Success Rates Plunge” (published in The Chronicle of Higher Education) we find the following passage:

Apparently, requiring scientists to state their objectives ahead of time makes a big difference.

Around 2000, the U.S. government ordered researchers conducting clinical trials with federal money to announce ahead of time which medical question they were hoping to answer.

Before then, 57 percent of large-budget trials for cardiovascular disease attributed a positive effect to a drug or dietary supplement, according to a study published on Wednesday. After the new requirement, the success rate dropped to just 8 percent, the study found.

Discuss how this passage relates to content from our course.

Before 2000, researchers were not required to record their objectives ahead of time, so they could change their research questions in the middle of an experiment (or test many outcomes and only report the ones that "worked"), which inflated success rates. This is an example of the multiple-hypothesis-testing problem: running many tests and selectively reporting the significant ones greatly increases the chance of a Type 1 error.



Question 6

Consider the following passage from the article, Bacon Causes Cancer? Sort of. Not Really. Ish.:

The scientific evidence linking both processed meat and tobacco to certain types of cancer is strong. In that sense, both are carcinogens. But smoking increases your relative risk of lung cancer by 2,500 percent; eating two slices of bacon a day increases your relative risk for colorectal cancer by 18 percent. Given the frequency of colorectal cancer, that means your risk of getting colorectal cancer over your life goes from about 5 percent to 6 percent and, well, YBMMV. (Your bacon mileage may vary.) “If this is the level of risk you’re running your life on, then you don’t really have much to worry about,” says Alfred Neugut, an oncologist and cancer epidemiologist at Columbia.

Discuss how this passage relates to content from our course.

The magnitude of the effect is overlooked in favor of its sign. Both processed meat and tobacco show strong evidence of causing cancer, but the relative-risk increase for cigarettes is around 2,500 percent, while for bacon it is only 18 percent, which moves lifetime colorectal-cancer risk from about 5% to 6%. Labeling both simply as "carcinogens" confuses statistical significance with practical significance: the results are presented in a misleading way that ignores how small the bacon effect actually is.



Question 7

Researchers asked 740 pregnant women to record what they ate before pregnancy. Of the 132 individual foods tracked, consumption of breakfast cereal was significantly linked with the occurrence of baby boys (using \(\alpha\) = 5%).

(a) Explain in your own words why we might be skeptical of such a finding.


These results are questionable because the researchers effectively performed 132 different hypothesis tests. The chance of at least one Type 1 error is therefore not 5%; assuming independent tests, it is \(1 - (1 - 0.05)^{132}\), which is very close to 1.
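
A quick calculation of that family-wise error rate (a small sketch, assuming the 132 tests are independent):

1 - (1 - 0.05)^132
# approximately 0.999: we would almost certainly see at least one false positive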

(b) What is a step that could be taken to guard against the concern raised in (a)?

One option is a Bonferroni-style correction: reduce the per-test significance level to roughly \(\alpha/132 = 0.05/132 \approx 0.04\%\), which keeps the overall (family-wise) chance of a Type 1 error at about 5%.
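
A small sketch of that correction:

0.05/132                   # per-test significance level, roughly 0.0004 (0.04%)
1 - (1 - 0.05/132)^132     # family-wise Type 1 error rate stays near 0.05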



Question 8

Consider the following figure, obtained from a paper published in the American Journal of Clinical Nutrition:


(a) What message(s) do you think the author is trying to portray with this graphic?


(b) Can you articulate a danger posed to society which is implied from the results displayed in the figure (especially for the general public who may not have a strong grasp of statistical concepts)?



Question 9

A hypothesis test was carried out using 12 data points. It yielded a p-value of 14% where \(\alpha\) = 5%. What would your thought process be in interpreting this result?

In interpreting this result, the first thing I would consider is which type of error is more plausible. Since the p-value of 14% is above \(\alpha\) = 5%, we do not reject the null hypothesis, so the relevant concern is a Type 2 error. We are not told the power, but with only 12 data points the power is probably low, which means the chance of a Type 2 error is relatively high. So although the result is not statistically significant, this non-rejection is not very reassuring: the study may simply have been too small to detect a real effect. If I were conducting the test, I would use a larger sample size.