Homework Set 2

Problem 1

(a) Since more patients on Pioglitazone had cardiovascular problems (5,386 vs. 2,593), we can conclude that the rate of cardiovascular problems for those on a Pioglitazone treatment is higher.

5386/159978

## [1] 0.03366713

3.37% of the patients taking Pioglitazone had cardiovascular problems.

2593/67593

## [1] 0.03836196

3.84% of the patients taking Rosiglitazone had cardiovascular problems.

FALSE: Although there were more cases of cardiovascular problems with those taking Pioglitazone overall, a greater proportion of those taking Rosiglitazone had cardiovascular problems (3.84% vs. 3.37%).

(b) The data suggest that diabetic patients who are taking rosiglitazone are more likely to have cardiovascular problems than those taking pioglitazone.

# H0: no difference between drugs in terms of cardiovascular problems
# HA: difference between drugs in terms of cardiovascular problems
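x1 <- 2593    # Rosiglitazone: patients with cardiovascular problems
n1 <- 67593   # Rosiglitazone: total patients
x2 <- 5386    # Pioglitazone: patients with cardiovascular problems
n2 <- 159978  # Pioglitazone: total patients

# sample proportions for each drug
p_hat1 <- x1/n1
p_hat2 <- x2/n2

# pooled proportion under H0 (no difference between drugs)
pooled <- (x1 + x2)/(n1 + n2)

test_stat <- (p_hat1 - p_hat2)/sqrt(pooled*(1 - pooled)*(1/n1 + 1/n2))

# two-tailed; abs() keeps this valid whichever group is listed first
pnorm(abs(test_stat), lower.tail = FALSE)*2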

## [1] 2.638569e-08

TRUE: Although a greater proportion of those taking Rosiglitazone had cardiovascular problems in this sample, the observed difference could still be due to chance. Therefore, we run a two-sample z test on proportions to determine the probability of getting data at least this extreme if the null hypothesis were true. The results of the z test indicate that it would be extremely rare to get our data if the null were true, leading us to reject the null hypothesis. As such, the data do suggest that people taking Rosiglitazone are more likely to have cardiovascular problems than those taking Pioglitazone.
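
As a cross-check, the same test can be reproduced with base R's `prop.test`. This is a sketch that assumes the continuity correction is turned off (`correct = FALSE`), in which case the reported chi-squared statistic should equal the square of the z statistic computed by hand. The variable names below are illustrative and not part of the original solution.

```r
# Counts from the problem: group 1 = Rosiglitazone, group 2 = Pioglitazone
events <- c(2593, 5386)
totals <- c(67593, 159978)

# Manual pooled two-sample z test, mirroring the solution above
p_hat <- events / totals
pooled <- sum(events) / sum(totals)
z <- (p_hat[1] - p_hat[2]) / sqrt(pooled * (1 - pooled) * sum(1 / totals))
p_manual <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Built-in test without continuity correction (assumption: correct = FALSE)
res <- prop.test(events, totals, correct = FALSE)

# The two statistics and the two p-values should agree
c(z_squared = z^2, chi_squared = unname(res$statistic))
c(p_manual = p_manual, p_prop_test = res$p.value)
```

With the default `correct = TRUE`, `prop.test` applies a continuity correction, so its p-value would differ slightly from the hand calculation.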

(c) The fact that the rate of incidence is higher for the Rosiglitazone group proves that Rosiglitazone causes serious cardiovascular problems

FALSE: Even though the results suggest that there is a link between Rosiglitazone and cardiovascular problems, this does not imply causation. Firstly, the link found in the data could be a type I error. Additionally, it could be that a third variable is the true cause of this effect.

(d) Based on the information provided so far, we cannot tell if the difference between the rates of incidences is due to a relationship between the two variables or due to chance.

TRUE: The data suggest that there is a difference between incidence rates. However, we can never fully rule out the possibility of making a type I error.

Problem 2

(a) What proportion of patients in the treatment group and what proportion of patients in the control group died?

# Control
30/34

## [1] 0.8823529

# Treatment
45/69

## [1] 0.6521739

.88 of the control group died, whereas .65 of the treatment group died.

(b) Using a randomization technique

(i) What are the claims being tested?

H0: Null Hypothesis. There is no difference in death rates for the treatment group and control group.

HA: Alternative Hypothesis. There is a difference in death rates for the treatment group and control group.

(ii) Fill in the blanks with a number or phrase.

We write alive on “28” cards… and dead on “75”… one group of size “69” representing treatment and another group of size “34”… many times to build a distribution centered at “0”. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are “as or more extreme than our sample”…

(iii) What do the simulation results shown below suggest about the effectiveness of the transplant program?

.65 - .88

## [1] -0.23

The results show that observing a difference of -.23 in a sample would be very rare if the null hypothesis (difference = 0) were true. From this simulation we would reject the null and conclude that there is a difference in death rates between the treatment and control groups.
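
The card-shuffling procedure from part (ii) can be sketched in R as follows. This is a minimal illustration, not the simulation behind the figure: the seed, the number of repetitions (10,000), and all variable names are assumptions chosen for the example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# 28 "alive" cards and 75 "dead" cards, as in part (ii)
cards <- c(rep("alive", 28), rep("dead", 75))
n_treat <- 69                      # size of the simulated treatment group

# Observed difference in death rates (treatment - control), unrounded
obs_diff <- 45/69 - 30/34

nsim <- 10000                      # assumed number of shuffles
sim_diff <- replicate(nsim, {
  shuffled <- sample(cards)                          # shuffle the cards
  treat <- shuffled[1:n_treat]                       # deal 69 to treatment
  ctrl  <- shuffled[(n_treat + 1):length(cards)]     # remaining 34 to control
  mean(treat == "dead") - mean(ctrl == "dead")
})

# Fraction of shuffles at least as extreme as the observed difference
# (two-sided; the small tolerance guards against floating-point ties)
p_value <- mean(abs(sim_diff) >= abs(obs_diff) - 1e-12)
p_value
```

A small `p_value` here corresponds to the observed difference falling far in the tail of the simulated null distribution.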

Problem 3

(a) What type of a study is this?

This is a randomized controlled trial (an experiment).

(b) Does this study make use of blinding?

Although the study doesn’t explicitly state that participants were blind to which group they were placed in, it appears that they were, since precautions generally used to blind participants were taken (the pills looked and tasted the same). However, there was no indication that the experimenters were blind to which participants were in each group, so it was likely not a double-blind design.

(c) Compute the difference in proportions of the two groups

# Antibiotic
66/85

## [1] 0.7764706

# Placebo
65/81

## [1] 0.8024691

# difference
66/85 - 65/81

## [1] -0.02599855

The difference between the antibiotic and placebo groups is about -0.03.

(d) At first glance, does antibiotic or placebo appear to be more effective?

Placebo appears to be more effective. A greater proportion of those in the placebo group reported improvements (.80) than those in the antibiotic group (.78).

(e) Write out these competing claims in easy-to-understand language and in the context of the application.

H0: Null Hypothesis. There is no difference in self-reported improvement between the antibiotic and placebo group

HA: Alternative Hypothesis. There is a difference in self-reported improvement between the antibiotic and placebo group

Write a conclusion for the hypothesis test in plain language.

The results indicate that the observed difference in self-reported improvement between the antibiotic and placebo groups is within what we would expect by chance if the null hypothesis were true. Therefore, we fail to reject the null hypothesis. The data do not provide convincing evidence that the antibiotic and placebo groups differ in self-reported improvement.
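
A randomization test for this comparison can be sketched in R. Rather than shuffling cards explicitly, this version draws the number of improved patients landing in the antibiotic group from a hypergeometric distribution with `rhyper`, which is equivalent to dealing the 131 "improved" and 35 "not improved" outcomes at random into groups of 85 and 81. The seed, the 10,000 repetitions, and the variable names are assumptions for illustration.

```r
set.seed(2)  # arbitrary seed, only so the sketch is reproducible

improved <- c(antibiotic = 66, placebo = 65)
group_n  <- c(antibiotic = 85, placebo = 81)

# Observed difference in improvement rates (antibiotic - placebo), unrounded
obs_diff <- improved[["antibiotic"]] / group_n[["antibiotic"]] -
  improved[["placebo"]] / group_n[["placebo"]]

total_improved <- sum(improved)                  # 131 improved overall
total_not      <- sum(group_n) - total_improved  # 35 did not improve

nsim <- 10000  # assumed number of simulated reassignments
# Number of improved patients who land in the antibiotic group under H0
sim_antibiotic <- rhyper(nsim, m = total_improved, n = total_not,
                         k = group_n[["antibiotic"]])
sim_placebo <- total_improved - sim_antibiotic
sim_diff <- sim_antibiotic / group_n[["antibiotic"]] -
  sim_placebo / group_n[["placebo"]]

# Two-sided p-value: share of simulated differences at least as extreme
p_value <- mean(abs(sim_diff) >= abs(obs_diff) - 1e-12)
p_value
```

A large `p_value` is consistent with failing to reject the null hypothesis.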

Problem 4

(a) What are the hypotheses?

H0: Null Hypothesis. There is no difference in yawn rates between the treatment (exposure to yawn) and control (no exposure) groups.

HA: Alternative Hypothesis. There is a difference in yawn rates between the treatment (exposure to yawn) and control (no exposure) groups.

(b) Calculate the observed difference between the yawning rates under the two scenarios.

# treatment
10/34

## [1] 0.2941176

# control
4/16

## [1] 0.25

# difference
.29 - .25

## [1] 0.04

(c) Estimate the p-value using the figure above and determine the conclusion of the hypothesis test.

The observed difference (.04) is well within what we would expect if the null hypothesis were true, as shown in the simulated null hypothesis distribution. The p-value would be very high (~.8), leading us to retain the null hypothesis.
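
The estimate read off the figure can be compared with an exact calculation. The sketch below uses Fisher's exact test (`fisher.test` in base R), which is a different technique from the simulation in the figure: it computes the randomization probabilities exactly instead of approximating them. The table layout and names are assumptions for illustration, and the exact two-sided p-value need not match the visual estimate.

```r
# Rows: yawned / did not yawn; columns: treatment (n = 34) / control (n = 16)
yawn_table <- matrix(c(10, 24, 4, 12), nrow = 2,
                     dimnames = list(outcome = c("yawn", "no yawn"),
                                     group = c("treatment", "control")))

# Fisher's exact test, two-sided by default
exact <- fisher.test(yawn_table)
exact$p.value
```

Either way, a p-value this far above conventional significance levels leads to the same conclusion of retaining the null hypothesis.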

Problem 5

(a) Write the hypothesis for testing if the proportion of high school students who followed the news about Egypt is different than the proportion of American adults who did.

H0: Null Hypothesis. The proportion of high school students who followed the news about Egypt is the same as the proportion of adults who did.

HA: Alternative Hypothesis. The proportion of high school students who followed the news about Egypt is different than the proportion of adults who did.

(b) Calculate the proportion of high schoolers in this sample who followed the news about Egypt closely during this time.

17/30

## [1] 0.5666667

The proportion of high schoolers who followed the news in this sample was .57.

(c) Describe how to perform a simulation and, once you had results, how to estimate the p-value.

A simulation is performed by choosing a null value for the mean or proportion (in this case .69, the proportion assumed for high school students under the null hypothesis) and randomly sampling from a simulated population in which 69% of the observations are students who followed the news. In this case the sample size is 30, so you would randomly sample 30 observations from this population and compute a sample proportion. You would then repeat this process many times (10,000 in this case) to create a simulated sampling distribution for this proportion. This represents the null hypothesis distribution: that the proportion of high school students who followed the news was .69. You would then see where the proportion from your actual sample falls on this simulated distribution. The p-value is estimated as the fraction of simulated proportions that are as or more extreme than the observed one, counting both tails. If the observed proportion falls in the most extreme 2.5% of values in either tail, you can reject the null hypothesis in favor of the alternative, that high school students followed the news at a significantly different rate than U.S. adults.

(d) Estimate the p-value using the plot and determine the conclusion of the hypothesis.

The plot indicates that the sample proportion of .57 is not significantly different than what we would expect by chance if the null (p=.69) were true. I would estimate the p value to be around .3. Therefore, we would fail to reject the null hypothesis.

(e) Please write the code to simulate this distribution and find the p-value for your simulation.

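nsim <- 10000
# simulate the number of news followers in samples of 30 under H0 (p = .69)
simdis <- rbinom(nsim, 30, .69)
hist(simdis, xlab = "Number of News Followers")
# mark the observed count (17 of 30)
abline(v = 17, col = "blue", lwd = 2, lty = 2)

# exact binomial lower-tail probability, doubled for a two-tailed test
pbinom(17, 30, .69)*2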

## [1] 0.2105051

The p-value is .21; therefore, we fail to reject the null hypothesis.
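
The value above comes from the exact binomial distribution (`pbinom`). A p-value estimated directly from the simulated draws can be sketched as follows, using the same convention of doubling the lower tail. The seed and the name `p_sim` are assumptions for illustration, and the result will vary slightly from run to run without a fixed seed.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

nsim <- 10000
# Simulated counts of news followers in samples of 30 under H0 (p = .69)
simdis <- rbinom(nsim, 30, 0.69)

# Share of simulated counts at or below the observed 17, doubled (two-tailed)
p_sim <- mean(simdis <= 17) * 2
p_sim

# Exact counterpart for comparison
p_exact <- pbinom(17, 30, 0.69) * 2
p_exact
```

The simulated estimate should land close to the exact value, with the gap shrinking as `nsim` grows.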