#Problem 1: Determine whether each is True or False
#A:Since more patients on Pioglitazone had cardiovascular problems (5,386 vs. 2,593), we can conclude that the rate of cardiovascular problems for those on a Pioglitazone treatment is higher.
2593/67593
## [1] 0.03836196
5386/159978
## [1] 0.03366713
#False: although more patients on Pioglitazone (P) had heart problems than those on Rosiglitazone (R), a larger proportion of patients taking R experienced heart problems (about 3.8% vs. 3.4%).
#B:The data suggest that diabetic patients who are taking rosiglitazone are more likely to have cardiovascular problems than those taking pioglitazone.
#Ho: there is no difference in cardiovascular problems between the drugs
#Ha: there is difference in heart problems between the drugs
x1<-2593
n1<-67593
x2<-5386
n2<-159978
p_hat1<-x1/n1
p_hat2<-x2/n2
p_hatpool<- (x1+x2)/(n1+n2)
test_stat= (p_hat1-p_hat2)/sqrt(p_hatpool*(1-p_hatpool)*(1/n1+1/n2))
#two-tailed test
# abs() keeps the two-tailed p-value valid even if the statistic is negative
2 * pnorm(abs(test_stat), lower.tail = FALSE)
## [1] 2.638569e-08
# True. The p-value from our two-tailed z-test shows that a difference this large would almost never occur if the null hypothesis of no difference between the two drugs were true. Therefore, we reject the null hypothesis and conclude that there is a difference between the two drugs. Because the observed rate is higher for R, patients on R are more likely to have cardiovascular problems.
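#As a cross-check on the hand-computed z-test, base R's prop.test() can run the same pooled comparison. This is only a sketch: with correct = FALSE (no continuity correction), its X-squared statistic should equal the square of the z statistic above. The names counts, totals, check, and z_from_chisq are introduced here for illustration.

```r
counts <- c(2593, 5386)     # patients with cardiovascular problems (R, P)
totals <- c(67593, 159978)  # patients on each drug (R, P)

# Two-sided test of equal proportions, no continuity correction
check <- prop.test(x = counts, n = totals, correct = FALSE)

# |z| recovered from the chi-squared statistic
z_from_chisq <- sqrt(unname(check$statistic))
z_from_chisq
check$p.value
```

#The p-value reported by prop.test() should agree with the two-tailed value computed by hand.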
#C:The fact that the rate of incidence is higher for the Rosiglitazone group proves that Rosiglitazone causes serious cardiovascular problems
#FALSE: A higher rate in the R group does not prove causation. We cannot rule out that R causes serious cardiovascular problems, but we cannot conclude that it does, either. It is entirely possible that patients on R happen to take other medications that exacerbate heart problems, or that some other unknown factor is responsible.
#D:Based on the information provided so far, we cannot tell if the difference between the rates of incidences is due to a relationship between the two variables or due to chance.
# TRUE: As with the last problem, we cannot rule out that some other variable is driving the difference. In addition, the raw counts and rates by themselves do not show whether the gap is larger than what chance variation alone would produce. Therefore, it is possible that the difference is due to chance.
#Problem 2
# A: What proportion of patients in the treatment group and what proportion of patients in the control group died?
#Treatment
45/69
## [1] 0.6521739
#Control
30/34
## [1] 0.8823529
#Randomization technique
#Ho: there is no difference in the death rates for patients between groups
#Ha: there is a difference in the death rates for each of these groups
#We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not...one group of size 69 representing treatment, and the other group of size 34 representing control... a distribution centered at 0... we calculate the fraction of the simulations where the simulated differences are at least as extreme as our observed difference in proportions.
#Looking at this graphic I would say that the difference of 0.23 is at least 2 standard deviations away from the mean. This means that it is very unlikely to observe our result if the null hypothesis were true. Thus, we reject the null hypothesis and conclude that there is a significant difference.
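#The card-shuffling procedure described above can be sketched in R. This is an illustrative version, not the original simulation: the seed, the 10,000 repetitions, and the names cards, obs_diff, sim_diff, and p_value are assumptions made for the example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# 28 "alive" cards and 75 "dead" cards (45 + 30 deaths among 103 patients)
cards <- c(rep("alive", 28), rep("dead", 75))

# Observed difference in death proportions: treatment minus control
obs_diff <- 45 / 69 - 30 / 34

# Shuffle the cards, deal 69 to treatment and 34 to control, record the difference
sim_diff <- replicate(10000, {
  shuffled <- sample(cards)
  mean(shuffled[1:69] == "dead") - mean(shuffled[70:103] == "dead")
})

# Two-sided p-value: share of shuffles at least as extreme as the observed one
# (the small tolerance guards against floating-point ties)
p_value <- mean(abs(sim_diff) >= abs(obs_diff) - 1e-12)
p_value
```

#A small p_value here would support the reading that the observed difference lies far out in the tail of the null distribution.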
#Problem 3
#A: This is a randomized controlled trial.
#B: The explanation implies that the participants were blinded, to lower the chance that any of them knew they were receiving a placebo. However, the doctors appear to know which treatment is being administered to which patients, so it is not a double-blind trial.
#C:
#Antibiotic
66/85
## [1] 0.7764706
#Placebo
65/81
## [1] 0.8024691
#D: It seems that the placebo is more effective than the antibiotic. However, the difference between the two proportions is only about 0.026, so with such a small difference I am hesitant to say whether it is significant.
#E:
#Ho: there is no difference in treatment effectiveness between the antibiotic group and the placebo group
#Ha: there is a difference in the effectiveness of treatment between those getting antibiotics and those receiving placebos
#F: The self-reported count for the antibiotic group in our sample was 66. Considering that 66 falls very close to the center of the histogram created by our simulation, I argue that there is no significant difference between the placebo and the antibiotic.
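#A histogram like the one referred to in part F could be produced along these lines. This is a sketch under assumptions: it pools the 66 + 65 = 131 positive responses from part C, uses rhyper() as a shortcut for shuffling cards and dealing 85 of them to the antibiotic group, and the seed, nsim, and variable names are chosen for illustration.

```r
set.seed(3)  # arbitrary seed for reproducibility
nsim <- 10000

successes <- 66 + 65                # positive responses across both groups
failures  <- (85 - 66) + (81 - 65)  # remaining patients across both groups

# Number of positive responses landing in an antibiotic group of 85
# when group labels are assigned at random (the null hypothesis)
sim_count <- rhyper(nsim, m = successes, n = failures, k = 85)

# Center of the null distribution and a two-sided p-value for the observed 66
expected <- 85 * successes / (successes + failures)
p_value <- mean(abs(sim_count - expected) >= abs(66 - expected) - 1e-12)
expected
p_value
```

#Calling hist(sim_count) and abline(v = 66, col = "red") afterwards would draw the simulated distribution with the observed count marked.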
#Problem 4
#A: Ho: yawning next to someone will have no influence on whether they will yawn
#Ha: yawning next to someone will impact whether this person will yawn
#B:
#Control
4/16
## [1] 0.25
#Treatment
10/34
## [1] 0.2941176
#difference of .044
#The observed difference of .044 is small relative to the spread of the null distribution. With a pooled proportion of 14/50 = 0.28, the standard error of the difference is about 0.14, so the observed value lies only about 0.3 standard deviations from 0. A result like this would be quite common under the null hypothesis, so I fail to reject the null hypothesis. These data do not provide convincing evidence that yawning next to someone makes them yawn more often than the control.
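#One way to check how many standard deviations the observed difference sits from 0 is to shuffle the yawn outcomes between the two groups many times. The following is a minimal sketch; the seed, the 10,000 repetitions, and the names yawned, obs_diff, sim_diff, and null_sd are assumptions for illustration.

```r
set.seed(4)  # arbitrary seed for reproducibility

# 14 participants yawned (10 treatment + 4 control) out of 50 in total
yawned <- c(rep(1, 14), rep(0, 36))

# Observed difference in yawning proportions: treatment minus control
obs_diff <- 10 / 34 - 4 / 16

# Randomly reassign outcomes to a treatment group of 34 and a control group of 16
sim_diff <- replicate(10000, {
  shuffled <- sample(yawned)
  mean(shuffled[1:34]) - mean(shuffled[35:50])
})

null_sd <- sd(sim_diff)
obs_diff / null_sd                                   # distance from 0 in SD units
mean(abs(sim_diff) >= abs(obs_diff) - 1e-12)         # two-sided p-value
```

#The ratio obs_diff / null_sd gives a direct numerical counterpart to reading standard deviations off the plot.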
#Problem 5:
#A: Ho: There is no difference in following the news about Egypt between high schoolers and the general American population.
#Ha: There is a difference between high schoolers and the general American population on how much they follow news about Egypt.
#B: 17/30
#C: To perform a simulation, we would take the proportion of Americans who paid attention to the Egypt news, .69, and create a population of yes/no cards (as many as you want) in which that proportion of cards says yes (did follow the news). From that set we would draw 30 cards to represent the 30 people in our actual sample and calculate the proportion of yes cards. We would then redo the simulation 999 more times, build a distribution from the simulated proportions, and plot our sample proportion against it. If it fell in the top or bottom 2.5 percent of the distribution, it would be considered a statistically significant difference in either direction.
#D: The p-value for our proportion of .567 is around .075, which is above .05, so a result like ours would not be unusual under the null. Thus, we fail to reject the null and conclude that there is not a statistically significant difference.
#my simulation
nsim<-10000
mysim<-rbinom(nsim, 30,0.69)
hist(mysim, main="Simulation of High schoolers Paying Attention to Egyptian News", xlab= "High schoolers Responding Yes")
abline(v=17, col="red")

pbinom(17,30,0.69)*2
## [1] 0.2105051
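#Doubling the lower tail with pbinom() treats the null distribution as symmetric. An alternative, sketched here, estimates the two-sided p-value directly from simulated proportions by counting how often they land at least as far from .69 as the observed 17/30. The seed and the names p_null, n, p_obs, sim_prop, and p_value_sim are assumptions for illustration.

```r
set.seed(5)  # arbitrary seed for reproducibility
nsim   <- 10000
p_null <- 0.69     # proportion of Americans following the news
n      <- 30       # high schoolers sampled
p_obs  <- 17 / 30  # observed sample proportion

# Simulated sample proportions under the null hypothesis
sim_prop <- rbinom(nsim, n, p_null) / n

# Share of simulations at least as far from p_null as the observed proportion
p_value_sim <- mean(abs(sim_prop - p_null) >= abs(p_obs - p_null) - 1e-12)
p_value_sim
```

#Because the binomial distribution with p = 0.69 is not symmetric, this estimate need not match the doubled pbinom() value exactly.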