1. A common symptom of otitis media in young children is the prolonged presence of fluid in the middle ear, which may result in temporary hearing loss and interfere with normal learning skills in the first 2 years of life. One hypothesis is that babies who are breastfed for at least 1 month build up some immunity against the effects of the infection and have less prolonged effusion than do babies fed by bottle. A small study collects the following data (unit: days) on breastfed vs. bottle-fed babies:

First we need to create vectors with the values given in the chart:

brfed <- c(18,11,3,6,14,25,17)
botfed <- c(20,35,7,17,28,39,15)
  1. What are the null and alternative hypotheses if you want to look for a statistical difference between the duration of effusion in babies breastfed for at least 1 month vs. babies bottle-fed?

H0: The duration of effusion is the same in breastfed and bottle-fed babies.

H1: The duration of effusion is not the same in breastfed and bottle-fed babies.

  1. What statistical test would you use and why?

First we explore the data by looking at the histograms and means:

hist(brfed)

mean(brfed)
## [1] 13.42857
hist(botfed)

The data do not appear to follow a normal distribution, so we will use a non-parametric test. Because this is also a very small study (7 babies per group), I would use the Wilcoxon rank sum test. By default we use a two-sided alternative to cover both tails.
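To put numbers beside the histograms, here is a minimal sketch of per-group summary statistics. The names `groups` and `summary_stats` are hypothetical, introduced only for this illustration. With 7 observations per group these summaries are rough, so treat them as descriptive only.

```r
# Durations of effusion (days), as entered earlier in this exercise
brfed  <- c(18, 11, 3, 6, 14, 25, 17)
botfed <- c(20, 35, 7, 17, 28, 39, 15)

# Hypothetical helper objects, introduced only for this illustration
groups <- list(breastfed = brfed, bottle_fed = botfed)
summary_stats <- sapply(groups, function(x) {
  c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
})

# One column per feeding group, one row per statistic
print(round(summary_stats, 2))
```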

  1. Calculate the corresponding p-value.
wilcox.test(brfed,botfed)
## Warning in wilcox.test.default(brfed, botfed): cannot compute exact p-value with
## ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  brfed and botfed
## W = 11.5, p-value = 0.1098
## alternative hypothesis: true location shift is not equal to 0

The p-value is 0.1098, which is above the usual 0.05 threshold.
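As a rough illustration of where the statistic W = 11.5 comes from, the sketch below ranks the pooled observations and subtracts the smallest possible rank sum for the first group. The variable names are illustrative, not part of the original analysis.

```r
brfed  <- c(18, 11, 3, 6, 14, 25, 17)
botfed <- c(20, 35, 7, 17, 28, 39, 15)

# Rank all 14 observations together; tied values (the two 17s) share an average rank
pooled_ranks <- rank(c(brfed, botfed))
n1 <- length(brfed)

# Sum of the ranks that belong to the breastfed group (the first n1 entries)
rank_sum_brfed <- sum(pooled_ranks[seq_len(n1)])

# W is the rank sum minus its minimum possible value, n1 * (n1 + 1) / 2
W <- rank_sum_brfed - n1 * (n1 + 1) / 2
print(W)
```

The tie between the two 17-day observations is also why R warns that it cannot compute an exact p-value and falls back on the normal approximation.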

  1. Assuming a Type I error rate of 0.05, would you reject or fail to reject the null hypothesis?

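Assuming a Type I error rate of 0.05, we fail to reject the null hypothesis because 0.1098 > 0.05. This does not show that the two groups are equal. It only means that this small sample does not provide enough evidence of a difference in the duration of effusion between breastfed and bottle-fed babies.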

  1. Projections using census data indicate that there are 4,418,620 individuals who identify as female and 4,025,100 who identify as male in NYC. According to NYC Health Department statistics, there have been 111,412 cases of Covid-19 among individuals identifying as female and 115,965 cases of Covid-19 among individuals identifying as male. Suppose you want to check whether the proportion of Covid-19 cases among women and men is statistically equivalent or non-equivalent.
  1. What are the null and alternative hypotheses?

H0: The proportion of Covid-19 cases among males is equal to the proportion among females.

H1: The proportion of Covid-19 cases among males is not equal to the proportion among females.

  1. Use the two-sample test for binomial proportions to calculate a test statistic and p-value. Please calculate “by hand” by writing code for the test statistic formula in R and computing a p-value.

Then check your solution by comparing this p-value to that output using the function “prop.test” in R.

prop.test(c(111412,115965),c(4418620,4025100),correct=T)
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(111412, 115965) out of c(4418620, 4025100)
## X-squared = 1039.5, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.003815748 -0.003376767
## sample estimates:
##     prop 1     prop 2 
## 0.02521421 0.02881046

The p-value is below 2.2e-16, far below 0.05, so we reject the null hypothesis: the proportion of Covid-19 cases differs between females (about 2.52%) and males (about 2.88%).
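The "by hand" calculation the question asks for can be sketched as follows. This is one way to write the two-sample test for binomial proportions with a continuity correction, chosen so that the squared z statistic is comparable to the X-squared reported by `prop.test(..., correct = T)`. All variable names are illustrative.

```r
x1 <- 111412; n1 <- 4418620   # cases and population, female
x2 <- 115965; n2 <- 4025100   # cases and population, male

p1 <- x1 / n1
p2 <- x2 / n2

# Pooled proportion under H0 (same proportion in both groups)
p_pooled <- (x1 + x2) / (n1 + n2)

# Standard error of p1 - p2 under H0
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Continuity correction, to match prop.test with correct = TRUE
cc <- 1 / (2 * n1) + 1 / (2 * n2)

# Test statistic; positive here because |p1 - p2| is much larger than cc
z <- (abs(p1 - p2) - cc) / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-z)

print(c(z = z, z_squared = z^2, p_value = p_value))
```

Squaring `z` gives the chi-square form of the same test, so `z^2` is the quantity to compare with the `X-squared` value in the `prop.test` output.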

  1. You want to determine whether there is an association between the development of select diseases and salt intake. You collect an initial sample of the following data:
  1. Calculate the expected counts for the above 3 x 2 contingency table:

Hypertension(HT), Diabetes (Db), Cardiac Event (CE)

HT/High Salt: (24)(17)/83 = 4.9; HT/Low Salt: (24)(66)/83 = 19.1

Db/High Salt: (35)(17)/83 = 7.2; Db/Low Salt: (35)(66)/83 = 27.8

CE/High Salt: (24)(17)/83 = 4.9; CE/Low Salt: (24)(66)/83 = 19.1
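The same expected counts can be computed in R as (row total × column total) / grand total. The sketch below assumes the observed counts are the ones passed to `fisher.test` in this exercise, and that rows are diseases and columns are high/low salt; that layout is inferred from the margins used in the hand calculation.

```r
# Observed counts (assumed layout: rows = disease, columns = salt intake)
observed <- matrix(c(2, 5, 10, 22, 30, 14), nrow = 3,
                   dimnames = list(disease = c("HT", "Db", "CE"),
                                   salt = c("High", "Low")))

# Expected count in each cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 1))
```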

  1. Which hypothesis test should be used and why?

Two of the six expected counts (HT/High Salt and CE/High Salt, both 4.9) are below 5. That is 33% of the cells, more than the 20% allowed by the usual rule of thumb, so the chi-square approximation is not reliable here. We will use Fisher's exact test instead.
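The rule of thumb can be checked directly in R. This sketch borrows the expected counts from `chisq.test` (suppressing its small-count warning, since only the expected table is needed) and counts the cells below 5. The 20% cutoff is the conventional threshold described above, and the variable names are illustrative.

```r
observed <- matrix(c(2, 5, 10, 22, 30, 14), nrow = 3)

# chisq.test() warns about small expected counts here; we only want $expected
expected <- suppressWarnings(chisq.test(observed))$expected

n_small     <- sum(expected < 5)    # number of cells with expected count < 5
share_small <- mean(expected < 5)   # fraction of such cells

print(c(cells_below_5 = n_small, share = share_small))

# TRUE means the rule of thumb points to Fisher's exact test
use_fisher <- share_small > 0.20
print(use_fisher)
```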

  1. Use R to calculate a p-value.
fisher.test(matrix(c(2,5,10,22,30,14),nrow=3))
## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(2, 5, 10, 22, 30, 14), nrow = 3)
## p-value = 0.01251
## alternative hypothesis: two.sided

The p-value is 0.01251, which is below 0.05.

  1. Are there statistical differences in salt intake across the three types of illnesses?

Yes. With a Type I error rate of 0.05, we reject the null hypothesis (p = 0.01251) and conclude that salt intake differs across the three diseases.

  1. The study continues for another 6 months and additional samples are collected. You now have a table of the following observed values:

Expected counts:

HT/High Salt: (55)(41)/181 = 12.46; HT/Low Salt: (55)(140)/181 = 42.54

Db/High Salt: (69)(41)/181 = 15.63; Db/Low Salt: (69)(140)/181 = 53.37

CE/High Salt: (57)(41)/181 = 12.91; CE/Low Salt: (57)(140)/181 = 44.09

  1. Which hypothesis test should be used and why?

Since all of the expected counts in the table are greater than 5, we will use the chi-square test.

  1. Use R to calculate a p-value.

X^2 = (5-12.46)^2/12.46 + (11-15.63)^2/15.63 + (25-12.91)^2/12.91 + (50-42.54)^2/42.54 + (58-53.37)^2/53.37 + (32-44.09)^2/44.09

X^2 = 22.19

Degrees of freedom = (3-1)(2-1) = 2

1-pchisq(22.19,2)
## [1] 1.518807e-05

The p-value is 0.00001519
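As a cross-check on the hand calculation, the same test can be run with `chisq.test`. The observed counts below are the ones that appear in the X^2 sum; the row and column labels are an assumption based on the expected-count calculation. For tables larger than 2 x 2, `chisq.test` applies no continuity correction, so its statistic is directly comparable.

```r
# Observed counts (assumed layout: rows = disease, columns = salt intake)
observed <- matrix(c(5, 11, 25, 50, 58, 32), nrow = 3,
                   dimnames = list(disease = c("HT", "Db", "CE"),
                                   salt = c("High", "Low")))

result <- chisq.test(observed)

print(round(result$expected, 2))  # expected counts under independence
print(result$statistic)           # chi-square statistic
print(result$parameter)           # degrees of freedom
print(result$p.value)             # p-value
```

The statistic may differ slightly in the second decimal from the hand value of 22.19, because the hand calculation rounds each expected count to two decimals before summing.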

  1. Are there statistical differences in salt intake across the three types of illnesses after additional data collection?

Yes. The p-value is extremely low, so we reject the null hypothesis and conclude that salt intake differs significantly across the three diseases. The test shows an overall association; it does not by itself show that every pair of diseases differs.

  1. A study wants to evaluate the effectiveness of a behavioral therapy on patients who are matched based on clinical and demographic characteristics. The following data indicate the success or failure of the intervention in reducing depression among the matched patients:

Suppose you want to compare the proportion of successes in the behavioral therapy between the matched patients A and B.

  1. What are the null and alternative hypotheses?

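Because the patients are matched, we use McNemar's test, which looks only at the discordant pairs (pairs in which one patient succeeds and the other fails). The hypotheses are as follows:

H0: Among the discordant pairs, the proportion in which patient A (rather than B) succeeds is 1/2, that is, the success proportions are equal in A and B.

H1: That proportion is not 1/2.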

  1. What hypothesis test should be used and why?

As mentioned above, we should use McNemar's test, because the patients are matched and the two proportions are therefore correlated rather than independent.

  1. Use R to calculate a p-value?
mcnemar.test(matrix(c(396,139,167,162),2,2))
## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  matrix(c(396, 139, 167, 162), 2, 2)
## McNemar's chi-squared = 2.3824, df = 1, p-value = 0.1227

The p-value is 0.1227
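To show what `mcnemar.test` is doing, here is a minimal sketch of the continuity-corrected statistic computed from the two discordant cells of the matrix used above. The names `n_12` and `n_21` are hypothetical labels for the off-diagonal counts.

```r
paired <- matrix(c(396, 139, 167, 162), 2, 2)

# Off-diagonal (discordant) cells; the diagonal cells do not enter the test
n_12 <- paired[1, 2]   # 167
n_21 <- paired[2, 1]   # 139

# McNemar's chi-squared statistic with continuity correction
stat <- (abs(n_12 - n_21) - 1)^2 / (n_12 + n_21)

# Upper-tail probability of a chi-square distribution with 1 degree of freedom
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

print(round(c(statistic = stat, p_value = p_value), 4))
```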

  1. Assuming a Type I error rate of 0.05, would you reject or fail to reject the null hypothesis?

With a Type I error rate of 0.05, we fail to reject the null hypothesis because 0.1227 > 0.05. There is not enough evidence that the proportion of successes differs between the matched patients A and B.