Question 1.

Research on peer pressure has shown that the mean influence score for a scale of peer pressure is 520 with a standard deviation of 80. An investigator would like to show that a minor change in conditions will produce scores with a mean of only 500, and he plans to run a t test to compare his sample mean with a population mean of 520. He plans to collect a sample of size n = 100. * a. What is the effect size? * b. What is the power of the test? * c. What sample size is needed to raise the power to .70, .80, .90?

Solution:

library(pwr)
x1<-pwr.t.test(d=(520-500)/80,n=100,sig.level=0.05, type="paired",alternative="two.sided")


  1. Effect size - 100

  2. Power of the test - 0.6969

  3. Sample size needed to raise power to .7,.8,.9 -

x2<-pwr.t.test(d=(520-500)/80,power=0.7,sig.level=0.05, type="paired",alternative="two.sided")

x3<-pwr.t.test(d=(520-500)/80,power=0.8,sig.level=0.05, type="paired",alternative="two.sided")

x4<-pwr.t.test(d=(520-500)/80,power=0.9,sig.level=0.05, type="paired",alternative="two.sided")

We can see we need sample size = 100,127,170 respectively. The below plot illustrates this further.

samplesizes<-c(x2$n,x3$n,x4$n)
powers<-c(x2$power,x3$power,x4$power)
plot(y=samplesizes,x=powers,main = "Plot of samplesize vs. power")

Question 2: Birthwt dataset

library(MASS)
library(ggplot2)
library(dplyr)
df1<-birthwt %>% select(race,smoke,ptl,ht,ui,ftv)
df1[,] <- lapply(df1[,] , factor)

birthwt$ui<-as.logical(birthwt$ui)
birthwt$ht<-as.logical(birthwt$ht)

We load the dataset birthwt and extract two variables - race,smoke, hypertension(ht) and urinary infection(UI) for consideration.

1.EDA

ggplot(birthwt, aes(race,bwt))+geom_boxplot(fill="firebrick")+facet_wrap(~race, scales = "free")+labs(x="Mother's race (1 = white, 2 = black, 3 = other)", y="Birth Weight (gms)")

As visible from the plot, the median birth weight for all three categories - white, black, other are quite similar. The maximum and minimum spread is the most for white race.

We also examine the relation between birth weight and urinary infection.

ggplot(birthwt, aes(ui,bwt))+geom_boxplot(fill="chartreuse4")+facet_wrap(~ui)+labs(x="Uterine Irritability", y="Birth Weight (gms)")

As obvious, birth weight is higher for mothers without any urinary infection.

ggplot(birthwt, aes(ht,bwt))+geom_boxplot(fill="blue")+facet_wrap(~ht)+labs(x="Hyper Tension", y="Birth Weight (gms)")

From above plot, we say that birth weight is directly related to wether or not the mother was pre diagnosed for hypertension.

2.

t.test(birthwt$bwt~df1$ui)
## 
##  Welch Two Sample t-test
## 
## data:  birthwt$bwt by df1$ui
## t = 3.8615, df = 35.696, p-value = 0.000455
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  275.8913 886.6553
## sample estimates:
## mean in group 0 mean in group 1 
##        3030.702        2449.429

Looking at the above results, the p value (<0.05) indicates that we can reject the null hypothesis. The mean weight of the baby when no urinary infection in mother = 3030.702 gms as compared to a low 2449.429 gms from a mother with diagnosed urinary infection.

t.test(birthwt$bwt~df1$ht)
## 
##  Welch Two Sample t-test
## 
## data:  birthwt$bwt by df1$ht
## t = 1.6118, df = 11.909, p-value = 0.1332
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -153.6751 1024.4717
## sample estimates:
## mean in group 0 mean in group 1 
##        2972.232        2536.833

From the above t-test, we can say with a high p-value, we cannot reject the null hypothesis, hence we cannot determine the mean difference between the two groups.

3.

Conducting test on Urinary Infection data:

z<-sd(birthwt$ui)
pwr.t.test(d=(3030.702-2449.429)/z,power=1,sig.level=0.05)
## 
##      Two-sample t test power calculation 
## 
##               n = 2
##               d = 1631.917
##       sig.level = 0.05
##           power = 1
##     alternative = two.sided
## 
## NOTE: n is number in *each* group