Statistical Inference Course Project

Simulation Exercise

With lambda = 0.2， the mean and standard deviation of our simulated exponential distribution are both 1/lambda = 5. Here I investigate the distribution of averages of 40 exponentials simulated a thousand times.

mns = NULL
for (i in 1 : 1000) mns = c(mns, mean(rexp(40,.2)))
mns<-as.data.frame(mns)
means<-mean(mns$mns)
sds<-sd(mns$mns)
p1<-ggplot(mns, aes(x=mns))+
  geom_histogram(aes(y=..density..), colour="black", fill="#5f9a87", binwidth=.3)+
  stat_function(fun = dnorm, args = list(mean = means, sd = sds), size=2) +
  labs(x="Value", y="Density")+ 
  geom_vline(aes(xintercept=1/0.2),linetype="dashed", size=1, color="#e88b4b")

vas=NULL
for (i in 1 : 1000) vas = c(vas, var(rexp(40,.2)))
vas<-as.data.frame(vas)
vav<-mean(vas$vas)
sdv<-sd(vas$vas)
p2<-ggplot(vas, aes(x=vas))+
  geom_histogram(aes(y=..density..), colour="black", fill="#5f9a87", binwidth=3)+
  labs(x="Value", y="Density")+
  stat_function(fun = dnorm, args = list(mean = vav, sd = sdv), size=2) +
  geom_vline(aes(xintercept=1/(0.2^2)),linetype="dashed", size=1, color="#e88b4b")
ggarrange(p1, p2,
          labels=c("A", "B"),
          ncol = 2, nrow = 1)

Figure 1 In figure A, the histogram of the sample mean, the dashed orange line denotes the true mean, 1/0.2 =5. A normal distribution is generated with the mean (5.0231) and the standard deviation (0.8015) of the 1000 means calcuated from 40 random exponentials each. The normal bell curve representing the sample distribution is then laid on top the histogram which appears to be normally distributed also.In figure B, the histogram of the sample variance, the dashed orange line denotes the theoretical variance, (1/0.2)^2 =25. A normal distribution is generated with the mean (25.7478) and the standard deviation (11.8418) of the 1000 sampled variances. The normal bell curve representing the sample distribution appears positively skewed.

It seems that both the center of the distribution of the sampled means at 5.0231 and the center of the sampled variances at 25.7478 is really close to the expected value, which are 5.0 and 25.0 respectively.

Basic Inferential Data Analysis:

The Effect of Vitamin C on Tooth Growth in Guinea Pigs

First, I will generate two notched boxplots as basic exploratory data analyses.

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p3<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) + 
  geom_boxplot(outlier.colour="black", outlier.shape=16,
               outlier.size=2, notch=TRUE)
p4<-ggplot(ToothGrowth, aes(x=supp, y=len,fill=supp)) + 
  geom_boxplot(outlier.colour="black", outlier.shape=16,
               outlier.size=2, notch=TRUE)
ggarrange(p3, p4,
          labels=c("A", "B"),
          ncol = 2, nrow = 1)

Figure 3 Boxplot on the left compare the three dosages of vitamin tested, and the one on the right compares the two types of vitamin source. The notch displays a confidence interval around the median which is normally based on the median +/- 1.58*IQR/sqrt(n). Notches are used to compare groups; if the notches of two boxes do not overlap, this is a strong evidence that the medians differ.

As shown in the above graph in the left panel, the notches of the three boxes do not overlap, thereby we can entertain the idea of them being significantly different from each other. Also, for the dosage 0.5, there is a outliner shown as the black dot, we can choose either to retain or remove it.In the right panel, we can see the data from OJ is a bit skewed to the high end, and the notches do overlap, suggesting there is no difference between groups.

Now let’s generate a pivot table to summarize the data.

ToothGrowth %>%
  group_by(supp, dose)%>%
  summarise(Average=mean(len), StandError=sd(len), Counts=n())

Data seems well balanced, with two vitamin treatments and three dosages. For each treatment + dosage combination, ten individuals were tested. Now I will calculate the confidence intervals for each group using the formula X ± t se*sqrt(n), in order to compare tooth growth by supp and dose.
Let’s start with the two vitamin treatments.

VC<-subset(ToothGrowth, supp=="VC");OJ<-subset(ToothGrowth, supp=="OJ")
nv<-30;meanv<-mean(VC$len); df<-29;sv<-sd(VC$len)
handv<-meanv+c(-1,1)*qt(.975, df)*sv/sqrt(29)

no<-30;meano<-mean(OJ$len); dfo<-29;so<-sd(OJ$len)
hando<-meano+c(-1,1)*qt(.975, dfo)*so/sqrt(29)
handv;hando

## [1] 13.82398 20.10269

## [1] 18.15461 23.17206

dose1<-subset(ToothGrowth, dose==0.5);dose2<-subset(ToothGrowth, dose==1.0);dose3<-subset(ToothGrowth, dose==2.0)
n1<-20;mean1<-mean(dose1$len); df1<-19;s1<-sd(dose1$len)
hand1<-mean1+c(-1,1)*qt(.975, df1)*s1/sqrt(20)

n2<-20;mean2<-mean(dose2$len); df2<-19;s2<-sd(dose2$len)
hand2<-mean2+c(-1,1)*qt(.975, df2)*s2/sqrt(20)

n3<-20;mean3<-mean(dose3$len); df3<-19;s3<-sd(dose3$len)
hand3<-mean3+c(-1,1)*qt(.975, df3)*s3/sqrt(20)
hand1;hand2;hand3

## [1]  8.499046 12.710954

## [1] 17.66851 21.80149

## [1] 24.33364 27.86636

The confidence interval for OJ is 18.1546, 23.1721 and the confidence interval for VC is 13.824, 20.1027. These two C.I. clearly overlap with each other, we conclude that they are not different from each other when alpha = 5%. Now for the three dosages, The confidence intervals for dosage 0.5 is calculated to be 8.499, 12.711, for dosage 1.0 is 17.6685, 21.8015, and for dosage 2.0 is 24.3336, 27.8664. None of these three C.I. overlap with each other, we conclude that dosage effects are significant with alpha = 5%.
Lastly, assuming a normal distribution for the populations from which these data were drawn, we conclude that there is no significant effect from the source of vitamin and the dosage of vitamin is positively related to tooth growth, in other words higher vitamin intake is linked with more teeth growth in guinea pigs.

Statistical Inference Course Project

Peer-graded Assignment

Wenwen Liu

Simulation Exercise

Basic Inferential Data Analysis:

The Effect of Vitamin C on Tooth Growth in Guinea Pigs