#Question 6

The PlantGrowth data set contains three different groups, with each representing various plant food diets (you may need to type data(PlantGrowth) to activate it). The group labeled “ctrl” is the control group, while “trt1” and “trt2” are different types of experimental treatment. As a reminder, this subsetting statement accesses the weight data for the control group: PlantGrowth$weight[PlantGrowth$group=="ctrl"] and this subsetting statement accesses the weight data for treatment group 1: PlantGrowth$weight[PlantGrowth$group=="trt1"]. Run a t-test to compare the means of the control group (“ctrl”) and treatment group 1 (“trt1”) in the PlantGrowth data. Report the observed value of t, the degrees of freedom, and the p-value associated with the observed value. Assuming an alpha threshold of .05, decide whether you should reject the null hypothesis or fail to reject the null hypothesis. In addition, report the upper and lower bound of the confidence interval.

dfCtrl <- PlantGrowth$weight[PlantGrowth$group=="ctrl"]
dftr1 <- PlantGrowth$weight[PlantGrowth$group=="trt1"]
dftr2 <- PlantGrowth$weight[PlantGrowth$group=="trt2"]

t.test(dfCtrl,dftr1)
## 
##  Welch Two Sample t-test
## 
## data:  dfCtrl and dftr1
## t = 1.1913, df = 16.524, p-value = 0.2504
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2875162  1.0295162
## sample estimates:
## mean of x mean of y 
##     5.032     4.661

Analysis:

The t value from the test comparing the means of the control group and treatment 1 is 1.19, the degrees of freedom are 16.524, and the p-value is .25. The 95% confidence interval for the difference in means (control minus treatment 1) has a lower bound of -.288 and an upper bound of 1.03. Since our alpha for this experiment is .05 and the p-value is .25, we fail to reject the null hypothesis.
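
The values reported above can also be pulled out of the t.test() result instead of being read off the printout. The following is a minimal sketch; the names ctrl, trt1, res, alpha, report, and decision are chosen here for illustration and are not part of the original assignment code.

```r
# Sketch: extract the reported quantities from the Welch t-test object.
data(PlantGrowth)
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
trt1 <- PlantGrowth$weight[PlantGrowth$group == "trt1"]

res   <- t.test(ctrl, trt1)      # Welch test is the default (var.equal = FALSE)
alpha <- 0.05                    # threshold given in the question

report <- c(
  t     = unname(res$statistic), # observed t
  df    = unname(res$parameter), # Welch-adjusted degrees of freedom
  p     = res$p.value,
  lower = res$conf.int[1],       # 95% CI for mean(ctrl) - mean(trt1)
  upper = res$conf.int[2]
)
print(round(report, 3))

# Decision rule: reject only when p falls below alpha.
decision <- if (res$p.value < alpha) "reject" else "fail to reject"
print(decision)
```

Working from the returned object avoids transcription slips when the same numbers are repeated in the written analysis.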

#Question 7

Install and library() the BEST package. Note that you may need to install a program called JAGS onto your computer before you try to install the BEST package inside of R. Use BESTmcmc() to compare the PlantGrowth control group (“ctrl”) to treatment group 1 (“trt1”). Plot the result and document the boundary values that BESTmcmc() calculated for the HDI. Write a brief definition of the meaning of the HDI and interpret the results from this comparison.

library(BEST)
## Loading required package: HDInterval
bestOut <- BESTmcmc(dfCtrl, dftr1)
## Waiting for parallel processing to complete...
## done.
bestOut
plot(bestOut)

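Analysis: The boundaries of the 95% HDI from BESTmcmc() are -.374 and 1.13 when comparing the control to treatment 1. The highest density interval (HDI) is the range of the posterior distribution that contains the 95% most credible values for the population mean difference, and the plot gives a visual representation of that distribution. The confidence interval above only supports a long-run claim: if we reran the experiment 100 times, about 95 of the resulting intervals would be expected to contain the population mean difference, with no indication of where within any one interval the true value lies. The HDI, which is built from MCMC samples of the posterior, supports a direct probability statement about the population mean difference. The bounds are a bit wider, but the result is easier to interpret directly; because the HDI includes 0, we cannot rule out that there is no difference between the groups. The plot also indicates a 14% chance that the population mean difference is less than 0.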
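
To make the definition of the HDI concrete, here is a rough base-R sketch that finds the shortest interval containing 95% of a set of sampled values. The function hdi_from_samples() is a hypothetical helper written for this illustration, and draws is simulated stand-in data, not the posterior produced by BESTmcmc(); the BEST workflow itself relies on the HDInterval package.

```r
# Sketch: shortest interval that contains `mass` of the sampled values.
hdi_from_samples <- function(samples, mass = 0.95) {
  sorted <- sort(samples)
  n <- length(sorted)
  k <- ceiling(mass * n)             # number of points each candidate interval covers
  starts <- seq_len(n - k + 1)       # every possible starting index
  widths <- sorted[starts + k - 1] - sorted[starts]
  best <- which.min(widths)          # narrowest candidate is the HDI
  c(lower = sorted[best], upper = sorted[best + k - 1])
}

set.seed(42)
draws <- rnorm(10000)                # stand-in samples (assumption, not BEST output)

hdi <- hdi_from_samples(draws)
print(hdi)

# Share of samples below zero, the same kind of figure the BEST plot annotates.
print(mean(draws < 0))
```

For a symmetric distribution the HDI is close to the equal-tailed interval; the two diverge when the posterior is skewed.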

#Question 8

Compare and contrast the results of Exercise 6 and Exercise 7. You have three types of evidence: the results of the null hypothesis test, the confidence interval, and the HDI from the BESTmcmc() procedure. Each one adds something, in turn, to the understanding of the difference between groups. Explain what information each test provides about the comparison of the control group (“ctrl”) and the treatment group 1 (“trt1”).

Analysis:

Looking at the results of the previous questions, there is no indication that treatment 1 improves yield in comparison to the control group. This can be seen in both the confidence interval and the HDI. While the confidence interval spans 0 and so doesn’t provide concrete evidence of a difference, most of it lies above 0, which is consistent with a higher yield for the control group. The HDI adds to this picture: the BESTmcmc() posterior gives an 85% likelihood that the population mean difference is positive, with a mean of .38, meaning a slightly higher yield for the control group.

The null hypothesis test adds only a yes-or-no decision: given that we failed to reject the null hypothesis, the data don’t provide evidence that treatment 1 produces a greater yield than the control group.

#Question 9

Using the same PlantGrowth data set, compare the “ctrl” group to the “trt2” group. Use all of the methods described earlier (t-test, confidence interval, and Bayesian method) and explain all of the results.

t.test(dfCtrl,dftr2)
## 
##  Welch Two Sample t-test
## 
## data:  dfCtrl and dftr2
## t = -2.134, df = 16.786, p-value = 0.0479
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.98287213 -0.00512787
## sample estimates:
## mean of x mean of y 
##     5.032     5.526
bestOut <- BESTmcmc(dfCtrl, dftr2)
## Waiting for parallel processing to complete...done.
bestOut
plot(bestOut)

Analysis:

The t value for the control vs. treatment 2 is -2.134, the degrees of freedom are 16.79, and the p-value is .048. Given that the p-value is less than the alpha of .05, we would reject the null hypothesis. The bounds of the 95% confidence interval are -.98 and -.005, so the interval lies just below 0.

The BESTmcmc() HDI bounds are a bit wider, at -1.04 and .08, and there is a 96% likelihood that the population mean difference is less than 0, although the HDI just includes 0. Given these results, it is likely that treatment 2 produces a slightly greater yield than the control group and treatment 1.
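
As a check on where the t-test numbers come from, the Welch statistic, its degrees of freedom, and the confidence interval can be computed by hand. This is a sketch under the standard Welch-Satterthwaite formulas; the variable names (ctrl, trt2, v1, v2, se, and the *_manual values) are illustrative only.

```r
# Sketch: Welch two-sample t-test computed step by step for ctrl vs trt2.
data(PlantGrowth)
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
trt2 <- PlantGrowth$weight[PlantGrowth$group == "trt2"]

v1 <- var(ctrl) / length(ctrl)   # squared standard error of the ctrl mean
v2 <- var(trt2) / length(trt2)   # squared standard error of the trt2 mean
se <- sqrt(v1 + v2)              # standard error of the difference in means

diff_means <- mean(ctrl) - mean(trt2)
t_manual   <- diff_means / se

# Welch-Satterthwaite approximation for the degrees of freedom.
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(ctrl) - 1) + v2^2 / (length(trt2) - 1))

p_manual  <- 2 * pt(-abs(t_manual), df_manual)               # two-sided p-value
ci_manual <- diff_means + c(-1, 1) * qt(0.975, df_manual) * se

print(c(t = t_manual, df = df_manual, p = p_manual))
print(ci_manual)
```

Both ends of the hand-computed interval are negative, which matches the decision to reject the null hypothesis at alpha = .05.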

#Question 10

Consider this t-test, which compares two groups of n = 100,000 observations each: t.test(rnorm(100000,mean=17.1,sd=3.8),rnorm(100000,mean=17.2,sd=3.8)) For each of the groups, the rnorm() command was used to generate a random normal distribution of observations similar to those for the automatic transmission group in the mtcars database (compare the programmed standard deviation for the random normal data to the actual mtcars data). The only difference between the two groups is that in the first rnorm() call, the mean is set to 17.1 mpg and in the second it is set to 17.2 mpg. I think you would agree that this is a negligible difference, if we are discussing fuel economy. Run this line of code and comment on the results of the t-test. What are the implications in terms of using the NHST on very large data sets?

t.test(rnorm(100000,mean=17.1,sd=3.8),rnorm(100000,mean=17.2,sd=3.8))
## 
##  Welch Two Sample t-test
## 
## data:  rnorm(1e+05, mean = 17.1, sd = 3.8) and rnorm(1e+05, mean = 17.2, sd = 3.8)
## t = -6.0215, df = 2e+05, p-value = 1.731e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.13560184 -0.06900359
## sample estimates:
## mean of x mean of y 
##  17.09352  17.19582

Analysis:

The t-test reports a p-value of 1.731e-09, so the null hypothesis is rejected even though the two sample means differ by only about 0.1 mpg. With extremely large amounts of data, the NHST becomes a poor guide to practical importance, because even a negligible difference between groups will cause the null hypothesis to be rejected. According to an article in Frontiers in Human Neuroscience, “NHST machinery guarantees that we can detect any tiny irrelevant effect sizes if sample size is large enough.”
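
The point can be illustrated without random draws by comparing the standardized effect size with the power of the test at two sample sizes. This is a sketch using power.t.test() from base R's stats package; the comparison sample size of 100 per group is an assumption picked for contrast, and the variable names are illustrative.

```r
# Sketch: same tiny effect, very different chances of a "significant" result.
delta <- 17.2 - 17.1             # difference in means programmed into rnorm()
sigma <- 3.8                     # common standard deviation from the question

cohens_d <- delta / sigma        # standardized effect size
print(cohens_d)

# Power of a two-sample t-test at alpha = .05 (n is the size of EACH group).
power_small <- power.t.test(n = 100,    delta = delta, sd = sigma,
                            sig.level = 0.05)$power
power_large <- power.t.test(n = 100000, delta = delta, sd = sigma,
                            sig.level = 0.05)$power

print(c(n_100 = power_small, n_100000 = power_large))
```

The effect size is far below the conventional "small" benchmark of 0.2 for Cohen's d, yet with 100,000 observations per group the test is almost certain to reject the null hypothesis. Reporting an effect size or interval alongside the p-value keeps the size of the difference in view.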