library(datasets)
toothgrowth<-ToothGrowth
coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
xlab = "ToothGrowth data: length vs dose, given type of supplement")
The data set has the following structural characteristics:
Number of Observations:
nrow(toothgrowth)
## [1] 60
Column Names:
names(toothgrowth)
## [1] "len" "supp" "dose"
Five Number Summary of Total Dataset:
summary(toothgrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
“Supp” is a categorical variable with two categories, OJ and VC, each with 30 observations.
Summary of the “supp” category:
Supp=“OJ”
oj_len<-toothgrowth[toothgrowth$supp=="OJ",]
summary(oj_len)
## len supp dose
## Min. : 8.20 OJ:30 Min. :0.500
## 1st Qu.:15.53 VC: 0 1st Qu.:0.500
## Median :22.70 Median :1.000
## Mean :20.66 Mean :1.167
## 3rd Qu.:25.73 3rd Qu.:2.000
## Max. :30.90 Max. :2.000
sd(oj_len$len)
## [1] 6.605561
Supp=“VC”
vc_len<-toothgrowth[toothgrowth$supp=="VC",]
summary(vc_len)
## len supp dose
## Min. : 4.20 OJ: 0 Min. :0.500
## 1st Qu.:11.20 VC:30 1st Qu.:0.500
## Median :16.50 Median :1.000
## Mean :16.96 Mean :1.167
## 3rd Qu.:23.10 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
sd(vc_len$len)
## [1] 8.266029
Comparing the two subsets shows that supplement “OJ” has a higher mean, median, and quartiles for length, with a lower standard deviation, than the supplement “VC” subset. The largest single observation (33.90), however, belongs to the “VC” subset.
Summary of the dose category:
table(toothgrowth$dose)
##
## 0.5 1 2
## 20 20 20
dose_.5<-toothgrowth[toothgrowth$dose==.5,]
dose_1<-toothgrowth[toothgrowth$dose==1,]
dose_2<-toothgrowth[toothgrowth$dose==2,]
Dose=.5mg
summary(dose_.5$len)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.200 7.225 9.850 10.600 12.250 21.500
sd(dose_.5$len)
## [1] 4.499763
Dose=1mg
summary(dose_1$len)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.60 16.25 19.25 19.74 23.38 27.30
sd(dose_1$len)
## [1] 4.415436
Dose=2mg
summary(dose_2$len)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.50 23.52 25.95 26.10 27.83 33.90
sd(dose_2$len)
## [1] 3.77415
The exploratory analysis shows a positive relationship between dose and length. The VC group has a lower mean length than the OJ group, and the 2mg dose achieved the highest summary statistics for length and the lowest standard deviation.
From the exploratory analysis the effectiveness of each supplement is unclear. Comparison of the means for each supplement is necessary. However, given that both variables (supplement and dose) are treated as categorical, the data must be split by category to allow for meaningful analysis.
Three sets of hypotheses will be evaluated:
1. Difference in means of supplement types
2. Difference in means between dose sizes
3. Difference in means by dose size and supplement group
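The grouping described above can be sketched in a few lines. This is a minimal illustration, assuming that treating dose as a factor is what "transforming" the variables amounts to; the names `tg`, `dose_f`, and `cell_means` are hypothetical and do not appear elsewhere in this report.

```r
library(datasets)

# Work on a copy so the built-in data set is left untouched
tg <- ToothGrowth

# Treat the three dose levels as categories rather than a numeric scale
# (dose_f is a hypothetical helper column, not part of the original data)
tg$dose_f <- factor(tg$dose)

# Mean length for every supplement/dose combination
cell_means <- aggregate(len ~ supp + dose_f, data = tg, FUN = mean)
print(cell_means)
```

Looking at the six cell means side by side gives a first impression of whether the supplement effect changes with dose, which is what the third set of hypotheses examines.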
The assumptions of the underlying data and relationships between the groups are the same for each hypothesis:
An additional assumption made for this hypothesis test is that observations are not matched, since there are no guinea pig ids with which to match pairs, despite the data set description saying there were 10 guinea pigs. This test also ignores the impact of dosage on tooth length. Structuring the test in this manner allows the effects of the two supplement types to be compared, to determine whether the group means differ by supplement type.
H_0: Difference between mean of length of the orange juice group and Ascorbic Acid group is 0
H_a: Difference between mean of length of the orange juice group and Ascorbic Acid group is not equal to 0
Given the small sample size and the assumption that the length variable is symmetrically distributed (not skewed), the t statistic is chosen.
alpha for this test is: .05
Given the level of alpha and the two tailed hypothesis test, the critical level of the t-statistic is +/-2.0042.
If the calculated t-statistic exceeds 2.0042 or falls below -2.0042, then the null hypothesis that the difference between the two means is 0 (assuming no pairing and unequal variances) is rejected.
If we fail to reject the null hypothesis, then we cannot conclude there is a difference between supplement types. Given how the data will be entered (OJ first, Ascorbic Acid second), if the null hypothesis is rejected because the t-statistic exceeds the upper critical t-score, then we conclude the OJ supplement type is more effective than the Ascorbic Acid type. If instead the null hypothesis is rejected because the t-statistic falls below the lower critical t-score, then we conclude Ascorbic Acid is more effective than OJ.
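The decision rule above can be written out explicitly. The sketch below is illustrative only: `crit_t` and `reject_null` are hypothetical helper names, and the 55 degrees of freedom is an assumed round figure, since the Welch test estimates its degrees of freedom from the data.

```r
# Two-sided critical value of the t distribution for a given df
# (crit_t is a hypothetical helper, not part of the original analysis)
crit_t <- function(df, alpha = 0.05) qt(1 - alpha / 2, df)

# Decision rule: reject H_0 when |t| exceeds the critical value
reject_null <- function(t_stat, df, alpha = 0.05) {
  abs(t_stat) > crit_t(df, alpha)
}

crit_t(55)            # close to the +/-2.0042 quoted above
reject_null(1.5, 55)  # FALSE: inside the critical bounds
reject_null(2.5, 55)  # TRUE: outside the critical bounds
```

Because the critical value depends on the degrees of freedom, it is worth recomputing it with `qt()` whenever the group sizes or the variance assumption change.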
t.test(oj_len$len,vc_len$len,paired=FALSE,var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: oj_len$len and vc_len$len
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
Since the test statistic is within the upper and lower critical t-scores, we fail to reject the null hypothesis and conclude there is no statistically significant difference between the two supplement types at the .05 significance level. Failing to reject the null hypothesis is supported by the confidence interval containing 0.
An additional assumption made for this hypothesis test is that observations are not matched, since there are no guinea pig ids with which to match pairs, despite the data set description saying there were 10 guinea pigs. This hypothesis also ignores the supplement type. Structuring the test in this manner allows the effects of the dosage sizes to be compared, to determine whether the mean tooth growth differs by dosage size. Two sets of hypotheses will be considered. For the first hypothesis set, equal variances are assumed based on the exploratory analysis.
Set1
H_0: Difference between mean of .5mg group and 1mg group is 0
H_a: Difference between mean of .5mg group and 1mg group is not equal to 0
Set2
H_01: Difference between mean of 1mg group and 2mg group is 0
H_a1: Difference between mean of 1mg group and 2mg group is not equal to 0
Given the small sample size and the assumption that the length variable is symmetrically distributed (not skewed), the t statistic is chosen.
For both sets: alpha is .05
For both sets: Given the level of alpha and the two tailed hypothesis tests, the critical level of the t-statistic is +/-2.101.
Set1 If the calculated t-statistic exceeds 2.101 or falls below -2.101, then the null hypothesis that the difference between the two means is 0 (assuming no pairing and equal variances) is rejected.
Set2 If the calculated t-statistic exceeds 2.101 or falls below -2.101, then the null hypothesis that the difference between the two means is 0 (assuming no pairing and unequal variances) is rejected.
If we fail to reject the null hypothesis, then we cannot conclude there is a difference between dosage sizes. Given how the data will be entered (higher dosage first), if the null hypothesis is rejected because the t-statistic exceeds the upper critical t-score, then we conclude the higher dosage group has a higher mean length than the lower dosage group. If instead the null hypothesis is rejected because the t-statistic falls below the lower critical t-score, then the lower dosage group’s mean is higher than the higher dosage group’s. This interpretation will be applied to both sets.
Set1
t.test(dose_1$len,dose_.5$len,paired=TRUE,var.equal=TRUE)
##
## Paired t-test
##
## data: dose_1$len and dose_.5$len
## t = 6.9669, df = 19, p-value = 1.225e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.387121 11.872879
## sample estimates:
## mean of the differences
## 9.13
Set2
t.test(dose_2$len,dose_1$len,paired=TRUE,var.equal=FALSE)
##
## Paired t-test
##
## data: dose_2$len and dose_1$len
## t = 4.6046, df = 19, p-value = 0.0001934
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.471814 9.258186
## sample estimates:
## mean of the differences
## 6.365
Since the test statistic is greater than the upper critical t-score for both sets of hypotheses, we reject the null hypothesis in each case and conclude there is a statistically significant difference between the lower and higher dosages at the .05 significance level. Because the t-statistic exceeds the upper boundary in both sets, we conclude that the higher dosage mean is higher than the lower dosage mean. For set1, the 95% confidence interval places the mean of the 1mg dosage group between 6.276252 and 11.983748 units above the mean of the .5mg dosage group. For set2, the 95% confidence interval places the mean of the 2mg dosage group between 3.733519 and 8.996481 units above the mean of the 1mg dosage group. These intervals correspond to unpaired tests, consistent with the no-matching assumption; the output shown above was produced with paired=TRUE and reports slightly different intervals (6.387121 to 11.872879 and 3.471814 to 9.258186), which lead to the same conclusion.
An additional assumption made for this hypothesis test is that observations are not matched, since there are no guinea pig ids with which to match pairs, despite the data set description saying there were 10 guinea pigs. Structuring the test in this manner allows the effects of the supplement types to be compared within each dosage size, to determine whether the group means differ by dosage size and supplement type. This will require three sets of hypotheses.
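The interval bounds quoted in the interpretation above can be reproduced with unpaired calls, which match the stated assumption of no matching between observations. The sketch below is one way to do that; the object names `len_05`, `len_1`, `len_2`, `set1`, and `set2` are illustrative. Note that `var.equal` only has an effect when `paired = FALSE`.

```r
library(datasets)
tg <- ToothGrowth

# Length measurements for each dose level
len_05 <- tg$len[tg$dose == 0.5]
len_1  <- tg$len[tg$dose == 1]
len_2  <- tg$len[tg$dose == 2]

# Set1: independent samples, pooled (equal) variance
set1 <- t.test(len_1, len_05, paired = FALSE, var.equal = TRUE)

# Set2: independent samples, Welch (unequal variance)
set2 <- t.test(len_2, len_1, paired = FALSE, var.equal = FALSE)

# 95% confidence intervals for the difference in means
set1$conf.int
set2$conf.int
```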
Exploratory Analysis of Variances
.5mg Grouping OJ:
sd(toothgrowth$len[which(toothgrowth$supp=="OJ" & toothgrowth$dose==.5)])
## [1] 4.459709
VC:
sd(toothgrowth$len[which(toothgrowth$supp=="VC" & toothgrowth$dose==.5)])
## [1] 2.746634
1mg Grouping OJ:
sd(toothgrowth$len[which(toothgrowth$supp=="OJ" & toothgrowth$dose==1)])
## [1] 3.910953
VC:
sd(toothgrowth$len[which(toothgrowth$supp=="VC" & toothgrowth$dose==1)])
## [1] 2.515309
2mg Grouping OJ:
sd(toothgrowth$len[which(toothgrowth$supp=="OJ" & toothgrowth$dose==2)])
## [1] 2.655058
VC:
sd(toothgrowth$len[which(toothgrowth$supp=="VC" & toothgrowth$dose==2)])
## [1] 4.797731
Given the findings of the additional exploratory analysis, the assumption of unequal variances is maintained.
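The six standard deviations above can also be computed in one call, which makes the comparison within each dose easier to read. This is a sketch; `sd_table` and `var_ratio` are hypothetical names, and the variance ratio is only an informal aid, not a formal test of equal variances.

```r
library(datasets)

# Standard deviation of length for each supplement/dose combination
sd_table <- with(ToothGrowth,
                 tapply(len, list(supp = supp, dose = dose), sd))
round(sd_table, 3)

# Ratio of the larger to the smaller variance within each dose level
var_ratio <- apply(sd_table^2, 2, function(v) max(v) / min(v))
round(var_ratio, 2)
```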
.5mg
H_0.5: Difference between mean of orange juice group and Ascorbic Acid group is 0
H_a.5: Difference between mean of orange juice group and Ascorbic Acid group is not equal to 0
1mg
H_01: Difference between mean of orange juice group and Ascorbic Acid group is 0
H_a1: Difference between mean of orange juice group and Ascorbic Acid group is not equal to 0
2mg
H_02: Difference between mean of orange juice group and Ascorbic Acid group is 0
H_a2: Difference between mean of orange juice group and Ascorbic Acid group is not equal to 0
Given the small sample size and the assumption that the length variable is symmetrically distributed (not skewed), the t statistic is chosen.
For all sets: alpha is .05
For all sets: Given the level of alpha and the two tailed hypothesis tests, the critical level of the t-statistic is +/-2.262.
For all sets: If the calculated t-statistic exceeds 2.262 or falls below -2.262, then the null hypothesis that the difference between the two means is 0 (assuming no pairing and unequal variances) is rejected.
If we fail to reject the null hypothesis, then we cannot conclude there is a difference between supplement types at that dosage size. Given how the data will be entered, if the null hypothesis is rejected because the t-statistic exceeds the upper critical t-score, then we conclude the orange juice group’s mean is higher than the ascorbic acid group’s mean. If instead the null hypothesis is rejected because the t-statistic falls below the lower critical t-score, then the ascorbic acid group’s mean is higher than the orange juice group’s mean. This interpretation will be applied to all sets.
Data:
supp_.5<-toothgrowth[which(toothgrowth$dose==.5),]
supp_1<-toothgrowth[which(toothgrowth$dose==1),]
supp_2<-toothgrowth[which(toothgrowth$dose==2),]
Set1
t.test(len~supp,paired=FALSE,var.equal=FALSE,data=supp_.5)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
Set2
t.test(len~supp,paired=FALSE,var.equal=FALSE,data=supp_1)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
Set3
t.test(len~supp,paired=FALSE,var.equal=FALSE,data=supp_2)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
For set1, given that the t-statistic exceeds the upper critical t-score, we reject the null hypothesis that the difference between the means of .5mg OJ and .5mg VC is 0. The 95% confidence interval does not include 0. Thus, we conclude the difference between the means of the two groups is statistically significantly different from 0 at the .05 significance level. The results of the t-test show the mean in the OJ group is higher than in the VC group. The 95% confidence interval for the difference (OJ minus VC) runs from 1.719057 to 8.780943 units, entirely above 0.
For set2, given that the t-statistic exceeds the upper critical t-score, we reject the null hypothesis that the difference between the means of 1mg OJ and 1mg VC is 0. The 95% confidence interval does not include 0. Thus, we conclude the difference between the means of the two groups is statistically significantly different from 0 at the .05 significance level. The results of the t-test show the mean in the OJ group is higher than in the VC group. The 95% confidence interval for the difference (OJ minus VC) runs from 2.802148 to 9.057852 units, entirely above 0.
For set3, given that the t-statistic is within the upper and lower critical t-scores, we fail to reject the null hypothesis that the difference between the means of 2mg OJ and 2mg VC is 0. The 95% confidence interval includes 0. Thus, we conclude the difference between the means of the two groups is not statistically significantly different from 0 at the .05 significance level. The results of the t-test show the means of the two groups are similar.
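The three per-dose comparisons can be collected into a single table for easier side-by-side reading. The sketch below is one possible way to do so; `by_dose` and `results` are hypothetical names, and `paired` is left at its default because the observations are treated as unmatched.

```r
library(datasets)
tg <- ToothGrowth

# Run the same Welch test within each dose level and keep the key numbers
by_dose <- lapply(split(tg, tg$dose), function(d) {
  res <- t.test(len ~ supp, data = d, var.equal = FALSE)
  data.frame(t       = unname(res$statistic),
             df      = unname(res$parameter),
             p_value = res$p.value,
             ci_low  = res$conf.int[1],
             ci_high = res$conf.int[2])
})

# One row per dose level (row names are the dose values)
results <- do.call(rbind, by_dose)
round(results, 4)
```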