STAT 4160 — Homework #2
0.1 Honor Policy
Homework 2 is to be completed and submitted individually. You may discuss concepts with classmates, but all work you submit must be your own.
You MUST include the names of the students you worked with.
Worked with: [ENTER NAMES HERE, or “None”]
0.2 Problem 1
1 Data pasted from csv:
Hardness  Method  Rank
72.4      A       13
68.9      A       8
75.1      A       16
70.3      A       11
73.8      A       14
69.5      A       10
71.6      A       12
74.2      A       15
65.1      B       3
66.8      B       5
67.9      B       6
64.3      B       2
69.0      B       9
63.7      B       1
68.1      B       7
66.0      B       4
1.0.1 Background
A materials scientist is studying the effect of curing method on the hardness of a polymer coating. Two curing methods are considered:
- Method A: UV curing
- Method B: Thermal curing
Hardness is measured on a standardized scale for eight samples produced under each method. The experiment was randomized. The data file is available as hardness_data.csv
On a previous homework, a pooled two-sample t-test suggested that the mean hardness was greater for Method A. In this problem, we will instead use a Wilcoxon Rank-Sum Test.
1.0.2 (a)
Null hypothesis: The UV curing method and the thermal curing method produce the same mean hardness for the polymer coating.
Alternative hypothesis: The UV curing method and the thermal curing method produce different mean hardness for the polymer coating. This two-sided alternative covers two cases: UV curing produces a higher mean hardness than thermal curing, or UV curing produces a lower mean hardness than thermal curing. A significant difference could support either case.
Strictly speaking, the Wilcoxon Rank-Sum Test compares the two hardness distributions (a location shift) rather than the means themselves.
Equivalently, let mu_UV denote the mean hardness for the UV curing method and mu_Th the mean hardness for the thermal curing method:
H0: mu_UV = mu_Th
Ha: mu_UV ≠ mu_Th (that is, mu_UV > mu_Th or mu_UV < mu_Th)
1.0.3 (b)
By hand:
Rank all 16 hardness observations from smallest to largest.
Hardness  Method  Rank
63.7      B       1
64.3      B       2
65.1      B       3
66.0      B       4
66.8      B       5
67.9      B       6
68.1      B       7
68.9      A       8
69.0      B       9
69.5      A       10
70.3      A       11
71.6      A       12
72.4      A       13
73.8      A       14
74.2      A       15
75.1      A       16
Indicate the method corresponding to each rank.
After ranking all 16 observations on the response variable (hardness), using only their order and not the size of the differences between them, we group the ranks by method and sum the ranks within each group. One group's rank sum serves as the test statistic: together with the two sample sizes, it is enough to determine whether the two groups differ significantly in hardness.
Compute the Wilcoxon Rank-Sum test statistic for Method A.
Method A holds the ranks 8, 10, 11, 12, 13, 14, 15, and 16, so its rank sum is 8 + 10 + 11 + 12 + 13 + 14 + 15 + 16 = 99.
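The hand ranking can be cross-checked with a few lines of R. This is a minimal sketch; the variable names (uv, thermal, rank_sums) are illustrative and not part of the assignment.

```r
# Hardness values as listed in the problem (Method A = UV, Method B = thermal)
uv      <- c(72.4, 68.9, 75.1, 70.3, 73.8, 69.5, 71.6, 74.2)
thermal <- c(65.1, 66.8, 67.9, 64.3, 69.0, 63.7, 68.1, 66.0)

hardness <- c(uv, thermal)
method   <- rep(c("A", "B"), each = 8)

# Rank all 16 observations together (these data contain no ties)
ranks <- rank(hardness)

# Sum the ranks within each method; Method A's sum is the statistic from part (b)
rank_sums <- tapply(ranks, method, sum)
print(rank_sums)
```

The two rank sums must add up to 16(17)/2 = 136, which is a quick sanity check on any hand ranking.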
1.0.4 (c)
Use R to confirm the test statistic you computed in part (b).
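uv_method <- c(68.9, 69.5, 70.3, 71.6, 72.4, 73.8, 74.2, 75.1)
thermal_method <- c(63.7, 64.3, 65.1, 66.0, 66.8, 67.9, 68.1, 69)
hardness <- c(uv_method, thermal_method)
group <- rep(c(1, 2), each = 8)  # 1 = UV (Method A), 2 = thermal (Method B)
# One-sided exact test: is group 1 (UV) shifted above group 2 (thermal)?
wilcox.test(hardness ~ as.factor(group), exact = TRUE, alternative = "greater")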
Wilcoxon rank sum exact test
data: hardness by as.factor(group)
W = 63, p-value = 0.0001554
alternative hypothesis: true location shift is greater than 0
R reports W = W_UV - n_UV(n_UV + 1)/2, that is, the rank sum for the UV group minus its smallest possible value. Applying the same adjustment to the rank sum from part (b), W_UV = 99, gives 99 - 8(9)(0.5) = 99 - 36 = 63, which matches the W = 63 that R reports. This confirms the rank sum computed by hand.
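The adjustment can also be checked numerically. The sketch below recomputes the rank sum, subtracts n(n + 1)/2, and compares the result with the statistic returned by wilcox.test; the variable names are illustrative.

```r
uv      <- c(68.9, 69.5, 70.3, 71.6, 72.4, 73.8, 74.2, 75.1)
thermal <- c(63.7, 64.3, 65.1, 66.0, 66.8, 67.9, 68.1, 69.0)

n_uv <- length(uv)

# Rank sum for the UV group, computed from the joint ranking
w_rank_sum <- sum(rank(c(uv, thermal))[seq_len(n_uv)])

# Subtract the smallest possible rank sum, n(n + 1)/2
w_adjusted <- w_rank_sum - n_uv * (n_uv + 1) / 2

# Statistic reported by R for the same one-sided exact test
w_from_r <- unname(wilcox.test(uv, thermal, exact = TRUE,
                               alternative = "greater")$statistic)

print(c(rank_sum = w_rank_sum, adjusted = w_adjusted, from_r = w_from_r))
```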
1.0.5 (d)
Using R, report:
- The exact p-value: 0.0001554
- The asymptotic p-value: 0.0006797
Both values are one-sided, since the test was run with alternative = "greater".
The exact p-value uses the exact null distribution of the rank sum. It considers every possible way of assigning 8 of the 16 ranks to Method A, treats all of these assignments as equally likely under the null hypothesis, and reports the proportion whose rank sum is at least as large as the observed one.
The asymptotic p-value answers the same question but replaces the exact distribution with a normal approximation to the rank sum (R applies a continuity correction by default). The approximation is accurate for large samples; with only eight observations per group, the exact p-value is preferable.
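To make the two p-values concrete, the sketch below rebuilds both from first principles: the exact one by enumerating every way of giving 8 of the 16 ranks to Method A, and the asymptotic one from the normal approximation with a continuity correction of 0.5. It assumes no ties, which holds for these data; the variable names are illustrative.

```r
n_total  <- 16   # total observations
n_uv     <- 8    # observations in Method A (UV)
observed <- 99   # rank sum for Method A from part (b)

# Exact: every subset of 8 ranks out of 1..16 is equally likely under the null
all_sums <- colSums(combn(n_total, n_uv))
p_exact  <- mean(all_sums >= observed)

# Asymptotic: normal approximation to the adjusted statistic W = 63
w     <- observed - n_uv * (n_uv + 1) / 2
mu    <- n_uv * (n_total - n_uv) / 2
sigma <- sqrt(n_uv * (n_total - n_uv) * (n_total + 1) / 12)
z     <- (w - mu - 0.5) / sigma          # 0.5 is the continuity correction
p_asymptotic <- pnorm(z, lower.tail = FALSE)

print(c(exact = p_exact, asymptotic = p_asymptotic))
```

Only two of the 12870 possible assignments give a rank sum of 99 or more, which is where 2/12870 = 0.0001554 comes from.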
1.0.6 (e)
Interpret the p-value in the context of the polymer hardness problem.
The p-value of 0.0001554 means that, if the null hypothesis were true (no difference in hardness between the UV-cured and thermally cured coatings), there would be only about a 0.016% chance of observing a rank sum for the UV curing method (the test statistic) as large as, or larger than, the one we observed.
1.0.7 (f)
Compare the p-value from the Wilcoxon Rank-Sum Test to the p-value from the pooled two-sample t-test. Are they similar or different? Briefly explain, referencing any relevant assumptions.
t.test(hardness ~ group, var.equal= TRUE)
data: hardness by group
t = 5.3189, df = 14, p-value = 0.0001084
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 3.349308 7.875692
sample estimates:
mean in group 1 mean in group 2
        71.9750         66.3625
The two p-values are of the same order of magnitude but not identical, and both are far below 0.05. They are not directly comparable as reported: the t-test above is two-sided (p = 0.0001084), while the Wilcoxon test was run one-sided (p = 0.0001554). On the same footing, the two-sided exact Wilcoxon p-value is 2 x 0.0001554 = 0.0003108, which is larger than the t-test p-value. The pooled t-test assumes that the deviations from each group's mean (the residuals) are normally distributed and that the two groups have equal variances. With only eight observations per group, these assumptions are difficult to verify. When they do hold, the t-test uses more of the information in the data (the actual magnitudes of the observations, which ranks discard) and therefore has more power, i.e. a lower probability of a Type II error, which explains its smaller p-value. Because the assumptions cannot be checked reliably at this sample size, the Wilcoxon Rank-Sum Test is the safer choice here.
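One caveat when comparing the reported numbers is that the t.test call uses the default two-sided alternative while the wilcox.test call in part (c) is one-sided. The sketch below computes both tests under both alternatives so they can be compared like for like; the variable names are illustrative.

```r
uv      <- c(68.9, 69.5, 70.3, 71.6, 72.4, 73.8, 74.2, 75.1)
thermal <- c(63.7, 64.3, 65.1, 66.0, 66.8, 67.9, 68.1, 69.0)

# Exact Wilcoxon rank-sum test, one-sided and two-sided
p_wilcox_one <- wilcox.test(uv, thermal, exact = TRUE,
                            alternative = "greater")$p.value
p_wilcox_two <- wilcox.test(uv, thermal, exact = TRUE)$p.value

# Pooled two-sample t-test, one-sided and two-sided
p_t_one <- t.test(uv, thermal, var.equal = TRUE,
                  alternative = "greater")$p.value
p_t_two <- t.test(uv, thermal, var.equal = TRUE)$p.value

print(rbind(wilcoxon = c(one_sided = p_wilcox_one, two_sided = p_wilcox_two),
            pooled_t = c(one_sided = p_t_one,      two_sided = p_t_two)))
```

Under either alternative the t-test p-value is the smaller of the two, which is consistent with the t-test having more power when its assumptions hold.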
1.0.8 (g)
State your final conclusion for this problem in the context of curing methods and hardness.
The Wilcoxon Rank-Sum Test provides strong evidence against the null hypothesis: if the two curing methods produced the same hardness, a rank sum for the UV-cured group as extreme as the one observed would be very unlikely (p = 0.0001554). There is therefore strong evidence that the two curing methods produce different levels of hardness in the polymer coating. Since the UV group holds the higher ranks, we conclude that UV curing produces a significantly harder polymer coating than thermal curing.
1.1 Problem 2: Comparing Multiple Treatments
1.1.1 Background
An environmental scientist is comparing four different water filtration systems to see whether they produce equivalent nitrate concentrations in treated water. Each system is tested six times using the same water source. The experiment is a completely randomized design.
The nitrate concentration (mg/L) measurements are available in nitrate_data.csv
1.1.2 (a)
State the null and alternative hypotheses for a one-way ANOVA on nitrate concentration.
Null: The mean nitrate concentration in the treated water is the same for all four filtration systems, i.e. no system's mean nitrate concentration is greater than another's.
Alternative: At least one filtration system has a mean nitrate concentration that differs from the others.
(b)
Using R, conduct a one-way ANOVA.
nitratedata <- read.csv("/Users/brianyingjetng/Desktop/STAT4160/nitrate_data.csv")
anovanitrate <- aov(Nitrate ~ as.factor(System), data = nitratedata)
summary(anovanitrate)
                  Df Sum Sq Mean Sq F value   Pr(>F)
as.factor(System)  3 283.98   94.66   240.9 7.34e-16 ***
Residuals         20   7.86    0.39
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1.1.3 (c)
Check the assumptions of ANOVA by submitting:
- The ANOVA table
- A QQ-plot of the residuals
Syntax:
qqnorm(anovanitrate$residuals)
qqline(anovanitrate$residuals)
- A Residuals vs. Fitted Values plot
plot(anovanitrate$fitted.values, anovanitrate$residuals, main = "Residual Plot")
abline(h = 0)
1.1.4 (d)
Based on these diagnostics, does the normality assumption appear reasonable? Be specific.
The normality assumption seems tenuous. The QQ-plot compares the ordered residuals with the quantiles expected under a normal distribution, and the points appear to slope slightly downward relative to the reference line instead of following it closely. This suggests that the response variable may need to be transformed.
1.1.5 (e)
Assess the constant variance assumption using both the Residuals vs. Fitted plot and Bartlett’s Test. Should a transformation be considered? If so, indicate the direction on the ladder of transformations.
Bartlett’s test:
bartlett.test(anovanitrate$residuals ~ nitratedata$System)
Bartlett test of homogeneity of variances
data: anovanitrate$residuals by nitratedata$System
Bartlett's K-squared = 1.287, df = 3, p-value = 0.7322
Bartlett's test does not reject the hypothesis of equal variances (p = 0.7322), so on this basis a transformation does not appear necessary. The Residuals vs. Fitted plot shows some variability in spread across the fitted values.
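As a complement to Bartlett's test, the group variances can be inspected directly. The sketch below assumes the nitrate values are the ones listed in part (j) of this problem, and it applies bartlett.test to the raw response rather than the residuals; the two give the same statistic because the residuals are just the group-centered responses. Variable names are illustrative.

```r
# Nitrate values (mg/L) as listed in part (j); assumed to match nitrate_data.csv
system1 <- c(4.8, 5.2, 6.1, 5.5, 4.9, 5.8)
system2 <- c(7.4, 6.9, 7.8, 8.1, 7.2, 8.5)
system3 <- c(10.2, 9.7, 11.1, 10.6, 9.9, 10.8)
system4 <- c(14.5, 13.8, 15.2, 14.9, 13.6, 15.7)

nitrate <- c(system1, system2, system3, system4)
system  <- factor(rep(1:4, each = 6))

# Sample variance within each system
group_var <- tapply(nitrate, system, var)
print(round(group_var, 4))

# Bartlett's test on the raw response
bt <- bartlett.test(nitrate ~ system)
print(bt)
```

The largest group variance (system 4) is about 2.5 times the smallest (system 1), a spread that Bartlett's test does not flag as significant with six observations per group.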
1.1.6 (f)
Create a new variable equal to the square root of nitrate concentration, and perform a new ANOVA using this transformed response.
nitratesqrt <- (nitratedata$Nitrate)^0.5
1.1.7 (g)
For the transformed ANOVA, again check normality and constant variance. Submit:
- ANOVA table
nitratesqrt <- (nitratedata$Nitrate)^0.5
anovap2 <- aov(nitratesqrt ~ as.factor(System), data = nitratedata)
summary(anovap2)
                  Df Sum Sq Mean Sq F value   Pr(>F)
as.factor(System)  3  7.448  2.4828   236.2 8.89e-16 ***
Residuals         20  0.210  0.0105
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- QQ-plot of residuals
qqnorm(anovap2$residuals)
qqline(anovap2$residuals)
- Residuals vs. Fitted Values plot
plot(anovap2$fitted.values, anovap2$residuals, main = "Residual plot")
abline(h = 0)
- Bartlett’s Test
bartlett.test(anovap2$residuals ~ nitratedata$System)
Bartlett test of homogeneity of variances
data: anovap2$residuals by nitratedata$System
Bartlett's K-squared = 0.40039, df = 3, p-value = 0.9402
1.1.8 (h)
Compare the F-statistics and p-values from the original and transformed ANOVAs. Which analysis provides the more valid inference, and why?
summary(anovanitrate)
                  Df Sum Sq Mean Sq F value   Pr(>F)
as.factor(System)  3 283.98   94.66   240.9 7.34e-16 ***
Residuals         20   7.86    0.39
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(anovap2)
                  Df Sum Sq Mean Sq F value   Pr(>F)
as.factor(System)  3  7.448  2.4828   236.2 8.89e-16 ***
Residuals         20  0.210  0.0105
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anovanitrate is the ANOVA for the original data, while anovap2 is the ANOVA for the square-root-transformed data. Both treat System as a factor with four levels (3 and 20 degrees of freedom). For the original data, F = 240.9 with p = 7.34 x 10^-16; for the transformed data, F = 236.2 with p = 8.89 x 10^-16. The two F-statistics are very close, and both p-values are far below any conventional significance level, so the transformation does not change the conclusion that the systems differ. A larger F-statistic or a smaller p-value does not by itself make an analysis more valid; validity depends on how well the ANOVA assumptions hold. The diagnostics in part (g) indicate that the transformed data satisfy the assumptions of normally distributed errors and equal variances across groups, and Bartlett's test shows even less evidence of unequal variances after the transformation (p = 0.9402 versus 0.7322). Therefore, the transformed analysis provides the more valid inference, although here both analyses lead to the same conclusion.
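The F-statistic in a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it by hand for the original and the square-root-transformed response, assuming the nitrate values listed in part (j) match nitrate_data.csv. The helper one_way_f is hypothetical, written only for this illustration.

```r
# Nitrate values (mg/L) as listed in part (j); assumed to match nitrate_data.csv
nitrate <- c(4.8, 5.2, 6.1, 5.5, 4.9, 5.8,
             7.4, 6.9, 7.8, 8.1, 7.2, 8.5,
             10.2, 9.7, 11.1, 10.6, 9.9, 10.8,
             14.5, 13.8, 15.2, 14.9, 13.6, 15.7)
system <- factor(rep(1:4, each = 6))

# Hypothetical helper: one-way ANOVA F-statistic and p-value from sums of squares
one_way_f <- function(y, g) {
  n <- length(y)
  k <- nlevels(g)
  fitted     <- ave(y, g)                    # group mean for each observation
  ss_between <- sum((fitted - mean(y))^2)    # treatment sum of squares
  ss_within  <- sum((y - fitted)^2)          # residual sum of squares
  f <- (ss_between / (k - 1)) / (ss_within / (n - k))
  c(F = f, p = pf(f, k - 1, n - k, lower.tail = FALSE))
}

res_raw  <- one_way_f(nitrate, system)
res_sqrt <- one_way_f(sqrt(nitrate), system)
print(rbind(original = res_raw, sqrt_transformed = res_sqrt))
```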
1.1.9 (i)
Using the transformed data, conduct Tukey’s Honest Significant Difference procedure. Report the relevant output and summarize your conclusions.
TukeyHSD(anovap2, conf.level = 0.95)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = nitratesqrt ~ as.factor(System), data = nitratedata)
$`as.factor(System)`
         diff       lwr       upr   p adj
2-1 0.4460924 0.2803988 0.6117861 1.6e-06
3-1 0.9033731 0.7376794 1.0690667 0.0e+00
4-1 1.5039086 1.3382149 1.6696023 0.0e+00
3-2 0.4572806 0.2915870 0.6229743 1.1e-06
4-2 1.0578162 0.8921225 1.2235098 0.0e+00
4-3 0.6005355 0.4348419 0.7662292 0.0e+00
Tukey’s Honest Significant Difference procedure shows significant differences between every pair of systems. Because the ANOVA was fit to the square root of nitrate concentration, the differences below are on the square-root scale. The tests support the following conclusions:
Comparing systems 2 and 1, the mean square-root nitrate level for system 2 is 0.446 higher. The adjusted p-value is 1.6 x 10^-6, so if there were no difference between the two systems, a difference this large or larger would be very unlikely (far below 0.05). System 2 therefore produces a significantly higher nitrate level than system 1.
Comparing systems 3 and 2, the mean square-root nitrate level for system 3 is 0.457 higher. The adjusted p-value is 1.1 x 10^-6, so if there were no difference between systems 2 and 3, a difference this large or larger would be very unlikely. System 3 therefore produces a significantly higher nitrate level than system 2.
Comparing systems 4 and 3, the mean square-root nitrate level for system 4 is 0.601 higher. The adjusted p-value is reported as 0 to the displayed precision, so if there were no difference between systems 3 and 4, a difference this large or larger would be very unlikely. System 4 therefore produces a significantly higher nitrate level than system 3.
The remaining comparisons (3-1, 4-1, and 4-2) are also significant, each with an adjusted p-value reported as 0 to the displayed precision, so S3 > S1, S4 > S1, and S4 > S2 follow directly from the Tukey output and do not need to be inferred from the adjacent comparisons. Taken together, these results indicate that system 1 leaves the lowest level of nitrate in the water. System 2 leaves significantly more nitrate than system 1, system 3 leaves significantly more than systems 1 and 2, and system 4 leaves significantly more than systems 1, 2, and 3.
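Every Tukey interval in the output has the same half-width because the design is balanced (six runs per system). The sketch below reproduces that half-width from the studentized range quantile and compares it with TukeyHSD, assuming the nitrate values listed in part (j) match nitrate_data.csv. Variable names are illustrative.

```r
# Nitrate values (mg/L) as listed in part (j); assumed to match nitrate_data.csv
nitrate <- c(4.8, 5.2, 6.1, 5.5, 4.9, 5.8,
             7.4, 6.9, 7.8, 8.1, 7.2, 8.5,
             10.2, 9.7, 11.1, 10.6, 9.9, 10.8,
             14.5, 13.8, 15.2, 14.9, 13.6, 15.7)
system <- factor(rep(1:4, each = 6))
sqrt_nitrate <- sqrt(nitrate)

fit <- aov(sqrt_nitrate ~ system)

# Mean squared error and its degrees of freedom
df_error <- df.residual(fit)
mse      <- sum(residuals(fit)^2) / df_error

# Half-width of each 95% Tukey interval with n = 6 per group and 4 means
n_per_group <- 6
half_width  <- qtukey(0.95, nmeans = 4, df = df_error) * sqrt(mse / n_per_group)

# Compare with the intervals returned by TukeyHSD
tukey <- TukeyHSD(fit, conf.level = 0.95)$system
print(half_width)
print(tukey[, "upr"] - tukey[, "diff"])
```

A pairwise difference is significant at the 5% family-wise level exactly when its absolute value exceeds this half-width, which is why every interval in the output excludes zero.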
1.1.10 (j)
Conduct a Kruskal–Wallis Test on the original nitrate concentration data.
system1 <- c(4.8, 5.2, 6.1, 5.5, 4.9, 5.8)
system2 <- c(7.4, 6.9, 7.8, 8.1, 7.2, 8.5)
system3 <- c(10.2, 9.7, 11.1, 10.6, 9.9, 10.8)
system4 <- c(14.5, 13.8, 15.2, 14.9, 13.6, 15.7)
kwtest <- data.frame(response = c(system1, system2, system3, system4),
                     group = factor(rep(1:4, each = 6)),
                     rank = rank(c(system1, system2, system3, system4)))
kruskal.test(response ~ group, data = kwtest)
Kruskal-Wallis rank sum test
data: response by group
Kruskal-Wallis chi-squared = 21.6, df = 3, p-value = 7.9e-05
- State the null and alternative hypotheses.
Null hypothesis: The nitrate levels of the treated water have the same distribution for all four systems (the analogue of the one-way ANOVA null).
Alternative hypothesis: At least one system produces nitrate levels that differ from those of the other systems.
- Report the Kruskal–Wallis test statistic and the asymptotic p-value.
The Kruskal–Wallis test statistic is 21.6, and the asymptotic p-value is 7.9 x 10^-5.
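The Kruskal–Wallis statistic can be recomputed from the rank sums with the usual formula H = 12/(N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), which needs no tie correction here because all 24 values are distinct. This is a minimal sketch with illustrative variable names.

```r
system1 <- c(4.8, 5.2, 6.1, 5.5, 4.9, 5.8)
system2 <- c(7.4, 6.9, 7.8, 8.1, 7.2, 8.5)
system3 <- c(10.2, 9.7, 11.1, 10.6, 9.9, 10.8)
system4 <- c(14.5, 13.8, 15.2, 14.9, 13.6, 15.7)

nitrate <- c(system1, system2, system3, system4)
system  <- factor(rep(1:4, each = 6))

# Joint ranks, then the rank sum and sample size for each system
ranks     <- rank(nitrate)
rank_sums <- tapply(ranks, system, sum)
n_i       <- tapply(ranks, system, length)
n_total   <- length(nitrate)

# Kruskal-Wallis statistic (no ties, so no correction factor)
h_stat <- 12 / (n_total * (n_total + 1)) * sum(rank_sums^2 / n_i) -
  3 * (n_total + 1)

# Asymptotic p-value from the chi-squared distribution with k - 1 = 3 df
p_value <- pchisq(h_stat, df = nlevels(system) - 1, lower.tail = FALSE)

print(rank_sums)
print(c(H = h_stat, p = p_value))
```

The four systems do not overlap at all (the rank sums are 21, 57, 93, and 129), so 21.6 is the largest value the statistic can take with four groups of six.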
1.1.11 (k)
Interpret the p-value from the Kruskal–Wallis test in context.
The p-value of 7.9 x 10^-5 means that, if there were no difference in nitrate levels among the water samples from the four systems, the probability of obtaining a Kruskal–Wallis test statistic as extreme as or more extreme than the observed value of 21.6 would be 7.9 x 10^-5.
1.1.12 (l)
State your overall conclusion for this problem regarding differences among the filtration systems.
Both the one-way ANOVA and the Kruskal–Wallis test reject the null hypothesis that nitrate levels are the same across the four systems, so there is strong evidence that at least two of the systems differ in the nitrate level of the treated water. The pairwise comparisons from the Tukey HSD procedure, which controls the family-wise error rate, indicate that the nitrate level with system 4 was significantly higher than with system 3, which was significantly higher than with system 2, which was significantly higher than with system 1. This supports the following conclusion: nitrate level with system 4 > nitrate level with system 3 > nitrate level with system 2 > nitrate level with system 1.