Note: This assignment is adapted from content developed by Professor Erin Hestir, Qian (2016), and Helsel et al. (2020).
In this assignment, we will focus on hypothesis tests, comparison, and correlation.
Follow the prompts and complete the exercises below. Document all of your work and examples in RMarkdown and knit the result as a .html file. Show all of your work in code chunks in the notebook; any accompanying prose for your answers should appear as text in the notebook.
As with previous assignments, crimson blocks of text are action items for you!
After this assignment, you will be able to conduct hypothesis tests, compare two or more groups, and assess correlation between variables in R.
Similar to HW 1, a key element of this assignment is importing datasets. Functions that are commonly used for importing data include read.csv() and read.table(). Additionally, you can use the readr package within the tidyverse library.
If needed, here is a non-exhaustive list of some resources related to these functions:
* “R Data Import Tutorial” by Karlijn Willems: https://www.datacamp.com/community/tutorials/r-data-import-tutorial
* “Reading and Importing Excel Files into R Tutorial” by Karlijn Willems: https://www.datacamp.com/community/tutorials/r-tutorial-read-excel-into-r
* Chapter 11 in the R for Data Science book: https://r4ds.had.co.nz/data-import.html
One final note: when importing data, pay particular attention to your working directory and file paths.
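As a quick, hedged sketch of these importers (the file names below are hypothetical placeholders; substitute your own paths and check getwd() first):

mydata_base <- read.csv("data/example.csv")                  # base R; comma-separated, header assumed
mydata_tab  <- read.table("data/example.txt", header = TRUE) # base R; whitespace-separated by default
library(readr)                                               # readr is part of the tidyverse
mydata_tidy <- read_csv("data/example.csv")                  # faster; returns a tibble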
We have discussed various approaches for analyzing and comparing differences between data sets. In particular, we’ve explored the nonparametric rank-sum approach and the parametric t-tests. Let’s get some additional practice with these methods.
Note: If you need a refresher on nonparametric tests for comparing data sets, Section 5.1 in the Helsel et al. textbook should be helpful.
Scenario 1: The famous t-test was initially illustrated by “Student” in a paper published in 1908 (available at: http://biomet.oxfordjournals.org/cgi/reprint/6/1/1). In the paper, Student used several examples to illustrate the process. One example (Illustration 3) discussed the yields of barley from two different kinds of seeds (kiln-dried and not kiln-dried). Each type of seed was planted in adjacent plots with matching conditions, resulting in 11 pairs of “split” plots. Student concluded that the odds that kiln-dried seeds have a higher yield are approximately 14:1.
Note: 14:1 odds means that there are 14 “favorable” outcomes for every 1 “unfavorable” outcome. In this scenario, the “favorable” outcome is that the difference in yield between kiln-dried and non-kiln-dried seeds is greater than zero.
1. Conduct a t-test using the barley yield data shown below. Can you guess/explain where the 14:1 odds come from?
Data for Question 1
Yield for not kiln-dried seeds (NKD): 1903, 1935, 1910, 2496, 2108, 1961, 2060, 1444, 1612, 1316, 1511
Yield for kiln-dried seeds (KD): 2009, 1915, 2011, 2463, 2180, 1925, 2122, 1482, 1542, 1443, 1535
Note: If interested, the data above appears on page 24 of the 1908 paper.
Not_Dried <- c(1903, 1935, 1910, 2496, 2108,
1961, 2060, 1444, 1612, 1316, 1511)
Kiln_Dried <- c(2009, 1915, 2011, 2463, 2180, 1925,
2122, 1482, 1542, 1443, 1535)
Difference <- Kiln_Dried - Not_Dried
Difference
## [1] 106 -20 101 -33 72 -36 62 38 -70 127 24
#1st step: Create null and alternative hypothesis
print("Ho: On average, there is no difference in yield between kiln-dried and non-kiln dried seeds. = 0")
## [1] "Ho: On average, there is no difference in yield between kiln-dried and non-kiln dried seeds. = 0"
print("Ha: On average, kiln-dried seeds produced higher yield than non-kiln dried seeds. > 0")
## [1] "Ha: On average, kiln-dried seeds produced higher yield than non-kiln dried seeds. > 0"
#2nd step: Choose appropriate tests (T-Tests, Anova, correlation)
## one-sided/two-sided/paired
print("This will be a one-sided paired t-test (right-tailed)")
## [1] "This will be a one-sided paired t-test (right-tailed)"
#2.5 step: Check for assumptions
#a. Normal distribution
qqnorm(Difference)
qqline(Difference)
print("Passes normal distribution assumption")
## [1] "Passes normal distribution assumption"
#b. Independent observations
print("There is no duplicate observations")
## [1] "There is no duplicate observations"
#c. Check for variance
# Test for equal variances
var.test(Kiln_Dried, Not_Dried)
##
## F test to compare two variances
##
## data: Kiln_Dried and Not_Dried
## F = 0.94314, num df = 10, denom df = 10, p-value = 0.9281
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2537501 3.5054415
## sample estimates:
## ratio of variances
## 0.9431363
# Output includes F statistic and p-value
# If p > 0.05, assume equal variances
print("Has equal variance, passes")
## [1] "Has equal variance, passes"
#3rd Step: Decide on an acceptable error rate (alpha)
print("We will choose alpha = 0.05")
## [1] "We will choose alpha = 0.05"
#4th Step: Compute test statistic/p-value
t.test(Difference, alternative = c("greater"), mu = 0, conf.level = 0.95)
##
## One Sample t-test
##
## data: Difference
## t = 1.6905, df = 10, p-value = 0.06091
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## -2.433766 Inf
## sample estimates:
## mean of x
## 33.72727
#5th step: Accept/Reject null
print("Our results show that the p-value(0.06091) is greater than our alpha value(0.05). Therefore, we fail to reject the null hypothesis. We conclude that on average, there is no difference between kiln-dried and non-dried seeds.")
## [1] "Our results show that the p-value(0.06091) is greater than our alpha value(0.05). Therefore, we fail to reject the null hypothesis. We conclude that on average, there is no difference between kiln-dried and non-dried seeds."
To explain where the 14:1 odds come from:
fav <- sum(Difference > 0) # number of positive differences (favorable)
non_fav <- sum(Difference < 0) # number of negative differences (non-favorable)
total <- fav + non_fav
cat("There are ", fav, "favorable outcomes and ", non_fav, "non-favorable outcomes totaling to ", total,". \n
This comes from the number of positive and negative differences,positive = favorable and negative = non-favorable. \n
The probability of choosing a favorable outcome is P(X≥7)=0.073 (approx). \n
Odds by definition is = probability event happens / probability event does not happen. \n
So an 14:1 odds means 14 unfavorable outcomes for every 1 favorable outcome.
The probability of choosing a non-favorable outcome is (1-P(favorable outcome)) divided by the non-favorable outcome NOT happening (P(favorable)). \n
93% / 7 % = 14. This is where 14:1 odds comes from.")
## There are 7 favorable outcomes and 4 non-favorable outcomes totaling to 11 .
##
## This comes from the number of positive and negative differences,positive = favorable and negative = non-favorable.
##
## The probability of choosing a favorable outcome is P(X≥7)=0.073 (approx).
##
## Odds by definition is = probability event happens / probability event does not happen.
##
## So an 14:1 odds means 14 unfavorable outcomes for every 1 favorable outcome.
## The probability of choosing a non-favorable outcome is (1-P(favorable outcome)) divided by the non-favorable outcome NOT happening (P(favorable)).
##
## 93% / 7 % = 14. This is where 14:1 odds comes from.
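The same arithmetic can be done directly in R; this is a supplementary check, not part of the original workflow:

#odds in favor = (1 - p)/p, using the one-sided p-value from the t-test above
p_one_sided <- t.test(Difference, alternative = "greater", mu = 0)$p.value
(1 - p_one_sided) / p_one_sided #about 15.4 with the t distribution; Student's calculation gave about 14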
Scenario 2: A shallow aquifer is contaminated by molybdenum leachate from mine tailings. A remediation effort was begun to reduce the molybdenum concentrations in waters leaving the site. Post-remediation concentrations (in µg/L) from 13 wells down-gradient of the remediation process are listed below. Also shown are concentrations at 3 wells up-gradient from, and unaffected by, the remediation process.
Concentrations (in µg/L) from 13 wells down-gradient of the remediation process: 0.850, 0.390, 0.320, 0.300, 0.300, 0.205, 0.200, 0.200, 0.140, 0.140, 0.090, 0.046, 0.035
Concentrations (in µg/L) from 3 wells up-gradient of the remediation process: 6.900, 3.200, 1.700
2. Test whether the concentration of molybdenum in the “down-gradient” wells is significantly different (lower) than the concentration of molybdenum in the “up-gradient” wells.
down_gradient <- c(0.850, 0.390, 0.320, 0.300, 0.300, 0.205, 0.200, 0.200, 0.140,
0.140, 0.090, 0.046, 0.035)
up_gradient <- c(6.900, 3.200, 1.700)
#1st step: Create null and alternative hypothesis
print("Ho: On average, there is no difference in concentrations between the up-gradient and down-gradient wells = 0")
## [1] "Ho: On average, there is no difference in concentrations between the up-gradient and down-gradient wells = 0"
print("Ha: On average, the concentrations in wells down-gradient is less than the concentrations in the up-gradient wells < 0")
## [1] "Ha: On average, the concentrations in wells down-gradient is less than the concentrations in the up-gradient wells < 0"
#2nd step: Choose appropriate tests (T-Tests, Anova, correlation)
## one-sided/two-sided/paired
print("This will be a one-sided t-test (left-tailed)")
## [1] "This will be a one-sided t-test (left-tailed)"
#2.5 step: Check for assumptions
#a. Normal distribution
qqnorm(up_gradient, main = "Normal Q-Q Plot: Up Gradient")
qqline(up_gradient, col = "red")
qqnorm(down_gradient, main = "Normal Q-Q Plot: Down Gradient")
qqline(down_gradient, col = "red")
print("does not pass normality")
## [1] "does not pass normality"
#b. Independent observations
print("There is no duplicate observations")
## [1] "There is no duplicate observations"
#c. Check for variance
# Test for equal variances
var.test(up_gradient, down_gradient)
##
## F test to compare two variances
##
## data: up_gradient and down_gradient
## F = 160.95, num df = 2, denom df = 12, p-value = 4.309e-09
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 31.58453 6343.80409
## sample estimates:
## ratio of variances
## 160.9506
# Output includes F statistic and p-value
# If p > 0.05, assume equal variances
print("does not pass equal variances")
## [1] "does not pass equal variances"
print("We will do Wilcoxon rank sum because it does not pass normality and does not have equal variances.")
## [1] "We will do Wilcoxon rank sum because it does not pass normality and does not have equal variances."
#signed rank test is for paired t-tests not having equal variances
#3rd Step: Decide on an acceptable error rate (alpha)
print("We will choose alpha = 0.05")
## [1] "We will choose alpha = 0.05"
#4th Step: Compute test statistic/p-value
wilcox.test(down_gradient, up_gradient, alternative = "less", exact = FALSE, conf.int = TRUE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: down_gradient and up_gradient
## W = 0, p-value = 0.005206
## alternative hypothesis: true location shift is less than 0
## 95 percent confidence interval:
## -Inf -1.499996
## sample estimates:
## difference in location
## -2.999963
#5th step: Accept/Reject null
print("Our results show that the p-value(0.005206) is less than our alpha value(0.05). Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that on average, the down-gradient wells has lower concentrations of molybdenum leachate than the up-gradient wells.")
## [1] "Our results show that the p-value(0.005206) is less than our alpha value(0.05). Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that on average, the down-gradient wells has lower concentrations of molybdenum leachate than the up-gradient wells."
Scenario 3: Specific conductance was measured on the two forks of the Shenandoah River in Virginia, and the resulting data are included in the file Shenandoah.rda (available in CatCourses).
3. Does the North Fork have higher conductance than the South Fork? As part of your analysis, include the following:
A. State the appropriate null and alternative hypotheses to see if conductance values are the same in the two forks.
#1st step: Create null and alternative hypothesis
print("Ho: On average, the specific conductance in the north fork has less than or equal to the south fork of the Shenandoah River in Virginia. = 0")
## [1] "Ho: On average, the specific conductance in the north fork has less than or equal to the south fork of the Shenandoah River in Virginia. = 0"
print("Ha: On average, the north fork has higher specific conductance than the south fork of the Shenandoah River in Virginia. > 0")
## [1] "Ha: On average, the north fork has higher specific conductance than the south fork of the Shenandoah River in Virginia. > 0"
B. Determine whether a parametric or nonparametric test should be used.
#2.5 step: Check for assumptions
#a. Normal distribution
Shenandoah_River <- read.csv("C:/Users/noahs/Downloads/shenandoah.csv")
South <- Shenandoah_River$SOUTH
North <- Shenandoah_River$NORTH
qqnorm(South, main = "Normal Q-Q Plot: South Fork")
qqline(South, col = "red")
hist(South, freq = FALSE)
x <- seq(min(South), max(South), length = 100)
y <- dnorm(x, mean = mean(South), sd = sd(South))
lines(x, y, col = "blue", lwd = 2)
qqnorm(North, main = "Normal Q-Q Plot: North Fork")
qqline(North, col = "red")
hist(North)
print("does not pass normality")
## [1] "does not pass normality"
#b. Independent observations
print("There is no duplicate observations")
## [1] "There is no duplicate observations"
#c. Check for variance
# Test for equal variances
var.test(South, North)
##
## F test to compare two variances
##
## data: South and North
## F = 0.79147, num df = 9, denom df = 9, p-value = 0.7332
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1965895 3.1864460
## sample estimates:
## ratio of variances
## 0.7914681
# Output includes F statistic and p-value
# If p > 0.05, assume equal variances
print("pass equal variances")
## [1] "pass equal variances"
print("Because it does not pass normality assumption, we will do the Wilcoxon signed rank test because this is a paired dataset.")
## [1] "Because it does not pass normality assumption, we will do the Wilcoxon signed rank test because this is a paired dataset."
print("Each observation in one group (SOUTH) directly corresponds to one observation in the other group (NORTH).")
## [1] "Each observation in one group (SOUTH) directly corresponds to one observation in the other group (NORTH)."
print("There’s a logical link between the two — they’re matched pairs rather than independent samples.")
## [1] "There’s a logical link between the two — they’re matched pairs rather than independent samples."
## Matched pairs are considered paired datasets
C. Based on your results in Part B, conduct an appropriate comparison test for α=0.05 and report the results.
wilcox.test(North, South, paired = TRUE, exact = FALSE, alternative = "greater")
##
## Wilcoxon signed rank test with continuity correction
##
## data: North and South
## V = 52, p-value = 0.007185
## alternative hypothesis: true location shift is greater than 0
#note to self: alternative = "greater" tests whether the
#median of X - Y > 0
#the variable placed first is the one being tested as larger.
print("Our results show that the p-value(0.007185) is less than our alpha value(0.05). Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that the North fork has higher conductance than the South Fork of the Shenandoah River.")
## [1] "Our results show that the p-value(0.007185) is less than our alpha value(0.05). Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that the North fork has higher conductance than the South Fork of the Shenandoah River."
D. Estimate the amount by which the forks differ in conductance, regardless of the test outcome.
#because the data are not normal, we will compare medians instead of means
med_south <- median(South)
med_north <- median(North)
Ratio <- med_north/med_south
Ratio
## [1] 1.225694
cat("The median ratios shows that the North Fork has a higher specific conductance by",Ratio, "than the South Fork.")
## The median ratios shows that the North Fork has a higher specific conductance by 1.225694 than the South Fork.
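A related estimate (a supplementary sketch) is the Hodges-Lehmann estimator, the location estimate that pairs naturally with the signed-rank test used above:

#Hodges-Lehmann estimate (pseudomedian) of the paired North - South difference
wilcox.test(North, South, paired = TRUE, exact = FALSE, conf.int = TRUE)$estimate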
E. Illustrate and check the results with a plot.
diffs <- North - South
boxplot(diffs,
horizontal = TRUE,
main = "Boxplot of Paired Differences",
xlab = "North - South")
library(tidyr)
df_long <- pivot_longer(Shenandoah_River,
cols = c(NORTH, SOUTH),
names_to = "River",
values_to = "Conductance")
library(ggplot2)
#group = River so geom_line connects the points within each fork
ggplot(df_long, aes(x = DATE, y = Conductance, color = River, group = River)) +
geom_point(size = 3) +
geom_line() +
labs(title = "Conductance Over Time",
x = "Date",
y = "Conductance",
color = "River") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The comparison methods we explored in the previous section are well-suited for comparing two groups/variables to each other. If we want to compare more than two groups, ANOVA is usually the tool we want to use.
Scenario 4: Discharge from paper pulp liquor waste may have contaminated shallow groundwater with caustic, high pH effluent (Robertson and others, 1984). pH samples from 3 groups are indicated below:
Group 1: 7.0, 7.2, 7.5, 7.7, 8.7, 7.8
Group 2: 6.3, 6.9, 7.0, 6.4, 6.8, 6.7
Group 3: 8.4, 7.6, 7.5, 7.4, 9.3, 9.0
Note: One group is known to be uncontaminated
4. Determine whether the pH of samples taken from three sets of piezometers are identical. If not identical, which groups are different from others? Which are contaminated? Be sure to check the normality and equal variance assumptions of ANOVA if using this parametric method.
#1st step: Create null and alternative hypothesis
print("Ho: On average, there is no difference in pH levels between the three groups. = 0")
## [1] "Ho: On average, there is no difference in pH levels between the three groups. = 0"
print("Ha: On average, there is at least one difference in pH levels between the three groups. != 0")
## [1] "Ha: On average, there is at least one difference in pH levels between the three groups. != 0"
grp_1 <- c(7.0, 7.2, 7.5, 7.7, 8.7, 7.8)
grp_2 <- c(6.3, 6.9, 7.0, 6.4, 6.8, 6.7)
grp_3 <- c(8.4, 7.6, 7.5, 7.4, 9.3, 9.0)
pH <- c(grp_1, grp_2, grp_3)
pH
## [1] 7.0 7.2 7.5 7.7 8.7 7.8 6.3 6.9 7.0 6.4 6.8 6.7 8.4 7.6 7.5 7.4 9.3 9.0
Groups <- rep(c("1", "2", "3"), each = 6)
Groups
## [1] "1" "1" "1" "1" "1" "1" "2" "2" "2" "2" "2" "2" "3" "3" "3" "3" "3" "3"
#anova wants 2 columns
#create new dataset
Groups_pH <- data.frame(x = Groups, y = pH)
Groups_pH
## x y
## 1 1 7.0
## 2 1 7.2
## 3 1 7.5
## 4 1 7.7
## 5 1 8.7
## 6 1 7.8
## 7 2 6.3
## 8 2 6.9
## 9 2 7.0
## 10 2 6.4
## 11 2 6.8
## 12 2 6.7
## 13 3 8.4
## 14 3 7.6
## 15 3 7.5
## 16 3 7.4
## 17 3 9.3
## 18 3 9.0
#Check for assumptions
#a. Normal distribution
shapiro.test(grp_1) #passes
##
## Shapiro-Wilk normality test
##
## data: grp_1
## W = 0.92179, p-value = 0.5183
shapiro.test(grp_2) #passes
##
## Shapiro-Wilk normality test
##
## data: grp_2
## W = 0.92476, p-value = 0.5403
shapiro.test(grp_3) #passes (p = 0.22 > 0.05)
##
## Shapiro-Wilk normality test
##
## data: grp_3
## W = 0.86848, p-value = 0.2202
#If p > 0.05 → fail to reject normality → data appear normally distributed
#if p < 0.05 → reject normality → data are not normal
print("does not pass normality")
## [1] "does not pass normality"
#b. Independent observations
print("There is no duplicate observations")
## [1] "There is no duplicate observations"
#c. Check for variance
# Test for equal variances
sd1 <- sd(grp_1)
sd2 <- sd(grp_2)
sd3 <- sd(grp_3)
sd1/sd2 #fails: ratio above 2
## [1] 2.137947
sd1/sd3 #passes: within 0.5 and 2
## [1] 0.7246709
sd2/sd3 #fails: ratio below 0.5
## [1] 0.3389565
#rule of thumb: SD ratios between 0.5 and 2 suggest roughly equal variances
print("does not pass equal variances")
## [1] "does not pass equal variances"
grpResults <- aov(y ~ x, data=Groups_pH)
summary(grpResults)
## Df Sum Sq Mean Sq F value Pr(>F)
## x 2 7.074 3.537 9.572 0.00209 **
## Residuals 15 5.543 0.370
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(grpResults, conf.level = 0.99)
## Tukey multiple comparisons of means
## 99% family-wise confidence level
##
## Fit: aov(formula = y ~ x, data = Groups_pH)
##
## $x
## diff lwr upr p adj
## 2-1 -0.9666667 -2.1668418 0.2335085 0.0370799
## 3-1 0.5500000 -0.6501751 1.7501751 0.2896426
## 3-2 1.5166667 0.3164915 2.7168418 0.0016489
print("Our results show that the p-value (0.00209) is less than our alpha value (0.05). Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that, on average, there is at least one difference in pH levels amongst the three groups. Tukey post-hoc comparisons reveal that Group 2 had significantly lower pH than Group 1, and Group 3 had significantly higher pH than Group 2. However, Group 3 was not significantly different from Group 1. This suggests that Group 2 is the one that stands out from the other groups, having lower pH on average; therefore, group 2 is the uncontaminated group.")
## [1] "Our results show that the p-value (0.00209) is less than our alpha value (0.05). Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that, on average, there is at least one difference in pH levels amongst the three groups. Tukey post-hoc comparisons reveal that Group 2 had significantly lower pH than Group 1, and Group 3 had significantly higher pH than Group 2. However, Group 3 was not significantly different from Group 1. This suggests that Group 2 is the one that stands out from the other groups, having lower pH on average; therefore, group 2 is the uncontaminated group."
plot(TukeyHSD(grpResults, conf.level = 0.99))
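Given the borderline variance ratios above, a nonparametric cross-check seems prudent (a supplementary sketch; the ANOVA conclusions above stand on their own):

#Kruskal-Wallis as a rank-based check on the one-way ANOVA result
kruskal.test(y ~ x, data = Groups_pH)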
Scenario 5: Feth and others (1964) measured chloride concentrations of spring waters draining three different rock types in the Sierra Nevada—granodiorites, quartz monzonites, and undifferentiated granitic rocks. The data are in feth.rda (available in CatCourses).
5. Determine whether chloride concentrations differ among springs emanating from the three rock types. If differences occur, which rock types differ from the others? Check the assumptions of ANOVA before using it.
rocks <- read.csv("C:/Users/noahs/Downloads/feth.csv")
#1st step: Create null and alternative hypothesis
print("Ho: On average, there is no difference in chloride concentrations between the three rock types in the Sierra Nevada. = 0")
## [1] "Ho: On average, there is no difference in chloride concentrations between the three rock types in the Sierra Nevada. = 0"
print("Ha: On average, there is at least one difference in chloride concentrations between the three rock types in the Sierra Nevada. != 0")
## [1] "Ha: On average, there is at least one difference in chloride concentrations between the three rock types in the Sierra Nevada. != 0"
print("We will use ANOVA to compare if chloride concentrations differ in the three different rock types in the Sierra Nevada.")
## [1] "We will use ANOVA to compare if chloride concentrations differ in the three different rock types in the Sierra Nevada."
grano <- c(6,0.5,0.4,0.7,0.8,6.0,5.0,0.6,1.2,0.3,0.2,0.5,0.5,10.0,0.2,0.2,1.7,3.0)
monzon <- c(1.0,0.2,1.2,1.0,0.3, 0.1,0.1, 0.4, 3.2, 0.3, 0.4, 1.8, 0.9, 0.1, 0.2, 0.3, 0.5)
ephemer <- c(0.0, 0.9, 0.1, 0.1, 0.5, 0.2, 0.3, 0.2, 0.1, 2.0, 1.8, 0.1, 0.6, 0.2, 0.4)
#Check for assumptions
#a. Normal distribution
shapiro.test(grano) #fails
##
## Shapiro-Wilk normality test
##
## data: grano
## W = 0.70995, p-value = 0.0001045
shapiro.test(monzon) #fails
##
## Shapiro-Wilk normality test
##
## data: monzon
## W = 0.73618, p-value = 0.0003063
shapiro.test(ephemer) #fails
##
## Shapiro-Wilk normality test
##
## data: ephemer
## W = 0.72342, p-value = 0.0004476
#If p > 0.05 → fail to reject normality → data appear normally distributed
#if p < 0.05 → reject normality → data are not normal
print("does not pass normality")
## [1] "does not pass normality"
#b. Independent observations
print("There is no duplicate observations. This passes the assumption for independence")
## [1] "There is no duplicate observations. This passes the assumption for independence"
#c. Check for variance
# Test for equal variances
library(car)
## Loading required package: carData
Chloride <- c(6.0,0.5,0.4,0.7,0.8,6.0,5.0,0.6,1.2,0.3,0.2,0.5,0.5,10.0,0.2,0.2,1.7,3.0,
1.0,0.2,1.2,1.0,0.3,0.1,0.1,0.4,3.2,0.3,0.4,1.8,0.9,0.1,0.2,0.3,0.5,
0.0,0.9,0.1,0.1,0.5,0.2,0.3,0.2,0.1,2.0,1.8,0.1,0.6,0.2,0.4)
rocktype <- c(rep("grano",18), rep("monzon",17), rep("ephemer",15))
leveneTest(Chloride ~ rocktype)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 3.4347 0.04053 *
## 47
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#p > 0.05: variances are equal → ANOVA assumption met.
#p < 0.05: variances are unequal → use Welch’s ANOVA or a nonparametric test.
#the p-value is 0.04053 < 0.05
print("does not pass equal variances")
## [1] "does not pass equal variances"
#The Kruskal–Wallis test is the nonparametric alternative to one-way ANOVA.
#It compares the rank distributions of chloride concentrations among rock types.
kruskal.test(Chloride ~ rocktype)
##
## Kruskal-Wallis rank sum test
##
## data: Chloride by rocktype
## Kruskal-Wallis chi-squared = 7.3078, df = 2, p-value = 0.02589
boxplot(Chloride ~ rocktype,
main = "Chloride Concentration by Rock Type",
xlab = "Rock Type", ylab = "Chloride (mg/L)",
col = c("lightblue", "lightgreen", "lightpink"))
library(FSA)
## Registered S3 methods overwritten by 'FSA':
## method from
## confint.boot car
## hist.boot car
## ## FSA v0.10.0. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.
##
## Attaching package: 'FSA'
## The following object is masked from 'package:car':
##
## bootCase
dunnTest(Chloride ~ rocktype, data = rocks, method = "holm")
## Warning: rocktype was coerced to a factor.
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Holm method.
## Comparison Z P.unadj P.adj
## 1 ephemer - grano -2.661538 0.007778462 0.02333539
## 2 ephemer - monzon -1.033875 0.301194457 0.30119446
## 3 grano - monzon 1.668349 0.095246436 0.19049287
cat("Our results show that the p-value(0.02589) is less than our alpha value(0.05) \n Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that on average, there is at least one difference in chloride concentrations amongst the three rock groups. \n While the boxplot shows that grano tends to have higher chloride concentrations than the other rock types, the Kruskal–Wallis test confirmed overall group differences (p = 0.0259), and Dunn post-hoc tests showed that grano was significantly higher than ephemer (p = 0.0233). No other pairwise differences were significant.")
## Our results show that the p-value(0.02589) is less than our alpha value(0.05)
## Therefore, we have sufficient evidence to reject the null hypothesis. We conclude that on average, there is at least one difference in chloride concentrations amongst the three rock groups.
## While the boxplot shows that grano tends to have higher chloride concentrations than the other rock types, the Kruskal–Wallis test confirmed overall group differences (p = 0.0259), and Dunn post-hoc tests showed that grano was significantly higher than ephemer (p = 0.0233). No other pairwise differences were significant.
Scenario 6: The number of Corbicula (bottom fauna) per square meter for a site on the Tennessee River was presented by Jensen (1973). The data from Jensen’s Strata 1 are found in Corb.rda (in CatCourses).
6. Test the Corbicula data to determine whether either season or year are significant factors for the number of organisms observed. If significant effects are found, test for which levels of the factor differ from others.
#This will be a two-way ANOVA
#a transformation may be needed for this one (we try log below)
#1st step: Create null and alternative hypotheses
cat("We will use a two way ANOVA to determine if either seaon or year are significant factors for the number of Corbicula observed.")
## We will use a two way ANOVA to determine if either seaon or year are significant factors for the number of Corbicula observed.
cat("There is no difference in group means at any level of the first independent variable. \n
Ho = There is no difference in average number of Corbicula observed for any season.")
## There is no difference in group means at any level of the first independent variable.
##
## Ho = There is no difference in average number of Corbicula observed for any season.
cat("There is no difference in group means at any level of the second independent
variable. \n
Ho = There is no difference in average number of Corbicula observed for any year")
## There is no difference in group means at any level of the second independent
## variable.
##
## Ho = There is no difference in average number of Corbicula observed for any year
cat("The effect of one independent variable does not depend on the effect of the other
independent variable (a.k.a. no interaction effect). \n
Ho = There is no interaction effect between year and season")
## The effect of one independent variable does not depend on the effect of the other
## independent variable (a.k.a. no interaction effect).
##
## Ho = There is no interaction effect between year and season
#Check for assumptions
#a. Normal distribution
#In a two-way ANOVA, you don't test the normality of each group separately. Instead, you test the normality of the residuals (the differences between the observed values and the values predicted by the model),
#because ANOVA assumes that the residuals (errors) are normally distributed, not necessarily the raw data. (explanation adapted from ChatGPT)
Corb <- read.csv("C:/Users/noahs/Downloads/Corb.csv")
# 1) Make sure factors are coded properly
Corb$yr <- factor(Corb$yr)
Corb$seas <- factor(Corb$seas, levels = c("Winter","Spring","Summer","Fall"))
# optional ordering
#main effects + interaction
m <- aov(Corbicula ~ yr * seas, data = Corb)
anova_res <- summary(m)
#test for normality
res <- residuals(m)
cat("\nShapiro–Wilk test for normality of residuals:\n")
##
## Shapiro–Wilk test for normality of residuals:
print(shapiro.test(res))
##
## Shapiro-Wilk normality test
##
## data: res
## W = 0.90448, p-value = 0.004534
#p > 0.05 → Residuals are approximately normal (normality passes).
#p < 0.05 → Residuals significantly deviate from normality (normality fails).
print("The p-value is 0.004534 < 0.05, fails normality assumption.")
## [1] "The p-value is 0.004534 < 0.05, fails normality assumption."
print("We will try transformations first.")
## [1] "We will try transformations first."
m_log <- aov(log(Corbicula + 1) ~ yr * seas, data = Corb)
shapiro.test(residuals(m_log))
##
## Shapiro-Wilk normality test
##
## data: residuals(m_log)
## W = 0.96004, p-value = 0.2155
print("The p-value is 0.2155 > 0.05, passes normality assumption when doing log transformation.")
## [1] "The p-value is 0.2155 > 0.05, passes normality assumption when doing log transformation."
#b. Independent observations
print("There is no duplicate observations. This passes the assumption for independence")
## [1] "There is no duplicate observations. This passes the assumption for independence"
#c. Check for variance
# Test for equal variances
plot(m_log, which = 1)
#no clear pattern or shape, passes homoscedasticity
#we will proceed with the two-way ANOVA using the log transformation
Corb$logCorb <- log(Corb$Corbicula + 1)
m_log <- aov(logCorb ~ yr * seas, data = Corb)
summary(m_log)
## Df Sum Sq Mean Sq F value Pr(>F)
## yr 2 14.053 7.026 15.043 5.83e-05 ***
## seas 3 16.701 5.567 11.918 5.63e-05 ***
## yr:seas 6 7.378 1.230 2.633 0.0418 *
## Residuals 24 11.210 0.467
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print("Log-transformed ANOVA results revealed significant effects of year (p = 5.83e-05) and season (p = 5.63e-05) on Corbicula abundance. The year × season interaction was also significant (p = 0.0418), indicating that seasonal patterns differed among years.")
## [1] "Log-transformed ANOVA results revealed significant effects of year (p = 5.83e-05) and season (p = 5.63e-05) on Corbicula abundance. The year × season interaction was also significant (p = 0.0418), indicating that seasonal patterns differed among years."
TukeyHSD(m_log, which = "yr")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = logCorb ~ yr * seas, data = Corb)
##
## $yr
## diff lwr upr p adj
## 1970-1969 -1.0297205 -1.726503 -0.3329383 0.0031720
## 1971-1969 -1.4953481 -2.192130 -0.7985659 0.0000484
## 1971-1970 -0.4656276 -1.162410 0.2311545 0.2375378
TukeyHSD(m_log, which = "seas")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = logCorb ~ yr * seas, data = Corb)
##
## $seas
## diff lwr upr p adj
## Spring-Winter -1.1412444 -2.0300117 -0.2524772 0.0084142
## Summer-Winter -1.7536768 -2.6424441 -0.8649096 0.0000759
## Fall-Winter -1.5664922 -2.4552595 -0.6777250 0.0003236
## Summer-Spring -0.6124324 -1.5011996 0.2763348 0.2542222
## Fall-Spring -0.4252478 -1.3140150 0.4635194 0.5597012
## Fall-Summer 0.1871846 -0.7015826 1.0759518 0.9368314
print("Tukey post-hoc comparisons indicated that Corbicula abundance in 1969 was significantly higher than in both 1970 (p = 0.00317) and 1971 (p = 4.84 × 10⁻⁵). No significant difference was found between 1970 and 1971.Seasonally, Winter had significantly higher Corbicula abundance than Spring, Summer, and Fall (all p < 0.01). No significant differences were found among Spring, Summer, and Fall.")
## [1] "Tukey post-hoc comparisons indicated that Corbicula abundance in 1969 was significantly higher than in both 1970 (p = 0.00317) and 1971 (p = 4.84 × 10⁻⁵). No significant difference was found between 1970 and 1971.Seasonally, Winter had significantly higher Corbicula abundance than Spring, Summer, and Fall (all p < 0.01). No significant differences were found among Spring, Summer, and Fall."
Scenario 7: The Ogallala aquifer was investigated to estimate relations between uranium and other concentrations in its waters. Concentrations of uranium and total dissolved solids can be found in urantds2.RData (available in CatCourses)
7a. Are uranium concentrations correlated with total dissolved solids in the groundwater samples? If so, describe the strength of the relation.
#1st step: Create null
urantsds <- read.csv("C:/Users/noahs/Downloads/urantds2.csv")
uranium <- urantsds$URANIUM
tds <- urantsds$TDS
#we will want to confirm that the relationships between uranium and TDS are relatively linear.
plot(uranium, tds)
hist(uranium)
hist(tds)
plot(tds, uranium)
#these are both skewed. we will try transformations
log_uranium <- log(uranium)
sqrt_uranium <- sqrt(uranium)
hist(log_uranium)
log_tds <- log(tds)
hist(log_tds)
plot(log_uranium, log_tds)
print("Ho: There is no correlation between uranium and total dissolved solids = 0")
## [1] "Ho: There is no correlation between uranium and total dissolved solids = 0"
print("Ha: There is a correlation between uranium and total dissolved solids != 0")
## [1] "Ha: There is a correlation between uranium and total dissolved solids != 0"
print("This will be a two-tailed correlation test.")
## [1] "This will be a two-tailed correlation test."
cor.test(uranium, tds)
##
## Pearson's product-moment correlation
##
## data: uranium and tds
## t = 1.4343, df = 42, p-value = 0.1589
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08633273 0.48204255
## sample estimates:
## cor
## 0.216086
print("Our results show that the p-value(0.1589) is greater than our alpha value(0.05). Therefore, we fail to reject the null hypothesis and conclude that there is a positive but weak correlation between uranium and total dissolved solids from the correlation coefficient.")
## [1] "Our results show that the p-value(0.1589) is greater than our alpha value(0.05). Therefore, we fail to reject the null hypothesis and conclude that there is a positive but weak correlation between uranium and total dissolved solids from the correlation coefficient."
7b. Is the relation significant for α=0.1?
print("Our results show that the p-value(0.1589) is greater than alpha = 0.1. Therefore, we still fail to reject the null hypothesis and conclude that there is a positive but weak correlation between uranium and total dissolved solids from the correlation coefficient.")
## [1] "Our results show that the p-value(0.1589) is greater than alpha = 0.1. Therefore, we still fail to reject the null hypothesis and conclude that there is a positive but weak correlation between uranium and total dissolved solids from the correlation coefficient."
Following are two “challenge questions.” Please complete at least ONE(1) of these questions. If you complete both, you will be eligible for up to a 10% bonus on this assignment.
Challenge 1
Stelzer et al. (2012) conducted fecal-indicator quantitative polymerase chain reaction (qPCR) assays to determine if three laboratories were providing similar results. Splits of 15 river samples and 6 fecal-source samples were sent to each of the 3 laboratories. Data for AllBac, a general fecal indicator assay are found in allbac.rda.
It can be imported into R using function source().
CQ1. Using this information and dataset, determine whether at least one laboratory provided higher/lower results than the others. If so, determine which labs differ from others. Please include the following in your analysis:
A. Conduct a nonparametric test to determine if at least one laboratory is different from the others. In particular, use either the Friedman test OR the Aligned-ranks test (ART)
all_Bac <- read.csv("C:/Users/noahs/Downloads/allbac.csv")
# Hypotheses
cat("H0: All laboratories produce equal AllBac results.\n")
## H0: All laboratories produce equal AllBac results.
cat("HA: At least one laboratory produces different AllBac results.\n\n")
## HA: At least one laboratory produces different AllBac results.
library(PMCMRplus)
# Friedman Test controlling for sample
friedman_result <- friedman.test(AllBac ~ LabNo | Sample, data = all_Bac)
cat("=== Friedman Test ===\n")
## === Friedman Test ===
print(friedman_result)
##
## Friedman rank sum test
##
## data: AllBac and LabNo and Sample
## Friedman chi-squared = 35.179, df = 2, p-value = 2.295e-08
# Interpretation of Friedman
if(friedman_result$p.value < 0.05){
cat("\nThe Friedman test returned a p-value of", signif(friedman_result$p.value, 3),
"< 0.05, so we reject the null hypothesis.\n")
cat("→ At least one laboratory differs from the others.\n\n")
} else {
cat("\nThe Friedman test returned a p-value of", signif(friedman_result$p.value, 3),
">= 0.05, so we fail to reject the null hypothesis.\n")
cat("→ No evidence of differences between laboratories.\n\n")
}
##
## The Friedman test returned a p-value of 2.3e-08 < 0.05, so we reject the null hypothesis.
## → At least one laboratory differs from the others.
# Post-hoc Nemenyi
cat("=== Post-hoc Nemenyi Tests ===\n")
## === Post-hoc Nemenyi Tests ===
posthoc <- frdAllPairsNemenyiTest(AllBac ~ LabNo | Sample, data = all_Bac)
print(posthoc)
##
## Pairwise comparisons using Nemenyi-Wilcoxon-Wilcox all-pairs test for a two-way balanced complete block design
## data: AllBac and LabNo and Sample
## Lab 1 Lab 2
## Lab 2 0.53 -
## Lab 3 4.6e-05 2.0e-07
##
## P value adjustment method: single-step
print("A Friedman test was conducted to determine whether laboratories reported different AllBac values while controlling for variation across samples. The test was significant (p < 0.001), indicating that at least one laboratory differed from the others.
Post-hoc Nemenyi pairwise comparisons showed that Lab 3 differed significantly from Lab 1 (p = 4.6 × 10⁻⁵) and Lab 2 (p = 2.0 × 10⁻⁷). However, no significant difference was detected between Lab 1 and Lab 2 (p = 0.53).")
## [1] "A Friedman test was conducted to determine whether laboratories reported different AllBac values while controlling for variation across samples. The test was significant (p < 0.001), indicating that at least one laboratory differed from the others.\n\nPost-hoc Nemenyi pairwise comparisons showed that Lab 3 differed significantly from Lab 1 (p = 4.6 × 10⁻⁵) and Lab 2 (p = 2.0 × 10⁻⁷). However, no significant difference was detected between Lab 1 and Lab 2 (p = 0.53)."
B. With the samples as blocks, perform a two-way ANOVA without replication to decide whether at least one laboratory provided higher/lower results than the others.
# Ensure factors
all_Bac$LabNo <- as.factor(all_Bac$LabNo)
all_Bac$Sample <- as.factor(all_Bac$Sample)
model_block <- aov(AllBac ~ LabNo + Sample, data = all_Bac)
summary(model_block)
## Df Sum Sq Mean Sq F value Pr(>F)
## LabNo 2 2774148 1387074 4.752 0.0141 *
## Sample 20 58100576 2905029 9.951 6.25e-10 ***
## Residuals 40 11676748 291919
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print("A randomized block ANOVA was conducted with Sample as a blocking factor to compare laboratory AllBac measurements. The effect of Lab was statistically significant, p = 0.014, indicating that at least one laboratory produced different results.")
## [1] "A randomized block ANOVA was conducted with Sample as a blocking factor to compare laboratory AllBac measurements. The effect of Lab was statistically significant, p = 0.014, indicating that at least one laboratory produced different results."
library(PMCMRplus)
# ---- Friedman: Do SAMPLES differ, blocking by LAB? ----
#blocking is another word for controlling
friedman_sample <- friedman.test(AllBac ~ Sample | LabNo, data = all_Bac)
cat("=== Friedman Test: Sample differences (blocking by Lab) ===\n")
## === Friedman Test: Sample differences (blocking by Lab) ===
print(friedman_sample)
##
## Friedman rank sum test
##
## data: AllBac and Sample and LabNo
## Friedman chi-squared = 59.205, df = 20, p-value = 9.447e-06
if (friedman_sample$p.value < 0.05) {
cat("\nResult: p-value =", signif(friedman_sample$p.value, 3),
" < 0.05 → Reject H0.\n",
"Interpretation: At least one SAMPLE differs from the others (controlling for Lab).\n\n")
} else {
cat("\nResult: p-value =", signif(friedman_sample$p.value, 3),
" >= 0.05 → Fail to reject H0.\n",
"Interpretation: No evidence of differences between SAMPLES (controlling for Lab).\n\n")
}
##
## Result: p-value = 9.45e-06 < 0.05 → Reject H0.
## Interpretation: At least one SAMPLE differs from the others (controlling for Lab).
# ---- Post-hoc Nemenyi: pairwise among Samples ----
cat("=== Post-hoc Nemenyi: Pairwise Sample comparisons (blocked by Lab) ===\n")
## === Post-hoc Nemenyi: Pairwise Sample comparisons (blocked by Lab) ===
posthoc_sample <- frdAllPairsNemenyiTest(AllBac ~ Sample | LabNo, data = all_Bac)
print(posthoc_sample)
##
## Pairwise comparisons using Nemenyi-Wilcoxon-Wilcox all-pairs test for a two-way balanced complete block design
## data: AllBac and Sample and LabNo
## 1 2 3 4 5 6 7 8 9 10 11 12
## 2 1.000 - - - - - - - - - - -
## 3 0.756 0.478 - - - - - - - - - -
## 4 0.478 0.232 1.000 - - - - - - - - -
## 5 0.838 0.583 1.000 1.000 - - - - - - - -
## 6 0.799 0.530 1.000 1.000 1.000 - - - - - - -
## 7 0.838 0.583 1.000 1.000 1.000 1.000 - - - - - -
## 8 1.000 1.000 0.855 0.609 0.914 0.887 0.914 - - - - -
## 9 1.000 1.000 0.989 0.914 0.996 0.993 0.996 1.000 - - - -
## 10 1.000 1.000 0.290 0.117 0.379 0.333 0.379 1.000 1.000 - - -
## 11 1.000 0.999 0.999 0.986 1.000 1.000 1.000 1.000 1.000 0.991 - -
## 12 1.000 1.000 0.989 0.914 0.996 0.993 0.996 1.000 1.000 1.000 1.000 -
## 13 1.000 1.000 0.556 0.290 0.660 0.609 0.660 1.000 1.000 1.000 1.000 1.000
## 14 1.000 1.000 0.914 0.709 0.954 0.936 0.954 1.000 1.000 1.000 1.000 1.000
## 15 1.000 0.997 1.000 0.995 1.000 1.000 1.000 1.000 1.000 0.979 1.000 1.000
## 16 0.999 0.983 1.000 0.999 1.000 1.000 1.000 1.000 1.000 0.926 1.000 1.000
## 17 0.107 0.034 1.000 1.000 1.000 1.000 1.000 0.168 0.478 0.013 0.733 0.478
## 18 0.290 0.117 1.000 1.000 1.000 1.000 1.000 0.403 0.778 0.053 0.936 0.778
## 19 0.183 0.065 1.000 1.000 1.000 1.000 1.000 0.270 0.634 0.027 0.855 0.634
## 20 0.968 0.838 1.000 1.000 1.000 1.000 1.000 0.989 1.000 0.660 1.000 1.000
## 21 0.991 0.926 1.000 1.000 1.000 1.000 1.000 0.998 1.000 0.799 1.000 1.000
## 13 14 15 16 17 18 19 20
## 2 - - - - - - - -
## 3 - - - - - - - -
## 4 - - - - - - - -
## 5 - - - - - - - -
## 6 - - - - - - - -
## 7 - - - - - - - -
## 8 - - - - - - - -
## 9 - - - - - - - -
## 10 - - - - - - - -
## 11 - - - - - - - -
## 12 - - - - - - - -
## 13 - - - - - - - -
## 14 1.000 - - - - - - -
## 15 0.999 1.000 - - - - - -
## 16 0.991 1.000 1.000 - - - - -
## 17 0.047 0.232 0.819 0.926 - - - -
## 18 0.154 0.504 0.968 0.993 1.000 - - -
## 19 0.088 0.356 0.914 0.974 1.000 1.000 - -
## 20 0.887 0.996 1.000 1.000 0.996 1.000 0.999 -
## 21 0.954 0.999 1.000 1.000 0.983 0.999 0.996 1.000
##
## P value adjustment method: single-step
# Optional: print matrix of p-values
if (!is.null(posthoc_sample$p.value)) {
cat("\nPairwise p-values (Samples):\n")
print(round(posthoc_sample$p.value, 4))
}
##
## Pairwise p-values (Samples):
## 1 2 3 4 5 6 7 8 9 10 11
## 2 1.0000 NA NA NA NA NA NA NA NA NA NA
## 3 0.7557 0.4783 NA NA NA NA NA NA NA NA NA
## 4 0.4783 0.2323 1.0000 NA NA NA NA NA NA NA NA
## 5 0.8376 0.5825 1.0000 1.0000 NA NA NA NA NA NA NA
## 6 0.7987 0.5302 1.0000 1.0000 1.0000 NA NA NA NA NA NA
## 7 0.8376 0.5825 1.0000 1.0000 1.0000 1.0000 NA NA NA NA NA
## 8 1.0000 1.0000 0.8553 0.6086 0.9142 0.8871 0.9142 NA NA NA NA
## 9 1.0000 1.0000 0.9889 0.9142 0.9960 0.9932 0.9960 1.0000 NA NA NA
## 10 1.0000 1.0000 0.2902 0.1174 0.3791 0.3331 0.3791 1.0000 0.9996 NA NA
## 11 1.0000 0.9992 0.9994 0.9860 0.9999 0.9997 0.9999 1.0000 1.0000 0.9913 NA
## 12 1.0000 1.0000 0.9889 0.9142 0.9960 0.9932 0.9960 1.0000 1.0000 0.9996 1.0000
## 13 1.0000 1.0000 0.5564 0.2902 0.6598 0.6086 0.6598 1.0000 1.0000 1.0000 0.9997
## 14 1.0000 1.0000 0.9142 0.7091 0.9543 0.9364 0.9543 1.0000 1.0000 1.0000 1.0000
## 15 0.9999 0.9970 0.9999 0.9948 1.0000 0.9999 1.0000 1.0000 1.0000 0.9785 1.0000
## 16 0.9992 0.9826 1.0000 0.9994 1.0000 1.0000 1.0000 0.9999 1.0000 0.9259 1.0000
## 17 0.1069 0.0339 1.0000 1.0000 0.9999 1.0000 0.9999 0.1677 0.4783 0.0130 0.7328
## 18 0.2902 0.1174 1.0000 1.0000 1.0000 1.0000 1.0000 0.4031 0.7777 0.0527 0.9364
## 19 0.1825 0.0651 1.0000 1.0000 1.0000 1.0000 1.0000 0.2700 0.6344 0.0269 0.8553
## 20 0.9681 0.8376 1.0000 1.0000 1.0000 1.0000 1.0000 0.9889 0.9999 0.6598 1.0000
## 21 0.9913 0.9259 1.0000 1.0000 1.0000 1.0000 1.0000 0.9978 1.0000 0.7987 1.0000
## 12 13 14 15 16 17 18 19 20
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA NA
## 7 NA NA NA NA NA NA NA NA NA
## 8 NA NA NA NA NA NA NA NA NA
## 9 NA NA NA NA NA NA NA NA NA
## 10 NA NA NA NA NA NA NA NA NA
## 11 NA NA NA NA NA NA NA NA NA
## 12 NA NA NA NA NA NA NA NA NA
## 13 1.0000 NA NA NA NA NA NA NA NA
## 14 1.0000 1.0000 NA NA NA NA NA NA NA
## 15 1.0000 0.9988 1.0000 NA NA NA NA NA NA
## 16 1.0000 0.9913 1.0000 1.0000 NA NA NA NA NA
## 17 0.4783 0.0473 0.2323 0.8187 0.9259 NA NA NA NA
## 18 0.7777 0.1539 0.5042 0.9681 0.9932 1.0000 NA NA NA
## 19 0.6344 0.0881 0.3557 0.9142 0.9737 1.0000 1.0000 NA NA
## 20 0.9999 0.8871 0.9960 1.0000 1.0000 0.9960 0.9999 0.9994 NA
## 21 1.0000 0.9543 0.9994 1.0000 1.0000 0.9826 0.9994 0.9960 1
# Pretty p-value function
p_fmt <- function(p) ifelse(p < .001, "< 0.001", paste0("= ", signif(p, 3)))
# ---- Summary sentence ----
cat("A Friedman test evaluated differences among SAMPLES while blocking by Lab.\n",
"Results were ", ifelse(friedman_sample$p.value < 0.05, "significant", "not significant"),
" (p ", p_fmt(friedman_sample$p.value), ").\n",
"No post-hoc sample comparisons were statistically significant after Nemenyi adjustment.\n",
sep = "")
## A Friedman test evaluated differences among SAMPLES while blocking by Lab.
## Results were significant (p < 0.001).
## No post-hoc sample comparisons were statistically significant after Nemenyi adjustment.
C. If you determine that there is a difference among the labs, use a multiple comparison test (e.g., Tukey, Holm’s) to determine which labs differ from others
tk <- TukeyHSD(model_block, "LabNo")
cat("Tukey post-hoc tests showed that Lab 3 produced significantly higher AllBac values than Lab 1 (p =",
signif(tk$LabNo["Lab 3-Lab 1", "p adj"], 3), ") and Lab 2 (p =",
signif(tk$LabNo["Lab 3-Lab 2", "p adj"], 3), "). No significant difference was detected between Lab 1 and Lab 2 (p =",
signif(tk$LabNo["Lab 2-Lab 1", "p adj"], 3), ").\n")
## Tukey post-hoc tests showed that Lab 3 produced significantly higher AllBac values than Lab 1 (p = 0.03 ) and Lab 2 (p = 0.0277 ). No significant difference was detected between Lab 1 and Lab 2 (p = 0.999 ).
D. Briefly summarize the results and conclusions of your analysis
Note: Section 7.8.2 in the Helsel et al textbook should be helpful if you need more information on Friedman test
Note: Section 7.8.5 in the Helsel et al textbook should be helpful if you need more information on ART
Note: Section 7.8.7 in the Helsel et al textbook should be helpful if you need more information on Two-factor ANOVA without replication
A Friedman test was conducted to determine whether laboratories reported different AllBac values while controlling for variation across samples. The test was significant (p < 0.001), indicating that at least one laboratory differed from the others. Post-hoc Nemenyi pairwise comparisons showed that Lab 3 produced significantly higher AllBac values than both Lab 1 (p = 4.6 × 10⁻⁵) and Lab 2 (p = 2.0 × 10⁻⁷), while Lab 1 and Lab 2 did not differ significantly (p = 0.53).
To further verify these results using a parametric approach, a randomized block ANOVA was conducted with Sample as a blocking factor. The effect of Lab was statistically significant (p = 0.014), again indicating laboratory differences. Tukey post-hoc tests confirmed that Lab 3 yielded significantly higher AllBac concentrations than Lab 1 (p = 0.03) and Lab 2 (p = 0.0277), with no difference between Lab 1 and Lab 2 (p = 0.999).
A separate Friedman test evaluated differences among samples while blocking by laboratory. The overall effect was statistically significant (p < 0.001), and Nemenyi post-hoc pairwise comparisons revealed a few significant differences between individual samples after multiple-comparison adjustment (sample 17 vs. samples 2, 10, and 13, and sample 19 vs. sample 10).
Challenge 2
Harmel et al. [2006] compiled a cross-system data set to study the effects of agriculture activities on water quality. The data in the study were mostly field scale experiments that measured nutrients (P, N) leaving a field. In particular, they measured TP loading (TPLoad, in kg/ha), land use (LU), tillage method (Tillage), and fertilizer application methods (FAppMethd). These data can be found in agWQdata.csv (in CatCourses)
CQ2. Using this information and dataset, determine whether tillage methods affect TP loading. Please include the following in your analysis:
A. Estimate the mean TP loading for each tillage method
ag_data <- read.csv("C:/Users/noahs/Downloads/agWQdata.csv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mean_TP <- tapply(ag_data$TPLoad, ag_data$Tillage, mean, na.rm = TRUE)
mean_conservation <- mean_TP["Conservation"]
mean_conventional <- mean_TP["Conventional"]
mean_notill <- mean_TP["No Till"]
mean_pasture <- mean_TP["Pasture"]
mean_conservation
## Conservation
## 2.536976
mean_conventional
## Conventional
## 4.153237
mean_notill
## No Till
## 1.028211
mean_pasture
## Pasture
## 0.6519302
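The same group means can be computed with dplyr, which is already loaded (a supplementary sketch):

#mean TP loading per tillage method, dplyr style
ag_data %>%
  group_by(Tillage) %>%
  summarise(mean_TPLoad = mean(TPLoad, na.rm = TRUE))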
B. Explore two different data transformations: (1) log TP loading (log(TPLoad)) and (2) the square root of the square root of TP loading (TPLoad^0.25). Briefly discuss whether a transformation seems necessary. If so, state which transformation seems most appropriate.
TPLoad <- ag_data$TPLoad
log_TPLoad <- log(ag_data$TPLoad)
sqrt_TPLoad <- sqrt(ag_data$TPLoad)
shapiro.test(TPLoad)
##
## Shapiro-Wilk normality test
##
## data: TPLoad
## W = 0.53498, p-value < 2.2e-16
shapiro.test(log_TPLoad)
##
## Shapiro-Wilk normality test
##
## data: log_TPLoad
## W = 0.98735, p-value = 0.04651
shapiro.test(sqrt_TPLoad)
##
## Shapiro-Wilk normality test
##
## data: sqrt_TPLoad
## W = 0.83442, p-value = 1.192e-14
hist(TPLoad)
hist(log_TPLoad)
hist(sqrt_TPLoad)
#p > 0.05 → residuals are approximately normal
#p < 0.05 → residuals deviate from normality
print("From the histograms, log(TPLoad) seems like the best to use.")
## [1] "From the histograms, log(TPLoad) seems like the best to use."
C. Use statistical test(s) to study whether different tillage methods resulted in different TP loading (state the null and alternative hypothesis, conduct the test, report the result).
cat("H0: There is no difference in mean TP loading among tillage methods.\n")
## H0: There is no difference in mean TP loading among tillage methods.
cat("HA: At least one tillage method has a different mean TP loading than the others.")
## HA: At least one tillage method has a different mean TP loading than the others.
#Testing assumptions for a one-way ANOVA
#normality
#TPLoad does not pass the Shapiro test, so we will use a nonparametric test
#independent observations
#passes
#homoscedasticity
library(car)
leveneTest(TPLoad ~ Tillage, data = ag_data)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 8.7523 1.661e-05 ***
## 218
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#p > 0.05 → variances are equal → homoscedasticity satisfied
#p < 0.05 → variances differ → assumption violated
#fails
#We will proceed with the Kruskal-Wallis test.
kruskal.test(TPLoad ~ Tillage, data = ag_data)
##
## Kruskal-Wallis rank sum test
##
## data: TPLoad by Tillage
## Kruskal-Wallis chi-squared = 59.354, df = 3, p-value = 8.076e-13
cat("A Kruskal–Wallis rank-sum test was conducted to determine whether TP loading differed among tillage methods. The test was highly significantp = 8.08E-13), indicating that at least one tillage method had a different distribution of TP loading.")
## A Kruskal–Wallis rank-sum test was conducted to determine whether TP loading differed among tillage methods. The test was highly significantp = 8.08E-13), indicating that at least one tillage method had a different distribution of TP loading.
dunnTest(TPLoad ~ Tillage, data = ag_data, method = "holm")
## Warning: Tillage was coerced to a factor.
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Holm method.
## Comparison Z P.unadj P.adj
## 1 Conservation - Conventional -2.4850854 1.295203e-02 3.885610e-02
## 2 Conservation - No Till 0.2806557 7.789745e-01 7.789745e-01
## 3 Conventional - No Till 2.1810727 2.917804e-02 5.835608e-02
## 4 Conservation - Pasture 3.8075203 1.403672e-04 7.018360e-04
## 5 Conventional - Pasture 7.6485613 2.032404e-14 1.219442e-13
## 6 No Till - Pasture 2.5433227 1.098038e-02 4.392150e-02
cat("Dunn post-hoc tests showed that Conventional tillage resulted in significantly higher TP loading than Conservation and Pasture, while Pasture had significantly lower TP loading than all other tillage methods. No significant difference was found between Conservation and No-Till, and the difference between Conventional and No-Till was not significant at α = 0.05.")
## Dunn post-hoc tests showed that Conventional tillage resulted in significantly higher TP loading than Conservation and Pasture, while Pasture had significantly lower TP loading than all other tillage methods. No significant difference was found between Conservation and No-Till, and the difference between Conventional and No-Till was not significant at α = 0.05.
D. Briefly summarize the results and conclusions of your analysis. In other words, based on your analysis, are there differences in TP loading under different tillage methods?
Our analysis examined whether TP loading differed among tillage methods. Because the assumptions of ANOVA were violated (non-normality and unequal variances), we used a Kruskal–Wallis rank-sum test. The test indicated a highly significant difference in TP loading across tillage methods (p = 8.08e-13). Dunn post-hoc comparisons showed that Conventional tillage produced significantly higher TP loading than both Conservation and Pasture. Pasture systems had significantly lower TP loading than all other tillage methods. No significant difference was detected between Conservation and No-Till, and the difference between Conventional and No-Till was not significant at α = 0.05.
Note: Section 2.4.3 in the Qian textbook and Sections 1.7 and 5.3.4 in the Helsel et al textbook should be helpful if you need a refresher on transformations