Overall Linear Effects Equation

$ Y_{ij}=+i+{ij} $

$H_0:\mu_1=\mu_2=\mu_3=\mu_4$ vs $H_a: \textbf{not all} \mu_i \textbf{are equal}$

response<-c(.34,.12,1.23,.7,1.75,.12,.91,2.94,2.14,2.36,2.86,4.55,6.31,8.37,9.75,6.09,9.82,7.24,17.15,11.82,10.97,17.20,14.35,16.82)
factor<-c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
df<-data.frame(factor,response)

boxplot(response~factor,xlab = "Estimation Method",ylab = "Peak Discharge (cfs)",main = "Boxplot of log10 Peak Discharge (cfs) by Estimation Method")

Normal Q-Q Plot

The data does not appear to be normally distributed, since it’s the boxplots are different in size, moreoever, there shows a spread increasing with the mean (Method 1 < 2 < 3 < 4). Thus the equal-variance assumption is violated (heteroscedasticity).

model<-aov(response~factor,data=df)
summary(model)

##             Df Sum Sq Mean Sq F value  Pr(>F)    
## factor       1  672.0   672.0   149.9 2.7e-11 ***
## Residuals   22   98.6     4.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(model)

One-Way ANOVA on Raw Peak Discharge by Method

The factor effect is not significant: $F=0.874$ with $p=0.36>0.05$ . Thus, at the 5% level you fail to reject$H_0$ and there is no statistical evidence (in this model) that the methods have different mean peak discharges. The diagnostics are acceptable: the Q–Q plot is close to a straight line (residuals approximately normal), the Residuals vs Fitted and Scale–Location plots show no strong trend (variance roughly constant), and the Residuals vs Leverage plot indicates no influential points crossing Cook’s distance. In short, under these model assumptions and this fit, the groups do not differ significantly.

library(MASS)
boxcox(response~factor,data=df)

tran_resp<-(sqrt(response)-1)/.5
df2<-data.frame(tran_resp,factor)
model<-aov(tran_resp~factor,data=df2)
summary(model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## factor       1  130.1  130.12   251.2 1.62e-13 ***
## Residuals   22   11.4    0.52                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(model)

Analysis on Parametric

Using a variance-stabilizing transform suggested by the Box–Cox profile (peak near a power of about 0.5), the response was square-root transformed, $y^{*} = (\sqrt{y} − 1)/0.5$, and re-fit with a one-way ANOVA. The factor effect is highly significant on the transformed scale (F ≈ 251.2, p ≈ 1.6×10⁻¹³), indicating the four estimation methods do not produce the same mean peak discharge. Diagnostic plots for the transformed model show residuals that are approximately normal, with much more even spread across fitted values and no influential points, so model assumptions are reasonable after transformation. Overall, after stabilizing variance, there is overwhelming evidence of method differences in estimated peak discharge.

kruskal.test(response~factor,data=df)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  response by factor
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

Analysis on Kruskal Test

The Kruskal–Wallis rank-sum test comparing the four methods yields χ² = 6.22 with df = 3 and p = 0.1015. At α = 0.05, this p-value exceeds the threshold, so we fail to reject the null hypothesis that the groups come from the same distribution, for example, their population medians are equal. Thus, the nonparametric analysis does not find a statistically significant difference among methods, and post-hoc pairwise tests are not indicated. Because the test uses ranks and the sample size is small (n = 6 per group), power to detect moderate differences may be limited.

Final Code

response<-c(.34,.12,1.23,.7,1.75,.12,.91,2.94,2.14,2.36,2.86,4.55,6.31,8.37,9.75,6.09,9.82,7.24,17.15,11.82,10.97,17.20,14.35,16.82)
factor<-c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))

boxplot(response~factor,xlab = "Estimation Method",ylab = "Peak Discharge (cfs)",main = "Boxplot of log10 Peak Discharge (cfs) by Estimation Method")

model<-aov(response~factor,data=df)
summary(model)

##             Df Sum Sq Mean Sq F value  Pr(>F)    
## factor       1  672.0   672.0   149.9 2.7e-11 ***
## Residuals   22   98.6     4.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(model)

library(MASS)
boxcox(response~factor,data=df)

tran_resp<-(sqrt(response)-1)/.5
df2<-data.frame(tran_resp,factor)
model<-aov(tran_resp~factor,data=df2)
summary(model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## factor       1  130.1  130.12   251.2 1.62e-13 ***
## Residuals   22   11.4    0.52                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(model)

kruskal.test(response~factor,data=df)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  response by factor
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

New_Assignment_10

Chelsea Saucedo

2025-10-07