names(hg)
##  [1] "Specimen.ID"     "Patient.Name"    "Collection.Date" "Population"
##  [5] "Age"             "Sex"             "Specimen.Type"   "Resulted"
##  [9] "Test"            "Component"       "Value"           "Units"
## [13] "Converted"
summary(aov(hg$Converted~hg$Population*hg$Sex))
##                      Df Sum Sq Mean Sq F value   Pr(>F)
## hg$Population         2 0.8893  0.4447  16.430 5.36e-06 ***
## hg$Sex                1 0.0018  0.0018   0.065     0.80
## hg$Population:hg$Sex  2 0.0392  0.0196   0.725     0.49
## Residuals            42 1.1367  0.0271
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##         C      AWBP       EMP
## 0.0112283 0.1192812 0.3560838
##         F         M
## 0.1866083 0.1741481
##                F          M
## C    0.006681132 0.01577547
## AWBP 0.082393396 0.15281561
## EMP  0.380786793 0.32079380
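The three blocks of output above appear to be group means of `Converted` by population, by sex, and by population and sex; the calls that produced them are not shown. As a minimal sketch, tables of this shape can be produced with `tapply()`. The data frame `toy` and all of its values are made up for illustration and are not the study data.

```r
# Hypothetical toy data; values are invented for illustration only.
toy <- data.frame(
  Population = factor(rep(c("C", "AWBP", "EMP"), each = 4),
                      levels = c("C", "AWBP", "EMP")),
  Sex        = factor(rep(c("F", "M"), times = 6)),
  Converted  = c(0.01, 0.02, 0.01, 0.02,
                 0.10, 0.14, 0.08, 0.12,
                 0.40, 0.30, 0.36, 0.34)
)

# Mean by population, by sex, and for each population-by-sex cell.
pop_means  <- tapply(toy$Converted, toy$Population, mean)
sex_means  <- tapply(toy$Converted, toy$Sex, mean)
cell_means <- tapply(toy$Converted, list(toy$Population, toy$Sex), mean)

print(pop_means)
print(sex_means)
print(cell_means)
```

Passing a list of two factors to `tapply()` returns a matrix with one row per population and one column per sex, which matches the layout of the last table.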
From the reviewer:
“This appears to be an unbalanced 2-way factorial with population and sex as fixed effects and (possibly) interaction, for which a general linear model comprising the main effects and interaction would be appropriate. There presently is no mention of interaction or testing the 2-way (location-sex) means. Also note that sex differences that were not consistent across the sites would have been detected as interaction. Please revise this section to more clearly describe what tests were conducted. In addition, ANOVA assumes normality and homogeneity of variance. There is no mention of testing for either, but Fig. 1 shows the skewness and increasing variance with concentration that typify a lognormal distribution, the central tendency of which is best represented by the geometric mean. Such distributions are expected for concentration data, especially for substances that biomagnify (a multiplicative process). Log transformation usually resolves both issues. Unless the data are shown to be normally distributed (or close) and the variances reasonably similar, the analyses should be repeated after log transformation (which may or may not change the conclusions, but which should be done anyway).”
I addressed the reviewer’s concerns by first transforming the data. Because all of the values are less than 1, a plain log transformation would make every transformed value negative, so I used a log(x + 1) transformation instead.
log.const.conv = log(hg$Converted+1)
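The reviewer also asked whether normality and homogeneity of variance had been checked. One possible way to do this, before and after the transformation, is sketched below. It uses simulated right-skewed data, since the study data are not reproduced here; `toy`, `grp`, `conc`, and the simulation settings are all assumptions made for illustration. Shapiro-Wilk and Bartlett tests are one common choice, not necessarily the checks used in this analysis.

```r
set.seed(1)

# Hypothetical right-skewed concentrations in three groups (not the study data).
toy <- data.frame(
  grp  = factor(rep(c("C", "AWBP", "EMP"), each = 16),
                levels = c("C", "AWBP", "EMP")),
  conc = c(rlnorm(16, meanlog = log(0.01), sdlog = 0.5),
           rlnorm(16, meanlog = log(0.10), sdlog = 0.5),
           rlnorm(16, meanlog = log(0.35), sdlog = 0.5))
)
toy$log_conc <- log1p(toy$conc)  # log1p(x) computes log(x + 1)

# One-way fits on the raw and transformed scales.
fit_raw <- aov(conc ~ grp, data = toy)
fit_log <- aov(log_conc ~ grp, data = toy)

# Normality of the residuals (Shapiro-Wilk test).
shapiro_raw <- shapiro.test(residuals(fit_raw))
shapiro_log <- shapiro.test(residuals(fit_log))

# Homogeneity of variance across groups (Bartlett test).
bartlett_raw <- bartlett.test(conc ~ grp, data = toy)
bartlett_log <- bartlett.test(log_conc ~ grp, data = toy)

print(c(raw = shapiro_raw$p.value, log1p = shapiro_log$p.value))
print(c(raw = bartlett_raw$p.value, log1p = bartlett_log$p.value))
```

Because log(x + 1) is close to x when x is small, it is worth confirming with checks like these that the transformed values actually meet the assumptions, rather than assuming the transformation has resolved them.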
I then used a linear model to fit a factorial ANOVA. With R's default treatment contrasts, this estimates an intercept for the reference cell, a main-effect term for each non-reference level of population and of sex, and terms for the interaction between population and sex.
logmodel = aov(log.const.conv~pop2*hg$Sex)
summary.lm(logmodel)
##
## Call:
## aov(formula = log.const.conv ~ pop2 * hg$Sex)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.22341 -0.04188 -0.00620  0.03185  0.40541
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)       0.006648   0.050762   0.131    0.896
## pop2AWBP          0.072036   0.062170   1.159    0.253
## pop2EMP           0.300137   0.062170   4.828 1.86e-05 ***
## hg$SexM           0.008951   0.071788   0.125    0.901
## pop2AWBP:hg$SexM  0.052971   0.087253   0.607    0.547
## pop2EMP:hg$SexM  -0.052140   0.091007  -0.573    0.570
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1135 on 42 degrees of freedom
## Multiple R-squared: 0.5176, Adjusted R-squared: 0.4601
## F-statistic: 9.012 on 5 and 42 DF, p-value: 7.042e-06
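The coefficient names (`pop2AWBP`, `hg$SexM`, and so on) suggest treatment contrasts with C females as the reference cell; this is read off the output, since the contrast setting itself is not shown. Under that reading, each fitted cell mean on the log(x + 1) scale is a sum of coefficients. The sketch below does this arithmetic with the estimates copied from the table above; the names in `b` are shorthand introduced here.

```r
# Estimates copied from the coefficient table above.
b <- c(intercept = 0.006648, AWBP = 0.072036, EMP = 0.300137,
       SexM = 0.008951, AWBP_SexM = 0.052971, EMP_SexM = -0.052140)

# Fitted cell means on the log(x + 1) scale, assuming treatment contrasts
# with C females as the reference cell.
cells <- matrix(
  c(b[["intercept"]],
    b[["intercept"]] + b[["SexM"]],
    b[["intercept"]] + b[["AWBP"]],
    b[["intercept"]] + b[["AWBP"]] + b[["SexM"]] + b[["AWBP_SexM"]],
    b[["intercept"]] + b[["EMP"]],
    b[["intercept"]] + b[["EMP"]] + b[["SexM"]] + b[["EMP_SexM"]]),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("C", "AWBP", "EMP"), c("F", "M"))
)
print(cells)

# Back-transform with expm1(), the inverse of log1p(). These values are not
# arithmetic means of the raw data; they are exp(mean(log(x + 1))) - 1.
print(expm1(cells))
```

Read this way, the intercept is the fitted value for C females, `pop2EMP` is the EMP minus C difference among females, and the interaction terms describe how the male minus female difference changes in AWBP and EMP relative to C.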
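These results suggest that there is no interaction between population and sex (p = 0.547 and p = 0.570 for the two interaction terms). Relative to the reference level, WSSP (aka C), EMP is significantly higher (p = 1.86e-05), whereas AWBP is not (p = 0.253). This table compares each population with the reference level only, so the difference between EMP and AWBP, although suggested by the estimates (0.300 vs. 0.072), is not tested directly here.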
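The coefficient table compares each population with the reference level only. A direct comparison of every pair of populations could be obtained with a post hoc test such as Tukey's HSD. The sketch below is a possible follow-up, not part of the analysis reported here. It uses simulated, balanced data; `toy`, `mu`, and the noise level are assumptions made for illustration.

```r
set.seed(42)

# Hypothetical balanced design: 3 populations x 2 sexes x 8 replicates.
toy <- data.frame(
  pop = factor(rep(c("C", "AWBP", "EMP"), each = 16),
               levels = c("C", "AWBP", "EMP")),
  sex = factor(rep(c("F", "M"), times = 24))
)

# Invented population means on the transformed scale, plus normal noise.
mu <- c(C = 0.01, AWBP = 0.10, EMP = 0.30)
toy$y <- unname(mu[as.character(toy$pop)]) + rnorm(nrow(toy), sd = 0.05)

# Factorial ANOVA, then all pairwise population comparisons.
fit <- aov(y ~ pop * sex, data = toy)
tuk <- TukeyHSD(fit, which = "pop")

# Rows are pairs of levels; columns are diff, lwr, upr, and adjusted p value.
print(tuk$pop)
```

`TukeyHSD()` needs the factors to be supplied as plain model terms, which is easiest when the model is fitted with a `data =` argument as above. The reviewer describes the real design as unbalanced, so intervals from this approach would be approximate there.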