The data for this recipie is from a study in a research article on The effects of irrigation water salinity and N rate on distribution of cotton root length density (RLD) at different soil depths in 2011 and 2012. For this recipie, we will only examine the soil depth of 0-20cm. The data for this can be found below, and the entire dataset can be found in table 4 of the link above.
Read in data set:
data <- read.csv("~/RLD.csv")
View(data)
str(data)
## 'data.frame': 12 obs. of 4 variables:
## $ N.rate : Factor w/ 2 levels "N0","N360": 1 1 1 2 2 2 1 1 1 2 ...
## $ Water.salinity: Factor w/ 3 levels "BW","FW","SW": 2 1 3 2 1 3 2 1 3 2 ...
## $ Year : int 2011 2011 2011 2011 2011 2011 2012 2012 2012 2012 ...
## $ RLD : num 0.708 0.611 0.652 0.527 0.663 0.604 0.692 0.591 0.611 0.531 ...
This experiment has 3 factors, N.rate, Water.salinity, and Year (again, the article also included soil depth but our data does not). N.rate has 2 levels (N0 and N360) Water Salinity has 3 levels (FW, BW, and SW) Year has 2 levels (2011 and 2012)
data$N.rate=as.factor(data$N.rate)
data$Water.salinity=as.factor(data$Water.salinity)
data$Year=as.factor(data$Year)
The continuous variable in this experiment is root length density
The response variable for this experiment is also root length density
There are 12 observations, 3 factors, and 1 response.
Structure of the dataset:
str(data)
## 'data.frame': 12 obs. of 4 variables:
## $ N.rate : Factor w/ 2 levels "N0","N360": 1 1 1 2 2 2 1 1 1 2 ...
## $ Water.salinity: Factor w/ 3 levels "BW","FW","SW": 2 1 3 2 1 3 2 1 3 2 ...
## $ Year : Factor w/ 2 levels "2011","2012": 1 1 1 1 1 1 2 2 2 2 ...
## $ RLD : num 0.708 0.611 0.652 0.527 0.663 0.604 0.692 0.591 0.611 0.531 ...
data
## N.rate Water.salinity Year RLD
## 1 N0 FW 2011 0.708
## 2 N0 BW 2011 0.611
## 3 N0 SW 2011 0.652
## 4 N360 FW 2011 0.527
## 5 N360 BW 2011 0.663
## 6 N360 SW 2011 0.604
## 7 N0 FW 2012 0.692
## 8 N0 BW 2012 0.591
## 9 N0 SW 2012 0.611
## 10 N360 FW 2012 0.531
## 11 N360 BW 2012 0.636
## 12 N360 SW 2012 0.556
A two-factor ANOVA will be performed with N.rate and Water.salinity as the factors. The null hypothesis is that the variation in RLD cannot be explained by anything other than randomization, or in this case, cannot be explained b N.rate and Water.salinity.
This design was choosen to see the effects of N.rate and Water.salinity on RLD.
The field experiment used a factorial, completely randomized block design.
There are repeated measures, each N.rate and Water.salinity combination is measured twice (in 2011 and 2012)
The data is blocked for year.
Summary Statistics:
summary(data)
## N.rate Water.salinity Year RLD
## N0 :6 BW:4 2011:6 Min. :0.527
## N360:6 FW:4 2012:6 1st Qu.:0.582
## SW:4 Median :0.611
## Mean :0.615
## 3rd Qu.:0.655
## Max. :0.708
Histogram of Root Length Density:
hist(data$RLD)
Mean of Root Length Density by N.rate, water salinity and year:
tapply(data$RLD, data$N.rate, mean)
## N0 N360
## 0.6442 0.5862
tapply(data$RLD, data$Water.salinity, mean)
## BW FW SW
## 0.6252 0.6145 0.6058
tapply(data$RLD, data$Year, mean)
## 2011 2012
## 0.6275 0.6028
RLD is highest in N0, BW and 2011.
Boxplots:
boxplot(data$RLD~data$N.rate, xlab="N Rate", ylab="Root Length Density")
boxplot(data$RLD~data$Water.salinity, xlab="Water salinity", ylab="Root Length Density")
There appears to be a difference of RLD levels between the N.rate For Water.salinity, it appears BW may result in a higher RLD, while there may be no significant difference between FW and SW. It is obvious that FW results in a wide range of RLD values.
#Model 1: Anova model for effects of N.Rate and Water.salinity blocked by year:
model1=aov(data$RLD~data$N.rate*data$Water.salinity+data$Year)
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$N.rate 1 0.01009 0.01009 58.2 0.00062 ***
## data$Water.salinity 2 0.00076 0.00038 2.2 0.20647
## data$Year 1 0.00183 0.00183 10.5 0.02287 *
## data$N.rate:data$Water.salinity 2 0.02415 0.01208 69.6 0.00022 ***
## Residuals 5 0.00087 0.00017
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again, the null hypothesis is that the variation in RLD cannot be explained by anything other than randomization. Since we got low p-values for N.rate and the interation between N.rate and Water.salinity, this means it is likely that N.rate alone as well as the interation between N.rate and Water.salinity explains the variation in RLD.
However, it is unlikely that Water.salinity alone explains the variation in RLD.
Since the p-value for year is small, it is likely year has an effect on the RLD response. But since we blocked for year, we do not care about its effect (nuisance variable).
#Model 2: Anova model for effect of N.Rate alone blocked by year:
model2=aov(data$RLD~data$N.rate+data$Year)
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## data$N.rate 1 0.01009 0.01009 3.52 0.093 .
## data$Year 1 0.00183 0.00183 0.64 0.445
## Residuals 9 0.02578 0.00286
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this model, only randomization can explain the variation in RLD.
#Tukey test for Model 1
tukey1<-TukeyHSD(model1)
plot(tukey1)
TukeyHSD(model1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = data$RLD ~ data$N.rate * data$Water.salinity + data$Year)
##
## $`data$N.rate`
## diff lwr upr p adj
## N360-N0 -0.058 -0.07755 -0.03845 6e-04
##
## $`data$Water.salinity`
## diff lwr upr p adj
## FW-BW -0.01075 -0.04106 0.01956 0.5262
## SW-BW -0.01950 -0.04981 0.01081 0.1858
## SW-FW -0.00875 -0.03906 0.02156 0.6417
##
## $`data$Year`
## diff lwr upr p adj
## 2012-2011 -0.02467 -0.04422 -0.005116 0.0229
##
## $`data$N.rate:data$Water.salinity`
## diff lwr upr p adj
## N360:BW-N0:BW 0.0485 -0.007696 0.104696 0.0857
## N0:FW-N0:BW 0.0990 0.042804 0.155196 0.0045
## N360:FW-N0:BW -0.0720 -0.128196 -0.015804 0.0185
## N0:SW-N0:BW 0.0305 -0.025696 0.086696 0.3309
## N360:SW-N0:BW -0.0210 -0.077196 0.035196 0.6335
## N0:FW-N360:BW 0.0505 -0.005696 0.106696 0.0743
## N360:FW-N360:BW -0.1205 -0.176696 -0.064304 0.0018
## N0:SW-N360:BW -0.0180 -0.074196 0.038196 0.7452
## N360:SW-N360:BW -0.0695 -0.125696 -0.013304 0.0214
## N360:FW-N0:FW -0.1710 -0.227196 -0.114804 0.0003
## N0:SW-N0:FW -0.0685 -0.124696 -0.012304 0.0227
## N360:SW-N0:FW -0.1200 -0.176196 -0.063804 0.0019
## N0:SW-N360:FW 0.1025 0.046304 0.158696 0.0039
## N360:SW-N360:FW 0.0510 -0.005196 0.107196 0.0717
## N360:SW-N0:SW -0.0515 -0.107696 0.004696 0.0692
When the line for a factor pair crosses zero, that indicates we fail to reject the null hypothesis that there is no difference in the mean of that combination of pairs. For example, the difference of mean RLD of the N360:BW pair and the N0:BW pair cannot be explained by anything other than randomization . This can also be inferred from the p-value (.0857).
From either the tukey plot or output, we can tell that there are some treatments are significantly different from eachtother (low p-value or the confidence interval does not cross zero). It is possible that these treatement differences can explain the variation in the response (RLD).
#Tukey Test for differences in water salinity treatment alone:
tukey2<-TukeyHSD(aov(data$RLD~data$Water.salinity))
plot(tukey2)
This plot suggests there is no difference in means between the 3 levels of water salinity
Visually inspect normality of data:
qqnorm(residuals(model1))
qqline(residuals(model1))
The data appears to be normal! We will double check the assumption of the ANOVA model with a Shapiro-Wilks Normality test:
shapiro.test(residuals(model1))
##
## Shapiro-Wilk normality test
##
## data: residuals(model1)
## W = 0.9802, p-value = 0.9844
Null hypothesis: The data came from a normally distributed population. There is a high probability that the data came from a normally distributed population (pvalue=.9844). We fail to reject the null.
plot(fitted(model1),residuals(model1))
The residuals are scattered and equally distributed about the zero axis indicating a good model fit.
#Interaction Plot
We create an interaction plot to view the interactions between the factors.
interaction.plot(data$N.rate, data$Water.salinity, data$RLD)
There is interaction among all the factors, evident by the different slopes and intersecting lines
Although the data for this experiment tured out to be normal, if the test had come back not normal, Kruskal-Wallis one-way analysis of variance by Rank Sum Test could be performed:
kruskal.test(data$RLD~data$N.rate)
##
## Kruskal-Wallis rank sum test
##
## data: data$RLD by data$N.rate
## Kruskal-Wallis chi-squared = 2.573, df = 1, p-value = 0.1087
kruskal.test(data$RLD~data$Water.salinity)
##
## Kruskal-Wallis rank sum test
##
## data: data$RLD by data$Water.salinity
## Kruskal-Wallis chi-squared = 0.2412, df = 2, p-value = 0.8864
kruskal.test(data$RLD~data$Year)
##
## Kruskal-Wallis rank sum test
##
## data: data$RLD by data$Year
## Kruskal-Wallis chi-squared = 0.5211, df = 1, p-value = 0.4704
The null hypothesis of the kruskal test is that the mean ranks of the samples from the populations are expected to the same (this is not the same as saying the populations have identical means). Due to the high p-values, we fail to reject the null, but, since our classic anova model assumptions were fulfilled, we can use the results reached in the previous sections.
None used.
http://www.sciencedirect.com/science/article/pii/S0378429014002524 (see Section 3.2 and table 4)
All included above.