Recipie for RCBD

Ali Svoobda

RPI

10/16/14 V.1

1. Setting

System under test

The data for this recipie is from a study in a research article on The effects of irrigation water salinity and N rate on distribution of cotton root length density (RLD) at different soil depths in 2011 and 2012. For this recipie, we will only examine the soil depth of 0-20cm. The data for this can be found below, and the entire dataset can be found in table 4 of the link above.

Read in data set:

data <- read.csv("~/RLD.csv")
View(data)
str(data)

## 'data.frame':    12 obs. of  4 variables:
##  $ N.rate        : Factor w/ 2 levels "N0","N360": 1 1 1 2 2 2 1 1 1 2 ...
##  $ Water.salinity: Factor w/ 3 levels "BW","FW","SW": 2 1 3 2 1 3 2 1 3 2 ...
##  $ Year          : int  2011 2011 2011 2011 2011 2011 2012 2012 2012 2012 ...
##  $ RLD           : num  0.708 0.611 0.652 0.527 0.663 0.604 0.692 0.591 0.611 0.531 ...

Factors and Levels

This experiment has 3 factors, N.rate, Water.salinity, and Year (again, the article also included soil depth but our data does not). N.rate has 2 levels (N0 and N360) Water Salinity has 3 levels (FW, BW, and SW) Year has 2 levels (2011 and 2012)

data$N.rate=as.factor(data$N.rate)
data$Water.salinity=as.factor(data$Water.salinity)
data$Year=as.factor(data$Year)

Continuous Variables

The continuous variable in this experiment is root length density

Response Variables

The response variable for this experiment is also root length density

The Data: How is it organized and what does it look like?

There are 12 observations, 3 factors, and 1 response.

Structure of the dataset:

str(data)

## 'data.frame':    12 obs. of  4 variables:
##  $ N.rate        : Factor w/ 2 levels "N0","N360": 1 1 1 2 2 2 1 1 1 2 ...
##  $ Water.salinity: Factor w/ 3 levels "BW","FW","SW": 2 1 3 2 1 3 2 1 3 2 ...
##  $ Year          : Factor w/ 2 levels "2011","2012": 1 1 1 1 1 1 2 2 2 2 ...
##  $ RLD           : num  0.708 0.611 0.652 0.527 0.663 0.604 0.692 0.591 0.611 0.531 ...

data

##    N.rate Water.salinity Year   RLD
## 1      N0             FW 2011 0.708
## 2      N0             BW 2011 0.611
## 3      N0             SW 2011 0.652
## 4    N360             FW 2011 0.527
## 5    N360             BW 2011 0.663
## 6    N360             SW 2011 0.604
## 7      N0             FW 2012 0.692
## 8      N0             BW 2012 0.591
## 9      N0             SW 2012 0.611
## 10   N360             FW 2012 0.531
## 11   N360             BW 2012 0.636
## 12   N360             SW 2012 0.556

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

A two-factor ANOVA will be performed with N.rate and Water.salinity as the factors. The null hypothesis is that the variation in RLD cannot be explained by anything other than randomization, or in this case, cannot be explained b N.rate and Water.salinity.

What is the Rationale for this design?

This design was choosen to see the effects of N.rate and Water.salinity on RLD.

Randomize: What is the Randomization Scheme?

The field experiment used a factorial, completely randomized block design.

Replicate: Are there replicates and/or repeated measures?

There are repeated measures, each N.rate and Water.salinity combination is measured twice (in 2011 and 2012)

Block: Did you use blocking in the design?

The data is blocked for year.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Summary Statistics:

summary(data)

##   N.rate  Water.salinity   Year        RLD       
##  N0  :6   BW:4           2011:6   Min.   :0.527  
##  N360:6   FW:4           2012:6   1st Qu.:0.582  
##           SW:4                    Median :0.611  
##                                   Mean   :0.615  
##                                   3rd Qu.:0.655  
##                                   Max.   :0.708

Histogram of Root Length Density:

hist(data$RLD)

plot of chunk unnamed-chunk-5

Mean of Root Length Density by N.rate, water salinity and year:

tapply(data$RLD, data$N.rate, mean)

##     N0   N360 
## 0.6442 0.5862

tapply(data$RLD, data$Water.salinity, mean)

##     BW     FW     SW 
## 0.6252 0.6145 0.6058

tapply(data$RLD, data$Year, mean)

##   2011   2012 
## 0.6275 0.6028

RLD is highest in N0, BW and 2011.

Boxplots:

boxplot(data$RLD~data$N.rate, xlab="N Rate", ylab="Root Length Density")

plot of chunk unnamed-chunk-7

boxplot(data$RLD~data$Water.salinity, xlab="Water salinity", ylab="Root Length Density")

plot of chunk unnamed-chunk-7

There appears to be a difference of RLD levels between the N.rate For Water.salinity, it appears BW may result in a higher RLD, while there may be no significant difference between FW and SW. It is obvious that FW results in a wide range of RLD values.

Testing

#Model 1: Anova model for effects of N.Rate and Water.salinity blocked by year:

model1=aov(data$RLD~data$N.rate*data$Water.salinity+data$Year)
summary(model1)

##                                 Df  Sum Sq Mean Sq F value  Pr(>F)    
## data$N.rate                      1 0.01009 0.01009    58.2 0.00062 ***
## data$Water.salinity              2 0.00076 0.00038     2.2 0.20647    
## data$Year                        1 0.00183 0.00183    10.5 0.02287 *  
## data$N.rate:data$Water.salinity  2 0.02415 0.01208    69.6 0.00022 ***
## Residuals                        5 0.00087 0.00017                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, the null hypothesis is that the variation in RLD cannot be explained by anything other than randomization. Since we got low p-values for N.rate and the interation between N.rate and Water.salinity, this means it is likely that N.rate alone as well as the interation between N.rate and Water.salinity explains the variation in RLD.

However, it is unlikely that Water.salinity alone explains the variation in RLD.

Since the p-value for year is small, it is likely year has an effect on the RLD response. But since we blocked for year, we do not care about its effect (nuisance variable).

#Model 2: Anova model for effect of N.Rate alone blocked by year:

model2=aov(data$RLD~data$N.rate+data$Year)
summary(model2)

##             Df  Sum Sq Mean Sq F value Pr(>F)  
## data$N.rate  1 0.01009 0.01009    3.52  0.093 .
## data$Year    1 0.00183 0.00183    0.64  0.445  
## Residuals    9 0.02578 0.00286                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this model, only randomization can explain the variation in RLD.

#Tukey test for Model 1

tukey1<-TukeyHSD(model1)
plot(tukey1)

plot of chunk unnamed-chunk-10

TukeyHSD(model1)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = data$RLD ~ data$N.rate * data$Water.salinity + data$Year)
## 
## $`data$N.rate`
##           diff      lwr      upr p adj
## N360-N0 -0.058 -0.07755 -0.03845 6e-04
## 
## $`data$Water.salinity`
##           diff      lwr     upr  p adj
## FW-BW -0.01075 -0.04106 0.01956 0.5262
## SW-BW -0.01950 -0.04981 0.01081 0.1858
## SW-FW -0.00875 -0.03906 0.02156 0.6417
## 
## $`data$Year`
##               diff      lwr       upr  p adj
## 2012-2011 -0.02467 -0.04422 -0.005116 0.0229
## 
## $`data$N.rate:data$Water.salinity`
##                    diff       lwr       upr  p adj
## N360:BW-N0:BW    0.0485 -0.007696  0.104696 0.0857
## N0:FW-N0:BW      0.0990  0.042804  0.155196 0.0045
## N360:FW-N0:BW   -0.0720 -0.128196 -0.015804 0.0185
## N0:SW-N0:BW      0.0305 -0.025696  0.086696 0.3309
## N360:SW-N0:BW   -0.0210 -0.077196  0.035196 0.6335
## N0:FW-N360:BW    0.0505 -0.005696  0.106696 0.0743
## N360:FW-N360:BW -0.1205 -0.176696 -0.064304 0.0018
## N0:SW-N360:BW   -0.0180 -0.074196  0.038196 0.7452
## N360:SW-N360:BW -0.0695 -0.125696 -0.013304 0.0214
## N360:FW-N0:FW   -0.1710 -0.227196 -0.114804 0.0003
## N0:SW-N0:FW     -0.0685 -0.124696 -0.012304 0.0227
## N360:SW-N0:FW   -0.1200 -0.176196 -0.063804 0.0019
## N0:SW-N360:FW    0.1025  0.046304  0.158696 0.0039
## N360:SW-N360:FW  0.0510 -0.005196  0.107196 0.0717
## N360:SW-N0:SW   -0.0515 -0.107696  0.004696 0.0692

When the line for a factor pair crosses zero, that indicates we fail to reject the null hypothesis that there is no difference in the mean of that combination of pairs. For example, the difference of mean RLD of the N360:BW pair and the N0:BW pair cannot be explained by anything other than randomization . This can also be inferred from the p-value (.0857).

From either the tukey plot or output, we can tell that there are some treatments are significantly different from eachtother (low p-value or the confidence interval does not cross zero). It is possible that these treatement differences can explain the variation in the response (RLD).

#Tukey Test for differences in water salinity treatment alone:

tukey2<-TukeyHSD(aov(data$RLD~data$Water.salinity))
plot(tukey2)

plot of chunk unnamed-chunk-11

This plot suggests there is no difference in means between the 3 levels of water salinity

Diagnostics/Model Adequacy Checking

Visually inspect normality of data:

qqnorm(residuals(model1))
qqline(residuals(model1))

plot of chunk unnamed-chunk-12

The data appears to be normal! We will double check the assumption of the ANOVA model with a Shapiro-Wilks Normality test:

shapiro.test(residuals(model1))

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(model1)
## W = 0.9802, p-value = 0.9844

Null hypothesis: The data came from a normally distributed population. There is a high probability that the data came from a normally distributed population (pvalue=.9844). We fail to reject the null.

Fitted vs Residuals Plot

plot(fitted(model1),residuals(model1))

plot of chunk unnamed-chunk-14

The residuals are scattered and equally distributed about the zero axis indicating a good model fit.

#Interaction Plot

We create an interaction plot to view the interactions between the factors.

interaction.plot(data$N.rate, data$Water.salinity, data$RLD)

plot of chunk unnamed-chunk-15

There is interaction among all the factors, evident by the different slopes and intersecting lines

4. Contingencies

Although the data for this experiment tured out to be normal, if the test had come back not normal, Kruskal-Wallis one-way analysis of variance by Rank Sum Test could be performed:

kruskal.test(data$RLD~data$N.rate)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  data$RLD by data$N.rate
## Kruskal-Wallis chi-squared = 2.573, df = 1, p-value = 0.1087

kruskal.test(data$RLD~data$Water.salinity)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  data$RLD by data$Water.salinity
## Kruskal-Wallis chi-squared = 0.2412, df = 2, p-value = 0.8864

kruskal.test(data$RLD~data$Year)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  data$RLD by data$Year
## Kruskal-Wallis chi-squared = 0.5211, df = 1, p-value = 0.4704

The null hypothesis of the kruskal test is that the mean ranks of the samples from the populations are expected to the same (this is not the same as saying the populations have identical means). Due to the high p-values, we fail to reject the null, but, since our classic anova model assumptions were fulfilled, we can use the results reached in the previous sections.

5. References to the Literature

None used.

6. Appendicies

Link to raw data

http://www.sciencedirect.com/science/article/pii/S0378429014002524 (see Section 3.2 and table 4)

Complete R Code

All included above.