Recipe 4: Completely Randomized Block design

Recipes for the Design of Experiments: Recipe Outline

as of August 28, 2014, superseding the version of August 24. Always use the most recent version.

Completely Randomized Block Design from Literature

Uzma Mushtaque

RPI

October 14, 2014; Version 4.1

1. Setting

The objective of the system under study [1] is to evaluate the effects of saline irrigation water and N (nitrogen fertilizer application) rate on: (1) the spatial distribution of cotton roots and (2) cotton growth, N uptake, and yield.

System under test

Design: The study consisted of a 3 × 2 factorial design with three irrigation water salinities and two N application rates. As stated in [1], ‘A field experiment was established with a 3 × 2 factorial, completely randomized block design: i.e. three levels of irrigation water salinity (fresh water, brackish water, or saline water) and two rates of nitrogen (N) application (0 and 360 kg N/ha). The results showed that cotton root biomass and distribution were significantly affected by water salinity and N fertilization, but not by N-salinity interaction.’

The paper investigates the effect of two factors, with 2 and 3 levels respectively, on 3 response variables: (1) root biomass, (2) root length density, and (3) root surface area. This analysis focuses on one response variable only: root biomass. The paper repeats the experiment in 2 consecutive years, so it has 2 blocking variables: year (2 years) and soil depth (5 depth ranges). Water salinity has 3 levels, referred to as fresh water (FW), brackish water (BW), and saline water (SW). The N application rates are 0 and 360 kg N/ha (abbreviated N0 and N360, respectively).

Limitations: The data set provided in the paper is not large. Small data sets can produce biased estimates, lead to misleading inferences about confounding and effect modification, and interact with other biases [2].

Data Summary

A summary of the given data set is presented here.

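# Select the data file from the local machine.
# stringsAsFactors = TRUE keeps N_rate and Water_Salinity as factors under R >= 4.0.

dataf <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE)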

summary(dataf)
##   N_rate   Water_Salinity   Soil_Depth  Root_biomass  
##  N0  :15   BW:10          Min.   :1    Min.   :  3.3  
##  N360:15   FW:10          1st Qu.:2    1st Qu.: 10.8  
##            SW:10          Median :3    Median : 17.6  
##                           Mean   :3    Mean   :153.0  
##                           3rd Qu.:4    3rd Qu.: 58.9  
##                           Max.   :5    Max.   :829.6
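
Because file.choose() opens an interactive dialog, the chunk above cannot be knitted unattended. The following is a minimal sketch of a non-interactive alternative, not the author's workflow. The temporary file and the name demo are assumptions made for illustration, and the six rows are the ones printed by head(dataf) in this document, standing in for the real CSV, whose path is not given.

```r
# Hypothetical stand-in for the real CSV: the first six rows of the data set.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "N_rate,Water_Salinity,Soil_Depth,Root_biomass",
  "N0,FW,1,768.5",
  "N0,BW,1,617.0",
  "N0,SW,1,475.8",
  "N360,FW,1,829.6",
  "N360,BW,1,793.3",
  "N360,SW,1,560.8"
), csv_path)

# Read from a fixed path instead of file.choose(); convert strings to factors.
demo <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)
str(demo)
```

In a real run, csv_path would simply be the location of the data file relative to the .Rmd document.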

Factors and Levels

The experiment comprises two factors: water salinity and nitrogen fertilizer application rate. Water salinity has 3 levels: fresh water (FW), brackish water (BW), and saline water (SW). The N application rate has 2 levels, 0 and 360 kg N/ha (abbreviated N0 and N360, respectively).

Data: A subset of the actual data set.

head(dataf)
##   N_rate Water_Salinity Soil_Depth Root_biomass
## 1     N0             FW          1        768.5
## 2     N0             BW          1        617.0
## 3     N0             SW          1        475.8
## 4   N360             FW          1        829.6
## 5   N360             BW          1        793.3
## 6   N360             SW          1        560.8
tail(dataf)
##    N_rate Water_Salinity Soil_Depth Root_biomass
## 25     N0             FW          5        13.34
## 26     N0             BW          5         5.04
## 27     N0             SW          5         3.33
## 28   N360             FW          5        10.26
## 29   N360             BW          5        10.63
## 30   N360             SW          5         4.65

Continuous variables (if any)

The response variable (root biomass, measured in kg/ha) is continuous: it is measured along a continuum and takes a numerical value. Because a measurement of zero means that no root biomass is present, it is more precisely a continuous variable of the ‘ratio’ type rather than the ‘interval’ type [3].

Response variables

The paper under study investigates the effect of the 2 factors, with 2 and 3 levels respectively, on 3 response variables: (1) root biomass, (2) root length density, and (3) root surface area. This analysis focuses on one response variable only: root biomass.

From [1]: ‘Shoot and root biomass was determined when the plants reached the boll stage (20 August 2011 and 15 August 2012). The cotton plants were cut at the soil surface and partitioned into leaves, stems, and bolls. Root samples were collected in 20 cm increments between 0 and 100 cm. The root samples were washed with tap water on a 0.5 mm mesh screen. The roots were scanned with a flatbed image scanner (Epson Expression/STD 1600 scanner). The images were analyzed using WinRhizo commercial software (Regent Instruments, 2001) to determine root length, root length density, root average diameter, root surface area, and root volume.’

The Data: How is it organized and what does it look like?

str(dataf)
## 'data.frame':    30 obs. of  4 variables:
##  $ N_rate        : Factor w/ 2 levels "N0","N360": 1 1 1 2 2 2 1 1 1 2 ...
##  $ Water_Salinity: Factor w/ 3 levels "BW","FW","SW": 2 1 3 2 1 3 2 1 3 2 ...
##  $ Soil_Depth    : int  1 1 1 1 1 1 2 2 2 2 ...
##  $ Root_biomass  : num  768 617 476 830 793 ...

Randomization

As stated in [1]: ‘The six treatments were replicated three times in a randomized complete factorial block design.’ The paper gives a complete description of the randomization scheme, which includes preparing the plots (with all 5 soil types) and the randomized assignment of fertilizer and irrigation modes.
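
To make the idea concrete, here is a minimal sketch of how six treatments could be randomized within three replicate blocks. It is an illustration only and not the procedure used in [1]: the seed, the plot numbering, and the names treatments and plot_layout are assumptions introduced here.

```r
set.seed(2014)  # arbitrary seed so the sketch is reproducible

# The six treatment combinations of the 3 x 2 factorial.
treatments <- expand.grid(
  Water_Salinity = c("FW", "BW", "SW"),
  N_rate = c("N0", "N360"),
  stringsAsFactors = FALSE
)

# Within each of the three replicate blocks, assign the six treatments
# to plots 1..6 in an independently shuffled order.
plot_layout <- do.call(rbind, lapply(1:3, function(b) {
  shuffled <- treatments[sample(nrow(treatments)), ]
  data.frame(Block = b, Plot = seq_len(nrow(shuffled)), shuffled)
}))
rownames(plot_layout) <- NULL

print(plot_layout)
```

Each block receives every treatment exactly once, which is what makes the design a complete block design; only the order within a block is random.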

Replication

The experiment conducted in the paper [1] uses a repeated measure (over 2 years). However, for this analysis we do not use any replication/repeated measure.

Blocking

For this analysis, as well as the analysis presented in [1], soil depth (measured in cm) is considered a nuisance variable, so blocking is employed to control it. The paper considers five soil-depth ranges: 0-20 cm, 20-40 cm, 40-60 cm, 60-80 cm, and 80-100 cm. For this analysis we code these ‘blocks’ as 1, 2, 3, 4, and 5, respectively.
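
The mapping from block codes to depth ranges can be kept in the data itself by using a labeled factor. The following is a small sketch under that assumption; the vector soil_depth_code is made-up example input, and in the analysis it would be the Soil_Depth column.

```r
# Depth ranges for block codes 1..5, as listed in the paragraph above.
depth_labels <- c("0-20 cm", "20-40 cm", "40-60 cm", "60-80 cm", "80-100 cm")

# Example block codes (illustrative only); in the analysis: dataf$Soil_Depth.
soil_depth_code <- c(1, 1, 2, 3, 4, 5, 5)

# A factor with explicit levels keeps the block order and shows the ranges
# in tables and plot axes instead of bare integers.
soil_depth <- factor(soil_depth_code, levels = 1:5, labels = depth_labels)
table(soil_depth)
```

Treating the block as a factor rather than an integer also matters for the models: with an integer column, aov() would fit a single slope for depth instead of one effect per block.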

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

As in [1], we conduct a two-way ANOVA to examine the influence of the two independent variables (water salinity and nitrogen fertilizer rate) on root biomass (kg/ha), with soil depth as the block. The null hypothesis is: ‘In the year 2011 the type of irrigation water used and/or the nitrogen fertilizer application rate employed has no significant effect on the root biomass weight’. In other words, if we fail to reject the null hypothesis, the observed differences in mean root biomass are consistent with randomization alone.
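
Written out, the blocked two-factor model implied by this design can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from [1]: \alpha_i is the N rate effect (i = 1, 2), \beta_j the water salinity effect (j = 1, 2, 3), (\alpha\beta)_{ij} their interaction, \gamma_k the soil-depth block effect (k = 1, ..., 5), and \varepsilon_{ijk} the random error.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)

H_0^{(N)}:\ \alpha_1 = \alpha_2 = 0, \qquad
H_0^{(W)}:\ \beta_1 = \beta_2 = \beta_3 = 0, \qquad
H_0^{(NW)}:\ (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
```

The block enters additively and is not tested as a factor of interest; its role is to remove depth-to-depth variation from the error term.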

What is the rationale for this design?

From [1], ‘Soil salinity is one of the most important abiotic stresses limiting crop production worldwide. Therefore, it is essential to use good management practices when applying saline or brackish irrigation water. Agricultural crops absorb almost all water and nutrients through the root system. Thus, roots play an important role in crop growth and yield formation. Crop yield is closely related to root development. A well-developed root system is essential for obtaining high crop yield. Root growth and development are significantly influenced by soil conditions such as fertility and moisture.’ Therefore, the experiment conducted here lets us draw useful inferences about the effects of the two most important factors (water salinity level and nitrogen fertilizer application rate) on the roots of crops.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

par(mfrow=c(1,1))

# Create a histogram; the measured quantity is labeled on the x-axis

hist(dataf$Root_biomass, xlim=c(0,1000), xlab = "Root Biomass (kg/ha)", main = "Histogram of Root Biomass")

#boxplot

boxplot(Root_biomass~N_rate,data=dataf, xlab="Nitrogen fertilizer", ylab="Root Biomass (kg/ha)")
title("Boxplot of Root Biomass variation due to Nitrogen application")

#boxplot

boxplot(Root_biomass~Water_Salinity,data=dataf, xlab="Water Salinity", ylab="Root Biomass (kg/ha)")
title("Boxplot of Root Biomass variation due to Water Salinity")

#boxplot

boxplot(Root_biomass~Soil_Depth,data=dataf, xlab="Soil Depth", ylab="Root Biomass (kg/ha)")
title("Boxplot of Root Biomass variation due to Soil Depth")

#Define a dataframe

Y<-dataf

#Summary

summary(Y)
##   N_rate   Water_Salinity   Soil_Depth  Root_biomass  
##  N0  :15   BW:10          Min.   :1    Min.   :  3.3  
##  N360:15   FW:10          1st Qu.:2    1st Qu.: 10.8  
##            SW:10          Median :3    Median : 17.6  
##                           Mean   :3    Mean   :153.0  
##                           3rd Qu.:4    3rd Qu.: 58.9  
##                           Max.   :5    Max.   :829.6
# Factor assignment

dataf$N_rate=as.factor(dataf$N_rate)

dataf$Water_Salinity=as.factor(dataf$Water_Salinity)

dataf$Soil_Depth=as.factor(dataf$Soil_Depth)

Testing

The analysis of variance (ANOVA) considers three variables: irrigation water salinity, N rate, and soil depth, and examines their effects on the distribution of cotton root biomass. We first fit each factor on its own and in pairs with interactions. In a later analysis we use blocking (treating soil depth as a block rather than as a factor of interest) and conduct a two-way ANOVA to test the null hypothesis stated above.

Analysis of variance for both the factors: Nitrogen fertilizer and Water Salinity taken separately

model1=aov(Root_biomass~N_rate,data=dataf) 
anova(model1)
## Analysis of Variance Table
## 
## Response: Root_biomass
##           Df  Sum Sq Mean Sq F value Pr(>F)
## N_rate     1    3940    3940    0.05   0.82
## Residuals 28 2145001   76607
model2=aov(Root_biomass~Water_Salinity,data=dataf) 
anova(model2)
## Analysis of Variance Table
## 
## Response: Root_biomass
##                Df  Sum Sq Mean Sq F value Pr(>F)
## Water_Salinity  2   18961    9480    0.12   0.89
## Residuals      27 2129981   78888

Analysis of variance for both the factors taken together

model12=aov(Root_biomass~N_rate*Water_Salinity,data=dataf) 
anova(model12)
## Analysis of Variance Table
## 
## Response: Root_biomass
##                       Df  Sum Sq Mean Sq F value Pr(>F)
## N_rate                 1    3940    3940    0.04   0.83
## Water_Salinity         2   18961    9480    0.11   0.90
## N_rate:Water_Salinity  2     884     442    0.00   1.00
## Residuals             24 2125156   88548
model13=aov(Root_biomass~Soil_Depth*Water_Salinity,data=dataf) 
anova(model13)
## Analysis of Variance Table
## 
## Response: Root_biomass
##                           Df  Sum Sq Mean Sq F value Pr(>F)    
## Soil_Depth                 4 2045769  511442  360.95  1e-14 ***
## Water_Salinity             2   18961    9480    6.69 0.0084 ** 
## Soil_Depth:Water_Salinity  8   62958    7870    5.55 0.0022 ** 
## Residuals                 15   21254    1417                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model14=aov(Root_biomass~N_rate*Soil_Depth,data=dataf) 
anova(model14)
## Analysis of Variance Table
## 
## Response: Root_biomass
##                   Df  Sum Sq Mean Sq F value  Pr(>F)    
## N_rate             1    3940    3940    0.92    0.35    
## Soil_Depth         4 2045769  511442  119.21 1.2e-13 ***
## N_rate:Soil_Depth  4   13430    3358    0.78    0.55    
## Residuals         20   85802    4290                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis of variance for the factors Nitrogen fertilizer and Water Salinity, with soil depth as a block

model123=aov(Root_biomass~N_rate+Soil_Depth,data=dataf) 
anova(model123)
## Analysis of Variance Table
## 
## Response: Root_biomass
##            Df  Sum Sq Mean Sq F value  Pr(>F)    
## N_rate      1    3940    3940    0.95    0.34    
## Soil_Depth  4 2045769  511442  123.70 1.2e-15 ***
## Residuals  24   99232    4135                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model124=aov(Root_biomass~Water_Salinity+Soil_Depth,data=dataf) 
anova(model124)
## Analysis of Variance Table
## 
## Response: Root_biomass
##                Df  Sum Sq Mean Sq F value  Pr(>F)    
## Water_Salinity  2   18961    9480    2.59   0.097 .  
## Soil_Depth      4 2045769  511442  139.69 8.8e-16 ***
## Residuals      23   84212    3661                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model125=aov(Root_biomass~(N_rate*Water_Salinity)+Soil_Depth,data=dataf) 
anova(model125)
## Analysis of Variance Table
## 
## Response: Root_biomass
##                       Df  Sum Sq Mean Sq F value  Pr(>F)    
## N_rate                 1    3940    3940    0.99    0.33    
## Water_Salinity         2   18961    9480    2.39    0.12    
## Soil_Depth             4 2045769  511442  128.85 5.6e-14 ***
## N_rate:Water_Salinity  2     884     442    0.11    0.90    
## Residuals             20   79387    3969                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion from ANOVA

As the results show, neither factor has a significant effect on the response variable when analyzed on its own (N rate: p value = 0.82; water salinity: p value = 0.89). Soil depth is also an independent variable that is known to have a significant effect on crop growth patterns (including root biomass), so in the first set of analyses we consider it as a factor. In the model with soil depth and water salinity (model13), all three terms have a p value of less than 0.05 (the significance level), including the interaction between depth and salinity (p value = 0.0022). In the model with soil depth and N rate (model14), only soil depth is significant; the interaction between N rate and depth is not (p value = 0.55). To take the depth effect out of the analysis and focus only on our two initial factors (i.e. irrigation water salinity and nitrogen fertilizer application rate), we consider soil depth a nuisance variable and use blocking to control it. In the blocked model (model125), the interaction between N rate and water salinity is not significant (p value = 0.90).
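
The effect of blocking on the F test can be checked by hand from the ANOVA tables above. The sketch below only redoes that arithmetic with the tabulated (rounded) mean squares for Water_Salinity; the variable names are illustrative.

```r
# Values copied from the ANOVA tables in this document.
ms_salinity  <- 9480    # Mean Sq for Water_Salinity (identical in both models)
ms_res_plain <- 88548   # residual Mean Sq without the block (model12)
ms_res_block <- 3969    # residual Mean Sq with Soil_Depth as a block (model125)

# F = (mean square of the term) / (residual mean square)
f_plain <- ms_salinity / ms_res_plain
f_block <- ms_salinity / ms_res_block

# Upper-tail p values from the F distribution with the tabulated Df.
p_plain <- pf(f_plain, df1 = 2, df2 = 24, lower.tail = FALSE)
p_block <- pf(f_block, df1 = 2, df2 = 20, lower.tail = FALSE)

round(c(F_plain = f_plain, p_plain = p_plain,
        F_block = f_block, p_block = p_block), 2)
```

The numerator is the same in both models. Blocking changes only the denominator, because the variation between soil depths is moved out of the residual term.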

The paper [1] reports that irrigation water salinity had significant effects on total root biomass, that nitrogen application significantly increased root biomass, and that there was no significant interaction effect between irrigation water salinity and N rate on total root biomass in 2011. On the subset analyzed here, the blocked ANOVA agrees with [1] on the interaction, which is not statistically significant (p value = 0.90). The main effects, however, do not reach the 0.05 level once soil depth is blocked: p value = 0.33 for N rate and 0.12 for water salinity in model125, and 0.097 for water salinity in model124. For this subset we therefore fail to reject the null hypothesis at the 0.05 level, and the differences in root biomass between water salinity levels and between N rates cannot be distinguished from randomization.

Estimation of Parameters

Here is a summary of the data with all factors and levels. As stated above, the data set is small, which limits the analysis [2].

summary(dataf)
##   N_rate   Water_Salinity Soil_Depth  Root_biomass  
##  N0  :15   BW:10          1:6        Min.   :  3.3  
##  N360:15   FW:10          2:6        1st Qu.: 10.8  
##            SW:10          3:6        Median : 17.6  
##                           4:6        Mean   :153.0  
##                           5:6        3rd Qu.: 58.9  
##                                      Max.   :829.6

Interaction Plot

An interaction plot is a pictorial representation of the interaction effect between the two factors under consideration. As the plot shows, there is little or no interaction between the levels of the two factors.

interaction.plot(dataf$N_rate,dataf$Water_Salinity,dataf$Root_biomass, xlab= "Nitrogen Fertilizer", ylab="Root Biomass", main ="Interaction plot", trace.label="Water Salinity")

Diagnostics/Model Adequacy Checking

The ANOVA conclusions above are valid only if the data meet the assumptions underlying the model, so we conduct model adequacy checking. We create a normal quantile-quantile (Q-Q) plot and run a Shapiro-Wilk test of normality on the response, and then inspect Q-Q and residuals-versus-fitted plots for each blocked model.

# Shapiro Test

shapiro.test(dataf$Root_biomass)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataf$Root_biomass
## W = 0.5761, p-value = 3.774e-08
#Plot

qqnorm(dataf$Root_biomass)
qqline(dataf$Root_biomass)

# Diagnostics for the model with the Nitrogen fertilizer factor (model123)

qqnorm(residuals(model123))
qqline(residuals(model123))

plot(fitted(model123),residuals(model123))

# Diagnostics for the model with the Water Salinity factor (model124)

qqnorm(residuals(model124))
qqline(residuals(model124))

plot(fitted(model124),residuals(model124))

#Diagnostics for the interaction effect

qqnorm(residuals(model125))
qqline(residuals(model125))

plot(fitted(model125),residuals(model125))

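The Shapiro-Wilk test returns a p value (3.774e-08) far below 0.05, so the hypothesis that the raw root biomass values are normally distributed is rejected. This is not surprising, because root biomass in the top soil layer (block 1) is far larger than in the deeper layers. The ANOVA normality assumption, however, applies to the model residuals rather than to the raw response, so the Q-Q and residuals-versus-fitted plots for the three blocked models are the more relevant checks. The residuals-versus-fitted plots for all three models do not depict any extreme variation, which supports the use of ANOVA, although the small sample size calls for caution.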
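
Since the normality assumption in ANOVA concerns the model residuals, the Shapiro-Wilk test can also be applied to residuals rather than to the raw response. The following is a minimal sketch of that check. It uses only the twelve rows printed by head(dataf) and tail(dataf) in this document (blocks 1 and 5), so the names subset_df and fit_sub are hypothetical and the result it prints is not the result for the full 30-row data set.

```r
# Twelve rows copied from the head(dataf) and tail(dataf) output above.
subset_df <- data.frame(
  N_rate = factor(rep(rep(c("N0", "N360"), each = 3), times = 2)),
  Water_Salinity = factor(rep(c("FW", "BW", "SW"), times = 4)),
  Soil_Depth = factor(rep(c(1, 5), each = 6)),
  Root_biomass = c(768.5, 617.0, 475.8, 829.6, 793.3, 560.8,
                   13.34, 5.04, 3.33, 10.26, 10.63, 4.65)
)

# Same formula as model125, fitted to the 12-row subset only.
fit_sub <- aov(Root_biomass ~ (N_rate * Water_Salinity) + Soil_Depth,
               data = subset_df)

# Test the residuals, which is what the ANOVA normality assumption refers to.
res_test <- shapiro.test(residuals(fit_sub))
print(res_test)
```

With the full data set, the same call would be shapiro.test(residuals(model125)).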

Tukey’s test

Because several pairwise comparisons are made at once, we further perform Tukey’s HSD test, which controls the family-wise error rate and so reduces the chance of discovering false positives.

# Nitrogen fertilizer and soil depth (model123)
TukeyHSD(model123, ordered = FALSE, conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Root_biomass ~ N_rate + Soil_Depth, data = dataf)
## 
## $N_rate
##          diff    lwr   upr  p adj
## N360-N0 22.92 -25.54 71.38 0.3387
## 
## $Soil_Depth
##         diff    lwr     upr  p adj
## 2-1 -619.212 -728.6 -509.84 0.0000
## 3-1 -657.545 -766.9 -548.18 0.0000
## 4-1 -662.848 -772.2 -553.48 0.0000
## 5-1 -666.292 -775.7 -556.92 0.0000
## 3-2  -38.333 -147.7   71.04 0.8378
## 4-2  -43.637 -153.0   65.73 0.7649
## 5-2  -47.080 -156.4   62.29 0.7124
## 4-3   -5.303 -114.7  104.07 0.9999
## 5-3   -8.747 -118.1  100.62 0.9993
## 5-4   -3.443 -112.8  105.93 1.0000
# Water salinity and soil depth (model124)
TukeyHSD(model124, ordered = FALSE, conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Root_biomass ~ Water_Salinity + Soil_Depth, data = dataf)
## 
## $Water_Salinity
##         diff     lwr    upr  p adj
## FW-BW  19.70  -48.07 87.469 0.7496
## SW-BW -40.68 -108.45 27.091 0.3080
## SW-FW -60.38 -128.15  7.391 0.0870
## 
## $Soil_Depth
##         diff    lwr     upr  p adj
## 2-1 -619.212 -722.5 -515.94 0.0000
## 3-1 -657.545 -760.8 -554.28 0.0000
## 4-1 -662.848 -766.1 -559.58 0.0000
## 5-1 -666.292 -769.6 -563.02 0.0000
## 3-2  -38.333 -141.6   64.94 0.8061
## 4-2  -43.637 -146.9   59.63 0.7235
## 5-2  -47.080 -150.3   56.19 0.6654
## 4-3   -5.303 -108.6   97.97 0.9999
## 5-3   -8.747 -112.0   94.52 0.9991
## 5-4   -3.443 -106.7   99.83 1.0000
#All factors interaction
TukeyHSD(model125, ordered = FALSE, conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Root_biomass ~ (N_rate * Water_Salinity) + Soil_Depth, data = dataf)
## 
## $N_rate
##          diff    lwr   upr p adj
## N360-N0 22.92 -25.07 70.91 0.331
## 
## $Water_Salinity
##         diff     lwr   upr  p adj
## FW-BW  19.70  -51.58 90.98 0.7667
## SW-BW -40.68 -111.96 30.61 0.3384
## SW-FW -60.38 -131.66 10.91 0.1064
## 
## $Soil_Depth
##         diff    lwr     upr  p adj
## 2-1 -619.212 -728.1 -510.36 0.0000
## 3-1 -657.545 -766.4 -548.70 0.0000
## 4-1 -662.848 -771.7 -554.00 0.0000
## 5-1 -666.292 -775.1 -557.44 0.0000
## 3-2  -38.333 -147.2   70.51 0.8273
## 4-2  -43.637 -152.5   65.21 0.7516
## 5-2  -47.080 -155.9   61.77 0.6976
## 4-3   -5.303 -114.2  103.54 0.9999
## 5-3   -8.747 -117.6  100.10 0.9992
## 5-4   -3.443 -112.3  105.40 1.0000
## 
## $`N_rate:Water_Salinity`
##                    diff     lwr    upr  p adj
## N360:BW-N0:BW    38.192  -87.06 163.44 0.9257
## N0:FW-N0:BW      31.842  -93.41 157.09 0.9644
## N360:FW-N0:BW    45.750  -79.50 171.00 0.8553
## N0:SW-N0:BW     -29.914 -155.16  95.33 0.9727
## N360:SW-N0:BW   -13.250 -138.50 112.00 0.9994
## N0:FW-N360:BW    -6.350 -131.60 118.90 1.0000
## N360:FW-N360:BW   7.558 -117.69 132.81 1.0000
## N0:SW-N360:BW   -68.106 -193.35  57.14 0.5416
## N360:SW-N360:BW -51.442 -176.69  73.81 0.7864
## N360:FW-N0:FW    13.908 -111.34 139.16 0.9992
## N0:SW-N0:FW     -61.756 -187.00  63.49 0.6382
## N360:SW-N0:FW   -45.092 -170.34  80.16 0.8625
## N0:SW-N360:FW   -75.664 -200.91  49.58 0.4311
## N360:SW-N360:FW -59.000 -184.25  66.25 0.6796
## N360:SW-N0:SW    16.664 -108.58 141.91 0.9981
# Plot the results
m1 = TukeyHSD(model123, which="N_rate", ordered = FALSE)
m2 = TukeyHSD(model124, which="Water_Salinity",ordered = FALSE)
m3= TukeyHSD(model125, which="Water_Salinity",ordered = FALSE)
plot(m1)

plot(m2)

plot(m3)

From the plots it is clear that the confidence intervals for the differences in means between all pairs of levels contain zero, so none of the differences is statistically significant and the observed variation is consistent with randomization. For example, consider the water salinity levels FW and BW: the confidence interval for FW-BW contains zero, which means that these two levels are not significantly different from one another.
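
The same reading of the plots can be done numerically. The sketch below copies the Water_Salinity intervals from the TukeyHSD(model124) output above and flags which of them include zero; the data frame name tukey_ws is illustrative.

```r
# Water_Salinity contrasts copied from the TukeyHSD(model124) output above.
tukey_ws <- data.frame(
  contrast = c("FW-BW", "SW-BW", "SW-FW"),
  lwr = c(-48.07, -108.45, -128.15),
  upr = c(87.469, 27.091, 7.391)
)

# A contrast is not significant at the family-wise 5% level
# when its 95% interval includes zero.
tukey_ws$contains_zero <- tukey_ws$lwr < 0 & tukey_ws$upr > 0
print(tukey_ws)
```

With the fitted object at hand, the lwr and upr columns can be taken directly from the matrix stored under the factor's name in the list that TukeyHSD() returns.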

Final Discussion

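Specifically for this study, soil depth accounts for most of the variation in root biomass: the 0-20 cm layer differs significantly from every deeper layer. After blocking on soil depth, the estimated differences point in the direction reported in [1] (root biomass is on average 22.92 kg/ha higher under N360 than under N0, and 60.38 kg/ha lower under saline water than under fresh water), but none of them is significant at the 0.05 level; the SW-FW comparison comes closest (adjusted p value = 0.087 in model124). There was no significant difference in root biomass between the fresh water and brackish water treatments. Similar inferences can be drawn for the other levels of both factors.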

References

[1] Wei Min, Huijuan Guo, Guangwei Zhou, Wen Zhang, Lijuan Ma, Jun Ye, and Zhenan Hou. Root distribution and growth of cotton as affected by drip irrigation with saline water. Department of Resources and Environmental Science, Shihezi University, Shihezi 832003, Xinjiang, People’s Republic of China.

[2] Sander Greenland, Judith A. Schwartzbaum, and William D. Finkle. Problems due to Small Samples and Sparse Data in Conditional Logistic Regression Analysis.

[3] https://statistics.laerd.com/statistical-guides/types-of-variable.php