Version as of August 28, 2014, superseding the version of August 24. Always use the most recent version.
This study uses data from “Effects of Plant Density on Yield and Canopy Micro Environment in Hybrid Cotton” (Yang et al., 2014) to carry out a two-factor, multi-level analysis. The goal is to see whether plant density (in m^-2), the number of bolls per unit of ground area (in m^-2), or their interaction has a statistically significant effect on the yield of seed cotton (in kg). In the dataset, the factor ‘plants’ refers to plant density and the factor ‘bolls’ refers to the number of bolls per unit of ground area. The response variable is ‘seed.cotton.yield’, the total yield of seed cotton. [1]
##Load in the Cotton Dataset
#Get dataset from Project Documents File
cotton <- read.csv("~/Academics (RPI)/09. Fall 2014/Design of Experiments/02. Wikibook Recipes/Recipe #04/Cotton.csv", header=TRUE)
head(cotton)
## year density plants bolls boll.weight lint.percent seed.cotton.yield
## 1 2009 D1 1.2 55.4 5.4 40.1 2829
## 2 2009 D2 2.1 72.3 5.4 39.8 3638
## 3 2009 D3 3.0 77.8 5.3 40.1 3897
## 4 2009 D4 3.9 71.8 5.3 40.2 3455
## 5 2009 D5 4.8 66.9 5.2 39.8 3145
## 6 2009 D6 5.7 65.2 5.2 39.9 3097
## lint.yield
## 1 1073
## 2 1411
## 3 1502
## 4 1388
## 5 1260
## 6 1225
tail(cotton)
## year density plants bolls boll.weight lint.percent seed.cotton.yield
## 7 2010 D1 1.2 58.2 5.4 40.1 2919
## 8 2010 D2 2.1 73.7 5.3 39.9 3711
## 9 2010 D3 3.0 79.3 5.3 40.2 3959
## 10 2010 D4 3.9 74.5 5.4 39.8 3639
## 11 2010 D5 4.8 69.8 5.3 39.9 3334
## 12 2010 D6 5.7 68.2 5.3 39.8 3255
## lint.yield
## 7 1109
## 8 1417
## 9 1511
## 10 1389
## 11 1263
## 12 1237
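For readers who do not have access to the CSV file, the twelve rows printed by head() and tail() above can be entered directly. This is only a convenience sketch: the name `cotton_inline` is hypothetical and does not appear in the original analysis, which reads the data from Cotton.csv.

```r
# Rebuild the twelve rows shown in the head(cotton) and tail(cotton) output.
# `cotton_inline` is a hypothetical name; the original analysis uses read.csv().
cotton_inline <- data.frame(
  year = rep(c(2009L, 2010L), each = 6),
  density = factor(rep(paste0("D", 1:6), times = 2)),
  plants = rep(c(1.2, 2.1, 3.0, 3.9, 4.8, 5.7), times = 2),
  bolls = c(55.4, 72.3, 77.8, 71.8, 66.9, 65.2,
            58.2, 73.7, 79.3, 74.5, 69.8, 68.2),
  boll.weight = c(5.4, 5.4, 5.3, 5.3, 5.2, 5.2,
                  5.4, 5.3, 5.3, 5.4, 5.3, 5.3),
  lint.percent = c(40.1, 39.8, 40.1, 40.2, 39.8, 39.9,
                   40.1, 39.9, 40.2, 39.8, 39.9, 39.8),
  seed.cotton.yield = c(2829L, 3638L, 3897L, 3455L, 3145L, 3097L,
                        2919L, 3711L, 3959L, 3639L, 3334L, 3255L),
  lint.yield = c(1073L, 1411L, 1502L, 1388L, 1260L, 1225L,
                 1109L, 1417L, 1511L, 1389L, 1263L, 1237L)
)

# Inspect the structure; it should list 12 observations of 8 variables.
str(cotton_inline)
```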
This analysis considers two factors, each with multiple levels: ‘plants’ and ‘bolls’. In the original dataset “cotton”, both are stored as numeric variables with no categorical levels, so for this analysis they will be converted into categorical variables with explicitly defined levels. These factors were chosen because the analysis aims to determine whether plant density (in m^-2) or the number of bolls per unit of ground area (in m^-2) has a statistically significant effect on the yield of seed cotton (in kg).
#Display the summary statistics of "cotton".
summary(cotton)
## year density plants bolls boll.weight
## Min. :2009 D1:2 Min. :1.20 Min. :55.4 Min. :5.20
## 1st Qu.:2009 D2:2 1st Qu.:2.10 1st Qu.:66.5 1st Qu.:5.30
## Median :2010 D3:2 Median :3.45 Median :70.8 Median :5.30
## Mean :2010 D4:2 Mean :3.45 Mean :69.4 Mean :5.32
## 3rd Qu.:2010 D5:2 3rd Qu.:4.80 3rd Qu.:73.9 3rd Qu.:5.40
## Max. :2010 D6:2 Max. :5.70 Max. :79.3 Max. :5.40
## lint.percent seed.cotton.yield lint.yield
## Min. :39.8 Min. :2829 Min. :1073
## 1st Qu.:39.8 1st Qu.:3133 1st Qu.:1234
## Median :39.9 Median :3394 Median :1326
## Mean :40.0 Mean :3406 Mean :1315
## 3rd Qu.:40.1 3rd Qu.:3657 3rd Qu.:1412
## Max. :40.2 Max. :3959 Max. :1511
#Display the names found in "cotton".
names(cotton)
## [1] "year" "density" "plants"
## [4] "bolls" "boll.weight" "lint.percent"
## [7] "seed.cotton.yield" "lint.yield"
#Display the structure of "cotton".
str(cotton)
## 'data.frame': 12 obs. of 8 variables:
## $ year : int 2009 2009 2009 2009 2009 2009 2010 2010 2010 2010 ...
## $ density : Factor w/ 6 levels "D1","D2","D3",..: 1 2 3 4 5 6 1 2 3 4 ...
## $ plants : num 1.2 2.1 3 3.9 4.8 5.7 1.2 2.1 3 3.9 ...
## $ bolls : num 55.4 72.3 77.8 71.8 66.9 65.2 58.2 73.7 79.3 74.5 ...
## $ boll.weight : num 5.4 5.4 5.3 5.3 5.2 5.2 5.4 5.3 5.3 5.4 ...
## $ lint.percent : num 40.1 39.8 40.1 40.2 39.8 39.9 40.1 39.9 40.2 39.8 ...
## $ seed.cotton.yield: int 2829 3638 3897 3455 3145 3097 2919 3711 3959 3639 ...
## $ lint.yield : int 1073 1411 1502 1388 1260 1225 1109 1417 1511 1389 ...
In this dataset, the variables stored as numeric can be treated as continuous: ‘plants’ (plant density), ‘bolls’ (the number of bolls per unit of ground area), ‘boll.weight’ (the weight of the number of bolls per unit of ground area, in grams), and ‘lint.percent’ (the percentage of cotton lint that was observed). Three further variables are stored as integers: the year of plant growth (‘year’), the total yield of seed cotton in kilograms (‘seed.cotton.yield’), and the total yield of cotton lint in kilograms (‘lint.yield’). Once the numeric values of ‘plants’ and ‘bolls’ are manually assigned to factor levels for this analysis, those two factors will be treated as categorical variables.
This analysis will consider one response variable, ‘seed.cotton.yield’, which denotes the total yield of seed cotton in the analysis.
As a whole, this dataset was generated from Table 1 found in the research publication titled “Effects of Plant Density on Yield and Canopy Micro Environment in Hybrid Cotton” (Yang et al., 2014). It includes data pertaining to cotton yield and its components with different planting densities. It contains 12 observations of 8 variables, which are defined below (Yang et al., 2014) [1]:
year [planting year]
density [plant density represented by treatment notation - D1, 1.2; D2, 2.1; D3, 3.0; D4, 3.9; D5, 4.8; D6, 5.7]
plants [plant density in m^-2]
bolls [number of bolls per unit of ground area (in m^-2)]
boll.weight [weight of the number of bolls per unit of ground area (in g)]
lint.percent [percentage of cotton lint observed]
seed.cotton.yield [total yield of seed cotton (in kg)]
lint.yield [total yield of cotton lint (in kg)]
According to the publication considered in this analysis (Yang et al., 2014), the field experiment used 18 plots, with a protective border area of comparable size around them. The six treatments considered in this analysis [(plant density, plants m^-2): D1, 1.2; D2, 2.1; D3, 3.0; D4, 3.9; D5, 4.8; D6, 5.7] were arranged in a randomized complete block design with three replicates. Each plot was 9 meters long and 6 meters wide, with 6 rows per plot (1 meter row spacing). Therefore, we assume that this experiment exhibits sufficient randomization.
In this experiment, we are trying to determine whether the variation observed in the response variable (‘seed.cotton.yield’) can be explained by the variation in the two treatments of the experiment (‘plants’ and ‘bolls’). The null hypothesis being tested states that plant density (in m^-2) and the number of bolls per unit of ground area (in m^-2) do not have a statistically significant effect on the yield of seed cotton (in kg). To test it, we perform an analysis of variance (ANOVA) on ‘seed.cotton.yield’ to see whether the mean of this response variable differs significantly across the levels of plant density and of the number of bolls per unit of ground area in this dataset.
The rationale for this design lies primarily in the fact that we’re trying to determine if plant density and the number of bolls per unit of ground area have any effect on the yield of seed cotton. So, since the total yield of seed cotton (in kilograms) is a useful and relevant metric to consider when determining how fruitful cotton crops are, this design of a two-factor, multi-level experiment (as it corresponds to an analysis of variance) was crafted to see if plant density and the number of bolls per unit of ground area have a statistically significant effect on the yield of seed cotton. By performing this analysis, we can hope to receive some insight regarding the different factors that come into play that typically result in large values of seed cotton yield.
Since the original experiment laid out in Yang et al. was already randomized through a randomized complete block design, we did not need to randomize the data any further.
In the original design of the experiment laid out in Yang et al., the six treatments considered in the analysis [(plant density, plants m^-2): D1, 1.2; D2, 2.1; D3, 3.0; D4, 3.9; D5, 4.8; D6, 5.7] were arranged in a randomized complete block design with three replicates (for a total of 18 measurements, one for each of the 18 plots).
In order to transform our numeric factor variables into categorical factor variables, binning was used in this design. In transforming the number of bolls per unit of ground area, ‘bolls’, into a categorical variable, three levels were defined: a low, a medium, and a high number of bolls per unit of ground area. The cutoffs between these levels are the first and third quartiles of the ‘bolls’ data in the dataset “cotton” (see numeric levels in R code below). In transforming the plant density, ‘plants’, into a categorical variable, the numeric values were simply converted into individually defined levels. Therefore, in the dataset, the factor ‘plants’ now has six distinct levels (“1.2”, “2.1”, “3”, “3.9”, “4.8”, and “5.7”).
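As an illustration of what such a layout involves, the sketch below assigns the six density treatments to plots within three blocks, with every treatment appearing once per block in random order. It is not the authors' actual field layout; the seed, the object name `layout`, and the plot numbering are assumptions made only for this example.

```r
# Hypothetical randomized block layout: 6 treatments x 3 blocks = 18 plots.
set.seed(2014)  # arbitrary seed, used only so the example is reproducible
treatments <- c("D1", "D2", "D3", "D4", "D5", "D6")
n_blocks <- 3

# Within each block, shuffle the treatments so each appears exactly once.
layout <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  data.frame(block = b,
             plot = seq_along(treatments),
             treatment = sample(treatments))
}))

# Cross-tabulate to confirm every treatment occurs once in every block.
table(layout$block, layout$treatment)
```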
#Transform 'bolls' into categorical variables (Low, Medium, and High), categorize them as factors, and display their resulting levels.
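#Compare against a numeric copy: once the first label is assigned, cotton$bolls is coerced to character and later comparisons would be made on strings.
bolls_numeric <- cotton$bolls
cotton$bolls[bolls_numeric > 0 & bolls_numeric <= 66.48] = "Low"
cotton$bolls[bolls_numeric > 66.48 & bolls_numeric <= 73.90] = "Medium"
cotton$bolls[bolls_numeric > 73.90 & bolls_numeric <= 79.30] = "High"
cotton$bolls = as.factor(cotton$bolls)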
levels(cotton$bolls)
## [1] "High" "Low" "Medium"
#Categorize 'plants' as a factor and display its resulting levels.
cotton$plants = as.factor(cotton$plants)
levels(cotton$plants)
## [1] "1.2" "2.1" "3" "3.9" "4.8" "5.7"
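The quartile cutoffs used for ‘bolls’ can also be derived in code rather than typed by hand. The following is a minimal sketch that assumes the twelve ‘bolls’ values printed earlier; it uses quantile() and cut(), which is an alternative to the assignment-based approach above, and the names `cutoffs` and `bolls_level` are hypothetical. Unlike as.factor(), cut() orders the levels as given (Low, Medium, High) rather than alphabetically.

```r
# The twelve 'bolls' values from the dataset, in row order.
bolls <- c(55.4, 72.3, 77.8, 71.8, 66.9, 65.2,
           58.2, 73.7, 79.3, 74.5, 69.8, 68.2)

# First and third quartiles serve as the Low/Medium and Medium/High cutoffs.
cutoffs <- quantile(bolls, probs = c(0.25, 0.75))

# Bin the values; intervals are closed on the right, e.g. (Q1, Q3] is Medium.
bolls_level <- cut(bolls,
                   breaks = c(-Inf, cutoffs, Inf),
                   labels = c("Low", "Medium", "High"))

# Count how many observations fall in each level.
table(bolls_level)
```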
In beginning to display this data graphically, summary statistics were gathered for the dataset being considered here, “cotton”. Additionally, histograms and boxplots were created to represent the different observations of amounts of seed cotton yield.
#Display the summary statistics of "cotton".
summary(cotton)
## year density plants bolls boll.weight lint.percent
## Min. :2009 D1:2 1.2:2 High :3 Min. :5.20 Min. :39.8
## 1st Qu.:2009 D2:2 2.1:2 Low :3 1st Qu.:5.30 1st Qu.:39.8
## Median :2010 D3:2 3 :2 Medium:6 Median :5.30 Median :39.9
## Mean :2010 D4:2 3.9:2 Mean :5.32 Mean :40.0
## 3rd Qu.:2010 D5:2 4.8:2 3rd Qu.:5.40 3rd Qu.:40.1
## Max. :2010 D6:2 5.7:2 Max. :5.40 Max. :40.2
## seed.cotton.yield lint.yield
## Min. :2829 Min. :1073
## 1st Qu.:3133 1st Qu.:1234
## Median :3394 Median :1326
## Mean :3406 Mean :1315
## 3rd Qu.:3657 3rd Qu.:1412
## Max. :3959 Max. :1511
#Display the names found in "cotton".
names(cotton)
## [1] "year" "density" "plants"
## [4] "bolls" "boll.weight" "lint.percent"
## [7] "seed.cotton.yield" "lint.yield"
#Display the structure of "cotton".
str(cotton)
## 'data.frame': 12 obs. of 8 variables:
## $ year : int 2009 2009 2009 2009 2009 2009 2010 2010 2010 2010 ...
## $ density : Factor w/ 6 levels "D1","D2","D3",..: 1 2 3 4 5 6 1 2 3 4 ...
## $ plants : Factor w/ 6 levels "1.2","2.1","3",..: 1 2 3 4 5 6 1 2 3 4 ...
## $ bolls : Factor w/ 3 levels "High","Low","Medium": 2 3 1 3 3 2 2 3 1 1 ...
## $ boll.weight : num 5.4 5.4 5.3 5.3 5.2 5.2 5.4 5.3 5.3 5.4 ...
## $ lint.percent : num 40.1 39.8 40.1 40.2 39.8 39.9 40.1 39.9 40.2 39.8 ...
## $ seed.cotton.yield: int 2829 3638 3897 3455 3145 3097 2919 3711 3959 3639 ...
## $ lint.yield : int 1073 1411 1502 1388 1260 1225 1109 1417 1511 1389 ...
#Display the head and tail of "cotton".
head(cotton)
## year density plants bolls boll.weight lint.percent seed.cotton.yield
## 1 2009 D1 1.2 Low 5.4 40.1 2829
## 2 2009 D2 2.1 Medium 5.4 39.8 3638
## 3 2009 D3 3 High 5.3 40.1 3897
## 4 2009 D4 3.9 Medium 5.3 40.2 3455
## 5 2009 D5 4.8 Medium 5.2 39.8 3145
## 6 2009 D6 5.7 Low 5.2 39.9 3097
## lint.yield
## 1 1073
## 2 1411
## 3 1502
## 4 1388
## 5 1260
## 6 1225
tail(cotton)
## year density plants bolls boll.weight lint.percent seed.cotton.yield
## 7 2010 D1 1.2 Low 5.4 40.1 2919
## 8 2010 D2 2.1 Medium 5.3 39.9 3711
## 9 2010 D3 3 High 5.3 40.2 3959
## 10 2010 D4 3.9 High 5.4 39.8 3639
## 11 2010 D5 4.8 Medium 5.3 39.9 3334
## 12 2010 D6 5.7 Medium 5.3 39.8 3255
## lint.yield
## 7 1109
## 8 1417
## 9 1511
## 10 1389
## 11 1263
## 12 1237
#Display the levels of 'plants' and 'bolls' within "cotton".
levels(cotton$plants)
## [1] "1.2" "2.1" "3" "3.9" "4.8" "5.7"
levels(cotton$bolls)
## [1] "High" "Low" "Medium"
par(mfrow=c(1,1))
#Create a histogram of total seed cotton yield ('seed.cotton.yield').
hist(cotton$seed.cotton.yield, xlim=c(2829, 3959), main = "Total Seed Cotton Yield (in kg)")
par(mfrow=c(1,1))
#Create a boxplot of total seed cotton yield ('seed.cotton.yield').
boxplot(cotton$seed.cotton.yield~cotton$plants, main = "Total Seed Cotton Yield", ylim = c(2829,3959), xlab = "Plant Densities (in m^-2)", ylab = "Seed Cotton Yield (in kg) ")
In order to determine if the variation that is observed in the response variable (which corresponds to the total seed cotton yield in this analysis) can be explained by the variation existent in the treatments of the experiment (which correspond to the number of bolls per unit of ground area and plant density), an analysis of variance (ANOVA) is performed as a means for analyzing the differences in seed cotton yield values for each of the different numbers of bolls per unit of ground area and values of plant density being considered in this dataset.
For each of the three ANOVA models fitted in this experiment, the null hypothesis being tested (which we will either reject or fail to reject by the end of our analysis) states that plant density (in m^-2) and the number of bolls per unit of ground area (in m^-2) do not have a statistically significant effect on the yield of seed cotton (in kg), implying that the differences in the mean values of total seed cotton yield are solely the result of random variation. If we reject the null hypothesis, we infer that the differences in mean total seed cotton yield across the numbers of bolls per unit of ground area and the plant densities in this dataset are caused by something other than random variation and can be explained by the variation in those factors. If we fail to reject the null hypothesis, we infer that the observed variation in mean total seed cotton yield cannot be explained by those factors and is likely the result of random variation.
#Perform an analysis of variance (ANOVA) for the different mean values observed for the total seed cotton yield, given the factor 'plants'.
model_plants <- aov(seed.cotton.yield~plants,cotton)
anova(model_plants)
## Analysis of Variance Table
##
## Response: seed.cotton.yield
## Df Sum Sq Mean Sq F value Pr(>F)
## plants 5 1456204 291241 31.3 0.00032 ***
## Residuals 6 55907 9318
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Perform an analysis of variance (ANOVA) for the different mean values observed for the total seed cotton yield, given the factor 'bolls'.
model_bolls <- aov(seed.cotton.yield~bolls,cotton)
anova(model_bolls)
## Analysis of Variance Table
##
## Response: seed.cotton.yield
## Df Sum Sq Mean Sq F value Pr(>F)
## bolls 2 1173684 586842 15.6 0.0012 **
## Residuals 9 338427 37603
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Perform an analysis of variance (ANOVA) for the different mean values observed for the total seed cotton yield, given the interaction of 'plants' and 'bolls'.
model_interaction <- aov(seed.cotton.yield~bolls*plants,cotton)
anova(model_interaction)
## Analysis of Variance Table
##
## Response: seed.cotton.yield
## Df Sum Sq Mean Sq F value Pr(>F)
## bolls 2 1173684 586842 88.59 0.00049 ***
## plants 5 311930 62386 9.42 0.02475 *
## Residuals 4 26497 6624
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(1,1))
#Create an interaction plot that plots the mean values of 'seed.cotton.yield' against the interaction of both 'plants' and 'bolls'.
interaction.plot(cotton$plants,cotton$bolls,cotton$seed.cotton.yield)
For the ANOVA in which ‘plants’ is analyzed against the response variable ‘seed.cotton.yield’, a p-value of 0.000318 is returned. This means that, if the null hypothesis were true, there would be a probability of roughly 0.000318 of observing an F-value at least as large as the one obtained (31.256). Based on this result, we reject the null hypothesis and infer that the variation observed in the mean values of total seed cotton yield can be explained by the variation in the plant densities considered in this analysis and is unlikely to be the result of chance alone. (See above results for p-value and F-value.)
For the ANOVA in which ‘bolls’ is analyzed against the response variable ‘seed.cotton.yield’, a p-value of 0.001187 is returned. This means that, if the null hypothesis were true, there would be a probability of roughly 0.001187 of observing an F-value at least as large as the one obtained (15.606). Based on this result, we reject the null hypothesis and infer that the variation observed in the mean values of total seed cotton yield can be explained by the variation in the numbers of bolls per unit of ground area considered in this analysis and is unlikely to be the result of chance alone. (See above results for p-value and F-value.)
For the ANOVA in which the interaction of ‘plants’ and ‘bolls’ is analyzed against the response variable ‘seed.cotton.yield’, no row for the interaction term is returned, so the table provides no F-value or p-value for the interaction effect. With only 12 observations, most combinations of ‘bolls’ and ‘plants’ levels are never observed together, which leaves no degrees of freedom for estimating the interaction. The interaction plot of the mean total seed cotton yield (‘seed.cotton.yield’) against ‘plants’ and ‘bolls’ does suggest that the two factors interact, since the lines on the plot have slopes that would result in their eventual intersection. Based on the plot, we would tentatively infer that part of the variation in total seed cotton yield can be explained by the interaction of the number of bolls per unit of ground area and plant density, although this cannot be confirmed by a formal test here. (See Analysis of Variance Table for “model_interaction”.)
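To make the origin of the F-value for ‘plants’ concrete, the one-way ANOVA quantities can be recomputed by hand from the twelve yields. This is a sketch of the standard between-group and within-group sums of squares, assuming the data printed earlier; the variable names are chosen for this example only.

```r
# Seed cotton yields and plant densities, in the row order of the dataset.
yield <- c(2829, 3638, 3897, 3455, 3145, 3097,
           2919, 3711, 3959, 3639, 3334, 3255)
plants <- factor(rep(c(1.2, 2.1, 3.0, 3.9, 4.8, 5.7), times = 2))

# Between-group sum of squares: group size times squared deviation of each
# group mean from the grand mean.
group_means <- tapply(yield, plants, mean)
group_sizes <- tapply(yield, plants, length)
ss_between <- sum(group_sizes * (group_means - mean(yield))^2)

# Within-group sum of squares: squared deviation of each observation from
# its own group mean (ave() returns the group mean for every row).
ss_within <- sum((yield - ave(yield, plants))^2)

# Degrees of freedom, F statistic, and upper-tail p-value.
df_between <- nlevels(plants) - 1
df_within <- length(yield) - nlevels(plants)
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(ss_between = ss_between, ss_within = ss_within, F = f_stat, p = p_value)
```

The two sums of squares should match the “Sum Sq” column of the table for “model_plants” (1456204 and 55907).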
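One way to see why no interaction row appears is to count which combinations of ‘bolls’ and ‘plants’ levels actually occur and to compare the ranks of the two design matrices. The sketch below assumes the factor values shown in the head() and tail() output after the transformation; the object names are hypothetical.

```r
# Factor levels for each of the twelve rows, as shown after the transformation.
plants <- factor(rep(c(1.2, 2.1, 3.0, 3.9, 4.8, 5.7), times = 2))
bolls <- factor(c("Low", "Medium", "High", "Medium", "Medium", "Low",
                  "Low", "Medium", "High", "High", "Medium", "Medium"))

# Cross-tabulate the two factors; zeros mark combinations never observed.
cell_counts <- table(bolls, plants)
observed_cells <- sum(cell_counts > 0)

# Rank of the additive design matrix versus the one with interaction columns.
rank_additive <- qr(model.matrix(~ bolls + plants))$rank
rank_interaction <- qr(model.matrix(~ bolls * plants))$rank

cell_counts
c(observed = observed_cells,
  additive = rank_additive,
  interaction = rank_interaction)
```

If the two ranks are equal, the interaction columns add no estimable parameters beyond the main effects, which is consistent with the missing row in the table.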
In further carrying out this analysis, we can compute Tukey Honest Significant Differences (via “TukeyHSD()”) as a means of determining which specific levels of each factor in this analysis differ significantly from each other in their effect on the response variable, ‘seed.cotton.yield’.
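For a balanced one-way design, the half-width of each Tukey interval can be sketched directly from the studentized range distribution. The example below assumes the residual sum of squares and degrees of freedom from the ANOVA table for “model_plants” (55907 on 6 degrees of freedom) and two observations per density level; the variable names are illustrative.

```r
# Residual mean square and design constants taken from the model_plants table.
mse <- 55907 / 6       # residual Sum Sq divided by residual Df
n_levels <- 6          # number of plant density levels
n_per_level <- 2       # observations per level (2009 and 2010)
df_resid <- 6          # residual degrees of freedom

# Critical value of the studentized range at the 95% family-wise level.
q_crit <- qtukey(0.95, nmeans = n_levels, df = df_resid)

# Half-width of the interval for the difference between any two level means.
half_width <- q_crit / sqrt(2) * sqrt(mse * (1 / n_per_level + 1 / n_per_level))
half_width
```

Any pair of density levels whose mean yields differ by more than this half-width would have an interval that excludes zero.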
#Perform a TukeyHSD Test for "model_plants".
Tukey_plants = TukeyHSD(model_plants, ordered = FALSE, conf.level = 0.95)
Tukey_plants
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = seed.cotton.yield ~ plants, data = cotton)
##
## $plants
## diff lwr upr p adj
## 2.1-1.2 800.5 416.33 1184.67 0.0013
## 3-1.2 1054.0 669.83 1438.17 0.0003
## 3.9-1.2 673.0 288.83 1057.17 0.0033
## 4.8-1.2 365.5 -18.67 749.67 0.0615
## 5.7-1.2 302.0 -82.17 686.17 0.1269
## 3-2.1 253.5 -130.67 637.67 0.2227
## 3.9-2.1 -127.5 -511.67 256.67 0.7672
## 4.8-2.1 -435.0 -819.17 -50.83 0.0290
## 5.7-2.1 -498.5 -882.67 -114.33 0.0154
## 3.9-3 -381.0 -765.17 3.17 0.0518
## 4.8-3 -688.5 -1072.67 -304.33 0.0030
## 5.7-3 -752.0 -1136.17 -367.83 0.0018
## 4.8-3.9 -307.5 -691.67 76.67 0.1190
## 5.7-3.9 -371.0 -755.17 13.17 0.0578
## 5.7-4.8 -63.5 -447.67 320.67 0.9808
par(mfrow=c(1,1))
plot(Tukey_plants)
#Perform a TukeyHSD Test for "model_bolls".
Tukey_bolls = TukeyHSD(model_bolls, ordered = FALSE, conf.level = 0.95)
Tukey_bolls
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = seed.cotton.yield ~ bolls, data = cotton)
##
## $bolls
## diff lwr upr p adj
## Low-High -883.3 -1325.39 -441.27 0.0009
## Medium-High -408.7 -791.50 -25.83 0.0373
## Medium-Low 474.7 91.83 857.50 0.0177
par(mfrow=c(1,1))
plot(Tukey_bolls)
#Perform a TukeyHSD Test for "model_interaction".
Tukey_interaction = TukeyHSD(model_interaction, ordered = FALSE, conf.level = 0.95)
Tukey_interaction
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = seed.cotton.yield ~ bolls * plants, data = cotton)
##
## $bolls
## diff lwr upr p adj
## Low-High -883.3 -1120.2 -646.5 0.0004
## Medium-High -408.7 -613.8 -203.6 0.0046
## Medium-Low 474.7 269.6 679.8 0.0026
##
## $plants
## diff lwr upr p adj
## 2.1-1.2 325.83 -60.13 711.80 0.0860
## 3-1.2 170.67 -215.30 556.63 0.4284
## 3.9-1.2 -6.00 -391.96 379.96 1.0000
## 4.8-1.2 -109.17 -495.13 276.80 0.7575
## 5.7-1.2 64.67 -321.30 450.63 0.9544
## 3-2.1 -155.17 -541.13 230.80 0.5022
## 3.9-2.1 -331.83 -717.80 54.13 0.0813
## 4.8-2.1 -435.00 -820.96 -49.04 0.0334
## 5.7-2.1 -261.17 -647.13 124.80 0.1635
## 3.9-3 -176.67 -562.63 209.30 0.4022
## 4.8-3 -279.83 -665.80 106.13 0.1350
## 5.7-3 -106.00 -491.96 279.96 0.7752
## 4.8-3.9 -103.17 -489.13 282.80 0.7907
## 5.7-3.9 70.67 -315.30 456.63 0.9367
## 5.7-4.8 173.83 -212.13 559.80 0.4144
par(mfrow=c(1,1))
plot(Tukey_interaction)
After observing the results of the Tukey Honest Significant Differences for “model_plants”, 7 of the 15 level-pairs show a significant difference in means at the 0.05 level (“2.1-1.2”, “3-1.2”, “3.9-1.2”, “4.8-2.1”, “5.7-2.1”, “4.8-3”, and “5.7-3”). The remaining pairs (“4.8-1.2”, “5.7-1.2”, “3-2.1”, “3.9-2.1”, “3.9-3”, “4.8-3.9”, “5.7-3.9”, and “5.7-4.8”) have adjusted p-values greater than 0.05. These results are consistent with the significance found in the ANOVA model “model_plants”.
After observing the results of the Tukey Honest Significant Differences for “model_bolls”, all of the level-pairs show a significant difference in means, since all of the respective p-values are less than 0.05. This is consistent with the significance found in the ANOVA model “model_bolls”.
After observing the results of the Tukey Honest Significant Differences for “model_interaction”, only one of the level-comparisons for ‘plants’ shows a significant difference in means (“4.8-2.1”). For ‘bolls’, all of the level-pairs show a significant difference in means, since all of their respective p-values are less than 0.05. However, no Tukey Honest Significant Differences are reported for the interaction of ‘bolls’ and ‘plants’, because the fitted model contains no estimable interaction term, so interaction-specific differences cannot be determined here.
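Rather than reading the significant pairs off the printed table, they can be extracted programmatically. The sketch below refits the one-way model on the twelve yields and filters the TukeyHSD() result by adjusted p-value; the names `tukey_fit` and `significant_pairs` are hypothetical.

```r
# Seed cotton yields and plant densities, in the row order of the dataset.
yield <- c(2829, 3638, 3897, 3455, 3145, 3097,
           2919, 3711, 3959, 3639, 3334, 3255)
plants <- factor(rep(c(1.2, 2.1, 3.0, 3.9, 4.8, 5.7), times = 2))

# Fit the one-way model and compute Tukey Honest Significant Differences.
tukey_fit <- TukeyHSD(aov(yield ~ plants), conf.level = 0.95)

# The result is a list with one matrix per term; rows are level-pairs and
# the "p adj" column holds the adjusted p-values.
pairs_table <- tukey_fit$plants
significant_pairs <- rownames(pairs_table)[pairs_table[, "p adj"] < 0.05]
significant_pairs
```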
In estimating the different parameters of the experiment, summary statistics were determined for the relevant data in the dataset pertaining to the total seed cotton yield contained in “cotton” (which includes both the average seed cotton yield values for all of the plant densities contained within the dataset and the standard deviation of those seed cotton yields), the numbers of bolls per unit of ground area contained in “cotton” (which includes both the quantities of bolls classified as being “High”, “Medium”, and “Low”, respectively, and the standard deviation of those distributed quantities), and the different values of plant densities contained in “cotton” (which includes both the quantities of plant densities classified as being associated with a given density level, respectively, and the standard deviation of those distributed quantities).
#Display summary statistics of cotton$seed.cotton.yield.
summary(cotton$seed.cotton.yield)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2830 3130 3390 3410 3660 3960
#Display standard deviation of cotton$seed.cotton.yield.
sd(cotton$seed.cotton.yield, na.rm = FALSE)
## [1] 370.8
#Display summary statistics of cotton$plants.
summary(cotton$plants)
## 1.2 2.1 3 3.9 4.8 5.7
## 2 2 2 2 2 2
#Display standard deviation of the integer level codes of cotton$plants (recent versions of R do not accept a factor in sd()).
sd(as.integer(cotton$plants), na.rm = FALSE)
## [1] 1.784
#Display summary statistics of cotton$bolls.
summary(cotton$bolls)
## High Low Medium
## 3 3 6
#Display standard deviation of the integer level codes of cotton$bolls (recent versions of R do not accept a factor in sd()).
sd(as.integer(cotton$bolls), na.rm = FALSE)
## [1] 0.866
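Because ‘plants’ and ‘bolls’ are now factors, a per-level summary of the response may be more informative than a standard deviation of level codes. The following sketch computes the mean and standard deviation of seed cotton yield within each level, assuming the data and factor levels shown earlier; the object names are illustrative.

```r
# Yields and factor levels for the twelve rows of the dataset.
yield <- c(2829, 3638, 3897, 3455, 3145, 3097,
           2919, 3711, 3959, 3639, 3334, 3255)
plants <- factor(rep(c(1.2, 2.1, 3.0, 3.9, 4.8, 5.7), times = 2))
bolls <- factor(c("Low", "Medium", "High", "Medium", "Medium", "Low",
                  "Low", "Medium", "High", "High", "Medium", "Medium"))

# Mean and standard deviation of yield within each plant density level.
mean_by_plants <- tapply(yield, plants, mean)
sd_by_plants <- tapply(yield, plants, sd)

# Mean and standard deviation of yield within each bolls level.
mean_by_bolls <- tapply(yield, bolls, mean)
sd_by_bolls <- tapply(yield, bolls, sd)

rbind(mean = mean_by_plants, sd = sd_by_plants)
rbind(mean = mean_by_bolls, sd = sd_by_bolls)
```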
In verifying the results of this experiment, it’s important to ensure that the dataset itself meets all of the assumptions that correlate with the design approach that was carried out. In this way, we want to make sure that our dataset exhibits normality. Until we know that our dataset does, in fact, exhibit normality, we cannot yet say with confidence that our results are significant and representative of a properly carried-out modeling approach. In verifying our dataset for normality, we can both create a Normal Quantile-Quantile (QQ) Plot of our data and perform a Shapiro-Wilk Test of Normality on our data.
#Create a Normal Q-Q Plot for the total seed cotton yield.
qqnorm(cotton[,"seed.cotton.yield"], main = "Normal Q-Q Plot of Total Seed Cotton Yield")
qqline(cotton[,"seed.cotton.yield"])
#Create a Normal Q-Q Plot of the residuals for "model_plants".
qqnorm(residuals(model_plants), main = "Normal Q-Q Plot of Residuals of 'model_plants'")
qqline(residuals(model_plants))
#Create a Normal Q-Q Plot of the residuals for "model_bolls".
qqnorm(residuals(model_bolls), main = "Normal Q-Q Plot of Residuals of 'model_bolls'")
qqline(residuals(model_bolls))
#Create a Normal Q-Q Plot of the residuals for "model_interaction".
qqnorm(residuals(model_interaction), main = "Normal Q-Q Plot of Residuals of 'model_interaction'")
qqline(residuals(model_interaction))
#Perform Shapiro-Wilk Test of Normality on the total seed cotton yield (normality is assumed if p > 0.1).
shapiro.test(cotton[,"seed.cotton.yield"])
##
## Shapiro-Wilk normality test
##
## data: cotton[, "seed.cotton.yield"]
## W = 0.9609, p-value = 0.7969
Upon both constructing Normal Q-Q Plots and performing a Shapiro-Wilk Test of Normality on the data in this analysis, it’s likely that we can readily assume that our data exhibits normality, since the resulting p-value of the Shapiro-Wilk Test of Normality for “cotton[,“seed.cotton.yield”]” was found to be > 0.1 at its value of 0.7969 and since all of the constructed Normal Q-Q Plots did seem to display a trend of data that aligned closely with the Normal Q-Q Line.
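Since the ANOVA normality assumption concerns the model residuals, the same Shapiro-Wilk test can also be applied to the residuals of a fitted model as a complementary check. This is a sketch under the assumption that the one-way model for ‘plants’ is refit on the twelve yields; no result is asserted for the residual test here, and the object names are hypothetical.

```r
# Yields and plant densities, in the row order of the dataset.
yield <- c(2829, 3638, 3897, 3455, 3145, 3097,
           2919, 3711, 3959, 3639, 3334, 3255)
plants <- factor(rep(c(1.2, 2.1, 3.0, 3.9, 4.8, 5.7), times = 2))

# Refit the one-way model and test both the raw response and the residuals.
fit_plants <- aov(yield ~ plants)
sw_yield <- shapiro.test(yield)
sw_resid <- shapiro.test(residuals(fit_plants))

# Compare the two p-values against the 0.1 threshold used in this analysis.
c(yield = sw_yield$p.value, residuals = sw_resid$p.value) > 0.1
```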
In further backing up the confidence that we have with our results, we can generate a “quality of fit” model that plots residual error against each of the fitted models that were developed in our original analysis of variance (ANOVA).
#Create a "Quality of Fit Model" that plots the residuals of "model_plants" against its fitted model.
plot(fitted(model_plants),residuals(model_plants))
#Create a "Quality of Fit Model" that plots the residuals of "model_bolls" against its fitted model.
plot(fitted(model_bolls),residuals(model_bolls))
#Create a "Quality of Fit Model" that plots the residuals of "model_interaction" against its fitted model.
plot(fitted(model_interaction),residuals(model_interaction))
Because the residuals in each of the resulting plots appear to be scattered around zero, each of the three ANOVA models developed suggests a good fit. Thus, we can confidently rely on both the modeling approach that we carried out and the dataset that we analyzed in justifying the significance of our results.
If our modeling assumptions of data normality and factor independence failed in our analysis, we can still err on the side of caution by performing the nonparametric Kruskal-Wallis rank sum test to back up our original results (which will help us to decide whether the population distributions are identical without necessarily exhibiting a normal distribution).
#Perform Kruskal-Wallis Rank Sum Test on 'seed.cotton.yield' within the "cotton" dataset for both 'plants' and 'bolls' (identical populations are assumed if p > 0.05).
kruskal.test(cotton[,"seed.cotton.yield"],cotton$plants)
##
## Kruskal-Wallis rank sum test
##
## data: cotton[, "seed.cotton.yield"] and cotton$plants
## Kruskal-Wallis chi-squared = 10.31, df = 5, p-value = 0.06697
kruskal.test(cotton[,"seed.cotton.yield"],cotton$bolls)
##
## Kruskal-Wallis rank sum test
##
## data: cotton[, "seed.cotton.yield"] and cotton$bolls
## Kruskal-Wallis chi-squared = 8.692, df = 2, p-value = 0.01296
The Kruskal-Wallis rank sum test for ‘bolls’ against the response variable ‘seed.cotton.yield’ returns a p-value of 0.01296, which is less than 0.05, so we can conclude that total seed cotton yield does not come from identical populations across the different numbers of bolls per unit of ground area. For ‘plants’, the p-value of 0.06697 is greater than 0.05, so this test does not reject identical populations across the plant densities at the 0.05 level, although it would at the 0.10 level. The nonparametric results therefore support rejecting the null hypothesis of our main experiment for ‘bolls’ and give only weaker support for ‘plants’. Furthermore, in addition to using a nonparametric analysis whenever normality cannot be assumed, transformations such as the “Box-Cox Power Transformation” could have been performed on the data to make it approximately normal. However, these transformations were not necessary for this analysis, since the Kruskal-Wallis rank sum tests already provide a nonparametric check on the results of our analysis.
[1] Guo-zheng YANG, Xue-jiao LUO, Yi-chun NIE, Xian-long ZHANG. “Effects of Plant Density on Yield and Canopy Micro Environment in Hybrid Cotton.” Journal of Integrative Agriculture, October 2014.
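The comparison of each Kruskal-Wallis p-value against the 0.05 threshold can be made explicit in code. The sketch below reruns both tests on the twelve yields, assuming the factor levels shown earlier; the names `kw_plants`, `kw_bolls`, and `alpha` are chosen for this example.

```r
# Yields and factor levels for the twelve rows of the dataset.
yield <- c(2829, 3638, 3897, 3455, 3145, 3097,
           2919, 3711, 3959, 3639, 3334, 3255)
plants <- factor(rep(c(1.2, 2.1, 3.0, 3.9, 4.8, 5.7), times = 2))
bolls <- factor(c("Low", "Medium", "High", "Medium", "Medium", "Low",
                  "Low", "Medium", "High", "High", "Medium", "Medium"))

# Run the rank sum test for each factor separately.
kw_plants <- kruskal.test(yield, plants)
kw_bolls <- kruskal.test(yield, bolls)

# TRUE means identical populations are rejected at the chosen level.
alpha <- 0.05
c(plants = kw_plants$p.value < alpha, bolls = kw_bolls$p.value < alpha)
```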
This dataframe contains information pertaining to cotton yield and its components with different planting densities, collected for the research corresponding to the article entitled “Effects of Plant Density on Yield and Canopy Micro Environment in Hybrid Cotton” (Yang et al., 2014) [http://www.sciencedirect.com/science/article/pii/S2095311913607273#].