(4028mdk09 2009)


Introduction

Humans invented herbicides to kill unwanted plants (which we call “weeds”) on their property.

I have two sugar beet varieties on my land that are known to be resistant to this herbicide, and historically both have produced relatively similar yields.

I sprayed my land with the herbicide last month. Both sugar beet varieties are still yielding, but some differences have been noticed.

I have collected some yield data to study the differences statistically.

# Four plots of land, five yield measurements per plot
LandID <- as.factor(rep(1:4, each = 5))
# Plots 1-2 grow Merlin, plots 3-4 grow Golden
Variety <- as.factor(rep(c("Merlin", "Golden"), each = 10))
# Yields entered directly as numbers (no need to convert from strings)
Yield <- c(76.2, 49.6, 61.6, 49.8, 37, 44.8, 28.7, 49.6, 38.6, 38.1,
           47.4, 19.8, 37.4, 34.2, 34, 47, 24.9, 24.6, 32.8, 46.4)

sugarbeet <- data.frame(LandID, Variety, Yield)

Sugar beets yield data:

##    LandID Variety Yield
## 1       1  Merlin  76.2
## 2       1  Merlin  49.6
## 3       1  Merlin  61.6
## 4       1  Merlin  49.8
## 5       1  Merlin  37.0
## 6       2  Merlin  44.8
## 7       2  Merlin  28.7
## 8       2  Merlin  49.6
## 9       2  Merlin  38.6
## 10      2  Merlin  38.1
## 11      3  Golden  47.4
## 12      3  Golden  19.8
## 13      3  Golden  37.4
## 14      3  Golden  34.2
## 15      3  Golden  34.0
## 16      4  Golden  47.0
## 17      4  Golden  24.9
## 18      4  Golden  24.6
## 19      4  Golden  32.8
## 20      4  Golden  46.4

Visualisation

Visualisation is a great way to inspect the differences, though it is not mandatory.

You can use any of various tools to draw a graph. The following graph is a boxplot I made with R, overlaid with the individual data points of each variety.
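A plot of this kind could be produced along the following lines. This is a minimal base R sketch, not necessarily the code behind the original graph; the axis label, fill colour and jitter settings are assumptions.

```r
# Rebuild the yield data so the example runs on its own
sugarbeet <- data.frame(
  Variety = factor(rep(c("Merlin", "Golden"), each = 10)),
  Yield = c(76.2, 49.6, 61.6, 49.8, 37, 44.8, 28.7, 49.6, 38.6, 38.1,
            47.4, 19.8, 37.4, 34.2, 34, 47, 24.9, 24.6, 32.8, 46.4)
)

# One box per variety; boxplot() invisibly returns the plotted statistics
bp <- boxplot(Yield ~ Variety, data = sugarbeet,
              ylab = "Yield", col = "lightgrey")

# Overlay the individual observations, jittered sideways to reduce overlap
stripchart(Yield ~ Variety, data = sugarbeet, vertical = TRUE,
           method = "jitter", pch = 19, add = TRUE)

# Median yield of each variety (third row of the box statistics)
medians <- setNames(bp$stats[3, ], bp$names)
medians
```

Factor levels are sorted alphabetically, so the Golden box is drawn before the Merlin box.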

It appears that the 10 Merlin observations are doing better than the 10 Golden observations.

We will now apply a statistical test to tell whether the two varieties really differ statistically.


Statistical Analysis

The technique is known as the two-sample t-test. HOWEVER, there are five types of two-sample tests out there (three t-tests and two non-parametric alternatives), and you need to decide which is best for your context. What are they?

How do they differ from each other?

| Two-sample t-Tests | Normally distributed? | Paired? | Equal variance? |
|---|---|---|---|
| Standard Student’s Two-Sample t-Test | √ | x | √ |
| Welch’s t-Test | √ | x | x |
| Paired t-Test | √ | √ | - |
| Wilcoxon Matched Pairs | x | √ | - |
| Mann-Whitney test | x | x | - |

(√: Yes, x: No, -: Not required)

The summary table above gives three basic conditions to check before deciding which test to use:
1. Are both variables normally distributed?
2. Are the samples paired? A paired trial is commonly known as a before-and-after test, where the same sample objects are measured under two conditions.
3. Do they have equal variance?

We use additional tests for 1 and 3, whereas for 2 we can decide based on the experimental design.
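The way the three answers lead to a test can be sketched as a small function. The helper `choose_two_sample_test` and its argument names are hypothetical, and the mapping reflects my reading of the summary table above; it is an illustration, not part of any R package.

```r
# Hypothetical helper: map the three answers to a row of the summary table
choose_two_sample_test <- function(normal, paired, equal_variance = NA) {
  if (normal && paired)  return("Paired t-Test")
  if (!normal && paired) return("Wilcoxon Matched Pairs")
  if (!normal)           return("Mann-Whitney test")
  # From here on the data are normally distributed and not paired,
  # so the equal-variance answer decides between Student and Welch
  if (is.na(equal_variance)) {
    stop("equal_variance is required for unpaired, normally distributed data")
  }
  if (equal_variance) "Standard Student's Two-Sample t-Test" else "Welch's t-Test"
}

# Example: normal, not paired, equal variance
choose_two_sample_test(normal = TRUE, paired = FALSE, equal_variance = TRUE)
```

Equal variance is only consulted in the last branch, which matches the "-" (not required) entries in the table.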

Step 1 - Shapiro-Wilk test of normality

R code with result:

by(sugarbeet$Yield, sugarbeet$Variety, shapiro.test)
## sugarbeet$Variety: Golden
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.91458, p-value = 0.314
## 
## ------------------------------------------------------------ 
## sugarbeet$Variety: Merlin
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.92652, p-value = 0.4145

Null hypothesis (Ho) is always: Normally distributed.
Alternative hypothesis (Ha) is always: Not normally distributed.

The P-values for the Golden and Merlin sugar beet varieties (0.314 and 0.4145) are both above 0.05, so the results are not significant and we fail to reject the Ho. We therefore treat both Golden and Merlin as normally distributed.

Note: if even one variable is not normally distributed, the outcome of this step is considered not normally distributed.

Thus, normally distributed? - √
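The decision rule for this step can be sketched in R as follows. This is a minimal illustration; the 0.05 significance level and the variable names `p_values`, `alpha` and `all_normal` are assumptions made for the example.

```r
# Rebuild the yield data so the example runs on its own
sugarbeet <- data.frame(
  Variety = factor(rep(c("Merlin", "Golden"), each = 10)),
  Yield = c(76.2, 49.6, 61.6, 49.8, 37, 44.8, 28.7, 49.6, 38.6, 38.1,
            47.4, 19.8, 37.4, 34.2, 34, 47, 24.9, 24.6, 32.8, 46.4)
)

# Shapiro-Wilk P-value for each variety
p_values <- tapply(sugarbeet$Yield, sugarbeet$Variety,
                   function(y) shapiro.test(y)$p.value)

alpha <- 0.05  # assumed significance level

# Normality is accepted only if no variety rejects the Ho
all_normal <- all(p_values > alpha)
all_normal
```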

Step 2 - Deciding whether the samples are paired

Not in this case.

This is not a before-and-after experiment; the data were measured in one go. We did not collect yield data before the herbicide application and then again afterwards from the same crop.

Thus, are the data Paired? - x

Step 3 - Levene’s test for homogeneity of variance

In this step, we want to check whether the two varieties have different variances, using Levene’s test. I used R, one of many tools that can run Levene’s test, and obtained the following result:

## Levene's Test for Homogeneity of Variance (center = mean)
##       Df F value Pr(>F)
## group  1  0.4529 0.5095
##       18
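The code behind this output is not shown. The header format suggests the `leveneTest` function from the car package, but that is an assumption. With `center = mean`, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group mean, so a base R sketch (the column name `AbsDev` is made up for the example) could look like this:

```r
# Rebuild the yield data so the example runs on its own
sugarbeet <- data.frame(
  Variety = factor(rep(c("Merlin", "Golden"), each = 10)),
  Yield = c(76.2, 49.6, 61.6, 49.8, 37, 44.8, 28.7, 49.6, 38.6, 38.1,
            47.4, 19.8, 37.4, 34.2, 34, 47, 24.9, 24.6, 32.8, 46.4)
)

# Absolute deviation of each yield from its own variety mean
# (ave() applies mean() within each group by default)
sugarbeet$AbsDev <- abs(sugarbeet$Yield -
                        ave(sugarbeet$Yield, sugarbeet$Variety))

# Levene's test (center = mean) is a one-way ANOVA on these deviations
levene <- anova(lm(AbsDev ~ Variety, data = sugarbeet))
levene

# With the car package installed, the equivalent call would be:
# car::leveneTest(Yield ~ Variety, data = sugarbeet, center = mean)
```

The first row of the ANOVA table corresponds to the `group` row in the output above.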

Null hypothesis (Ho) is always: Variances are equal.
Alternative hypothesis (Ha) is always: Variances are not equal.

The degrees of freedom (Df) for the group term is 1, which reflects that two varieties are being compared (df = k - 1, where k = 2 is the number of groups). The P-value of 0.5095 is higher than the significance level of 0.05, so we fail to reject the Ho and assume the variances to be equal.

Thus, Equal variance? - √

Assumption tests result summary

Sugar beets assumption test results:
* Normally distributed? - √
* Data paired? - x
* Equal variance? - √

| Two-sample t-Tests | Normally distributed? | Paired? | Equal variance? |
|---|---|---|---|
| Standard Student’s Two-Sample t-Test | √ | x | √ |
| Welch’s t-Test | √ | x | x |
| Paired t-Test | √ | √ | - |
| Wilcoxon Matched Pairs | x | √ | - |
| Mann-Whitney test | x | x | - |

(√: Yes, x: No, -: Not required)

The assumptions of the Standard Student’s two-sample t-test are clearly fulfilled, so this test will be used to check for a statistically significant difference between the Golden and Merlin varieties.

Standard Student’s Two-Sample t-Test

## 
##  Two Sample t-test
## 
## data:  sugarbeet$Yield by sugarbeet$Variety
## t = -2.355, df = 18, p-value = 0.03007
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -23.746171  -1.353829
## sample estimates:
## mean in group Golden mean in group Merlin 
##                34.85                47.40
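The call that produced this output is not shown. A minimal sketch that runs the same kind of test with base R's `t.test` is given below; the `data =` form of the call and the variable name `result` are choices made for this example, so the `data:` line it prints differs slightly from the output above.

```r
# Rebuild the yield data so the example runs on its own
sugarbeet <- data.frame(
  Variety = factor(rep(c("Merlin", "Golden"), each = 10)),
  Yield = c(76.2, 49.6, 61.6, 49.8, 37, 44.8, 28.7, 49.6, 38.6, 38.1,
            47.4, 19.8, 37.4, 34.2, 34, 47, 24.9, 24.6, 32.8, 46.4)
)

# var.equal = TRUE requests the pooled-variance (Student) test;
# the default, var.equal = FALSE, would run Welch's t-test instead
result <- t.test(Yield ~ Variety, data = sugarbeet, var.equal = TRUE)
result

# Individual pieces of the result can be pulled out by name
result$statistic   # t value
result$parameter   # degrees of freedom
result$p.value     # P-value
```

The difference is computed as Golden minus Merlin because factor levels are sorted alphabetically, which is why the t value and the confidence interval are negative.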

This is the Standard Student’s Two-Sample t-Test output from R. It shows that the mean yield of Merlin (47.40) is higher than that of Golden (34.85), with a P-value below 0.05 (P = 0.03007). Thus, we reject the Ho and conclude that there is a significant difference between Merlin and Golden.

Suppose the Merlin yield reflects the usual production rate of the farm, and recall that both varieties yielded about the same before the herbicide application. If all other environmental and human factors are equal, the statistical results support the possibility that the lower yield of the Golden variety is due to the herbicide application. This is worth investigating further.

References

4028mdk09 2009, A sugar beet harvest in progress, Germany, Wikipedia, viewed 10 April 2009, https://en.wikipedia.org/wiki/Sugar_beet#/media/File:Entladung_Bunker.JPG