Parametric ANOVA tests assume normality of data and constant variance. If the assumption of normality is proven false, then non-parametric methods may be used. This method is known as the Kruskal-Wallis Test. Prior to the testing the hypothesis using Kruskal-Wallis, the Completely Randomized Design or CRD needs to be set-up in the R simulation console. Further, CRDs are built on the basis of defined levels and replications as shown in the Process section.
Independence: The observations within each group are assumed to be independent of each other. This means that values in one group should not be related to or influence the values in any other group.
Random Sampling: The data should be collected using a random sampling process. This means that each observation has an equal probability of being included in the study.
Constant Variance: Like one-way ANOVA testing, the variance among the groups is assumed to be equal for non-parametric tests.
Normality: For hypothesis testing, in one-way ANOVA, the model errors are assumed to be normally distributed. However, for the Kruskal-Wallis test (non-parametric ANOVA) we cannot assume normal distribution.
Please note: the replications for the lesson are pre-defined. Traditionally, one would run a power analysis (pwr.t.test) for the desired alpha (i.e. 0.05) and the power for a specific effect (i.e. 1). This would denote the number of replications needed statistically satisfy the performance of the hypothesis testing.
library(agricolae)
# Extensive functionality on experimental design functions package, typically used for agriculture research, package needs to be loaded to develop the functions
OutDesign<-function(levels, replication){
outdesign<-design.crd(levels, replication, serie = 1, 34268)$book
#outdesign<-outdesign[order(outdesign$levels), ]
write.csv(outdesign,"layout.csv")}
# Line 1: OutDesign creates a function to export data
# Line 2: CRD have defaults in the serie (plot number degrees) and seed (random number for testing) that can be altered to create the data. In this we have created number plot of 2 digits (1) and the random seed (34268). The random seed can be any set of numbers based on the study.
# Line 3: CRD are random sets of data that do not need to be sorted from A to Z to run properly; therefore, this line is not needed
# Line 4: Writes the layout to a csv file to the set working directory, check the set working directory to ensure it is saved in the proper location for your exploration
?design.crd
# provides the parameters used in the function and proper description, typically used for agriculture research
The Outdesign function is created to pull data out of R into a csv file. This is done by creating a CRD with the listed parameters then writing the CSV. The CSV file saves to the set working directory. Note where the file was saved because this will need to be call later to import the dat. The importance of exporting the data and working from an exported file is the repeatability of the analysis. The export can be saved to online repository (i.e. https://github.com), and it can be linked in the code. The link will enable other users to repeat your analysis to be used in future renditions of the study.
InDesign<-function(path){
indesign<-read.csv(path)
indesign<-indesign[,2:4]
indesign <- indesign[,-c(2)]
colnames(indesign)<-c("response","levels")
indesign}
# Line 1: Creates the InDesign function and then defines the pathname from the set working directory
# Line 2: Note the columns needed for test, this will be based on the layout.csv column numbers
# Line 3: Further manipulate the data, remove or add any columns needed for the analysis
# Line 4: Name the columns, proper names to represent your data set
levels<-c("Level1","Level2","Level3")
replication<-5
OutDesign(levels, response) #open layout.csv, add column with response, and save file
# Line 1: Define the levels of the analysis (i.e. i values).
# Line 2: Define the number of replications in the analysis (i.e. j values).
# Line 3: This will rename the replications output column to response, so you have the levels and the responses. This will also generate a file titled layout.csv.
dat<-InDesign(path="/Users/YOURNAMEHERE/layout.csv") #read file back into R with response added
dat
# Line 1: Reads in the file layout.csv from the pathname from your set working directory to InDesign. If you view your files (right side of R Studio layout), you will see the file layout.csv . If not, make sure you have the library(agricolae) package loaded.
# Line 2: Shows modified export data with levels and response only.
The InDesign function is created to pull data from the csv file back into R, but it is modified to perform the analysis (i.e. Kruskal-Wallace Test). This is done by calling the path that the CSV file is located which is in the set working directory. Then the data is modified by showing only the levels and the responses which is previously known for the replication outputs in the data set. Following, the proper naming of columns and rows can be used to present the “dat” of the data which is used for the study. Additionally, one can save the export to an online directory, and one can pull the data from there. Make sure the file is up to data and accurate before they test is run.
outdesign2 <-design.crd(trt,r,serie=3)
print(outdesign2$parameters)
book2<-outdesign2
# seed = 12543
outdesign1 <-design.crd(trt,r,serie=2,12543,"Mersenne-Twister")
book1<-outdesign1
trt <-c("CIP-101","CIP-201","CIP-301","CIP-401","CIP-501")
r <-c(4,3,5,4,3)
\[ y_{ij}= \mu + \tau_{i} + \epsilon_{ij} ~; ~where ~i=1,2,3 ~and ~j=1,2,3,4,5 \]
\[ H_{0} : \mu_{1} = \mu_{2} = \mu_{3}~or~H_{0} : \tau_{i} = 0 \]
\[ H_{a} : At~least~one~\mu ~is ~different ~or~H_{a} : \tau_{i} \neq 0 \]
Level1 <- c(values)
Level2 <- c(values)
Level3 <- c(values)
# Create vectors of the levels that contain the data values.
Levels <- data.frame(Level1, Level2, Level3)
# Compiles the vectors together to make the data frame.
Level_long <- pivot_longer(Levels, c(Level1, Level2, Level3))
# Pivot the data frame from wide to long.
aov.model<-aov(value~name, data=Level_long)
# Create an AOV model.
summary(aov.model)
# Generates a summary of the ANOVA test. This can be used to evaluate the data.
plot(aov.model)
# Generates Residual Plots from the data
qqnorm(Level_long$value)
qqline(Level_long$value)
# Generates a Q-Q Normal Plot with a best fit line
kruskal.test(value~name, data= Level_long)
# This function performs the Kruskal-Wallis test.
library(tidyr)
# Organizes the "messy" data to usable format, need to load the package
library(dplyr)
# Package used to manipulate the data into a uniform set to resolve the analysis, need to load the package
library(MASS)
# Package supports functions and dataset for least squares, best fit, linear modes, and Kruskal-Wallace for multidimensional scaling
## Brand.A Brand.B Brand.C
## 1 4.7 5.3 6.3
## 2 3.2 6.4 8.2
## 3 5.1 7.3 6.2
## 4 5.2 6.8 7.1
## 5 5.0 7.2 6.6
\[ y_{ij}= \mu + \tau_{i} + \epsilon_{ij} ; ~where ~i=1,2,3 ~and ~j=1,2,3,4,5 \]
\[ H_{0} : \mu_{1} = \mu_{2} = \mu_{3} ~or ~H_{0} : \tau_{i} = 0 \]
\[ H_{a} : At~least~one~\mu~is~different~or~H_{a} : \tau_{i} \neq 0 \]
A <- c(4.7, 3.2, 5.1,5.2,5)
B <- c(5.3,6.4,7.3,6.8,7.2)
C <- c(6.3,8.2,6.2,7.1,6.6)
brands1 <- data.frame(A,B,C)
brand_long <- pivot_longer(brands1,c(A,B,C))
colnames(brand_long)<-c("Brand","DV")
brand_long
## # A tibble: 15 × 2
## Brand DV
## <chr> <dbl>
## 1 A 4.7
## 2 B 5.3
## 3 C 6.3
## 4 A 3.2
## 5 B 6.4
## 6 C 8.2
## 7 A 5.1
## 8 B 7.3
## 9 C 6.2
## 10 A 5.2
## 11 B 6.8
## 12 C 7.1
## 13 A 5
## 14 B 7.2
## 15 C 6.6
brands2<-read.csv("https://raw.githubusercontent.com/mharvey2696/Test1DoE/main/CRD-NONPAR-CSV.csv")
aov.model<-aov(DV~Brand,data=brands2)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 2 14.90 7.448 11.14 0.00184 **
## Residuals 12 8.02 0.668
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
qqnorm(brands2$DV)
qqline(brands2$DV)
kruskal.test(DV~Brand,data=brands2)
##
## Kruskal-Wallis rank sum test
##
## data: DV by Brand
## Kruskal-Wallis chi-squared = 9.38, df = 2, p-value = 0.009187
# This function performs the Kruskal-Wallis test.
library(agricolae)
library(tidyr)
library(dplyr)
library(MASS)
brands<-read.csv("https://raw.githubusercontent.com/mharvey2696/Test1DoE/main/test.csv")
brands
A <- c(4.7, 3.2, 5.1,5.2,5)
B <- c(5.3,6.4,7.3,6.8,7.2)
C <- c(6.3,8.2,6.2,7.1,6.6)
brands1 <- data.frame(A,B,C)
brand_long <- pivot_longer(brands1,c(A,B,C))
colnames(brand_long)<-c("Brand","DV")
# OR
brands2<-read.csv("https://raw.githubusercontent.com/mharvey2696/Test1DoE/main/CRD-NONPAR-CSV.csv")
aov.model<-aov(DV~Brand,data=brands2)
summary(aov.model)
plot(aov.model)
qqnorm(brands2$DV)
qqline(brands2$DV)
kruskal.test(DV~Brand,data=brands2)