Dorcas Oluwadare
2024-08-15
Data Wrangling is one of the key aspects of Data analysis. The tidyverse package is a very useful tool for data pre-processing.
Pivoting is usually needed to re-shape data for ease of data entry, data comparison and data summary.
pivot_longer() makes data sets longer than the original data set by increasing the number of rows and decreasing the number of columns.
## Rows: 464 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): env, loc, gen
## dbl (7): year, yield, height, lodging, size, protein, oil
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 464 × 10
## env loc year gen yield height lodging size protein oil
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 L70 Lawes 1970 G01 2.39 1.44 4.25 8.45 36.7 20.9
## 2 L70 Lawes 1970 G02 2.28 1.45 4.25 9.95 37.6 20.7
## 3 L70 Lawes 1970 G03 2.57 1.46 3.75 10.8 37.8 21.3
## 4 L70 Lawes 1970 G04 2.88 1.26 3.5 10.0 38.4 22.0
## 5 L70 Lawes 1970 G05 2.39 1.34 3.5 11 37.5 22.1
## 6 L70 Lawes 1970 G06 2.41 1.36 4 11.8 38.2 21.2
## 7 L70 Lawes 1970 G07 2.70 1.3 3 11.8 37.4 21.7
## 8 L70 Lawes 1970 G08 2.46 0.955 3.25 10 35.2 21.1
## 9 L70 Lawes 1970 G09 2.57 1.03 3 11.2 35.9 21.5
## 10 L70 Lawes 1970 G10 2.98 1.16 3.75 10.8 39.7 20.4
## # ℹ 454 more rows
dat_longer <- dat %>%
pivot_longer(yield:lodging, names_to = "trait", values_to = "values")
dat_longer## # A tibble: 1,392 × 9
## env loc year gen size protein oil trait values
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 L70 Lawes 1970 G01 8.45 36.7 20.9 yield 2.39
## 2 L70 Lawes 1970 G01 8.45 36.7 20.9 height 1.44
## 3 L70 Lawes 1970 G01 8.45 36.7 20.9 lodging 4.25
## 4 L70 Lawes 1970 G02 9.95 37.6 20.7 yield 2.28
## 5 L70 Lawes 1970 G02 9.95 37.6 20.7 height 1.45
## 6 L70 Lawes 1970 G02 9.95 37.6 20.7 lodging 4.25
## 7 L70 Lawes 1970 G03 10.8 37.8 21.3 yield 2.57
## 8 L70 Lawes 1970 G03 10.8 37.8 21.3 height 1.46
## 9 L70 Lawes 1970 G03 10.8 37.8 21.3 lodging 3.75
## 10 L70 Lawes 1970 G04 10.0 38.4 22.0 yield 2.88
## # ℹ 1,382 more rows
pivot_wider() makes a data set wider by increasing the number of columns and decreasing the number of rows.
## Rows: 696 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): env, loc, gen, trait
## dbl (2): year, values
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 696 × 6
## env loc year gen trait values
## <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 L70 Lawes 1970 G01 yield 2.39
## 2 L70 Lawes 1970 G01 height 1.44
## 3 L70 Lawes 1970 G01 lodging 4.25
## 4 L70 Lawes 1970 G02 yield 2.28
## 5 L70 Lawes 1970 G02 height 1.45
## 6 L70 Lawes 1970 G02 lodging 4.25
## 7 L70 Lawes 1970 G03 yield 2.57
## 8 L70 Lawes 1970 G03 height 1.46
## 9 L70 Lawes 1970 G03 lodging 3.75
## 10 L70 Lawes 1970 G04 yield 2.88
## # ℹ 686 more rows
## # A tibble: 232 × 7
## env loc year gen yield height lodging
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 L70 Lawes 1970 G01 2.39 1.44 4.25
## 2 L70 Lawes 1970 G02 2.28 1.45 4.25
## 3 L70 Lawes 1970 G03 2.57 1.46 3.75
## 4 L70 Lawes 1970 G04 2.88 1.26 3.5
## 5 L70 Lawes 1970 G05 2.39 1.34 3.5
## 6 L70 Lawes 1970 G06 2.41 1.36 4
## 7 L70 Lawes 1970 G07 2.70 1.3 3
## 8 L70 Lawes 1970 G08 2.46 0.955 3.25
## 9 L70 Lawes 1970 G09 2.57 1.03 3
## 10 L70 Lawes 1970 G10 2.98 1.16 3.75
## # ℹ 222 more rows
library(readxl)
df1 <- read_excel("data/df.combine.xlsx", sheet = "df1")
df2 <- read_excel("data/df.combine.xlsx", sheet= "df2")
df_left <- left_join(df1,df2)## Joining with `by = join_by(gen)`
## # A tibble: 15 × 6
## gen yield height size protein oil
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 G01 2.79 0.97 8.35 39.4 20.9
## 2 G02 2.62 0.785 9.75 40.6 20.7
## 3 G03 2.48 0.955 11.4 37.4 21.3
## 4 G04 2.92 0.76 11.1 36.9 22.0
## 5 G05 2.14 0.76 11.9 38.7 22.1
## 6 G06 2.86 0.65 12.2 38.1 21.2
## 7 G07 2.97 0.8 11.2 37.4 21.7
## 8 G08 1.65 0.9 11.8 36.6 21.1
## 9 G09 3.1 0.825 12.2 37.1 21.5
## 10 G10 3.00 0.785 12.0 37.8 20.4
## 11 G11 1.90 0.875 7.4 43.4 NA
## 12 G12 1.64 1.16 7.45 44.4 NA
## 13 G13 1.97 1.08 8.25 42.2 NA
## 14 G14 2.45 0.865 7.65 38.4 NA
## 15 G15 2.63 0.675 9.05 40.7 NA
## Joining with `by = join_by(gen)`
## # A tibble: 15 × 6
## gen yield height size protein oil
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 G01 2.79 0.97 8.35 39.4 20.9
## 2 G02 2.62 0.785 9.75 40.6 20.7
## 3 G03 2.48 0.955 11.4 37.4 21.3
## 4 G04 2.92 0.76 11.1 36.9 22.0
## 5 G05 2.14 0.76 11.9 38.7 22.1
## 6 G06 2.86 0.65 12.2 38.1 21.2
## 7 G07 2.97 0.8 11.2 37.4 21.7
## 8 G08 1.65 0.9 11.8 36.6 21.1
## 9 G09 3.1 0.825 12.2 37.1 21.5
## 10 G10 3.00 0.785 12.0 37.8 20.4
## 11 G16 NA NA NA NA 19.1
## 12 G17 NA NA NA NA 18.7
## 13 G18 NA NA NA NA 19.2
## 14 G19 NA NA NA NA 20.8
## 15 G20 NA NA NA NA 20.7
## Joining with `by = join_by(gen)`
## # A tibble: 10 × 6
## gen yield height size protein oil
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 G01 2.79 0.97 8.35 39.4 20.9
## 2 G02 2.62 0.785 9.75 40.6 20.7
## 3 G03 2.48 0.955 11.4 37.4 21.3
## 4 G04 2.92 0.76 11.1 36.9 22.0
## 5 G05 2.14 0.76 11.9 38.7 22.1
## 6 G06 2.86 0.65 12.2 38.1 21.2
## 7 G07 2.97 0.8 11.2 37.4 21.7
## 8 G08 1.65 0.9 11.8 36.6 21.1
## 9 G09 3.1 0.825 12.2 37.1 21.5
## 10 G10 3.00 0.785 12.0 37.8 20.4
## Joining with `by = join_by(gen)`
## # A tibble: 20 × 6
## gen yield height size protein oil
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 G01 2.79 0.97 8.35 39.4 20.9
## 2 G02 2.62 0.785 9.75 40.6 20.7
## 3 G03 2.48 0.955 11.4 37.4 21.3
## 4 G04 2.92 0.76 11.1 36.9 22.0
## 5 G05 2.14 0.76 11.9 38.7 22.1
## 6 G06 2.86 0.65 12.2 38.1 21.2
## 7 G07 2.97 0.8 11.2 37.4 21.7
## 8 G08 1.65 0.9 11.8 36.6 21.1
## 9 G09 3.1 0.825 12.2 37.1 21.5
## 10 G10 3.00 0.785 12.0 37.8 20.4
## 11 G11 1.90 0.875 7.4 43.4 NA
## 12 G12 1.64 1.16 7.45 44.4 NA
## 13 G13 1.97 1.08 8.25 42.2 NA
## 14 G14 2.45 0.865 7.65 38.4 NA
## 15 G15 2.63 0.675 9.05 40.7 NA
## 16 G16 NA NA NA NA 19.1
## 17 G17 NA NA NA NA 18.7
## 18 G18 NA NA NA NA 19.2
## 19 G19 NA NA NA NA 20.8
## 20 G20 NA NA NA NA 20.7
In data pre-processing it is sometimes necessary to create new variables needed for analysis.
dat <- dat %>%
mutate(
lodging_grp= case_when(
lodging >= 1 & lodging <= 1.49 ~ "No lodging",
lodging >= 1.5 & lodging <= 2.49 ~ "Mild Lodging",
lodging >= 2.5 & lodging <= 3.49 ~ "Moderate Lodging",
lodging >= 3.5 ~ "Heavy Lodging"
)
)dat_new <- dat %>%
mutate(
lodging_grp2 = case_when(
lodging_grp %in% c("Heavy Lodging", "Moderate Lodging") ~"Lodging",
lodging_grp %in% c("No Lodging", "Mild Lodging")~ "No Lodging",
TRUE ~ "NA")
)
dat_new## # A tibble: 464 × 12
## env loc year gen yield height lodging size protein oil lodging_grp
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 L70 Lawes 1970 G01 2.39 1.44 4.25 8.45 36.7 20.9 Heavy Lodgi…
## 2 L70 Lawes 1970 G02 2.28 1.45 4.25 9.95 37.6 20.7 Heavy Lodgi…
## 3 L70 Lawes 1970 G03 2.57 1.46 3.75 10.8 37.8 21.3 Heavy Lodgi…
## 4 L70 Lawes 1970 G04 2.88 1.26 3.5 10.0 38.4 22.0 Heavy Lodgi…
## 5 L70 Lawes 1970 G05 2.39 1.34 3.5 11 37.5 22.1 Heavy Lodgi…
## 6 L70 Lawes 1970 G06 2.41 1.36 4 11.8 38.2 21.2 Heavy Lodgi…
## 7 L70 Lawes 1970 G07 2.70 1.3 3 11.8 37.4 21.7 Moderate Lo…
## 8 L70 Lawes 1970 G08 2.46 0.955 3.25 10 35.2 21.1 Moderate Lo…
## 9 L70 Lawes 1970 G09 2.57 1.03 3 11.2 35.9 21.5 Moderate Lo…
## 10 L70 Lawes 1970 G10 2.98 1.16 3.75 10.8 39.7 20.4 Heavy Lodgi…
## # ℹ 454 more rows
## # ℹ 1 more variable: lodging_grp2 <chr>
Some data sets may require splitting values or combining multiple cells before analysis, unite(), separate() and str_sub() functions are used to create new variables and columns.
## Rows: 360 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Gen
## dbl (6): Year, Trait1, Trait2, Trait3, Trait4, Trait5
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 360 × 7
## Gen Year Trait1 Trait2 Trait3 Trait4 Trait5
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 IBG10-1 2020 12 12 58 63 85
## 2 IBG10-2 2020 9 9 62 67 96
## 3 IBG10-3 2020 12 12 56 63 96
## 4 IBG11-1 2020 10 10 56 60 95
## 5 IBG11-2 2020 9 9 56 60 102
## 6 IBG11-3 2020 10 10 57 60 97
## 7 IBG12-1 2020 10 10 50 59 84
## 8 IBG12-2 2020 11 11 53 56 84
## 9 IBG12-3 2020 12 12 57 60 92
## 10 IBG126-1 2020 11 10 51 58 84
## # ℹ 350 more rows
BadData %<>%
separate(Gen,sep = "-", into = c("LocGen","Rep")) %>%
mutate(
Location= str_sub(LocGen, 1,2),
Genotype = str_sub(LocGen,3,5)
) %>%
select(Location,Genotype, Rep,Year, Trait1,Trait2,Trait3,Trait4,Trait5)
BadData %<>% unite(Location, Year, col= "Env", sep = "-")
BadData## # A tibble: 360 × 8
## Env Genotype Rep Trait1 Trait2 Trait3 Trait4 Trait5
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 IB-2020 G10 1 12 12 58 63 85
## 2 IB-2020 G10 2 9 9 62 67 96
## 3 IB-2020 G10 3 12 12 56 63 96
## 4 IB-2020 G11 1 10 10 56 60 95
## 5 IB-2020 G11 2 9 9 56 60 102
## 6 IB-2020 G11 3 10 10 57 60 97
## 7 IB-2020 G12 1 10 10 50 59 84
## 8 IB-2020 G12 2 11 11 53 56 84
## 9 IB-2020 G12 3 12 12 57 60 92
## 10 IB-2020 G12 1 11 10 51 58 84
## # ℹ 350 more rows
## Rows: 464 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): env, loc, gen
## dbl (7): year, yield, height, lodging, size, protein, oil
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 464 × 10
## env loc year gen yield height lodging size protein oil
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 L70 Lawes 1970 G01 2.39 1.44 4.25 8.45 36.7 20.9
## 2 L70 Lawes 1970 G02 2.28 1.45 4.25 9.95 37.6 20.7
## 3 L70 Lawes 1970 G03 2.57 1.46 3.75 10.8 37.8 21.3
## 4 L70 Lawes 1970 G04 2.88 1.26 3.5 10.0 38.4 22.0
## 5 L70 Lawes 1970 G05 2.39 1.34 3.5 11 37.5 22.1
## 6 L70 Lawes 1970 G06 2.41 1.36 4 11.8 38.2 21.2
## 7 L70 Lawes 1970 G07 2.70 1.3 3 11.8 37.4 21.7
## 8 L70 Lawes 1970 G08 2.46 0.955 3.25 10 35.2 21.1
## 9 L70 Lawes 1970 G09 2.57 1.03 3 11.2 35.9 21.5
## 10 L70 Lawes 1970 G10 2.98 1.16 3.75 10.8 39.7 20.4
## # ℹ 454 more rows
## # A tibble: 464 × 12
## env loc year gen yield height lodging size protein oil loc3 year2
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 L70 Lawes 1970 G01 2.39 1.44 4.25 8.45 36.7 20.9 Law 70
## 2 L70 Lawes 1970 G02 2.28 1.45 4.25 9.95 37.6 20.7 Law 70
## 3 L70 Lawes 1970 G03 2.57 1.46 3.75 10.8 37.8 21.3 Law 70
## 4 L70 Lawes 1970 G04 2.88 1.26 3.5 10.0 38.4 22.0 Law 70
## 5 L70 Lawes 1970 G05 2.39 1.34 3.5 11 37.5 22.1 Law 70
## 6 L70 Lawes 1970 G06 2.41 1.36 4 11.8 38.2 21.2 Law 70
## 7 L70 Lawes 1970 G07 2.70 1.3 3 11.8 37.4 21.7 Law 70
## 8 L70 Lawes 1970 G08 2.46 0.955 3.25 10 35.2 21.1 Law 70
## 9 L70 Lawes 1970 G09 2.57 1.03 3 11.2 35.9 21.5 Law 70
## 10 L70 Lawes 1970 G10 2.98 1.16 3.75 10.8 39.7 20.4 Law 70
## # ℹ 454 more rows
## # A tibble: 464 × 11
## env loc year gen yield height lodging size protein oil ENV
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 L70 Lawes 1970 G01 2.39 1.44 4.25 8.45 36.7 20.9 Law70
## 2 L70 Lawes 1970 G02 2.28 1.45 4.25 9.95 37.6 20.7 Law70
## 3 L70 Lawes 1970 G03 2.57 1.46 3.75 10.8 37.8 21.3 Law70
## 4 L70 Lawes 1970 G04 2.88 1.26 3.5 10.0 38.4 22.0 Law70
## 5 L70 Lawes 1970 G05 2.39 1.34 3.5 11 37.5 22.1 Law70
## 6 L70 Lawes 1970 G06 2.41 1.36 4 11.8 38.2 21.2 Law70
## 7 L70 Lawes 1970 G07 2.70 1.3 3 11.8 37.4 21.7 Law70
## 8 L70 Lawes 1970 G08 2.46 0.955 3.25 10 35.2 21.1 Law70
## 9 L70 Lawes 1970 G09 2.57 1.03 3 11.2 35.9 21.5 Law70
## 10 L70 Lawes 1970 G10 2.98 1.16 3.75 10.8 39.7 20.4 Law70
## # ℹ 454 more rows
## # A tibble: 464 × 12
## env loc year gen yield height lodging size protein oil ENV
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 L70 Lawes 1970 G01 2.39 1.44 4.25 8.45 36.7 20.9 Law70
## 2 L70 Lawes 1970 G02 2.28 1.45 4.25 9.95 37.6 20.7 Law70
## 3 L70 Lawes 1970 G03 2.57 1.46 3.75 10.8 37.8 21.3 Law70
## 4 L70 Lawes 1970 G04 2.88 1.26 3.5 10.0 38.4 22.0 Law70
## 5 L70 Lawes 1970 G05 2.39 1.34 3.5 11 37.5 22.1 Law70
## 6 L70 Lawes 1970 G06 2.41 1.36 4 11.8 38.2 21.2 Law70
## 7 L70 Lawes 1970 G07 2.70 1.3 3 11.8 37.4 21.7 Law70
## 8 L70 Lawes 1970 G08 2.46 0.955 3.25 10 35.2 21.1 Law70
## 9 L70 Lawes 1970 G09 2.57 1.03 3 11.2 35.9 21.5 Law70
## 10 L70 Lawes 1970 G10 2.98 1.16 3.75 10.8 39.7 20.4 Law70
## # ℹ 454 more rows
## # ℹ 1 more variable: yield_grp <chr>
example3 <- example1 %>%
dplyr::rename( "Trait1"= "yield",
"Trait2"= "yield_grp" ,
"Trait3"= "height" ,
"Trait4" ="oil" ) %>%
select(ENV, gen, Trait1, Trait2, Trait3, Trait4,)
example3## # A tibble: 464 × 6
## ENV gen Trait1 Trait2 Trait3 Trait4
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 Law70 G01 2.39 Low 1.44 20.9
## 2 Law70 G02 2.28 Low 1.45 20.7
## 3 Law70 G03 2.57 Low 1.46 21.3
## 4 Law70 G04 2.88 Low 1.26 22.0
## 5 Law70 G05 2.39 Low 1.34 22.1
## 6 Law70 G06 2.41 Low 1.36 21.2
## 7 Law70 G07 2.70 Low 1.3 21.7
## 8 Law70 G08 2.46 Low 0.955 21.1
## 9 Law70 G09 2.57 Low 1.03 21.5
## 10 Law70 G10 2.98 Low 1.16 20.4
## # ℹ 454 more rows
Experimental design is a systematic method used to determine the relationship between factors affecting a process and the output of that process. It is the process of preparing research in advance, which is then carried out in controlled fashion, so precision is maximized and specific conclusions can be drawn.
Identify as much sources of variation as possible.
Study and eliminate as much of the natural variation.
Prevent unremoved variation from confusing or biasing the effects
being tested.
Detect cause and effect with minimal amount of experimental error.
When designing experiments, the goal is to make the treatment group and control group as homogeneous as possible. The researcher can use techniques such as blocking, proper plot techniques and data analysis to control experimental errors.
Packages used are agricolae, agricolaeplotr and desplot.
Blocking
There are multiple things to consider before picking the experimental design for analysis of a data set. For example:
Randomization in RCBD occurs after the units have been assigned to blocks.
An experimenter has fifteen genotypes to be tested with 3 replicates.
library(tidyverse)
library(agricolae)
library(desplot)
trt <- LETTERS[1:15]
outdesign <- design.rcbd(trt, 3,serie = 2,seed = 547)
book <- outdesign$book
print(outdesign$sketch)## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] "J" "D" "E" "K" "A" "C" "B" "N" "F" "I" "H" "G" "O" "M"
## [2,] "B" "G" "M" "A" "E" "L" "K" "O" "J" "N" "C" "H" "D" "I"
## [3,] "C" "I" "O" "D" "F" "H" "L" "A" "E" "N" "G" "B" "M" "J"
## [,15]
## [1,] "L"
## [2,] "F"
## [3,] "K"
outdesign$bookRowCol <- outdesign$book %>%
mutate(Row= block %>% as.integer) %>%
group_by(Row) %>%
mutate( Col= plots %>% str_sub(2,3) %>% as.integer() )
#Plot field layout using desplot
desplot(trt ~ Row + Col, flip = TRUE,
data= outdesign$bookRowCol,
out1 = block,
text = trt, cex = 1, shorten = "no",
main = "Randomized Complete Block Design",
show.key = FALSE,
x = "Blocks" )outdesign %>% agricolaeplotr::plot_rcdb(reverse_y = TRUE, reverse_x = TRUE, factor_name = "trt", y = "block", labels = "trt") + guides(fill= FALSE)These are multifactor experiments. The field is divided into whole plots and further divided into subplots. Whole plots serves as the experimental unit for a particular treatment, subplots serves as the experimental unit for another treatment.
The levels of all the factors are not randomly assigned and reset after each experiment.
There is loss of precision in the whole plot treatment comparison and statistical complexity.
There is restriction on random assignment.
t1<-c("A","B","C","D")
t2<-c(1,2,3)
Splitplot <-design.split(t1,t2,r=3,serie=2,seed=45,kinds ="Super-Duper")
Split_book<- Splitplot$bookSplitplot$bookRowCol <- Split_book %>%
mutate(block = paste0("Block",block),
Col = plots %>% str_sub(2,3) %>% as.integer,
Row = splots %>% as.integer)
#Plot Field layout
desplot( block ~ Col + Row | block, flip = TRUE,
out1 = Row, out1.gpar = list(col= "black",lty =1),
out2 = Col, out2.gpar = list(col = "black",lty = 1),
text = t2, cex = 1, shorten = "no", col = t1,
data = Splitplot$bookRowCol,
main = "Split Plot Design",
show.key = T, key.cex = 0.5)Splitplot %>%
agricolaeplotr::plot_split_rcbd(factor_name_1 = "t1", factor_name_2 = "t2", subplots = TRUE, labels = "t1")Sometimes it is necessary to form small blocks with fewer plots than the number of genotypes due to heterogeneous field conditions or too many genotypes. Examples of these designs are Lattice and Latin Square Design, Augmented and partially replicated designs.
Number of treatments must be a perfect square. \(t=k^2\) where t is number of treatment and k is number of units per block.
Number of blocks per replicate is equal to plots per block. \(s=k\) and both the square root of t.
trtL <- letters[1:5] #trt creates a perfect square
Latin <- design.lsd(trtL, serie = 2, 784)
Latin_book <- Latin$book
print(Latin$sketch)## [,1] [,2] [,3] [,4] [,5]
## [1,] "a" "b" "e" "d" "c"
## [2,] "e" "a" "d" "c" "b"
## [3,] "b" "c" "a" "e" "d"
## [4,] "c" "d" "b" "a" "e"
## [5,] "d" "e" "c" "b" "a"
# Adding rows and columns
Latin$bookRowCol <- Latin_book %>%
mutate(Row= row %>% as.integer,
Col= col %>% as.integer)
#Plot field layout
desplot(trtL ~ Row + Col, Flip = TRUE,
out1 = Row, out1.gpar = list(col="black",lwd=3),
out2 = Col, out2.gpar = list(col="black", lwd=3),
text = trtL, cex= 1, shorten = "no",
data =Latin$bookRowCol,
main = "Latin Square Design",
show.key = F)Latin %>%
agricolaeplotr::plot_latin_square(reverse_x = TRUE, factor_name = "trtL", labels = "trtL") + guides(fill= "none")They are special cases of incomplete block designs.
trtLT <- letters[1:25] #trt has have factors of a perfect square
lattice <- design.lattice(trtLT, 2, serie=2, 784)##
## Lattice design, simple 5 x 5
##
## Efficiency factor
## (E ) 0.75
##
## <<< Book >>>
## $rep1
## [,1] [,2] [,3] [,4] [,5]
## [1,] "c" "o" "j" "y" "i"
## [2,] "k" "d" "v" "h" "e"
## [3,] "x" "u" "a" "f" "w"
## [4,] "s" "p" "t" "m" "n"
## [5,] "b" "g" "q" "l" "r"
##
## $rep2
## [,1] [,2] [,3] [,4] [,5]
## [1,] "a" "j" "t" "q" "v"
## [2,] "x" "c" "s" "b" "k"
## [3,] "w" "i" "n" "r" "e"
## [4,] "u" "o" "p" "g" "d"
## [5,] "f" "y" "m" "l" "h"
# Add Rows and columns
lattice$bookRowCol <- lattice_book %>%
mutate(r = paste0("Rep", r),
col = block %>% as.integer,
Row = plots %>% str_sub(2,3) %>% as.integer
)
# Plot design using desplot
s <- desplot (trt ~ Row + col |r , Flip = TRUE,
out1 = Row, out1.gpar = list(col= "black", lwd= 1, lty= 1 ),
out2 = col, out2.gpar = list(col= "black", lty= 1),
text = trt, shorten = "no", cex = 1,
midpoint = "NULL",
data =lattice$bookRowCol,
main = "Lattice Design",
show.key = F,
ylab = "Blocks"
)
slattice %>%
agricolaeplotr::plot_lattice_simple(reverse_x = TRUE, factor_name = "trt", labels = "trt", y="block"
) %>%
+ guides(fill= "none") +facet_wrap(~r)This design is best suited for experiments with limited test treatment, limited resources, difficulty in keeping blocks homogeneous, testing genotypes in as many environments which leads to a single replicate of genotypes.
\(N=n+r*c\), where N is number of plots, n is number of entries, c is number of checks.
New or test treatments are not replicated or have fewer replicates than the checks.
Evaluates more genotypes and tests in more environments.
Observations on new genotypes can be adjusted for field heterogeneity.
It is flexible
Significant resources are spent on producing checks
Experimental error has a limited number of degrees of freedom which limits the power to detect changes across treatment.
A plant breeder has 53 genotypes, 3 of which are standard checks to be replicated in each block
trtA <- letters[1:3] #control
trtAA <- 1:50 #treatment
aug <- design.dau(trt1 = trtA,trt2 = trtAA, 7, serie = 2, 998)
book.a <- aug$book
by(book.a,book.a[2],function(x) paste(x[,1],"-",as.character(x[,3])))## block: 1
## [1] "101 - 44" "102 - 46" "103 - 20" "104 - 8" "105 - a" "106 - 14"
## [7] "107 - b" "108 - 24" "109 - c" "110 - 29" "111 - 30"
## ------------------------------------------------------------
## block: 2
## [1] "201 - 23" "202 - 40" "203 - 4" "204 - 1" "205 - 39" "206 - 48"
## [7] "207 - c" "208 - a" "209 - 37" "210 - b"
## ------------------------------------------------------------
## block: 3
## [1] "301 - 34" "302 - c" "303 - b" "304 - 5" "305 - 12" "306 - 9"
## [7] "307 - a" "308 - 35" "309 - 28" "310 - 27"
## ------------------------------------------------------------
## block: 4
## [1] "401 - 47" "402 - a" "403 - 2" "404 - c" "405 - 49" "406 - 16"
## [7] "407 - 45" "408 - b" "409 - 43" "410 - 13"
## ------------------------------------------------------------
## block: 5
## [1] "501 - 21" "502 - c" "503 - 22" "504 - 17" "505 - b" "506 - 38"
## [7] "507 - 26" "508 - a" "509 - 15" "510 - 11"
## ------------------------------------------------------------
## block: 6
## [1] "601 - c" "602 - 19" "603 - 31" "604 - 10" "605 - b" "606 - 41"
## [7] "607 - 33" "608 - a" "609 - 18" "610 - 7"
## ------------------------------------------------------------
## block: 7
## [1] "701 - 42" "702 - b" "703 - c" "704 - 25" "705 - 32" "706 - 36"
## [7] "707 - 50" "708 - 6" "709 - a" "710 - 3"
augmentedRowCol <- book.a %>%
mutate(Col = block %>% as.integer,
Row = plots %>% str_sub(2,3) %>% as.integer)
#plot field layout using desplot
desplot(trt~ Row + Col, flip = TRUE,
out1 = block, out1.gpar = list(col="black", lwd= 2, lty =3),
text= trt, cex = 1, shorten = "no",
data= augmentedRowCol,
main = "Augmented Design",
show.key = F
)aug %>%
agricolaeplotr::plot_dau(y = "block", factor_name = "trt", labels = "trt") %>%
+ guides(fill= "none") + labs(title= "Augmented Design")Suitable for large numbers of genotypes.
Random allocation of treatments within a block.
Not all treatments appear in a block
All pairs of treatments appear equally in a trial
\(t= s*k\), where t is number of treatments in each replicate, s is number of blocks in each replicate and k is number of units per block.
trtA <- 1:102
tl <- length(trtA)
k <- 6
s <- tl/k
r <- 3
AlphaDesign <- design.alpha(trtA, k, r, serie = 2, 998)##
## Alpha Design (0,1) - Serie II
##
## Parameters Alpha Design
## =======================
## Treatmeans : 102
## Block size : 6
## Blocks : 17
## Replication: 3
##
## Efficiency factor
## (E ) 0.808
##
## <<< Book >>>
Alpha_book <- AlphaDesign$book
plots <- Alpha_book[,1]
dim(plots) <- c(k,s,r)
for (i in 1:r) {print(t(plots[,,i]))
}## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 101 102 103 104 105 106
## [2,] 107 108 109 110 111 112
## [3,] 113 114 115 116 117 118
## [4,] 119 120 121 122 123 124
## [5,] 125 126 127 128 129 130
## [6,] 131 132 133 134 135 136
## [7,] 137 138 139 140 141 142
## [8,] 143 144 145 146 147 148
## [9,] 149 150 151 152 153 154
## [10,] 155 156 157 158 159 160
## [11,] 161 162 163 164 165 166
## [12,] 167 168 169 170 171 172
## [13,] 173 174 175 176 177 178
## [14,] 179 180 181 182 183 184
## [15,] 185 186 187 188 189 190
## [16,] 191 192 193 194 195 196
## [17,] 197 198 199 200 201 202
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 201 202 203 204 205 206
## [2,] 207 208 209 210 211 212
## [3,] 213 214 215 216 217 218
## [4,] 219 220 221 222 223 224
## [5,] 225 226 227 228 229 230
## [6,] 231 232 233 234 235 236
## [7,] 237 238 239 240 241 242
## [8,] 243 244 245 246 247 248
## [9,] 249 250 251 252 253 254
## [10,] 255 256 257 258 259 260
## [11,] 261 262 263 264 265 266
## [12,] 267 268 269 270 271 272
## [13,] 273 274 275 276 277 278
## [14,] 279 280 281 282 283 284
## [15,] 285 286 287 288 289 290
## [16,] 291 292 293 294 295 296
## [17,] 297 298 299 300 301 302
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 301 302 303 304 305 306
## [2,] 307 308 309 310 311 312
## [3,] 313 314 315 316 317 318
## [4,] 319 320 321 322 323 324
## [5,] 325 326 327 328 329 330
## [6,] 331 332 333 334 335 336
## [7,] 337 338 339 340 341 342
## [8,] 343 344 345 346 347 348
## [9,] 349 350 351 352 353 354
## [10,] 355 356 357 358 359 360
## [11,] 361 362 363 364 365 366
## [12,] 367 368 369 370 371 372
## [13,] 373 374 375 376 377 378
## [14,] 379 380 381 382 383 384
## [15,] 385 386 387 388 389 390
## [16,] 391 392 393 394 395 396
## [17,] 397 398 399 400 401 402
## $rep1
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "23" "93" "79" "4" "43" "18"
## [2,] "28" "39" "34" "99" "36" "91"
## [3,] "26" "31" "44" "84" "25" "3"
## [4,] "57" "1" "90" "73" "69" "45"
## [5,] "80" "10" "82" "101" "37" "30"
## [6,] "83" "78" "61" "42" "47" "38"
## [7,] "62" "64" "66" "7" "75" "74"
## [8,] "17" "51" "35" "94" "55" "24"
## [9,] "54" "6" "67" "8" "13" "95"
## [10,] "100" "5" "87" "60" "49" "22"
## [11,] "52" "53" "21" "9" "96" "14"
## [12,] "97" "98" "92" "29" "2" "48"
## [13,] "32" "16" "27" "46" "72" "56"
## [14,] "58" "77" "89" "65" "50" "19"
## [15,] "88" "85" "40" "59" "12" "81"
## [16,] "15" "41" "70" "68" "11" "63"
## [17,] "76" "86" "102" "71" "33" "20"
##
## $rep2
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "2" "27" "68" "42" "69" "99"
## [2,] "75" "54" "33" "23" "58" "10"
## [3,] "92" "83" "16" "41" "45" "55"
## [4,] "81" "53" "87" "3" "79" "94"
## [5,] "76" "38" "101" "56" "13" "34"
## [6,] "84" "12" "17" "29" "90" "70"
## [7,] "73" "31" "98" "32" "63" "51"
## [8,] "39" "47" "80" "1" "72" "95"
## [9,] "48" "24" "44" "15" "9" "88"
## [10,] "100" "85" "25" "35" "11" "14"
## [11,] "86" "89" "36" "6" "37" "74"
## [12,] "52" "4" "77" "102" "5" "66"
## [13,] "30" "43" "50" "20" "62" "49"
## [14,] "78" "57" "91" "8" "46" "97"
## [15,] "26" "21" "22" "7" "59" "18"
## [16,] "40" "19" "93" "64" "60" "96"
## [17,] "65" "61" "67" "82" "71" "28"
##
## $rep3
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "47" "28" "73" "27" "48" "54"
## [2,] "97" "52" "17" "25" "41" "59"
## [3,] "1" "3" "24" "2" "40" "63"
## [4,] "53" "43" "74" "31" "88" "60"
## [5,] "6" "16" "30" "90" "78" "34"
## [6,] "94" "15" "61" "72" "69" "98"
## [7,] "56" "26" "92" "57" "70" "35"
## [8,] "19" "37" "83" "20" "91" "13"
## [9,] "23" "85" "84" "5" "55" "21"
## [10,] "68" "96" "51" "44" "81" "49"
## [11,] "9" "87" "62" "76" "93" "89"
## [12,] "101" "64" "86" "8" "50" "79"
## [13,] "80" "71" "100" "66" "18" "58"
## [14,] "10" "32" "39" "102" "67" "42"
## [15,] "4" "65" "12" "75" "22" "14"
## [16,] "77" "95" "7" "82" "33" "99"
## [17,] "45" "36" "38" "46" "29" "11"
# Add Row and columns
alphabookRowCol <- Alpha_book %>%
mutate(replication = paste0("Replication", replication),
Row = block %>% as.integer,
Col = cols %>% as.integer)
#field layout using desplot
desplot(alphabookRowCol, trtA ~ Row + Col|replication, flip = TRUE,
out1 = replication, out1.gpar = list(col= "black", lyt= 3, lwd= 3),
out2 = block, out2.gpar = list(col="black",lyt= 2),
text = trtA,cex=1, shorten = "no",
main = "Alpha Design",
show.key = F)AlphaDesign %>%
agricolaeplotr::plot_alpha( x= "cols", y= "block", factor_name = "trtA", labels = "trtA") %>%
+ guides(fill= "none")+ facet_wrap(~replication)P-rep designs are useful when an augmented design is using control plots where seed are limited and full replication is not feasible.
A portion of the test genotypes are replicated and the rest appears once.
The design is constructed in such a way that duplicated treatments can be used to estimate the residual variance.
A breeder has 106 test materials and 4 checks. She wishes to replicate her trial 3 times.
# adding Row and column
pRowlCol <- book.c %>%
mutate(Row = plots %>% str_sub(2,3) %>% as.integer,
Col = block %>% as.integer)
# field layout using desplot
desplot(trt ~ Row + Col, flip = TRUE,
out1 = block, out1.gpar = list(col= "black", lwd= 2, lty= 3),
text = trt, cex = 1, shorten = "no",
data= pRowlCol,
main= "P-rep Design",
show.key = F)design %>%
agricolaeplotr::plot_dau(y= "block", factor_name = "trt",labels = "trt") %>%
+ guides(fill= "none")+ labs(title= "P-rep Design")Analysis of Variance (ANOVA) is used to determine differences between research results and experimental samples.
One way ANOVA focuses on one treatment factor while two way ANOVA focuses on multiple treatment factors.
Differences among measurements for same group is called experimental error (variance within groups), in order to have experimental error we need replication.
Differences between two unrelated groups is called variance between groups.
Fixed effects occurs when treatments are well defined, easily replicable and expected to yield same impact on average in each replicate while random effect occurs if the treatments cannot be considered to come from a predefined or known set (random sampling from a larger population). When the researcher’s aim is comparation of genotypes, then the factors are considered fixed, but when the aim of the researcher is selection the factors are considered random.
Heritability is the amount of phenotypic variation in a population, that is attributable to individual genetic differences. It is the ratio of variation due to differences between genotypes to the phenotypic variation for a trait in a population. \(h^2= \frac{σ_g^2}{{σ_g^2+ \frac{σ_e^2}{r}}}\)
Packages used are lmerTest and emmeans.
If experimental error is higher than the variance between groups, we cannot conclude that there is a difference between the two groups.
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
## Rows: 72 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): env, rep, gen
## dbl (1): yield
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 72 × 4
## env rep gen yield
## <chr> <chr> <chr> <dbl>
## 1 E3 R1 G01 449.
## 2 E3 R1 G02 458.
## 3 E3 R1 G03 545.
## 4 E3 R1 G04 547.
## 5 E3 R1 G05 784.
## 6 E3 R1 G06 501.
## 7 E3 R1 G07 1099.
## 8 E3 R1 G08 882.
## 9 E3 R1 G09 821.
## 10 E3 R1 G10 918.
## # ℹ 62 more rows
## env rep gen yield
## Length:72 Length:72 Length:72 Min. : 91.26
## Class :character Class :character Class :character 1st Qu.: 511.78
## Mode :character Mode :character Mode :character Median : 669.10
## Mean : 671.15
## 3rd Qu.: 808.90
## Max. :1327.45
## [1] "E3"
## [1] "R1" "R2" "R3" "R4"
## [1] "G01" "G02" "G03" "G04" "G05" "G06" "G07" "G08" "G09" "G10" "G11" "G12"
## [13] "G13" "G14" "G15" "G16" "G17" "G18"
model.E3 <- lmer(yield ~ gen + (1|rep), data = omerE3) ## factors considered fixed
testE3 <- anova(model.E3, ddf = "Kenward-Roger", type = 3)
testE3## Type III Analysis of Variance Table with Kenward-Roger's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## gen 2478067 145769 17 51 5.688 6.427e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## gen emmean SE df lower.CL upper.CL
## G01 582 84.5 45.9 412 752
## G02 515 84.5 45.9 345 685
## G03 805 84.5 45.9 635 975
## G04 749 84.5 45.9 579 919
## G05 643 84.5 45.9 473 813
## G06 315 84.5 45.9 145 485
## G07 937 84.5 45.9 767 1107
## G08 741 84.5 45.9 571 911
## G09 813 84.5 45.9 643 983
## G10 739 84.5 45.9 569 909
## G11 728 84.5 45.9 558 898
## G12 539 84.5 45.9 369 709
## G13 825 84.5 45.9 655 995
## G14 583 84.5 45.9 413 753
## G15 647 84.5 45.9 477 817
## G16 605 84.5 45.9 435 775
## G17 285 84.5 45.9 115 455
## G18 1030 84.5 45.9 860 1200
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
## contrast estimate SE df t.ratio p.value
## G01 - G02 66.52 113 51 0.588 1.0000
## G01 - G03 -223.07 113 51 -1.971 0.8690
## G01 - G04 -167.40 113 51 -1.479 0.9885
## G01 - G05 -61.05 113 51 -0.539 1.0000
## G01 - G06 267.20 113 51 2.361 0.6381
## G01 - G07 -355.20 113 51 -3.138 0.1798
## G01 - G08 -159.53 113 51 -1.409 0.9930
## G01 - G09 -231.31 113 51 -2.043 0.8340
## G01 - G10 -156.66 113 51 -1.384 0.9943
## G01 - G11 -146.29 113 51 -1.292 0.9973
## G01 - G12 42.87 113 51 0.379 1.0000
## G01 - G13 -243.18 113 51 -2.148 0.7759
## G01 - G14 -1.06 113 51 -0.009 1.0000
## G01 - G15 -64.72 113 51 -0.572 1.0000
## G01 - G16 -22.81 113 51 -0.201 1.0000
## G01 - G17 296.66 113 51 2.621 0.4581
## G01 - G18 -448.28 113 51 -3.960 0.0227
## G02 - G03 -289.59 113 51 -2.558 0.5006
## G02 - G04 -233.91 113 51 -2.066 0.8220
## G02 - G05 -127.57 113 51 -1.127 0.9995
## G02 - G06 200.69 113 51 1.773 0.9402
## G02 - G07 -421.72 113 51 -3.726 0.0434
## G02 - G08 -226.04 113 51 -1.997 0.8569
## G02 - G09 -297.82 113 51 -2.631 0.4513
## G02 - G10 -223.18 113 51 -1.972 0.8686
## G02 - G11 -212.80 113 51 -1.880 0.9061
## G02 - G12 -23.64 113 51 -0.209 1.0000
## G02 - G13 -309.69 113 51 -2.736 0.3835
## G02 - G14 -67.58 113 51 -0.597 1.0000
## G02 - G15 -131.24 113 51 -1.159 0.9993
## G02 - G16 -89.32 113 51 -0.789 1.0000
## G02 - G17 230.14 113 51 2.033 0.8392
## G02 - G18 -514.79 113 51 -4.548 0.0039
## G03 - G04 55.67 113 51 0.492 1.0000
## G03 - G05 162.02 113 51 1.431 0.9918
## G03 - G06 490.28 113 51 4.331 0.0076
## G03 - G07 -132.13 113 51 -1.167 0.9992
## G03 - G08 63.55 113 51 0.561 1.0000
## G03 - G09 -8.23 113 51 -0.073 1.0000
## G03 - G10 66.41 113 51 0.587 1.0000
## G03 - G11 76.79 113 51 0.678 1.0000
## G03 - G12 265.95 113 51 2.349 0.6458
## G03 - G13 -20.10 113 51 -0.178 1.0000
## G03 - G14 222.01 113 51 1.961 0.8732
## G03 - G15 158.35 113 51 1.399 0.9936
## G03 - G16 200.27 113 51 1.769 0.9412
## G03 - G17 519.73 113 51 4.591 0.0034
## G03 - G18 -225.20 113 51 -1.989 0.8604
## G04 - G05 106.35 113 51 0.939 1.0000
## G04 - G06 434.61 113 51 3.839 0.0319
## G04 - G07 -187.81 113 51 -1.659 0.9659
## G04 - G08 7.87 113 51 0.070 1.0000
## G04 - G09 -63.91 113 51 -0.565 1.0000
## G04 - G10 10.74 113 51 0.095 1.0000
## G04 - G11 21.11 113 51 0.187 1.0000
## G04 - G12 210.27 113 51 1.858 0.9141
## G04 - G13 -75.78 113 51 -0.669 1.0000
## G04 - G14 166.34 113 51 1.469 0.9892
## G04 - G15 102.68 113 51 0.907 1.0000
## G04 - G16 144.59 113 51 1.277 0.9976
## G04 - G17 464.06 113 51 4.100 0.0152
## G04 - G18 -280.88 113 51 -2.481 0.5540
## G05 - G06 328.26 113 51 2.900 0.2885
## G05 - G07 -294.15 113 51 -2.599 0.4731
## G05 - G08 -98.47 113 51 -0.870 1.0000
## G05 - G09 -170.25 113 51 -1.504 0.9864
## G05 - G10 -95.61 113 51 -0.845 1.0000
## G05 - G11 -85.23 113 51 -0.753 1.0000
## G05 - G12 103.92 113 51 0.918 1.0000
## G05 - G13 -182.12 113 51 -1.609 0.9741
## G05 - G14 59.99 113 51 0.530 1.0000
## G05 - G15 -3.67 113 51 -0.032 1.0000
## G05 - G16 38.24 113 51 0.338 1.0000
## G05 - G17 357.71 113 51 3.160 0.1715
## G05 - G18 -387.23 113 51 -3.421 0.0945
## G06 - G07 -622.41 113 51 -5.498 0.0002
## G06 - G08 -426.73 113 51 -3.770 0.0385
## G06 - G09 -498.51 113 51 -4.404 0.0061
## G06 - G10 -423.87 113 51 -3.744 0.0413
## G06 - G11 -413.49 113 51 -3.653 0.0526
## G06 - G12 -224.33 113 51 -1.982 0.8640
## G06 - G13 -510.38 113 51 -4.509 0.0044
## G06 - G14 -268.26 113 51 -2.370 0.6317
## G06 - G15 -331.93 113 51 -2.932 0.2716
## G06 - G16 -290.01 113 51 -2.562 0.4980
## G06 - G17 29.45 113 51 0.260 1.0000
## G06 - G18 -715.48 113 51 -6.321 <.0001
## G07 - G08 195.68 113 51 1.729 0.9514
## G07 - G09 123.90 113 51 1.095 0.9996
## G07 - G10 198.54 113 51 1.754 0.9452
## G07 - G11 208.92 113 51 1.846 0.9182
## G07 - G12 398.08 113 51 3.517 0.0746
## G07 - G13 112.03 113 51 0.990 0.9999
## G07 - G14 354.14 113 51 3.129 0.1834
## G07 - G15 290.49 113 51 2.566 0.4952
## G07 - G16 332.40 113 51 2.936 0.2695
## G07 - G17 651.86 113 51 5.759 0.0001
## G07 - G18 -93.07 113 51 -0.822 1.0000
## G08 - G09 -71.78 113 51 -0.634 1.0000
## G08 - G10 2.87 113 51 0.025 1.0000
## G08 - G11 13.24 113 51 0.117 1.0000
## G08 - G12 202.40 113 51 1.788 0.9360
## G08 - G13 -83.65 113 51 -0.739 1.0000
## G08 - G14 158.47 113 51 1.400 0.9935
## G08 - G15 94.81 113 51 0.838 1.0000
## G08 - G16 136.72 113 51 1.208 0.9988
## G08 - G17 456.19 113 51 4.030 0.0186
## G08 - G18 -288.75 113 51 -2.551 0.5057
## G09 - G10 74.64 113 51 0.659 1.0000
## G09 - G11 85.02 113 51 0.751 1.0000
## G09 - G12 274.18 113 51 2.422 0.5954
## G09 - G13 -11.87 113 51 -0.105 1.0000
## G09 - G14 230.25 113 51 2.034 0.8388
## G09 - G15 166.59 113 51 1.472 0.9891
## G09 - G16 208.50 113 51 1.842 0.9194
## G09 - G17 527.96 113 51 4.664 0.0027
## G09 - G18 -216.97 113 51 -1.917 0.8920
## G10 - G11 10.38 113 51 0.092 1.0000
## G10 - G12 199.53 113 51 1.763 0.9430
## G10 - G13 -86.51 113 51 -0.764 1.0000
## G10 - G14 155.60 113 51 1.375 0.9947
## G10 - G15 91.94 113 51 0.812 1.0000
## G10 - G16 133.85 113 51 1.182 0.9991
## G10 - G17 453.32 113 51 4.005 0.0200
## G10 - G18 -291.62 113 51 -2.576 0.4883
## G11 - G12 189.16 113 51 1.671 0.9637
## G11 - G13 -96.89 113 51 -0.856 1.0000
## G11 - G14 145.23 113 51 1.283 0.9975
## G11 - G15 81.57 113 51 0.721 1.0000
## G11 - G16 123.48 113 51 1.091 0.9997
## G11 - G17 442.94 113 51 3.913 0.0260
## G11 - G18 -301.99 113 51 -2.668 0.4269
## G12 - G13 -286.05 113 51 -2.527 0.5222
## G12 - G14 -43.93 113 51 -0.388 1.0000
## G12 - G15 -107.59 113 51 -0.950 0.9999
## G12 - G16 -65.68 113 51 -0.580 1.0000
## G12 - G17 253.78 113 51 2.242 0.7177
## G12 - G18 -491.15 113 51 -4.339 0.0074
## G13 - G14 242.12 113 51 2.139 0.7814
## G13 - G15 178.46 113 51 1.576 0.9786
## G13 - G16 220.37 113 51 1.947 0.8795
## G13 - G17 539.83 113 51 4.769 0.0019
## G13 - G18 -205.10 113 51 -1.812 0.9290
## G14 - G15 -63.66 113 51 -0.562 1.0000
## G14 - G16 -21.75 113 51 -0.192 1.0000
## G14 - G17 297.72 113 51 2.630 0.4519
## G14 - G18 -447.22 113 51 -3.951 0.0233
## G15 - G16 41.91 113 51 0.370 1.0000
## G15 - G17 361.38 113 51 3.192 0.1598
## G15 - G18 -383.56 113 51 -3.388 0.1021
## G16 - G17 319.46 113 51 2.822 0.3316
## G16 - G18 -425.47 113 51 -3.759 0.0397
## G17 - G18 -744.93 113 51 -6.581 <.0001
##
## Degrees-of-freedom method: kenward-roger
## P value adjustment: tukey method for comparing a family of 18 estimates
## $emmeans
## gen emmean SE df lower.CL upper.CL
## G01 582 84.5 45.9 412 752
## G02 515 84.5 45.9 345 685
## G03 805 84.5 45.9 635 975
## G04 749 84.5 45.9 579 919
## G05 643 84.5 45.9 473 813
## G06 315 84.5 45.9 145 485
## G07 937 84.5 45.9 767 1107
## G08 741 84.5 45.9 571 911
## G09 813 84.5 45.9 643 983
## G10 739 84.5 45.9 569 909
## G11 728 84.5 45.9 558 898
## G12 539 84.5 45.9 369 709
## G13 825 84.5 45.9 655 995
## G14 583 84.5 45.9 413 753
## G15 647 84.5 45.9 477 817
## G16 605 84.5 45.9 435 775
## G17 285 84.5 45.9 115 455
## G18 1030 84.5 45.9 860 1200
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## G01 - G05 -61.05 113 51 -0.539 0.9963
## G02 - G05 -127.57 113 51 -1.127 0.8940
## G03 - G05 162.02 113 51 1.431 0.7490
## G04 - G05 106.35 113 51 0.939 0.9500
## G06 - G05 -328.26 113 51 -2.900 0.0656
## G07 - G05 294.15 113 51 2.599 0.1309
## G08 - G05 98.47 113 51 0.870 0.9643
## G09 - G05 170.25 113 51 1.504 0.7063
## G10 - G05 95.61 113 51 0.845 0.9687
## G11 - G05 85.23 113 51 0.753 0.9815
## G12 - G05 -103.92 113 51 -0.918 0.9547
## G13 - G05 182.12 113 51 1.609 0.6413
## G14 - G05 -59.99 113 51 -0.530 0.9966
## G15 - G05 3.67 113 51 0.032 1.0000
## G16 - G05 -38.24 113 51 -0.338 0.9997
## G17 - G05 -357.71 113 51 -3.160 0.0340
## G18 - G05 387.23 113 51 3.421 0.0168
##
## Degrees-of-freedom method: kenward-roger
## P value adjustment: dunnettx method for 17 tests
## Rows: 432 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): env, rep, gen
## dbl (1): yield
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 432 × 4
## env rep gen yield
## <chr> <chr> <chr> <dbl>
## 1 E1 R1 G01 140.
## 2 E1 R1 G02 183.
## 3 E1 R1 G03 172.
## 4 E1 R1 G04 200.
## 5 E1 R1 G05 157.
## 6 E1 R1 G06 177.
## 7 E1 R1 G07 55.4
## 8 E1 R1 G08 89.5
## 9 E1 R1 G09 296.
## 10 E1 R1 G10 312.
## # ℹ 422 more rows
## env rep gen yield
## Length:432 Length:432 Length:432 Min. : 10.17
## Class :character Class :character Class :character 1st Qu.: 182.07
## Mode :character Mode :character Mode :character Median : 339.88
## Mean : 496.11
## 3rd Qu.: 728.59
## Max. :2037.63
## [1] "E1" "E2" "E3" "E4" "E5" "E6"
## [1] "R1" "R2" "R3" "R4"
## [1] "G01" "G02" "G03" "G04" "G05" "G06" "G07" "G08" "G09" "G10" "G11" "G12"
## [13] "G13" "G14" "G15" "G16" "G17" "G18"
model.S <- lmer(yield ~ gen + env + gen:env + (1|env:rep), omer.S) ## factors are considered fixed
test.S <- anova(model.S, ddf = "Kenward-Roger", type = 3)
test.S## Type III Analysis of Variance Table with Kenward-Roger's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## gen 2347588 138093 17 306 5.6000 5.183e-11 ***
## env 29552101 5910420 5 18 239.6815 8.225e-16 ***
## gen:env 9352495 110029 85 306 4.4619 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## gen env emmean SE df lower.CL upper.CL
## G01 E1 130.6 80.3 313 -27.475 289
## G02 E1 182.5 80.3 313 24.465 341
## G03 E1 159.3 80.3 313 1.253 317
## G04 E1 185.6 80.3 313 27.510 344
## G05 E1 157.2 80.3 313 -0.825 315
## G06 E1 164.6 80.3 313 6.548 323
## G07 E1 36.6 80.3 313 -121.467 195
## G08 E1 122.7 80.3 313 -35.385 281
## G09 E1 235.5 80.3 313 77.485 394
## G10 E1 286.6 80.3 313 128.565 445
## G11 E1 177.1 80.3 313 19.013 335
## G12 E1 157.3 80.3 313 -0.740 315
## G13 E1 96.4 80.3 313 -61.695 254
## G14 E1 100.6 80.3 313 -57.422 259
## G15 E1 22.3 80.3 313 -135.737 180
## G16 E1 188.1 80.3 313 30.053 346
## G17 E1 19.8 80.3 313 -138.245 178
## G18 E1 174.5 80.3 313 16.430 333
## G01 E2 242.8 80.3 313 84.748 401
## G02 E2 330.9 80.3 313 172.868 489
## G03 E2 212.0 80.3 313 53.925 370
## G04 E2 128.7 80.3 313 -29.357 287
## G05 E2 105.2 80.3 313 -52.862 263
## G06 E2 167.6 80.3 313 9.513 326
## G07 E2 188.6 80.3 313 30.555 347
## G08 E2 482.5 80.3 313 324.488 641
## G09 E2 439.7 80.3 313 281.643 598
## G10 E2 196.5 80.3 313 38.418 355
## G11 E2 353.8 80.3 313 195.703 512
## G12 E2 505.9 80.3 313 347.888 664
## G13 E2 643.5 80.3 313 485.455 802
## G14 E2 323.9 80.3 313 165.860 482
## G15 E2 440.9 80.3 313 282.893 599
## G16 E2 268.6 80.3 313 110.585 427
## G17 E2 293.6 80.3 313 135.495 452
## G18 E2 260.2 80.3 313 102.173 418
## G01 E3 581.9 80.3 313 423.800 740
## G02 E3 515.3 80.3 313 357.285 673
## G03 E3 804.9 80.3 313 646.875 963
## G04 E3 749.3 80.3 313 591.200 907
## G05 E3 642.9 80.3 313 484.853 801
## G06 E3 314.6 80.3 313 156.595 473
## G07 E3 937.1 80.3 313 779.005 1095
## G08 E3 741.4 80.3 313 583.328 899
## G09 E3 813.2 80.3 313 655.105 971
## G10 E3 738.5 80.3 313 580.463 897
## G11 E3 728.1 80.3 313 570.088 886
## G12 E3 539.0 80.3 313 380.928 697
## G13 E3 825.0 80.3 313 666.975 983
## G14 E3 582.9 80.3 313 424.860 741
## G15 E3 646.6 80.3 313 488.520 805
## G16 E3 604.7 80.3 313 446.608 763
## G17 E3 285.2 80.3 313 127.143 443
## G18 E3 1030.1 80.3 313 872.078 1188
## G01 E4 294.1 80.3 313 136.050 452
## G02 E4 639.8 80.3 313 481.710 798
## G03 E4 272.4 80.3 313 114.368 430
## G04 E4 393.5 80.3 313 235.475 552
## G05 E4 456.9 80.3 313 298.853 615
## G06 E4 146.6 80.3 313 -11.422 305
## G07 E4 524.2 80.3 313 366.158 682
## G08 E4 419.1 80.3 313 261.000 577
## G09 E4 353.0 80.3 313 194.903 511
## G10 E4 659.0 80.3 313 500.925 817
## G11 E4 640.3 80.3 313 482.235 798
## G12 E4 295.4 80.3 313 137.363 453
## G13 E4 590.9 80.3 313 432.830 749
## G14 E4 401.3 80.3 313 243.255 559
## G15 E4 481.3 80.3 313 323.265 639
## G16 E4 320.7 80.3 313 162.615 479
## G17 E4 996.3 80.3 313 838.290 1154
## G18 E4 670.7 80.3 313 512.675 829
## G01 E5 197.0 80.3 313 38.915 355
## G02 E5 157.9 80.3 313 -0.125 316
## G03 E5 221.4 80.3 313 63.345 379
## G04 E5 121.4 80.3 313 -36.667 279
## G05 E5 207.4 80.3 313 49.335 365
## G06 E5 191.2 80.3 313 33.153 349
## G07 E5 159.1 80.3 313 1.045 317
## G08 E5 130.9 80.3 313 -27.165 289
## G09 E5 115.2 80.3 313 -42.815 273
## G10 E5 398.4 80.3 313 240.328 556
## G11 E5 210.7 80.3 313 52.615 369
## G12 E5 169.3 80.3 313 11.248 327
## G13 E5 341.0 80.3 313 182.935 499
## G14 E5 203.7 80.3 313 45.625 362
## G15 E5 210.7 80.3 313 52.645 369
## G16 E5 122.1 80.3 313 -35.925 280
## G17 E5 210.7 80.3 313 52.615 369
## G18 E5 78.0 80.3 313 -80.055 236
## G01 E6 836.7 80.3 313 678.655 995
## G02 E6 1220.6 80.3 313 1062.568 1379
## G03 E6 902.3 80.3 313 744.228 1060
## G04 E6 814.5 80.3 313 656.398 973
## G05 E6 1180.0 80.3 313 1021.960 1338
## G06 E6 1170.3 80.3 313 1012.258 1328
## G07 E6 1156.6 80.3 313 998.588 1315
## G08 E6 1238.7 80.3 313 1080.618 1397
## G09 E6 1312.5 80.3 313 1154.440 1471
## G10 E6 1267.4 80.3 313 1109.325 1425
## G11 E6 1022.5 80.3 313 864.448 1181
## G12 E6 1338.7 80.3 313 1180.685 1497
## G13 E6 1220.6 80.3 313 1062.568 1379
## G14 E6 1649.5 80.3 313 1491.483 1808
## G15 E6 1749.9 80.3 313 1591.870 1908
## G16 E6 918.8 80.3 313 760.695 1077
## G17 E6 1323.4 80.3 313 1165.380 1481
## G18 E6 992.6 80.3 313 834.523 1151
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
Run a two-way ANOVA (genotype and environment) considering these two factors and the interaction as random.
model.S.R <- lmer( yield ~ (1|gen) + (1|env) + (1|gen:env) + (1|env:rep), omer.S)
summary(model.S.R)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: yield ~ (1 | gen) + (1 | env) + (1 | gen:env) + (1 | env:rep)
## Data: omer.S
##
## REML criterion at convergence: 5785.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7533 -0.3550 -0.0211 0.2433 7.1388
##
## Random effects:
## Groups Name Variance Std.Dev.
## gen:env (Intercept) 21343 146.09
## env:rep (Intercept) 1152 33.94
## gen (Intercept) 1169 34.19
## env (Intercept) 149280 386.37
## Residual 24660 157.03
## Number of obs: 432, groups: gen:env, 108; env:rep, 24; gen, 18; env, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 496.113 158.895 5.027 3.122 0.026 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(h^2= \frac{σ_g^2}{{σ_g^2+ \frac{σ_{g*e}^2}{e}+\frac{σ_e^2}{e*r}}}\)
\(h^2= \frac{1169}{1169+(21343/6)+(24660/(4*6))} = 0.20\)
PCA is used for exploratory data analysis, allowing to better visualize the variation present in a data set with many variables.
It involves the transformation of large data into a smaller one that still contains most of the principal components.
The principal components are the directions where there is the most variance.
An eigenvector is a direction which can be horizontal or a certain degree while an eigenvalue is a number indicating the variance of the data in that direction.
The data used to conduct PCA is not usually raw data, because the treatments should not have replicates.
Library used factoextra , stats , ggbiplot
## Rows: 464 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): env, loc, gen
## dbl (7): year, yield, height, lodging, size, protein, oil
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## spc_tbl_ [116 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ env : chr [1:116] "B70" "B70" "B70" "B70" ...
## $ loc : chr [1:116] "Brookstead" "Brookstead" "Brookstead" "Brookstead" ...
## $ year : num [1:116] 1970 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
## $ gen : chr [1:116] "G01" "G02" "G03" "G04" ...
## $ yield : num [1:116] 1.253 1.167 0.468 1.445 1.338 ...
## $ height : num [1:116] 1.01 1.13 1.16 1.24 1.12 ...
## $ lodging: num [1:116] 3.25 2.75 2.25 1.5 2 2.25 2 2.25 1.75 2 ...
## $ size : num [1:116] 8.85 8.9 10.8 10.6 11.95 ...
## $ protein: num [1:116] 39.5 38.6 37.8 38.7 37.8 ...
## $ oil : num [1:116] 18.9 19.8 20.4 20.4 20.8 ...
## - attr(*, "spec")=
## .. cols(
## .. env = col_character(),
## .. loc = col_character(),
## .. year = col_double(),
## .. gen = col_character(),
## .. yield = col_double(),
## .. height = col_double(),
## .. lodging = col_double(),
## .. size = col_double(),
## .. protein = col_double(),
## .. oil = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
data.brookstead.pca <- prcomp(data.brookstead, center = TRUE, scale = TRUE)
summary(data.brookstead.pca)## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 1.8780 0.9949 0.9176 0.60462 0.42702 0.30580
## Proportion of Variance 0.5878 0.1650 0.1403 0.06093 0.03039 0.01559
## Cumulative Proportion 0.5878 0.7528 0.8931 0.95402 0.98441 1.00000
ggbiplot(data.brookstead.pca, labels = labels.gen.b, groups = labels.year.b)+
labs(title = "Yield and Other Traits of soybeans in Brookstead Australia",
subtitle = "Period 1970-1971",
caption = "Data: agridat::australia.soybean")Hierarchical classification algorithm allows a set of individuals to be grouped into subsets.
To create coherent clusters which are clearly different from each other.
Individuals in a cluster should be as homogeneous as possible.
Agglomerative: sequential grouping of similar clusters.
Divisive: grouping of all the observation together into one cluster and the successively splitting into smaller clusters.
The similarities between clusters are often calculated using distance metrics . The greater the distance between two clusters the better. After distance metric we choose the linkage criteria.
## Rows: 2432 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): gen, env
## dbl (8): amylase, diapow, hddate, lodging, malt, height, protein, yield
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## gen env amylase diapow
## Length:2432 Length:2432 Min. :14.90 Min. : 35.00
## Class :character Class :character 1st Qu.:25.62 1st Qu.: 70.00
## Mode :character Mode :character Median :28.50 Median : 84.00
## Mean :29.04 Mean : 87.25
## 3rd Qu.:32.00 3rd Qu.:101.00
## Max. :49.30 Max. :229.00
## NA's :1066 NA's :1066
## hddate lodging malt height
## Min. :143.0 Min. : 0.00 Min. :69.00 Min. : 34.00
## 1st Qu.:174.5 1st Qu.: 15.00 1st Qu.:73.30 1st Qu.: 82.50
## Median :183.5 Median : 35.00 Median :74.50 Median : 95.00
## Mean :181.2 Mean : 37.13 Mean :74.72 Mean : 94.95
## 3rd Qu.:191.0 3rd Qu.: 55.00 3rd Qu.:75.90 3rd Qu.:109.00
## Max. :217.0 Max. :100.00 Max. :83.00 Max. :151.00
## NA's :1521 NA's :1067 NA's :5
## protein yield
## Min. : 8.00 Min. : 1.390
## 1st Qu.:11.90 1st Qu.: 4.092
## Median :13.00 Median : 5.272
## Mean :12.92 Mean : 5.293
## 3rd Qu.:14.00 3rd Qu.: 6.338
## Max. :17.50 Max. :11.526
## NA's :1067
## [1] "MN92" "MTi92" "MTd92" "ID91" "OR91" "WA91" "MTi91" "MTd91" "NY92"
## [10] "ON92" "WA92" "MA92" "SKo92" "SKg92" "SKk92" "ID92"
## gen env amylase diapow
## Length:152 Length:152 Min. :19.10 Min. : 47.00
## Class :character Class :character 1st Qu.:25.48 1st Qu.: 72.00
## Mode :character Mode :character Median :28.05 Median : 83.00
## Mean :28.08 Mean : 84.16
## 3rd Qu.:30.12 3rd Qu.: 94.00
## Max. :40.10 Max. :144.00
##
## hddate lodging malt height protein
## Min. :173.0 Min. : NA Min. :71.10 Min. : 92.7 Min. :11.30
## 1st Qu.:178.0 1st Qu.: NA 1st Qu.:73.40 1st Qu.:108.9 1st Qu.:12.90
## Median :180.5 Median : NA Median :74.40 Median :114.3 Median :13.50
## Mean :180.5 Mean :NaN Mean :74.36 Mean :114.0 Mean :13.57
## 3rd Qu.:183.0 3rd Qu.: NA 3rd Qu.:75.20 3rd Qu.:119.4 3rd Qu.:14.22
## Max. :188.0 Max. : NA Max. :78.50 Max. :137.2 Max. :16.00
## NA's :152
## yield
## Min. : 4.048
## 1st Qu.: 6.681
## Median : 7.684
## Mean : 7.500
## 3rd Qu.: 8.382
## Max. :10.315
##
## [1] "gen" "env" "amylase" "diapow" "hddate" "malt" "height"
## [8] "protein" "yield"
dat.ID91 <- na.omit(dat.ID91)
dat.ID91.label <- dat.ID91$gen
dat.ID91 <- dat.ID91 %>%
select(-c(gen, env))
dat.ID91## # A tibble: 152 × 7
## amylase diapow hddate malt height protein yield
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 26.2 74 179 74.1 111 12.1 8.63
## 2 36.2 97 180 76.7 116 14.8 7.95
## 3 32.8 83 183 74.4 117. 14.3 6.10
## 4 24.9 81 178. 72.3 113 14 7.20
## 5 30.2 69 180 72.9 117. 13.8 6.01
## 6 30.3 99 184. 73.2 119. 14.9 8.09
## 7 33.4 63 178 73.5 104. 13.1 7.98
## 8 25.5 74 186. 71.8 130. 14.6 5.93
## 9 26.8 78 185 75 124. 12.8 8.72
## 10 28.3 85 179 71.3 119. 14.1 6.48
## # ℹ 142 more rows
## amylase diapow hddate malt
## Min. :-2.323173 Min. :-2.13347 Min. :-2.13124 Min. :-2.39465
## 1st Qu.:-0.674579 1st Qu.:-0.69806 1st Qu.:-0.71724 1st Qu.:-0.70544
## Median :-0.008677 Median :-0.06648 Median :-0.01023 Median : 0.02899
## Mean : 0.000000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.527924 3rd Qu.: 0.56510 3rd Qu.: 0.69677 3rd Qu.: 0.61654
## Max. : 3.107488 Max. : 3.43592 Max. : 2.11078 Max. : 3.04018
## height protein yield
## Min. :-2.38931 Min. :-2.44921 Min. :-2.7274
## 1st Qu.:-0.57607 1st Qu.:-0.72269 1st Qu.:-0.6471
## Median : 0.02835 Median :-0.07525 Median : 0.1461
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.59919 3rd Qu.: 0.70708 3rd Qu.: 0.6974
## Max. : 2.59152 Max. : 2.62243 Max. : 2.2246
dist.Id91 <- dist(dat.ID91.sc, method = "euclidean")
hclust.ID91 <- hclust(dist.Id91, method = "ward.D2")
plot(hclust.ID91)dend.ID91 <- as.dendrogram(hclust.ID91)
col.dend.ID91 <- color_branches(dend.ID91, k =3)
plot(col.dend.ID91)K-means clustering is used to partition data sets, however the k groups is pre-defined before analysis.
It allows for classification of observations in a data set into multiple groups called clusters and each cluster is represented by it’s center corresponding to the mean of the points assigned.
It is used when the data does not have existing groups labels.
dat.ID91.k <- dat %>%
filter(env=="ID91") %>%
select(-c(env, lodging)) %>%
column_to_rownames(var = "gen")
head(dat.ID91.k)## amylase diapow hddate malt height protein yield
## Steptoe 26.2 74 179.0 74.1 111.0 12.1 8.629
## Morex 36.2 97 180.0 76.7 116.0 14.8 7.951
## SM1 32.8 83 183.0 74.4 116.8 14.3 6.095
## SM2 24.9 81 178.5 72.3 113.0 14.0 7.201
## SM3 30.2 69 180.0 72.9 116.8 13.8 6.014
## SM4 30.3 99 183.5 73.2 119.4 14.9 8.087
dat.ID91.k.sc <- dist(scale(dat.ID91.k), method = "euclidean")
#fviz_nbclust(dat.ID91.k.sc, kmeans, method = "wss")
#datpca.k <- prcomp(dat.ID91.k.sc, scale = TRUE)
#fviz_eig(datpca.k, addlabels = TRUE)
set.seed(123)
k.means.ID91 <- kmeans(dat.ID91.k.sc, 3, nstart = 40)
fviz_cluster(k.means.ID91, data = dat.ID91.k.sc,
ellipse.type = "euclid", repel = TRUE,
labelsize = 8, star.plot= TRUE)