The PSU2024 field CSV (data/PSU2024.csv) provides
per-plot phenotype data with design columns REP,
IBLOCK, P_SQUARE, ROW,
COLUMN, but no absolute field X/Y
coordinates. SpATS spatial correction needs X/Y on a single
continuous grid covering all plots.
This notebook documents how those coordinates were inferred from the
data and from the in-field knowledge that the smallest plot ID (RS)
is at the bottom-left of the field and the largest at the top-right,
with the -P treatment on the left and +P on the right. The
resulting offset table is what
Corrected_phenotype_analysis_PSU2024.Rmd uses in its X/Y
derivation.
library(tidyverse)  # read_csv(), %>%, dplyr verbs, ggplot2
library(here)       # project-relative paths
dat <- read_csv(here("data", "PSU2024.csv"), show_col_types = FALSE) %>%
  mutate(P_LEVEL = factor(P_LEVEL, levels = c("LOW_P", "HIGH_P")))
cat("rows:", nrow(dat), "\n")
## rows: 960
cat("ROW range: ", range(dat$ROW), "\n")
## ROW range: 1 10
cat("COLUMN range: ", range(dat$COLUMN), "\n")
## COLUMN range: 1 24
cat("REP levels: ", paste(sort(unique(dat$REP)), collapse = ", "), "\n")
## REP levels: 1, 2, 3, 4, 5
cat("IBLOCK levels:", paste(sort(unique(dat$IBLOCK)), collapse = ", "), "\n")
## IBLOCK levels: 1, 2, 3, 4, 5, 6, 7, 8
cat("P_SQUARE: ", paste(sort(unique(dat$P_SQUARE)), collapse = ", "), "\n")
## P_SQUARE: 1, 2, 3, 4
cat("P_LEVEL: ", paste(levels(dat$P_LEVEL), collapse = ", "), "\n")
## P_LEVEL: LOW_P, HIGH_P
cat("n GENOTYPE: ", length(unique(dat$GENOTYPE)), "\n")
## n GENOTYPE: 46
cat("RS range: ", range(dat$RS), "\n")
## RS range: 2001 2960
dat %>% count(REP, P_SQUARE, P_LEVEL) %>% print(n = 30)
## # A tibble: 20 × 4
## REP P_SQUARE P_LEVEL n
## <dbl> <dbl> <fct> <int>
## 1 1 1 LOW_P 48
## 2 1 2 HIGH_P 48
## 3 1 3 LOW_P 48
## 4 1 4 HIGH_P 48
## 5 2 1 LOW_P 48
## 6 2 2 HIGH_P 48
## 7 2 3 LOW_P 48
## 8 2 4 HIGH_P 48
## 9 3 1 LOW_P 48
## 10 3 2 HIGH_P 48
## 11 3 3 LOW_P 48
## 12 3 4 HIGH_P 48
## 13 4 1 LOW_P 48
## 14 4 2 HIGH_P 48
## 15 4 3 LOW_P 48
## 16 4 4 HIGH_P 48
## 17 5 1 LOW_P 48
## 18 5 2 HIGH_P 48
## 19 5 3 LOW_P 48
## 20 5 4 HIGH_P 48
5 REPs × 4 P_SQUAREs × 48 plots = 960 plots. P_SQUAREs 1 and 3 are LOW_P, P_SQUAREs 2 and 4 are HIGH_P.
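The same bookkeeping can be written as explicit checks. The sketch below is base R on a synthetic stand-in (`toy` is an assumption built from the counts above, not the real CSV), so it only illustrates the shape of the checks that could be run against `dat`.

```r
# Synthetic stand-in for the design columns (NOT the real PSU2024 data):
# 5 REPs x 4 P_SQUAREs x 48 plots per REP-by-square cell.
toy <- expand.grid(plot_in_cell = 1:48, P_SQUARE = 1:4, REP = 1:5)
# Odd squares are LOW_P, even squares are HIGH_P, as in the count table.
toy$P_LEVEL <- ifelse(toy$P_SQUARE %% 2 == 1, "LOW_P", "HIGH_P")

# Total plot count should equal 5 * 4 * 48.
n_total <- nrow(toy)

# Every REP x P_SQUARE cell should hold the same number of plots.
cell_sizes <- table(toy$REP, toy$P_SQUARE)
balanced <- all(cell_sizes == 48)

# Each P_SQUARE should map to exactly one P_LEVEL.
levels_per_square <- tapply(toy$P_LEVEL, toy$P_SQUARE,
                            function(x) length(unique(x)))
one_level_each <- all(levels_per_square == 1)
```

Swapping `toy` for `dat` would turn these into guards that fail early if a future version of the CSV is unbalanced.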
If ROW × COLUMN were global field
coordinates we’d expect one plot per cell. Counting how many plots share
each (ROW, COLUMN):
dat %>% count(ROW, COLUMN) %>%
  count(n, name = "cells_with_this_count")
## # A tibble: 1 × 2
## n cells_with_this_count
## <int> <int>
## 1 4 240
dat %>% count(REP, P_SQUARE, ROW, COLUMN) %>%
  count(n, name = "cells_with_this_count")
## # A tibble: 1 × 2
## n cells_with_this_count
## <int> <int>
## 1 1 960
Each (ROW, COLUMN) is shared by 4 plots — one per
P_SQUARE — but (REP, P_SQUARE, ROW, COLUMN) is a unique
identifier. So ROW and COLUMN restart
inside each P_SQUARE, and P_SQUARE is an independent block
dimension not captured by the spatial coords alone.
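The "is this a unique key?" question can also be asked directly with `anyDuplicated()`. This is a minimal base R sketch on synthetic data: `toy` is an assumed stand-in with four locally numbered 10 × 24 squares, and REP is left out because its position inside each square is not needed for the illustration.

```r
# Synthetic stand-in (NOT the real data): four 10 x 24 squares,
# each with ROW and COLUMN numbered locally from 1.
toy <- expand.grid(COLUMN = 1:24, ROW = 1:10, P_SQUARE = 1:4)

# anyDuplicated() returns 0 when no row repeats an earlier one.
local_is_unique <- anyDuplicated(toy[, c("ROW", "COLUMN")]) == 0
keyed_is_unique <- anyDuplicated(toy[, c("P_SQUARE", "ROW", "COLUMN")]) == 0

# Number of plots sharing each local (ROW, COLUMN) cell.
plots_per_cell <- table(paste(toy$ROW, toy$COLUMN))
```

Here `local_is_unique` is `FALSE` and `keyed_is_unique` is `TRUE`, mirroring the two count tables above: the local coordinates only identify a plot once the square is known.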
dat %>% group_by(P_LEVEL) %>%
  summarise(n = n(), RS_min = min(RS), RS_max = max(RS),
            ID_min = min(ID), ID_max = max(ID))
## # A tibble: 2 × 6
## P_LEVEL n RS_min RS_max ID_min ID_max
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 LOW_P 480 2001 2720 1 720
## 2 HIGH_P 480 2241 2960 241 960
dat %>% group_by(P_SQUARE, P_LEVEL) %>%
  summarise(n = n(), RS_min = min(RS), RS_max = max(RS),
            ID_min = min(ID), ID_max = max(ID), .groups = "drop")
## # A tibble: 4 × 7
## P_SQUARE P_LEVEL n RS_min RS_max ID_min ID_max
## <dbl> <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 LOW_P 240 2001 2240 1 240
## 2 2 HIGH_P 240 2241 2480 241 480
## 3 3 LOW_P 240 2481 2720 481 720
## 4 4 HIGH_P 240 2721 2960 721 960
RS is contiguous within each P_SQUARE — no interleaving. P_SQ 1 = LOW_P (2001-2240), P_SQ 2 = HIGH_P (2241-2480), P_SQ 3 = LOW_P (2481-2720), P_SQ 4 = HIGH_P (2721-2960). The ID-numbering scheme labels the P_SQUAREs but does not by itself determine their physical arrangement in the field.
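Contiguity and non-interleaving can be tested rather than read off the table. The sketch below assumes a synthetic `toy` data frame with the same RS ranges as the summary (it is not the real CSV). A block counts as contiguous when its ID span equals its plot count, which holds for unique integer IDs.

```r
# Synthetic stand-in (NOT the real data): RS 2001..2960,
# 240 consecutive IDs per P_SQUARE.
toy <- data.frame(RS = 2001:2960, P_SQUARE = rep(1:4, each = 240))

rs_by_sq <- split(toy$RS, toy$P_SQUARE)
rs_min <- sapply(rs_by_sq, min)
rs_max <- sapply(rs_by_sq, max)
rs_n   <- sapply(rs_by_sq, length)

# Contiguous block: the ID span equals the number of plots in the square.
contiguous <- all(rs_max - rs_min + 1 == rs_n)

# No interleaving: each square starts after the previous one ends.
no_overlap <- all(unname(rs_min[-1]) > unname(rs_max[-length(rs_max)]))
```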
PSU2022 used a comparable 4-square design with thousands-digit encoding of P treatment. Same pattern — two LowP blocks and two HighP blocks, each spanning 192 plots (16 IBLOCK × 12 plots):
psu2022 <- read_csv(here("data", "22_NCS_PSU_LANGEBIO_FIELDS_PSU_P_field.csv"),
                    show_col_types = FALSE) %>%
  rename(plot_id = `P22-`)
psu2022 %>% group_by(Treatment) %>%
  summarise(n = n(), id_min = min(plot_id), id_max = max(plot_id))
## # A tibble: 2 × 4
## Treatment n id_min id_max
## <chr> <int> <dbl> <dbl>
## 1 HighP 384 2001 4192
## 2 LowP 384 1001 3192
PSU2022 LowP plots cluster in 1xxx + 3xxx,
HighP plots in 2xxx + 4xxx — the same
alternating-block convention. Physical orientation (-P left / +P right)
was confirmed by field observation.
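The thousands-digit convention can be made concrete with integer division. This is a sketch only: the four IDs are the range endpoints from the summary above, and the odd/even rule is an assumption inferred from those endpoints, so it should be checked against the full `Treatment` column before being relied on.

```r
# Illustrative IDs only: the min/max values from the PSU2022 summary.
plot_id <- c(1001, 3192, 2001, 4192)

# Integer-divide by 1000 to recover the block number encoded in the ID.
block <- plot_id %/% 1000

# Assumed convention (inferred, not confirmed for every plot):
# odd blocks are LowP, even blocks are HighP.
treatment <- ifelse(block %% 2 == 1, "LowP", "HighP")
```

On the real table, comparing a decoded column like `treatment` with `Treatment` row by row would confirm or refute the convention.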
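Combining:
ROW (1-10) and COLUMN (1-24) are local within each 240-plot P_SQUARE.
RS is contiguous within each P_SQUARE and increases from P_SQ 1 to P_SQ 4.
The smallest RS is at the bottom-left of the field and the largest at the top-right, with -P on the left and +P on the right.
This gives a 2×2 layout:
top    | P_SQ 3 (LowP)      | P_SQ 4 (HighP)     | RS 2960 → top-right
bottom | P_SQ 1 (LowP)      | P_SQ 2 (HighP)     | RS 2001 → bottom-left
         ROW 1-10, COL 1-24   ROW 1-10, COL 1-24
Assembled field = 20 rows tall × 48 columns wide (2 × 10 rows, 2 × 24 columns).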
n_col_psq <- max(dat$COLUMN)  # 24
n_row_psq <- max(dat$ROW)     # 10
psq_x_offset <- c(`1` = 0L, `3` = 0L,
                  `2` = n_col_psq, `4` = n_col_psq)
psq_y_offset <- c(`1` = 0L, `2` = 0L,
                  `3` = n_row_psq, `4` = n_row_psq)
dat <- dat %>%
  mutate(X = psq_x_offset[as.character(P_SQUARE)] + COLUMN,
         Y = psq_y_offset[as.character(P_SQUARE)] + ROW)
dat %>% group_by(P_SQUARE, P_LEVEL) %>%
  summarise(X_range = paste(range(X), collapse = "-"),
            Y_range = paste(range(Y), collapse = "-"),
            .groups = "drop")
## # A tibble: 4 × 4
## P_SQUARE P_LEVEL X_range Y_range
## <dbl> <fct> <chr> <chr>
## 1 1 LOW_P 1-24 1-10
## 2 2 HIGH_P 25-48 1-10
## 3 3 LOW_P 1-24 11-20
## 4 4 HIGH_P 25-48 11-20
This is the exact offset table copied into
Corrected_phenotype_analysis_PSU2024.Rmd §2.
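Since SpATS needs one plot per grid position, it may be worth asserting that the assembled coordinates are unique and leave no holes. The sketch below applies the same offsets to a synthetic stand-in (`toy` is an assumption, four locally numbered 10 × 24 squares) and uses base R so it runs without extra packages.

```r
# Synthetic stand-in for dat (NOT the real data): four 10 x 24 squares.
toy <- expand.grid(COLUMN = 1:24, ROW = 1:10, P_SQUARE = 1:4)

n_col_psq <- max(toy$COLUMN)  # 24
n_row_psq <- max(toy$ROW)     # 10
psq_x_offset <- c(`1` = 0L, `3` = 0L, `2` = n_col_psq, `4` = n_col_psq)
psq_y_offset <- c(`1` = 0L, `2` = 0L, `3` = n_row_psq, `4` = n_row_psq)

# Look up each plot's offset by square name, then shift the local coordinates.
toy$X <- unname(psq_x_offset[as.character(toy$P_SQUARE)]) + toy$COLUMN
toy$Y <- unname(psq_y_offset[as.character(toy$P_SQUARE)]) + toy$ROW

# Each (X, Y) should identify exactly one plot ...
xy_unique <- anyDuplicated(toy[, c("X", "Y")]) == 0
# ... and together the plots should fill the 48 x 20 grid with no holes.
full_grid <- nrow(toy) == 48 * 20 &&
  all(range(toy$X) == c(1, 48)) &&
  all(range(toy$Y) == c(1, 20))
```

With unique coordinates, a row count of 960 and ranges of 1-48 and 1-20 together imply that every grid cell is occupied exactly once.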
p_lvl <- ggplot(dat, aes(x = X, y = Y, fill = P_LEVEL)) +
  geom_tile(color = "white", linewidth = 0.15) +
  scale_fill_manual(values = c("LOW_P" = "#d95f02", "HIGH_P" = "#1b9e77")) +
  scale_y_continuous(breaks = seq(1, 20, 2)) +
  coord_equal() +
  labs(title = "PA2024 field — assembled 2×2 layout, colored by P_LEVEL",
       subtitle = "RS 2001 bottom-left → RS 2960 top-right; -P on left, +P on right",
       x = "field X (assembled, COL 1-48)", y = "field row (1-20)",
       fill = "P level") +
  theme_minimal(base_size = 11) +
  theme(panel.grid = element_blank())
ggsave(file.path(paths$figures, "PSU2024_layout_by_P_LEVEL.png"),
       p_lvl, width = 9, height = 6, dpi = 150)
p_lvl
p_sq <- ggplot(dat, aes(x = X, y = Y, fill = factor(P_SQUARE))) +
  geom_tile(color = "white", linewidth = 0.15) +
  scale_y_continuous(breaks = seq(1, 20, 2)) +
  coord_equal() +
  labs(title = "PA2024 field — colored by P_SQUARE",
       x = "field X (assembled)", y = "field row",
       fill = "P_SQUARE") +
  theme_minimal(base_size = 11) +
  theme(panel.grid = element_blank())
ggsave(file.path(paths$figures, "PSU2024_layout_by_P_SQUARE.png"),
       p_sq, width = 9, height = 6, dpi = 150)
p_sq
p_rep <- ggplot(dat, aes(x = X, y = Y, fill = factor(REP))) +
  geom_tile(color = "white", linewidth = 0.15) +
  scale_y_continuous(breaks = seq(1, 20, 2)) +
  coord_equal() +
  labs(title = "PA2024 field — colored by REP",
       x = "field X (assembled)", y = "field row",
       fill = "REP") +
  theme_minimal(base_size = 11) +
  theme(panel.grid = element_blank())
ggsave(file.path(paths$figures, "PSU2024_layout_by_REP.png"),
       p_rep, width = 9, height = 6, dpi = 150)
p_rep
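This layout interpretation rests on two pieces of in-field knowledge:
The smallest plot ID (RS) is at the bottom-left of the field and the largest at the top-right.
The -P treatment is on the left and +P on the right.
If either is wrong, the offset table at §7 needs to change. Specifically: a different left/right treatment placement changes psq_x_offset, and a different bottom/top numbering direction changes psq_y_offset.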
The downstream analysis
(Corrected_phenotype_analysis_PSU2024.Rmd) uses these X/Y
as the SpATS spatial coordinates with no further assumptions about plot
positions, so a wrong layout would show up as a poor SpATS surface fit
and as obvious unexplained structure in the spatial residual plots.
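If one of the in-field assumptions had to be revisited, the offsets could be generated from the assumptions instead of hard-coded. The helper below is hypothetical (`make_psq_offsets` and its arguments are not part of the original notebook) and only moves whole P_SQUAREs. It does not mirror ROW or COLUMN inside a square, which a real orientation change might also require.

```r
# Hypothetical helper (not in the original notebook): build the offset
# vectors for a 2 x 2 arrangement of P_SQUAREs.
#   low_p_side   : side holding the LOW_P squares (1 and 3)
#   first_sq_row : vertical position of squares 1 and 2
make_psq_offsets <- function(low_p_side = c("left", "right"),
                             first_sq_row = c("bottom", "top"),
                             n_col_psq = 24L, n_row_psq = 10L) {
  low_p_side   <- match.arg(low_p_side)
  first_sq_row <- match.arg(first_sq_row)
  x_low    <- if (low_p_side == "left") 0L else n_col_psq
  x_high   <- n_col_psq - x_low
  y_first  <- if (first_sq_row == "bottom") 0L else n_row_psq
  y_second <- n_row_psq - y_first
  list(
    x = c(`1` = x_low,   `3` = x_low,   `2` = x_high,   `4` = x_high),
    y = c(`1` = y_first, `2` = y_first, `3` = y_second, `4` = y_second)
  )
}

current <- make_psq_offsets()                      # layout used in this notebook
flipped <- make_psq_offsets(low_p_side = "right")  # alternative: +P on the left
```

With the defaults, `current$x` and `current$y` reproduce `psq_x_offset` and `psq_y_offset` as defined earlier, so switching hypotheses becomes a one-argument change.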