Purpose

The PSU2024 field CSV (data/PSU2024.csv) provides per-plot phenotype data with design columns REP, IBLOCK, P_SQUARE, ROW, COLUMN, but no absolute field X/Y coordinates. SpATS spatial correction needs X/Y on a single continuous grid covering all plots.

This notebook documents how those coordinates were inferred from the data together with two pieces of in-field knowledge: the smallest plot ID (RS) is at the bottom-left of the field and the largest at the top-right, and the -P (LOW_P) treatment is on the left with +P (HIGH_P) on the right. The resulting offset table is what Corrected_phenotype_analysis_PSU2024.Rmd uses in its X/Y derivation.

1. Load PSU2024

dat <- read_csv(here("data", "PSU2024.csv"), show_col_types = FALSE) %>%
  mutate(P_LEVEL = factor(P_LEVEL, levels = c("LOW_P", "HIGH_P")))

cat("rows:", nrow(dat), "\n")
## rows: 960
cat("ROW range:    ", range(dat$ROW), "\n")
## ROW range:     1 10
cat("COLUMN range: ", range(dat$COLUMN), "\n")
## COLUMN range:  1 24
cat("REP levels:   ", paste(sort(unique(dat$REP)), collapse = ", "), "\n")
## REP levels:    1, 2, 3, 4, 5
cat("IBLOCK levels:", paste(sort(unique(dat$IBLOCK)), collapse = ", "), "\n")
## IBLOCK levels: 1, 2, 3, 4, 5, 6, 7, 8
cat("P_SQUARE:     ", paste(sort(unique(dat$P_SQUARE)), collapse = ", "), "\n")
## P_SQUARE:      1, 2, 3, 4
cat("P_LEVEL:      ", paste(levels(dat$P_LEVEL), collapse = ", "), "\n")
## P_LEVEL:       LOW_P, HIGH_P
cat("n GENOTYPE:   ", length(unique(dat$GENOTYPE)), "\n")
## n GENOTYPE:    46
cat("RS range:     ", range(dat$RS), "\n")
## RS range:      2001 2960

2. Design structure

dat %>% count(REP, P_SQUARE, P_LEVEL) %>% print(n = 30)
## # A tibble: 20 × 4
##      REP P_SQUARE P_LEVEL     n
##    <dbl>    <dbl> <fct>   <int>
##  1     1        1 LOW_P      48
##  2     1        2 HIGH_P     48
##  3     1        3 LOW_P      48
##  4     1        4 HIGH_P     48
##  5     2        1 LOW_P      48
##  6     2        2 HIGH_P     48
##  7     2        3 LOW_P      48
##  8     2        4 HIGH_P     48
##  9     3        1 LOW_P      48
## 10     3        2 HIGH_P     48
## 11     3        3 LOW_P      48
## 12     3        4 HIGH_P     48
## 13     4        1 LOW_P      48
## 14     4        2 HIGH_P     48
## 15     4        3 LOW_P      48
## 16     4        4 HIGH_P     48
## 17     5        1 LOW_P      48
## 18     5        2 HIGH_P     48
## 19     5        3 LOW_P      48
## 20     5        4 HIGH_P     48

5 REPs × 4 P_SQUAREs × 48 plots = 960 plots. P_SQUAREs 1 and 3 are LOW_P, P_SQUAREs 2 and 4 are HIGH_P.
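The arithmetic above can be checked programmatically. The sketch below is only an illustration: `toy` is a synthetic stand-in built to mirror the counts in the table (it is not the real CSV), and base R is used instead of dplyr so the snippet runs on its own.

```r
# Synthetic stand-in for the design columns (hypothetical data that only
# mirrors the REP x P_SQUARE counts printed above).
toy <- expand.grid(plot_in_cell = 1:48, P_SQUARE = 1:4, REP = 1:5)
toy$P_LEVEL <- ifelse(toy$P_SQUARE %% 2 == 1, "LOW_P", "HIGH_P")

# Plots per REP x P_SQUARE cell: every cell should hold 48 plots.
cell_sizes <- table(toy$REP, toy$P_SQUARE)
stopifnot(all(cell_sizes == 48))

# Total plot count implied by the design.
n_total <- 5 * 4 * 48
stopifnot(n_total == nrow(toy))

# Each P_SQUARE should map to exactly one P_LEVEL.
levels_per_square <- tapply(toy$P_LEVEL, toy$P_SQUARE,
                            function(x) length(unique(x)))
stopifnot(all(levels_per_square == 1))
```

Swapping `toy` for `dat` would run the same three checks on the real data, assuming the column names match.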

3. ROW × COLUMN are local within each P_SQUARE

If ROW × COLUMN were global field coordinates we’d expect one plot per cell. Counting how many plots share each (ROW, COLUMN):

dat %>% count(ROW, COLUMN) %>%
  count(n, name = "cells_with_this_count")
## # A tibble: 1 × 2
##       n cells_with_this_count
##   <int>                 <int>
## 1     4                   240
dat %>% count(REP, P_SQUARE, ROW, COLUMN) %>%
  count(n, name = "cells_with_this_count")
## # A tibble: 1 × 2
##       n cells_with_this_count
##   <int>                 <int>
## 1     1                   960

Each (ROW, COLUMN) cell is shared by 4 plots, consistent with one plot per P_SQUARE, and (REP, P_SQUARE, ROW, COLUMN) is a unique identifier. So ROW and COLUMN restart inside each P_SQUARE, and P_SQUARE is an independent block dimension not captured by ROW and COLUMN alone.
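The second count above includes REP in the key, so on its own it does not show that P_SQUARE alone disambiguates the cells. A more direct check is to test (P_SQUARE, ROW, COLUMN) as a key. The sketch below does this with a hypothetical helper, `is_unique_key()`, on a synthetic grid; whether the same result holds for `dat` has to be confirmed by running it there.

```r
# Hypothetical helper: TRUE when the given columns jointly identify each row.
is_unique_key <- function(df, cols) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

# Synthetic stand-in: 4 squares, each with local ROW 1-10 and COLUMN 1-24.
toy <- expand.grid(COLUMN = 1:24, ROW = 1:10, P_SQUARE = 1:4)

is_unique_key(toy, c("ROW", "COLUMN"))              # cells repeat across squares
is_unique_key(toy, c("P_SQUARE", "ROW", "COLUMN"))  # unique without REP
```

If the three-column key is unique on the real data, REP is redundant for locating a plot and only P_SQUARE needs an offset.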

4. Plot-ID distribution by P_LEVEL (PSU2024)

dat %>% group_by(P_LEVEL) %>%
  summarise(n = n(), RS_min = min(RS), RS_max = max(RS),
            ID_min = min(ID), ID_max = max(ID))
## # A tibble: 2 × 6
##   P_LEVEL     n RS_min RS_max ID_min ID_max
##   <fct>   <int>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 LOW_P     480   2001   2720      1    720
## 2 HIGH_P    480   2241   2960    241    960
dat %>% group_by(P_SQUARE, P_LEVEL) %>%
  summarise(n = n(), RS_min = min(RS), RS_max = max(RS),
            ID_min = min(ID), ID_max = max(ID), .groups = "drop")
## # A tibble: 4 × 7
##   P_SQUARE P_LEVEL     n RS_min RS_max ID_min ID_max
##      <dbl> <fct>   <int>  <dbl>  <dbl>  <dbl>  <dbl>
## 1        1 LOW_P     240   2001   2240      1    240
## 2        2 HIGH_P    240   2241   2480    241    480
## 3        3 LOW_P     240   2481   2720    481    720
## 4        4 HIGH_P    240   2721   2960    721    960

RS is contiguous within each P_SQUARE — no interleaving. P_SQ 1 = LOW_P (2001-2240), P_SQ 2 = HIGH_P (2241-2480), P_SQ 3 = LOW_P (2481-2720), P_SQ 4 = HIGH_P (2721-2960). The ID-numbering scheme labels the P_SQUAREs but does not by itself determine their physical arrangement in the field.
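A min/max summary supports contiguity only if each group's IDs are unique and the group size equals max - min + 1 (here 2240 - 2001 + 1 = 240). A minimal sketch of that test follows, using a hypothetical helper `ids_contiguous()` and synthetic IDs built from the ranges reported above.

```r
# Hypothetical helper: for each group, TRUE when the IDs are unique and
# fill the whole min..max range without gaps.
ids_contiguous <- function(id, group) {
  tapply(id, group, function(x) {
    !anyDuplicated(x) && (max(x) - min(x) + 1 == length(x))
  })
}

# Synthetic stand-in using the RS ranges from the table above.
toy <- data.frame(RS = 2001:2960, P_SQUARE = rep(1:4, each = 240))
ids_contiguous(toy$RS, toy$P_SQUARE)
```

Calling `ids_contiguous(dat$RS, dat$P_SQUARE)` would apply the same test to the real plot IDs.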

5. Cross-check vs PSU2022

PSU2022 used a comparable 4-square design in which the thousands digit of the plot ID identifies the square, and hence the P treatment. The pattern is the same: two LowP blocks and two HighP blocks, each spanning 192 plots (16 IBLOCK × 12 plots):

psu2022 <- read_csv(here("data", "22_NCS_PSU_LANGEBIO_FIELDS_PSU_P_field.csv"),
                    show_col_types = FALSE) %>%
  rename(plot_id = `P22-`)

psu2022 %>% group_by(Treatment) %>%
  summarise(n = n(), id_min = min(plot_id), id_max = max(plot_id))
## # A tibble: 2 × 4
##   Treatment     n id_min id_max
##   <chr>     <int>  <dbl>  <dbl>
## 1 HighP       384   2001   4192
## 2 LowP        384   1001   3192

PSU2022 LowP plots cluster in 1xxx + 3xxx, HighP plots in 2xxx + 4xxx — the same alternating-block convention. Physical orientation (-P left / +P right) was confirmed by field observation.
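The alternating-block convention can be made explicit by decoding the thousands digit. The sketch below uses illustrative IDs; the per-block ranges (1001-1192 and so on) are an assumption inferred from the summary above and the stated 192 plots per block, not values read from the file.

```r
# Illustrative plot IDs: four blocks of 192 plots each
# (per-block ranges are assumed, not read from the PSU2022 file).
plot_id <- c(1001:1192, 2001:2192, 3001:3192, 4001:4192)

# Thousands digit gives the block; odd blocks are LowP, even blocks HighP.
block <- plot_id %/% 1000
treatment <- ifelse(block %% 2 == 1, "LowP", "HighP")

table(block, treatment)
tapply(plot_id, treatment, range)
```

Under these assumptions the decoded ranges match the printed summary: LowP spans 1001 to 3192 and HighP spans 2001 to 4192, with 384 plots each.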

6. Inferred physical layout — 2×2 of P_SQUAREs

Combining:

  • ROW (1-10) and COLUMN (1-24) are local within each 240-plot P_SQUARE.
  • RS 2001 (P_SQUARE 1, ROW 1, COLUMN 1) sits at the bottom-left of the field; RS 2960 (P_SQUARE 4, ROW 10, COLUMN 24) sits at the top-right.
  • The two LowP squares are on the left half, the two HighP on the right.

Together these give a 2×2 layout:

top    | P_SQ 3 (LowP)      | P_SQ 4 (HighP)     |   RS 2960 → top-right
bottom | P_SQ 1 (LowP)      | P_SQ 2 (HighP)     |   RS 2001 → bottom-left
       | ROW 1-10, COL 1-24 | ROW 1-10, COL 1-24 |

Assembled field = 20 rows tall × 48 columns wide.
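One way to go from the 2×2 picture to numeric offsets is to write the layout as a matrix and derive the offsets from each square's position. This is a sketch of that idea, not the code used downstream; the `layout` matrix and the variable names are illustrative.

```r
# Layout as drawn above: the first matrix row is the TOP of the field.
layout <- matrix(c(3, 4,
                   1, 2), nrow = 2, byrow = TRUE)
n_row_psq <- 10   # local ROW range per square
n_col_psq <- 24   # local COLUMN range per square

# Position of each square in the layout (matrix row 1 = top).
pos <- which(!is.na(layout), arr.ind = TRUE)
squares <- layout[pos]

# X grows left to right; Y grows bottom to top, so flip the matrix row index.
x_offset <- setNames((pos[, "col"] - 1) * n_col_psq, squares)
y_offset <- setNames((nrow(layout) - pos[, "row"]) * n_row_psq, squares)

x_offset[c("1", "2", "3", "4")]
y_offset[c("1", "2", "3", "4")]

# Assembled field size: rows tall and columns wide.
c(rows = nrow(layout) * n_row_psq, cols = ncol(layout) * n_col_psq)
```

Editing the `layout` matrix is then the only change needed to try a different arrangement of the squares.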

7. X / Y derivation

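n_col_psq <- max(dat$COLUMN)   # 24 local columns per P_SQUARE
n_row_psq <- max(dat$ROW)      # 10 local rows per P_SQUARE
# Offsets keyed by P_SQUARE: squares 2 and 4 shift right, 3 and 4 shift up.
psq_x_offset <- c(`1` = 0L, `3` = 0L,
                  `2` = n_col_psq, `4` = n_col_psq)
psq_y_offset <- c(`1` = 0L, `2` = 0L,
                  `3` = n_row_psq, `4` = n_row_psq)

# unname() keeps the lookup names from being carried into X and Y.
dat <- dat %>%
  mutate(X = unname(psq_x_offset[as.character(P_SQUARE)]) + COLUMN,
         Y = unname(psq_y_offset[as.character(P_SQUARE)]) + ROW)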

dat %>% group_by(P_SQUARE, P_LEVEL) %>%
  summarise(X_range = paste(range(X), collapse = "-"),
            Y_range = paste(range(Y), collapse = "-"),
            .groups = "drop")
## # A tibble: 4 × 4
##   P_SQUARE P_LEVEL X_range Y_range
##      <dbl> <fct>   <chr>   <chr>  
## 1        1 LOW_P   1-24    1-10   
## 2        2 HIGH_P  25-48   1-10   
## 3        3 LOW_P   1-24    11-20  
## 4        4 HIGH_P  25-48   11-20

This is the exact offset table copied into Corrected_phenotype_analysis_PSU2024.Rmd §2.
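The range table shows that the four squares tile the grid, but it does not show that every assembled cell is occupied exactly once. A short occupancy check covers that. The sketch below runs on a synthetic grid with the same local coordinates; the offsets are hard-coded to the values derived above.

```r
# Synthetic stand-in with the same local coordinates as the real squares.
toy <- expand.grid(COLUMN = 1:24, ROW = 1:10, P_SQUARE = 1:4)

# Offsets as derived above (hard-coded here for self-containment).
psq_x_offset <- c(`1` = 0, `3` = 0, `2` = 24, `4` = 24)
psq_y_offset <- c(`1` = 0, `2` = 0, `3` = 10, `4` = 10)
toy$X <- unname(psq_x_offset[as.character(toy$P_SQUARE)]) + toy$COLUMN
toy$Y <- unname(psq_y_offset[as.character(toy$P_SQUARE)]) + toy$ROW

# Every (X, Y) cell of the 48 x 20 grid should be occupied exactly once.
occupancy <- table(factor(toy$X, levels = 1:48), factor(toy$Y, levels = 1:20))
stopifnot(all(occupancy == 1))
dim(occupancy)
```

Running the same `table()` call on `dat$X` and `dat$Y` would flag both duplicated cells (counts above 1) and holes (counts of 0).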

8. Field maps

p_lvl <- ggplot(dat, aes(x = X, y = Y, fill = P_LEVEL)) +
  geom_tile(color = "white", linewidth = 0.15) +
  scale_fill_manual(values = c("LOW_P" = "#d95f02", "HIGH_P" = "#1b9e77")) +
  scale_y_continuous(breaks = seq(1, 20, 2)) +
  coord_equal() +
  labs(title = "PSU2024 field: assembled 2×2 layout, colored by P_LEVEL",
       subtitle = "RS 2001 bottom-left → RS 2960 top-right; -P on left, +P on right",
       x = "field X (assembled, COL 1-48)", y = "field row (1-20)",
       fill = "P level") +
  theme_minimal(base_size = 11) +
  theme(panel.grid = element_blank())

ggsave(file.path(paths$figures, "PSU2024_layout_by_P_LEVEL.png"),
       p_lvl, width = 9, height = 6, dpi = 150)
p_lvl

p_sq <- ggplot(dat, aes(x = X, y = Y, fill = factor(P_SQUARE))) +
  geom_tile(color = "white", linewidth = 0.15) +
  scale_y_continuous(breaks = seq(1, 20, 2)) +
  coord_equal() +
  labs(title = "PSU2024 field: colored by P_SQUARE",
       x = "field X (assembled)", y = "field row",
       fill = "P_SQUARE") +
  theme_minimal(base_size = 11) +
  theme(panel.grid = element_blank())

ggsave(file.path(paths$figures, "PSU2024_layout_by_P_SQUARE.png"),
       p_sq, width = 9, height = 6, dpi = 150)
p_sq

p_rep <- ggplot(dat, aes(x = X, y = Y, fill = factor(REP))) +
  geom_tile(color = "white", linewidth = 0.15) +
  scale_y_continuous(breaks = seq(1, 20, 2)) +
  coord_equal() +
  labs(title = "PSU2024 field: colored by REP",
       x = "field X (assembled)", y = "field row",
       fill = "REP") +
  theme_minimal(base_size = 11) +
  theme(panel.grid = element_blank())

ggsave(file.path(paths$figures, "PSU2024_layout_by_REP.png"),
       p_rep, width = 9, height = 6, dpi = 150)
p_rep

9. Notes for review

This layout interpretation rests on two pieces of in-field knowledge:

  1. Smallest RS (2001) → physical bottom-left; largest RS (2960) → top-right.
  2. LowP on the left half, HighP on the right.

If either is wrong, the offset table at §7 needs to change. Specifically:

  • If the layout is taller than wide (transposed), swap X ↔ Y in §7.
  • If P_SQUARE 1 is top-left rather than bottom-left, swap the Y offsets, so that squares 1 and 2 get n_row_psq and squares 3 and 4 get 0.
  • If the four squares sit in a single strip (1-2-3-4 left to right) rather than in a 2×2 grid (1 and 3 on the left, 2 and 4 on the right), the X offsets become 0, 24, 48 and 72 and all Y offsets become 0.
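For reference, the alternatives listed above can be written out as concrete offset tables. This is a sketch under one reading of each case, with hypothetical names (`current`, `top_first`, `strip`); none of these alternatives has been checked against the field.

```r
n_col_psq <- 24
n_row_psq <- 10
sq <- as.character(1:4)

# Current assumption: 2 x 2 grid, squares 1 and 2 on the bottom row.
current <- list(x = setNames(c(0, n_col_psq, 0, n_col_psq), sq),
                y = setNames(c(0, 0, n_row_psq, n_row_psq), sq))

# Alternative: squares 1 and 2 on the TOP row (Y offsets swapped).
top_first <- list(x = current$x,
                  y = setNames(c(n_row_psq, n_row_psq, 0, 0), sq))

# Alternative: single strip 1-2-3-4, left to right.
strip <- list(x = setNames((0:3) * n_col_psq, sq),
              y = setNames(rep(0, 4), sq))

# Transposed layout: keep the offsets, but exchange the roles of X and Y.
transposed <- list(x = current$y, y = current$x)
```

Swapping offsets only moves whole squares. Whether ROW or COLUMN also runs in the opposite direction inside each square is a separate question that these tables do not address.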

The downstream analysis (Corrected_phenotype_analysis_PSU2024.Rmd) uses these X/Y as the SpATS spatial coordinates with no further assumptions about plot positions, so a wrong layout would likely show up as a poor SpATS surface fit and as visibly irregular spatial residual plots.