I am studying a California native plant that has different distribution of flower color when growing in different soil types. I believe that you have already met with my collaborator Magdalene Lo. She has grown these plants in different soil types to measure fitness and survival. I have taken the plant samples she has grown and am doing genetic analysis on them.
Research questions from client:
How does gene expression of anthocyanin biosynthetic genes (the types of genes I am studying) differ in different flower colors of L. parviflorus (species) when grown in different soil types?
How does gene expression differ between flower colors when grown in the same soil type?
Study design
Plants with white and pink flowers were grown in four soil types. There were 4 trays of the control, 5 trays of drought (water limitation), 5 trays of magnesium, and 6 trays of MgxDr (interaction). Each tray had both white and pink flower morphs. Client does not have information about how many were in each tray or how individuals were allocated to trays, only that several hundred seeds were planted for each flower color. Client reports “there were 12 families of pink flowers and 6 families of white flowers crossed so there are about double the number of pink seeds planted.”
Client collected individual plants of each flower color from each soil type. With respect to soil type, some individuals were collected from different trays but some were not. The individuals were chosen based on the family. For the plants that were collected, the client sampled tissue from three locations: flower, leaf, and root. These tissue samples were combined across individuals to form pooled samples from which RNA was extracted. The RNA samples were divided in two – the ‘replicate’ column identifies these subsamples – and six genes were amplified independently via qPCR to obtain gene expression data.
Sample data
Sample data provided:
Code
# read in and preview dataraw <-read_csv('sample-data.csv') |>rename_with(tolower)head(raw) |>pander()
tissue
color
treatment
ct
absolute
replicate
leaf
pink
control
27.8
3.575
1
leaf
pink
control
27.42
3.693
2
leaf
white
control
26.2
4.065
1
leaf
white
control
26.18
4.07
2
leaf
pink
mgxdr
28.21
3.449
1
leaf
pink
mgxdr
27.66
3.619
2
The responses are:
CT values: number of PCR cycles until concentration threshold is crossed
absolute concentration: back-computed from CT values using qPCR standard
Save for small differences in the conversion factor, these are scalar multiples of each other.
Client request
Questions:
What type of analysis is best? It will be some kind of comparison of means but what exactly makes the most sense.
How can we potentially bring multiple tissue types into the analysis so we do not have to multiply the analyses by 3?
How can we control for the fact that the conversion factor may be slightly different between each run?
Some kind of R or JMP template!
[added 7/30] Plots of the differences between pink and white morphs in different soil conditions.
Recommendations
Treatment coding
Since the soil types themselves have factorial structure, the treatment structure should be recorded like so:
treatment
drought
magnesium
control
N
N
mg
N
Y
drought
Y
N
mgxdr
Y
Y
If balanced groups of seeds were randomly allocated by family to soil types and grown in such a way that the same number of individuals of each family were allowed to mature, then the treatment allocation is randomized with respect to soil type and family (not color) and the observational unit is a plant. If so, then the treatment allocation is also randomized with respect to soil type and color, but the design is not balanced. Client should confirm with collaborator as this will be relevant for peer/committee review.
Pseudoreplication
Because there is just one physical sample of each tissue type per treatment combination, there is no replication in this study. Treatments were not independently allocated to the RNA subsamples, so there is not sufficient data to estimate the observational variation in gene expression levels. This significantly constrains possible analyses of study data, as ANOVA cannot be performed. The values of the responses (CT value and absolute concentration) should be averaged across pseudoreplicates.
Visualization can be used to identify possible effects focusing on the difference in expression levels between pink and white flowers. Examples of complete interaction plots (no aggregation across any treatment factors) are shown below. The client should consider examining similar plots with aggregation to explore marginal effects of treatment factors.
Code
# interaction contrasts (absolute concentration)clean |>mutate(magnesium =fct_recode(magnesium, magnesium ='Y', control ='N'),drought =fct_recode(drought, drought ='Y', control ='N')) |>ggplot(aes(x = color, y = absolute.mean, color = magnesium)) +geom_point() +facet_grid(tissue ~ drought) +geom_path(aes(group = magnesium)) +guides(color =guide_legend(title =NULL)) +theme_bw() +labs(title ='absolute concentration')
Code
# interaction contrasts (ct values)clean |>mutate(magnesium =fct_recode(magnesium, magnesium ='Y', control ='N'),drought =fct_recode(drought, drought ='Y', control ='N')) |>ggplot(aes(x = color, y = ct.mean, color = magnesium)) +geom_point() +facet_grid(tissue ~ drought) +geom_path(aes(group = magnesium)) +guides(color =guide_legend(title =NULL)) +theme_bw() +labs(title ='ct values')
I note the graphics are mirror images, so conclusions should not differ regardless of whether CT values or absolute concentrations are used. My recommendation is to choose just one response to analyze; examining both will multiply your results without adding information.
Contrasts
The research questions pertain to comparing gene expression levels between pink and white flower phenotypes under various conditions. These are referred to as contrasts and are estimated via arithmetic differences between the corresponding averages (P - W, P/dr - W/dr, etc.). The recommended analysis thus focuses on contrasts for the color phenotype, and interprets interaction effects accordingly.
The codes provided use lm and emmeans to compute contrasts, but they’re not really fitting a model, since the treatment mean “estimates” are really just the observations.
Code
# "model" to use for computing contrasts of interestfit <-lm(absolute.mean ~ drought*magnesium*color*tissue, data = clean)fit.emm <-emmeans(fit, specs =~ color*drought*magnesium | tissue)
Research question 1
How does gene expression of anthocyanin biosynthetic genes (the types of genes I am studying) differ in different flower colors of L. parviflorus (species) when grown in different soil types?
I’ll rephrase this as: does the difference in gene expression level between color phenotypes depend on soil type? And then I’ll construe this as whether there are significant interaction effects involving color.
The factorial effects of color, drought, and magnesium are the differences between ‘high’ and ‘low’ levels of each factor conditioned on the levels of other factors in various constellations.
The main effects of each are the differences between the average response at each level of the factor, averaged across the other factors. There are three main effects:
color: the difference in average gene expression between samples from pink and white flowers, averaged across soil type
magnesium: the difference in average gene expression between samples from magnesium and non-magnesium soils, averaged across color and drought type
drought: the difference in average gene expression between samples from drought and non-drought soils, averaged across color and magnesium
The interaction effects then characterize the dependence of the differences above on the levels of other factors. There are four interaction effects:
color/magnesium: the difference in the effect of color between levels of the magnesium factor, averaged across drought type
color/drought: the difference in the effect of color between levels of the drought type, averaged across levels of the magnesium factor
magnesium/drought: the difference in the effect of magnesium between levels of the drought type, averaged across color
color/magnesium/drought: the difference in the color/magnesium interaction between levels of the drought type
The interaction effects are symmetric in the sense that the color/magnesium effect is the same as the magnesium/color effect. Because the research question focuses on color phenotypes, contrasts are interpreted primarily in terms of the effect of color when applicable.
The estimated factorial effects for each tissue type are as follows:
Code
# wrap in function for use with emmeansfac.eff.emmc <-function(levels, ...){data_grid(clean, magnesium, drought, color) |>mutate(mg =0.25*c(rep(-1, 4), rep(1, 4)),dr =0.25*rep(c(rep(-1, 2), rep(1, 2)), 2),cl =0.25*rep(c(1, -1), 4),clxdr =0.5*(4*cl)*(4*dr),clxmg =0.5*(4*cl)*(4*mg),drxmg =0.5*(4*mg)*(4*dr),clxdrxmg = (4*cl)*(4*dr)*(4*mg)) |>arrange(magnesium, drought, color) |>select(-(1:3))}# point estimates of factorial effectsfit.fac.eff <- fit.emm |>contrast('fac.eff') |>tidy() |>select(tissue, contrast, estimate)# examinefit.fac.eff |>spread(tissue, estimate) |>pander(caption ='Estimates of factorial effects by tissue type.')
Estimates of factorial effects by tissue type.
contrast
flower
leaf
root
cl
0.8575
0.07569
0.1034
clxdr
0.0145
-0.5297
0.3753
clxdrxmg
-3.389
-1.768
-1.172
clxmg
0.4824
0.6649
-0.4079
dr
-0.4589
-0.1424
-0.1894
drxmg
2.57
1.531
0.1463
mg
0.721
-0.02505
0.2525
We can’t answer the question of whether color interactions are significant the usual way but R. V. Lenth (1989) provides a clever method of testing the hypotheses \(H_{0i}: \theta_i = 0\) vs \(H_{Ai}: \theta_i \neq 0\) for each factorial effect \(\theta_i, i = 1, \dots, k\). The unrepx package provides an implementation that generates p-values for these tests from the vector of interaction effects (R. Lenth 2022).
The full set of results for all three tissue types is below. Remember that these are separate analyses shown jointly. Lenth’s method assumes that estimates for factorial effects are independent, which is only true here within one tissue type; since tissue samples of different types are collected from the same individuals, the independence does not hold across tissue types.
Code
# apply to all three tissue typesfit.fac.eff |>group_by(tissue) |>nest(data =c(contrast, estimate)) |>mutate(test =map(data, ~eff.test(.x$estimate, method ='Lenth'))) |>mutate(rslt =map2(data, test, ~inner_join(.x, .y, join_by(estimate == effect)))) |>select(tissue, rslt) |>unnest(everything()) |>pander(caption ='Inference for factorial effects using Lenth\'s method.')
Inference for factorial effects using Lenth’s method.
tissue
contrast
estimate
Lenth_PSE
t.ratio
p.value
simult.pval
flower
mg
0.721
0.9026
0.799
0.3684
0.9904
flower
dr
-0.4589
0.9026
-0.508
0.6728
1
flower
cl
0.8575
0.9026
0.95
0.2901
0.9346
flower
clxdr
0.0145
0.9026
0.016
0.9893
1
flower
clxmg
0.4824
0.9026
0.534
0.6577
1
flower
drxmg
2.57
0.9026
2.848
0.0365
0.1752
flower
clxdrxmg
-3.389
0.9026
-3.755
0.0221
0.1045
leaf
mg
-0.02505
0.7946
-0.032
0.9791
1
leaf
dr
-0.1424
0.7946
-0.179
0.8797
1
leaf
cl
0.07569
0.7946
0.095
0.9371
1
leaf
clxdr
-0.5297
0.7946
-0.667
0.5789
1
leaf
clxmg
0.6649
0.7946
0.837
0.3463
0.9792
leaf
drxmg
1.531
0.7946
1.927
0.0773
0.348
leaf
clxdrxmg
-1.768
0.7946
-2.225
0.0564
0.2553
root
mg
0.2525
0.3315
0.762
0.3912
0.9953
root
dr
-0.1894
0.3315
-0.571
0.6368
1
root
cl
0.1034
0.3315
0.312
0.7936
1
root
clxdr
0.3753
0.3315
1.132
0.2221
0.822
root
clxmg
-0.4079
0.3315
-1.231
0.1932
0.7517
root
drxmg
0.1463
0.3315
0.441
0.7136
1
root
clxdrxmg
-1.172
0.3315
-3.535
0.0249
0.119
I’m choosing to look at the unadjusted p-values, as the F tests in ANOVA are not usually adjusted for multiplicity. However, the client may wish to consider simultaneous inference adjustments when looking at results across genes.
For this gene, the color/drought/magnesium interactions are significant for all tissue types according to the unadjusted p-values, suggesting that the effect of color depends on the combination of soil types.
However, somewhat counterintuitively, the main effects and two-way interactions involving color are not significant. This is addressed to some extent by the plot below: the expression level differs by color phenotype less when averaged across soil type (solid black) than when disaggregated, and even then, expression levels for pink and white color phenotypes only differ in certain soil combinations. The client should consider carefully how to explain this result.
Code
# visualize marginal effect of colorcolor.means <- clean |>group_by(color, tissue) |>summarize(absolute.mean =mean(absolute.mean))ggplot(clean, aes(x = color, y = absolute.mean, color = magnesium, linetype = drought)) +facet_wrap(~tissue) +geom_point(alpha =0.5) +geom_path(aes(group =interaction(drought, magnesium)),alpha =0.5) +geom_point(data = color.means, inherit.aes = F,aes(x = color, y = absolute.mean),shape =17, size =4) +geom_path(data = color.means,inherit.aes = F,aes(x = color, y = absolute.mean, group = tissue),linewidth =1.25) +theme_bw()
Research question 2
How does gene expression differ between flower colors when grown in the same soil type?
This can be approached analogous to the analysis given above, but using specific interaction contrasts that compare expression levels between pink and white color phenotypes separately for each combination of the drought/magnesium factors in place of all factorial effects. This analysis is shown below.
Gene expression levels differ significantly between pink and white color phenotypes in flower tissue and leaf tissue in the magnesium soil and in root tissue in the drought soil. Visually:
These plots show the estimated difference in gene expression levels (concentration values) between pink and white flowers by soil type.
Summary
Due to pseudoreplication, ANOVA should not be performed. The approach above employs the method of R. V. Lenth (1989) for analysis of unreplicated factorial designs. This analysis pipeline should be easily transferrable to the data from the other five genes with minor to no adjustments to code.
Returning to the original questions:
What type of analysis is best? It will be some kind of comparison of means but what exactly makes the most sense.
There is no replication in the study, so analyses should be largely limited to visual assessments. R. V. Lenth (1989) provides a method of analyzing data from unreplicated factorial designs; this approach can be used to address the main research questions and example implementations are given above.
How can we potentially bring multiple tissue types into the analysis so we do not have to multiply the analyses by 3?
It is recommended to carry out analyses separately by tissue type and compare qualitatively as originally planned. Under different circumstances and with proper design considerations, effects associated with tissue type could be estimated along with other factors in the ANOVA framework; client is recommended to consult with a statistician at design stage on experimental design and analysis plan if this is of interest in future.
How can we control for the fact that the conversion factor may be slightly different between each run?
Small variation in the conversion factors between samples shouldn’t affect conclusions. It is also worth noting that the conversions need not be uniform if they are part of the measurement process. In this study, direct comparison of CT values and absolute concentrations indicates near-identical trends with respect to the experimental factors.
Some kind of R or JMP template!
A copy of the source .qmd for this document is provided for client’s use.
Lenth, Russell V. 1989. “Quick and Easy Analysis of Unreplicated Factorials.”Technometrics 31 (4): 469–73.
Source Code
---title: "*L. parviflorus* gene expression"author: "Weston Gronor"author-title: "CLIENT"date: "7/31/25"published-title: "UPDATED"format: html: code-fold: true code-tools: truetoc: truebibliography: refs.bibexecute: warning: false message: false---```{r, echo = F}# setuplibrary(tidyverse)library(emmeans)library(modelr)library(broom)library(unrepx)library(pander)options(contrasts = c('contr.sum', 'contr.sum'))```# Project summaryBackground from client:> I am studying a California native plant that has different distribution of flower color when growing in different soil types. I believe that you have already met with my collaborator Magdalene Lo. She has grown these plants in different soil types to measure fitness and survival. I have taken the plant samples she has grown and am doing genetic analysis on them.Research questions from client:1. How does gene expression of anthocyanin biosynthetic genes (the types of genes I am studying) differ in different flower colors of *L. parviflorus* (species) when grown in different soil types?2. How does gene expression differ between flower colors when grown in the same soil type?## Study design Plants with white and pink flowers were grown in four soil types. There were 4 trays of the control, 5 trays of drought (water limitation), 5 trays of magnesium, and 6 trays of MgxDr (interaction). Each tray had both white and pink flower morphs. Client does not have information about how many were in each tray or how individuals were allocated to trays, only that several hundred seeds were planted for each flower color. Client reports "there were 12 families of pink flowers and 6 families of white flowers crossed so there are about double the number of pink seeds planted."Client collected individual plants of each flower color from each soil type. With respect to soil type, some individuals were collected from different trays but some were not. The individuals were chosen based on the family. For the plants that were collected, the client sampled tissue from three locations: flower, leaf, and root. These tissue samples were combined across individuals to form pooled samples from which RNA was extracted. The RNA samples were divided in two -- the 'replicate' column identifies these subsamples -- and six genes were amplified independently via qPCR to obtain gene expression data.## Sample dataSample data provided:```{r}# read in and preview dataraw <-read_csv('sample-data.csv') |>rename_with(tolower)head(raw) |>pander()```The responses are:- CT values: number of PCR cycles until concentration threshold is crossed- absolute concentration: back-computed from CT values using qPCR standardSave for small differences in the conversion factor, these are scalar multiples of each other.## Client requestQuestions:1. What type of analysis is best? It will be some kind of comparison of means but what exactly makes the most sense.2. How can we potentially bring multiple tissue types into the analysis so we do not have to multiply the analyses by 3?3. How can we control for the fact that the conversion factor may be slightly different between each run?4. Some kind of R or JMP template!5. [added 7/30] Plots of the differences between pink and white morphs in different soil conditions.# Recommendations## Treatment codingSince the soil types themselves have factorial structure, the treatment structure should be recorded like so:```{r, echo = F}# recode treatmentsraw |> mutate(drought = if_else(treatment %in% c('drought', 'mgxdr'), 'Y', 'N'), magnesium = if_else(treatment %in% c('mg', 'mgxdr'), 'Y', 'N')) |> distinct(treatment, drought, magnesium) |> arrange(drought, magnesium) |> pander()```If balanced groups of seeds were randomly allocated by family to soil types and grown in such a way that the same number of individuals of each family were allowed to mature, then the treatment allocation is randomized with respect to soil type and family (not color) and the observational unit is a plant. If so, then the treatment allocation is also randomized with respect to soil type and color, but the design is not balanced. Client should confirm with collaborator as this will be relevant for peer/committee review.## PseudoreplicationBecause there is just one physical sample of each tissue type per treatment combination, [**there is no replication in this study.**]{.underline} Treatments were not independently allocated to the RNA subsamples, so there is not sufficient data to estimate the observational variation in gene expression levels. This significantly constrains possible analyses of study data, as ANOVA cannot be performed. The values of the responses (CT value and absolute concentration) should be averaged across pseudoreplicates.```{r}# recode treatments and average pseudoreplicatesclean <- raw |>mutate(drought =if_else(treatment %in%c('drought', 'mgxdr'), 'Y', 'N') |>factor(levels =c('N', 'Y')),magnesium =if_else(treatment %in%c('mg', 'mgxdr'), 'Y', 'N') |>factor(levels =c('N', 'Y')),color =factor(color),tissue =factor(tissue)) |>group_by(drought, magnesium, color, tissue) |>summarize(across(.cols =c(ct, absolute), .fns =list(mean = mean), .names ='{.col}.{.fn}'),.groups ='drop')head(clean) |>pander()```## Analysis strategy### Interaction plotsVisualization can be used to identify possible effects focusing on the difference in expression levels between pink and white flowers. Examples of complete interaction plots (no aggregation across any treatment factors) are shown below. The client should consider examining similar plots with aggregation to explore marginal effects of treatment factors.```{r}# interaction contrasts (absolute concentration)clean |>mutate(magnesium =fct_recode(magnesium, magnesium ='Y', control ='N'),drought =fct_recode(drought, drought ='Y', control ='N')) |>ggplot(aes(x = color, y = absolute.mean, color = magnesium)) +geom_point() +facet_grid(tissue ~ drought) +geom_path(aes(group = magnesium)) +guides(color =guide_legend(title =NULL)) +theme_bw() +labs(title ='absolute concentration')# interaction contrasts (ct values)clean |>mutate(magnesium =fct_recode(magnesium, magnesium ='Y', control ='N'),drought =fct_recode(drought, drought ='Y', control ='N')) |>ggplot(aes(x = color, y = ct.mean, color = magnesium)) +geom_point() +facet_grid(tissue ~ drought) +geom_path(aes(group = magnesium)) +guides(color =guide_legend(title =NULL)) +theme_bw() +labs(title ='ct values')```I note the graphics are mirror images, so conclusions should not differ regardless of whether CT values or absolute concentrations are used. My recommendation is to choose just one response to analyze; examining both will multiply your results without adding information.### ContrastsThe research questions pertain to comparing gene expression levels between pink and white flower phenotypes under various conditions. These are referred to as *contrasts* and are estimated via arithmetic differences between the corresponding averages (P - W, P/dr - W/dr, etc.). The recommended analysis thus focuses on contrasts for the color phenotype, and interprets interaction effects accordingly. The codes provided use `lm` and `emmeans` to compute contrasts, but they're not really fitting a model, since the treatment mean "estimates" are really just the observations.```{r}# "model" to use for computing contrasts of interestfit <-lm(absolute.mean ~ drought*magnesium*color*tissue, data = clean)fit.emm <-emmeans(fit, specs =~ color*drought*magnesium | tissue) ```### Research question 1> How does gene expression of anthocyanin biosynthetic genes (the types of genes I am studying) differ in different flower colors of *L. parviflorus* (species) when grown in different soil types?I'll rephrase this as: does the difference in gene expression level between color phenotypes depend on soil type? And then I'll construe this as whether there are significant interaction effects involving color.The **factorial effects** of color, drought, and magnesium are the differences between 'high' and 'low' levels of each factor conditioned on the levels of other factors in various constellations. The main effects of each are the differences between the average response at each level of the factor, averaged across the other factors. There are three main effects:- **color**: the difference in average gene expression between samples from pink and white flowers, averaged across soil type- **magnesium**: the difference in average gene expression between samples from magnesium and non-magnesium soils, averaged across color and drought type- **drought**: the difference in average gene expression between samples from drought and non-drought soils, averaged across color and magnesiumThe interaction effects then characterize the dependence of the differences above on the levels of other factors. There are four interaction effects:- **color/magnesium**: the difference in the effect of color between levels of the magnesium factor, averaged across drought type - **color/drought**: the difference in the effect of color between levels of the drought type, averaged across levels of the magnesium factor- **magnesium/drought**: the difference in the effect of magnesium between levels of the drought type, averaged across color- **color/magnesium/drought**: the difference in the color/magnesium interaction between levels of the drought typeThe interaction effects are symmetric in the sense that the color/magnesium effect is the same as the magnesium/color effect. Because the research question focuses on color phenotypes, contrasts are interpreted primarily in terms of the effect of color when applicable.```{r, include = F}# examine factor levels for emmeans objectfit.emm@grid# specify contrast coefficients for factorial effectsdata_grid(clean, magnesium, drought, color) |> mutate(mg = 0.25*c(rep(-1, 4), rep(1, 4)), dr = 0.25*rep(c(rep(-1, 2), rep(1, 2)), 2), cl = 0.25*rep(c(1, -1), 4), clxdr = 0.5*(4*cl)*(4*dr), clxmg = 0.5*(4*cl)*(4*mg), drxmg = 0.5*(4*mg)*(4*dr), clxdrxmg = (4*cl)*(4*dr)*(4*mg))```The estimated factorial effects for each tissue type are as follows:```{r}# wrap in function for use with emmeansfac.eff.emmc <-function(levels, ...){data_grid(clean, magnesium, drought, color) |>mutate(mg =0.25*c(rep(-1, 4), rep(1, 4)),dr =0.25*rep(c(rep(-1, 2), rep(1, 2)), 2),cl =0.25*rep(c(1, -1), 4),clxdr =0.5*(4*cl)*(4*dr),clxmg =0.5*(4*cl)*(4*mg),drxmg =0.5*(4*mg)*(4*dr),clxdrxmg = (4*cl)*(4*dr)*(4*mg)) |>arrange(magnesium, drought, color) |>select(-(1:3))}# point estimates of factorial effectsfit.fac.eff <- fit.emm |>contrast('fac.eff') |>tidy() |>select(tissue, contrast, estimate)# examinefit.fac.eff |>spread(tissue, estimate) |>pander(caption ='Estimates of factorial effects by tissue type.')```We can't answer the question of whether color interactions are significant the usual way but @lenth1989quick provides a clever method of testing the hypotheses $H_{0i}: \theta_i = 0$ vs $H_{Ai}: \theta_i \neq 0$ for each factorial effect $\theta_i, i = 1, \dots, k$. The `unrepx` package provides an implementation that generates *p*-values for these tests from the vector of interaction effects [@unrepx].```{r, include = F}# isolate effect estimates for flower tissueflower.effs <- filter(fit.fac.eff, tissue == 'flower') |> pull(estimate) names(flower.effs) <- filter(fit.fac.eff, tissue == 'flower') |> pull(contrast)# lenth methodeff.test(flower.effs, method = 'Lenth')```The full set of results for all three tissue types is below. Remember that these are separate analyses shown jointly. Lenth's method assumes that estimates for factorial effects are independent, which is only true here *within* one tissue type; since tissue samples of different types are collected from the same individuals, the independence does not hold across tissue types.```{r}# apply to all three tissue typesfit.fac.eff |>group_by(tissue) |>nest(data =c(contrast, estimate)) |>mutate(test =map(data, ~eff.test(.x$estimate, method ='Lenth'))) |>mutate(rslt =map2(data, test, ~inner_join(.x, .y, join_by(estimate == effect)))) |>select(tissue, rslt) |>unnest(everything()) |>pander(caption ='Inference for factorial effects using Lenth\'s method.')```I'm choosing to look at the unadjusted p-values, as the F tests in ANOVA are not usually adjusted for multiplicity. However, the client may wish to consider simultaneous inference adjustments when looking at results across genes.For this gene, the color/drought/magnesium interactions are significant for all tissue types according to the unadjusted p-values, suggesting that the effect of color depends on the combination of soil types. However, somewhat counterintuitively, the main effects and two-way interactions involving color are not significant. This is addressed to some extent by the plot below: the expression level differs by color phenotype *less* when averaged across soil type (solid black) than when disaggregated, and even then, expression levels for pink and white color phenotypes only differ in certain soil combinations. The client should consider carefully how to explain this result.```{r}#| fig-width: 8#| fig-height: 3# visualize marginal effect of colorcolor.means <- clean |>group_by(color, tissue) |>summarize(absolute.mean =mean(absolute.mean))ggplot(clean, aes(x = color, y = absolute.mean, color = magnesium, linetype = drought)) +facet_wrap(~tissue) +geom_point(alpha =0.5) +geom_path(aes(group =interaction(drought, magnesium)),alpha =0.5) +geom_point(data = color.means, inherit.aes = F,aes(x = color, y = absolute.mean),shape =17, size =4) +geom_path(data = color.means,inherit.aes = F,aes(x = color, y = absolute.mean, group = tissue),linewidth =1.25) +theme_bw()```### Research question 2> How does gene expression differ between flower colors when grown in the same soil type?This can be approached analogous to the analysis given above, but using specific interaction contrasts that compare expression levels between pink and white color phenotypes separately for each combination of the drought/magnesium factors in place of all factorial effects. This analysis is shown below.```{r}# specify contrasts of interestcontr.fn.emmc <-function(levels, ...){data_grid(clean, color, magnesium, drought) |>mutate('p-w.ctrl'=c(1, 0, 0, 0, -1, 0, 0, 0),'p-w.dr'=c(0, 1, 0, 0, 0, -1, 0, 0),'p-w.mg'=c(0, 0, 1, 0, 0, 0, -1, 0),'p-w.dxm'=c(0, 0, 0, 1, 0, 0, 0, -1)) |>arrange(magnesium, drought, color) |>select(starts_with('p'))}fit.contr <- fit.emm |>contrast('contr.fn') |>tidy() |>select(tissue, contrast, estimate) |>mutate(contrast =factor(contrast, levels =paste('p-w', c('ctrl', 'mg', 'dr', 'dxm'), sep ='.')))# examinefit.contr |>spread(tissue, estimate) |>pander(caption ='Estimates of P-W contrasts by tissue type and soil treatment combination.')# apply lenth's method to estimated contrastsfit.contr |>group_by(tissue) |>nest(data =c(contrast, estimate)) |>mutate(test =map(data, ~eff.test(.x$estimate, method ='Lenth'))) |>mutate(rslt =map2(data, test, ~inner_join(.x, .y, join_by(estimate == effect)))) |>select(tissue, rslt) |>unnest(everything()) |>pander(caption ='Inference using Lenth\'s method.')```Gene expression levels differ significantly between pink and white color phenotypes in flower tissue and leaf tissue in the magnesium soil and in root tissue in the drought soil. Visually:```{r}#| fig-width: 4#| fig-height: 3# plot estimated contrasts abovefit.contr |>group_by(tissue) |>nest(data =c(contrast, estimate)) |>mutate(test =map(data, ~eff.test(.x$estimate, method ='Lenth'))) |>mutate(rslt =map2(data, test, ~inner_join(.x, .y, join_by(estimate == effect)))) |>select(tissue, rslt) |>unnest(everything()) |>mutate(lwr = estimate -2*Lenth_PSE,upr = estimate +2*Lenth_PSE) |>select(tissue, contrast, estimate, lwr, upr) |>mutate(contrast =fct_recode(contrast,"control"="p-w.ctrl","magnesium"="p-w.mg","drought"="p-w.dr","both"="p-w.dxm")) |>arrange(contrast, tissue) |>ggplot(aes(x = contrast, y = estimate, color = tissue)) +geom_point(position =position_dodge(width =0.5)) +geom_errorbar(aes(ymin = lwr, ymax = upr), width =0.1, position =position_dodge(width =0.5)) +geom_path(aes(group = tissue), position =position_dodge(width =0.5)) +theme_bw() +labs(x ="soil type", y ="estimated (P - W) difference")```If this is too messy, here is the same in separate panels:```{r}#| fig-width: 8#| fig-height: 3# same as above but facetedfit.contr |>group_by(tissue) |>nest(data =c(contrast, estimate)) |>mutate(test =map(data, ~eff.test(.x$estimate, method ='Lenth'))) |>mutate(rslt =map2(data, test, ~inner_join(.x, .y, join_by(estimate == effect)))) |>select(tissue, rslt) |>unnest(everything()) |>mutate(lwr = estimate -2*Lenth_PSE,upr = estimate +2*Lenth_PSE) |>select(tissue, contrast, estimate, lwr, upr) |>mutate(contrast =fct_recode(contrast,"control"="p-w.ctrl","magnesium"="p-w.mg","drought"="p-w.dr","both"="p-w.dxm")) |>arrange(contrast, tissue) |>ggplot(aes(x = contrast, y = estimate)) +geom_point() +geom_errorbar(aes(ymin = lwr, ymax = upr), width =0.1) +geom_path(aes(group = tissue)) +facet_grid(~tissue) +theme_bw() +labs(x ="soil type", y ="estimated (P - W) difference")```These plots show the estimated difference in gene expression levels (concentration values) between pink and white flowers by soil type.# SummaryDue to pseudoreplication, ANOVA should **not** be performed. The approach above employs the method of @lenth1989quick for analysis of unreplicated factorial designs. This analysis pipeline should be easily transferrable to the data from the other five genes with minor to no adjustments to code. Returning to the original questions:1. What type of analysis is best? It will be some kind of comparison of means but what exactly makes the most sense.> There is no replication in the study, so analyses should be largely limited to visual assessments. @lenth1989quick provides a method of analyzing data from unreplicated factorial designs; this approach can be used to address the main research questions and example implementations are given above.2. How can we potentially bring multiple tissue types into the analysis so we do not have to multiply the analyses by 3?> It is recommended to carry out analyses separately by tissue type and compare qualitatively as originally planned. Under different circumstances and with proper design considerations, effects associated with tissue type could be estimated along with other factors in the ANOVA framework; client is recommended to consult with a statistician at design stage on experimental design and analysis plan if this is of interest in future.3. How can we control for the fact that the conversion factor may be slightly different between each run?> Small variation in the conversion factors between samples shouldn’t affect conclusions. It is also worth noting that the conversions need not be uniform if they are part of the measurement process. In this study, direct comparison of CT values and absolute concentrations indicates near-identical trends with respect to the experimental factors.4. Some kind of R or JMP template!> A copy of the source .qmd for this document is provided for client's use.```{r, include = F, eval = F}knitr::purl('25su-gronor.qmd', 'analysis-template.R')```# References::: {#refs}:::