Background information

In order to process large phenotyping data set, it is better to streamline this process as much as possible. In order to provide a primer for the less-experienced in R, and those who don’t know where to start, I wrote this small example of data analysis and visualization.

Remember - as the data that you are about to process is rather large - it is much better to document each step of your analysis to make sure it is as reproducible as possible. In the example data set I am using a collection of various mutants in Arabidopsis that were grown under “control” and “salt stress” conditions.

Loading & organizing the datasets

OK - the data that is coming from the PSI machine is usually coming in separate .csv files > in order to make life easier for us we are going to combine all the phenotype containing files into one dataset:

# load the datasets:
morpho_data <- read.csv("Rgb_Morpho_Plant.csv")
head(morpho_data)

In the example morpho_data: - the treatment is indicated in collumn “Tray.Info” - the genotype is indicated in collumns “Plant.Name” - the individual plant identifier is indicated in collumn “Plant.ID” - the collumns from “AREA_PX” to “SOL” represent the morphometric traits derived from the black-and-white image

Next we have data from other measuring stations as well:

# data from fluorescent camera:
fc_data <- read.csv("Fc_Plant.csv")
# data from thermal/infrared camera:
ir_data <- read.csv("Ir_Plant.csv")
# data from RGB camera with color segmentation:
# WARNING - this data used to be not exported correctly - because the names of the HUEs was also containing semicolons - and required some manual manipulations of the table before importing it into R. 
col_data <- read.csv("Rgb_Color_Plant.csv")
# data from scales measuring the weight of each pot:
sc_data <- read.csv("ScalesMeasure.csv")

Let’s combine all the datasets (except scales) together:

library(reshape2)

# We will use columns "Tray.Info", "Round.Order", "Plant.ID" to make sure we are matching the right columns between individual datasets.

# In order to avoid unneccessary columns - we are going to first trim the datasets:
col_data <- col_data[,c(1:2,4:9,13:21)]
morpho_data <- morpho_data[,c(4:5,7,13:23)]
fc_data <- fc_data[,c(4:5,7,13:95)]
ir_data <- ir_data[,c(4:5,7,13:17)]

all_data <- merge(col_data, morpho_data, by=c("Tray.ID", "Round.Order", "Plant.ID"))
all_data <- merge(all_data, fc_data, by=c("Tray.ID", "Round.Order", "Plant.ID"))
# Then I have a little problem - because my IR is measured in a round before the full cycle - and the round order numbers are different. So for me to be able to merge two dataset I need to create new column "Round.Order2"
ir_data$Round.Order2 <- ir_data$Round.Order+1
# and then let's replace the original Round.Order by this tricky round order and remove it from the dataset:
ir_data$Round.Order <- ir_data$Round.Order2
ir_data <- ir_data[,1:8]
# and now we can fuse all together:
all_data <- merge(all_data, ir_data, by=c("Tray.ID", "Round.Order", "Plant.ID"))
head(all_data)

# Remove rows containing missing data:
all_data <- na.omit(all_data)
dim(all_data)

## [1] 9772  116

Just because I am super freaky about the reproducibility of my research - I usually can save this file in the working directory as well

write.csv(all_data, "pheno_data.csv", row.names = F)

Analyzing change in Rosette Area

Examining the dynamic changes in the phenotypic data

One of the best examples of dynamic change in the phenotype is growth. Let’s have a look at how all the plants grow in our experiments - for this we will use the most simple plot function in R called just like this - “plot”.

plot(all_data$AREA_MM ~ all_data$Round.Order, type="l")

The above graph cannot be really described as the most informative graph ever. And so we will try to put more layers into this data visualization by using the ggplot2 library.

library(ggplot2)

Area_lgraph <- ggplot(data=all_data, aes(x= Round.Order, y=AREA_MM, group = Plant.ID)) 
Area_lgraph <- Area_lgraph + geom_line() 
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info) 
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
Area_lgraph

The above graph is a bit better - I can see how my plants are growing under “control” and “salt stress” conditions. However, it would be nice to see whether the lines that show very little growth are in any way unusual - for example the plant died before the experiment even started.

I can do it by going back to my raw data/pictures and validating if I have the reason to excluse a specific plant from my analysis.

I can add the sample ID information (containing Tray number, plant location and its genotype) to the label and make the graph interactive - so the label shows as soon as I hover over the line with my cursor. This interactive feature is available in “plotly” library, and plotting the graph using ggplotly() function.

library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

ggplotly(Area_lgraph)

As we will need another unit of time - which is different from “Round”, to model our data at later time point - we will need to create another variable called “Time”. Let’s look at the unique number of rounds & days to see our measurement period:

length(unique(all_data$Measuring.Date))

## [1] 15

length(unique(all_data$Round.Order))

## [1] 28

In total we have 28 rounds of measurements performed over 15 days. The salt stress was applied on 28th of November 2018, which is the first day of measurements.

But - I see something rather funny going on at round 57 - as the Area of all my plants is decreasing. I just checked in my calendar and it appears that during this day, there was an error of the PSI, and some of my plants were stuck in the dark during the meaeurement. So I would prefer to remove all the datapoints collected after this time point.

all_data <- subset(all_data, all_data$Round.Order < 57)

And now - let’s make the variable representing the time (in hours) as the plants are measured every 12h. But the round order is only represented by the uneven numbers - So each round - starting with round 11 - represents a time after salt stress - but as they have increments of 2 per time point - we need to divide the constitutive “round number” by 2, to get into hours

So: Time (h) = ((Round.Order - 11)/2) * 12

all_data$Time <- ((all_data$Round.Order-11)/2)*12

OK - let’s see how the graph looks like now:

Area_lgraph <- ggplot(data=all_data, aes(x= Time, y=AREA_MM, group = Plant.ID)) 
Area_lgraph <- Area_lgraph + geom_line() 
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info) 
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("time (h)") + theme(legend.position='none')
ggplotly(Area_lgraph)

Another thing that I can see from the graph is that there is a slight zig-zag in the lines - this is most possibly due to the movement of the leafs during the day-night rythm. As for the Area I dont want to include the night time changes - I only want to include the day measurements.

As my day-time measurement is taken around 16:00 and the night time around 4:00 I am going to subset my data into day and night data. So round numbers 13, 17, 21, 25, 29, 33, 37, 41, 43, 49, 53, 57, 61, 65 are the measurements taken at night.

day_data <- subset(all_data, all_data$Round.Order %in% c(11,15,19,23,27,31,35,39,43,47,51,55,59,63))
night_data <- subset(all_data, all_data$Round.Order %in% c(13,17,21,25,29,33,37,41,43,49,53,57,61,65))

And let’s look at the graph now:

Area_lgraph <- ggplot(data=day_data, aes(x= Time, y=AREA_MM, group = Plant.ID)) 
Area_lgraph <- Area_lgraph + geom_line() 
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info) 
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("time (h)") + theme(legend.position='none')
ggplotly(Area_lgraph)

That looks much better - we can also represent the time as “day” rather than an hour - and so you can create another column called “day” and plot the results per day rather than per h:

day_data$Day <- day_data$Time/24


Area_lgraph <- ggplot(data=day_data, aes(x= Day, y=AREA_MM, group = Plant.ID)) 
Area_lgraph <- Area_lgraph + geom_line() 
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info) 
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
ggplotly(Area_lgraph)

Change Round # into specific time intervals

OK - so now you have your “dream dataset” that you know is correct, we can now fit some exponential curves to this.

So you want to probably do it for each genotype separately - so let’s start with making a subset for Col-0 (wt) and plot the data “as is” without any fitting of the curves:

Col0_data <- subset(day_data, day_data$Plant.Name == "Col-0")

Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID)) 
Col_Area_lgraph <- Col_Area_lgraph + geom_line() 
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info) 
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)

So let’s do some curve fitting now! Let’s do a summary statistics for our genotype and separate it per condition. We will do it using the “doBy” library

library(doBy)
sum_ski <- summaryBy(AREA_MM ~ Tray.Info + Day, data = Col0_data)
sum_ski

# let's split the dataset so it will be easier to modell
Col_C <- subset(sum_ski, sum_ski$Tray.Info == "Control")
Col_S <- subset(sum_ski, sum_ski$Tray.Info == "Salt ")

Col_C

Col_S

In the next step, we will transform the Area with the log() function - to enforce it running linearly, and save the outputs of individal models as an object (this will come in handy in next step):

# remove all the rows with missing values - the lm function is sensitive to so called incomplete cases:

Area_mm2_c <- Col_C$AREA_MM.mean
time_d <- Col_C$Day
model_C <- lm(log(Area_mm2_c)~ time_d)
model_C

## 
## Call:
## lm(formula = log(Area_mm2_c) ~ time_d)
## 
## Coefficients:
## (Intercept)       time_d  
##       4.975        0.261

Area_mm2_s <- Col_S$AREA_MM.mean
time_d <- Col_S$Day
model_S <- lm(log(Area_mm2_s)~ time_d)
model_S

## 
## Call:
## lm(formula = log(Area_mm2_s) ~ time_d)
## 
## Coefficients:
## (Intercept)       time_d  
##      4.8183       0.2602

So now we have our functions which would basically equate to:

Area = e^(Intercept + DELTA * time)

Where the Intercept is the first value out of the model, and the DELTA is the second value.

Now we have to use those values to see what the Area over time SHOULD be like:

timevalues <- unique(time_d)

Area.C <- exp(predict(model_C,list(Time=timevalues)))
Area.S <- exp(predict(model_S,list(Time=timevalues)))

Area.C

##         1         2         3         4         5         6         7 
##  144.7144  187.8748  243.9075  316.6517  411.0914  533.6974  692.8700 
##         8         9        10        11        12 
##  899.5150 1167.7908 1516.0786 1968.2415 2555.2596

Let’s integrate now all of these values into the table with the actual Area values:

# create a dataframe with 12 rows - because we have 12 unique time points
Open_for_model <-data.frame(Day=numeric(12),Plant.ID=character(12), Plant.Name=character(12), Tray.Info=character(12), AREA_MM=numeric(12))
Open_for_model

head(Open_for_model)

Control <- Open_for_model
Salt <- Open_for_model

unique(time_d)

##  [1]  0  1  2  3  4  5  6  7  8  9 10 11

Control$Day <- unique(time_d)
Salt$Day <- unique(time_d)

Control$Plant.ID <- "model"
Salt$Plant.ID <- "model"

Control$Plant.Name <- "model_Col-0"
Salt$Plant.Name <- "model_Col-0"

Control$Tray.Info <- "Control"
Salt$Tray.Info <- "Salt "

Control$AREA_MM <- Area.C
Salt$AREA_MM <- Area.S


model_total <- rbind(Control, Salt)
head(model_total)

head(Col0_data)

Col0_data_relevant <- subset(Col0_data, select = c("Day", "Plant.ID", "Plant.Name", "Tray.Info", "AREA_MM"))
head(Col0_data_relevant)

head(model_total)

Col2 <- rbind(Col0_data_relevant, model_total)

Col2

Now let’s try to plot this:

ggplot(data=Col2, aes(x= Day, y=AREA_MM, group = Plant.ID, color = Plant.Name)) + 
  geom_line() + 
  scale_colour_manual(values = c("grey", "blue")) +
  ylim(0, 2500) + 
  facet_grid(~ Tray.Info) + 
  ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')

Now the above calculations are only going to give you an insight for the AVERAGE of the population of plants - but let’s say you would like to fit a function per individual plant, and then extract the DELTA and INTERCEPT from the exponential model, and do some statistics with that.

We would have to extract one individual plant and run the following command:

names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()


growth_factors

# i=1

# run for the first sample:
i=1

uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni

#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

# let's calculate the model:  
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues

##  [1]  0  1  2  3  4  5  6  7  8  9 10 11

Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred

##          1          2          3          4          5          6 
##   88.97007  116.40671  152.30429  199.27198  260.72361  341.12572 
##          7          8          9         10         11         12 
##  446.32229  583.95945  764.04124  999.65678 1307.93158 1711.27237

uni$Area_pred <- Area.pred
done <- uni

done

Then - let’s loop this for all the other individual plants of Col-0 - note that we don’t have to separate per treatment here as that is already included in Plant.ID!

for(i in 2:length(unique(Col0_data$Plant.ID))){
  uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
  growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
  growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
  growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

  # let's calculate the model:  
  Area_mm2 <- uni$AREA_MM
  time_d <- uni$Day
  model_C <- lm(log(Area_mm2)~ time_d)
  # add model parts into the main table with growth factors
  growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
  growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

  # calculate predicted Area values for this specific sample:
  timevalues <- unique(time_d)
  timevalues

  Area.pred <- exp(predict(model_C,list(Time=timevalues)))
  
  uni$Area_pred <- Area.pred
  done <- rbind(done, uni)
}

growth_factors

done

Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID)) 
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot +  facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot

library(plotly)

ggplotly(Col_plot)

quartz.save("Col-0_growth_fitted.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

This data looks pretty awesome and I do not feel an urge to remove any outliers. Let’s check other genotypes that are in the dataset. But before we do it - let’s keep the essential numbers “safe”:

# to keep the data safe - we will save this as growth_factors2 and done2
growth_factors2 <- growth_factors
done2 <- done

dim(growth_factors2)

## [1] 17  5

You can repeat it for all the genotypes that you have in your setup by using the below script.

In here an example of how to run it for genotype “iaa-14”

unique(day_data$Plant.Name)

##  [1] iaa-14     avp-3      ahp1/2/3   ein3       ein2-1     duf247    
##  [7] R12a-B     nyc2-1     nyc2-2     eir1-1     gai        aux1-7    
## [13] acs2/4/5/6 ahp2/3/4   ein3/eil1  ahp1-5     eto-1      abi1-1    
## [19] etr-1      Col-0     
## 20 Levels: abi1-1 acs2/4/5/6 ahp1-5 ahp1/2/3 ahp2/3/4 aux1-7 ... R12a-B

# next one in line is iaa-14
Col0_data <- subset(day_data, day_data$Plant.Name == "iaa-14")

Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID)) 
Col_Area_lgraph <- Col_Area_lgraph + geom_line() 
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info) 
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)

names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()


growth_factors

# i=1

# run for the first sample:
i=1

uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni

#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

# let's calculate the model:  
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues

##  [1]  0  1  2  3  4  5  6  7  8  9 10 11

Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred

##          1          2          3          4          5          6 
##   88.87625  117.65016  155.73969  206.16081  272.90589  361.25985 
##          7          8          9         10         11         12 
##  478.21864  633.04312  837.99242 1109.29457 1468.43147 1943.83985

uni$Area_pred <- Area.pred
done <- uni

done

for(i in 2:length(unique(Col0_data$Plant.ID))){
  uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
  growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
  growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
  growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

  # let's calculate the model:  
  Area_mm2 <- uni$AREA_MM
  time_d <- uni$Day
  model_C <- lm(log(Area_mm2)~ time_d)
  # add model parts into the main table with growth factors
  growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
  growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

  # calculate predicted Area values for this specific sample:
  timevalues <- unique(time_d)
  timevalues

  Area.pred <- exp(predict(model_C,list(Time=timevalues)))
  
  uni$Area_pred <- Area.pred
  done <- rbind(done, uni)
}

growth_factors

done

Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID)) 
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot +  facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot

library(plotly)

ggplotly(Col_plot)

quartz.save("iaa-14_growth_fitted.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

growth_factors2 <- rbind(growth_factors2, growth_factors)

done2 <- rbind(done2, done)

OK - how about you would like to run the script on data that does contain outliers. Let’s re-run the part of the script that get’s us to the stage where we have the interactive graph.

I am using the subset of the data for genotype nyc2-1 for which I know that there are some plants that died during the experiment and thus show odd growth curves:

Col0_data <- subset(day_data, day_data$Plant.Name == "nyc2-1")

Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID)) 
Col_Area_lgraph <- Col_Area_lgraph + geom_line() 
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info) 
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)

names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()


growth_factors

# i=1

# run for the first sample:
i=1

uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni

#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

# let's calculate the model:  
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues

##  [1]  0  1  2  3  4  5  6  7  8  9 10 11

Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred

##         1         2         3         4         5         6         7 
##   79.6454  107.6941  145.6207  196.9040  266.2476  360.0120  486.7974 
##         8         9        10        11        12 
##  658.2327  890.0425 1203.4886 1627.3210 2200.4144

uni$Area_pred <- Area.pred
done <- uni

done

for(i in 2:length(unique(Col0_data$Plant.ID))){
  uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
  growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
  growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
  growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

  # let's calculate the model:  
  Area_mm2 <- uni$AREA_MM
  time_d <- uni$Day
  model_C <- lm(log(Area_mm2)~ time_d)
  # add model parts into the main table with growth factors
  growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
  growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

  # calculate predicted Area values for this specific sample:
  timevalues <- unique(time_d)
  timevalues

  Area.pred <- exp(predict(model_C,list(Time=timevalues)))
  
  uni$Area_pred <- Area.pred
  done <- rbind(done, uni)
}

growth_factors

done

Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID)) 
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot +  facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot

library(plotly)

ggplotly(Col_plot)

# to save run the un-hashed command below:
#quartz.save("nyc2-1_growth_fitted.pdf", type= "pdf", dpi=200)

In the above graph we have a sample that I would classify as an outlier - “PS_Tray_001_Salt_D1_nyc2-1”.

We can remove it from the “done” object and re-plot the data - saving it under a different name:

pls_remove <- c("PS_Tray_001_Salt _D1_nyc2-1") 

dim(done)

## [1] 196 119

done <- subset(done, !(done$Plant.ID %in% pls_remove))
growth_factors <- subset(growth_factors, !(growth_factors$Plant.ID %in% pls_remove))
dim(done)

## [1] 192 119

Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID)) 
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot +  facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot

library(plotly)

ggplotly(Col_plot)

#quartz.save("nyc2-1_growth_fitted_no_outl.pdf", type= "pdf", dpi=200)

growth_factors2 <- rbind(growth_factors2, growth_factors)

done2 <- rbind(done2, done)

If you have more than one plant that you would like to remove from your curated dataset, please see below on how to remove multiple lines per one time:

Col0_data <- subset(day_data, day_data$Plant.Name == "gai")

Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID)) 
Col_Area_lgraph <- Col_Area_lgraph + geom_line() 
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue")) 
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
  # geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info) 
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)

names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()


growth_factors

# i=1

# run for the first sample:
i=1

uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni

#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

# let's calculate the model:  
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues

##  [1]  0  1  2  3  4  5  6  7  8  9 10 11

Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred

##         1         2         3         4         5         6         7 
##  33.16733  45.01368  61.09119  82.91108 112.52437 152.71461 207.25957 
##         8         9        10        11        12 
## 281.28632 381.75314 518.10362 703.15430 954.29939

uni$Area_pred <- Area.pred
done <- uni

done

for(i in 2:length(unique(Col0_data$Plant.ID))){
  uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
  growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
  growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
  growth_factors[i,3] <- as.character(unique(uni$Tray.Info))

  # let's calculate the model:  
  Area_mm2 <- uni$AREA_MM
  time_d <- uni$Day
  model_C <- lm(log(Area_mm2)~ time_d)
  # add model parts into the main table with growth factors
  growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
  growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])

  # calculate predicted Area values for this specific sample:
  timevalues <- unique(time_d)
  timevalues

  Area.pred <- exp(predict(model_C,list(Time=timevalues)))
  
  uni$Area_pred <- Area.pred
  done <- rbind(done, uni)
}

## Warning in predict.lm(model_C, list(Time = timevalues)): prediction from a
## rank-deficient fit may be misleading

## Warning in predict.lm(model_C, list(Time = timevalues)): prediction from a
## rank-deficient fit may be misleading

growth_factors

done

Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID)) 
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot +  facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot

library(plotly)

ggplotly(Col_plot)

quartz.save("gai_growth_fitted.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

#growth_factors2 <- rbind(growth_factors2, growth_factors)

#done2 <- rbind(done2, done)

Again - I spotted some plants that did not grow at all:

pls_remove <- c("PS_Tray_053_Control_C1_gai", "PS_Tray_128_Control_B4_gai", "PS_Tray_110_Control_C3_gai", "PS_Tray_108_Salt _D1_gai", "PS_Tray_097_Salt _D3_gai") 

dim(done)

## [1] 128 119

done <- subset(done, !(done$Plant.ID %in% pls_remove))
growth_factors <- subset(growth_factors, !(growth_factors$Plant.ID %in% pls_remove))
dim(done)

## [1]  93 119

Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID)) 
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot +  facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot

library(plotly)

ggplotly(Col_plot)

quartz.save("gai_growth_fitted_no_outl.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

growth_factors2 <- rbind(growth_factors2, growth_factors)

done2 <- rbind(done2, done)

Comparing growth rates between genotypes / conditions

Once you have all your growth rates calculated, you can then do some analysis and examine the differences between your genotypes / conditions.

But first - let’s write the results of the analysis:

unique(growth_factors2$Plant.Name)

## [1] "Col-0"  "iaa-14" "nyc2-1" "gai"

# Save the growth factors
#write.csv(growth_factors2, "RGR_PSI_experiment.csv", row.names = F)

# if you didn't re-run the above analysis for all the different mutants that I have in my dataset, it is probably for the best if you load the growth rate data from my (already curated) file:

growth_factors2 <- read.csv("RGR_PSI_experiment_Magda.csv")

Now we can do some plotting. As the box-plots and bar-plots are not the best way of representing the entity of your data, we will use the dot-plots and plot the growth rate of all the mutants:

library(ggbeeswarm)
library(gapminder)
library(RColorBrewer)
library(ggridges)

## 
## Attaching package: 'ggridges'

## The following object is masked from 'package:ggplot2':
## 
##     scale_discrete_manual

library(cowplot)

## 
## Attaching package: 'cowplot'

## The following object is masked from 'package:ggplot2':
## 
##     ggsave

head(growth_factors2)

growth_factors2$intercept <- as.numeric(growth_factors2$intercept)
growth_factors2$delta <- as.numeric(growth_factors2$delta)

my_box_plot <- ggplot(data = growth_factors2, mapping = aes(x = Plant.Name, y = delta, colour = Plant.Name)) 
#my_box_plot <- my_box_plot + geom_boxplot()
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot

## Warning: Removed 2 rows containing non-finite values (stat_summary).

## Warning: Removed 2 rows containing missing values (position_beeswarm).

quartz.save("all_mutants_RGR_Delta.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

Now let’s to it for the other component of the growth rate - the intercept that is also in the exponent of our growth function:

my_box_plot <- ggplot(data = growth_factors2, mapping = aes(x = Plant.Name, y = intercept, colour = Plant.Name)) 
#my_box_plot <- my_box_plot + geom_boxplot()
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot

quartz.save("all_mutants_RGR_Intercept.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

OK - as you probably notice - this graph is way too dense with the different mutants.

Thus we need to divide it into different groups. In my case I will segregate them based on hormone:

auxin <- c("Col-0", "iaa-14", "aux1-7")
ethylene <- c("Col-0", "ein-3", "ein2-1", "eir1-1", "acs2/4/5/6", "ein3/eil1", "eto-1", "etr-1")
cytokinin <- c("Col-0", "ahp1/2/3", "ahp2/3/4", "ahp1-5")
aba <- c("Col-0", "abi1-1")
gibberelin <- c("Col-0", "gai")
salt <- c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2")

# make separate datasets for individual hormones:
aux_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% auxin)
eth_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% ethylene)
cyto_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% cytokinin)
aba_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% aba)
gib_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% gibberelin)
salt_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% salt)

# This part will help to order the mutants as we want to plot them (wt always first):
aux_growth$Plant.Name <- factor(aux_growth$Plant.Name,levels = c("Col-0", "iaa-14", "aux1-7"))
eth_growth$Plant.Name <- factor(eth_growth$Plant.Name,levels = c("Col-0", "ein-3", "ein2-1", "eir1-1", "acs2/4/5/6", "ein3/eil1", "eto-1", "etr-1"))
cyto_growth$Plant.Name <- factor(cyto_growth$Plant.Name,levels = c("Col-0", "ahp1/2/3", "ahp2/3/4", "ahp1-5"))
aba_growth$Plant.Name <- factor(aba_growth$Plant.Name,levels = c("Col-0", "abi1-1"))
gib_growth$Plant.Name <- factor(gib_growth$Plant.Name,levels = c("Col-0", "gai"))
salt_growth$Plant.Name <- factor(salt_growth$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))

Then we can plot the graphs out of the individual data sets:

my_box_plot <- ggplot(data = aux_growth, mapping = aes(x = Plant.Name, y = delta, colour = Plant.Name)) 
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot

quartz.save("auxin_mutants_RGR_delta.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

And now - we can also automatically add p-values

library(ggpubr)

## Loading required package: magrittr

## 
## Attaching package: 'ggpubr'

## The following object is masked from 'package:cowplot':
## 
##     get_legend

my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0") 
my_box_plot

Just so we have consistent colours for different mutants across different graphs:

my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "magenta", "mediumpurple"))
my_box_plot

# Let's save it:

quartz.save("auxin_mutants_RGR_delta.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

Then - let’s combine all of the above into one for the graph for Intercept:

my_box_plot <- ggplot(data = aux_growth, mapping = aes(x = Plant.Name, y = intercept, colour = Plant.Name)) 
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0") 
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "magenta", "mediumpurple"))
my_box_plot

quartz.save("auxin_mutants_RGR_Intercept.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

Then - let’s do it for other hormone groups - ethylene:

my_box_plot <- ggplot(data = eth_growth, mapping = aes(x = Plant.Name, y = delta, colour = Plant.Name)) 
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0") 
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "dodgerblue", "cadetblue", "aquamarine", "darkturquoise", "darkslategray4", "darkslategray2", "cyan4"))
my_box_plot

quartz.save("ethylene_mutants_RGR_Delta.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

my_box_plot <- ggplot(data = eth_growth, mapping = aes(x = Plant.Name, y = intercept, colour = Plant.Name)) 
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0") 
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "dodgerblue", "cadetblue", "aquamarine", "darkturquoise", "darkslategray4", "darkslategray2", "cyan4"))
my_box_plot

quartz.save("ethylene_mutants_RGR_Intercept.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

Comparing other traits

Let’s have a look at the traits other than Area - for example Slenderness of Leaves (SOL) or other photosynthesis traits.

First what we need to do - is to select the genotypes that are specific to your group of interest - let’s say salt stress related genes

But… we also want to use ONLY the plants that we selected based on the growth rate. So we can use the Plant.ID from “salt_growth” file, which contains only the curated plants!

day <- day_data

keepers <- unique(salt_growth$Plant.ID)

salt_all <- subset(day, day$Plant.ID %in% keepers)

salt_all

# This part will help to order the mutants as we want them:
salt_all$Plant.Name <- factor(salt_all$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))

my_box_plot <- ggplot(data = salt_all, mapping = aes(x = Plant.Name, y = SOL, colour = Plant.Name)) 
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_grid(Tray.Info ~ Time)
  # facet_wrap(Day ~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0") 
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "goldenrod4", "goldenrod", "gold", "lightgoldenrod", "khaki"))
my_box_plot

quartz.save("salt_mutants_SOL.pdf", type= "pdf", dpi=200)

## quartz_off_screen 
##                 2

This is going to be much to much phenotypes to plot into a simple notebook - So let’s make a list of interesting phenotypes and loop them to produce some graphs:

head(day)

colnames(day)

##   [1] "Tray.ID"        "Round.Order"    "Plant.ID"       "Measuring.Date"
##   [5] "Measuring.Time" "Tray.Info"      "Position"       "Plant.Name"    
##   [9] "Hue1"           "Hue2"           "Hue3"           "Hue4"          
##  [13] "Hue5"           "Hue6"           "Hue7"           "Hue8"          
##  [17] "Hue9"           "AREA_PX"        "AREA_MM"        "PERIMETER_PX"  
##  [21] "PERIMETER_MM"   "ROUNDNESS"      "ROUNDNESS2"     "ISOTROPY"      
##  [25] "COMPACTNESS"    "ECCENTRICITY"   "RMS"            "SOL"           
##  [29] "Size"           "Fo"             "Fm"             "Fv"            
##  [33] "QY_max"         "Fm_Lss1"        "Fm_Lss2"        "Fm_Lss3"       
##  [37] "Fm_Lss4"        "Fm_Lss5"        "Fm_Lss6"        "Ft_Lss1"       
##  [41] "Ft_Lss2"        "Ft_Lss3"        "Ft_Lss4"        "Ft_Lss5"       
##  [45] "Ft_Lss6"        "Fo_Lss1"        "Fo_Lss2"        "Fo_Lss3"       
##  [49] "Fo_Lss4"        "Fo_Lss5"        "Fo_Lss6"        "Fv_Lss1"       
##  [53] "Fv_Lss2"        "Fv_Lss3"        "Fv_Lss4"        "Fv_Lss5"       
##  [57] "Fv_Lss6"        "Fq_Lss1"        "Fq_Lss2"        "Fq_Lss3"       
##  [61] "Fq_Lss4"        "Fq_Lss5"        "Fq_Lss6"        "Fv.Fm_Lss1"    
##  [65] "Fv.Fm_Lss2"     "Fv.Fm_Lss3"     "Fv.Fm_Lss4"     "Fv.Fm_Lss5"    
##  [69] "Fv.Fm_Lss6"     "QY_Lss1"        "QY_Lss2"        "QY_Lss3"       
##  [73] "QY_Lss4"        "QY_Lss5"        "QY_Lss6"        "NPQ_Lss1"      
##  [77] "NPQ_Lss2"       "NPQ_Lss3"       "NPQ_Lss4"       "NPQ_Lss5"      
##  [81] "NPQ_Lss6"       "qN_Lss1"        "qN_Lss2"        "qN_Lss3"       
##  [85] "qN_Lss4"        "qN_Lss5"        "qN_Lss6"        "qP_Lss1"       
##  [89] "qP_Lss2"        "qP_Lss3"        "qP_Lss4"        "qP_Lss5"       
##  [93] "qP_Lss6"        "qL_Lss1"        "qL_Lss2"        "qL_Lss3"       
##  [97] "qL_Lss4"        "qL_Lss5"        "qL_Lss6"        "PAR1"          
## [101] "PAR2"           "PAR3"           "PAR4"           "PAR5"          
## [105] "PAR6"           "ETR1"           "ETR2"           "ETR3"          
## [109] "ETR4"           "ETR5"           "ETR6"           "Temp.avg"      
## [113] "Temp.stddev"    "Temp.median"    "Temp.min"       "Temp.max"      
## [117] "Time"           "Day"

Let’s focus only on the highest light intensity phenotypes - if we do not see any difference between treatments / genotypes in these traits, there is a little changce we will see something in lower light intensities (Lss < 6)

yes_pheno <- c("Plant.Name", "Time", "Tray.Info", "Plant.ID", "AREA_MM", "PERIMETER_MM", "ROUNDNESS", "ISOTROPY", "COMPACTNESS", "ECCENTRICITY", "RMS", "SOL", "Fo", "Fm", "Fv", "QY_max", "Fm_Lss6", "Ft_Lss6", "Fo_Lss6", "Fv_Lss6", "Fq_Lss6", "Fv.Fm_Lss6", "QY_Lss6", "NPQ_Lss6", "qN_Lss6", "qP_Lss6", "qL_Lss6", "Temp.avg", "Temp.median")

day_yes <- select(day, yes_pheno)

keepers <- unique(salt_growth$Plant.ID)

salt_all <- subset(day_yes, day_yes$Plant.ID %in% keepers)

time_vals <- c(0,24,48,72,96,120,144,168,192,216,240,263)

time_vals <- as.numeric(time_vals)

# I am going to split the graphs per individual timepoints - otherwise the graphs will be way too chaotic
for(g in 1:9){
  
  now <- subset(salt_all, salt_all$Time == time_vals[g])
  
  # And then I am looping the graphs for individual traits:
  for(p in 5:29){
    target <- now[,c(1:4,p)]
    real_pheno <- colnames(target)[5]
    colnames(target)[5] <- "pheno"
    
    # This part will help to order the mutants as we want them:
    target$Plant.Name <- factor(target$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))
    
    pdf(paste("salt_mutants_",real_pheno,"_", time_vals[g],"h.pdf", sep=""))
    
    my_box_plot <- ggplot(data = target, mapping = aes(x = Plant.Name, y = pheno, colour = Plant.Name)) 
    my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
    my_box_plot <- my_box_plot + facet_grid(Tray.Info~.)
    my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
    my_box_plot <- my_box_plot + theme(legend.position = "none")
    my_box_plot <- my_box_plot + xlab("")
    my_box_plot <- my_box_plot + ylab(real_pheno)
    my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
    my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0") 
    my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "goldenrod4", "goldenrod", "gold", "lightgoldenrod", "khaki"))
    print(my_box_plot)
    dev.off()
  }
}

ok - let’s do the same mega-loop for all the other mutant groups:

e.g. Auxin

yes_pheno <- c("Plant.Name", "Time", "Tray.Info", "Plant.ID", "AREA_MM", "PERIMETER_MM", "ROUNDNESS", "ISOTROPY", "COMPACTNESS", "ECCENTRICITY", "RMS", "SOL", "Fo", "Fm", "Fv", "QY_max", "Fm_Lss6", "Ft_Lss6", "Fo_Lss6", "Fv_Lss6", "Fq_Lss6", "Fv.Fm_Lss6", "QY_Lss6", "NPQ_Lss6", "qN_Lss6", "qP_Lss6", "qL_Lss6", "Temp.avg", "Temp.median")

day_yes <- select(day, yes_pheno)

keepers <- unique(aux_growth$Plant.ID)

salt_all <- subset(day_yes, day_yes$Plant.ID %in% keepers)

time_vals <- c(0,24,48,72,96,120,144,168,192,216,240,263)

time_vals <- as.numeric(time_vals)

for(g in 1:9){
  
  now <- subset(salt_all, salt_all$Time == time_vals[g])
  
  colnames(now)
  
  for(p in 5:29){
    target <- now[,c(1:4,p)]
    real_pheno <- colnames(target)[5]
    colnames(target)[5] <- "pheno"
    # This part will help to order the mutants as we want them:
    target$Plant.Name <- factor(target$Plant.Name,levels = c("Col-0", "iaa-14", "aux1-7"))

    
    pdf(paste("auxin_mutants_",real_pheno,"_", time_vals[g],"h.pdf", sep=""))
    
    my_box_plot <- ggplot(data = target, mapping = aes(x = Plant.Name, y = pheno, colour = Plant.Name)) 
    my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
    my_box_plot <- my_box_plot + facet_grid(Tray.Info~.)
    my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
    my_box_plot <- my_box_plot + theme(legend.position = "none")
    my_box_plot <- my_box_plot + xlab("")
    my_box_plot <- my_box_plot + ylab(real_pheno)
    my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
    my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0") 
    my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "magenta", "mediumpurple"))
    print(my_box_plot)
    dev.off()
  }
}

Comparing Green Hues

Let’s have a look at the greenness traits and how to analyze them.

First what we need to do - is to select the genotypes that are specific to your group of interest - let’s say salt stress related genes

But… we also want to use ONLY the plants that we selected based on the growth rate. So we can use the Plant.ID from “salt_growth” file, which contains only the curated plants!

day <- day_data

keepers <- unique(salt_growth$Plant.ID)

salt_all <- subset(day, day$Plant.ID %in% keepers)

salt_all

# This part will help to order the mutants as we want them:
salt_all$Plant.Name <- factor(salt_all$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))

Then - we have to retrieve to summary statistics, because otherwise we are not able to present all the hues in one graph:

library("doBy")

col_sum <- summaryBy(Hue1 + Hue2 + Hue3 + Hue4 + Hue5 + Hue6 + Hue7 + Hue8 + Hue9 ~ Plant.Name + Tray.Info + Time, data=salt_all, FUN = function(x) c(m = mean(x)))

head(col_sum)

Before we can make a plot, we actually have to re-shape this table into a table containing all the hues in one column.

col_sum_2 <- melt(col_sum, id=c("Time","Plant.Name", "Tray.Info"))
head(col_sum_2)

d <- ggplot(col_sum_2, aes(x= Time, y= value, fill= variable)) + geom_area() 
d <- d + facet_grid(Plant.Name ~ Tray.Info) 
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858")) 
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Area(pixels)")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d

You can also choose to plot the graphs per individual genotype, for example:

Col_col_sum <- subset(col_sum_2, col_sum_2$Plant.Name == "Col-0")

d <- ggplot(Col_col_sum, aes(x= Time, y= value, fill= variable)) + geom_area() 
d <- d + facet_grid(Plant.Name ~ Tray.Info) 
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858")) 
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Area(pixels)")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d

However, the above graph does not really give you an idea of the relative abundance of greenness hues. You can also do this by calculating the hues as percentage of total pixels:

head(salt_all)

salt_all$Hue1.p <- salt_all$Hue1/salt_all$AREA_PX
salt_all$Hue2.p <- salt_all$Hue2/salt_all$AREA_PX
salt_all$Hue3.p <- salt_all$Hue3/salt_all$AREA_PX
salt_all$Hue4.p <- salt_all$Hue4/salt_all$AREA_PX
salt_all$Hue5.p <- salt_all$Hue5/salt_all$AREA_PX
salt_all$Hue6.p <- salt_all$Hue6/salt_all$AREA_PX
salt_all$Hue7.p <- salt_all$Hue7/salt_all$AREA_PX
salt_all$Hue8.p <- salt_all$Hue8/salt_all$AREA_PX
salt_all$Hue9.p <- salt_all$Hue9/salt_all$AREA_PX

col_sum_p <- summaryBy(Hue1.p + Hue2.p + Hue3.p + Hue4.p + Hue5.p + Hue6.p + Hue7.p + Hue8.p + Hue9.p ~ Plant.Name + Tray.Info + Time, data=salt_all, FUN = function(x) c(m = mean(x)))

head(col_sum_p)

col_sum_p_2 <- melt(col_sum_p, id=c("Time","Plant.Name", "Tray.Info"))
head(col_sum_p_2)

And now let’s plot this into a new graph:

d <- ggplot(col_sum_p_2, aes(x= Time, y= value, fill= variable)) + geom_area() 
d <- d + facet_grid(Plant.Name ~ Tray.Info) 
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858")) 
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Percentage of Area")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d

And per-plant graph command would look like:

Col_col_sum_p_2 <- subset(col_sum_p_2, col_sum_p_2$Plant.Name == "Col-0")

d <- ggplot(Col_col_sum_p_2, aes(x= Time, y= value, fill= variable)) + geom_area() 
d <- d + facet_grid(Plant.Name ~ Tray.Info) 
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858")) 
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Percentage of Area")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d

PSI Data Analysis primer