In order to process large phenotyping data set, it is better to streamline this process as much as possible. In order to provide a primer for the less-experienced in R, and those who don’t know where to start, I wrote this small example of data analysis and visualization.
Remember - as the data that you are about to process is rather large - it is much better to document each step of your analysis to make sure it is as reproducible as possible. In the example data set I am using a collection of various mutants in Arabidopsis that were grown under “control” and “salt stress” conditions.
OK - the data that is coming from the PSI machine is usually coming in separate .csv files > in order to make life easier for us we are going to combine all the phenotype containing files into one dataset:
# load the datasets:
morpho_data <- read.csv("Rgb_Morpho_Plant.csv")
head(morpho_data)
In the example morpho_data: - the treatment is indicated in collumn “Tray.Info” - the genotype is indicated in collumns “Plant.Name” - the individual plant identifier is indicated in collumn “Plant.ID” - the collumns from “AREA_PX” to “SOL” represent the morphometric traits derived from the black-and-white image
Next we have data from other measuring stations as well:
# data from fluorescent camera:
fc_data <- read.csv("Fc_Plant.csv")
# data from thermal/infrared camera:
ir_data <- read.csv("Ir_Plant.csv")
# data from RGB camera with color segmentation:
# WARNING - this data used to be not exported correctly - because the names of the HUEs was also containing semicolons - and required some manual manipulations of the table before importing it into R.
col_data <- read.csv("Rgb_Color_Plant.csv")
# data from scales measuring the weight of each pot:
sc_data <- read.csv("ScalesMeasure.csv")
Let’s combine all the datasets (except scales) together:
library(reshape2)
# We will use columns "Tray.Info", "Round.Order", "Plant.ID" to make sure we are matching the right columns between individual datasets.
# In order to avoid unneccessary columns - we are going to first trim the datasets:
col_data <- col_data[,c(1:2,4:9,13:21)]
morpho_data <- morpho_data[,c(4:5,7,13:23)]
fc_data <- fc_data[,c(4:5,7,13:95)]
ir_data <- ir_data[,c(4:5,7,13:17)]
all_data <- merge(col_data, morpho_data, by=c("Tray.ID", "Round.Order", "Plant.ID"))
all_data <- merge(all_data, fc_data, by=c("Tray.ID", "Round.Order", "Plant.ID"))
# Then I have a little problem - because my IR is measured in a round before the full cycle - and the round order numbers are different. So for me to be able to merge two dataset I need to create new column "Round.Order2"
ir_data$Round.Order2 <- ir_data$Round.Order+1
# and then let's replace the original Round.Order by this tricky round order and remove it from the dataset:
ir_data$Round.Order <- ir_data$Round.Order2
ir_data <- ir_data[,1:8]
# and now we can fuse all together:
all_data <- merge(all_data, ir_data, by=c("Tray.ID", "Round.Order", "Plant.ID"))
head(all_data)
# Remove rows containing missing data:
all_data <- na.omit(all_data)
dim(all_data)
## [1] 9772 116
Just because I am super freaky about the reproducibility of my research - I usually can save this file in the working directory as well
write.csv(all_data, "pheno_data.csv", row.names = F)
One of the best examples of dynamic change in the phenotype is growth. Let’s have a look at how all the plants grow in our experiments - for this we will use the most simple plot function in R called just like this - “plot”.
plot(all_data$AREA_MM ~ all_data$Round.Order, type="l")
The above graph cannot be really described as the most informative graph ever. And so we will try to put more layers into this data visualization by using the ggplot2 library.
library(ggplot2)
Area_lgraph <- ggplot(data=all_data, aes(x= Round.Order, y=AREA_MM, group = Plant.ID))
Area_lgraph <- Area_lgraph + geom_line()
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info)
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
Area_lgraph
The above graph is a bit better - I can see how my plants are growing under “control” and “salt stress” conditions. However, it would be nice to see whether the lines that show very little growth are in any way unusual - for example the plant died before the experiment even started.
I can do it by going back to my raw data/pictures and validating if I have the reason to excluse a specific plant from my analysis.
I can add the sample ID information (containing Tray number, plant location and its genotype) to the label and make the graph interactive - so the label shows as soon as I hover over the line with my cursor. This interactive feature is available in “plotly” library, and plotting the graph using ggplotly() function.
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
ggplotly(Area_lgraph)
As we will need another unit of time - which is different from “Round”, to model our data at later time point - we will need to create another variable called “Time”. Let’s look at the unique number of rounds & days to see our measurement period:
length(unique(all_data$Measuring.Date))
## [1] 15
length(unique(all_data$Round.Order))
## [1] 28
In total we have 28 rounds of measurements performed over 15 days. The salt stress was applied on 28th of November 2018, which is the first day of measurements.
But - I see something rather funny going on at round 57 - as the Area of all my plants is decreasing. I just checked in my calendar and it appears that during this day, there was an error of the PSI, and some of my plants were stuck in the dark during the meaeurement. So I would prefer to remove all the datapoints collected after this time point.
all_data <- subset(all_data, all_data$Round.Order < 57)
And now - let’s make the variable representing the time (in hours) as the plants are measured every 12h. But the round order is only represented by the uneven numbers - So each round - starting with round 11 - represents a time after salt stress - but as they have increments of 2 per time point - we need to divide the constitutive “round number” by 2, to get into hours
So: Time (h) = ((Round.Order - 11)/2) * 12
all_data$Time <- ((all_data$Round.Order-11)/2)*12
OK - let’s see how the graph looks like now:
Area_lgraph <- ggplot(data=all_data, aes(x= Time, y=AREA_MM, group = Plant.ID))
Area_lgraph <- Area_lgraph + geom_line()
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info)
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("time (h)") + theme(legend.position='none')
ggplotly(Area_lgraph)
Another thing that I can see from the graph is that there is a slight zig-zag in the lines - this is most possibly due to the movement of the leafs during the day-night rythm. As for the Area I dont want to include the night time changes - I only want to include the day measurements.
As my day-time measurement is taken around 16:00 and the night time around 4:00 I am going to subset my data into day and night data. So round numbers 13, 17, 21, 25, 29, 33, 37, 41, 43, 49, 53, 57, 61, 65 are the measurements taken at night.
day_data <- subset(all_data, all_data$Round.Order %in% c(11,15,19,23,27,31,35,39,43,47,51,55,59,63))
night_data <- subset(all_data, all_data$Round.Order %in% c(13,17,21,25,29,33,37,41,43,49,53,57,61,65))
And let’s look at the graph now:
Area_lgraph <- ggplot(data=day_data, aes(x= Time, y=AREA_MM, group = Plant.ID))
Area_lgraph <- Area_lgraph + geom_line()
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info)
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("time (h)") + theme(legend.position='none')
ggplotly(Area_lgraph)
That looks much better - we can also represent the time as “day” rather than an hour - and so you can create another column called “day” and plot the results per day rather than per h:
day_data$Day <- day_data$Time/24
Area_lgraph <- ggplot(data=day_data, aes(x= Day, y=AREA_MM, group = Plant.ID))
Area_lgraph <- Area_lgraph + geom_line()
Area_lgraph <- Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Area_lgraph <- Area_lgraph + facet_grid(~ Tray.Info)
Area_lgraph <- Area_lgraph + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
ggplotly(Area_lgraph)
OK - so now you have your “dream dataset” that you know is correct, we can now fit some exponential curves to this.
So you want to probably do it for each genotype separately - so let’s start with making a subset for Col-0 (wt) and plot the data “as is” without any fitting of the curves:
Col0_data <- subset(day_data, day_data$Plant.Name == "Col-0")
Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID))
Col_Area_lgraph <- Col_Area_lgraph + geom_line()
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info)
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)
So let’s do some curve fitting now! Let’s do a summary statistics for our genotype and separate it per condition. We will do it using the “doBy” library
library(doBy)
sum_ski <- summaryBy(AREA_MM ~ Tray.Info + Day, data = Col0_data)
sum_ski
# let's split the dataset so it will be easier to modell
Col_C <- subset(sum_ski, sum_ski$Tray.Info == "Control")
Col_S <- subset(sum_ski, sum_ski$Tray.Info == "Salt ")
Col_C
Col_S
In the next step, we will transform the Area with the log() function - to enforce it running linearly, and save the outputs of individal models as an object (this will come in handy in next step):
# remove all the rows with missing values - the lm function is sensitive to so called incomplete cases:
Area_mm2_c <- Col_C$AREA_MM.mean
time_d <- Col_C$Day
model_C <- lm(log(Area_mm2_c)~ time_d)
model_C
##
## Call:
## lm(formula = log(Area_mm2_c) ~ time_d)
##
## Coefficients:
## (Intercept) time_d
## 4.975 0.261
Area_mm2_s <- Col_S$AREA_MM.mean
time_d <- Col_S$Day
model_S <- lm(log(Area_mm2_s)~ time_d)
model_S
##
## Call:
## lm(formula = log(Area_mm2_s) ~ time_d)
##
## Coefficients:
## (Intercept) time_d
## 4.8183 0.2602
So now we have our functions which would basically equate to:
Area = e^(Intercept + DELTA * time)
Where the Intercept is the first value out of the model, and the DELTA is the second value.
Now we have to use those values to see what the Area over time SHOULD be like:
timevalues <- unique(time_d)
Area.C <- exp(predict(model_C,list(Time=timevalues)))
Area.S <- exp(predict(model_S,list(Time=timevalues)))
Area.C
## 1 2 3 4 5 6 7
## 144.7144 187.8748 243.9075 316.6517 411.0914 533.6974 692.8700
## 8 9 10 11 12
## 899.5150 1167.7908 1516.0786 1968.2415 2555.2596
Let’s integrate now all of these values into the table with the actual Area values:
# create a dataframe with 12 rows - because we have 12 unique time points
Open_for_model <-data.frame(Day=numeric(12),Plant.ID=character(12), Plant.Name=character(12), Tray.Info=character(12), AREA_MM=numeric(12))
Open_for_model
head(Open_for_model)
Control <- Open_for_model
Salt <- Open_for_model
unique(time_d)
## [1] 0 1 2 3 4 5 6 7 8 9 10 11
Control$Day <- unique(time_d)
Salt$Day <- unique(time_d)
Control$Plant.ID <- "model"
Salt$Plant.ID <- "model"
Control$Plant.Name <- "model_Col-0"
Salt$Plant.Name <- "model_Col-0"
Control$Tray.Info <- "Control"
Salt$Tray.Info <- "Salt "
Control$AREA_MM <- Area.C
Salt$AREA_MM <- Area.S
model_total <- rbind(Control, Salt)
head(model_total)
head(Col0_data)
Col0_data_relevant <- subset(Col0_data, select = c("Day", "Plant.ID", "Plant.Name", "Tray.Info", "AREA_MM"))
head(Col0_data_relevant)
head(model_total)
Col2 <- rbind(Col0_data_relevant, model_total)
Col2
Now let’s try to plot this:
ggplot(data=Col2, aes(x= Day, y=AREA_MM, group = Plant.ID, color = Plant.Name)) +
geom_line() +
scale_colour_manual(values = c("grey", "blue")) +
ylim(0, 2500) +
facet_grid(~ Tray.Info) +
ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Now the above calculations are only going to give you an insight for the AVERAGE of the population of plants - but let’s say you would like to fit a function per individual plant, and then extract the DELTA and INTERCEPT from the exponential model, and do some statistics with that.
We would have to extract one individual plant and run the following command:
names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()
growth_factors
# i=1
# run for the first sample:
i=1
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni
#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
## [1] 0 1 2 3 4 5 6 7 8 9 10 11
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred
## 1 2 3 4 5 6
## 88.97007 116.40671 152.30429 199.27198 260.72361 341.12572
## 7 8 9 10 11 12
## 446.32229 583.95945 764.04124 999.65678 1307.93158 1711.27237
uni$Area_pred <- Area.pred
done <- uni
done
Then - let’s loop this for all the other individual plants of Col-0 - note that we don’t have to separate per treatment here as that is already included in Plant.ID!
for(i in 2:length(unique(Col0_data$Plant.ID))){
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
uni$Area_pred <- Area.pred
done <- rbind(done, uni)
}
growth_factors
done
Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID))
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot + facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot
library(plotly)
ggplotly(Col_plot)
quartz.save("Col-0_growth_fitted.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
This data looks pretty awesome and I do not feel an urge to remove any outliers. Let’s check other genotypes that are in the dataset. But before we do it - let’s keep the essential numbers “safe”:
# to keep the data safe - we will save this as growth_factors2 and done2
growth_factors2 <- growth_factors
done2 <- done
dim(growth_factors2)
## [1] 17 5
You can repeat it for all the genotypes that you have in your setup by using the below script.
In here an example of how to run it for genotype “iaa-14”
unique(day_data$Plant.Name)
## [1] iaa-14 avp-3 ahp1/2/3 ein3 ein2-1 duf247
## [7] R12a-B nyc2-1 nyc2-2 eir1-1 gai aux1-7
## [13] acs2/4/5/6 ahp2/3/4 ein3/eil1 ahp1-5 eto-1 abi1-1
## [19] etr-1 Col-0
## 20 Levels: abi1-1 acs2/4/5/6 ahp1-5 ahp1/2/3 ahp2/3/4 aux1-7 ... R12a-B
# next one in line is iaa-14
Col0_data <- subset(day_data, day_data$Plant.Name == "iaa-14")
Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID))
Col_Area_lgraph <- Col_Area_lgraph + geom_line()
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info)
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)
names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()
growth_factors
# i=1
# run for the first sample:
i=1
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni
#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
## [1] 0 1 2 3 4 5 6 7 8 9 10 11
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred
## 1 2 3 4 5 6
## 88.87625 117.65016 155.73969 206.16081 272.90589 361.25985
## 7 8 9 10 11 12
## 478.21864 633.04312 837.99242 1109.29457 1468.43147 1943.83985
uni$Area_pred <- Area.pred
done <- uni
done
for(i in 2:length(unique(Col0_data$Plant.ID))){
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
uni$Area_pred <- Area.pred
done <- rbind(done, uni)
}
growth_factors
done
Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID))
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot + facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot
library(plotly)
ggplotly(Col_plot)
quartz.save("iaa-14_growth_fitted.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
growth_factors2 <- rbind(growth_factors2, growth_factors)
done2 <- rbind(done2, done)
OK - how about you would like to run the script on data that does contain outliers. Let’s re-run the part of the script that get’s us to the stage where we have the interactive graph.
I am using the subset of the data for genotype nyc2-1 for which I know that there are some plants that died during the experiment and thus show odd growth curves:
Col0_data <- subset(day_data, day_data$Plant.Name == "nyc2-1")
Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID))
Col_Area_lgraph <- Col_Area_lgraph + geom_line()
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info)
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)
names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()
growth_factors
# i=1
# run for the first sample:
i=1
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni
#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
## [1] 0 1 2 3 4 5 6 7 8 9 10 11
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred
## 1 2 3 4 5 6 7
## 79.6454 107.6941 145.6207 196.9040 266.2476 360.0120 486.7974
## 8 9 10 11 12
## 658.2327 890.0425 1203.4886 1627.3210 2200.4144
uni$Area_pred <- Area.pred
done <- uni
done
for(i in 2:length(unique(Col0_data$Plant.ID))){
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
uni$Area_pred <- Area.pred
done <- rbind(done, uni)
}
growth_factors
done
Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID))
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot + facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot
library(plotly)
ggplotly(Col_plot)
# to save run the un-hashed command below:
#quartz.save("nyc2-1_growth_fitted.pdf", type= "pdf", dpi=200)
In the above graph we have a sample that I would classify as an outlier - “PS_Tray_001_Salt_D1_nyc2-1”.
We can remove it from the “done” object and re-plot the data - saving it under a different name:
pls_remove <- c("PS_Tray_001_Salt _D1_nyc2-1")
dim(done)
## [1] 196 119
done <- subset(done, !(done$Plant.ID %in% pls_remove))
growth_factors <- subset(growth_factors, !(growth_factors$Plant.ID %in% pls_remove))
dim(done)
## [1] 192 119
Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID))
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot + facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot
library(plotly)
ggplotly(Col_plot)
#quartz.save("nyc2-1_growth_fitted_no_outl.pdf", type= "pdf", dpi=200)
growth_factors2 <- rbind(growth_factors2, growth_factors)
done2 <- rbind(done2, done)
If you have more than one plant that you would like to remove from your curated dataset, please see below on how to remove multiple lines per one time:
Col0_data <- subset(day_data, day_data$Plant.Name == "gai")
Col_Area_lgraph <- ggplot(data=Col0_data, aes(x= Day, y=AREA_MM, group = Plant.ID))
Col_Area_lgraph <- Col_Area_lgraph + geom_line()
Col_Area_lgraph <- Col_Area_lgraph + scale_colour_manual(values = c("grey", "blue"))
#Area_lgraph <- Area_lgraph + ylim (0, 2000)
# geom_smooth(method="lm", aes(color="Exp Model"), formula= (y ~ exp(x)), se=FALSE, linetype = 1) +
Col_Area_lgraph <- Col_Area_lgraph + facet_grid(~ Tray.Info)
Col_Area_lgraph <- Col_Area_lgraph + ylab("Area (mm2)") + xlab("Round") + theme(legend.position='none')
ggplotly(Col_Area_lgraph)
names <- c(text="Plant.ID", "Plant.Name", "Tray.Info", "intercept", "delta")
growth_factors <- data.frame()
for (k in names) growth_factors[[k]] <- as.character()
growth_factors
# i=1
# run for the first sample:
i=1
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
uni
#plot(uni$Area ~ uni$time.day.)
# let's get also all the individual identifiers in the table:
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
## [1] 0 1 2 3 4 5 6 7 8 9 10 11
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
Area.pred
## 1 2 3 4 5 6 7
## 33.16733 45.01368 61.09119 82.91108 112.52437 152.71461 207.25957
## 8 9 10 11 12
## 281.28632 381.75314 518.10362 703.15430 954.29939
uni$Area_pred <- Area.pred
done <- uni
done
for(i in 2:length(unique(Col0_data$Plant.ID))){
uni <- subset(Col0_data, Col0_data$Plant.ID == unique(Col0_data$Plant.ID)[i])
growth_factors[i,1] <- as.character(unique(uni$Plant.ID))
growth_factors[i,2] <- as.character(unique(uni$Plant.Name))
growth_factors[i,3] <- as.character(unique(uni$Tray.Info))
# let's calculate the model:
Area_mm2 <- uni$AREA_MM
time_d <- uni$Day
model_C <- lm(log(Area_mm2)~ time_d)
# add model parts into the main table with growth factors
growth_factors[i,4] <- as.numeric(model_C$coefficients[[1]])
growth_factors[i,5] <- as.numeric(model_C$coefficients[[2]])
# calculate predicted Area values for this specific sample:
timevalues <- unique(time_d)
timevalues
Area.pred <- exp(predict(model_C,list(Time=timevalues)))
uni$Area_pred <- Area.pred
done <- rbind(done, uni)
}
## Warning in predict.lm(model_C, list(Time = timevalues)): prediction from a
## rank-deficient fit may be misleading
## Warning in predict.lm(model_C, list(Time = timevalues)): prediction from a
## rank-deficient fit may be misleading
growth_factors
done
Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID))
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot + facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot
library(plotly)
ggplotly(Col_plot)
quartz.save("gai_growth_fitted.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
#growth_factors2 <- rbind(growth_factors2, growth_factors)
#done2 <- rbind(done2, done)
Again - I spotted some plants that did not grow at all:
pls_remove <- c("PS_Tray_053_Control_C1_gai", "PS_Tray_128_Control_B4_gai", "PS_Tray_110_Control_C3_gai", "PS_Tray_108_Salt _D1_gai", "PS_Tray_097_Salt _D3_gai")
dim(done)
## [1] 128 119
done <- subset(done, !(done$Plant.ID %in% pls_remove))
growth_factors <- subset(growth_factors, !(growth_factors$Plant.ID %in% pls_remove))
dim(done)
## [1] 93 119
Col_plot <- ggplot(data=done, aes(x= Day, y=Area_pred, group = Plant.ID))
Col_plot <- Col_plot + geom_line() + scale_colour_manual(values = c("grey", "blue"))
Col_plot <- Col_plot + facet_grid(~ Tray.Info) + ylab("Area (mm2)") + xlab("time (days)") + theme(legend.position='none')
Col_plot <- Col_plot + ylim(0, 4000)
Col_plot
library(plotly)
ggplotly(Col_plot)
quartz.save("gai_growth_fitted_no_outl.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
growth_factors2 <- rbind(growth_factors2, growth_factors)
done2 <- rbind(done2, done)
Once you have all your growth rates calculated, you can then do some analysis and examine the differences between your genotypes / conditions.
But first - let’s write the results of the analysis:
unique(growth_factors2$Plant.Name)
## [1] "Col-0" "iaa-14" "nyc2-1" "gai"
# Save the growth factors
#write.csv(growth_factors2, "RGR_PSI_experiment.csv", row.names = F)
# if you didn't re-run the above analysis for all the different mutants that I have in my dataset, it is probably for the best if you load the growth rate data from my (already curated) file:
growth_factors2 <- read.csv("RGR_PSI_experiment_Magda.csv")
Now we can do some plotting. As the box-plots and bar-plots are not the best way of representing the entity of your data, we will use the dot-plots and plot the growth rate of all the mutants:
library(ggbeeswarm)
library(gapminder)
library(RColorBrewer)
library(ggridges)
##
## Attaching package: 'ggridges'
## The following object is masked from 'package:ggplot2':
##
## scale_discrete_manual
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
##
## ggsave
head(growth_factors2)
growth_factors2$intercept <- as.numeric(growth_factors2$intercept)
growth_factors2$delta <- as.numeric(growth_factors2$delta)
my_box_plot <- ggplot(data = growth_factors2, mapping = aes(x = Plant.Name, y = delta, colour = Plant.Name))
#my_box_plot <- my_box_plot + geom_boxplot()
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot
## Warning: Removed 2 rows containing non-finite values (stat_summary).
## Warning: Removed 2 rows containing missing values (position_beeswarm).
quartz.save("all_mutants_RGR_Delta.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
Now let’s to it for the other component of the growth rate - the intercept that is also in the exponent of our growth function:
my_box_plot <- ggplot(data = growth_factors2, mapping = aes(x = Plant.Name, y = intercept, colour = Plant.Name))
#my_box_plot <- my_box_plot + geom_boxplot()
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot
quartz.save("all_mutants_RGR_Intercept.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
OK - as you probably notice - this graph is way too dense with the different mutants.
Thus we need to divide it into different groups. In my case I will segregate them based on hormone:
auxin <- c("Col-0", "iaa-14", "aux1-7")
ethylene <- c("Col-0", "ein-3", "ein2-1", "eir1-1", "acs2/4/5/6", "ein3/eil1", "eto-1", "etr-1")
cytokinin <- c("Col-0", "ahp1/2/3", "ahp2/3/4", "ahp1-5")
aba <- c("Col-0", "abi1-1")
gibberelin <- c("Col-0", "gai")
salt <- c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2")
# make separate datasets for individual hormones:
aux_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% auxin)
eth_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% ethylene)
cyto_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% cytokinin)
aba_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% aba)
gib_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% gibberelin)
salt_growth <- subset(growth_factors2, growth_factors2$Plant.Name %in% salt)
# This part will help to order the mutants as we want to plot them (wt always first):
aux_growth$Plant.Name <- factor(aux_growth$Plant.Name,levels = c("Col-0", "iaa-14", "aux1-7"))
eth_growth$Plant.Name <- factor(eth_growth$Plant.Name,levels = c("Col-0", "ein-3", "ein2-1", "eir1-1", "acs2/4/5/6", "ein3/eil1", "eto-1", "etr-1"))
cyto_growth$Plant.Name <- factor(cyto_growth$Plant.Name,levels = c("Col-0", "ahp1/2/3", "ahp2/3/4", "ahp1-5"))
aba_growth$Plant.Name <- factor(aba_growth$Plant.Name,levels = c("Col-0", "abi1-1"))
gib_growth$Plant.Name <- factor(gib_growth$Plant.Name,levels = c("Col-0", "gai"))
salt_growth$Plant.Name <- factor(salt_growth$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))
Then we can plot the graphs out of the individual data sets:
my_box_plot <- ggplot(data = aux_growth, mapping = aes(x = Plant.Name, y = delta, colour = Plant.Name))
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot
quartz.save("auxin_mutants_RGR_delta.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
And now - we can also automatically add p-values
library(ggpubr)
## Loading required package: magrittr
##
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
##
## get_legend
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0")
my_box_plot
Just so we have consistent colours for different mutants across different graphs:
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "magenta", "mediumpurple"))
my_box_plot
# Let's save it:
quartz.save("auxin_mutants_RGR_delta.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
Then - let’s combine all of the above into one for the graph for Intercept:
my_box_plot <- ggplot(data = aux_growth, mapping = aes(x = Plant.Name, y = intercept, colour = Plant.Name))
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0")
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "magenta", "mediumpurple"))
my_box_plot
quartz.save("auxin_mutants_RGR_Intercept.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
Then - let’s do it for other hormone groups - ethylene:
my_box_plot <- ggplot(data = eth_growth, mapping = aes(x = Plant.Name, y = delta, colour = Plant.Name))
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0")
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "dodgerblue", "cadetblue", "aquamarine", "darkturquoise", "darkslategray4", "darkslategray2", "cyan4"))
my_box_plot
quartz.save("ethylene_mutants_RGR_Delta.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
my_box_plot <- ggplot(data = eth_growth, mapping = aes(x = Plant.Name, y = intercept, colour = Plant.Name))
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_wrap(~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0")
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "dodgerblue", "cadetblue", "aquamarine", "darkturquoise", "darkslategray4", "darkslategray2", "cyan4"))
my_box_plot
quartz.save("ethylene_mutants_RGR_Intercept.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
Let’s have a look at the traits other than Area - for example Slenderness of Leaves (SOL) or other photosynthesis traits.
First what we need to do - is to select the genotypes that are specific to your group of interest - let’s say salt stress related genes
But… we also want to use ONLY the plants that we selected based on the growth rate. So we can use the Plant.ID from “salt_growth” file, which contains only the curated plants!
day <- day_data
keepers <- unique(salt_growth$Plant.ID)
salt_all <- subset(day, day$Plant.ID %in% keepers)
salt_all
# This part will help to order the mutants as we want them:
salt_all$Plant.Name <- factor(salt_all$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))
my_box_plot <- ggplot(data = salt_all, mapping = aes(x = Plant.Name, y = SOL, colour = Plant.Name))
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_grid(Tray.Info ~ Time)
# facet_wrap(Day ~ Tray.Info)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0")
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "goldenrod4", "goldenrod", "gold", "lightgoldenrod", "khaki"))
my_box_plot
quartz.save("salt_mutants_SOL.pdf", type= "pdf", dpi=200)
## quartz_off_screen
## 2
This is going to be much to much phenotypes to plot into a simple notebook - So let’s make a list of interesting phenotypes and loop them to produce some graphs:
head(day)
colnames(day)
## [1] "Tray.ID" "Round.Order" "Plant.ID" "Measuring.Date"
## [5] "Measuring.Time" "Tray.Info" "Position" "Plant.Name"
## [9] "Hue1" "Hue2" "Hue3" "Hue4"
## [13] "Hue5" "Hue6" "Hue7" "Hue8"
## [17] "Hue9" "AREA_PX" "AREA_MM" "PERIMETER_PX"
## [21] "PERIMETER_MM" "ROUNDNESS" "ROUNDNESS2" "ISOTROPY"
## [25] "COMPACTNESS" "ECCENTRICITY" "RMS" "SOL"
## [29] "Size" "Fo" "Fm" "Fv"
## [33] "QY_max" "Fm_Lss1" "Fm_Lss2" "Fm_Lss3"
## [37] "Fm_Lss4" "Fm_Lss5" "Fm_Lss6" "Ft_Lss1"
## [41] "Ft_Lss2" "Ft_Lss3" "Ft_Lss4" "Ft_Lss5"
## [45] "Ft_Lss6" "Fo_Lss1" "Fo_Lss2" "Fo_Lss3"
## [49] "Fo_Lss4" "Fo_Lss5" "Fo_Lss6" "Fv_Lss1"
## [53] "Fv_Lss2" "Fv_Lss3" "Fv_Lss4" "Fv_Lss5"
## [57] "Fv_Lss6" "Fq_Lss1" "Fq_Lss2" "Fq_Lss3"
## [61] "Fq_Lss4" "Fq_Lss5" "Fq_Lss6" "Fv.Fm_Lss1"
## [65] "Fv.Fm_Lss2" "Fv.Fm_Lss3" "Fv.Fm_Lss4" "Fv.Fm_Lss5"
## [69] "Fv.Fm_Lss6" "QY_Lss1" "QY_Lss2" "QY_Lss3"
## [73] "QY_Lss4" "QY_Lss5" "QY_Lss6" "NPQ_Lss1"
## [77] "NPQ_Lss2" "NPQ_Lss3" "NPQ_Lss4" "NPQ_Lss5"
## [81] "NPQ_Lss6" "qN_Lss1" "qN_Lss2" "qN_Lss3"
## [85] "qN_Lss4" "qN_Lss5" "qN_Lss6" "qP_Lss1"
## [89] "qP_Lss2" "qP_Lss3" "qP_Lss4" "qP_Lss5"
## [93] "qP_Lss6" "qL_Lss1" "qL_Lss2" "qL_Lss3"
## [97] "qL_Lss4" "qL_Lss5" "qL_Lss6" "PAR1"
## [101] "PAR2" "PAR3" "PAR4" "PAR5"
## [105] "PAR6" "ETR1" "ETR2" "ETR3"
## [109] "ETR4" "ETR5" "ETR6" "Temp.avg"
## [113] "Temp.stddev" "Temp.median" "Temp.min" "Temp.max"
## [117] "Time" "Day"
Let’s focus only on the highest light intensity phenotypes - if we do not see any difference between treatments / genotypes in these traits, there is a little changce we will see something in lower light intensities (Lss < 6)
yes_pheno <- c("Plant.Name", "Time", "Tray.Info", "Plant.ID", "AREA_MM", "PERIMETER_MM", "ROUNDNESS", "ISOTROPY", "COMPACTNESS", "ECCENTRICITY", "RMS", "SOL", "Fo", "Fm", "Fv", "QY_max", "Fm_Lss6", "Ft_Lss6", "Fo_Lss6", "Fv_Lss6", "Fq_Lss6", "Fv.Fm_Lss6", "QY_Lss6", "NPQ_Lss6", "qN_Lss6", "qP_Lss6", "qL_Lss6", "Temp.avg", "Temp.median")
day_yes <- select(day, yes_pheno)
keepers <- unique(salt_growth$Plant.ID)
salt_all <- subset(day_yes, day_yes$Plant.ID %in% keepers)
time_vals <- c(0,24,48,72,96,120,144,168,192,216,240,263)
time_vals <- as.numeric(time_vals)
# I am going to split the graphs per individual timepoints - otherwise the graphs will be way too chaotic
for(g in 1:9){
now <- subset(salt_all, salt_all$Time == time_vals[g])
# And then I am looping the graphs for individual traits:
for(p in 5:29){
target <- now[,c(1:4,p)]
real_pheno <- colnames(target)[5]
colnames(target)[5] <- "pheno"
# This part will help to order the mutants as we want them:
target$Plant.Name <- factor(target$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))
pdf(paste("salt_mutants_",real_pheno,"_", time_vals[g],"h.pdf", sep=""))
my_box_plot <- ggplot(data = target, mapping = aes(x = Plant.Name, y = pheno, colour = Plant.Name))
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_grid(Tray.Info~.)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + ylab(real_pheno)
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0")
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "goldenrod4", "goldenrod", "gold", "lightgoldenrod", "khaki"))
print(my_box_plot)
dev.off()
}
}
ok - let’s do the same mega-loop for all the other mutant groups:
e.g. Auxin
yes_pheno <- c("Plant.Name", "Time", "Tray.Info", "Plant.ID", "AREA_MM", "PERIMETER_MM", "ROUNDNESS", "ISOTROPY", "COMPACTNESS", "ECCENTRICITY", "RMS", "SOL", "Fo", "Fm", "Fv", "QY_max", "Fm_Lss6", "Ft_Lss6", "Fo_Lss6", "Fv_Lss6", "Fq_Lss6", "Fv.Fm_Lss6", "QY_Lss6", "NPQ_Lss6", "qN_Lss6", "qP_Lss6", "qL_Lss6", "Temp.avg", "Temp.median")
day_yes <- select(day, yes_pheno)
keepers <- unique(aux_growth$Plant.ID)
salt_all <- subset(day_yes, day_yes$Plant.ID %in% keepers)
time_vals <- c(0,24,48,72,96,120,144,168,192,216,240,263)
time_vals <- as.numeric(time_vals)
for(g in 1:9){
now <- subset(salt_all, salt_all$Time == time_vals[g])
colnames(now)
for(p in 5:29){
target <- now[,c(1:4,p)]
real_pheno <- colnames(target)[5]
colnames(target)[5] <- "pheno"
# This part will help to order the mutants as we want them:
target$Plant.Name <- factor(target$Plant.Name,levels = c("Col-0", "iaa-14", "aux1-7"))
pdf(paste("auxin_mutants_",real_pheno,"_", time_vals[g],"h.pdf", sep=""))
my_box_plot <- ggplot(data = target, mapping = aes(x = Plant.Name, y = pheno, colour = Plant.Name))
my_box_plot <- my_box_plot + geom_beeswarm(alpha=0.6, priority = "density")
my_box_plot <- my_box_plot + facet_grid(Tray.Info~.)
my_box_plot <- my_box_plot + stat_summary(fun.y=mean, geom="point", shape=95, size=10, color="black", fill="black")
my_box_plot <- my_box_plot + theme(legend.position = "none")
my_box_plot <- my_box_plot + xlab("")
my_box_plot <- my_box_plot + ylab(real_pheno)
my_box_plot <- my_box_plot + theme(axis.text.x = element_text(angle=70, vjust=0.5))
my_box_plot <- my_box_plot + stat_compare_means(label = "p.signif", method = "t.test", ref.group = "Col-0")
my_box_plot <- my_box_plot + scale_color_manual(values=c("royalblue", "magenta", "mediumpurple"))
print(my_box_plot)
dev.off()
}
}
Let’s have a look at the greenness traits and how to analyze them.
First what we need to do - is to select the genotypes that are specific to your group of interest - let’s say salt stress related genes
But… we also want to use ONLY the plants that we selected based on the growth rate. So we can use the Plant.ID from “salt_growth” file, which contains only the curated plants!
day <- day_data
keepers <- unique(salt_growth$Plant.ID)
salt_all <- subset(day, day$Plant.ID %in% keepers)
salt_all
# This part will help to order the mutants as we want them:
salt_all$Plant.Name <- factor(salt_all$Plant.Name,levels = c("Col-0", "avp-3", "duf247", "R12a-B", "nyc2-1", "nyc2-2"))
Then - we have to retrieve to summary statistics, because otherwise we are not able to present all the hues in one graph:
library("doBy")
col_sum <- summaryBy(Hue1 + Hue2 + Hue3 + Hue4 + Hue5 + Hue6 + Hue7 + Hue8 + Hue9 ~ Plant.Name + Tray.Info + Time, data=salt_all, FUN = function(x) c(m = mean(x)))
head(col_sum)
Before we can make a plot, we actually have to re-shape this table into a table containing all the hues in one column.
col_sum_2 <- melt(col_sum, id=c("Time","Plant.Name", "Tray.Info"))
head(col_sum_2)
d <- ggplot(col_sum_2, aes(x= Time, y= value, fill= variable)) + geom_area()
d <- d + facet_grid(Plant.Name ~ Tray.Info)
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858"))
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Area(pixels)")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d
You can also choose to plot the graphs per individual genotype, for example:
Col_col_sum <- subset(col_sum_2, col_sum_2$Plant.Name == "Col-0")
d <- ggplot(Col_col_sum, aes(x= Time, y= value, fill= variable)) + geom_area()
d <- d + facet_grid(Plant.Name ~ Tray.Info)
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858"))
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Area(pixels)")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d
However, the above graph does not really give you an idea of the relative abundance of greenness hues. You can also do this by calculating the hues as percentage of total pixels:
head(salt_all)
salt_all$Hue1.p <- salt_all$Hue1/salt_all$AREA_PX
salt_all$Hue2.p <- salt_all$Hue2/salt_all$AREA_PX
salt_all$Hue3.p <- salt_all$Hue3/salt_all$AREA_PX
salt_all$Hue4.p <- salt_all$Hue4/salt_all$AREA_PX
salt_all$Hue5.p <- salt_all$Hue5/salt_all$AREA_PX
salt_all$Hue6.p <- salt_all$Hue6/salt_all$AREA_PX
salt_all$Hue7.p <- salt_all$Hue7/salt_all$AREA_PX
salt_all$Hue8.p <- salt_all$Hue8/salt_all$AREA_PX
salt_all$Hue9.p <- salt_all$Hue9/salt_all$AREA_PX
col_sum_p <- summaryBy(Hue1.p + Hue2.p + Hue3.p + Hue4.p + Hue5.p + Hue6.p + Hue7.p + Hue8.p + Hue9.p ~ Plant.Name + Tray.Info + Time, data=salt_all, FUN = function(x) c(m = mean(x)))
head(col_sum_p)
col_sum_p_2 <- melt(col_sum_p, id=c("Time","Plant.Name", "Tray.Info"))
head(col_sum_p_2)
And now let’s plot this into a new graph:
d <- ggplot(col_sum_p_2, aes(x= Time, y= value, fill= variable)) + geom_area()
d <- d + facet_grid(Plant.Name ~ Tray.Info)
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858"))
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Percentage of Area")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d
And per-plant graph command would look like:
Col_col_sum_p_2 <- subset(col_sum_p_2, col_sum_p_2$Plant.Name == "Col-0")
d <- ggplot(Col_col_sum_p_2, aes(x= Time, y= value, fill= variable)) + geom_area()
d <- d + facet_grid(Plant.Name ~ Tray.Info)
# Let's replace the colors by actual colors of the hues:
d <- d + scale_fill_manual(values=c("#6E6F5A", "#343822", "#455413","#455536","#577146", "#597120","#728458","#738636", "#909858"))
d <- d + xlab("hours after salt treatment")
d <- d + ylab("Percentage of Area")
d <- d + labs(title = "Change in Greenness Hues")
d <- d + guides(fill=FALSE)
d