Relative growth rate
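#load the packages and the files
library(ggplot2)
library(ggpubr) #provides ggboxplot()
#stringsAsFactors = TRUE so that Group is read as a factor (no longer the default since R 4.0)
RGR <- read.csv("~/Desktop/Data/RGR.csv", stringsAsFactors = TRUE)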
View(RGR)
#plot a boxplot to visualise the data
#the boxplot shows that the ssu2 group has a much lower relative growth rate than the other two groups
summary(RGR$Group)
ssu1 ssu2   wt
  12   12   12
levels(RGR$Group)
[1] "ssu1" "ssu2" "wt"
#geom_jitter() already draws every observation, so no separate geom_point() layer is added (it would plot each point twice)
p <- ggboxplot(RGR, x = "Group", y = "RGR", color = "Group",
               palette = c("#00AFBB", "#E7B800", "#FC4E07")) +
  labs(x = "Groups of plants", y = "relative growth rate (g g^-1 d^-1)") +
  geom_jitter(shape = 19, position = position_jitter(0.2)) +
  theme(legend.position = "none")
p
#use ANOVA for comparison among the three groups
model <- aov(RGR~Group, data = RGR)
#check the assumptions of the fitted model
plot(model)
#conclusion: ANOVA assumptions are met: the residuals are adequately normally distributed and the variances are homogeneous, although data points 30 and 33 appear to be outliers
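The visual checks from plot(model) can be complemented with formal tests. The sketch below is only an illustration: the data frame `sim`, its group means and its spread are simulated assumptions, not values from RGR.csv. It applies shapiro.test() to the model residuals and bartlett.test() to the groups; large p-values would be consistent with the normality and equal-variance assumptions.

```r
# Hypothetical data: three groups of 12 plants, simulated (NOT the real RGR data)
set.seed(1)
sim <- data.frame(
  Group = factor(rep(c("ssu1", "ssu2", "wt"), each = 12)),
  RGR   = c(rnorm(12, mean = 0.10, sd = 0.008),
            rnorm(12, mean = 0.06, sd = 0.008),
            rnorm(12, mean = 0.10, sd = 0.008))
)

sim_model <- aov(RGR ~ Group, data = sim)

# Normality of the residuals (Shapiro-Wilk test)
normality <- shapiro.test(residuals(sim_model))

# Homogeneity of variances across the groups (Bartlett test)
homogeneity <- bartlett.test(RGR ~ Group, data = sim)

normality$p.value
homogeneity$p.value
```

With the real data, `sim` and `sim_model` would be replaced by RGR and model.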
summary(model)
            Df   Sum Sq  Mean Sq F value Pr(>F)
Group        2 0.015775 0.007888   142.1 <2e-16 ***
Residuals   33 0.001832 0.000056
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#post-hoc test for ANOVA: Tukey Honest Significant Differences test
TukeyHSD(model)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = RGR ~ Group, data = RGR)
$Group
                  diff          lwr          upr     p adj
ssu2-ssu1 -0.043416667 -0.050879761 -0.035953572 0.0000000
wt-ssu1    0.001916667 -0.005546428  0.009379761 0.8046724
wt-ssu2    0.045333333  0.037870239  0.052796428 0.0000000
plot(TukeyHSD(model))
Conclusion for RGR: ANOVA assumptions are met: the residuals are adequately normally distributed and the variances are homogeneous, although data points 30 and 33 appear to be outliers. There is a highly significant difference in relative growth rate among the three groups (ANOVA, F(2,33) = 142.1, p < 0.001). More specifically, the relative growth rate of the ssu2 group is significantly lower than that of the ssu1 group and the wt group (post-hoc TukeyHSD, padj (ssu2-ssu1) < 0.001, padj (wt-ssu2) < 0.001). The difference between the wild type group and the ssu1 group is not significant (padj = 0.805).
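The reported F statistic can be reproduced from the sums of squares in the ANOVA table, and the same numbers give an effect size. The sketch below uses only values printed by summary(model); eta squared (the share of total variation explained by Group) is an addition for illustration and was not part of the original analysis.

```r
# Values copied from the printed ANOVA table for RGR
ss_group <- 0.015775
ss_resid <- 0.001832
df_group <- 2
df_resid <- 33

# F = (between-group mean square) / (residual mean square)
f_value <- (ss_group / df_group) / (ss_resid / df_resid)

# p-value from the upper tail of the F distribution
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)

# Eta squared: proportion of the total sum of squares explained by Group
eta_sq <- ss_group / (ss_group + ss_resid)

round(c(F = f_value, eta_sq = eta_sq), 3)
```

On these figures Group accounts for roughly 90% of the variation in relative growth rate.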
Sucrose concentration
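#stringsAsFactors = TRUE so that Group is read as a factor (no longer the default since R 4.0)
sucrose <- read.csv("~/Desktop/Data/Sucrose.csv", stringsAsFactors = TRUE)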
View(sucrose)
#plot a boxplot to visualise the data
#the boxplot shows that the wt group has a higher sucrose concentration than the other two groups
summary(sucrose$Group)
ssu1 ssu2   wt
  11   11   11
levels(sucrose$Group)
[1] "ssu1" "ssu2" "wt"
#geom_jitter() already draws every observation, so no separate geom_point() layer is added (it would plot each point twice)
p <- ggboxplot(sucrose, x = "Group", y = "Sucrose", color = "Group",
               palette = c("#00AFBB", "#E7B800", "#FC4E07")) +
  labs(x = "Groups of plants", y = "sucrose concentration (umol g^-1)") +
  geom_jitter(shape = 19, position = position_jitter(0.2)) +
  theme(legend.position = "none")
p
#use ANOVA for comparison among the three groups
model1 <- aov(Sucrose~Group, data = sucrose)
plot(model1)
#conclusion: ANOVA assumptions are generally met: the residuals are adequately normally distributed according to the normal Q-Q plot and the variances are in general homogeneous, though data points 2, 3 and 4 appear to be outliers
summary(model1)
            Df Sum Sq Mean Sq F value   Pr(>F)
Group        2  41.18  20.590   19.56 3.65e-06 ***
Residuals   30  31.57   1.052
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#post-hoc test for ANOVA: Tukey Honest Significant Differences test
TukeyHSD(model1)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Sucrose ~ Group, data = sucrose)
$Group
                diff        lwr       upr     p adj
ssu2-ssu1 -0.9018182 -1.9802375 0.1766012 0.1151764
wt-ssu1    1.7863636  0.7079443 2.8647830 0.0008618
wt-ssu2    2.6881818  1.6097625 3.7666012 0.0000027
plot(TukeyHSD(model1))
Report for sucrose concentration: ANOVA assumptions are generally met. The residuals are adequately normally distributed according to the normal Q-Q plot and the variances are in general homogeneous, though data points 2, 3 and 4 appear to be outliers. There is a significant difference in sucrose concentration among the three groups (ANOVA, F(2,30) = 19.56, p < 0.001). More specifically, the sucrose concentration of the wild type (wt) group is significantly higher than that of the ssu1 group and the ssu2 group (TukeyHSD, padj (wt-ssu1) < 0.001, padj (wt-ssu2) < 0.001). The difference in sucrose concentration between ssu1 and ssu2 is not significant (padj = 0.115).
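The Tukey confidence limits in the table can be checked by hand. With equal group sizes, each interval is the difference in means plus or minus q * sqrt(MSE / n), where q is the critical value of the studentized range. The sketch below uses only numbers printed above (residual sum of squares and df, 11 plants per group, and the ssu2-ssu1 difference); it is a check on the arithmetic, not a replacement for TukeyHSD().

```r
# Values copied from the printed output for sucrose
mse         <- 31.57 / 30   # residual mean square (Sum Sq / Df)
n_per_group <- 11           # plants per group
n_groups    <- 3
df_resid    <- 30

# Critical value of the studentized range at the 95% family-wise level
q_crit <- qtukey(0.95, nmeans = n_groups, df = df_resid)

# Half-width shared by all three pairwise intervals (balanced design)
half_width <- q_crit * sqrt(mse / n_per_group)

# Rebuild the interval for ssu2-ssu1 from its reported difference
diff_ssu2_ssu1 <- -0.9018182
c(lwr = diff_ssu2_ssu1 - half_width, upr = diff_ssu2_ssu1 + half_width)
```

The ssu2-ssu1 interval spans zero, which is why that comparison is not significant, whereas both intervals involving wt lie entirely above zero.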
Analysis of A-Ci data
AC <- read.csv("~/Desktop/Data/AciDataWT.csv")
View(AC)
#plot a scatterplot
plot(Ci ~ A, data = AC)
#assume linearity
#a function for plotting linear regression result
ggplotRegression <- function(fit) {
  require(ggplot2)
  x_name <- names(fit$model)[2] #predictor
  y_name <- names(fit$model)[1] #response
  #.data[[...]] replaces the deprecated aes_string()
  ggplot(fit$model, aes(x = .data[[x_name]], y = .data[[y_name]])) +
    geom_point() +
    stat_smooth(method = "lm", col = "red") +
    labs(x = x_name, y = y_name,
         title = paste("Adj R2 =", signif(summary(fit)$adj.r.squared, 5),
                       " Intercept =", signif(fit$coef[[1]], 5),
                       " Slope =", signif(fit$coef[[2]], 5),
                       " P =", signif(summary(fit)$coef[2, 4], 5)))
}
#the model is Ci ~ A, so A is on the x-axis and Ci is on the y-axis
ggplotRegression(lm(Ci ~ A, data = AC)) + xlab("photosynthetic activity (umol CO2 m^-2 s^-1)") + ylab("CO2 concentration (ppm)")
#linear regression analysis
AC_lm <- lm(Ci ~ A, data = AC)
summary(AC_lm)
Call:
lm(formula = Ci ~ A, data = AC)
Residuals:
    Min      1Q  Median      3Q     Max
-167.44  -98.87  -29.79   71.62  315.37
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -26.19      92.71  -0.283  0.78471
A              29.79       7.19   4.143  0.00324 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 151.8 on 8 degrees of freedom
Multiple R-squared: 0.6821, Adjusted R-squared: 0.6423
F-statistic: 17.16 on 1 and 8 DF, p-value: 0.00324
#assess the fit of the linear model by looking at the R-squared value
Conclusion: At the right end of the graph, the best-fit line appears to underestimate the higher Ci values. The adjusted R-squared value shows that the linear model accounts for only 64% of the variation. Therefore, fitting a straight line to the full data set is probably not justified.
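The 64% figure corresponds to the adjusted R-squared, which penalises the multiple R-squared for the number of predictors. The sketch below recomputes it from the printed summary; the sample size of 10 is inferred from the 8 residual degrees of freedom plus the 2 estimated coefficients.

```r
# Values copied from summary(AC_lm)
r_squared <- 0.6821
n_obs     <- 10   # 8 residual df + 2 coefficients (intercept and slope)
n_pred    <- 1    # one predictor in the model

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_pred - 1)

round(adj_r_squared, 3)
```

With so few observations the adjustment is noticeable: about 68% of the variation is explained before the penalty and about 64% after it.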
Concentrating on the lower Ci range
low_AC <- read.csv("~/Desktop/Data/AciDataWTTruncate.csv")
View(low_AC)
#plot a scatterplot
plot(A ~ Ci, data = low_AC) #A is the response and Ci the predictor, matching the model fitted below
#assume linearity
#the ggplotRegression() function defined in the previous section is reused here
ggplotRegression(lm(A ~ Ci, data = low_AC)) + xlab("CO2 concentration (ppm)") + ylab("photosynthetic activity (umol CO2 m^-2 s^-1)")
#linear regression analysis
AC_lm1 <- lm(A ~ Ci, data = low_AC)
summary(AC_lm1)
Call:
lm(formula = A ~ Ci, data = low_AC)
Residuals:
       1        2        3        4        5        6        7
-0.25151  0.05768  0.11436  0.36188 -0.08775 -0.33278  0.13812
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.33181    0.22588  -14.75 2.59e-05 ***
Ci           0.06857    0.00124   55.28 3.66e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2639 on 5 degrees of freedom
Multiple R-squared: 0.9984, Adjusted R-squared: 0.998
F-statistic: 3056 on 1 and 5 DF, p-value: 3.664e-08
#check the model assumptions with the diagnostic plots
plot(AC_lm1)
#the residuals are in general normally distributed, though point 10 appears to be an outlier and lies outside the Cook's distance contour
#make prediction
prediction <- predict(AC_lm1)
plot(A ~ Ci, data = low_AC)
lines(low_AC$Ci,prediction)
Conclusion: The points on the Q-Q plot do not form an "S" shape, so the residuals are approximately normally distributed. The R-squared value (R^2 = 0.998) indicates that the linearity assumption is met. CO2 concentration is strongly positively correlated with photosynthetic activity (linear regression, R^2 = 0.998, p < 0.001). Photosynthetic activity (A) can be predicted from CO2 concentration (Ci) via A = -3.33 + 0.0686 Ci (R^2 = 0.998). CO2 concentration is a significant predictor of photosynthetic activity (F(1,5) = 3056, p < 0.001).
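The fitted equation can be used directly for prediction. The sketch below wraps the two printed coefficients in a small helper; the name `predict_A` and the example Ci values are illustrative assumptions, and predictions are only meaningful inside the low-Ci range that was fitted. It also solves the equation for the Ci at which predicted A is zero, which is simply the x-intercept of the line.

```r
# Coefficients copied from summary(AC_lm1)
intercept <- -3.33181
slope     <- 0.06857

# Hypothetical helper: predicted photosynthetic activity at a given Ci (ppm)
predict_A <- function(Ci) intercept + slope * Ci

# Example Ci values chosen for illustration only
predict_A(c(50, 100, 150))

# Ci at which the fitted line crosses A = 0 (x-intercept)
ci_at_zero_A <- -intercept / slope
round(ci_at_zero_A, 2)
```

With the model object available, predict(AC_lm1, newdata = data.frame(Ci = c(50, 100, 150))) gives the same predictions without copying coefficients by hand.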
A-Ci for all 3 groups of the plants
#load packages
library(MASS)
library(car) #provides Anova() for the Type II tests used below
library(ggplot2)
#import the files
full_aci <- read.csv("~/Desktop/Data/AciDataFull.csv")
View(full_aci)
#visualise the data
#refer to the columns by name inside aes() rather than with full_aci$...
p <- ggplot(full_aci, aes(x = Ci, y = A)) + geom_point(aes(colour = Group)) + geom_line(aes(colour = Group)) + xlab("CO2 concentration (ppm)") + ylab("photosynthetic activity (umol CO2 m^-2 s^-1)") + labs(col = "Genotype")
p
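#the relationship in the full plot appears non-linear, so the subset with Ci < 300 ppm is selected
#the column is selected by name (this assumes Ci is the first column, as in the original full_aci[, 1])
subset_aci <- full_aci[full_aci$Ci < 300, ]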
View(subset_aci)
#visualise the data again
p <- ggplot(subset_aci, aes(x = Ci, y = A)) + geom_point(aes(colour = Group)) + geom_line(aes(colour = Group)) + xlab("CO2 concentration (ppm)") + ylab("photosynthetic activity (umol CO2 m^-2 s^-1)") + labs(col = "Genotype")
p
#fit a linear model
aci_lm4 <- lm(A ~ Ci + Group + Ci:Group, data = subset_aci)
summary(aci_lm4)
Call:
lm(formula = A ~ Ci + Group + Ci:Group, data = subset_aci)
Residuals:
     Min       1Q   Median       3Q      Max
-0.70505 -0.08332  0.07070  0.12077  0.36188
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.435899   0.265058 -12.963 2.04e-08 ***
Ci            0.051104   0.001426  35.828 1.43e-13 ***
Groupssu2     2.114889   0.408037   5.183 0.000228 ***
Groupwt       0.104088   0.366857   0.284 0.781454
Ci:Groupssu2 -0.037248   0.002315 -16.090 1.74e-09 ***
Ci:Groupwt    0.017466   0.001994   8.761 1.47e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2963 on 12 degrees of freedom
Multiple R-squared: 0.9977, Adjusted R-squared: 0.9968
F-statistic: 1047 on 5 and 12 DF, p-value: 2.089e-15
Anova(aci_lm4)
Anova Table (Type II tests)
Response: A
           Sum Sq Df F value    Pr(>F)
Ci        280.414  1 3194.21 6.211e-16 ***
Group     106.609  2  607.20 8.776e-13 ***
Ci:Group   50.111  2  285.41 7.619e-11 ***
Residuals   1.053 12
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
plot(aci_lm4)
#the R-squared value of 0.9977 shows that the linear model describes the data very well, so the linearity assumption is considered met
#the normal Q-Q plot shows that the residuals are roughly normally distributed, though points 14, 6 and 16 appear to be outliers
#test the significance of factors
aci_lm <- lm(A ~ Ci + Group, data = subset_aci)
aci_lm3 <- lm(A ~ Ci * Group, data = subset_aci)
anova(aci_lm,aci_lm3,test="F")
Analysis of Variance Table
Model 1: A ~ Ci + Group
Model 2: A ~ Ci * Group
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     14 51.164
2     12  1.053  2    50.111 285.41 7.619e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#the interaction between genotype (Group) and CO2 concentration (Ci) is significant (F(2,12) = 285.41, p < 0.001)
#use nested linear models to test the effects of genotype and CO2 concentration on A
aci_lm <- lm(A ~ Ci + Group, data = subset_aci)
aci_lm1 <- lm(A ~ Ci, data = subset_aci)
anova(aci_lm1,aci_lm,test="F")
Analysis of Variance Table
Model 1: A ~ Ci
Model 2: A ~ Ci + Group
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     16 157.774
2     14  51.164  2    106.61 14.586 0.0003772 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#genotype factor (Group) is significant, (F(2,14) = 14.586, p < 0.001)
aci_lm <- lm(A ~ Ci + Group, data = subset_aci)
aci_lm2 <- lm(A ~ Group, data = subset_aci)
anova(aci_lm2,aci_lm,test="F")
Analysis of Variance Table
Model 1: A ~ Group
Model 2: A ~ Ci + Group
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     15 331.58
2     14  51.16  1    280.41 76.729 4.692e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#CO2 concentration factor (Ci) is significant (F(1,14) = 76.729, p < 0.001)
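Each of the three model comparisons above uses the same F statistic for nested models: the drop in residual sum of squares per extra parameter, divided by the residual mean square of the larger model. The sketch below recomputes all three from the printed RSS and Res.Df values; the helper name `nested_f` is an assumption made for illustration. It also shows where the degrees of freedom come from: the numerator df is the difference in Res.Df and the denominator df is the Res.Df of the larger model.

```r
# Hypothetical helper: F test for two nested linear models from their RSS and residual df
nested_f <- function(rss_small, df_small, rss_large, df_large) {
  df_num <- df_small - df_large   # number of extra parameters in the larger model
  f <- ((rss_small - rss_large) / df_num) / (rss_large / df_large)
  c(F = f, df_num = df_num, df_den = df_large,
    p = pf(f, df_num, df_large, lower.tail = FALSE))
}

# RSS and Res.Df copied from the three printed anova() tables
f_interaction <- nested_f(51.164, 14, 1.053, 12)    # A ~ Ci + Group  vs  A ~ Ci * Group
f_group       <- nested_f(157.774, 16, 51.164, 14)  # A ~ Ci          vs  A ~ Ci + Group
f_ci          <- nested_f(331.58, 15, 51.16, 14)    # A ~ Group       vs  A ~ Ci + Group

round(rbind(interaction = f_interaction, Group = f_group, Ci = f_ci), 4)
```

Small differences from the printed F values come from the rounding of the RSS figures in the output.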
#backward stepwise elimination
aci_glm <- glm(A ~ Ci * Group, data = subset_aci)
summary(aci_glm)
Call:
glm(formula = A ~ Ci * Group, data = subset_aci)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
-0.70505 -0.08332  0.07070  0.12077  0.36188
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.435899   0.265058 -12.963 2.04e-08 ***
Ci            0.051104   0.001426  35.828 1.43e-13 ***
Groupssu2     2.114889   0.408037   5.183 0.000228 ***
Groupwt       0.104088   0.366857   0.284 0.781454
Ci:Groupssu2 -0.037248   0.002315 -16.090 1.74e-09 ***
Ci:Groupwt    0.017466   0.001994   8.761 1.47e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.08778842)
Null deviance: 460.7228 on 17 degrees of freedom
Residual deviance: 1.0535 on 12 degrees of freedom
AIC: 13.993
Number of Fisher Scoring iterations: 2
Anova(aci_glm)
Analysis of Deviance Table (Type II tests)
Response: A
         LR Chisq Df Pr(>Chisq)
Ci         3194.2  1  < 2.2e-16 ***
Group      1214.4  2  < 2.2e-16 ***
Ci:Group    570.8  2  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#no factor can be dropped
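One way to make the backward elimination step explicit is drop1(), which tests each term that can be removed while respecting marginality (main effects are not tested while their interaction is in the model). The sketch below runs on simulated data: the data frame `sim_aci`, the slopes and the noise level are invented for illustration and are not the values in AciDataFull.csv.

```r
# Hypothetical data: 6 Ci levels for each of 3 genotypes, simulated (NOT the real A-Ci data)
set.seed(42)
sim_aci <- expand.grid(Ci = seq(50, 250, by = 40),
                       Group = c("ssu1", "ssu2", "wt"))

# Assumed genotype-specific slopes, chosen only so that an interaction exists
slopes <- c(ssu1 = 0.05, ssu2 = 0.015, wt = 0.07)
sim_aci$A <- -3 +
  unname(slopes[as.character(sim_aci$Group)]) * sim_aci$Ci +
  rnorm(nrow(sim_aci), sd = 0.3)

# Start from the full model with the interaction
full_model <- lm(A ~ Ci * Group, data = sim_aci)

# F test for each droppable term; only Ci:Group is eligible at this stage
elimination <- drop1(full_model, test = "F")
elimination
```

If the interaction row is significant, elimination stops and the full model is kept, which is the outcome reported for the real data.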
Conclusion: The plot of the full dataset shows that the relationship between CO2 concentration and photosynthetic activity is no longer linear once CO2 concentration exceeds 300 ppm. Therefore the subset with Ci < 300 ppm was chosen, which is linear, as shown by the second plot. This is confirmed by the R-squared value of 0.9977 for the linear model. Photosynthetic activity increases with CO2 concentration (df = 1, p < 0.001), and the size of this increase depends on the genotype (Ci:Group interaction, df = 2, p < 0.001), with the response ordered wt > ssu1 > ssu2. Photosynthetic activity (A) can be predicted from CO2 concentration (Ci), genotype (Group) and the interaction between the two variables:
A = -3.44 + 0.0511 Ci + 2.11 ssu2 + 0.104 wt - 0.0372 Ci:ssu2 + 0.0175 Ci:wt
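Because ssu1 is the reference level, the fitted equation is easier to read as one straight line per genotype: the Group terms shift the intercept and the Ci:Group terms shift the slope. The sketch below derives the three lines from the coefficients printed by summary(aci_lm4); the object names are illustrative.

```r
# Coefficients copied from summary(aci_lm4); ssu1 is the reference genotype
coefs <- c("(Intercept)"  = -3.435899,
           "Ci"           =  0.051104,
           "Groupssu2"    =  2.114889,
           "Groupwt"      =  0.104088,
           "Ci:Groupssu2" = -0.037248,
           "Ci:Groupwt"   =  0.017466)

# Intercept and slope of A against Ci for each genotype
lines_by_group <- data.frame(
  Group     = c("ssu1", "ssu2", "wt"),
  intercept = coefs[["(Intercept)"]] + c(0, coefs[["Groupssu2"]], coefs[["Groupwt"]]),
  slope     = coefs[["Ci"]] + c(0, coefs[["Ci:Groupssu2"]], coefs[["Ci:Groupwt"]])
)
lines_by_group
```

The wt line recovered this way (intercept about -3.33, slope about 0.0686) agrees with the separate regression on the truncated wild type data, and the slopes follow the order wt > ssu1 > ssu2.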