Relative growth rate

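#load the packages used for plotting
library(ggplot2)
library(ggpubr)
#load the files; Group must be a factor for summary() and levels() below
RGR <- read.csv("~/Desktop/Data/RGR.csv", stringsAsFactors = TRUE)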
View(RGR)
#plot a boxplot to visualise the data
#the boxplot shows that the ssu2 group has a much lower relative growth rate than the other two groups
summary(RGR$Group)
ssu1 ssu2   wt 
  12   12   12 
levels(RGR$Group)
[1] "ssu1" "ssu2" "wt"  
p <- ggboxplot(RGR, x = "Group", y = "RGR", color = "Group", palette = c("#00AFBB", "#E7B800", "#FC4E07")) +
  labs(x = "Groups of plants", y = "relative growth rate (g g^-1 d^-1)") +
  #jittered points only; an extra geom_point() would draw every observation twice
  geom_jitter(shape = 19, position = position_jitter(0.2)) +
  theme(legend.position = "none")
p

#use ANOVA to compare the three groups
model <- aov(RGR~Group, data = RGR)
#check the assumptions of the fitted model
plot(model)

#conclusion: ANOVA assumptions are met: the residuals are adequately normally distributed and the variances are homogeneous, although data points 30 and 33 appear to be outliers
summary(model)
            Df   Sum Sq  Mean Sq F value Pr(>F)    
Group        2 0.015775 0.007888   142.1 <2e-16 ***
Residuals   33 0.001832 0.000056                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#post-hoc test for ANOVA: Tukey Honest Significant Differences test 
TukeyHSD(model)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = RGR ~ Group, data = RGR)

$Group
                  diff          lwr          upr     p adj
ssu2-ssu1 -0.043416667 -0.050879761 -0.035953572 0.0000000
wt-ssu1    0.001916667 -0.005546428  0.009379761 0.8046724
wt-ssu2    0.045333333  0.037870239  0.052796428 0.0000000
plot(TukeyHSD(model))

Conclusion for RGR: ANOVA assumptions are met: the residuals are adequately normally distributed and the variances are homogeneous, although data points 30 and 33 appear to be outliers. Relative growth rate differs significantly among the three groups (ANOVA, F(2,33) = 142.1, p < 0.001). More specifically, the relative growth rate of the ssu2 group is significantly lower than that of the ssu1 group and the wt group (post-hoc TukeyHSD, padj (ssu2-ssu1) < 0.001, padj (wt-ssu2) < 0.001). The difference between the wild type group and the ssu1 group is not significant (padj = 0.805).
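The assumption checks above rely on reading the diagnostic plots by eye. As a complement, a minimal sketch of two formal tests is given below. It runs on simulated stand-in data (the values and the names toy and toy_model are made up for illustration and are not the RGR.csv measurements), so only the procedure carries over, not the numbers.

```r
# Simulated stand-in data (NOT the measurements in RGR.csv): 3 groups of 12 plants
set.seed(1)
toy <- data.frame(
  Group = factor(rep(c("ssu1", "ssu2", "wt"), each = 12)),
  RGR   = c(rnorm(12, 0.10, 0.008), rnorm(12, 0.06, 0.008), rnorm(12, 0.10, 0.008))
)
toy_model <- aov(RGR ~ Group, data = toy)

# Shapiro-Wilk test on the residuals (H0: residuals are normally distributed)
normality <- shapiro.test(residuals(toy_model))

# Bartlett test across groups (H0: the group variances are equal)
homogeneity <- bartlett.test(RGR ~ Group, data = toy)

print(normality$p.value)
print(homogeneity$p.value)
```

A p-value above 0.05 in either test means there is no evidence against that assumption. With 12 observations per group these tests have limited power, so they are best read together with the diagnostic plots rather than instead of them.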

Sucrose concentration

sucrose <- read.csv("~/Desktop/Data/Sucrose.csv", stringsAsFactors = TRUE)
View(sucrose)
#plot a boxplot to visualise the data
#the boxplot shows that the wt group has a higher sucrose concentration than the other two groups
summary(sucrose$Group)
ssu1 ssu2   wt 
  11   11   11 
levels(sucrose$Group)
[1] "ssu1" "ssu2" "wt"  
p <- ggboxplot(sucrose, x = "Group", y = "Sucrose", color = "Group", palette = c("#00AFBB", "#E7B800", "#FC4E07")) +
  labs(x = "Groups of plants", y = "sucrose concentration (umol g^-1)") +
  #jittered points only; an extra geom_point() would draw every observation twice
  geom_jitter(shape = 19, position = position_jitter(0.2)) +
  theme(legend.position = "none")
p

#use ANOVA to compare the three groups
model1 <- aov(Sucrose~Group, data = sucrose)
plot(model1)

#conclusion: ANOVA assumptions are generally met: the residuals are adequately normally distributed according to the normal Q-Q plot and the variances are broadly homogeneous, though data points 2, 3 and 4 appear to be outliers
summary(model1)
            Df Sum Sq Mean Sq F value   Pr(>F)    
Group        2  41.18  20.590   19.56 3.65e-06 ***
Residuals   30  31.57   1.052                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#post-hoc test for ANOVA: Tukey Honest Significant Differences test 
TukeyHSD(model1)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sucrose ~ Group, data = sucrose)

$Group
                diff        lwr       upr     p adj
ssu2-ssu1 -0.9018182 -1.9802375 0.1766012 0.1151764
wt-ssu1    1.7863636  0.7079443 2.8647830 0.0008618
wt-ssu2    2.6881818  1.6097625 3.7666012 0.0000027
plot(TukeyHSD(model1))

Report for sucrose concentration: ANOVA assumptions are generally met. The residuals are adequately normally distributed according to the normal Q-Q plot and the variances are broadly homogeneous, though data points 2, 3 and 4 appear to be outliers. Sucrose concentration differs significantly among the three groups (ANOVA, F(2,30) = 19.56, p < 0.001). More specifically, the sucrose concentration of the wild type (wt) group is significantly higher than that of the ssu1 group and the ssu2 group (TukeyHSD, padj (wt-ssu1) < 0.001, padj (wt-ssu2) < 0.001). The difference in sucrose concentration between ssu1 and ssu2 is not significant (padj = 0.115).

Analysis of A-Ci data

AC <- read.csv("~/Desktop/Data/AciDataWT.csv")
View(AC)
#plot a scatterplot
plot(Ci ~ A, data = AC)

#assume linearity
#a function for plotting a linear regression result
ggplotRegression <- function (fit) {
    
    require(ggplot2)
    
    ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) + 
        geom_point() +
        stat_smooth(method = "lm", col = "red") +
        labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                           "Intercept =",signif(fit$coef[[1]],5 ),
                           " Slope =",signif(fit$coef[[2]], 5),
                           " P =",signif(summary(fit)$coef[2,4], 5)))
}
#the model is Ci ~ A, so A is on the x axis and Ci on the y axis
ggplotRegression(lm(Ci ~ A, data = AC)) + xlab("photosynthetic activity (umol CO2 m^-2 s^-1)") + ylab("CO2 concentration (ppm)")

#linear regression analysis
AC_lm <- lm(Ci ~ A, data = AC)
summary(AC_lm)

Call:
lm(formula = Ci ~ A, data = AC)

Residuals:
    Min      1Q  Median      3Q     Max 
-167.44  -98.87  -29.79   71.62  315.37 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   -26.19      92.71  -0.283  0.78471   
A              29.79       7.19   4.143  0.00324 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 151.8 on 8 degrees of freedom
Multiple R-squared:  0.6821,    Adjusted R-squared:  0.6423 
F-statistic: 17.16 on 1 and 8 DF,  p-value: 0.00324
#assess the goodness of fit of the linear model from the R-squared value

Conclusion: At the right-hand end of the graph, the best-fit line appears to underestimate the higher Ci values. The adjusted R-squared value shows that only about 64% of the variation is accounted for by the linear model. Therefore, fitting a straight line to this data set was probably not justified.
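One way to back up the impression that a straight line is not adequate is to test whether adding a curvature term improves the fit. The sketch below is an assumption-laden illustration, not part of the original analysis: it uses simulated data from a made-up saturating curve (not AciDataWT.csv), and it models A as a function of Ci, whereas the fit above was specified as Ci ~ A.

```r
# Simulated saturating response (NOT the AciDataWT.csv measurements)
set.seed(42)
toy <- data.frame(Ci = seq(50, 1000, length.out = 10))
toy$A <- 30 * toy$Ci / (toy$Ci + 300) + rnorm(10, sd = 0.5)

# Straight line versus a line plus a quadratic term
fit_line <- lm(A ~ Ci, data = toy)
fit_quad <- lm(A ~ Ci + I(Ci^2), data = toy)

# F test for nested models: a small p-value means the curvature term is needed,
# i.e. a straight line does not describe the data adequately
comparison <- anova(fit_line, fit_quad)
print(comparison)
```

A quadratic term is only a generic probe for curvature; it is not a physiological model of the A-Ci response.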

Concentrating on the lower Ci range

low_AC <- read.csv("~/Desktop/Data/AciDataWTTruncate.csv")
View(low_AC)
#plot a scatterplot with Ci on the x axis, matching the model A ~ Ci fitted below
plot(A ~ Ci, data = low_AC)

#assume linearity
#reuse the ggplotRegression() function defined above; the model is A ~ Ci so that the axis labels match
ggplotRegression(lm(A ~ Ci, data = low_AC)) + xlab("CO2 concentration (ppm)") + ylab("photosynthetic activity (umol CO2 m^-2 s^-1)")

#linear regression analysis
AC_lm1 <- lm(A ~ Ci, data = low_AC)
summary(AC_lm1)

Call:
lm(formula = A ~ Ci, data = low_AC)

Residuals:
       1        2        3        4        5        6        7 
-0.25151  0.05768  0.11436  0.36188 -0.08775 -0.33278  0.13812 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.33181    0.22588  -14.75 2.59e-05 ***
Ci           0.06857    0.00124   55.28 3.66e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2639 on 5 degrees of freedom
Multiple R-squared:  0.9984,    Adjusted R-squared:  0.998 
F-statistic:  3056 on 1 and 5 DF,  p-value: 3.664e-08
#check the model assumptions with the diagnostic plots
plot(AC_lm1)

#the residuals are in general normally distributed, though point 10 appears to be an outlier and lies outside the Cook's distance contour
#make prediction 
prediction <- predict(AC_lm1)
plot(A ~ Ci, data = low_AC)
lines(low_AC$Ci,prediction)

Conclusion: In the Q-Q plot the points do not follow an “S shape”, so the residuals are approximately normally distributed. The R-squared value (R^2 = 0.998) indicates that the linearity assumption is met. CO2 concentration is strongly positively correlated with photosynthetic activity (linear regression, R^2 = 0.998, p < 0.001). Photosynthetic activity (A) can be predicted from CO2 concentration (Ci) via A = -3.33 + 0.0686 Ci (R^2 = 0.998). CO2 concentration is a significant predictor of photosynthetic activity (F(1,5) = 3056, p < 0.001).
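To show how the fitted equation is used, the sketch below evaluates it for two example CO2 concentrations. The coefficients are copied from the summary(AC_lm1) output; the helper name predict_A and the example values 100 and 200 ppm are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(AC_lm1) output
intercept <- -3.33181
slope     <-  0.06857

# Hypothetical helper: predicted photosynthetic activity A for a given Ci (ppm)
predict_A <- function(Ci) intercept + slope * Ci

# Example inputs, assumed to lie inside the truncated (low Ci) range
predict_A(c(100, 200))
```

With the fitted model object available, predict(AC_lm1, newdata = data.frame(Ci = c(100, 200)), interval = "confidence") gives the same point predictions together with confidence limits. The equation should not be extrapolated beyond the low Ci range it was fitted on.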

A-Ci for all 3 groups of the plants

#load packages
library(MASS)
library(car) #provides Anova() (Type II tests) used below
library(ggplot2)
#import the files
full_aci <- read.csv("~/Desktop/Data/AciDataFull.csv")
View(full_aci)
#visualise the data
p <- ggplot(full_aci, aes(Ci, A)) + geom_point(aes(colour=Group)) + geom_line(aes(colour=Group)) + xlab("CO2 concentration (ppm)") + ylab("photosynthetic activity (umol CO2 m^-2 s^-1)") + labs(col = "Genotype")
p

#from looking at the graph, which appears non-linear, a subset of <300ppm is selected
subset_aci <- full_aci[full_aci[, 1] < 300, ]
View(subset_aci)
#visualise the data again
p <- ggplot(subset_aci, aes(Ci, A)) + geom_point(aes(colour=Group)) + geom_line(aes(colour=Group)) + xlab("CO2 concentration (ppm)") + ylab("photosynthetic activity (umol CO2 m^-2 s^-1)") + labs(col = "Genotype")
p

#fit a linear model 
aci_lm4 <- lm(A ~ Ci + Group + Ci:Group, data = subset_aci)
summary(aci_lm4)

Call:
lm(formula = A ~ Ci + Group + Ci:Group, data = subset_aci)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.70505 -0.08332  0.07070  0.12077  0.36188 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -3.435899   0.265058 -12.963 2.04e-08 ***
Ci            0.051104   0.001426  35.828 1.43e-13 ***
Groupssu2     2.114889   0.408037   5.183 0.000228 ***
Groupwt       0.104088   0.366857   0.284 0.781454    
Ci:Groupssu2 -0.037248   0.002315 -16.090 1.74e-09 ***
Ci:Groupwt    0.017466   0.001994   8.761 1.47e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2963 on 12 degrees of freedom
Multiple R-squared:  0.9977,    Adjusted R-squared:  0.9968 
F-statistic:  1047 on 5 and 12 DF,  p-value: 2.089e-15
Anova(aci_lm4)
Anova Table (Type II tests)

Response: A
           Sum Sq Df F value    Pr(>F)    
Ci        280.414  1 3194.21 6.211e-16 ***
Group     106.609  2  607.20 8.776e-13 ***
Ci:Group   50.111  2  285.41 7.619e-11 ***
Residuals   1.053 12                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
plot(aci_lm4)

#the R-squared value of the linear model is 0.9977, which indicates a linear relationship, so the linearity assumption is met
#in the normal Q-Q plot the residuals are roughly normally distributed, though points 14, 6 and 16 appear to be outliers
#test the significance of factors
aci_lm <- lm(A ~ Ci + Group, data = subset_aci)
aci_lm3 <- lm(A ~ Ci * Group, data = subset_aci)
anova(aci_lm,aci_lm3,test="F")
Analysis of Variance Table

Model 1: A ~ Ci + Group
Model 2: A ~ Ci * Group
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     14 51.164                                  
2     12  1.053  2    50.111 285.41 7.619e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#the interaction between genotype (Group) and CO2 concentration (Ci) is significant (F(2,12) = 285.41, p < 0.001)
#use nested linear models to test the effects of genotype and CO2 concentration on A
aci_lm <- lm(A ~ Ci + Group, data = subset_aci)
aci_lm1 <- lm(A ~ Ci, data = subset_aci)
anova(aci_lm1,aci_lm,test="F")
Analysis of Variance Table

Model 1: A ~ Ci
Model 2: A ~ Ci + Group
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1     16 157.774                                  
2     14  51.164  2    106.61 14.586 0.0003772 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#the genotype factor (Group) is significant (F(2,14) = 14.586, p < 0.001)
aci_lm <- lm(A ~ Ci + Group, data = subset_aci)
aci_lm2 <- lm(A ~ Group, data = subset_aci)
anova(aci_lm2,aci_lm,test="F")
Analysis of Variance Table

Model 1: A ~ Group
Model 2: A ~ Ci + Group
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     15 331.58                                  
2     14  51.16  1    280.41 76.729 4.692e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#the CO2 concentration factor (Ci) is significant (F(1,14) = 76.729, p < 0.001)
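The F values in the three model comparisons can be reproduced by hand from the residual sums of squares, which also shows where the degrees of freedom come from: the numerator df is the difference in residual df between the two models, and the denominator df is the residual df of the larger model. The sketch below uses the rounded RSS values printed in the tables, so the results agree with the reported F values only to within rounding. The helper name nested_F is hypothetical.

```r
# Hypothetical helper: F test for two nested linear models,
# given the residual sum of squares and residual df of each
nested_F <- function(rss_small, df_small, rss_big, df_big) {
  num_df  <- df_small - df_big
  F_value <- ((rss_small - rss_big) / num_df) / (rss_big / df_big)
  c(F = F_value, num_df = num_df, den_df = df_big,
    p = pf(F_value, num_df, df_big, lower.tail = FALSE))
}

# Rounded values copied from the three anova() tables
interaction_test <- nested_F(51.164, 14, 1.053, 12)    # A ~ Ci + Group  vs  A ~ Ci * Group
group_test       <- nested_F(157.774, 16, 51.164, 14)  # A ~ Ci          vs  A ~ Ci + Group
ci_test          <- nested_F(331.58, 15, 51.16, 14)    # A ~ Group       vs  A ~ Ci + Group

print(rbind(interaction_test, group_test, ci_test))
```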
#backward stepwise elimination 
aci_glm <- glm(A ~ Ci * Group, data = subset_aci)
summary(aci_glm)

Call:
glm(formula = A ~ Ci * Group, data = subset_aci)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.70505  -0.08332   0.07070   0.12077   0.36188  

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -3.435899   0.265058 -12.963 2.04e-08 ***
Ci            0.051104   0.001426  35.828 1.43e-13 ***
Groupssu2     2.114889   0.408037   5.183 0.000228 ***
Groupwt       0.104088   0.366857   0.284 0.781454    
Ci:Groupssu2 -0.037248   0.002315 -16.090 1.74e-09 ***
Ci:Groupwt    0.017466   0.001994   8.761 1.47e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.08778842)

    Null deviance: 460.7228  on 17  degrees of freedom
Residual deviance:   1.0535  on 12  degrees of freedom
AIC: 13.993

Number of Fisher Scoring iterations: 2
Anova(aci_glm)
Analysis of Deviance Table (Type II tests)

Response: A
         LR Chisq Df Pr(>Chisq)    
Ci         3194.2  1  < 2.2e-16 ***
Group      1214.4  2  < 2.2e-16 ***
Ci:Group    570.8  2  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#no factor can be dropped 

Conclusion: In the plot of the full dataset, the relationship between CO2 concentration and photosynthetic activity is no longer linear once CO2 concentration exceeds 300 ppm. A subset below 300 ppm was therefore chosen, which is linear, as shown by the second plot and supported by the R-squared value of 0.9977 for the linear model. Photosynthetic activity increases with CO2 concentration (df = 1, p < 0.001), and the size of this increase depends on genotype (Ci:Group interaction, df = 2, p < 0.001), with photosynthetic activity of wt > ssu1 > ssu2. Photosynthetic activity (A) can be predicted from CO2 concentration (Ci), genotype (Group) and the interaction between the two variables: A = -3.44 + 0.0511 Ci + 2.11 ssu2 + 0.104 wt - 0.0372 Ci:ssu2 + 0.0175 Ci:wt
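The combined equation is easier to interpret when it is unpacked into one straight line per genotype. In the sketch below the coefficients are copied from the summary(aci_lm4) output, with ssu1 as the reference level; the object names b and group_lines are hypothetical.

```r
# Coefficients copied from the summary(aci_lm4) output; ssu1 is the reference level
b <- c("(Intercept)"  = -3.435899, "Ci"         = 0.051104,
       "Groupssu2"    =  2.114889, "Groupwt"    = 0.104088,
       "Ci:Groupssu2" = -0.037248, "Ci:Groupwt" = 0.017466)

# Each genotype's line: reference intercept/slope plus that genotype's offsets
group_lines <- data.frame(
  Group     = c("ssu1", "ssu2", "wt"),
  intercept = b[["(Intercept)"]] + c(0, b[["Groupssu2"]], b[["Groupwt"]]),
  slope     = b[["Ci"]] + c(0, b[["Ci:Groupssu2"]], b[["Ci:Groupwt"]])
)
print(group_lines)
```

The slopes order the genotypes as wt > ssu1 > ssu2, which is the pattern described in the conclusion. The wt line obtained this way (intercept about -3.33, slope about 0.0686) matches the separate wild type fit A = -3.33 + 0.0686 Ci reported earlier.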

