This code is for the final analysis after discussion and needs to be improved

library(devtools)
# stat_smooth_func(), used for Figures 3 and 4, is defined in this gist
devtools::source_gist("524eade46135f6348140", filename = "ggplot_smooth_func.R")

## Sourcing https://gist.githubusercontent.com/kdauria/524eade46135f6348140/raw/676acaca9a0a144ef320ae2ef00a31c3daa7179d/ggplot_smooth_func.R

## SHA-1 hash of file is c0b163b9fd2d7fe7bd5541e3266d8d36ff3b895d

Load the data

We work at the species level, using all single-species compartments.

Table 1. Data for the 5 species that are represented more than 15 times in the food web models as single-species compartments

library(DT)  # provides datatable()
datatable(na.omit(count_S))

library(PerformanceAnalytics)  # provides chart.Correlation()
#chart.Correlation(as.matrix(df_S_cor), histogram=TRUE, pch=19) 
chart.Correlation(as.matrix(df_final_cor), histogram=TRUE, pch=19)

Figure 1. Correlations of \(\log_{10}(\text{SPPR})\) with the predictors for species represented as single-species compartments in the food web models.

So, \(\log_{10}(\text{SPPR})\) is positively correlated with TL and negatively correlated with PB, QB and PQ.
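chart.Correlation() reports Pearson coefficients by default. The numbers behind Figure 1 can also be obtained directly with base R's cor(), and cor.test() gives a p-value for a single pair. The sketch below is a minimal illustration that uses a hypothetical data frame `toy` with simulated values. It stands in for df_final_cor and is not the food web data. Computing PQ as PB/QB is an assumption that follows the Ecopath definition of P/Q.

```r
# Hypothetical stand-in for df_final_cor: simulated values, not the food web data
set.seed(42)
n  <- 40
TL <- runif(n, 2, 4.5)
PB <- exp(-0.8 * TL + rnorm(n, 0, 0.3))       # simulated to decrease with TL
QB <- 5 * exp(-0.6 * TL + rnorm(n, 0, 0.3))
toy <- data.frame(logSPPR = 0.9 * TL + rnorm(n, 0, 0.4),
                  TL = TL, PB = PB, QB = QB,
                  PQ = PB / QB)                 # assumption: PQ = (P/B) / (Q/B)

# Pearson correlation of log10(SPPR) with each predictor
# (Pearson is the default method of chart.Correlation())
r <- cor(toy, method = "pearson")["logSPPR", -1]
print(round(r, 2))

# Test whether a single correlation differs from zero
ct <- cor.test(toy$logSPPR, toy$TL, method = "pearson")
print(ct$p.value)
```

Pass method = "spearman" to both calls to get rank correlations, which are less sensitive to skewed predictors such as PB and QB.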

Table 2. Summary of TL and SPPR for the 5 most frequently represented species

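library(ggplot2)
library(grid)  # viewport(), grid.layout(), pushViewport()

# Boxplots with jittered observations; outlier.shape=NA avoids drawing outliers
# twice (geom_jitter() already shows every point), height=0 keeps y values exact
p1 <- ggplot(data=df_S, aes(x=Species, y=SPPR, colour=Species))+geom_boxplot(outlier.shape=NA)+
  scale_colour_discrete("", guide="none")+geom_jitter(width=0.2, height=0)

p2 <- ggplot(data=df_S, aes(x=Species, y=TL, colour=Species)) + geom_boxplot(outlier.shape=NA)+
  scale_colour_discrete("", guide="none")+geom_jitter(width=0.2, height=0)

# Stack the two plots in a 2-row, 1-column grid layout
vplayout <- function(x, y)
  viewport(layout.pos.row = x, layout.pos.col = y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,1)))

print(p1+theme(plot.margin=unit(c(1,1,1,1), "mm")), vp=vplayout(1,1))
print(p2+theme(plot.margin=unit(c(1,1,1,1), "mm")), vp=vplayout(2,1))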

Figure 2. Boxplots of \(\log_{10}(\text{SPPR})\) (top) and TL (bottom) for the most frequently represented species in the food web models

kruskal.test(data=df_S, SPPR~Species)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  SPPR by Species
## Kruskal-Wallis chi-squared = 59.993, df = 4, p-value = 2.911e-12

There is a significant difference in mean rank among these 5 species (p < 0.001).
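The Kruskal-Wallis statistic compares the rank sums of the groups: \(H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)\). Here \(N\) is the total sample size, \(R_i\) the rank sum and \(n_i\) the size of group \(i\). \(H\) is then divided by a correction for ties and referred to a \(\chi^2\) distribution with \(k-1\) degrees of freedom. The minimal sketch below recomputes it by hand on hypothetical data (groups "A", "B", "C") and shows that it matches kruskal.test(). It also prints the mean ranks the test compares.

```r
# Hypothetical data: three groups of 10, no real species involved
set.seed(1)
y <- c(rnorm(10, 0), rnorm(10, 1), rnorm(10, 2))
g <- factor(rep(c("A", "B", "C"), each = 10))

kw_by_hand <- function(y, g) {
  N   <- length(y)
  r   <- rank(y)                        # mid-ranks are used for ties
  Ri  <- tapply(r, g, sum)              # rank sum of each group
  ni  <- tapply(r, g, length)           # group sizes
  H   <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)
  tie <- table(y)                       # sizes of groups of tied values
  H / (1 - sum(tie^3 - tie) / (N^3 - N))  # tie correction (1 when no ties)
}

H  <- kw_by_hand(y, g)
p  <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)
kt <- kruskal.test(y ~ g)
print(c(H = H, p = p))
print(tapply(rank(y), g, mean))         # mean rank of each group
```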

datatable(PT)

However, at the 5% significance level there is no significant difference in mean rank among American plaice, Atlantic cod and Greenland halibut, nor between capelin and herring.
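The section does not show how the table PT was produced. One common post hoc option in base R is pairwise Wilcoxon rank-sum tests with Holm-adjusted p-values. Dunn's test is another. The sketch below uses the Wilcoxon approach as an assumed method, not necessarily the one behind PT. It runs on hypothetical groups "A", "B" and "C" and lists the pairs that are not significantly different at the 5% level.

```r
# Hypothetical data: A and B are similar, C is clearly shifted
set.seed(7)
y <- c(rnorm(15, 0), rnorm(15, 0.1), rnorm(15, 2))
g <- factor(rep(c("A", "B", "C"), each = 15))

# Pairwise Wilcoxon rank-sum tests with Holm adjustment for multiple comparisons
pw <- pairwise.wilcox.test(y, g, p.adjust.method = "holm")
print(pw$p.value)  # lower-triangular matrix of adjusted p-values

# Pairs whose adjusted p-value is >= 0.05
pv  <- pw$p.value
idx <- which(!is.na(pv) & pv >= 0.05, arr.ind = TRUE)
not_different <- data.frame(group1 = rownames(pv)[idx[, "row"]],
                            group2 = colnames(pv)[idx[, "col"]])
print(not_different)
```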

Plot the regressions of \(\log_{10}(\text{SPPR})\) on TL and of mean \(\log_{10}(\text{SPPR})\) on mean TL, considering only species represented as single-species compartments.

# Figure 3: SPPR vs TL with one panel per species; stat_smooth_func()
# (sourced from the gist above) prints the fitted equation and R^2
p3 <- ggplot(data=df_S, aes(x=TL, y=SPPR, colour=Species))+
  geom_point()+
  scale_colour_discrete(guide="none")+
  geom_smooth(method="lm", se=FALSE)+
  facet_wrap(~Scientific.name, nrow=4)+
  stat_smooth_func(geom="text",method="lm",hjust=0, vjust=-1,parse=TRUE)


# Figure 4: species means (MeanSPPR vs MeanTL, from dt_S)
p4 <- ggplot(data=dt_S, aes(x=MeanTL, y=MeanSPPR))+
  geom_point(aes(colour=Species))+
  geom_smooth(method="lm", se=FALSE)+
  stat_smooth_func(geom="text",method="lm",hjust=0, vjust=-1,parse=TRUE)
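The plot annotations are convenient but hard to compare. The same intercepts, slopes and \(R^2\) values can be collected into a table by fitting lm() separately for each species with split() and lapply(). This minimal sketch uses a hypothetical data frame `d` with simulated species "sp1" to "sp3" in place of df_S.

```r
# Hypothetical stand-in for df_S: three species with simulated TL and log10(SPPR)
set.seed(3)
d <- data.frame(Species = rep(c("sp1", "sp2", "sp3"), each = 20),
                TL      = runif(60, 3, 4.5))
d$SPPR <- 1 + 0.5 * d$TL + rnorm(60, 0, 0.5)

# Fit SPPR ~ TL for each species and collect intercept, slope and R^2
fits <- lapply(split(d, d$Species), function(s) lm(SPPR ~ TL, data = s))
coef_table <- do.call(rbind, lapply(names(fits), function(nm) {
  f <- fits[[nm]]
  data.frame(Species   = nm,
             intercept = unname(coef(f)[1]),
             slope     = unname(coef(f)[2]),
             r.squared = summary(f)$r.squared,
             n         = nobs(f))
}))
print(coef_table)
```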

p3

Figure 3. Relationship between \(\log_{10}(\text{SPPR})\) and TL for the 5 species represented as single-species compartments in at least 15 food webs

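The coefficients of determination (\(R^2\)) for these species are low, i.e. within each species TL explains only a small part of the variation in \(\log_{10}(\text{SPPR})\). This suggests that, in the linear regression models, the influence of TL on SPPR might not depend on the species.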
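Low \(R^2\) values alone do not show whether the effect of TL differs among species. That question is usually tested by comparing a model with a common TL slope against one with a TL × Species interaction, which is an analysis of covariance. The minimal sketch below does this with an F-test from anova(), using hypothetical data simulated with the same slope for every species.

```r
# Hypothetical data: same true TL slope for every species, different intercepts
set.seed(11)
d <- data.frame(Species = factor(rep(c("sp1", "sp2", "sp3"), each = 25)),
                TL      = runif(75, 3, 4.5))
d$SPPR <- c(0, 0.5, 1)[as.integer(d$Species)] + 0.6 * d$TL + rnorm(75, 0, 0.3)

common_slope <- lm(SPPR ~ TL + Species, data = d)  # parallel lines
own_slopes   <- lm(SPPR ~ TL * Species, data = d)  # one slope per species

# F-test for the TL:Species interaction terms
cmp <- anova(common_slope, own_slopes)
print(cmp)
```

A non-significant interaction is consistent with a common slope but does not prove the slopes are equal, especially with small samples per species.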

p4

Figure 4. Relationship between mean \(\log_{10}(\text{SPPR})\) and mean TL

In this case, TL alone cannot explain the changes in \(\log_{10}(\text{SPPR})\).

If I understand correctly, linear regression allows one to predict the mean of the response variable from the predictors, but not the response variable itself. That is why I do not know how we should interpret Figure 4 and Figure 6.
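A fitted lm supports both kinds of statement through predict(). With interval = "confidence" it gives an interval for the mean response at a given TL. With interval = "prediction" it gives a wider interval for a single new observation at that TL. The minimal sketch below illustrates the difference on hypothetical simulated data. The TL values 3 and 4 are arbitrary.

```r
# Hypothetical data: log10(SPPR) rising linearly with TL plus noise
set.seed(5)
d <- data.frame(TL = runif(50, 2.5, 4.5))
d$SPPR <- 0.8 * d$TL + rnorm(50, 0, 0.4)
fit <- lm(SPPR ~ TL, data = d)

new <- data.frame(TL = c(3, 4))
# Interval for the MEAN log10(SPPR) of compartments with these TL values
conf_int <- predict(fit, newdata = new, interval = "confidence", level = 0.95)
# Interval for the log10(SPPR) of ONE new compartment with these TL values
pred_int <- predict(fit, newdata = new, interval = "prediction", level = 0.95)
print(conf_int)
print(pred_int)
```

The point estimates in the fit column are identical, and only the interval widths differ. A regression line such as the one in Figure 4 can therefore be read as the estimated mean response for a given TL. The prediction interval shows how far an individual value may scatter around it.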

datatable(Final_Output)

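# Simple linear regression of SPPR on TL
fit <- lm(SPPR~TL, data=df_final_S)
# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage
par(mfrow=c(2,2))
plot(fit)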
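The four diagnostic panels can be complemented by a few numeric checks on the same fit. These are the coefficient table, \(R^2\), a Shapiro-Wilk test on the residuals and Cook's distances. The 4/n cutoff for Cook's distance is only a common rule of thumb. The minimal sketch below runs these checks on hypothetical simulated data rather than df_final_S.

```r
# Hypothetical data standing in for df_final_S
set.seed(9)
d <- data.frame(TL = runif(60, 2.5, 4.5))
d$SPPR <- 0.7 * d$TL + rnorm(60, 0, 0.3)
fit <- lm(SPPR ~ TL, data = d)

s <- summary(fit)
print(coef(s))                        # estimates, standard errors, t and p-values
print(s$r.squared)                    # share of variance in SPPR explained by TL
sw <- shapiro.test(residuals(fit))    # normality of the residuals
print(sw$p.value)
cd <- cooks.distance(fit)             # influence of each observation
print(which(cd > 4 / nobs(fit)))      # rule-of-thumb list of influential points
```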


LDA

Jan, 2016