1. DATA PREPARATION

shrinkage function for cross validation.

relweights function for calculating relative importance of predictors

panel.cor function to put the correlation coefficients and p-values in the upper panel of pairs.

# load the data for ENA indices
#setwd("~/Dropbox/Xu ly so lieu")
#load("/Users/LDA/Dropbox/Xu ly so lieu/ENA.rda")
setwd("E:/Dropbox/Xu ly so lieu")
load("ENA.rda")
load("data for analysis.rda")
df$SPPR <- log10(df$SPPR)
df <- df[-which(df$ID %in% c(54)),]
df <- df[-which(df$TYPE %in% c("P", "M", "S", "T", "B", "BA")),] # eliminate values of producers, mammal, shark, turtle, bird, bacteria, and zooplankton
nrow(df)
## [1] 3208

We only consider the harvested (group of) species in the analysis. We removed all data of species which are not harvested intentionally and by catch as well.

idx <- which(df$landing!=0 | df$discard!=0)  # get only harvested species
df <- df[idx,]
df$TYPE <- as.factor(as.character(df$TYPE))

Prepare the training data set and validation data set. Validation data set will consitute of some economically important species which are presented in the food web models as a single species compartment. There are 7 species which are presented in at least 15 models.

Next, data frame for each of 7 important species are created. These data can be then used to validate the model and make an analysis at the species level.

df_top <- df_ENA[which(df_ENA$Scientific.name %in% count$Scientific.name),]
count <- as.data.frame(count(df_top,Scientific.name))
df_top$SPPR <- log10(df_top$SPPR)
temp <- which(is.na(df_top$PB))   # get the row of data that PB is not available
df_top$PB[temp] <- df_top$PQ[temp]*df_top$QB[temp]

df_herring <- subset(df_top, Scientific.name=="Clupea harengus")
df_cod <- subset(df_top, Scientific.name=="Gadus morhua")
df_capelin <- subset(df_top, Scientific.name=="Mallotus villosus")
df_plaice <- subset(df_top, Scientific.name =="Hippoglossoides platessoides")
df_halibut <- subset(df_top, Scientific.name =="Reinhardtius hippoglossoides")
df_mackerel <- subset(df_top, Scientific.name == "Scomber scombrus")
df_haddock <- subset(df_top, Scientific.name=="Melanogrammus aeglefinus")

Now create a training data set that does not include the above species

df$Group <- as.character(df$Group)
df_top$Group <- as.character(df_top$Group)
# get rows of validation data set in df
idx <- NULL
system.time(for (i in 1:nrow(df)){
    for (j in 1:nrow(df_top)){
        if (df$ID[i]==df_top$ID[j] & df$Group[i]==df_top$Group[j]){
            x <- which(df$ID== df$ID[j] & df$Group==df$Group[j])
            idx <- c(idx, x)
        }
    }
})
##    user  system elapsed 
##   12.97    0.01   12.99
# remove validation data from df
df_cal <- df[-idx,]
df.backup <- df
df <- df_cal

2. Exploratory of data

2.1. Summary of SPPR accross ecosystem type.

Table 1: Data summary of all ecosystem type

Make boxplot for distribution of \(\log_{10}\text(SPPR)\) in all ecosystem types. In this case, all groups are grouped by ecosystem type.

Figure 1. Boxplot of log10(SPPR) in different ecosystem types

2.2. Summary of ecological parameters

Table 2. Summary of ecological parameters

Make boxplot for distribution of PB and PQ in all ecosystem types

Figure 2. Boxplot for PQ of different species

Figure 3. Boxplot of QB for different species group. Black dots indicate the mean values

3. Building a model for all species

3.1 Correlation of \(\log_{10}\text(SPPR)\) vs predictors

df_cor <- df[,which(colnames(df) %in% c("SPPR", "TL", "TE", "PB", "QB", "PQ","NPP", "FCI", "TST"))]
pairs(df_cor, lower.panel=panel.smooth,upper.panel = panel.cor)

#scatterplotMatrix(df_cor, spread=FALSE, smoother.args=list(lty=2))

Figure 4. Correlation matrix of \(\log_{10}\text(SPPR)\) versus predictors.

\(\log_{10}{SPPR}\) is possitvely correlated with Trophic level(TL) (r=0.74) and negatively correlated with growth efficiency (PQ - r=-0.23), production rate (PB - r=-0.22), but at lower level. Interestingly, \(\log_{10}{SPPR}\) is correlated with indices associated with (group of) species (i.e., TL, PQ, PB), rather than indices characterize the overall ecosystem structure and functioning (i.e., TE, FCI, TST). Therefore we can test the regression models with predictors are TL, PQ, PB

ggplot(data=df, aes(x=TL, y=SPPR, colour=Ecosystem.type))+
  geom_point()+
  geom_smooth(method=lm)+
  scale_y_continuous("log10(SPPR)")

Figure 5. Scatter plot of \(\log_{10}\text(SPPR)\) versus Trophic level for different ecosystem type

3.2. Variable selection

In this part, I used the all subset model technique to find which model is the best in terms of adjusted R square.

leaps <-regsubsets(SPPR ~ TL+PQ+PB, data=df, nbest=5)
plot(leaps, scale="adjr2") # model with TL and PQ or TL + PQ + PB give the best R2

Figure 6. Result for all subset model regression.

From Figure 6, it is clear that model with TL and PQ as predictor will give highest adjusted R square than only TL. Model with 3 predictors: TL, PQ and PB is just as good as model with two predictors of TL and PQ.

3.3. Model construction with TL, PQ as predictors

fit1 <- lm(SPPR~TL + PQ, data = df)
summary(fit1)
## 
## Call:
## lm(formula = SPPR ~ TL + PQ, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8102 -0.3902 -0.0750  0.3116  3.7781 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.67855    0.07753  -8.752   <2e-16 ***
## TL           1.07269    0.02118  50.647   <2e-16 ***
## PQ          -1.94441    0.13248 -14.677   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6013 on 1948 degrees of freedom
## Multiple R-squared:  0.5912, Adjusted R-squared:  0.5907 
## F-statistic:  1408 on 2 and 1948 DF,  p-value: < 2.2e-16

Cross-validation

shrinkage(fit1, n.predictors=2, k=5)
## Original R-square = 0.5911578 
## 5 Fold Cross-Validated R-square = 0.5887036 
## Change = 0.002454223

Model diagnostic

par(mfrow=c(2,2))
plot(fit1)

Figure 7. Plot for model diagnostic with TL, and PQ as predictors.

shapiro.test(fit1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit1$residuals
## W = 0.94753, p-value < 2.2e-16

Draw the scatter plot of observed values vs Predicted values to evaluate the model quality

Figure 8. Observed values vs Predicted values From the Figure 8, it can be seen that the model quality is quite good.

3.4. Model with only TL as predictor

fit2 <- lm(SPPR~TL, data=df)
summary(fit2)
## 
## Call:
## lm(formula = SPPR ~ TL, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6605 -0.4275 -0.0422  0.3549  3.9703 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.05445    0.07710  -13.68   <2e-16 ***
## TL           1.07992    0.02231   48.41   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6335 on 1949 degrees of freedom
## Multiple R-squared:  0.5459, Adjusted R-squared:  0.5457 
## F-statistic:  2343 on 1 and 1949 DF,  p-value: < 2.2e-16

Cross-validation

shrinkage(fit2, n.predictors=1, k=5)
## Original R-square = 0.5459474 
## 5 Fold Cross-Validated R-square = 0.5451893 
## Change = 0.0007580116

Model diagnostic with TL as only predictor

par(mfrow=c(2,2))
plot(fit2)

Figure 9. Model diagnostic with only TL as an predictor Again, the normal ditribution of residual is violated.

Comparing two models with AIC criteria

AIC (fit1, fit2)
##      df      AIC
## fit1  4 3556.800
## fit2  3 3759.428
anova(fit2, fit1)
## Analysis of Variance Table
## 
## Model 1: SPPR ~ TL
## Model 2: SPPR ~ TL + PQ
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   1949 782.17                                  
## 2   1948 704.29  1    77.881 215.41 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Actually, model with 2 predictors (TL and PQ) is better in terms of adjusted R square and AIC.

Now we check the relative importance of TL and PQ in the model.

par(mfrow=c(1,1))
relweights(fit1)

##      Weights
## PQ  8.288716
## TL 91.711284

Figure 10. Relative importance of predictor variables

As showed in Figure 10, the total amount of variance accounted for by the model (R-square=0.574) has been divided among the predictor variables. TL accounts for more than 92% of R-square, whereas PQ contributes only 8%. Hence, adding PQ as a predictor does not make the model much better. To keep the model simply, we use only TL as an only continuous predictor.

Two research questions are rised (1) At a given trophic level, is SPPR dependent on the ecosystem type? (2) Is the effect of TL on SPPR dependent on Ecosystem type?

3.5. System/group specific model construction

I build a full model for 3 predictors: trophic level (TL), Ecosystem type (Ecosystem.type), and (group of) species (TYPE) \[\log_{10}{SPPR}=\beta_0 + \beta_1 \times~TL + \beta_2 \times~Ecosystem.type + \beta_3 \times~TL \times~Ecosystem.type\]

fit3 <- lm(SPPR ~ TL*Ecosystem.type, data=df)
anova(fit3)
## Analysis of Variance Table
## 
## Response: SPPR
##                     Df Sum Sq Mean Sq   F value    Pr(>F)    
## TL                   1 940.47  940.47 2402.4999 < 2.2e-16 ***
## Ecosystem.type       7  16.81    2.40    6.1339 4.123e-07 ***
## TL:Ecosystem.type    7   7.90    1.13    2.8814  0.005376 ** 
## Residuals         1935 757.46    0.39                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It is clear that the effect of TL on mean of \(\log_{10}{SPPR}\) depends on the Ecosystem.type.

dfx <- within(df, Ecosystem.type <- relevel(Ecosystem.type, ref = 7))
fit4 <- lm(SPPR~TL*Ecosystem.type, data=dfx)
#fit4 <- lm(SPPR~TL*C(Ecosystem.type, base=7), data=df) # 7 is open ocean
summary(fit4)
## 
## Call:
## lm(formula = SPPR ~ TL * Ecosystem.type, data = dfx)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4287 -0.4240 -0.0384  0.3443  3.7301 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                        -1.588041   0.194031  -8.184  4.9e-16
## TL                                  1.246925   0.052828  23.604  < 2e-16
## Ecosystem.typeBay/Fjord             0.913856   0.341566   2.675 0.007525
## Ecosystem.typeChanel/Strait         0.603574   0.339567   1.777 0.075646
## Ecosystem.typeCoastal lagoon        0.949550   0.641753   1.480 0.139138
## Ecosystem.typeContinental shelf     0.537682   0.221447   2.428 0.015271
## Ecosystem.typeCoral reef           -0.007704   0.701640  -0.011 0.991240
## Ecosystem.typeEstuary               0.952157   0.589956   1.614 0.106702
## Ecosystem.typeUpwelling             0.976985   0.305374   3.199 0.001400
## TL:Ecosystem.typeBay/Fjord         -0.313449   0.100105  -3.131 0.001767
## TL:Ecosystem.typeChanel/Strait     -0.229041   0.097393  -2.352 0.018787
## TL:Ecosystem.typeCoastal lagoon    -0.187208   0.213444  -0.877 0.380550
## TL:Ecosystem.typeContinental shelf -0.170875   0.061267  -2.789 0.005338
## TL:Ecosystem.typeCoral reef         0.160842   0.217316   0.740 0.459313
## TL:Ecosystem.typeEstuary           -0.340761   0.179179  -1.902 0.057346
## TL:Ecosystem.typeUpwelling         -0.288004   0.086147  -3.343 0.000844
##                                       
## (Intercept)                        ***
## TL                                 ***
## Ecosystem.typeBay/Fjord            ** 
## Ecosystem.typeChanel/Strait        .  
## Ecosystem.typeCoastal lagoon          
## Ecosystem.typeContinental shelf    *  
## Ecosystem.typeCoral reef              
## Ecosystem.typeEstuary                 
## Ecosystem.typeUpwelling            ** 
## TL:Ecosystem.typeBay/Fjord         ** 
## TL:Ecosystem.typeChanel/Strait     *  
## TL:Ecosystem.typeCoastal lagoon       
## TL:Ecosystem.typeContinental shelf ** 
## TL:Ecosystem.typeCoral reef           
## TL:Ecosystem.typeEstuary           .  
## TL:Ecosystem.typeUpwelling         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6257 on 1935 degrees of freedom
## Multiple R-squared:  0.5603, Adjusted R-squared:  0.5569 
## F-statistic: 164.4 on 15 and 1935 DF,  p-value: < 2.2e-16
contrasts(df$Ecosystem.type)
##                   Chanel/Strait Coastal lagoon Continental shelf
## Bay/Fjord                     0              0                 0
## Chanel/Strait                 1              0                 0
## Coastal lagoon                0              1                 0
## Continental shelf             0              0                 1
## Coral reef                    0              0                 0
## Estuary                       0              0                 0
## Open ocean                    0              0                 0
## Upwelling                     0              0                 0
##                   Coral reef Estuary Open ocean Upwelling
## Bay/Fjord                  0       0          0         0
## Chanel/Strait              0       0          0         0
## Coastal lagoon             0       0          0         0
## Continental shelf          0       0          0         0
## Coral reef                 1       0          0         0
## Estuary                    0       1          0         0
## Open ocean                 0       0          1         0
## Upwelling                  0       0          0         1
anova(fit4)
## Analysis of Variance Table
## 
## Response: SPPR
##                     Df Sum Sq Mean Sq   F value    Pr(>F)    
## TL                   1 940.47  940.47 2402.4999 < 2.2e-16 ***
## Ecosystem.type       7  16.81    2.40    6.1339 4.123e-07 ***
## TL:Ecosystem.type    7   7.90    1.13    2.8814  0.005376 ** 
## Residuals         1935 757.46    0.39                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(fit4)

Figure 11. Model diagnostic of model with TL and ecosystem type as predictors

The linear regression equation for different group of species will be as follow:

For open ocean (reference group): \[\log_{10}{SPPR}=-1.59+1.25 \times~ TL\]

For Bay/Fjord: \[\log_{10}{SPPR}=-0.67 +0.93 \times~ TL\]

For Chanel/Strait: \[\log_{10}{SPPR}=-0.98 +1.02 \times~TL\]

For Coastal lagoon: \[\log_{10}{SPPR}=-0.64 +1.06 \times~ TL\]

For Continental shelf: \[\log_{10}{SPPR}=-1.05 +1.08 \times~ TL\]

For Coral reef: \[\log_{10}{SPPR}=-1.6 +1.41 \times~ TL\]

For Estuary: \[\log_{10}{SPPR}=-0.64 +0.91 \times~ TL\]

For Upwelling: \[\log_{10}{SPPR}=-0.61 +0.96 \times~ TL\]

Summary the intercept and slope for different ecosystem.

name <- as.character(unlist(fit4$xlevels))
x <- as.numeric(fit4$coefficients)
x[3:9] <- x[1] + x[3:9]
x[10:16] <- x[2] + x[10:16]
Intercept <- x[c(1, 3:9)]
Slope <- x[c(2, 10:16)]
output <- data.frame(Intercept, Slope)
rownames(output) <- name
output$Ecosystem.type <- as.factor(name)
output$TL2 <- output[,1]+output[,2]*2
output$TL3 <- output[,1]+output[,2]*3
output$TL4 <- output[,1]+output[,2]*4
output$TL5 <- output[,1]+output[,2]*5
datatable(round(output[,-3], digits=2))
p <- ggplot(data=df, aes(x=TL, y=SPPR, colour=Ecosystem.type))+
  geom_point()+
  geom_abline(aes(slope=Slope, intercept=Intercept, colour=Ecosystem.type), data=output)+
  geom_smooth(method="lm", se=FALSE, colour="black")+
  theme(legend.position="bottom")+
  scale_color_discrete("")+
  scale_y_continuous(expression(log[10](SPPR)))+
  scale_x_continuous("Trophic level (TL)")+
  geom_text(data=eq, aes(x=2, y=6, label=V1), parse=TRUE, inherit.aes=FALSE, hjust=0, vjust=0) + facet_wrap(~Ecosystem.type, ncol=4)
p

Figure 12. Relationship between \(\log_{10}{SPPR}\) and TL in different ecosystem type.

However, it is good to check the contribution of ecosystem type. In fact, the contribution of Ecosystem type is very limited.

library(relaimpo)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:DAAG':
## 
##     hills
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: boot
## 
## Attaching package: 'boot'
## The following object is masked from 'package:car':
## 
##     logit
## The following object is masked from 'package:survival':
## 
##     aml
## The following object is masked from 'package:lattice':
## 
##     melanoma
## The following object is masked from 'package:psych':
## 
##     logit
## Loading required package: survey
## 
## Attaching package: 'survey'
## The following object is masked from 'package:Hmisc':
## 
##     deff
## The following object is masked from 'package:graphics':
## 
##     dotchart
## Loading required package: mitools
## This is the global version of package relaimpo.
## If you are a non-US user, a version with the interesting additional metric pmvd is available
## from Ulrike Groempings web site at prof.beuth-hochschule.de/groemping.
x <-calc.relimp(fit4, type=c("lmg"), rela=TRUE)
x <- data.frame(x$lmg)
colnames(x) <- "Relative contribution"

3. Model validation

In this section I used independent datasets of some economically important species.

library(hydroGOF)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
val <- function(fit=list(fit1, fit2, fit3), dfnew){
    RMSE <- NULL
    for (i in 1:3){
        pred<- predict(fit[[i]], newdata=dfnew)
        RMSE[i] <- rmse(dfnew$SPPR, pred)
    }
    return(RMSE)
}

val_cod <- round(val(dfnew=df_cod), digits = 2)
val_herring <- round(val(dfnew=df_herring), digits = 2)
val_capelin <- round(val(dfnew=df_capelin), digits = 2)
val_plaice <- round(val(dfnew=df_plaice), digits = 2)
val_haddock <- round(val(dfnew = df_haddock), digits = 2)
val_halibut <- round(val(dfnew = df_halibut), digits = 2)
val_mackerel <- round(val(dfnew = df_mackerel), digits = 2)

df_val <- cbind(val_cod, val_herring, val_capelin, val_plaice, val_haddock, val_halibut, val_mackerel)
colnames(df_val) <- c("Atlantic cod", "Herring", "Capelin", "American plaice", "Haddock", "Greenland halibut", "Atlantic mackerel")
rownames(df_val) <- c("log10(SPPR = 1.18*TL - 1.64*PQ -0.05*TE- 0.44",
                      "log10(SPPR = 1.12*TL - 1.809*PQ - 0.872",
                      "log10(SPPR)= 1.15*TL = 1.34")

datatable (df_val)

Back-transformation to original data

back_trans <- function(fit, dfnew){
dfnew$fitted.values <- fit$coefficients[1]+ fit$coefficients[2]*dfnew$TL + fit$coefficients[3]*dfnew$PQ + fit$coefficients[4]*dfnew$TE

dfnew$fitted.values <- 10^(dfnew$fitted.values)
dfnew$SPPR <- 10^dfnew$SPPR
e <- NULL
for (i in 1:nrow(dfnew)){
    e[i] <- (dfnew$fitted.values[i]-dfnew$SPPR[i])^2
}
e <- sum(e)
RMSE <- sqrt(sum(e)/nrow(dfnew))
return(list(RMSE=RMSE, e=e))
}

Paper outline

Now we will check the correlation of SPPR with other parameters.

df_cor <- df_top[,which(colnames(df_top) %in% c("SPPR", "TL", "TE", "PB", "QB", "PQ","NPP"))]
pairs(df_cor, lower.panel=panel.smooth,upper.panel = panel.cor)

In fact, SPPR has strong correlation with TL and to lower extent with PB and QB. However, PB and QB has a strong correlation as well. Therefore, I tried to build a linear regression for SPPR with 4 posible predictor: TL, PB, QB and type of species (Sciencetific.name).

library(MASS)
fit <- lm(SPPR~ TL + PQ, data=df_top)
summary(fit)
## 
## Call:
## lm(formula = SPPR ~ TL + PQ, data = df_top)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9594 -0.2737 -0.0279  0.2625  1.2467 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.38498    0.28513  -1.350    0.179    
## TL           0.91310    0.07984  11.437  < 2e-16 ***
## PQ          -2.07356    0.38529  -5.382 2.85e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3938 on 147 degrees of freedom
## Multiple R-squared:  0.4828, Adjusted R-squared:  0.4757 
## F-statistic: 68.61 on 2 and 147 DF,  p-value: < 2.2e-16
stepAIC(fit, direction="backward")
## Start:  AIC=-276.59
## SPPR ~ TL + PQ
## 
##        Df Sum of Sq    RSS     AIC
## <none>              22.799 -276.59
## - PQ    1    4.4923 27.292 -251.61
## - TL    1   20.2872 43.086 -183.11
## 
## Call:
## lm(formula = SPPR ~ TL + PQ, data = df_top)
## 
## Coefficients:
## (Intercept)           TL           PQ  
##     -0.3850       0.9131      -2.0736
vif(fit)       # vif() function from package car to detect multicolinearity``` Nay VIF>=4 will be noticed
##     TL     PQ 
## 1.0705 1.0705
relweights(fit)

##     Weights
## PQ 12.89012
## TL 87.10988
par(mfrow=c(2,2))
plot(fit)

shapiro.test(fit$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit$residuals
## W = 0.9834, p-value = 0.06772
shrinkage(fit, n.predictors=2, k=5)
## Original R-square = 0.4827816 
## 5 Fold Cross-Validated R-square = 0.4504037 
## Change = 0.03237789
cv.lm(data=df_top, fit, m=2)
## Analysis of Variance Table
## 
## Response: SPPR
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## TL          1  16.79   16.79     108 < 2e-16 ***
## PQ          1   4.49    4.49      29 2.8e-07 ***
## Residuals 147  22.80    0.16                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Warning in cv.lm(data = df_top, fit, m = 2): 
## 
##  As there is >1 explanatory variable, cross-validation
##  predicted values for a fold are not a linear function
##  of corresponding overall predicted values.  Lines that
##  are shown for the different folds are approximate
## 
## fold 1 
## Observations in test set: 75 
##                405    406    407   408   409    432   510     511   578
## Predicted    2.739  3.330  3.070 2.621 2.408  2.046 2.908  2.7468 2.355
## cvpred       2.775  3.436  3.148 2.635 2.427  2.023 2.969  2.7535 2.377
## SPPR         2.464  3.041  2.451 2.930 3.270  1.632 3.141  2.6743 3.060
## CV residual -0.311 -0.395 -0.697 0.295 0.843 -0.391 0.172 -0.0792 0.683
##               582    602    650   682    749   756   876   877   879
## Predicted   2.324 2.4328  2.246 2.404 2.2880 2.334 2.661 2.587 3.427
## cvpred      2.288 2.4153  2.213 2.396 2.2591 2.297 2.646 2.563 3.491
## SPPR        2.984 2.4421  2.067 2.689 2.3059 2.653 2.819 2.697 3.635
## CV residual 0.696 0.0267 -0.146 0.293 0.0468 0.356 0.173 0.134 0.144
##                 959    960   961    964   1017   1045    1047  1048   1102
## Predicted    2.8923  2.600 2.382  2.967  2.237  2.531  2.3961 3.142  2.072
## cvpred       2.9624  2.640 2.399  3.048  2.211  2.574  2.4236 3.227  2.047
## SPPR         2.9105  2.355 3.035  2.478  1.830  2.322  2.3766 3.583  1.674
## CV residual -0.0519 -0.285 0.636 -0.569 -0.381 -0.252 -0.0471 0.357 -0.373
##               1105   1182   1183   1185    1187 1197   1250   1253  1451
## Predicted    2.097  2.775  2.775  1.954  2.2924 2.46 2.3087  2.448 2.227
## cvpred       2.076  2.827  2.827  1.904  2.2606 2.45 2.2614  2.427 2.199
## SPPR         1.699  2.707  2.707  1.283  2.2474 2.74 2.3247  2.252 2.611
## CV residual -0.376 -0.121 -0.121 -0.621 -0.0131 0.29 0.0632 -0.175 0.412
##                1456  2000  2002    2003  2004  2005   2391 2445   2446
## Predicted    2.7370  3.11 2.335  2.5501  3.18  2.95 2.9852 2.19  3.059
## cvpred       2.7546  3.16 2.347  2.5568  3.27  3.01 3.0488 2.20  3.104
## SPPR         2.6852  2.96 2.990  2.4632  2.90  2.35 3.1014 2.35  2.976
## CV residual -0.0694 -0.20 0.642 -0.0936 -0.37 -0.66 0.0527 0.16 -0.128
##               2449  3590  3724   3725   3841    3856    3860     3972
## Predicted   2.8553 2.077 2.054  2.874  1.959  2.1576  2.4818  2.37266
## cvpred      2.9356 2.028 2.013  2.935  1.924  2.1455  2.5045  2.36477
## SPPR        3.0348 2.360 2.427  2.749  1.589  2.1252  2.4701  2.35918
## CV residual 0.0992 0.332 0.413 -0.186 -0.335 -0.0203 -0.0344 -0.00559
##              4004  4068   4096   4188   4191   4315   4317   4333   4362
## Predicted   2.596 2.642  2.118  2.789  2.242  2.422 2.7423 2.4327 2.9205
## cvpred      2.576 2.694  2.071  2.851  2.225  2.442 2.8167 2.4806 3.0084
## SPPR        3.467 3.047  1.587  2.697  2.122  2.276 2.8365 2.5084 3.0308
## CV residual 0.891 0.353 -0.484 -0.154 -0.103 -0.166 0.0198 0.0278 0.0223
##              4363   4365    4377   4378   4625   4629   4637   4640   4667
## Predicted    2.42  2.818  2.6723  2.303  3.591 2.3542 3.0061  3.085  2.975
## cvpred       2.49  2.887  2.7197  2.358  3.682 2.3950 3.0763  3.170  3.057
## SPPR         2.05  2.319  2.6972  2.035  3.449 2.4913 3.1265  2.959  2.533
## CV residual -0.44 -0.568 -0.0225 -0.323 -0.233 0.0963 0.0502 -0.211 -0.525
##               4668   4670   4813  5140
## Predicted    3.035  2.523  2.204 1.740
## cvpred       3.089  2.507  2.167 1.708
## SPPR         2.836  2.060  1.750 2.324
## CV residual -0.253 -0.448 -0.417 0.616
## 
## Sum of squares = 9.68    Mean square = 0.13    n = 75 
## 
## fold 2 
## Observations in test set: 75 
##              404   514   579   583    652    657  677   678   681   701
## Predicted   2.81 2.626 2.373 1.991  2.358  2.358 2.75 2.359 2.311 2.429
## cvpred      2.74 2.632 2.402 2.045  2.400  2.400 2.70 2.351 2.350 2.457
## SPPR        2.78 3.074 2.619 2.921  2.216  2.216 3.99 2.663 2.639 2.686
## CV residual 0.04 0.442 0.218 0.876 -0.183 -0.183 1.29 0.312 0.289 0.229
##               751  874    875   878    932   935    962    963   1020
## Predicted   2.334 2.85  2.725 3.564  2.345  2.36 2.6414  3.336  2.341
## cvpred      2.382 2.81  2.730 3.495  2.386  2.40 2.6342  3.241  2.378
## SPPR        2.653 2.99  2.593 4.113  1.827  1.81 2.6910  2.967  1.954
## CV residual 0.271 0.18 -0.138 0.618 -0.559 -0.59 0.0568 -0.274 -0.423
##               1044  1046   1049  1129  1130  1131    1132 1139   1184
## Predicted    2.926 2.383  3.198 3.010 2.784  2.57  2.7080 1.69  1.954
## cvpred       2.863 2.375  3.138 2.975 2.760  2.51  2.6650 1.76  2.012
## SPPR         2.668 3.093  2.793 3.714 3.165  2.16  2.6041 2.76  1.283
## CV residual -0.195 0.717 -0.344 0.738 0.405 -0.35 -0.0609 1.00 -0.728
##                1186  1196    1251   1252   1455   2001  2017   2058   2061
## Predicted    2.2924 2.458  2.3087  2.448  2.177  2.755 2.898  1.907  2.226
## cvpred       2.3336 2.479  2.3655  2.479  2.210  2.746 2.891  1.954  2.276
## SPPR         2.2474 2.737  2.3247  2.252  1.844  2.605 3.321  1.403  1.775
## CV residual -0.0862 0.257 -0.0408 -0.228 -0.366 -0.141 0.429 -0.551 -0.501
##              2361 2390  2392  2393  2394   2416    2447     2448   3701
## Predicted   2.100 2.15 3.157 2.701 2.935  1.946  3.0673  2.47614  2.175
## cvpred      2.123 2.19 3.099 2.673 2.885  1.979  3.0322  2.47738  2.223
## SPPR        2.346 2.69 3.422 2.843 3.282  1.863  2.9694  2.46783  1.645
## CV residual 0.223 0.50 0.323 0.169 0.397 -0.116 -0.0629 -0.00955 -0.577
##              3723   3726  3727   3870  3973   4001   4067  4099   4111
## Predicted   2.181  2.303 2.321 2.8035 2.711  2.118 2.3732 2.631 2.6497
## cvpred      2.215  2.327 2.331 2.7695 2.677  2.173 2.3816 2.657 2.6380
## SPPR        2.570  2.119 2.847 2.8411 3.165  1.632 2.3983 2.928 2.6510
## CV residual 0.355 -0.208 0.516 0.0716 0.489 -0.542 0.0167 0.271 0.0129
##               4189  4190  4192   4319  4370   4379   4380  4624   4630
## Predicted   2.5320 2.894 1.816  2.373  2.59  2.406  2.190 2.656  3.263
## cvpred      2.5632 2.873 1.834  2.375  2.58  2.433  2.238 2.593  3.169
## SPPR        2.6208 2.995 2.149  2.002  1.95  1.908  2.038 2.751  2.772
## CV residual 0.0576 0.121 0.316 -0.373 -0.63 -0.525 -0.199 0.158 -0.396
##              4639   4643   4669  4728  4790   4791  4797    4798   4810
## Predicted   3.304  2.452  2.372  2.05 3.151  2.429 2.654 2.30347  2.595
## cvpred      3.171  2.414  2.390  2.10 3.063  2.369 2.617 2.25781  2.580
## SPPR        3.348  2.205  1.980  1.09 3.391  2.185 3.329 2.26226  1.983
## CV residual 0.176 -0.209 -0.411 -1.01 0.328 -0.184 0.712 0.00445 -0.597
##               4811   4812
## Predicted    2.136  2.479
## cvpred       2.173  2.508
## SPPR         1.992  2.079
## CV residual -0.181 -0.429
## 
## Sum of squares = 14.1    Mean square = 0.19    n = 75 
## 
## Overall (Sum over all 75 folds) 
##    ms 
## 0.158