shrinkage function for cross validation.
relweights function for calculating relative importance of predictors
panel.cor function to put the correlation coefficients and p-values in the upper panel of pairs.
# load the data for ENA indices
#setwd("~/Dropbox/Xu ly so lieu")
#load("/Users/LDA/Dropbox/Xu ly so lieu/ENA.rda")
setwd("E:/Dropbox/Xu ly so lieu")
load("ENA.rda")
load("data for analysis.rda")df$SPPR <- log10(df$SPPR)
df <- df[-which(df$ID %in% c(54)),]
df <- df[-which(df$TYPE %in% c("P", "M", "S", "T", "B", "BA")),] # eliminate values of producers, mammal, shark, turtle, bird, bacteria, and zooplankton
nrow(df)## [1] 3208
We only consider the harvested (group of) species in the analysis. We removed all data of species which are not harvested intentionally and by catch as well.
idx <- which(df$landing!=0 | df$discard!=0) # get only harvested species
df <- df[idx,]
df$TYPE <- as.factor(as.character(df$TYPE))Prepare the training data set and validation data set. Validation data set will consitute of some economically important species which are presented in the food web models as a single species compartment. There are 7 species which are presented in at least 15 models.
Next, data frame for each of 7 important species are created. These data can be then used to validate the model and make an analysis at the species level.
df_top <- df_ENA[which(df_ENA$Scientific.name %in% count$Scientific.name),]
count <- as.data.frame(count(df_top,Scientific.name))
df_top$SPPR <- log10(df_top$SPPR)
temp <- which(is.na(df_top$PB)) # get the row of data that PB is not available
df_top$PB[temp] <- df_top$PQ[temp]*df_top$QB[temp]
df_herring <- subset(df_top, Scientific.name=="Clupea harengus")
df_cod <- subset(df_top, Scientific.name=="Gadus morhua")
df_capelin <- subset(df_top, Scientific.name=="Mallotus villosus")
df_plaice <- subset(df_top, Scientific.name =="Hippoglossoides platessoides")
df_halibut <- subset(df_top, Scientific.name =="Reinhardtius hippoglossoides")
df_mackerel <- subset(df_top, Scientific.name == "Scomber scombrus")
df_haddock <- subset(df_top, Scientific.name=="Melanogrammus aeglefinus")Now create a training data set that does not include the above species
df$Group <- as.character(df$Group)
df_top$Group <- as.character(df_top$Group)
# get rows of validation data set in df
idx <- NULL
system.time(for (i in 1:nrow(df)){
for (j in 1:nrow(df_top)){
if (df$ID[i]==df_top$ID[j] & df$Group[i]==df_top$Group[j]){
x <- which(df$ID== df$ID[j] & df$Group==df$Group[j])
idx <- c(idx, x)
}
}
})## user system elapsed
## 12.97 0.01 12.99
# remove validation data from df
df_cal <- df[-idx,]
df.backup <- df
df <- df_calTable 1: Data summary of all ecosystem type
Make boxplot for distribution of \(\log_{10}\text(SPPR)\) in all ecosystem types. In this case, all groups are grouped by ecosystem type.
Figure 1. Boxplot of log10(SPPR) in different ecosystem types
Table 2. Summary of ecological parameters
Make boxplot for distribution of PB and PQ in all ecosystem types
Figure 2. Boxplot for PQ of different species
Figure 3. Boxplot of QB for different species group. Black dots indicate the mean values
df_cor <- df[,which(colnames(df) %in% c("SPPR", "TL", "TE", "PB", "QB", "PQ","NPP", "FCI", "TST"))]
pairs(df_cor, lower.panel=panel.smooth,upper.panel = panel.cor)#scatterplotMatrix(df_cor, spread=FALSE, smoother.args=list(lty=2))Figure 4. Correlation matrix of \(\log_{10}\text(SPPR)\) versus predictors.
\(\log_{10}{SPPR}\) is possitvely correlated with Trophic level(TL) (r=0.74) and negatively correlated with growth efficiency (PQ - r=-0.23), production rate (PB - r=-0.22), but at lower level. Interestingly, \(\log_{10}{SPPR}\) is correlated with indices associated with (group of) species (i.e., TL, PQ, PB), rather than indices characterize the overall ecosystem structure and functioning (i.e., TE, FCI, TST). Therefore we can test the regression models with predictors are TL, PQ, PB
ggplot(data=df, aes(x=TL, y=SPPR, colour=Ecosystem.type))+
geom_point()+
geom_smooth(method=lm)+
scale_y_continuous("log10(SPPR)")Figure 5. Scatter plot of \(\log_{10}\text(SPPR)\) versus Trophic level for different ecosystem type
In this part, I used the all subset model technique to find which model is the best in terms of adjusted R square.
leaps <-regsubsets(SPPR ~ TL+PQ+PB, data=df, nbest=5)
plot(leaps, scale="adjr2") # model with TL and PQ or TL + PQ + PB give the best R2Figure 6. Result for all subset model regression.
From Figure 6, it is clear that model with TL and PQ as predictor will give highest adjusted R square than only TL. Model with 3 predictors: TL, PQ and PB is just as good as model with two predictors of TL and PQ.
fit1 <- lm(SPPR~TL + PQ, data = df)
summary(fit1)##
## Call:
## lm(formula = SPPR ~ TL + PQ, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8102 -0.3902 -0.0750 0.3116 3.7781
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.67855 0.07753 -8.752 <2e-16 ***
## TL 1.07269 0.02118 50.647 <2e-16 ***
## PQ -1.94441 0.13248 -14.677 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6013 on 1948 degrees of freedom
## Multiple R-squared: 0.5912, Adjusted R-squared: 0.5907
## F-statistic: 1408 on 2 and 1948 DF, p-value: < 2.2e-16
Cross-validation
shrinkage(fit1, n.predictors=2, k=5)## Original R-square = 0.5911578
## 5 Fold Cross-Validated R-square = 0.5887036
## Change = 0.002454223
Model diagnostic
par(mfrow=c(2,2))
plot(fit1)Figure 7. Plot for model diagnostic with TL, and PQ as predictors.
shapiro.test(fit1$residuals)##
## Shapiro-Wilk normality test
##
## data: fit1$residuals
## W = 0.94753, p-value < 2.2e-16
Draw the scatter plot of observed values vs Predicted values to evaluate the model quality
Figure 8. Observed values vs Predicted values From the Figure 8, it can be seen that the model quality is quite good.
fit2 <- lm(SPPR~TL, data=df)
summary(fit2)##
## Call:
## lm(formula = SPPR ~ TL, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6605 -0.4275 -0.0422 0.3549 3.9703
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.05445 0.07710 -13.68 <2e-16 ***
## TL 1.07992 0.02231 48.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6335 on 1949 degrees of freedom
## Multiple R-squared: 0.5459, Adjusted R-squared: 0.5457
## F-statistic: 2343 on 1 and 1949 DF, p-value: < 2.2e-16
Cross-validation
shrinkage(fit2, n.predictors=1, k=5)## Original R-square = 0.5459474
## 5 Fold Cross-Validated R-square = 0.5451893
## Change = 0.0007580116
Model diagnostic with TL as only predictor
par(mfrow=c(2,2))
plot(fit2)Figure 9. Model diagnostic with only TL as an predictor Again, the normal ditribution of residual is violated.
Comparing two models with AIC criteria
AIC (fit1, fit2)## df AIC
## fit1 4 3556.800
## fit2 3 3759.428
anova(fit2, fit1)## Analysis of Variance Table
##
## Model 1: SPPR ~ TL
## Model 2: SPPR ~ TL + PQ
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1949 782.17
## 2 1948 704.29 1 77.881 215.41 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Actually, model with 2 predictors (TL and PQ) is better in terms of adjusted R square and AIC.
Now we check the relative importance of TL and PQ in the model.
par(mfrow=c(1,1))
relweights(fit1)## Weights
## PQ 8.288716
## TL 91.711284
Figure 10. Relative importance of predictor variables
As showed in Figure 10, the total amount of variance accounted for by the model (R-square=0.574) has been divided among the predictor variables. TL accounts for more than 92% of R-square, whereas PQ contributes only 8%. Hence, adding PQ as a predictor does not make the model much better. To keep the model simply, we use only TL as an only continuous predictor.
Two research questions are rised (1) At a given trophic level, is SPPR dependent on the ecosystem type? (2) Is the effect of TL on SPPR dependent on Ecosystem type?
I build a full model for 3 predictors: trophic level (TL), Ecosystem type (Ecosystem.type), and (group of) species (TYPE) \[\log_{10}{SPPR}=\beta_0 + \beta_1 \times~TL + \beta_2 \times~Ecosystem.type + \beta_3 \times~TL \times~Ecosystem.type\]
fit3 <- lm(SPPR ~ TL*Ecosystem.type, data=df)
anova(fit3)## Analysis of Variance Table
##
## Response: SPPR
## Df Sum Sq Mean Sq F value Pr(>F)
## TL 1 940.47 940.47 2402.4999 < 2.2e-16 ***
## Ecosystem.type 7 16.81 2.40 6.1339 4.123e-07 ***
## TL:Ecosystem.type 7 7.90 1.13 2.8814 0.005376 **
## Residuals 1935 757.46 0.39
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It is clear that the effect of TL on mean of \(\log_{10}{SPPR}\) depends on the Ecosystem.type.
dfx <- within(df, Ecosystem.type <- relevel(Ecosystem.type, ref = 7))
fit4 <- lm(SPPR~TL*Ecosystem.type, data=dfx)
#fit4 <- lm(SPPR~TL*C(Ecosystem.type, base=7), data=df) # 7 is open ocean
summary(fit4)##
## Call:
## lm(formula = SPPR ~ TL * Ecosystem.type, data = dfx)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4287 -0.4240 -0.0384 0.3443 3.7301
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.588041 0.194031 -8.184 4.9e-16
## TL 1.246925 0.052828 23.604 < 2e-16
## Ecosystem.typeBay/Fjord 0.913856 0.341566 2.675 0.007525
## Ecosystem.typeChanel/Strait 0.603574 0.339567 1.777 0.075646
## Ecosystem.typeCoastal lagoon 0.949550 0.641753 1.480 0.139138
## Ecosystem.typeContinental shelf 0.537682 0.221447 2.428 0.015271
## Ecosystem.typeCoral reef -0.007704 0.701640 -0.011 0.991240
## Ecosystem.typeEstuary 0.952157 0.589956 1.614 0.106702
## Ecosystem.typeUpwelling 0.976985 0.305374 3.199 0.001400
## TL:Ecosystem.typeBay/Fjord -0.313449 0.100105 -3.131 0.001767
## TL:Ecosystem.typeChanel/Strait -0.229041 0.097393 -2.352 0.018787
## TL:Ecosystem.typeCoastal lagoon -0.187208 0.213444 -0.877 0.380550
## TL:Ecosystem.typeContinental shelf -0.170875 0.061267 -2.789 0.005338
## TL:Ecosystem.typeCoral reef 0.160842 0.217316 0.740 0.459313
## TL:Ecosystem.typeEstuary -0.340761 0.179179 -1.902 0.057346
## TL:Ecosystem.typeUpwelling -0.288004 0.086147 -3.343 0.000844
##
## (Intercept) ***
## TL ***
## Ecosystem.typeBay/Fjord **
## Ecosystem.typeChanel/Strait .
## Ecosystem.typeCoastal lagoon
## Ecosystem.typeContinental shelf *
## Ecosystem.typeCoral reef
## Ecosystem.typeEstuary
## Ecosystem.typeUpwelling **
## TL:Ecosystem.typeBay/Fjord **
## TL:Ecosystem.typeChanel/Strait *
## TL:Ecosystem.typeCoastal lagoon
## TL:Ecosystem.typeContinental shelf **
## TL:Ecosystem.typeCoral reef
## TL:Ecosystem.typeEstuary .
## TL:Ecosystem.typeUpwelling ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6257 on 1935 degrees of freedom
## Multiple R-squared: 0.5603, Adjusted R-squared: 0.5569
## F-statistic: 164.4 on 15 and 1935 DF, p-value: < 2.2e-16
contrasts(df$Ecosystem.type)## Chanel/Strait Coastal lagoon Continental shelf
## Bay/Fjord 0 0 0
## Chanel/Strait 1 0 0
## Coastal lagoon 0 1 0
## Continental shelf 0 0 1
## Coral reef 0 0 0
## Estuary 0 0 0
## Open ocean 0 0 0
## Upwelling 0 0 0
## Coral reef Estuary Open ocean Upwelling
## Bay/Fjord 0 0 0 0
## Chanel/Strait 0 0 0 0
## Coastal lagoon 0 0 0 0
## Continental shelf 0 0 0 0
## Coral reef 1 0 0 0
## Estuary 0 1 0 0
## Open ocean 0 0 1 0
## Upwelling 0 0 0 1
anova(fit4)## Analysis of Variance Table
##
## Response: SPPR
## Df Sum Sq Mean Sq F value Pr(>F)
## TL 1 940.47 940.47 2402.4999 < 2.2e-16 ***
## Ecosystem.type 7 16.81 2.40 6.1339 4.123e-07 ***
## TL:Ecosystem.type 7 7.90 1.13 2.8814 0.005376 **
## Residuals 1935 757.46 0.39
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(fit4) Figure 11. Model diagnostic of model with TL and ecosystem type as predictors
The linear regression equation for different group of species will be as follow:
For open ocean (reference group): \[\log_{10}{SPPR}=-1.59+1.25 \times~ TL\]
For Bay/Fjord: \[\log_{10}{SPPR}=-0.67 +0.93 \times~ TL\]
For Chanel/Strait: \[\log_{10}{SPPR}=-0.98 +1.02 \times~TL\]
For Coastal lagoon: \[\log_{10}{SPPR}=-0.64 +1.06 \times~ TL\]
For Continental shelf: \[\log_{10}{SPPR}=-1.05 +1.08 \times~ TL\]
For Coral reef: \[\log_{10}{SPPR}=-1.6 +1.41 \times~ TL\]
For Estuary: \[\log_{10}{SPPR}=-0.64 +0.91 \times~ TL\]
For Upwelling: \[\log_{10}{SPPR}=-0.61 +0.96 \times~ TL\]
Summary the intercept and slope for different ecosystem.
name <- as.character(unlist(fit4$xlevels))
x <- as.numeric(fit4$coefficients)
x[3:9] <- x[1] + x[3:9]
x[10:16] <- x[2] + x[10:16]
Intercept <- x[c(1, 3:9)]
Slope <- x[c(2, 10:16)]
output <- data.frame(Intercept, Slope)
rownames(output) <- name
output$Ecosystem.type <- as.factor(name)output$TL2 <- output[,1]+output[,2]*2
output$TL3 <- output[,1]+output[,2]*3
output$TL4 <- output[,1]+output[,2]*4
output$TL5 <- output[,1]+output[,2]*5
datatable(round(output[,-3], digits=2))p <- ggplot(data=df, aes(x=TL, y=SPPR, colour=Ecosystem.type))+
geom_point()+
geom_abline(aes(slope=Slope, intercept=Intercept, colour=Ecosystem.type), data=output)+
geom_smooth(method="lm", se=FALSE, colour="black")+
theme(legend.position="bottom")+
scale_color_discrete("")+
scale_y_continuous(expression(log[10](SPPR)))+
scale_x_continuous("Trophic level (TL)")+
geom_text(data=eq, aes(x=2, y=6, label=V1), parse=TRUE, inherit.aes=FALSE, hjust=0, vjust=0) + facet_wrap(~Ecosystem.type, ncol=4)
pFigure 12. Relationship between \(\log_{10}{SPPR}\) and TL in different ecosystem type.
However, it is good to check the contribution of ecosystem type. In fact, the contribution of Ecosystem type is very limited.
library(relaimpo)## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:DAAG':
##
## hills
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: boot
##
## Attaching package: 'boot'
## The following object is masked from 'package:car':
##
## logit
## The following object is masked from 'package:survival':
##
## aml
## The following object is masked from 'package:lattice':
##
## melanoma
## The following object is masked from 'package:psych':
##
## logit
## Loading required package: survey
##
## Attaching package: 'survey'
## The following object is masked from 'package:Hmisc':
##
## deff
## The following object is masked from 'package:graphics':
##
## dotchart
## Loading required package: mitools
## This is the global version of package relaimpo.
## If you are a non-US user, a version with the interesting additional metric pmvd is available
## from Ulrike Groempings web site at prof.beuth-hochschule.de/groemping.
x <-calc.relimp(fit4, type=c("lmg"), rela=TRUE)
x <- data.frame(x$lmg)
colnames(x) <- "Relative contribution"In this section I used independent datasets of some economically important species.
library(hydroGOF)## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
val <- function(fit=list(fit1, fit2, fit3), dfnew){
RMSE <- NULL
for (i in 1:3){
pred<- predict(fit[[i]], newdata=dfnew)
RMSE[i] <- rmse(dfnew$SPPR, pred)
}
return(RMSE)
}
val_cod <- round(val(dfnew=df_cod), digits = 2)
val_herring <- round(val(dfnew=df_herring), digits = 2)
val_capelin <- round(val(dfnew=df_capelin), digits = 2)
val_plaice <- round(val(dfnew=df_plaice), digits = 2)
val_haddock <- round(val(dfnew = df_haddock), digits = 2)
val_halibut <- round(val(dfnew = df_halibut), digits = 2)
val_mackerel <- round(val(dfnew = df_mackerel), digits = 2)
df_val <- cbind(val_cod, val_herring, val_capelin, val_plaice, val_haddock, val_halibut, val_mackerel)
colnames(df_val) <- c("Atlantic cod", "Herring", "Capelin", "American plaice", "Haddock", "Greenland halibut", "Atlantic mackerel")
rownames(df_val) <- c("log10(SPPR = 1.18*TL - 1.64*PQ -0.05*TE- 0.44",
"log10(SPPR = 1.12*TL - 1.809*PQ - 0.872",
"log10(SPPR)= 1.15*TL = 1.34")
datatable (df_val)Back-transformation to original data
back_trans <- function(fit, dfnew){
dfnew$fitted.values <- fit$coefficients[1]+ fit$coefficients[2]*dfnew$TL + fit$coefficients[3]*dfnew$PQ + fit$coefficients[4]*dfnew$TE
dfnew$fitted.values <- 10^(dfnew$fitted.values)
dfnew$SPPR <- 10^dfnew$SPPR
e <- NULL
for (i in 1:nrow(dfnew)){
e[i] <- (dfnew$fitted.values[i]-dfnew$SPPR[i])^2
}
e <- sum(e)
RMSE <- sqrt(sum(e)/nrow(dfnew))
return(list(RMSE=RMSE, e=e))
}Now we will check the correlation of SPPR with other parameters.
df_cor <- df_top[,which(colnames(df_top) %in% c("SPPR", "TL", "TE", "PB", "QB", "PQ","NPP"))]
pairs(df_cor, lower.panel=panel.smooth,upper.panel = panel.cor)In fact, SPPR has strong correlation with TL and to lower extent with PB and QB. However, PB and QB has a strong correlation as well. Therefore, I tried to build a linear regression for SPPR with 4 posible predictor: TL, PB, QB and type of species (Sciencetific.name).
library(MASS)
fit <- lm(SPPR~ TL + PQ, data=df_top)
summary(fit)##
## Call:
## lm(formula = SPPR ~ TL + PQ, data = df_top)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9594 -0.2737 -0.0279 0.2625 1.2467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.38498 0.28513 -1.350 0.179
## TL 0.91310 0.07984 11.437 < 2e-16 ***
## PQ -2.07356 0.38529 -5.382 2.85e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3938 on 147 degrees of freedom
## Multiple R-squared: 0.4828, Adjusted R-squared: 0.4757
## F-statistic: 68.61 on 2 and 147 DF, p-value: < 2.2e-16
stepAIC(fit, direction="backward")## Start: AIC=-276.59
## SPPR ~ TL + PQ
##
## Df Sum of Sq RSS AIC
## <none> 22.799 -276.59
## - PQ 1 4.4923 27.292 -251.61
## - TL 1 20.2872 43.086 -183.11
##
## Call:
## lm(formula = SPPR ~ TL + PQ, data = df_top)
##
## Coefficients:
## (Intercept) TL PQ
## -0.3850 0.9131 -2.0736
vif(fit) # vif() function from package car to detect multicolinearity``` Nay VIF>=4 will be noticed## TL PQ
## 1.0705 1.0705
relweights(fit)## Weights
## PQ 12.89012
## TL 87.10988
par(mfrow=c(2,2))
plot(fit)shapiro.test(fit$residuals)##
## Shapiro-Wilk normality test
##
## data: fit$residuals
## W = 0.9834, p-value = 0.06772
shrinkage(fit, n.predictors=2, k=5)## Original R-square = 0.4827816
## 5 Fold Cross-Validated R-square = 0.4504037
## Change = 0.03237789
cv.lm(data=df_top, fit, m=2)## Analysis of Variance Table
##
## Response: SPPR
## Df Sum Sq Mean Sq F value Pr(>F)
## TL 1 16.79 16.79 108 < 2e-16 ***
## PQ 1 4.49 4.49 29 2.8e-07 ***
## Residuals 147 22.80 0.16
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Warning in cv.lm(data = df_top, fit, m = 2):
##
## As there is >1 explanatory variable, cross-validation
## predicted values for a fold are not a linear function
## of corresponding overall predicted values. Lines that
## are shown for the different folds are approximate
##
## fold 1
## Observations in test set: 75
## 405 406 407 408 409 432 510 511 578
## Predicted 2.739 3.330 3.070 2.621 2.408 2.046 2.908 2.7468 2.355
## cvpred 2.775 3.436 3.148 2.635 2.427 2.023 2.969 2.7535 2.377
## SPPR 2.464 3.041 2.451 2.930 3.270 1.632 3.141 2.6743 3.060
## CV residual -0.311 -0.395 -0.697 0.295 0.843 -0.391 0.172 -0.0792 0.683
## 582 602 650 682 749 756 876 877 879
## Predicted 2.324 2.4328 2.246 2.404 2.2880 2.334 2.661 2.587 3.427
## cvpred 2.288 2.4153 2.213 2.396 2.2591 2.297 2.646 2.563 3.491
## SPPR 2.984 2.4421 2.067 2.689 2.3059 2.653 2.819 2.697 3.635
## CV residual 0.696 0.0267 -0.146 0.293 0.0468 0.356 0.173 0.134 0.144
## 959 960 961 964 1017 1045 1047 1048 1102
## Predicted 2.8923 2.600 2.382 2.967 2.237 2.531 2.3961 3.142 2.072
## cvpred 2.9624 2.640 2.399 3.048 2.211 2.574 2.4236 3.227 2.047
## SPPR 2.9105 2.355 3.035 2.478 1.830 2.322 2.3766 3.583 1.674
## CV residual -0.0519 -0.285 0.636 -0.569 -0.381 -0.252 -0.0471 0.357 -0.373
## 1105 1182 1183 1185 1187 1197 1250 1253 1451
## Predicted 2.097 2.775 2.775 1.954 2.2924 2.46 2.3087 2.448 2.227
## cvpred 2.076 2.827 2.827 1.904 2.2606 2.45 2.2614 2.427 2.199
## SPPR 1.699 2.707 2.707 1.283 2.2474 2.74 2.3247 2.252 2.611
## CV residual -0.376 -0.121 -0.121 -0.621 -0.0131 0.29 0.0632 -0.175 0.412
## 1456 2000 2002 2003 2004 2005 2391 2445 2446
## Predicted 2.7370 3.11 2.335 2.5501 3.18 2.95 2.9852 2.19 3.059
## cvpred 2.7546 3.16 2.347 2.5568 3.27 3.01 3.0488 2.20 3.104
## SPPR 2.6852 2.96 2.990 2.4632 2.90 2.35 3.1014 2.35 2.976
## CV residual -0.0694 -0.20 0.642 -0.0936 -0.37 -0.66 0.0527 0.16 -0.128
## 2449 3590 3724 3725 3841 3856 3860 3972
## Predicted 2.8553 2.077 2.054 2.874 1.959 2.1576 2.4818 2.37266
## cvpred 2.9356 2.028 2.013 2.935 1.924 2.1455 2.5045 2.36477
## SPPR 3.0348 2.360 2.427 2.749 1.589 2.1252 2.4701 2.35918
## CV residual 0.0992 0.332 0.413 -0.186 -0.335 -0.0203 -0.0344 -0.00559
## 4004 4068 4096 4188 4191 4315 4317 4333 4362
## Predicted 2.596 2.642 2.118 2.789 2.242 2.422 2.7423 2.4327 2.9205
## cvpred 2.576 2.694 2.071 2.851 2.225 2.442 2.8167 2.4806 3.0084
## SPPR 3.467 3.047 1.587 2.697 2.122 2.276 2.8365 2.5084 3.0308
## CV residual 0.891 0.353 -0.484 -0.154 -0.103 -0.166 0.0198 0.0278 0.0223
## 4363 4365 4377 4378 4625 4629 4637 4640 4667
## Predicted 2.42 2.818 2.6723 2.303 3.591 2.3542 3.0061 3.085 2.975
## cvpred 2.49 2.887 2.7197 2.358 3.682 2.3950 3.0763 3.170 3.057
## SPPR 2.05 2.319 2.6972 2.035 3.449 2.4913 3.1265 2.959 2.533
## CV residual -0.44 -0.568 -0.0225 -0.323 -0.233 0.0963 0.0502 -0.211 -0.525
## 4668 4670 4813 5140
## Predicted 3.035 2.523 2.204 1.740
## cvpred 3.089 2.507 2.167 1.708
## SPPR 2.836 2.060 1.750 2.324
## CV residual -0.253 -0.448 -0.417 0.616
##
## Sum of squares = 9.68 Mean square = 0.13 n = 75
##
## fold 2
## Observations in test set: 75
## 404 514 579 583 652 657 677 678 681 701
## Predicted 2.81 2.626 2.373 1.991 2.358 2.358 2.75 2.359 2.311 2.429
## cvpred 2.74 2.632 2.402 2.045 2.400 2.400 2.70 2.351 2.350 2.457
## SPPR 2.78 3.074 2.619 2.921 2.216 2.216 3.99 2.663 2.639 2.686
## CV residual 0.04 0.442 0.218 0.876 -0.183 -0.183 1.29 0.312 0.289 0.229
## 751 874 875 878 932 935 962 963 1020
## Predicted 2.334 2.85 2.725 3.564 2.345 2.36 2.6414 3.336 2.341
## cvpred 2.382 2.81 2.730 3.495 2.386 2.40 2.6342 3.241 2.378
## SPPR 2.653 2.99 2.593 4.113 1.827 1.81 2.6910 2.967 1.954
## CV residual 0.271 0.18 -0.138 0.618 -0.559 -0.59 0.0568 -0.274 -0.423
## 1044 1046 1049 1129 1130 1131 1132 1139 1184
## Predicted 2.926 2.383 3.198 3.010 2.784 2.57 2.7080 1.69 1.954
## cvpred 2.863 2.375 3.138 2.975 2.760 2.51 2.6650 1.76 2.012
## SPPR 2.668 3.093 2.793 3.714 3.165 2.16 2.6041 2.76 1.283
## CV residual -0.195 0.717 -0.344 0.738 0.405 -0.35 -0.0609 1.00 -0.728
## 1186 1196 1251 1252 1455 2001 2017 2058 2061
## Predicted 2.2924 2.458 2.3087 2.448 2.177 2.755 2.898 1.907 2.226
## cvpred 2.3336 2.479 2.3655 2.479 2.210 2.746 2.891 1.954 2.276
## SPPR 2.2474 2.737 2.3247 2.252 1.844 2.605 3.321 1.403 1.775
## CV residual -0.0862 0.257 -0.0408 -0.228 -0.366 -0.141 0.429 -0.551 -0.501
## 2361 2390 2392 2393 2394 2416 2447 2448 3701
## Predicted 2.100 2.15 3.157 2.701 2.935 1.946 3.0673 2.47614 2.175
## cvpred 2.123 2.19 3.099 2.673 2.885 1.979 3.0322 2.47738 2.223
## SPPR 2.346 2.69 3.422 2.843 3.282 1.863 2.9694 2.46783 1.645
## CV residual 0.223 0.50 0.323 0.169 0.397 -0.116 -0.0629 -0.00955 -0.577
## 3723 3726 3727 3870 3973 4001 4067 4099 4111
## Predicted 2.181 2.303 2.321 2.8035 2.711 2.118 2.3732 2.631 2.6497
## cvpred 2.215 2.327 2.331 2.7695 2.677 2.173 2.3816 2.657 2.6380
## SPPR 2.570 2.119 2.847 2.8411 3.165 1.632 2.3983 2.928 2.6510
## CV residual 0.355 -0.208 0.516 0.0716 0.489 -0.542 0.0167 0.271 0.0129
## 4189 4190 4192 4319 4370 4379 4380 4624 4630
## Predicted 2.5320 2.894 1.816 2.373 2.59 2.406 2.190 2.656 3.263
## cvpred 2.5632 2.873 1.834 2.375 2.58 2.433 2.238 2.593 3.169
## SPPR 2.6208 2.995 2.149 2.002 1.95 1.908 2.038 2.751 2.772
## CV residual 0.0576 0.121 0.316 -0.373 -0.63 -0.525 -0.199 0.158 -0.396
## 4639 4643 4669 4728 4790 4791 4797 4798 4810
## Predicted 3.304 2.452 2.372 2.05 3.151 2.429 2.654 2.30347 2.595
## cvpred 3.171 2.414 2.390 2.10 3.063 2.369 2.617 2.25781 2.580
## SPPR 3.348 2.205 1.980 1.09 3.391 2.185 3.329 2.26226 1.983
## CV residual 0.176 -0.209 -0.411 -1.01 0.328 -0.184 0.712 0.00445 -0.597
## 4811 4812
## Predicted 2.136 2.479
## cvpred 2.173 2.508
## SPPR 1.992 2.079
## CV residual -0.181 -0.429
##
## Sum of squares = 14.1 Mean square = 0.19 n = 75
##
## Overall (Sum over all 75 folds)
## ms
## 0.158