Goal - explore relationships between shipboard measurements, grouped by depth (0-75m, 75-150m, 150-300m)
Using assumptions and code by P Legendre (https://cran.r-project.org/web/packages/lmodel2/vignettes/mod2user.pdf)
If the error variance on the response var (y) is more than 3x that on x, use OLS –> it is not
Are data approx bivariate normal? –> can be made so by log10 transforming
For bivariate normal data, if both variables are expressed in same physical units (untransformed) or are dimensionless (e.g. log transformed), & can assume the error variances are approx equal, use Majory Axis (MA) Regression –> error variances are likely not approximately equal across variables
If MA cannot be used bc variables are expressed in diff units, or the error variances differ, use either RMA or SMA:
RMA if it can be reasonably assumed that the error variance of y/ the sample variance of y = the error variance of x/ the sample variance of x (i.e. error variance is proportional to corresponding variable)
SMA first test sig of correlation coefficient to see if sig
–> USING RMA FOR KM DATA SET (using log10 transformed data)
data frame with objects in rows and variables in columns
Loading data frame of depth normalized integrated shipboard measurements
Log10 transforming data and combine non-transformed and transformed dataframes
df_all_stations <- data.frame(read.csv("integrated_depthnorm_for_stats_all_stations.csv", header=1, row.names=1)) # all stations
df_all_stations_log <- as.data.frame(lapply(df_all_stations[7:23], function(x) log10(x))) # log10 transform values in df
## Warning in FUN(X[[i]], ...): NaNs produced
## Warning in FUN(X[[i]], ...): NaNs produced
colnames(df_all_stations_log) <- paste(colnames(df_all_stations_log), "log.10", sep = ".") # rename col names with log.10
comb_df <- cbind(df_all_stations, df_all_stations_log)
is.na(comb_df$NPP.umolC.l.d.CG.log.10)<- sapply(comb_df$NPP.umolC.l.d.CG.log.10, is.infinite) # remove infinite values
is.na(comb_df$BC.c.cell.CG.log.10)<- sapply(comb_df$BC.c.cell.CG.log.10, is.infinite) # remove infinite values
Using Model II Regression code developed by Legendre 2018 (https://cran.r-project.org/web/packages/lmodel2/lmodel2.pdf)
lmodel2(formula, data = NULL, range.y=NULL, range.x=NULL, nperm=0) HAVE TO REMOVE INF VALUES
control shift m for %>%
# Upper Euphotic Zone 0-75 m
df_0to75 <- filter(comb_df,
depth.id == "0to75") # filter data between 0-75m
## Warning: package 'bindrcpp' was built under R version 3.4.4
lmodel2(mix_lyr_dpth_p02_m.log.10 ~ NPP.umolC.l.d.CG.log.10, data = df_0to75, range.y = "interval", range.x = "interval", nperm = 99)
##
## Model II regression
##
## Call: lmodel2(formula = mix_lyr_dpth_p02_m.log.10 ~
## NPP.umolC.l.d.CG.log.10, data = df_0to75, range.y = "interval",
## range.x = "interval", nperm = 99)
##
## n = 38 r = -0.5392758 r-square = 0.2908184
## Parametric P-values: 2-tailed = 0.0004764964 1-tailed = 0.0002382482
## Angle between the two OLS regression lines = 31.5245 degrees
##
## Permutation tests of OLS, MA, RMA slopes: 1-tailed, tail corresponding to sign
## A permutation test of r is equivalent to a permutation test of the OLS slope
## P-perm for SMA = NA because the SMA slope cannot be tested
##
## Regression results
## Method Intercept Slope Angle (degrees) P-perm (1-tailed)
## 1 OLS 1.2041847 -0.7863234 -38.17880 0.01
## 2 MA 0.5503920 -1.9459710 -62.80217 0.01
## 3 SMA 0.8254410 -1.4581100 -55.55692 NA
## 4 RMA 0.7889324 -1.5228663 -56.70884 0.01
##
## Confidence intervals
## Method 2.5%-Intercept 97.5%-Intercept 2.5%-Slope 97.5%-Slope
## 1 OLS 0.9549874 1.4533820 -1.201379 -0.3712682
## 2 MA -0.4853216 0.9657338 -3.783041 -1.2092694
## 3 SMA 0.5587828 1.0267871 -1.931088 -1.1009777
## 4 RMA 0.1186261 1.1504335 -2.711804 -0.8816632
##
## Eigenvalues: 0.1107516 0.02608502
##
## H statistic used for computing C.I. of MA: 0.0460459
# Lower Euphotic Zone 75-150 m
#df_75to150 <- filter(comb_df,
#depth.id == "75to150") # filter data between 0-75m
#lmodel2(pp.um.c.l.d.log.10 ~ bp.umol.c.l.d.log.10, data = df_75to150, range.y = "interval", range.x = "interval", nperm = 99)
# Upper Mesopelagic 150-300 m
#df_150to300 <- filter(comb_df,
#depth.id == "150to300") # filter data between 150-300
#lmodel2(BC.c.cell.log.10 ~ poc.um.c.l.log.10, data = df_150to300, range.y = "interval", range.x = "interval", nperm = 99)