This example uses average CPUE (Catch Per Unit Effort) for canary rockfish off the U.S. West Coast.
Catch per unit effort is an indirect measure of the abundance of a target species: Catch = CPUE * Effort, where Catch is the total catch in an area, CPUE is the catch per unit effort, and Effort is the number of boat-days of fishing in that same area. Therefore, CPUE = Catch (for any given area) / Boat-Days (within the same area).
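As a quick illustration of that arithmetic (the numbers below are hypothetical, not taken from the survey):
total_catch_kg <- 500                      # hypothetical total catch in an area
effort_boat_days <- 25                     # hypothetical boat-days of effort in the same area
cpue <- total_catch_kg / effort_boat_days  # CPUE = Catch / Effort = 20 kg per boat-day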
See more information on geostatistical_delta-GLMM here: https://github.com/nwfsc-assess/geostatistical_delta-GLMM
See this paper for more information:
Geostatistical delta-generalized linear mixed models improve precision for estimated abundance indices for West Coast groundfishes http://icesjms.oxfordjournals.org/content/early/2015/01/13/icesjms.fsu243.abstract
# Pulling in the Data
data(WCGBTS_Canary_example) # this loads the example data set from the SpatialDeltaGLMM package
# from the U.S. West Coast Groundfish Bottom Trawl Survey (WCGBTS)
# As far as I can tell, this data() call is telling R to prepare to grab attributes
# of the data.frame WCGBTS_Canary_example;
# however, the data isn't actually loaded yet.
# As soon as you call on the data or query it in any way --
# even just asking is.data.frame(WCGBTS_Canary_example) --
# it gets loaded in as a data object.
CPUE <- WCGBTS_Canary_example$HAUL_WT_KG # Extracting element of canary rockfish data.
# The data.frame "WCGBTS_Canary_example" is now loaded into memory.
# This creates a vector of values taken from WCGBTS_Canary_example -- although it's being named CPUE,
# we are actually pulling the haul catch weight in kilograms, presumably treated as catch per one unit of effort (one haul).
# Visualize the canary rockfish data
par( mar=c(3,3,2,0), mgp=c(2,0.5,0), tck=-0.02) # Setting formatting for the histogram
hist(log(1+CPUE))
# Why add 1? Because log(0) is -Inf and the log of any number between 0 and 1 is negative.
# By adding 1, every value maps to a finite, non-negative log value.
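A quick numerical check (my own addition) of why the +1 shift matters:
log(0)      # -Inf: a zero-catch haul can't be log-transformed directly
log(1 + 0)  # 0: with the +1 shift the same haul maps cleanly onto the histogram
min(CPUE)   # the smallest haul weight; hauls with zero catch are why the shift is needed here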
Now that we have a data set, we can ask how likely any given data point is under a particular distribution. Any given distribution has parameters that describe its shape and location, so more specifically, we are comparing the data points to a distribution defined by a specific set of parameter values.
One way to do this is with R’s dnorm() function, which evaluates data points against a normal distribution. Calling dnorm() returns the probability density of any given data point under a normal distribution that you specify; if left unspecified, it assumes a standard normal distribution with mean 0 and standard deviation 1.
More on dnorm:
http://www.cyclismo.org/tutorial/R/probability.html
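As a quick sanity check of those defaults (my own addition), the height of the standard normal density at 0 is 1/sqrt(2*pi):
dnorm(0)          # density of 0 under the default Normal(0, 1)
1 / sqrt(2 * pi)  # the same value, ~0.3989, computed from the normal density formula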
individual_likelihoods <- dnorm(CPUE, mean=20, sd=2) # density of each CPUE value under a Normal(mean=20, sd=2)
# We can visualize this by plotting the likelihoods against the actual data points themselves:
plot(CPUE,individual_likelihoods, main="How likely any individual point is to have been \n taken from a normal distribution with mean 20, SD 2")
What about the entire data set’s likelihood of being drawn from a distribution generated based on those parameters? This is where the total log-likelihood comes in. For an explanation of why we use log-likelihood instead of pure likelihood, check this out:
http://math.stackexchange.com/questions/892832/why-we-consider-log-likelihood-instead-of-likelihood-in-gaussian-distribution
http://www.unc.edu/courses/2007spring/enst/562/001/docs/lectures/lecture15.htm#loglikelihood
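A quick numerical illustration of why (my own addition): the product of thousands of tiny densities underflows to zero in floating point, so the raw likelihood is numerically useless here, while the log-scale version stays finite.
prod(dnorm(CPUE, mean=20, sd=2))            # underflows to 0 -- the raw likelihood can't be represented
sum(log(dnorm(CPUE, mean=20, sd=2)))        # -Inf, because some individual densities already underflow to 0
sum(dnorm(CPUE, mean=20, sd=2, log=TRUE))   # finite -- computing on the log scale from the start avoids both problems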
log_likelihood<-sum(dnorm(CPUE, mean=20, sd=2, log=TRUE)) # Log-likelihood for all data
cat("The log likelihood of of this data set with the given parameters is ",log_likelihood)
## The log likelihood of of this data set with the given parameters is -4607459
What’s going on here? We are comparing the CPUE data set to a normal distribution with mean 20 and SD 2. We are also log-transforming those likelihoods (the log=TRUE argument). Summing them all gives the total log-likelihood.
Long story short, we want to maximize the log-likelihood that our data set comes from a distribution with the parameters we have specified.
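For example (my own check, using the same dnorm() approach as above), parameters closer to the data’s own mean and SD give a higher, i.e. less negative, total log-likelihood:
sum(dnorm(CPUE, mean=20, sd=2, log=TRUE))                  # the guess used above
sum(dnorm(CPUE, mean=mean(CPUE), sd=sd(CPUE), log=TRUE))   # parameters taken from the data -- much less negative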
How do we do this? We can take a variety of different approaches.
######### Method 1 -- Pre-made functions in R, like the linear model functions
Lm <- lm(CPUE ~ 1)
summary(Lm)
##
## Call:
## lm(formula = CPUE ~ 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9 -2.9 -2.9 -2.9 4939.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8549 0.7724 3.696 0.00022 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 67.37 on 7607 degrees of freedom
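As a quick check (my own addition), the intercept-only lm() is really just estimating the sample mean of CPUE, and its residual standard error is the sample standard deviation:
mean(CPUE)  # should match the lm() intercept estimate (~2.85)
sd(CPUE)    # should match the residual standard error (~67.4)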
######### Method 2 -- Write a negative log-likelihood function by hand and minimize it with optim()
# Step 1 -- define function that calculates the negative log-likelihood.
NegLogLike_Fn <- function(Par, Data){
## Parameters
# (Note here that the object "Par" given to the function is expected to have 2 elements-- the mean hat,
# and the SD hat, in that order)
Mean_hat <- Par[1]
SD_hat <- Par[2]
## Log-likelihood
# Note that this is basically doing the same thing as we saw above -- just then multiplying it by negative 1.
LogLike_i <- dnorm(Data$y, mean=Mean_hat, sd=SD_hat, log=TRUE) # This creates a vector of the log-likelihoods for each data point
NegLogLike <- -1 * sum(LogLike_i) # multiplies the sum of those data points' log-likelihoods by -1
# Why negative? because the optim() function naturally minimizes,
# not maximizes, the output of the function.
return(NegLogLike)
}
# Step 2 -- minimize negative log-likelihood to estimate parameters
Data <- list( 'y'=CPUE ) # This is defining a named list where the only object in it is "y", containing the CPUE values
Start = c(1,1) # This is the starting point for the parameters (mean, SD)
# Let's see what the negative log-likelihood is at these starting values
starting_NegLogLike<-NegLogLike_Fn(Par=Start, Data=Data)
print(starting_NegLogLike)
## [1] 17282659
# Now, we can use the optim() function in R to try to find better parameters to describe the data.
Opt <- optim(par=Start, fn=NegLogLike_Fn, Data=Data, lower=c(0.01,0.01), upper=Inf, method="L-BFGS-B", hessian=TRUE)
# What does this do?
# optim() looks for parameters (starting at the start values) that minimize the output of the
# NegLogLike_Fn, based on the Data. There is a lower bound on the parameters of 0.01, and no upper
# bound (upper=Inf). The method ("L-BFGS-B") allows for these lower and upper boundaries. hessian=TRUE
# returns the Hessian matrix at the solution.
print(Opt$par) # Estimated parameters
## [1] 2.854405 67.364745
print(sqrt(diag(solve(Opt$hessian)))) # This takes the Hessian matrix and inverts it (solve() returns the matrix inverse),
## [1] 0.7723203 0.5461132
# then takes the square root of its diagonal -- these are the approximate standard errors of the estimates.
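The same calculation spelled out step by step (just a restatement of the line above, with my own intermediate names):
Cov_hat <- solve(Opt$hessian)   # invert the Hessian to get the approximate covariance matrix of the estimates
SE_hat <- sqrt(diag(Cov_hat))   # square roots of the diagonal are the approximate standard errors
SE_hat                          # the first element (~0.77) matches the intercept SE reported by lm() above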
###### Method 3 -- Optimize using TMB
# Step 1 -- make and compile template file
compile( "linear_model_v1.cpp" ) # compile() is a function within the TMB library
## Note: Using Makevars in C:/Users/stubbsrw/Documents/.R/Makevars
## [1] 0
# this compiles the code to create the .dll and .o files!
# You'll know it worked if the .dll file was created; the .o file
# is just an intermediate object file and can be ignored.
# (On Macs, it will create a .so file rather than a .dll "dynamically linked library")
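A quick way to check that the compile step worked (a sketch of my own; TMB's dynlib() just appends the right shared-library extension for your operating system):
file.exists(dynlib("linear_model_v1"))  # TRUE if the compiled .dll/.so file is sitting in the working directory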
# Step 2 -- build inputs and object
dyn.load(dynlib("linear_model_v1"))
# What's going on here? You're loading the "Dynamically Loaded Library" (.dll)
# file you created in the compile step. For more info on this function:
# https://stat.ethz.ch/R-manual/R-devel/library/base/html/dynload.html
Okay, so you’ve now loaded in the compiled linear_model_v1.cpp file -- but what is actually in that C++ file? What are the aims of the C++ file to begin with? We actually call this code through TMB’s main function, MakeADFun(). According to the TMB documentation, the purpose of MakeADFun() is to “Construct objective functions with derivatives based on the users C++ template.” Let’s break this down.
An “Objective function”
The function f is called, variously, an objective function, a loss function or cost function (minimization), a utility function or fitness function (maximization), or, in certain fields, an energy function or energy functional. A feasible solution that minimizes (or maximizes, if that is the goal) the objective function is called an optimal solution. An objective function is either a loss function or its negative (sometimes called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized.
(From Wikipedia: https://en.wikipedia.org/wiki/Mathematical_optimization, https://en.wikipedia.org/wiki/Loss_function)
“With Derivatives”
Okay, so here’s where we get deep into the weeds. TMB uses a “generalized newton optimizer” for the inner optimization problem, which is a “gradient based optimizer”. I’m still just beginning to scratch the surface of this, but more on gradient based methods can be found in this doc on Python’s scipy libraries. http://www.scipy-lectures.org/advanced/mathematical_optimization/#gradient-based-methods
From my understanding, the “gradient” is the vector of first derivatives of the function: it gives the direction and rate of change at your current point, and following it downhill moves you toward a minimum (possibly stopping at a local minimum rather than the global one). So, intuitively, the gradient is about the rate of change, while the second derivatives (the Hessian, mentioned below) describe how that rate of change itself changes.
Gradient-based methods take steps in the direction of the negative gradient -- the direction of steepest descent. Basically, the algorithm looks for the fastest way to get where it wants to go (the minimum!) by taking each step where it will make the most gains.
Newton methods use the first two derivatives of the loss function: the gradient and the Hessian. The whole “partial derivatives” and vector calculus thing is pretty much unknown to me, but this web page seems to have some info on it: http://www.lehman.edu/faculty/dgaranin/Mathematical_Physics/Mathematical_physics-04-Partial_derivatives_and_vector_calculus.pdf
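To make the gradient idea concrete, here is a toy gradient-descent sketch of my own (an illustration only, not how TMB’s inner optimizer is actually implemented):
f  <- function(x) (x - 3)^2             # a simple loss function with its minimum at x = 3
df <- function(x) 2 * (x - 3)           # its gradient (in one dimension, just the derivative)
x <- 0                                  # starting value
for(step in 1:25) x <- x - 0.1 * df(x)  # repeatedly step downhill along the negative gradient
x                                       # after a few dozen steps, x is very close to the minimum at 3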
//
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() () //apparently we just always put everything at and above this line
{
// Data
DATA_VECTOR( y_i ); // the all-caps DATA_VECTOR macro reads a vector named y_i from the data list passed in from R -- the names have to match.
// Parameters
PARAMETER( mean );
PARAMETER( log_sd );
// Objective function
Type sd = exp(log_sd);
Type jnll = 0; //jnll: joint negative log likelihood. Setting to zero to begin with.
int n_data = y_i.size(); // n_data is an integer; y_i.size() returns the length of the data vector.
// Probability of data conditional on fixed effect values
for( int i=0; i<n_data; i++){ //loop in C++
//note: the loop needs to start at 0 and use i<n_data, which stops one short of the length of y_i -- this is because
// R indexes from 1 and C++ indexes from 0.
jnll -= dnorm( y_i(i), mean, sd, true ); //-= means: each time through the loop, subtract this observation's log-likelihood (i.e., add its negative).
// also could be written as:
// jnll = jnll-dnorm( y_i(i), mean, sd, true );
}
// Reporting
return jnll;
}
So, basically, the CPP file defines the loss function in terms of the inputs you give it (named exactly as declared in your CPP code), and returns the quantity you want to minimize.
Now that we have defined what the CPP code is looking for, we can define those objects in R, and then call the CPP code using TMB’s MakeADFun.
Params <- list("mean"=0, "log_sd"=0)
Data <- list( "y_i"=CPUE )
Obj <- MakeADFun( data=Data, parameters=Params, DLL="linear_model_v1")
From the TMB documentation (https://cran.r-project.org/web/packages/TMB/TMB.pdf): A call to MakeADFun will return an object that, based on the users DLL code (specified through DLL), contains functions to calculate the objective function and its gradient. The object contains the following components:
- par A default parameter.
- fn The likelihood function.
- gr The gradient function.
- report A function to report all variables reported with the REPORT() macro in the user template.
- env Environment with access to all parts of the structure
What are the attributes you can access from the Obj created from the MakeAdFun? Note that right now, the function’s parameters have not been optimized– these are just the things you put into the function to begin with.
#Exploring the objects returned by the MakeADFun() call:
names(Obj)
## [1] "par" "fn" "gr" "he" "hessian" "method" "retape"
## [8] "env" "report"
Obj$par # The parameters you initially input into the Obj (Or the parameters after an optimization)
## mean log_sd
## 0 0
Obj$fn # This is the objective (negative log-likelihood) function -- as such, we can rename it, and then pass it parameters:
## function (x = last.par, ...)
## {
## if (tracepar) {
## cat("par:\n")
## print(x)
## }
## if (!validpar(x))
## return(NaN)
## res <- f(x, order = 0)
## if (!ADreport) {
## if (is.finite(res) && res < value.best) {
## last.par.best <<- x
## value.best <<- res
## }
## }
## res
## }
## <environment: 0x000000000d652a78>
TMB_likelihood_function<-Obj$fn
TMB_likelihood_function(Obj$par) # As such, we can pass it the current values for the parameters
## [1] 17300575
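As a sanity check (my own addition), this should match the hand-written negative log-likelihood from Method 2 evaluated at the same parameter values, mean = 0 and sd = exp(0) = 1:
NegLogLike_Fn(Par=c(0, exp(0)), Data=list(y=CPUE))  # should reproduce the value printed just above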
Ah, we return to gradients. What, exactly, is a gradient? http://betterexplained.com/articles/vector-calculus-understanding-the-gradient/
The gradient is a fancy word for derivative, or the rate of change of a function. It’s a vector (a direction to move) that
- Points in the direction of greatest increase of a function (intuition on why)
- Is zero at a local maximum or local minimum (because there is no single direction of increase)
Obj$gr # This is the gradient function-- you can also pass this the current parameters:
## function (x = last.par, ...)
## {
## ans <- f(x, order = 1)
## if (tracemgc)
## cat("outer mgc: ", max(abs(ans)), "\n")
## ans
## }
## <environment: 0x000000000d652a78>
Obj$gr(Obj$par)
## outer mgc: 34579560
## [,1] [,2]
## [1,] -21720.32 -34579560
# I'm not sure quite how to interpret this just yet -- presumably these two values are the partial derivatives
# of the negative log-likelihood with respect to mean and log_sd at the current parameter values.
MakeADFun() is meant to be used with one of R’s optimizers, like nlminb() -- which apparently is another optimization routine, though its documentation says, “See Also: optim() (which is preferred) and nlm.”
# Step 3 -- test and optimize
Opt <- nlminb(start=Obj$par, objective=Obj$fn, gradient=Obj$gr)
## outer mgc: 34579560
## outer mgc: 4673254
## outer mgc: 3417633
## outer mgc: 1449702
## outer mgc: 762249.9
## outer mgc: 366610.7
## outer mgc: 182475.2
## outer mgc: 88663.93
## outer mgc: 42640.91
## outer mgc: 19699.17
## outer mgc: 8506.38
## outer mgc: 3110.862
## outer mgc: 969.6104
## outer mgc: 152.9608
## outer mgc: 11.43912
## outer mgc: 0.134013
## outer mgc: 0.0003840815
# What did this create? Let's make a data.frame of the parameters and their estimates, as well as the final gradient of those objects.
Opt$diagnostics = data.frame("name"=names(Obj$par),
"Est"=Opt$par,
"final_gradient"=as.vector(Obj$gr(Opt$par))
)
## outer mgc: 0.0003840815
Opt$par # estimated parameters
## mean log_sd
## 2.854934 4.210122
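Because the template estimates log_sd rather than sd, back-transform it to compare with the Method 2 result (my own check):
exp(Opt$par["log_sd"])  # ~67.4, essentially the same SD that optim() estimated earlier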
SD <- sdreport(Obj) # standard errors of estimated parameters
## outer mgc: 0.0003840815
## outer mgc: 0.001680116
## outer mgc: 0.001672898
## outer mgc: 15.20041
## outer mgc: 15.23161
SD
## sdreport(.) result
## Estimate Std. Error
## mean 2.854934 0.772320069
## log_sd 4.210122 0.008106803
## Maximum gradient component: 0.0003840815
# Define a new design matrix
X <- cbind( "CA"=ifelse(WCGBTS_Canary_example$BEST_LAT_DD<42,1,0), # Assign 1 to california if below a certain latitude
"OR"=ifelse(WCGBTS_Canary_example$BEST_LAT_DD>=42&WCGBTS_Canary_example$BEST_LAT_DD<46,1,0), # Assign 1 to oregon if between WA and CA
"WA"=ifelse(WCGBTS_Canary_example$BEST_LAT_DD>=46,1,0), # Assign 1 to washington if above certain latitude
"Pass"=WCGBTS_Canary_example$PASS-1.5 ) # Not sure what "Pass" is
# What's in this, exactly?
head(X)
## CA OR WA Pass
## [1,] 0 0 1 0.5
## [2,] 0 0 1 0.5
## [3,] 0 0 1 0.5
## [4,] 0 0 1 0.5
## [5,] 0 0 1 0.5
## [6,] 0 0 1 0.5
# Step 1 -- make and compile template file
compile( "linear_model_v2.cpp" )
## Note: Using Makevars in C:/Users/stubbsrw/Documents/.R/Makevars
## [1] 0
What’s in it this time?
// Space time
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
// Data
DATA_VECTOR( y_i );
DATA_MATRIX( X_ij );
// Parameters
PARAMETER_VECTOR( b_j );
PARAMETER( log_sd );
// Objective function
Type sd = exp(log_sd);
Type jnll = 0;
int n_data = y_i.size();
// Linear predictor
vector<Type> linpred_i( n_data );
linpred_i = X_ij*b_j;
// Probability of data conditional on fixed effect values
for( int i=0; i<n_data; i++){
jnll -= dnorm( y_i(i), linpred_i(i), sd, true );
}
// Reporting
return jnll;
}
Pretty similar in structure to the one seen above -- the main difference is the input: there are now both a data vector (y_i) and a data matrix (X_ij), and the single mean is replaced by the linear predictor X_ij * b_j.
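In R terms, the linear predictor is just the matrix product of X and the coefficient vector. A small sketch with hypothetical coefficients (b_example is made up purely for illustration):
b_example <- c(1, 2, 3, 0.5)   # hypothetical b_j values, one per column of X (CA, OR, WA, Pass)
linpred <- X %*% b_example     # the same X_ij * b_j product the template computes
head(linpred)                  # one predicted mean per haul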
# Step 2 -- build inputs and object
dyn.load(dynlib("linear_model_v2")) # Loading in the CPP DLL file
Params <- list("b_j"=rep(0,ncol(X)), "log_sd"=0) # Creating two named elements in the list of parameters-- the beta coefficients, and the log_sd
Params
## $b_j
## [1] 0 0 0 0
##
## $log_sd
## [1] 0
Data = list( "y_i"=CPUE, "X_ij"=X ) # Adding the response varaible (CPUE), and the predictive data matrix (X) to one data object to be passed to the MakeADFun.
Obj <- MakeADFun( data=Data, parameters=Params, DLL="linear_model_v2")
# Step 3 -- test and optimize
# Before being optimized:
Obj$fn(Obj$par)
## [1] 17300575
Obj$gr(Obj$par)
## outer mgc: 34579560
## [,1] [,2] [,3] [,4] [,5]
## [1,] -2687.93 -4593.58 -14438.81 144.4485 -34579560
# Optimizing!
Opt <- nlminb(start=Obj$par, objective=Obj$fn, gradient=Obj$gr)
## outer mgc: 34579560
## outer mgc: 4673255
## outer mgc: 3417036
## outer mgc: 1455782
## outer mgc: 770140.3
## outer mgc: 372947.8
## outer mgc: 186133.6
## outer mgc: 90840.77
## outer mgc: 43833.32
## outer mgc: 20379.43
## outer mgc: 8849.096
## outer mgc: 3335.73
## outer mgc: 987.7975
## outer mgc: 146.4392
## outer mgc: 19.63495
## outer mgc: 47.13524
## outer mgc: 24.46349
## outer mgc: 4.848835
## outer mgc: 0.02530689
## outer mgc: 0.2588783
## outer mgc: 0.0937196
## outer mgc: 0.01149383
# Creating a diagnostics data.frame
Opt$diagnostics <- data.frame( "name"=names(Obj$par), "Est"=Opt$par, "final_gradient"=as.vector(Obj$gr(Opt$par)))
## outer mgc: 0.01149383
Opt$diagnostics
## name Est final_gradient
## 1 b_j 0.64790502 -2.082422e-05
## 2 b_j 2.11136582 -4.845455e-05
## 3 b_j 11.23758161 -6.904846e-05
## 4 b_j 0.06515283 -1.052066e-04
## 5 log_sd 4.20850083 -1.149383e-02
Opt$par # estimated parameters
## b_j b_j b_j b_j log_sd
## 0.64790502 2.11136582 11.23758161 0.06515283 4.20850083
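As a cross-check (my own addition, not part of the original walk-through), the same fixed-effects model can be fit with lm() using the design matrix X; the coefficient estimates should be essentially identical to the b_j values above:
Lm2 <- lm(CPUE ~ X - 1)     # "-1" drops lm's intercept, since the CA/OR/WA indicator columns already play that role
summary(Lm2)$coefficients   # the estimates should match the TMB b_j values above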
SD <- sdreport( Obj ) # standard errors
## outer mgc: 0.01149383
## outer mgc: 0.0114947
## outer mgc: 0.01149479
## outer mgc: 0.01149421
## outer mgc: 0.01149441
## outer mgc: 0.01149398
## outer mgc: 0.01149425
## outer mgc: 0.01149404
## outer mgc: 0.01149446
## outer mgc: 15.18932
## outer mgc: 15.24274