This example uses average CPUE (Catch Per Unit Effort) for canary rockfish off the U.S. West Coast.
Catch per unit effort is an indirect measure of the abundance of a target species: Catch = CPUE * Effort, where Catch is the total catch in an area, CPUE is the catch per unit effort, and Effort is the number of boat-days of fishing in that same area. Therefore, CPUE = Catch (for any given area) / Boat-Days (within the same area).
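As a quick illustration of that arithmetic (the numbers below are hypothetical, not taken from the survey):
total_catch_kg <- 500                      # hypothetical total catch in an area
effort_boat_days <- 25                     # hypothetical boat-days of effort in the same area
cpue <- total_catch_kg / effort_boat_days  # CPUE = Catch / Effort = 20 kg per boat-day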
See more information on geostatistical_delta-GLMM here: https://github.com/nwfsc-assess/geostatistical_delta-GLMM
See this paper for more information:
Geostatistical delta-generalized linear mixed models improve precision for estimated abundance indices for West Coast groundfishes http://icesjms.oxfordjournals.org/content/early/2015/01/13/icesjms.fsu243.abstract
# Pulling in the Data
data(WCGBTS_Canary_example) # this loads the example data set from the SpatialDeltaGLMM package
# from the U.S. West Coast Groundfish Bottom Trawl Survey (WCGBTS)
# As far as I can tell, this data() call is telling R to prepare to grab attributes
# of the data.frame WCGBTS_Canary_example;
# however, the data isn't actually loaded yet.
# As soon as you call on the data or query it in any way --
# even just asking is.data.frame(WCGBTS_Canary_example) --
# it gets loaded in as a data object.
CPUE <- WCGBTS_Canary_example$HAUL_WT_KG # Extracting element of canary rockfish data.
# The data.frame "WCGBTS_Canary_example" is now loaded into memory.
# This creates a vector of values taken from WCGBTS_Canary_example -- although it's being named CPUE,
# we are actually pulling the haul catch weight in kilograms, presumably treated as catch per one unit of effort (one haul).
# Visualize the canary rockfish data
par( mar=c(3,3,2,0), mgp=c(2,0.5,0), tck=-0.02) # Setting formatting for the histogram
hist(log(1+CPUE))
# Why add 1? Because log(0) is -Inf and the log of any number between 0 and 1 is negative.
# By adding 1, every value maps to a finite, non-negative log value.
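A quick numerical check (my own addition) of why the +1 shift matters:
log(0)      # -Inf: a zero-catch haul can't be log-transformed directly
log(1 + 0)  # 0: with the +1 shift the same haul maps cleanly onto the histogram
min(CPUE)   # the smallest haul weight; hauls with zero catch are why the shift is needed here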
Now that we have a data set, we can ask how likely any given data point is under a particular distribution. Any given distribution has parameters that describe its shape and location, so more specifically, we are comparing the data points to a distribution defined by a specific set of parameter values.
One way to do this is with R’s dnorm() function, which evaluates data points against a normal distribution. Calling dnorm() returns the probability density of any given data point under a normal distribution that you specify; if left unspecified, it assumes a standard normal distribution with mean 0 and standard deviation 1.
More on dnorm:
http://www.cyclismo.org/tutorial/R/probability.html
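As a quick sanity check of those defaults (my own addition), the height of the standard normal density at 0 is 1/sqrt(2*pi):
dnorm(0)          # density of 0 under the default Normal(0, 1)
1 / sqrt(2 * pi)  # the same value, ~0.3989, computed from the normal density formula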
individual_likelihoods <- dnorm(CPUE, mean=20, sd=2) # density of each CPUE value under a Normal(mean=20, sd=2)
# We can visualize this by plotting the likelihoods against the actual data points themselves:
plot(CPUE,individual_likelihoods, main="How likely any individual point is to have been \n taken from a normal distribution with mean 20, SD 2")
What about the entire data set’s likelihood of being drawn from a distribution generated based on those parameters? This is where the total log-likelihood comes in. For an explanation of why we use log-likelihood instead of pure likelihood, check this out:
http://math.stackexchange.com/questions/892832/why-we-consider-log-likelihood-instead-of-likelihood-in-gaussian-distribution
http://www.unc.edu/courses/2007spring/enst/562/001/docs/lectures/lecture15.htm#loglikelihood
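A quick numerical illustration of why (my own addition): the product of thousands of tiny densities underflows to zero in floating point, so the raw likelihood is numerically useless here, while the log-scale version stays finite.
prod(dnorm(CPUE, mean=20, sd=2))            # underflows to 0 -- the raw likelihood can't be represented
sum(log(dnorm(CPUE, mean=20, sd=2)))        # -Inf, because some individual densities already underflow to 0
sum(dnorm(CPUE, mean=20, sd=2, log=TRUE))   # finite -- computing on the log scale from the start avoids both problems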
log_likelihood<-sum(dnorm(CPUE, mean=20, sd=2, log=TRUE)) # Log-likelihood for all data
cat("The log likelihood of of this data set with the given parameters is ",log_likelihood)
## The log likelihood of of this data set with the given parameters is -4607459
What’s going on here? We are comparing the CPUE data set to a normal distribution with mean 20 and SD 2. We are also log-transforming those likelihoods (the log=TRUE argument). Summing them all gives the total log-likelihood.
Long story short, we want to maximize the log-likelihood that our data set comes from a distribution with the parameters we have specified.
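For example (my own check, using the same dnorm() approach as above), parameters closer to the data’s own mean and SD give a higher, i.e. less negative, total log-likelihood:
sum(dnorm(CPUE, mean=20, sd=2, log=TRUE))                  # the guess used above
sum(dnorm(CPUE, mean=mean(CPUE), sd=sd(CPUE), log=TRUE))   # parameters taken from the data -- much less negative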
How do we do this? We can take a variety of different approaches.
######### Method 1 -- Pre-made functions in R, like the linear model functions
Lm <- lm(CPUE ~ 1)
summary(Lm)
##
## Call:
## lm(formula = CPUE ~ 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9 -2.9 -2.9 -2.9 4939.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8549 0.7724 3.696 0.00022 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 67.37 on 7607 degrees of freedom
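As a quick check (my own addition), the intercept-only lm() is really just estimating the sample mean of CPUE, and its residual standard error is the sample standard deviation:
mean(CPUE)  # should match the lm() intercept estimate (~2.85)
sd(CPUE)    # should match the residual standard error (~67.4)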
######### Method 2 -- Write a negative log-likelihood function by hand and minimize it with optim()
# Step 1 -- define function that calculates the negative log-likelihood.
NegLogLike_Fn <- function(Par, Data){
## Parameters
# (Note here that the object "Par" given to the function is expected to have 2 elements-- the mean hat,
# and the SD hat, in that order)
Mean_hat <- Par[1]
SD_hat <- Par[2]
## Log-likelihood
# Note that this is basically doing the same thing as we saw above -- just then multiplying it by negative 1.
LogLike_i <- dnorm(Data$y, mean=Mean_hat, sd=SD_hat, log=TRUE) # This creates a vector of the log-likelihoods for each data point
NegLogLike <- -1 * sum(LogLike_i) # multiplies the sum of those data points' log-likelihoods by -1
# Why negative? because the optim() function naturally minimizes,
# not maximizes, the output of the function.
return(NegLogLike)
}
# Step 2 -- minimize negative log-likelihood to estimate parameters
Data <- list( 'y'=CPUE ) # This is defining a named list where the only object in it is "y", containing the CPUE values
Start = c(1,1) # This is the starting point for the parameters (mean, SD)
# Let's see what the negative log-likelihood is at these starting values
starting_NegLogLike<-NegLogLike_Fn(Par=Start, Data=Data)
print(starting_NegLogLike)
## [1] 17282659
# Now, we can use the optim() function in R to try to find better parameters to describe the data.
Opt <- optim(par=Start, fn=NegLogLike_Fn, Data=Data, lower=c(0.01,0.01), upper=Inf, method="L-BFGS-B", hessian=TRUE)
# What does this do?
# optim() looks for parameters (starting at the start values) that minimize the output of the
# NegLogLike_Fn, based on the Data. There is a lower bound on the parameters of 0.01, and no upper
# bound (upper=Inf). The method ("L-BFGS-B") allows for these lower and upper boundaries. hessian=TRUE
# returns the Hessian matrix at the solution.
print(Opt$par) # Estimated parameters
## [1] 2.854405 67.364745
print(sqrt(diag(solve(Opt$hessian)))) # This takes the Hessian matrix and inverts it (solve() returns the matrix inverse),
## [1] 0.7723203 0.5461132
# then takes the square root of its diagonal -- these are the approximate standard errors of the estimates.
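The same calculation spelled out step by step (just a restatement of the line above, with my own intermediate names):
Cov_hat <- solve(Opt$hessian)   # invert the Hessian to get the approximate covariance matrix of the estimates
SE_hat <- sqrt(diag(Cov_hat))   # square roots of the diagonal are the approximate standard errors
SE_hat                          # the first element (~0.77) matches the intercept SE reported by lm() above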
###### Method 3 -- Optimize using TMB
# Step 1 -- make and compile template file
compile( "linear_model_v1.cpp" ) # compile() is a function within the TMB library
## Note: Using Makevars in C:/Users/stubbsrw/Documents/.R/Makevars
## [1] 0
# this compiles the code to create the .dll and .o files!
# You'll know it worked if the .dll file was created; the .o file
# is just an intermediate object file and can be ignored.
# (On Macs, it will create a .so file rather than a .dll "dynamically linked library")
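A quick way to check that the compile step worked (a sketch of my own; TMB's dynlib() just appends the right shared-library extension for your operating system):
file.exists(dynlib("linear_model_v1"))  # TRUE if the compiled .dll/.so file is sitting in the working directory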
# Step 2 -- build inputs and object
dyn.load(dynlib("linear_model_v1"))
# What's going on here? You're loading the "Dynamically Loaded Library" (.dll)
# file you created in the compile step. For more info on this function:
# https://stat.ethz.ch/R-manual/R-devel/library/base/html/dynload.html
Okay, so you’ve now loaded in the compiled linear_model_v1.cpp file -- but what is actually in that C++ file? What are the aims of the C++ file to begin with? We actually call this code through TMB’s main function, MakeADFun(). According to the TMB documentation, the purpose of MakeADFun() is to “Construct objective functions with derivatives based on the users C++ template.” Let’s break this down.
An “Objective function”
The function f is called, variously, an objective function, a loss function or cost function (minimization), a utility function or fitness function (maximization), or, in certain fields, an energy function or energy functional. A feasible solution that minimizes (or maximizes, if that is the goal) the objective function is called an optimal solution. An objective function is either a loss function or its negative (sometimes called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized.
(From Wikipedia: https://en.wikipedia.org/wiki/Mathematical_optimization, https://en.wikipedia.org/wiki/Loss_function)
“With Derivatives”
Okay, so here’s where we get deep into the weeds. TMB uses a “generalized newton optimizer” for the inner optimization problem, which is a “gradient based optimizer”. I’m still just beginning to scratch the surface of this, but more on gradient based methods can be found in this doc on Python’s scipy libraries. http://www.scipy-lectures.org/advanced/mathematical_optimization/#gradient-based-methods
From my understanding, the “gradient” is the vector of first derivatives of the function: it gives the direction and rate of change at your current point, and following it downhill moves you toward a minimum (possibly stopping at a local minimum rather than the global one). So, intuitively, the gradient is about the rate of change, while the second derivatives (the Hessian, mentioned below) describe how that rate of change itself changes.
Gradient-based methods take steps in the direction of the negative gradient -- the direction of steepest descent. Basically, the algorithm looks for the fastest way to get where it wants to go (the minimum!) by taking each step where it will make the most gains.
Newton methods use the first two derivatives of the loss function: the gradient and the Hessian. The whole “partial derivatives” and vector calculus thing is pretty much unknown to me, but this web page seems to have some info on it: http://www.lehman.edu/faculty/dgaranin/Mathematical_Physics/Mathematical_physics-04-Partial_derivatives_and_vector_calculus.pdf
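To make the gradient idea concrete, here is a toy gradient-descent sketch of my own (an illustration only, not how TMB’s inner optimizer is actually implemented):
f  <- function(x) (x - 3)^2             # a simple loss function with its minimum at x = 3
df <- function(x) 2 * (x - 3)           # its gradient (in one dimension, just the derivative)
x <- 0                                  # starting value
for(step in 1:25) x <- x - 0.1 * df(x)  # repeatedly step downhill along the negative gradient
x                                       # after a few dozen steps, x is very close to the minimum at 3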
//
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() () //apparently we just always put everything at and above this line
{
// Data
DATA_VECTOR( y_i ); // the all-caps DATA_VECTOR macro reads a vector named y_i from the data list passed in from R -- the names have to match.
// Parameters
PARAMETER( mean );
PARAMETER( log_sd );
// Objective function
Type sd = exp(log_sd);
Type jnll = 0; //jnll: joint negative log likelihood. Setting to zero to begin with.
int n_data = y_i.size(); // n_data is an integer; y_i.size() returns the length of the data vector.
// Probability of data conditional on fixed effect values
for( int i=0; i<n_data; i++){ //loop in C++
//note: the loop needs to start at 0 and use i<n_data, which stops one short of the length of y_i -- this is because
// R indexes from 1 and C++ indexes from 0.
jnll -= dnorm( y_i(i), mean, sd, true ); //-= means: each time through the loop, subtract this observation's log-likelihood (i.e., add its negative).
// also could be written as:
// jnll = jnll-dnorm( y_i(i), mean, sd, true );
}
// Reporting
return jnll;
}
So, basically, the CPP file defines the loss function in terms of the inputs you give it (named exactly as declared in your CPP code), and returns the quantity you want to minimize.
Now that we have defined what the CPP code is looking for, we can define those objects in R, and then call the CPP code using TMB’s MakeADFun.
Params <- list("mean"=0, "log_sd"=0)
Data <- list( "y_i"=CPUE )
Obj <- MakeADFun( data=Data, parameters=Params, DLL="linear_model_v1")
From the TMB documentation (https://cran.r-project.org/web/packages/TMB/TMB.pdf): A call to MakeADFun will return an object that, based on the users DLL code (specified through DLL), contains functions to calculate the objective function and its gradient. The object contains the following components:
- par A default parameter.
- fn The likelihood function.
- gr The gradient function.
- report A function to report all variables reported with the REPORT() macro in the user template.
- env Environment with access to all parts of the structure
What are the attributes you can access from the Obj created from the MakeAdFun? Note that right now, the function’s parameters have not been optimized– these are just the things you put into the function to begin with.
#Exploring the objects returned by the MakeADFun() call:
names(Obj)
## [1] "par" "fn" "gr" "he" "hessian" "method" "retape"
## [8] "env" "report"
Obj$par # The parameters you initially input into the Obj (Or the parameters after an optimization)
## mean log_sd
## 0 0
Obj$fn # This is the objective (negative log-likelihood) function -- as such, we can rename it, and then pass it parameters:
## function (x = last.par, ...)
## {
## if (tracepar) {
## cat("par:\n")
## print(x)
## }
## if (!validpar(x))
## return(NaN)
## res <- f(x, order = 0)
## if (!ADreport) {
## if (is.finite(res) && res < value.best) {
## last.par.best <<- x
## value.best <<- res
## }
## }
## res
## }
## <environment: 0x000000000d652a78>
TMB_likelihood_function<-Obj$fn
TMB_likelihood_function(Obj$par) # As such, we can pass it the current values for the parameters
## [1] 17300575
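As a sanity check (my own addition), this should match the hand-written negative log-likelihood from Method 2 evaluated at the same parameter values, mean = 0 and sd = exp(0) = 1:
NegLogLike_Fn(Par=c(0, exp(0)), Data=list(y=CPUE))  # should reproduce the value printed just above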
Ah, we return to gradients. What, exactly, is a gradient? http://betterexplained.com/articles/vector-calculus-understanding-the-gradient/
The gradient is a fancy word for derivative, or the rate of change of a function. It’s a vector (a direction to move) that
- Points in the direction of greatest increase of a function (intuition on why)
- Is zero at a local maximum or local minimum (because there is no single direction of increase)
Obj$gr # This is the gradient function-- you can also pass this the current parameters:
## function (x = last.par, ...)
## {
## ans <- f(x, order = 1)
## if (tracemgc)
## cat("outer mgc: ", max(abs(ans)), "\n")
## ans
## }
## <environment: 0x000000000d652a78>
Obj$gr(Obj$par)
## outer mgc: 34579560
## [,1] [,2]
## [1,] -21720.32 -34579560
# I'm not sure quite how to interpret this just yet -- presumably these two values are the partial derivatives
# of the negative log-likelihood with respect to mean and log_sd at the current parameter values.
MakeADFun() is meant to be used with one of R’s optimizers, like nlminb() -- which apparently is another optimization routine, though its documentation says, “See Also: optim() (which is preferred) and nlm.”
# Step 3 -- test and optimize
Opt <- nlminb(start=Obj$par, objective=Obj$fn, gradient=Obj$gr)
## outer mgc: 34579560
## outer mgc: 4673254
## outer mgc: 3417633
## outer mgc: 1449702
## outer mgc: 762249.9
## outer mgc: 366610.7
## outer mgc: 182475.2
## outer mgc: 88663.93
## outer mgc: 42640.91
## outer mgc: 19699.17
## outer mgc: 8506.38
## outer mgc: 3110.862
## outer mgc: 969.6104
## outer mgc: 152.9608
## outer mgc: 11.43912
## outer mgc: 0.134013
## outer mgc: 0.0003840815
# What did this create? Let's make a data.frame of the parameters and their estimates, as well as the final gradient of those objects.
Opt$diagnostics = data.frame("name"=names(Obj$par),
"Est"=Opt$par,
"final_gradient"=as.vector(Obj$gr(Opt$par))
)
## outer mgc: 0.0003840815
Opt$par # estimated parameters
## mean log_sd
## 2.854934 4.210122
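Because the template estimates log_sd rather than sd, back-transform it to compare with the Method 2 result (my own check):
exp(Opt$par["log_sd"])  # ~67.4, essentially the same SD that optim() estimated earlier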
SD <- sdreport(Obj) # standard errors of estimated parameters
## outer mgc: 0.0003840815
## outer mgc: 0.001680116
## outer mgc: 0.001672898
## outer mgc: 15.20041
## outer mgc: 15.23161
SD
## sdreport(.) result
## Estimate Std. Error
## mean 2.854934 0.772320069
## log_sd 4.210122 0.008106803
## Maximum gradient component: 0.0003840815
# Define a new design matrix
X <- cbind( "CA"=ifelse(WCGBTS_Canary_example$BEST_LAT_DD<42,1,0), # Assign 1 to california if below a certain latitude
"OR"=ifelse(WCGBTS_Canary_example$BEST_LAT_DD>=42&WCGBTS_Canary_example$BEST_LAT_DD<46,1,0), # Assign 1 to oregon if between WA and CA
"WA"=ifelse(WCGBTS_Canary_example$BEST_LAT_DD>=46,1,0), # Assign 1 to washington if above certain latitude
"Pass"=WCGBTS_Canary_example$PASS-1.5 ) # Not sure what "Pass" is
# What's in this, exactly?
head(X)
## CA OR WA Pass
## [1,] 0 0 1 0.5
## [2,] 0 0 1 0.5
## [3,] 0 0 1 0.5
## [4,] 0 0 1 0.5
## [5,] 0 0 1 0.5
## [6,] 0 0 1 0.5
# Step 1 -- make and compile template file
compile( "linear_model_v2.cpp" )
## Note: Using Makevars in C:/Users/stubbsrw/Documents/.R/Makevars
## [1] 0
What’s in it this time?
// Space time
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
// Data
DATA_VECTOR( y_i );
DATA_MATRIX( X_ij );
// Parameters
PARAMETER_VECTOR( b_j );
PARAMETER( log_sd );
// Objective function
Type sd = exp(log_sd);
Type jnll = 0;
int n_data = y_i.size();
// Linear predictor
vector<Type> linpred_i( n_data );
linpred_i = X_ij*b_j;
// Probability of data conditional on fixed effect values
for( int i=0; i<n_data; i++){
jnll -= dnorm( y_i(i), linpred_i(i), sd, true );
}
// Reporting
return jnll;
}
Pretty similar in structure to the one seen above -- the main difference is the input: there are now both a data vector (y_i) and a data matrix (X_ij), and the single mean is replaced by the linear predictor X_ij * b_j.
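In R terms, the linear predictor is just the matrix product of X and the coefficient vector. A small sketch with hypothetical coefficients (b_example is made up purely for illustration):
b_example <- c(1, 2, 3, 0.5)   # hypothetical b_j values, one per column of X (CA, OR, WA, Pass)
linpred <- X %*% b_example     # the same X_ij * b_j product the template computes
head(linpred)                  # one predicted mean per haul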
# Step 2 -- build inputs and object
dyn.load(dynlib("linear_model_v2")) # Loading in the CPP DLL file
Params <- list("b_j"=rep(0,ncol(X)), "log_sd"=0) # Creating two named elements in the list of parameters-- the beta coefficients, and the log_sd
Params
## $b_j
## [1] 0 0 0 0
##
## $log_sd
## [1] 0
Data = list( "y_i"=CPUE, "X_ij"=X ) # Adding the response varaible (CPUE), and the predictive data matrix (X) to one data object to be passed to the MakeADFun.
Obj <- MakeADFun( data=Data, parameters=Params, DLL="linear_model_v2")
# Step 3 -- test and optimize
# Before being optimized:
Obj$fn(Obj$par)
## [1] 17300575
Obj$gr(Obj$par)
## outer mgc: 34579560
## [,1] [,2] [,3] [,4] [,5]
## [1,] -2687.93 -4593.58 -14438.81 144.4485 -34579560
# Optimizing!
Opt <- nlminb(start=Obj$par, objective=Obj$fn, gradient=Obj$gr)
## outer mgc: 34579560
## outer mgc: 4673255
## outer mgc: 3417036
## outer mgc: 1455782
## outer mgc: 770140.3
## outer mgc: 372947.8
## outer mgc: 186133.6
## outer mgc: 90840.77
## outer mgc: 43833.32
## outer mgc: 20379.43
## outer mgc: 8849.096
## outer mgc: 3335.73
## outer mgc: 987.7975
## outer mgc: 146.4392
## outer mgc: 19.63495
## outer mgc: 47.13524
## outer mgc: 24.46349
## outer mgc: 4.848835
## outer mgc: 0.02530689
## outer mgc: 0.2588783
## outer mgc: 0.0937196
## outer mgc: 0.01149383
# Creating a diagnostics data.frame
Opt$diagnostics <- data.frame( "name"=names(Obj$par), "Est"=Opt$par, "final_gradient"=as.vector(Obj$gr(Opt$par)))
## outer mgc: 0.01149383
Opt$diagnostics
## name Est final_gradient
## 1 b_j 0.64790502 -2.082422e-05
## 2 b_j 2.11136582 -4.845455e-05
## 3 b_j 11.23758161 -6.904846e-05
## 4 b_j 0.06515283 -1.052066e-04
## 5 log_sd 4.20850083 -1.149383e-02
Opt$par # estimated parameters
## b_j b_j b_j b_j log_sd
## 0.64790502 2.11136582 11.23758161 0.06515283 4.20850083
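As a cross-check (my own addition, not part of the original walk-through), the same fixed-effects model can be fit with lm() using the design matrix X; the coefficient estimates should be essentially identical to the b_j values above:
Lm2 <- lm(CPUE ~ X - 1)     # "-1" drops lm's intercept, since the CA/OR/WA indicator columns already play that role
summary(Lm2)$coefficients   # the estimates should match the TMB b_j values above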
SD <- sdreport( Obj ) # standard errors
## outer mgc: 0.01149383
## outer mgc: 0.0114947
## outer mgc: 0.01149479
## outer mgc: 0.01149421
## outer mgc: 0.01149441
## outer mgc: 0.01149398
## outer mgc: 0.01149425
## outer mgc: 0.01149404
## outer mgc: 0.01149446
## outer mgc: 15.18932
## outer mgc: 15.24274