Bayesian Calibration Abundance Estimates

Dong Liang, Genevieve Nesslage, Michael Wilberg, Thomas Miller

July 27, 2017

Introduction

The R codes are based on the Liang et al. (2017) and [R-INLA packages] (http://www.r-inla.org/) to estimate total variance incorporating model-based and design-based variability. We model the variance components due to non-design factors such as varying catchability (Wilberg et al. 2010) using a zero-inflated spatial mixed model. Statistical inference was in an approximate Bayesian framework (Rue et al. 2009), and estimation conducted in a model calibration method using pseudo-empirical likelihood (Chen et al. 2004).

We create the R codes to assist fishery managers and researchers in analyzing spatially intensive fishery-independent surveys based on stratified sampling design, with sample size within each stratum proportional to the area. We also provides an example data simulated from the Winter Dredge Survey of blue crabs in the Chesapeake Bay, U.S. during 2001, 2006 and 2012.

Example data and R codes

The attached bc_2017.zip contains the following example data and helper codes:

  1. wds_sim_090716.csv: simulated number of crab per tow per 1000 m2 (ysim) and other design variables
  2. cbp_grid_090716.csv: grid covering the study area of winter dredge survey

3.     cbp_boundary_0907.rda: a R object defining the boundary of the complex shoreline of the Chesapeake Bay

4.     ma_el_5c.R: Helper functions to conduct empirical-likelihood based calibration for simple and stratified random design

5.     rzi_3.R: Helper function to simulate from truncated Poisson distribution

6.     zipbk_inla_15b.R:Helper function to fit zero-inflated model using R-INLA package

Bayesian Calibration Abundance Estimates

First extract the files into your working directory, and load the helper codes codes. The FNN package can be installed from CRAN (accessed August 2017), the INLA package can be downloaded from here (accessed August 2017). The following code was compiled under Windows 7 and the latest versions of R <3.4.1>and INLA <2017 06 20 03 42 30 UTC>. But we have experienced numerical issues with running INLA under Windows 10. In that case, please use a Unix build instead.

require(INLA)
## Loading required package: INLA
## Loading required package: sp
## Loading required package: Matrix
## Loading required package: splines
## This is INLA 0.0-1463562937, dated 2016-05-18 (11:01:03+0200).
INLA:::inla.dynload.workaround()
## See www.r-inla.org/contact-us for how to get help.
source("zipbk_inla_15b.R")
## Loading required package: FNN
source("rzi_3.R")
 
source("ma_el_5c.R")

Next load the example data and define some covariates to build the calibration.

wds <- read.csv(file="wds_sim_090716.csv",head=T)
 
wds$STRATUM <- factor(wds$STRATUM) ## define stratum variable
 
wds$t_POINT_Y <- scale(wds$POINT_Y) ## scale trend for numerical stability
 
wds$t_DEPTH_B <- scale(wds$DEPTH_B) ## scale bottom detph for numerical stability
 
l <- split(wds,wds$YEAR_)

The variable ysim is the simulated Catch Per Unit Effort. The simulated data contain three years, and we are using only the first year’s data in this example.

Also load the study area and the geographic extent.

# load grid definition from Chesapeake Bay Program
 
grid <- read.csv(file="cbp_grid_090716.csv",head=T)
 
grid$STRATUM <- factor(grid$STRATUM) ## define stratum variable
 
grid$t_POINT_Y <- scale(grid$POINT_Y) ## scale trend for numerical stability
 
grid$t_DEPTH_B <- scale(grid$DEPTH_B) ## scale bottom detph for numerical stability
 
grid$Dummy <- 1 ## no segment, bay-wide estimation
 
 
 
# load shoreline boundary
 
load(file="cbp_boundary_090716.rda")

Now we are ready to start to build the auxiliary variable based on a calibration model. We start with a zero-adjusted negative binomial model. Note this may take a while.

r0 <- zipbk.inla(
 
  query=grid,## study area data frame
 
  obs=l[[1]],## observed CPUE data frame
 
  fixed=~t_POINT_Y+t_DEPTH_B+STRATUM, ## fixed effect formula
 
  random=~1, ## random effect s=spde, e= over-dispersion
 
  family="zeroinflatednbinomial0", ## family for positive catch component
 
  xcoord='POINT_X',ycoord='POINT_Y', ## geographic coordinates
 
  wcoord="AREA", ## weight for each cell in the study area
 
  bcoord='Dummy', ## factor defining the sub-areas
 
  zcoord="ysim",  ## response variable
 
  #################################################
 
  ## Arugments for mesh
 
  #################################################
 
  max.edge=c(5000,1000000), ## maximum allowed triangle edge length in the inner domain and outer extension
 
  cutoff=1000, ## minimum allowed distance between points
 
  boundary=segm.bnd, ## physical boundary
 
  mesh=NULL, ## previously built mesh object (if available)
 
  ################################
 
  predict=T, ## whether generalize model to all study area
 
  verbose=F, ## whether showing some debug message
 
  control=list(nsamples=399) ## number of approximate MCMC samples
 
)

The model can be used to predict response over the entire sampling frame, and this prediction serves as the auxiliary variable. We first use the posterior median as the point estimate and calibrate the sample based on this point estimate.

fit.cpue <- apply(r0$predSample,1,median)
 
r1 <- ma.el(
 
  x=l[[1]],
 
  grid=grid,
 
  y=fit.cpue,
 
  formula=ysim~POINT_X+POINT_Y,
 
  gcoord=NULL,
 
  FUN=median)
 
print(r1)
##             xbar               se        el.fit.el   el.pt.se.se.el 
 
##        4.8365931        0.2571405        5.9304326        0.2455191 
 
##          el.mcmc el.mcmc.model.se    el.mcmc.pi.se       el.mcmc.se 
 
##               NA               NA               NA               NA

We can also calibrate the sample according to the entire posterior predictive distribution, which will give us variance estimates due to variability of the frames and the sampling.

## calibrate to full posterior sample
 
r2 <- ma.el(
 
  x=l[[1]],
 
  grid=grid,
 
  y=r0$predSample,
 
  formula=ysim~POINT_X+POINT_Y,
 
  gcoord=NULL,
 
  FUN=median)
 
print(r2)
##             xbar               se        el.fit.el   el.pt.se.se.el 
 
##        4.8365931        0.2571405        5.9304326        0.2455191 
 
##          el.mcmc el.mcmc.model.se    el.mcmc.pi.se       el.mcmc.se 
 
##        5.0371770        0.1170559        0.2551890        0.2807553

Now we try a zero-adjusted, spatial model to build auxiliary data.

# Start calibration for first year
 
r <- zipbk.inla(
 
  query=grid,## study area data frame
 
  obs=l[[1]],## observed CPUE data frame
 
  fixed=~t_POINT_Y+t_DEPTH_B+STRATUM, ## fixed effect formula
 
  random=~s+e, ## random effect s=spde, e= over-dispersion
 
  family="zeroinflatedpoisson0", ## family for positive catch component
 
  xcoord='POINT_X',ycoord='POINT_Y', ## geographic coordinates
 
  wcoord="AREA", ## weight for each cell in the study area
 
  bcoord='Dummy', ## factor defining the sub-areas
 
  zcoord="ysim",  ## response variable
 
  #################################################
 
  ## Arugments for mesh
 
  #################################################
 
  max.edge=c(5000,1000000), ## maximum allowed triangle edge length in the inner domain and outer extension
 
  cutoff=1000, ## minimum allowed distance between points
 
  boundary=segm.bnd, ## physical boundary
 
  mesh=NULL, ## previously built mesh object (if available)
 
  #################################################
 
  ## Addition arugments for spde prior
 
  #################################################
 
  prior.variance.nominal=1,  ##nominal prior variance for the "SPDE" field
 
  prior.range.nominal=NULL,  ##nominal prior range for the "SPDE" field
 
  theta.prior.mean=c(0,0), ## mean for mvnorm prior distribution of "SPDE" sill and range
 
  theta.prior.prec=c(0.1,0.1), ##precision for independent mvnorm prior distribution of "SPDE" sill and range
 
  unstruct.hyper= 'list(theta=list(prior="loggamma",param=c(0.01,0.01)))', ## prior for scale parameter for over-dispersion random effect
 
  ################################
 
  predict=T, ## whether generalize model to all study area
 
  verbose=F, ## whether showing some debug message
 
  control=list(nsamples=399) ## number of approximate MCMC samples
 
)

We can then calibrate the sample using the more complex spatial model.

r4 <- ma.el(
 
  x=l[[1]],
 
  grid=grid,
 
  y=r$predSample,
 
  formula=ysim~POINT_X+POINT_Y,
 
  gcoord=NULL,
 
  FUN=median)
 
print(r4)
##             xbar               se        el.fit.el   el.pt.se.se.el 
 
##        4.8365931        0.2571405        6.7076741        0.1579340 
 
##          el.mcmc el.mcmc.model.se    el.mcmc.pi.se       el.mcmc.se 
 
##        5.7522776        0.6380425        0.2263144        0.6769908

Any posterior samples can be used in place of r$predSample for calibration.

References

Chen, J., Thompson, M. E., & Wu, C. (2004). Estimation of fish abundance indices based on scientific research trawl surveys. Biometrics, 60(1), 116-123.

Liang, D., Nasslage, G., Wilberg, M., Miller, T.A. (2017). Bayesian Calibration of Blue Crab (Callinectes sapidus) Abundance Indices Based on Probability Surveys. Journal of Agricultural, Biological, and Environmental Statistics. DOI: 10.1007/s13253-017-0295-4

Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392.

Wilberg, M. J., J. T. Thorson, B. C. Linton, and J. Berkson. 2010. Incorporating time-varying catchability into population dynamic stock assessment models. Reviews in Fisheries Science 18:7-24.