The R codes are based on the Liang et al. (2017) and [R-INLA packages] (http://www.r-inla.org/) to estimate total variance incorporating model-based and design-based variability. We model the variance components due to non-design factors such as varying catchability (Wilberg et al. 2010) using a zero-inflated spatial mixed model. Statistical inference was in an approximate Bayesian framework (Rue et al. 2009), and estimation conducted in a model calibration method using pseudo-empirical likelihood (Chen et al. 2004).
We create the R codes to assist fishery managers and researchers in analyzing spatially intensive fishery-independent surveys based on stratified sampling design, with sample size within each stratum proportional to the area. We also provides an example data simulated from the Winter Dredge Survey of blue crabs in the Chesapeake Bay, U.S. during 2001, 2006 and 2012.
The attached bc_2017.zip contains the following example data and helper codes:
3. cbp_boundary_0907.rda: a R object defining the boundary of the complex shoreline of the Chesapeake Bay
4. ma_el_5c.R: Helper functions to conduct empirical-likelihood based calibration for simple and stratified random design
5. rzi_3.R: Helper function to simulate from truncated Poisson distribution
6. zipbk_inla_15b.R:Helper function to fit zero-inflated model using R-INLA package
First extract the files into your working directory, and load the helper codes codes. The FNN package can be installed from CRAN (accessed August 2017), the INLA package can be downloaded from here (accessed August 2017). The following code was compiled under Windows 7 and the latest versions of R <3.4.1>and INLA <2017 06 20 03 42 30 UTC>. But we have experienced numerical issues with running INLA under Windows 10. In that case, please use a Unix build instead.
require(INLA)
## Loading required package: INLA
## Loading required package: sp
## Loading required package: Matrix
## Loading required package: splines
## This is INLA 0.0-1463562937, dated 2016-05-18 (11:01:03+0200).
INLA:::inla.dynload.workaround()
## See www.r-inla.org/contact-us for how to get help.
source(
"zipbk_inla_15b.R")
## Loading required package: FNN
source(
"rzi_3.R")
source(
"ma_el_5c.R")
Next load the example data and define some covariates to build the calibration.
wds <- read.csv(file="wds_sim_090716.csv",head=T)
wds$STRATUM <- factor(wds$STRATUM) ## define stratum variable
wds$t_POINT_Y <- scale(wds$POINT_Y) ## scale trend for numerical stability
wds$t_DEPTH_B <- scale(wds$DEPTH_B) ## scale bottom detph for numerical stability
l <- split(wds,wds$YEAR_)
The variable ysim is the simulated Catch Per Unit Effort. The simulated data contain three years, and we are using only the first year’s data in this example.
Also load the study area and the geographic extent.
# load grid definition from Chesapeake Bay Program
grid <- read.csv(file="cbp_grid_090716.csv",head=T)
grid$STRATUM <- factor(grid$STRATUM) ## define stratum variable
grid$t_POINT_Y <- scale(grid$POINT_Y) ## scale trend for numerical stability
grid$t_DEPTH_B <- scale(grid$DEPTH_B) ## scale bottom detph for numerical stability
grid$Dummy <- 1 ## no segment, bay-wide estimation
# load shoreline boundary
load(file="cbp_boundary_090716.rda")
Now we are ready to start to build the auxiliary variable based on a calibration model. We start with a zero-adjusted negative binomial model. Note this may take a while.
r0 <- zipbk.inla(
query=grid,## study area data frame
obs=l[[1]],## observed CPUE data frame
fixed=~t_POINT_Y+t_DEPTH_B+STRATUM, ## fixed effect formula
random=~1, ## random effect s=spde, e= over-dispersion
family="zeroinflatednbinomial0", ## family for positive catch component
xcoord='POINT_X',ycoord='POINT_Y', ## geographic coordinates
wcoord="AREA", ## weight for each cell in the study area
bcoord='Dummy', ## factor defining the sub-areas
zcoord="ysim", ## response variable
#################################################
## Arugments for mesh
#################################################
max.edge=c(5000,1000000), ## maximum allowed triangle edge length in the inner domain and outer extension
cutoff=1000, ## minimum allowed distance between points
boundary=segm.bnd, ## physical boundary
mesh=NULL, ## previously built mesh object (if available)
################################
predict=T, ## whether generalize model to all study area
verbose=F, ## whether showing some debug message
control=list(nsamples=399) ## number of approximate MCMC samples
)
The model can be used to predict response over the entire sampling frame, and this prediction serves as the auxiliary variable. We first use the posterior median as the point estimate and calibrate the sample based on this point estimate.
fit.cpue <- apply(r0$predSample,1,median)
r1 <- ma.el(
x=l[[1]],
grid=grid,
y=fit.cpue,
formula=ysim~POINT_X+POINT_Y,
gcoord=NULL,
FUN=median)
print(r1)
## xbar se el.fit.el el.pt.se.se.el
## 4.8365931 0.2571405 5.9304326 0.2455191
## el.mcmc el.mcmc.model.se el.mcmc.pi.se el.mcmc.se
## NA NA NA NA
We can also calibrate the sample according to the entire posterior predictive distribution, which will give us variance estimates due to variability of the frames and the sampling.
## calibrate to full posterior sample
r2 <- ma.el(
x=l[[1]],
grid=grid,
y=r0$predSample,
formula=ysim~POINT_X+POINT_Y,
gcoord=NULL,
FUN=median)
print(r2)
## xbar se el.fit.el el.pt.se.se.el
## 4.8365931 0.2571405 5.9304326 0.2455191
## el.mcmc el.mcmc.model.se el.mcmc.pi.se el.mcmc.se
## 5.0371770 0.1170559 0.2551890 0.2807553
Now we try a zero-adjusted, spatial model to build auxiliary data.
# Start calibration for first year
r <- zipbk.inla(
query=grid,## study area data frame
obs=l[[1]],## observed CPUE data frame
fixed=~t_POINT_Y+t_DEPTH_B+STRATUM, ## fixed effect formula
random=~s+e, ## random effect s=spde, e= over-dispersion
family="zeroinflatedpoisson0", ## family for positive catch component
xcoord='POINT_X',ycoord='POINT_Y', ## geographic coordinates
wcoord="AREA", ## weight for each cell in the study area
bcoord='Dummy', ## factor defining the sub-areas
zcoord="ysim", ## response variable
#################################################
## Arugments for mesh
#################################################
max.edge=c(5000,1000000), ## maximum allowed triangle edge length in the inner domain and outer extension
cutoff=1000, ## minimum allowed distance between points
boundary=segm.bnd, ## physical boundary
mesh=NULL, ## previously built mesh object (if available)
#################################################
## Addition arugments for spde prior
#################################################
prior.variance.nominal=1, ##nominal prior variance for the "SPDE" field
prior.range.nominal=NULL, ##nominal prior range for the "SPDE" field
theta.prior.mean=c(0,0), ## mean for mvnorm prior distribution of "SPDE" sill and range
theta.prior.prec=c(0.1,0.1), ##precision for independent mvnorm prior distribution of "SPDE" sill and range
unstruct.hyper= 'list(theta=list(prior="loggamma",param=c(0.01,0.01)))', ## prior for scale parameter for over-dispersion random effect
################################
predict=T, ## whether generalize model to all study area
verbose=F, ## whether showing some debug message
control=list(nsamples=399) ## number of approximate MCMC samples
)
We can then calibrate the sample using the more complex spatial model.
r4 <- ma.el(
x=l[[1]],
grid=grid,
y=r$predSample,
formula=ysim~POINT_X+POINT_Y,
gcoord=NULL,
FUN=median)
print(r4)
## xbar se el.fit.el el.pt.se.se.el
## 4.8365931 0.2571405 6.7076741 0.1579340
## el.mcmc el.mcmc.model.se el.mcmc.pi.se el.mcmc.se
## 5.7522776 0.6380425 0.2263144 0.6769908
Any posterior samples can be used in place of r$predSample for calibration.
Chen, J., Thompson, M. E., & Wu, C. (2004). Estimation of fish abundance indices based on scientific research trawl surveys. Biometrics, 60(1), 116-123.
Liang, D., Nasslage, G., Wilberg, M., Miller, T.A. (2017). Bayesian Calibration of Blue Crab (Callinectes sapidus) Abundance Indices Based on Probability Surveys. Journal of Agricultural, Biological, and Environmental Statistics. DOI: 10.1007/s13253-017-0295-4
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392.
Wilberg, M. J., J. T. Thorson, B. C. Linton, and J. Berkson. 2010. Incorporating time-varying catchability into population dynamic stock assessment models. Reviews in Fisheries Science 18:7-24.