Tutorial on fitting the presence-absence "Mixture model" in R
David Dail, October 2011
This tutorial demonstrates site occupancy estimation using the Mixture model as described
by Dail and Madsen, Biometrics, 2011. It largely follows the form of the tutorials provided
by J.A. Royle and by D. Dail for fitting the Generalized N-mixture models for repeated point
counts, for closed and open populations, respectively.
The tutorial for closed populations is given in the supplemental appendix of Kéry, Royle,
and Schmid, Modeling avian abundance from replicated counts using binomial mixture models,
Ecological Applications, 2005. Likewise, the tutorial for open populations is given in the
Web Appendix of Dail and Madsen, Models for estimating abundance from repeated counts of an
open metapopulation, Biometrics, 2010.
This tutorial assumes the reader has some familiarity with R.
The R function "PA_nll" takes a while to run; the time required for finding the MLEs will
be reduced by working on a 64-bit computer (though this is not required).
A command intended to be entered as R code into the R console is preceded by ">", with
commentary between the commands describing model selection with the Mix model.
First, load the R file Supplement_Files.R into an active R session, using the command
> source("<path>/Supplement_Files.R")
where <path> is the complete path name of the directory containing the file
Supplement_Files.R. There is only one file contained in this archive; that it loaded
correctly can be verified with the command:
> ls()
The file defines a single R function, "PA_nll", which returns the negative log
likelihood of an observed data matrix.
This function requires the logit and inverse-logit (expit) transforms, which are built by the
following commands:
> expit = function(x) { return((exp(x))/(1+exp(x))) }
> logit = function(x) { return(log(x/(1-x))) }
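A quick sanity check (not part of the supplement) confirms that the two transforms invert each other, and that they agree with base R's built-in logistic distribution functions plogis() and qlogis():

```r
# Same definitions as above.
expit = function(x) { return((exp(x))/(1+exp(x))) }
logit = function(x) { return(log(x/(1-x))) }

# expit should undo logit, and expit is equivalent to plogis.
p = c(0.1, 0.5, 0.9)
max(abs(expit(logit(p)) - p))    # effectively zero
all.equal(expit(2), plogis(2))   # TRUE
```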
Next, load the matrix of observed data. The data set we will use here is described in
Section 5 of Dail and Madsen, 2011. It consists of 6 years of presence-absence observations
of American Robins from 50 sites along a transect of the North American Breeding Bird
Survey (BBS). We first will load the point
counts collected in this study:
> R=50
> T=6
> n.vec = c(3,2,1,2,2,3,1,0,2,0,0,1,0,1,1,2,1,5,1,2,2,1,0,1,1,0,1,1,0,2,0,1,3,3,
1,0,1,0,1,1,0,2,0,1,0,1,2,0,0,1,5,1,3,3,4,3,2,2,1,2,2,2,1,0,1,0,2,4,3,2,0,2,
0,1,1,0,3,0,2,2,0,2,2,1,0,0,1,1,0,0,0,0,0,1,0,1,1,0,0,2,3,2,1,2,2,2,1,2,1,0,
2,0,0,0,0,1,0,0,3,1,2,1,1,0,1,0,0,2,0,0,0,1,1,0,1,0,2,1,1,0,1,1,1,1,1,0,1,0,
0,1,2,3,2,1,2,2,0,0,1,0,4,0,0,1,0,1,3,3,1,0,2,0,0,1,0,2,1,1,0,1,1,2,0,1,1,1,
0,0,0,1,0,0,1,2,2,1,0,0,0,2,2,2,1,2,1,2,0,0,1,0,4,0,0,1,0,1,3,3,1,0,2,0,0,1,
0,2,1,1,0,1,1,2,0,1,1,1,0,0,0,1,0,0,1,2,2,1,0,0,0,2,2,1,2,1,1,3,1,2,0,1,0,0,
1,0,0,1,2,0,1,2,2,2,0,1,0,2,0,2,4,1,1,0,2,0,1,1,1,0,0,0,1,0,1,0,0,0,1,0,1,0)
> n.mat = matrix(n.vec,nrow=R)
We only need presence-absence observations to fit the Mix model, so observations should be
truncated at 1 using the following commands:
> X.obs=matrix(0,nrow=dim(n.mat)[1],ncol=dim(n.mat)[2])
> for(i in 1:dim(n.mat)[1]){
> for(j in 1:dim(n.mat)[2]){
> X.obs[i,j]=min(1,n.mat[i,j])
> }
> }
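The double loop above can also be written in one line: pmin() is vectorized and recycles the scalar 1 across every element of the matrix. A minimal sketch on a small stand-in matrix (the real call would be pmin(1, n.mat)):

```r
# Loop-free truncation at 1: pmin() recycles the scalar across the matrix.
n.small = matrix(c(0, 1, 3, 2, 0, 5), nrow = 2)  # small example matrix
X.small = pmin(1, n.small)
X.small   # every entry is now 0 or 1
```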
Note that if point counts are available, it is better to use the Generalized model of Dail
and Madsen (2011); the Mix model is intended for settings where point counts are not
available. For this data set, point counts are available, so we use the Generalized model
fit as a standard against which to compare the Mix model.
The optimization routine optim() will be used to find the MLEs of each model. The search
method can be specified:
> Method = "BFGS"
More information related to optim() can be found by issuing the command
> help(optim)
A list of the required inputs for "PA_nll" is given below, along with a brief description:
"vars" is a vector giving the logit-transformed parameter values. In general, the order is
(p, Psi_1, gamma, epsilon). If p.lin = TRUE then p gets two parameters (an intercept and a
slope for the logit-linear model); likewise if gamma.lin = TRUE and/or epsilon.lin = TRUE
then gamma and/or epsilon each get two parameters. If closed = TRUE, only p and Psi_1 are
needed, as epsilon and gamma are assumed to be equal to 0.
"X" is the matrix of presence-absence observations, with sites corresponding to rows, and
time (e.g., season) along the columns.
"p.lin" indicates whether detection probability is constant (p.lin=FALSE), the default
value; or if detection probability varies with season according to the logit-linear model,
log(p.t/(1-p.t)) = vars[1] + t*vars[2], with t=1,2,...,T.
"gamma.lin" is analogous to p.lin, but with t=1,...,T-1 if gamma.lin = TRUE.
"epsilon.lin" is analogous to gamma.lin.
"closed" gives an indication of whether values for gamma and epsilon are included (closed =
FALSE), the default value; or if gamma and epsilon are assumed equal to 0 (closed = TRUE).
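To summarize the layout of "vars", the sketch below counts the parameters implied by each configuration. The helper n.vars is hypothetical (it is not part of Supplement_Files.R); the counts are those used in the fits later in this tutorial: 2 for the closed fit, 4 for the default open fit, and 5 when gamma.lin = TRUE.

```r
# Hypothetical helper (not in Supplement_Files.R): number of entries
# expected in "vars" for a given configuration of PA_nll.
n.vars = function(p.lin = FALSE, gamma.lin = FALSE, epsilon.lin = FALSE,
                  closed = FALSE) {
  if (closed) return(2)                # (logit p, logit Psi_1)
  4 + p.lin + gamma.lin + epsilon.lin  # each .lin flag adds one slope
}
n.vars(closed = TRUE)     # 2, as in the closed fit below
n.vars()                  # 4: (p, Psi_1, gamma, epsilon)
n.vars(gamma.lin = TRUE)  # 5: gamma gets an intercept and a slope
```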
Now, we can start fitting the Mix model. The first model we will consider is the closed
population model, which assumes that epsilon and gamma both equal 0. We will use the
vector (0,0) as starting values for (logit p, logit Psi_1). This is done with the following
commands:
> hold.closed = optim(c(0,0), PA_nll, method="BFGS", hessian=TRUE, X=X.obs, closed=TRUE)
It is a good idea to verify that the optim() procedure converged:
> hold.closed$conv
This will return "0" if optim() is satisfied that it found the maximum of the likelihood
(actually, optim() finds the minimum of the negative log likelihood).
The MLEs are obtained by the command:
> hold.closed$par
Because of a transformation that makes optim() run more smoothly, this gives the MLE for
logit (p) and logit(Psi_1). The back-transformed MLEs for p and Psi_1 can be obtained by
the following commands:
> p.est = expit(hold.closed$par[1])
> Psi_1.est = expit(hold.closed$par[2])
> c(p.est, Psi_1.est)
The asymptotic standard errors of the parameter estimates can be calculated from the Hessian
matrix with the following command. Note that the standard errors reported in the paper are
obtained via bootstrap resampling rather than the delta method employed here; bootstrap
resampling can accommodate model averaging. Since the delta method is much faster, and we
won't implement model averaging in this tutorial, the delta method is used.
> se = sqrt(diag(solve(hold.closed$hess)))
> se #the (asymptotic) standard errors
Then, an asymptotic 95% confidence interval (e.g., for p) can be found using:
> ci.p = expit( hold.closed$par[1]+c(-1.96,1.96)*se[1])
> ci.p
As mentioned earlier, optim() is minimizing the negative log likelihood (nll). The minimum
value of the nll can be retrieved by the command:
> nll = hold.closed$val
With this, the AIC score for the closed model can be calculated:
> aic = 2*nll + 2*length(hold.closed$par)
Now, we can use the MLEs from the closed population model as starting values for the open
population model. Typically, though, it's a good idea to use several starting values in
non-linear optimization, to help ensure the reported MLE is in fact the global maximum.
Therefore, we'll run a second optimization routine with starting value (0,0,0,0).
> vars.1 = c(hold.closed$par, logit(.001), logit(.001))
> vars.2 = c(0,0,0,0)
> PA.est.1 = optim(vars.1, PA_nll, method="BFGS", hessian=TRUE, X=X.obs)
> PA.est.2 = optim(vars.2, PA_nll, method="BFGS", hessian=TRUE, X=X.obs)
First, verify that both models converged:
> PA.est.1$conv
> PA.est.2$conv
Next, check to see whether the maximum likelihood values and MLEs are the same for both sets
of starting values. Both of these values should be near 0 (if not, then perhaps try
additional starting values to verify the MLE is obtained):
> PA.est.1$val - PA.est.2$val ## should be -4.095182e-07
> sum( abs(PA.est.1$par - PA.est.2$par)) ## should be 0.001978129
We will take as the MLE the parameter values that gave the smaller minimized nll value,
although there is practically no difference between the two:
> if(PA.est.1$val<=PA.est.2$val) model = PA.est.1
> if(PA.est.1$val> PA.est.2$val) model = PA.est.2
Below is code for the Likelihood Ratio closure test, given by Self and Liang (1987).
> t.stat= 2*(hold.closed$val - model$val)
> obs.inf = model$hess
> obs.inf.nn = obs.inf[c(1,2),c(1,2)]
> obs.inf.np = obs.inf[c(1,2),c(3,4)]
> obs.inf.pn = obs.inf[c(3,4),c(1,2)]
> obs.inf.pp = obs.inf[c(3,4),c(3,4)]
> I.tilda = obs.inf.pp - obs.inf.pn%*%solve(obs.inf.nn)%*%obs.inf.np
> prop = acos(max(-1,min(1,I.tilda[1,2]/(sqrt(abs(I.tilda[1,1]*I.tilda[2,2]))))))/(2*pi)
> prop.0 = 0.5 - prop
> prop.1 = .5
> prop.2 = prop
> p.value.closed = prop.0*(0) + prop.1*(1-pchisq(t.stat,1)) + prop.2*(1-pchisq(t.stat,2))
> p.value.closed ## should be 0.0007652851
This p-value is low (<0.001), giving strong evidence against population closure.
We again obtain parameter estimates by back-transformation. The estimates of p, Psi_1, gamma
and epsilon are obtained with:
> ests.1=expit(model$par)
> ests.1 # should be [1] 0.7905831 0.8940731 0.3604662 0.1233949
These can be used to get estimates of Psi_t.
> ests.1.p = rep(ests.1[1],T)
> ests.1.gamma=c(NA,rep(ests.1[3],T-1))
> ests.1.eps=c(NA,rep(ests.1[4],T-1))
> PA.Psi.1 = numeric(T)
> PA.Psi.1[1] = ests.1[2]
> for(t in 2:T){
> PA.Psi.1[t]=PA.Psi.1[t-1]*(1-ests.1.eps[t])+(1-PA.Psi.1[t-1])*ests.1.gamma[t]
> }
> PA.Psi.1 # should be [1] 0.8940731 0.8219321 0.7846973 0.7654790 0.7555597 0.7504400
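As a consistency check (not part of the supplement), the recursion Psi_t = Psi_{t-1}*(1-epsilon) + (1-Psi_{t-1})*gamma has the fixed point gamma/(gamma+epsilon). Iterating it from the estimated Psi_1 with the MLEs above shows the occupancy series drifting toward that equilibrium, which matches the slow decline seen in PA.Psi.1:

```r
# MLEs reported above for the constant-parameter open model.
gamma = 0.3604662; eps = 0.1233949; Psi = 0.8940731
# Iterate the occupancy recursion well past T = 6.
for (t in 2:50) Psi = Psi*(1 - eps) + (1 - Psi)*gamma
c(limit = gamma/(gamma + eps), Psi.50 = Psi)  # both approximately 0.745
```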
Since the actual site occupancy rates are unknown, the target values are the estimates given
by the best-fitting Generalized model, which uses the point counts as observed data.
> Psi.targets = c(0.9003937, 0.8777094, 0.8587837, 0.8437891, 0.8323330, 0.8238011)
The squared Euclidean distance between PA.Psi.1 and Psi.targets gives a measure of how close
the two estimates are:
> ssd_1 = sum( (PA.Psi.1-Psi.targets)^2)
And, for further model selection, we will hold onto the AIC value:
> aic_1 = 2*model$val + 2*length(model$par)
> num_pars_1 = length(model$par)
> ans_1 = c(num_pars_1,aic_1, ssd_1)
> ans_1 # should be [1] 4.0000000 389.5034371 0.0260483
Next, we will fit the fuller models, which allow p, gamma, and/or epsilon to vary seasonally
according to a logit-linear model. We can use the estimates from the constant-parameter
model as starting values, with a slope of 0.
In this tutorial we will only investigate one model with time-varying parameters, in
particular model gamma(t)epsilon(.)p(.).
First, we will get the starting values, using a slope of 0 as the linear term for gamma:
> vars.gamma = c(model$par[1:3],0,model$par[4])
Then, the MLEs are obtained in the same way as before, but including gamma.lin=TRUE:
> mix.gamma = optim(vars.gamma, PA_nll, method="BFGS", hessian=TRUE, X=X.obs, gamma.lin=TRUE)
Next we get the parameter estimates:
> mix.gamma_gamma.t = expit(mix.gamma$par[3] + mix.gamma$par[4]*(1:5) )
> mix.gamma_epsilon.t = expit(mix.gamma$par[5])
> mix.gamma_p.t = expit(mix.gamma$par[1])
> c(mix.gamma_gamma.t, mix.gamma_epsilon.t, mix.gamma_p.t)
## should be [1] 0.5233577 0.4673090 0.4120734 0.3589671 0.3091065 0.1446174 0.8146712
And again, we will get estimates of Psi_t:
> Psi_gamma = numeric(T)
> Psi_gamma[1]=expit(mix.gamma$par[2])
> for(t in 2:T){
> Psi_gamma[t] = Psi_gamma[t-1]*(1-mix.gamma_epsilon.t) +
> (1-Psi_gamma[t-1])*mix.gamma_gamma.t[t-1]
> }
> Psi_gamma # should be 0.8585912 0.8084314 0.7810399 0.7583156 0.7354067 0.7108416
The squared Euclidean distance from these estimates to those from the Generalized model is
given by:
> ssd_gamma = sum( (Psi_gamma-Psi.targets)^2)
Then, we hold onto the number of parameters, AIC value, and this distance value:
> aic_gamma = 2*mix.gamma$val + 2*length(mix.gamma$par)
> num_pars_gamma = length(mix.gamma$par)
> ans_gamma = c(num_pars_gamma, aic_gamma, ssd_gamma)
> ans_gamma # should be: [1] 5.00000000 391.28455340 0.04205129
Note that the best-fitting model for these data has either gamma and p varying
(logit-linearly) with season [model gamma(t)epsilon(.)p(t)] or gamma, epsilon, and p varying
(logit-linearly) with season [model gamma(t)epsilon(t)p(t)]. Model averaging yields model
gamma(A)epsilon(A)p(A), as described in the paper.
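One common way to weight candidate models for averaging is via Akaike weights. The sketch below applies the standard exp(-delta/2) normalization (a textbook formula, not code from the supplement) to the two AIC scores computed above:

```r
# AIC scores computed earlier in the tutorial:
# 389.50 for gamma(.)epsilon(.)p(.), 391.28 for gamma(t)epsilon(.)p(.).
aic = c(const.gamma = 389.5034371, gamma.t = 391.2845534)
delta = aic - min(aic)                   # AIC differences
w = exp(-delta/2) / sum(exp(-delta/2))   # Akaike weights
round(w, 3)  # constant-gamma model carries roughly 70% of the weight
```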
Here, we will proceed with the tutorial using our model with only gamma varying
(logit-linearly) with season [model gamma(t)epsilon(.)p(.)].
We can construct the tau and rho estimates using our parameter MLEs:
> tau.t=numeric(T)
> tau.t[1]=NA
> for(t in 2:T){
> tau.t[t] = Psi_gamma[t-1]*(1-mix.gamma_epsilon.t)/Psi_gamma[t]
> }
> tau.t # should be [1] NA 0.9084556 0.8853813 0.8810157 0.8820289 0.8849427
> rho.t=numeric(T)
> rho.t[T]=NA
> for(t in 1:(T-1)){
> rho.t[t] = (Psi_gamma[t+1]*mix.gamma_gamma.t[t] + Psi_gamma[t]*mix.gamma_epsilon.t)/
>            (Psi_gamma[t]+Psi_gamma[t+1])
> }
> rho.t # should be [1] 0.3282895 0.3031827 0.2763713 0.2501485 0.2254650 NA
Similarly, we can compute the site structure estimates:
> mix.gamma_gamma.vec = c(NA, mix.gamma_gamma.t)
> pp = c(Psi_gamma*mix.gamma_gamma.vec*(1-mix.gamma_epsilon.t))
> np = c(Psi_gamma*(1-mix.gamma_gamma.vec)*(1-mix.gamma_epsilon.t))
> pn = c(Psi_gamma*mix.gamma_gamma.vec*(mix.gamma_epsilon.t))
> nn = c(Psi_gamma*(1-mix.gamma_gamma.vec)*(mix.gamma_epsilon.t))
The following lines construct a nice matrix holding these results:
> mix.ans= round(rbind(Psi_gamma, tau.t, rho.t, pp, np, pn, nn), 3)
> colnames(mix.ans)=c("2003","2004","2005","2006","2007","2008")
> rownames(mix.ans)=c("Psi_t", "tau_t", "rho_t", "+Psi+",
"-Psi+", "+Psi-", "-Psi-")
> mix.ans