Research

Some of our past and present research covers the following topics:

General software for hierarchical statistical models and algorithms

Hierarchical statistical models have become a very widely used approach for incorporating multiple sources of variation into statistical models and customizing models to the particular design and scientific issues of each research project. Some very successful software packages have made available specific flavors of algorithms such as Markov chain Monte Carlo for estimating such models, but there are many more statistical approaches and algorithms (including different MCMC flavors) that are not readily accessible in software for general models. We created NIMBLE to combine flexible model specification with a system embedded within R for programming general algorithms for such models. For more information, please see the NIMBLE web page and our paper in the Journal of Computational and Graphical Statistics.

Computational statistical methods for ecological problems

In many cases, we know the kinds of statistical estimation and analysis methods we’d like to use for a particular problem, but accomplishing the computation for them is a real challenge. Some problems we’ve worked on include:

Efficiently embedding hidden Markov models into more general statistical models

Hidden Markov models (HMMs) provide a general framework for some common kinds of population data, such as capture-recapture and occupancy data. However, embedding HMMs into models with other kinds of data or explanatory components has often required cumbersome MCMC methods for estimation in practice. We showed how to harness the flexibility of NIMBLE to gain much higher efficiency in some cases.
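As a rough illustration of why HMMs can be computed efficiently, the hidden states can be summed out of the likelihood with the standard forward algorithm instead of being sampled one by one. The sketch below is plain Python, not NIMBLE code and not the implementation from our work. The function name, the two-state alive/dead example, and the values of phi and p are all assumptions chosen for illustration.

```python
def hmm_likelihood(init, trans, emit, obs):
    """Forward algorithm: probability of obs with hidden states summed out.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of observation k given state i
    obs         : sequence of observation indices
    """
    n = len(init)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for y in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][y]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical capture-recapture example (all values are assumptions).
# States: 0 = alive, 1 = dead. Observations: 0 = not seen, 1 = seen.
phi, p = 0.8, 0.5                      # assumed survival and detection probabilities
init = [1.0, 0.0]                      # animal assumed alive at the first occasion
trans = [[phi, 1 - phi], [0.0, 1.0]]   # dead is an absorbing state
emit = [[1 - p, p], [1.0, 0.0]]        # dead animals are never seen
history = [1, 0, 1]                    # seen, missed, seen
likelihood = hmm_likelihood(init, trans, emit, history)
```

For the history seen, missed, seen, the only possible path is alive throughout, so the result is p * phi * (1 - p) * phi * p. Because the function returns a single marginal probability, it can serve as one likelihood term inside a larger model, which is the general idea behind embedding an HMM in a hierarchical model.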

Statistical methods for stage-structured time-series data

Many organisms such as arthropods can easily be categorized by distinct life stages.  Population time-series then consist of estimates of abundance of each life stage at multiple times. The most common stage-structured models are Lefkovitch matrix models, but these are not very realistic.  For example, they can’t describe the development of an isolated cohort very well. Other approaches model the time-lags associated with organismal development but lack flexibility or realism in other ways. We sought to improve upon mathematical and statistical modeling methods for such stage-structured time-series data.  Some of the results included methods for development of unmarked cohorts (also this), of marked cohorts, and of unmarked cohorts with some reproduction.
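To make the limitation of matrix models concrete, the sketch below projects an isolated cohort through a small stage-structured matrix. It is an illustrative assumption, not a model from our papers: the three stages, the 0.5 probabilities of staying or advancing, and the absence of mortality and reproduction are all hypothetical. The function name project is also made up for this example.

```python
def project(matrix, n, steps):
    """Return the stage vectors n_0, n_1, ..., n_steps under n_{t+1} = A n_t."""
    out = [list(n)]
    for _ in range(steps):
        n = [
            sum(matrix[i][j] * n[j] for j in range(len(n)))
            for i in range(len(matrix))
        ]
        out.append(n)
    return out


# Hypothetical three-stage matrix with no reproduction or mortality.
# Column j gives the fate of individuals currently in stage j.
stay, advance = 0.5, 0.5
A = [
    [stay,    0.0,     0.0],
    [advance, stay,    0.0],
    [0.0,     advance, 1.0],
]

# An isolated cohort of 100 individuals, all starting in the first stage
trajectory = project(A, [100.0, 0.0, 0.0], 4)
for t, stages in enumerate(trajectory):
    print(t, stages)
```

After two time steps a quarter of the cohort has already reached the final stage while a quarter is still in the first. A fixed per-step probability of advancing implies geometrically distributed stage durations, so the cohort smears across stages immediately rather than developing as a coherent pulse. This is one common way of stating why such models describe cohort development poorly.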

Estimating models of tree population dynamics from heterogeneous monitoring data

Models of forest dynamics are needed to address many environmental challenges, including wildlife conservation, sustainable forest management, and the role of forests as carbon sources or sinks in relation to climate change.  A great deal of long-term tree data takes the form of intermittent population surveys and size measurements of marked individuals, all of which may be taken in different years for different plots, not to mention other complications.  We sought to tie together such heterogeneous monitoring data using hierarchical statistical models in order to estimate tree population dynamics while accounting for the multiple sources of variation in the data.  Some of the results include estimation of tree survival models and tree growth models.

Application of sequential Monte Carlo methods (aka particle filters) for estimating population models

A general goal of statistical methodology for population time-series data is to be able to formulate realistic models — including realistic sources of variation in the ecological processes and the data sampling, i.e. state-space models — and fit them to data in order to make statistical comparisons among alternative models.  For all but the simplest state-space models, we know how to write down the calculations we would like to do (the likelihood) for model fitting, but doing so requires computational methods.  Sequential Monte Carlo methods are appealing because they require only that one can simulate from the model of interest, but there are numerous challenges to bringing them into everyday use. Some work on this problem for population models is here.
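For readers unfamiliar with the idea, here is a minimal sketch of a bootstrap particle filter, the simplest sequential Monte Carlo method, used to estimate the log-likelihood of a state-space model. It is not the code from our work. The linear model on the log-abundance scale, its parameter values, and all function names are assumptions for illustration, and the data are simulated from the same model. Note that this basic version also needs to evaluate the observation density, in addition to simulating the process model.

```python
import math
import random


def normal_logpdf(y, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def bootstrap_filter_loglik(ys, simulate_init, simulate_step, obs_logpdf,
                            n_particles, rng):
    """Estimate log p(ys) by propagating, weighting, and resampling particles."""
    particles = [simulate_init(rng) for _ in range(n_particles)]
    loglik = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate each particle by simulating the process model
            particles = [simulate_step(x, rng) for x in particles]
        # Weight particles by how well they explain the observation
        logw = [obs_logpdf(y, x) for x in particles]
        m = max(logw)  # subtract the maximum to avoid numerical underflow
        w = [math.exp(lw - m) for lw in logw]
        # The average weight estimates p(y_t | y_1, ..., y_{t-1})
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling in proportion to the weights
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


# Hypothetical model on the log-abundance scale (all values are assumptions):
#   x_t = a + b * x_{t-1} + process noise,   y_t = x_t + observation noise
a, b, sd_proc, sd_obs = 0.5, 0.8, 0.2, 0.3


def simulate_init(rng):
    return rng.gauss(a / (1 - b), sd_proc)


def simulate_step(x, rng):
    return rng.gauss(a + b * x, sd_proc)


def obs_logpdf(y, x):
    return normal_logpdf(y, x, sd_obs)


rng = random.Random(1)
# Simulated (not real) data drawn from the same model
x = simulate_init(rng)
ys = []
for _ in range(20):
    ys.append(rng.gauss(x, sd_obs))
    x = simulate_step(x, rng)

loglik = bootstrap_filter_loglik(ys, simulate_init, simulate_step,
                                 obs_logpdf, 500, rng)
```

The estimate is itself random, so repeated calls give slightly different values. That Monte Carlo noise is one of the practical obstacles to using such estimates directly in likelihood maximization or model comparison.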

Some older general work on statistical methods

I’ve had a long-standing interest in problems like maximum likelihood estimation (and a review paper) and approximation of normalizing constants for general hierarchical models.

Empirical and theoretical population dynamics

Why and how do populations fluctuate as they do? This has been the question that got me into ecology and statistics in the first place. Some projects on this topic in recent years have included:

What is the role of individual heterogeneity in stage-structured population dynamics?

Ecologists and evolutionary biologists are profoundly interested in how individuals differ and how such differences impact population dynamics and evolution. We’ve explored the consequences specifically of individual heterogeneity in development and how to model such cases.

Macro-ecology of population dynamics with the Global Population Dynamics Database

The GPDD is a collection of over 5000 time-series of population abundance data from all manner of studies.  It has allowed ecologists to look for general patterns in amount of overall variation, shape and strength of density-dependence and population regulation, cycles, and more.  To be sure, the GPDD has its limitations.  Data are of varying quality and length, come from different sampling protocols, and are not at all random in geographic or taxonomic coverage.  Typical GPDD studies use only 5-20% of the time-series.  Still, it is the best we’ve got and has been an important resource.  We investigated how far local climate variables can go in explaining population fluctuations for a typical population time-series, and the answer seems to be “not very far” (see Knape and de Valpine, 2011). We also investigated the impact of measurement error on estimates of density-dependence from the GPDD.

Other topics

I end up involved in a variety of other interesting topics. These include methods for compositional data analysis such as from animal resource use, meta-analysis of agricultural systems, methods for wildlife data with telemetry, camera-trap, and occupancy data (and a study design paper), and general comments on statistical practices in ecology (this and this).