From mathematical models to statistical models

As is true in every branch of science, most ecological theories have been formalized as mathematical models. These models are often used to make predictions that are tested in the lab or sometimes in the field, and this is one way that the theoretical foundation of ecology develops.

As important as these mathematical models are, it seems to me that they are not routinely fitted to data and used for statistical inference, which is important in applied settings. In fact, I would guess that most theoretical models in ecology have never seen the light of day, in the sense that they haven't been fitted to data from field studies. This is wild speculation and it may prove wrong, but the point of this post is to suggest that one of the most exciting trends in ecology these days is the emergence of statisticians who are providing the tools needed to fit models from theoretical ecology to field data. As a result, applied ecologists like myself can make rigorous predictions about things like population viability under different environmental scenarios, without resorting to various forms of ad-hockery.¹

I have already written about one example of the conversion of mathematical models to statistical models: the case of spatially explicit metapopulation models. Metapopulation models are great when populations occur in discrete patches, and the development of occupancy models has made it possible to fit these models to real data sets fraught with various forms of observation error. But what mathematical models are available when individuals are distributed continuously in space? One option came up in a great seminar by Tom Miller that I attended yesterday. He noted that, when time can be regarded as discrete, integrodifference equations are a powerful theoretical device, as mathematical ecologists such as Mark Kot have demonstrated. Integrodifference equations usually have the form:

$$ n_{t+1}(x) = \int_{\mathcal{S}} f\big(n_t(y)\big)\, k\big(\lVert x-y \rVert\big)\, \mathrm{d}y $$

where $n_t(x)$ denotes the number of individuals at location $x$ in some two-dimensional spatial region $\mathcal{S}$ at time $t$. The number of individuals in the next time step is governed by (1) a local population growth model, $f(n_t(y))$, parameterized in terms of birth and death rates (or perhaps just a growth rate), and (2) a dispersal kernel $k(\lVert x-y \rVert)$ that gives the probability density of moving from $y$ to $x$ as a function of the (usually Euclidean) distance $\lVert x-y \rVert$.
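To make the roles of f and k concrete, here is a minimal simulation sketch. Everything specific in it is an assumption chosen for illustration, not something from the seminar or from Kot's work: the domain is one-dimensional rather than two-dimensional, f is a Beverton-Holt growth function with hypothetical parameters r and K, k is a Gaussian kernel with scale sigma, and the integral is approximated by a Riemann sum on a regular grid. Individuals that disperse past the edge of the grid are simply lost.

```python
import numpy as np

# Illustrative 1-D discretization of n_{t+1}(x) = ∫ f(n_t(y)) k(|x - y|) dy.
# Assumed (hypothetical) choices: Beverton-Holt f, Gaussian k, Riemann-sum integral.

def beverton_holt(n, r=2.0, K=100.0):
    """Local growth f(n) = r n / (1 + (r - 1) n / K); equilibrium at n = K."""
    return r * n / (1.0 + (r - 1.0) * n / K)

def gaussian_kernel(d, sigma=1.0):
    """Dispersal probability density as a function of distance d."""
    return np.exp(-0.5 * (d / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def ide_step(n, x, r=2.0, K=100.0, sigma=1.0):
    """Advance one time step: grow locally, then redistribute by the kernel."""
    dx = x[1] - x[0]
    dist = np.abs(x[:, None] - x[None, :])   # dist[i, j] = |x_i - y_j|
    kmat = gaussian_kernel(dist, sigma)
    return kmat @ beverton_holt(n, r, K) * dx

def simulate(n0, x, steps, **params):
    """Return an array with one row per time step, starting from n0."""
    out = [n0]
    for _ in range(steps):
        out.append(ide_step(out[-1], x, **params))
    return np.array(out)

x = np.linspace(-20.0, 20.0, 401)
n0 = np.where(np.abs(x) < 1.05, 10.0, 0.0)   # small founding population near x = 0
traj = simulate(n0, x, steps=10)
print(traj.shape)  # (11, 401)
```

Changing r, K, or sigma and re-running is the kind of theoretical exploration described below: the same two ingredients determine how fast the population grows locally and how fast it spreads.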

If you choose specific forms of $f$ and $k$, you can begin analyzing the spatial and temporal dynamics of a theoretical population under different parameter settings. But can you fit integrodifference models directly to data? It turns out that you can, and one of the earliest papers I know of that did this was Lele et al. (1998). Although I can't say I fully understand the estimating equation approach they used, the estimation problem essentially boils down to replacing $n_t(x)$ with an expectation, say $\mu_t(x)$, and then modeling the realized values of abundance (i.e., the data) as normally distributed. That sounds easy, but it turns out to be pretty difficult, especially when dealing with random effects arising from environmental and demographic stochasticity.
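The basic idea of replacing $n_t(x)$ with $\mu_t(x)$ and treating the data as normal can be sketched as a simple maximum-likelihood fit. This is not the estimating-equation method of Lele et al.; it is a swapped-in Gaussian likelihood under strong simplifying assumptions: a one-dimensional grid, Beverton-Holt growth with K known, a Gaussian kernel, a known initial abundance, independent normal errors with a common variance, and no stochastic random effects (which is exactly the hard part mentioned above). The data are simulated from known parameter values purely to check that the fit recovers them.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed model (illustrative only): mu_{t+1}(x) = ∫ f(mu_t(y)) k(|x - y|) dy,
# with y_t(x) ~ Normal(mu_t(x), tau^2) independently. K and mu_0 treated as known.

def expected_abundance(mu0, x, steps, r, sigma, K=100.0):
    """Propagate the expectation mu_t(x) through the discretized IDE."""
    dx = x[1] - x[0]
    dist = np.abs(x[:, None] - x[None, :])
    kmat = np.exp(-0.5 * (dist / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    mu = [mu0]
    for _ in range(steps):
        grown = r * mu[-1] / (1.0 + (r - 1.0) * mu[-1] / K)   # Beverton-Holt
        mu.append(kmat @ grown * dx)
    return np.array(mu)

def neg_log_lik(theta, y, mu0, x):
    """Gaussian negative log-likelihood with tau^2 profiled out (constants dropped)."""
    r, sigma = np.exp(theta)              # log scale keeps both positive
    mu = expected_abundance(mu0, x, y.shape[0] - 1, r, sigma)
    resid = y[1:] - mu[1:]                # time 0 is treated as known
    m = resid.size
    return 0.5 * m * np.log(np.sum(resid ** 2) / m)

# Simulated "survey" data from known parameters, for illustration only.
rng = np.random.default_rng(1)
x = np.linspace(-10.0, 10.0, 101)
mu0 = np.where(np.abs(x) < 1.05, 10.0, 0.0)
true_mu = expected_abundance(mu0, x, steps=5, r=2.0, sigma=1.0)
y = true_mu + rng.normal(0.0, 2.0, size=true_mu.shape)

fit = minimize(neg_log_lik, x0=np.log([1.5, 0.5]), args=(y, mu0, x),
               method="Nelder-Mead")
r_hat, sigma_hat = np.exp(fit.x)
print(f"r_hat = {r_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```

Note that this sketch assumes abundance is observed without error at every grid cell, which is precisely the limitation discussed next.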

Notwithstanding the estimation problem, I regard the Lele et al. paper as a major step forward in efforts to make theory from spatial ecology more relevant in applied settings. However, as the authors mention, their model requires abundance data from all locations in the area of interest. In practice, only a subset of the area will be surveyed, and even in the surveyed areas, many individuals will not be detected. Dealing with these issues in a spatially explicit statistical framework, such that the parameters of $f$ and $k$ can be estimated, is an exciting area of ongoing research that has been fueled by recent extensions of spatially implicit N-mixture models (see here and here).
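For readers unfamiliar with N-mixture models, the observation layer they add can be sketched with the standard binomial N-mixture likelihood: at each surveyed site, latent abundance N follows a Poisson distribution with mean lambda, and each repeated count is Binomial(N, p) with detection probability p. The unobserved N is summed out up to a cutoff n_max. In a spatial extension, lambda at a surveyed location would come from something like $\mu_t(x)$; here it is a single constant for brevity. The site and visit numbers, parameter values, and n_max are hypothetical choices for this sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import binom, poisson

def nmix_neg_log_lik(theta, y, n_max=200):
    """Binomial N-mixture negative log-likelihood.

    y has shape (sites, visits). N_i ~ Poisson(lam), y_ij | N_i ~ Binomial(N_i, p).
    theta = (log lam, logit p). N_i is marginalized over 0..n_max.
    """
    lam = np.exp(theta[0])
    p = 1.0 / (1.0 + np.exp(-theta[1]))
    n = np.arange(n_max + 1)                                # candidate abundances
    log_prior = poisson.logpmf(n, lam)                      # shape (n_max + 1,)
    # log P(counts at site i | N = n), summed over visits -> (sites, n_max + 1)
    log_obs = binom.logpmf(y[:, :, None], n[None, None, :], p).sum(axis=1)
    return -np.sum(logsumexp(log_prior[None, :] + log_obs, axis=1))

# Simulated repeated counts at surveyed sites, for illustration only.
rng = np.random.default_rng(42)
n_sites, n_visits = 200, 4
N_true = rng.poisson(5.0, size=n_sites)
y = rng.binomial(N_true[:, None], 0.5, size=(n_sites, n_visits))

fit = minimize(nmix_neg_log_lik, x0=np.array([np.log(3.0), 0.0]), args=(y,),
               method="Nelder-Mead")
lam_hat = np.exp(fit.x[0])
p_hat = 1.0 / (1.0 + np.exp(-fit.x[1]))
print(f"lambda_hat = {lam_hat:.2f}, p_hat = {p_hat:.2f}")
```

Repeated visits are what separate abundance from detection here; with a single count per site, only the product lambda times p would be identifiable.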

A related class of models that can be used to describe and predict spatial and temporal population dynamics is the class of spatiotemporal point process models. Interestingly, this seems to be a case where statistical ecologists have made more progress than mathematical ecologists, but that may be changing. More importantly, when these models are coupled with an observation model, they are extremely powerful tools for studying both population-level and individual-level processes, as discussed in our book. Computational issues remain, but hopefully new methods will continue to be developed to resolve them and make these models, and the theory behind them, more accessible to practitioners.

1. By ad-hockery, I'm referring to what Caughley (1994) called "games played with guesses" when he was speaking of the way that some ecologists were approaching population viability analysis.

