Importance of data selection and filtering in species distribution models - A case study on the Cantabrian brown bear


Species distribution models (SDMs) are powerful tools in ecology and conservation. Choosing the right environmental drivers and filtering species’ occurrence records to account for their biases are key steps to take before modeling. In this case study, we address five common problems that arise when selecting input data for presence-only SDMs, using the example of a generalist species, the endangered Cantabrian brown bear. First, we focus on the selection of environmental variables that may drive its distribution, testing whether climatic variables should be considered at a 1-km analysis grain. Second, we investigate how filtering the species’ data by (1) collection procedure, (2) time frame, (3) dispersal area, and (4) subpopulation affects the performance and outputs of the models at three spatial analysis grains (500 m, 1 km, and 5 km). Our results show that models with different input data yielded only minor differences in performance and behaved properly in terms of model validation, although coarsening the analysis grain deteriorated model performance. Still, the contribution of individual variables and the habitat suitability predictions differed among models. We show that a combination of limited data availability and poor selection of environmental variables can lead to inaccurate predictions. Specifically for the brown bear, we conclude that climatic variables should not be considered when exploring habitat suitability and that the best input data for modeling habitat suitability in the study area come from (1) observations and traces from (2) the most recent period (2006–2019), during which the population is expanding, (3) excluding cells with occurrences of dispersing bears, and (4) modeling subpopulations independently (as they show distinct habitat preferences).
In conclusion, SDMs that include all available data can serve as a useful tool for generalist species; still, we recommend that experts evaluate whether the data are suitable for the purpose of modeling and check them for possible biases. This is especially important when the results are intended for management and conservation purposes at the local level, and for species that respond to the environment at coarse analysis grains.
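The four filtering criteria and the three analysis grains described above can be illustrated with a minimal sketch. It is not the authors' workflow: the record fields, the category labels ("observation", "trace", "damage"), the subpopulation names, and the coordinates are all hypothetical, and dispersal is simplified to a per-record flag rather than an exclusion of whole grid cells. Coordinates are assumed to be in metres in a projected reference system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    x: float             # easting in metres (projected CRS assumed)
    y: float             # northing in metres
    year: int
    kind: str            # hypothetical labels: "observation", "trace", "damage"
    dispersing: bool     # hypothetical flag for records of dispersing bears
    subpopulation: str   # hypothetical labels: "west", "east"


def filter_occurrences(records, kinds, first_year, last_year,
                       subpopulation, drop_dispersing=True):
    """Keep records matching collection type, time frame, and subpopulation."""
    return [
        r for r in records
        if r.kind in kinds
        and first_year <= r.year <= last_year
        and r.subpopulation == subpopulation
        and not (drop_dispersing and r.dispersing)
    ]


def presence_cells(records, grain_m):
    """Snap records to a square grid and return the set of occupied cells."""
    return {(int(r.x // grain_m), int(r.y // grain_m)) for r in records}


# Made-up records, for illustration only.
records = [
    Occurrence(250.0, 250.0, 2010, "observation", False, "west"),
    Occurrence(700.0, 300.0, 2015, "trace", False, "west"),
    Occurrence(1200.0, 4800.0, 2018, "trace", False, "west"),
    Occurrence(300.0, 200.0, 1995, "observation", False, "west"),   # outside period
    Occurrence(9000.0, 9000.0, 2012, "observation", True, "west"),  # dispersing
    Occurrence(20000.0, 500.0, 2016, "observation", False, "east"), # other subpopulation
    Occurrence(800.0, 800.0, 2014, "damage", False, "west"),        # other record type
]

west = filter_occurrences(records, {"observation", "trace"}, 2006, 2019, "west")

# Number of presence cells at each analysis grain (500 m, 1 km, 5 km).
cells_per_grain = {g: len(presence_cells(west, g)) for g in (500, 1000, 5000)}
print(cells_per_grain)
```

In this toy example, coarsening the grain merges nearby records into fewer presence cells, which shows how the analysis grain changes the effective sample that a presence-only model receives.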

Ecosphere 13(12), e4284
Florencia Grattarola
Postdoc Researcher

Uruguayan biologist doing research in macroecology and biodiversity informatics.