AYATA Sakina-Dorothee
- LOCEAN, Sorbonne Université, Paris, France
- Biodiversity, Biogeography, Ecosystem functioning, Macroecology, Marine ecology, Species distributions, Statistical ecology, Theoretical ecology
- recommender
Recommendations: 0
Review: 1
Predicting species distributions in the open ocean with convolutional neural networks
The potential of Convolutional Neural Networks for modeling species distributions
Recommended by François Munoz based on reviews by Jean-Olivier Irisson, Sakina-Dorothee Ayata and 1 anonymous reviewer
Morand et al. (2024) designed convolutional neural networks to predict the occurrences of 38 marine animals worldwide. The environmental predictors were sea surface temperature, chlorophyll concentration, salinity and fifteen others. The time of some of the predictors was chosen to be as close as possible to the time of the observed occurrence.
This approach has previously only been applied to the analysis of the distribution of terrestrial plant species (Botella et al. 2018, Deneu et al. 2021), so the application here to very different marine ecosystems and organisms is a novelty worth highlighting and discussing.
A very interesting feature of PCI Ecology is that reviews are provided with the final manuscript and the present recommendation text.
In the case of the Morand et al. article, the reviewers provided very detailed and insightful comments that deserve to be published and read alongside the article.
The reviewers' comments question the ecological significance and implications of choosing fine temporal and spatial scales when using CNNs for species distribution modelling (SDM).
The main question debated during the review process was whether the CNN modeling approach used here can be defined as a kind of niche modeling.
Most of the organisms studied here are mobile, and the authors have taken into account precise environmental information at dates close to those of the species occurrences (for example, "Temperature and chlorophyll values were also included 15 and 5 days before the occurrences"). In doing so, they took into account the fine spatial and temporal scales of species occurrences and environmental conditions, which can be influenced by both environmental preferences and the movement behaviors of individuals. The question then arises: does this approach really represent the ecological niches of the marine organisms selected? Given that most selected organisms may have specific seasonal movement dynamics, the CNN model also learns the individual movement behaviors of organisms over seasons and years. The ecological niche is a broader concept that takes into account all the environmental conditions that enable species to persist over the course of their lives and over generations. This differs from the case of sessile land plants, which must respond to the environmental context only at the points where they occur.
This is not a shortcoming of the methodology proposed here but rather an interesting conceptual issue to be considered and discussed. Modelling the occurrence of individuals at a given time and position can characterize not only the species' niche but also the dynamics of organisms' temporal movements. As a result, the model predicts the position of individuals at a given time, while the niche should also represent the role of environmental conditions faced by individuals at other times in their lives.
A relevant perspective would then be to analyze whether and how the neural network can help disentangle the ranges of environmental conditions defining the niche from those influencing the movement dynamics of individuals.
Another interesting point is that the CNN model is used here as a multi-species classifier, meaning that it provides the ranked probability that a given observation corresponds to one of the 38 species considered in the study, depending on the environmental conditions at the location and time of the observation. In other words, the model provides the relative chance of choosing each of the 38 species at a given time and place. Imagine that you are studying only two species that have exactly the same niche. A standard SDM approach should provide a high probability of occurrence, close to 1, in localities where environmental conditions are very and equally suited to both species, while the CNN classifier would provide a value close to 0.5 for both species, meaning that we have an equal chance of choosing one or the other. Consequently, the fact that the probability given by the classifier is higher for a species at a given point than at another point does not (necessarily) mean that the first point presents better environmental conditions for that species but rather that we are more likely to choose it over one of the other species at this point than at another. In fact, the classification task also reflects whether the other 37 species are more or less likely to be found at each point. The classifier, therefore, does not provide the relative probability of occurrence of a species in space but rather a relative chance of finding it instead of one of the other 37 species at each point of space and time.
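The two-species thought experiment can be sketched numerically. The snippet below is only an illustration, not the model of Morand et al.: it assumes that the classifier normalises its outputs to sum to one over species (here with a softmax, which the text implies but does not state) and that a standard SDM scores each species independently (here with a sigmoid). The score values and variable names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Normalise scores into probabilities that sum to 1 across species."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    """Map each score independently to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores for two species with identical niches (illustrative only):
# one site that suits both very well, one site that suits neither.
good_site = np.array([4.0, 4.0])
poor_site = np.array([-4.0, -4.0])

# SDM-like output: each species is scored on its own.
sdm_good = sigmoid(good_site)
sdm_poor = sigmoid(poor_site)

# Classifier-like output: species compete for a total probability of 1.
clf_good = softmax(good_site)
clf_poor = softmax(poor_site)

print("SDM-like, good site:       ", sdm_good)
print("SDM-like, poor site:       ", sdm_poor)
print("Classifier-like, good site:", clf_good)
print("Classifier-like, poor site:", clf_poor)
```

Under these assumptions the classifier-like output is 0.5 for both species at both sites, whereas the SDM-like output is close to 1 at the good site and close to 0 at the poor one, which is why variation in classifier probabilities across space should not be read directly as variation in habitat suitability.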
It is important that an ecologist designing a multi-species classifier for species distribution modelling is well aware of this point and does not interpret the variation of probabilities for a species in space as an indication of more or less suitable habitat for that specific species. On the other hand, predicting the relative probabilities of finding species at a given point at a given time gives an indication of the dynamics of their local co-occurrence. In this respect, the CNN approach is closer to a joint species distribution model (jSDM). As Ovaskainen et al. (2017) mention, "By simultaneously drawing on the information from multiple species, these (jSDM) models allow one to seek community-level patterns in how species respond to their environment". Let's return to the two-species example we used above. The fact that the probabilities are 0.5 for both species actually suggests that both species can coexist at the same abundance at this location. In this respect, the CNN multi-species classifier offers promising prospects for the prediction of assemblages and habitats thanks to the relative importance of the most characteristic/dominant species from a species pool. The species pool comprises all classified species and must be sufficiently representative of the ecological diversity of species niches in the area.
Finally, CNN-based species distribution modelling is a powerful and promising tool for studying the distributions of multi-species assemblages as a function not only of local environmental features but also of the spatial heterogeneity of each feature around the observation point in space and time (Deneu et al. 2021). Through the convolution operations performed in the neural network, it accounts for the complex effects of environmental predictors and the roles of their spatial and temporal heterogeneity. As more and more computationally intensive tools become available, and as more and more environmental data become available at finer and finer temporal and spatial scales, the CNN approach is likely to be increasingly used to study biodiversity patterns across spatial and temporal scales.
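As a minimal sketch of how a convolution can respond to spatial heterogeneity around an observation point, consider two hypothetical 3x3 patches of a single predictor that share the same central value but differ in their surroundings. The patch values, the hand-written gradient filter and the helper `conv2d_valid` are assumptions made for illustration; in a trained CNN the filters are learned from data rather than fixed in advance.

```python
import numpy as np

def conv2d_valid(patch, kernel):
    """Slide the kernel over the patch (no padding) and sum elementwise products."""
    kh, kw = kernel.shape
    out_h = patch.shape[0] - kh + 1
    out_w = patch.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(patch[i:i + kh, j:j + kw] * kernel)
    return out

# Two hypothetical temperature patches with the same central value (18.0):
# a homogeneous water mass and a west-to-east thermal gradient.
uniform = np.full((3, 3), 18.0)
front = np.array([[14.0, 18.0, 22.0],
                  [14.0, 18.0, 22.0],
                  [14.0, 18.0, 22.0]])

# Hand-written horizontal gradient filter (illustrative, not a learned filter).
grad_x = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

resp_uniform = conv2d_valid(uniform, grad_x)[0, 0]
resp_front = conv2d_valid(front, grad_x)[0, 0]

print("Central values:", uniform[1, 1], front[1, 1])
print("Filter response, uniform patch:", resp_uniform)
print("Filter response, gradient patch:", resp_front)
```

A model that only sees the value at the observation point cannot tell these two patches apart, while the filter response differs between them, which is the kind of spatial structure the convolutional layers can exploit.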
References
Botella, C., Joly, A., Bonnet, P., Monestiez, P., and Munoz, F. (2018). Species distribution modeling based on the automated identification of citizen observations. Applications in Plant Sciences, 6(2), e1029. https://doi.org/10.1002/aps3.1029
Deneu, B., Servajean, M., Bonnet, P., Botella, C., Munoz, F., and Joly, A. (2021). Convolutional neural networks improve species distribution modelling by capturing the spatial structure of the environment. PLoS Computational Biology, 17(4), e1008856. https://doi.org/10.1371/journal.pcbi.1008856
Morand, G., Joly, A., Rouyer, T., Lorieul, T., and Barde, J. (2024). Predicting species distributions in the open ocean with convolutional neural networks. bioRxiv, ver.3 peer-reviewed and recommended by PCI Ecology. https://doi.org/10.1101/2023.08.11.551418
Ovaskainen, O., Tikhonov, G., Norberg, A., Guillaume Blanchet, F., Duan, L., Dunson, D., ... and Abrego, N. (2017). How to make more out of community data? A conceptual framework and its implementation as models and software. Ecology Letters, 20(5), 561-576. https://doi.org/10.1111/ele.12757