Citizen science contributes to SDM validation
How citizen science could improve Species Distribution Models and their independent assessment
Recommendation: posted 30 September 2020, validated 30 September 2020
Lloret, F. (2020) Citizen science contributes to SDM validation. Peer Community in Ecology, 100059. 10.24072/pci.ecology.100059
Citizen science is becoming an important component of the acquisition of scientific knowledge in the natural sciences, particularly in the inventory and monitoring of biodiversity (McKinley et al. 2017). The information generated with the collaboration of citizens is clearly important for conservation: it documents the state of populations and habitats, helps in mitigation and restoration actions, and, very importantly, contributes to involving society in conservation (Brown and Williams 2019).
An obvious advantage of these initiatives is the ability to mobilize human resources on a large territorial scale and in the medium term, which would otherwise be difficult to finance. The resulting growing body of information can then be processed with advanced computational techniques (Hochachka et al. 2012; Kelling et al. 2015), thus improving our interpretation of the distribution of species. Specifically, information obtained on a large territorial scale can be integrated into studies based on Species Distribution Models (SDMs). One of the common problems with SDMs is that they often work from species occurrences that have been opportunistically recorded, either by professionals or amateurs. A great challenge for data obtained from non-professional citizens, however, remains to ensure their standardization and quality (Kosmala et al. 2016). This requires a clear and effective design, solid volunteer training, and a high level of coordination, which turns out to be complex (Brown and Williams 2019). Finally, it is essential to validate data quality following scientifically recognized standards, since such data are often affected by errors and biases in the way the information is obtained (Bird et al. 2014). There are two basic approaches to obtaining the data needed for this validation: getting them from an external source (external validation), or setting aside a part of the database itself for this function (internal validation or cross-validation).
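The two validation approaches described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any cited study: the function names (`auc`, `k_fold_splits`, `internal_auc`, `external_auc`), the interleaved fold assignment, and the choice of AUC as the accuracy measure are all assumptions made for the example. The `fit` argument stands for any hypothetical model-fitting routine that takes training data and returns a scoring function.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Model = Callable[[float], float]                      # maps a predictor value to a suitability score
Fit = Callable[[Sequence[float], Sequence[int]], Model]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: probability that a random presence (1) scores
    higher than a random absence (0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_splits(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k interleaved folds."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def internal_auc(fit: Fit, x: Sequence[float], y: Sequence[int], k: int = 5) -> float:
    """Internal validation: mean AUC over k folds held out from the
    same (possibly biased) dataset used for training."""
    values = []
    for train, test in k_fold_splits(len(x), k):
        model = fit([x[i] for i in train], [y[i] for i in train])
        values.append(auc([model(x[i]) for i in test], [y[i] for i in test]))
    return sum(values) / len(values)


def external_auc(fit: Fit, x: Sequence[float], y: Sequence[int],
                 x_ext: Sequence[float], y_ext: Sequence[int]) -> float:
    """External validation: train on the full dataset, then score an
    independently collected presence-absence dataset."""
    model = fit(x, y)
    return auc([model(v) for v in x_ext], y_ext)
```

Because the held-out folds in `internal_auc` share the sampling biases of the training data, its estimate can be more optimistic than the one returned by `external_auc` on independent data. Each fold must contain both presences and absences for the AUC to be defined.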
Matutini et al. (2020), in their work 'How citizen science could improve Species Distribution Models and their independent assessment', show a novel application of the data generated by a citizen science initiative ('Un Dragon dans mon Jardin'): providing an external source for the validation of SDMs used to construct habitat suitability maps for nine species of amphibians in western France. Importantly, 'Un Dragon dans mon Jardin' contains standardized presence-absence data, the approach recognized as the most robust (Guisan et al. 2017). The SDMs to be validated, in turn, were based on opportunistic information obtained by citizens and professionals. The results show the usefulness of this external data source, which reduces the overestimation of model accuracy that arises when cross-validating against an internal evaluation dataset. They also show the importance of properly filtering the information obtained by citizens by determining a threshold of sampling effort.
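As a rough illustration of filtering by sampling effort, the sketch below keeps a non-detection as an absence only when a site has been visited often enough. It is not the authors' procedure: the `SiteRecord` fields, the function names, and the `min_visits` parameter are hypothetical, and the appropriate threshold would have to be determined from the data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SiteRecord:
    site_id: str      # hypothetical site identifier
    n_visits: int     # sampling effort, here simply the number of visits
    detected: bool    # True if the species was recorded at least once


def label_site(record: SiteRecord, min_visits: int) -> Optional[int]:
    """Return 1 for a presence, 0 for an absence backed by enough visits,
    and None when the effort is too low to trust a non-detection."""
    if record.detected:
        return 1
    if record.n_visits >= min_visits:
        return 0
    return None


def filter_validation_sites(records: List[SiteRecord],
                            min_visits: int) -> List[Tuple[str, int]]:
    """Keep only sites that can be labelled as presence or reliable absence."""
    kept = []
    for record in records:
        label = label_site(record, min_visits)
        if label is not None:
            kept.append((record.site_id, label))
    return kept
```

For example, with `min_visits=3`, a site visited once without detection is dropped rather than counted as an absence, while a site visited three times without detection is kept as an absence.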
The destiny of citizen science is to be integrated into the complex world of science. Supported by the rising level of education in society, it is becoming a fundamental piece of the scientific system dedicated to the study of biodiversity and its conservation. As funding for scientists specialized in the recognition of biodiversity has been cut back, we are seeing the activity of these scientists shift towards the design, coordination, training and verification of programs for the acquisition of field information by citizens. A main goal is that a substantial part of this information will eventually be integrated into the scientific system, and a rigorous verification process is a fundamental element for that purpose, as shown by the work of Matutini et al. (2020).
 Bird TJ et al. (2014) Statistical solutions for error and bias in global citizen science datasets. Biological Conservation 173: 144-154. doi: 10.1016/j.biocon.2013.07.037
 Brown ED and Williams BK (2019) The potential for citizen science to produce reliable and useful information in ecology. Conservation Biology 33: 561-569. doi: 10.1111/cobi.13223
 Guisan A, Thuiller W and Zimmermann NE (2017) Habitat Suitability and Distribution Models: With Applications in R. Cambridge University Press. doi: 10.1017/9781139028271
 Hochachka WM, Fink D, Hutchinson RA, Sheldon D, Wong WK and Kelling S (2012) Data-intensive science applied to broad-scale citizen science. Trends Ecol Evol 27: 130-137. doi: 10.1016/j.tree.2011.11.006
 Kelling S, Fink D, La Sorte FA, Johnston A, Bruns NE and Hochachka WM (2015) Taking a ‘Big Data’ approach to data quality in a citizen science project. Ambio 44(Suppl. 4): S601-S611. doi: 10.1007/s13280-015-0710-4
 Kosmala M, Wiggins A, Swanson A and Simmons B (2016) Assessing data quality in citizen science. Front Ecol Environ 14: 551–560. doi: 10.1002/fee.1436
 Matutini F, Baudry J, Pain G, Sineau M and Pithon J (2020) How citizen science could improve Species Distribution Models and their independent assessment. bioRxiv, 2020.06.02.129536, ver. 4 peer-reviewed and recommended by PCI Ecology. doi: 10.1101/2020.06.02.129536
 McKinley DC et al. (2017) Citizen science can improve conservation science, natural resource management, and environmental protection. Biological Conservation 208:15-28. doi: 10.1016/j.biocon.2016.05.015
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.
Evaluation round #2
DOI or URL of the preprint: https://www.biorxiv.org/content/10.1101/2020.06.02.129536v3
Version of the preprint: https://www.biorxiv.org/content/10.1101/2020.06.02.129536v3
Author's Reply, 30 Sep 2020
Evaluation round #1
DOI or URL of the preprint: https://www.biorxiv.org/content/10.1101/2020.06.02.129536v1
Author's Reply, 12 Sep 2020
Decision by Francisco Lloret, posted 25 Jul 2020
The paper addresses an interesting topic: the feasibility and reliability of data provided by citizen science platforms as a source of information for species distribution models. The topic is extremely novel at a time when the link between citizens and science is being strengthened, and when the natural sciences aim to obtain extensive scientific information (for instance, for conservation purposes) while keeping standards of quality. The paper is well structured and written, and attains its objectives. However, it still needs some relevant improvements. As pointed out by the referees, the manuscript needs to reinforce some strategic issues, such as a critical assessment of the weaknesses of citizen science, and to clarify its goal somewhat, since the conservation application of the contributions of the case study is not fully addressed. The reviewers are positive overall about the paper, but correctly identify several methodological clarifications that should be addressed: treatment of bias (accessibility, attractiveness, sampling effort), particularly when dealing with pseudo-absences; many details on data sources (web access, program name, institutions, ...); collection and sampling design; and criteria for setting thresholds to establish absence data, among others.
Additional requirements of the managing board:
As indicated in the 'How does it work?' section and in the code of conduct, please make sure that:
-Data are available to readers, either in the text or through an open data repository such as Zenodo (free), Dryad or some other institutional repository. Data must be reusable, thus metadata or accompanying text must carefully describe the data.
-Details on quantitative analyses (e.g., data treatment and statistical scripts in R, bioinformatic pipeline scripts, etc.) and details concerning simulations (scripts, codes) are available to readers in the text, as appendices, or through an open data repository, such as Zenodo, Dryad or some other institutional repository. The scripts or codes must be carefully described so that they can be reused.
-Details on experimental procedures are available to readers in the text or as appendices.
-Authors have no financial conflict of interest relating to the article. The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article." If appropriate, this disclosure may be completed by a sentence indicating that some of the authors are PCI recommenders: “XXX is one of the PCI XXX recommenders.”