
A handy “How to” review code for ecologists and evolutionary biologists

based on reviews by Serena Caplins and 1 anonymous reviewer
A recommendation of:

Implementing Code Review in the Scientific Workflow: Insights from Ecology and Evolutionary Biology

Submission: posted 19 May 2023, validated 22 May 2023
Recommendation: posted 10 August 2023, validated 11 August 2023
Cite this recommendation as:
Logan, C. (2023) A handy “How to” review code for ecologists and evolutionary biologists. Peer Community in Ecology, 100541. 10.24072/pci.ecology.100541

Recommendation

Ivimey-Cook et al. (2023) provide a concise and useful "How to" guide to reviewing code for researchers in the fields of ecology and evolutionary biology, where the systematic review of code is not yet standard practice during the peer review of articles. Consequently, this article is full of tips for authors on how to make their code easier to review. This handy article applies not only to ecology and evolutionary biology, but to many fields that are learning how to make code more reproducible and shareable. Taking this step toward transparency is key to improving research rigor (Brito et al. 2020) and is necessary for making research trustworthy to the public (Rosman et al. 2022).

References

Brito, J. J., Li, J., Moore, J. H., Greene, C. S., Nogoy, N. A., Garmire, L. X., & Mangul, S. (2020). Recommendations to enhance rigor and reproducibility in biomedical research. GigaScience, 9(6), giaa056. https://doi.org/10.1093/gigascience/giaa056

Ivimey-Cook, E. R., Pick, J. L., Bairos-Novak, K., Culina, A., Gould, E., Grainger, M., Marshall, B., Moreau, D., Paquet, M., Royauté, R., Sanchez-Tojar, A., Silva, I., Windecker, S. (2023). Implementing Code Review in the Scientific Workflow: Insights from Ecology and Evolutionary Biology. EcoEvoRxiv, ver 5 peer-reviewed and recommended by Peer Community In Ecology. https://doi.org/10.32942/X2CG64

Rosman, T., Bosnjak, M., Silber, H., Koßmann, J., & Heycke, T. (2022). Open science and public trust in science: Results from two studies. Public Understanding of Science, 31(8), 1046-1062. https://doi.org/10.1177/09636625221100686

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.
Funding:
This work was partially funded by the Center for Advanced Systems Understanding (CASUS), which is financed by Germany's Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament. C.H.F. and J.M.C. were supported by NSF IIBR 1915347.

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.32942/X2CG64

Version of the preprint: 4

Author's Reply, 08 Aug 2023

Decision by Corina Logan, posted 24 Jul 2023, validated 24 Jul 2023

Thank you for your wonderful “How to” article! It is a useful and concise read that should be helpful for many researchers. Two reviewers who have expertise in code sharing and/or promoting open research practices have provided very positive feedback and some helpful ideas that you might find useful to incorporate. 

If you decide to incorporate the addition of co-authorship for code reviewers, as suggested by Reviewer 1, please also reference a guideline for authorship to ensure that researchers are aware of what the code reviewers would need to do to fully earn authorship. For example, according to the ICMJE guidelines (http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html#two), authors need to have contributed to the development of the article AND the writing of the article. Therefore, a code reviewer could earn authorship if they review the code (contributing to the development of the article) AND they help with the editing of the article.

If you decide to incorporate the discussion around offering reviewers co-authorship, as suggested by Reviewer 1, please also provide ideas for how peer review processes can address the issue of it being very difficult to find enough reviewers in the first place. If the few people who accept reviews were to become co-authors because of their code reviewing work as part of the review process, then new reviewers would need to be recruited to be the reviewers of the article (because authors cannot review their own articles).

I have only a few minor comments:

- Line 109: perhaps change “and mistaking the column order” to “and producing a mistaken column order”

- Line 113: by “number” in “These errors are thought to scale with the number and complexity of code”, do you mean the number of lines? Or the number of code chunks? Or something else?

- Line 116: wow, I had no idea about identical() - what a useful tool! (a small sketch of it in use appears after this list)

- Figure 2: it’s nice that you suggest contacting the authors directly. This can save so much time in the peer review process and promotes collegial interactions

- Line 183: for some reason the URL https://github.com/pditommaso/awesome-pipeline is not working - the pdf seems to be cutting it off, which results in a 404 error

- Line 209: is Dryad free? I thought it cost money for authors to use it (which might be hidden by contracts Dryad has with publishers or universities)

- Figure 3: “Can my code be understood?” perhaps change to “Is my code understandable?”. I’m not sure what a style guide is - maybe it is in the resources you suggested for cleaning up code? Regardless, make it a bit more obvious what this piece is

- Line 276: this link is broken https://github.com/SORTEE/peer-277  code-review/issues/8

- Line 295: “not to get bogged down modifying or homogenising style” I would add “by” as in “bogged down by modifying”

- Line 338: “These benefits are substantial and could ultimately contribute to the adoption of code review during the publication process.” Adoption by whom? Journals?
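
In case it is useful to other readers, here is a small sketch of identical() in use (the objects below are made up, not from the manuscript):

```r
# identical() strictly compares two R objects, e.g. to check that a
# refactored analysis returns exactly the same result (hypothetical data).
results_original   <- data.frame(id = 1:3, estimate = c(0.10, 0.25, 0.40))
results_refactored <- data.frame(id = 1:3, estimate = c(0.10, 0.25, 0.40))

identical(results_original, results_refactored)  # TRUE only if the objects match exactly
```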

 

A couple of things I’ve learned from my own open workflows that you might find useful for the article (of course, don’t feel pressured to include these just because I mentioned them):

1) THE easiest way I find to make my code runnable by anyone anywhere is to upload the data sheet to GitHub and reference it in the R code so it will easily run from anyone's computer (see an example of the code here: https://github.com/corinalogan/grackles/blob/6c8930fcd66105b580809ef761d63b9cff0cbd83/Files/Preregistrations/g_flexmanip.Rmd#L233; a minimal sketch of the pattern also follows these two points)

2) Line 209: consider adding the following data repository to your list: Knowledge Network for Biocomplexity (https://knb.ecoinformatics.org/). It is free, university-owned, and dedicated to ecological data, and it is easily searchable because its metadata requirements are extensive (removing the need for researchers to remember all of the metadata they should be adding).
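
To make point 1 concrete, here is a minimal sketch of the pattern (the repository and file names below are made up, not the grackles example above):

```r
# Read the data straight from a raw GitHub URL so the script runs on any
# machine without a manual download step (hypothetical repository and file).
data_url <- "https://raw.githubusercontent.com/your-user/your-repo/main/data/measurements.csv"
dat <- read.csv(data_url)

head(dat)  # quick check that the data loaded as expected
```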

I look forward to reading the revision.

All my best,

Corina

Reviewed by anonymous reviewer 1, 11 Jul 2023

This commentary highlights the lack of reproducibility as a long-standing, systemic issue in ecology and evolutionary biology, where results depend heavily on statistical modeling and numerical simulations. The authors suggest a comprehensive guideline for including code review in the pre-submission, peer-review, and post-publication processes to ensure the validity and robustness of scientific conclusions. Together with other pioneers who advocate for reproducible research in psychology and computer science, the authors propose a valuable and practical framework (i.e., the 4Rs and a flowchart for peer reviewers).

As an advocate for reproducible research myself, I agree entirely with the necessity and urgency of changing the status quo in the publication process, which places too much emphasis and incentive on scientific novelty and comparatively too little on reproducibility. Although it is beyond the scope of this paper, in an ideal world scientific discoveries would be published only after stringent fact-checking and careful examination of the methods. Before such practices sweep through the entire field of biology, ecology and evolution and other computationally intensive disciplines could serve as a pioneering test ground for incorporating code review as a standard publication practice.

In light of the well-reasoned rationale and recommendations in this paper, and my own preference for reproducible research, I do not have any major "concerns" overall. Before turning to minor technical comments, I would like to share some general thoughts that the authors may be interested in discussing or developing further in the revised manuscript.

 

1. In the section Are results reproducible?
 

Although each R in the 4R guidelines is indispensable, the intrinsic demands increase from the first to the last R. A fair and feasible implementation of these principles could vary by discipline and subdiscipline. Reproducible results are, of course, of the utmost importance and should be ensured whenever possible. Yet, in some cases, fully reproducing results is impractical.

For example, evolutionary biology has relied on, and will increasingly rely on, insights obtained from high-throughput sequencing (comparative genomics, transcriptomics, and other omics). Because such data are high-volume and sometimes high-dimensional, the computations involved are expensive not only in computational time but also in the accessibility and availability of resources, including high-performance computing (HPC) clusters, numbers of CPUs, memory, and storage. If reviewers were required to reproduce results during peer review, installing and configuring a substantial fraction of bioinformatic software would not be trivial, even if resources were provided or subsidized for the reviewer.

One possible solution to this (which may be outside the scope of this paper) is to watermark (e.g., using MD5 checksums) all intermediate results during the computation and only examine the reproducibility of a "scaled-down" version of the results.
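
To make the idea concrete, here is a minimal R sketch with hypothetical file names (only one way this could be done):

```r
# Record MD5 checksums of intermediate results so reviewers can verify them
# without re-running the full pipeline (file names are hypothetical).
intermediate_files <- c("filtered_variants.vcf.gz", "alignment_stats.tsv")
checksums <- tools::md5sum(intermediate_files)
write.csv(data.frame(file = names(checksums), md5 = unname(checksums)),
          "checksums.csv", row.names = FALSE)

# Later, anyone who regenerates the files can confirm they match the archived run:
stopifnot(identical(unname(tools::md5sum(intermediate_files)),
                    read.csv("checksums.csv")$md5))
```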

 

2. In the section Is the code reliable?
 

Besides the reasons mentioned by the authors, many errors in code appear in user-defined analysis or helper functions. Toy examples, if not the unit tests used in engineering fields, should at least be included to check that those functions do what they are expected to do.
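
As an illustration, here is a minimal sketch of such a toy check using the testthat package (the helper function and values are made up):

```r
# Pair a user-defined helper with a toy test that compares its output
# against a hand-computed answer (function and numbers are hypothetical).
library(testthat)

# Hypothetical helper: convert allele counts to allele frequencies
allele_freq <- function(counts) counts / sum(counts)

test_that("allele_freq returns proportions that sum to 1", {
  freqs <- allele_freq(c(A = 30, a = 70))
  expect_equal(unname(freqs), c(0.3, 0.7))
  expect_equal(sum(freqs), 1)
})
```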

 

3. On incentives for code-review clubs and reviewers
 

As most researchers are scientists (some may be statisticians) by training, requiring them to know and implement best practices in coding could be too demanding, especially for a first author who happens to be a bench scientist working on a highly independent project. Beyond the recommendations proposed in this paper (e.g., code review clubs, discussing potential authorship for code reviewers up front), I would go further and argue that code reviewers are entitled to authorship, especially if they have contributed significantly to making the code reviewable (Fig. 3).

On the other hand, given the unfair workload that reviewers commonly face, reviewers should be offered the opportunity of authorship as one of the incentives, especially when fact-checking is time-consuming. Although this may (again) be out of the scope of this paper, it could be facilitated by journals that adopt a double-blind reviewing process. Authors' consent to granting authorship to anonymous reviewers should be sought before the manuscript is sent out to external reviewers, and editors must withhold this information from reviewers until the final acceptance decision to ensure the objectivity of the reviewing process. To mitigate the possibility that authors exploit a reviewer's fact-checking work, authors should also agree that a reviewer will be entitled to authorship if (1) the reviewer finds major discrepancies in the code that change the direction of the corresponding conclusions, (2) the reviewer provides evidence of their workflow, and (3) the authors can subsequently reproduce the issues that the reviewer reported.

As the authors point out, designated "Data Editors" have become standard practice at some journals. Designated "Data Reviewers" may be on the horizon. In such cases, authorship for data reviewers as an incentive could be discussed, and it might be preferable not to grant authorship to data reviewers if they do not assess the manuscript's scientific novelty.

 

Minor comments:

4. Consider switching the order of Fig. 2 and Fig. 3 and updating the in-text references
5. Line 183-184 https://github.com/pditommaso/awesome-184pipeline not found
6. Line 276-277 https://github.com/SORTEE/peer-277code-review/issues/8 not found
7. Line 352: add the in-text reference "(Fig. 1)" after "if code is indeed adhering to the R’s listed above"
8. Line 358: missing full citations for Stodden 2011 and Light et al. 2014
9. Fig 1: "Code must be error-free": the language is vague, and I suggest revising it to "Is the code doing what it is supposed to do?"
 

Reviewed by Serena Caplins, 07 Jul 2023

The manuscript describes a process by which code-review in the field of ecology and evolution could take place. The authors suggest following the 4 R's (Reported, Run, Reliable, Reproducible). 

Throughout the paper the authors provide some suggestions for reproducibility and effective management of package versions. A tool not yet mentioned that may be useful is the R package "packrat", which stores packages at the specific versions at which they were installed, can reload them across different machines, and can be shared across users. I recommend adding this type of package to the manuscript, as this (version control/reproducibility) is exactly what it was built for. It would be nice to see a packrat snapshot submitted along with code to fully reproduce an analysis.
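
For concreteness, here is a brief sketch of the basic packrat workflow as I understand it (not taken from the manuscript):

```r
# Turn the project into a packrat project, record package versions, and let
# collaborators restore the same versions on their own machines.
install.packages("packrat")

packrat::init(".")    # creates a private, project-specific package library
# ...install and use packages for the analysis as usual...
packrat::snapshot()   # records the exact versions in packrat/packrat.lock

# A reviewer who receives the project directory can then run:
packrat::restore()    # reinstalls the recorded package versions locally
```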

Another tool that can greatly aid reproducibility is containers and/or the containerization of a research project. Here is a paper describing their use in data science: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316
And more information on the reproducibility of containers here: https://carpentries-incubator.github.io/docker-introduction/reproduciblity/index.html

Beyond that I found the paper to be well written and well structured and feel it will be a helpful guide for the community. Congratulations to the authors and thank you for putting this together!

I would love to see journals take code review seriously and hire someone to review code alongside the work performed by the editors and reviewers. Perhaps some of the society journals should take the lead here?
