A stronger statistical test of stream restoration experiments
On the efficacy of restoration in stream networks: comments, critiques, and prospective recommendations
The metacommunity framework acknowledges that local sites are connected to other sites through dispersal, and that these connectivity patterns can influence local dynamics [1]. This framework is slowly moving from guiding fundamental research to being actively applied, for instance in a conservation context (e.g. [2]). Swan and Brown [3,4] analyzed the results of a suite of experimental manipulations in headwater and mainstem streams on invertebrate community structure in the context of the metacommunity concept. This was an important contribution to conservation ecology.
However, David Murray-Stoker [5] was not satisfied with their statistical analyses, and recreated and, more importantly, improved their original analyses in this peer-reviewed article. The new analyses are based on a combination of more consistent site selection, checks of the model assumptions, different estimation procedures, and a stronger focus on effect sizes rather than statistical significance. This peer-reviewed article is thus a perfect example of the advantages of open research: the original authors made available both the data and their R script files and first updated the analyses and results themselves, which was followed by more in-depth analyses of the original data and question.
The peer review itself went through a very in-depth process, with several rounds of questions and feedback that addressed the statistical analyses, the interpretation of the results, and the conclusions. It also, however, addressed something that is often harder to provide feedback on: the tone of the argument. I hope that scientists interested in these issues will read not only the final manuscript, but also the different steps of the peer review process. These are very informative, I think, and provide a more complete picture, mainly of the reasons for certain decisions.
Not only does this provide the reader interested in stream conservation with the opportunity to make up their own mind on the appropriateness of these decisions, but it could potentially lead to more analyses of this important data set. For instance, maybe a formal meta-analysis that starts with the effect sizes of all the original studies might bring some new insights into this question?
[1] Leibold, M. A., Holyoak, M., Mouquet, N. et al. (2004). The metacommunity concept: a framework for multi‐scale community ecology. Ecology Letters, 7(7), 601-613. doi: 10.1111/j.1461-0248.2004.00608.x
[2] Heino, J. (2013). The importance of metacommunity ecology for environmental assessment research in the freshwater realm. Biological Reviews, 88(1), 166-178. doi: 10.1111/j.1469-185X.2012.00244.x
[3] Swan, C. M., and Brown, B. L. (2017). Metacommunity theory meets restoration: isolation may mediate how ecological communities respond to stream restoration. Ecological Applications, 27(7), 2209-2219. doi: 10.1002/eap.1602
[4] Swan, C. M., and Brown, B. L. (2018). Erratum for: Metacommunity theory meets restoration: isolation may mediate how ecological communities respond to stream restoration. Ecological Applications, 28, 1370-1371. doi: 10.1002/eap.1738
[5] Murray-Stoker, D. (2020). On the efficacy of restoration in stream networks: comments, critiques, and prospective recommendations. bioRxiv, 611939, ver. 7 peer-reviewed and recommended by PCI Ecology. doi: 10.1101/611939
Karl Cottenie (2020) A stronger statistical test of stream restoration experiments. Peer Community in Ecology, 100052. 10.24072/pci.ecology.100052
Revision round #4, 2020-04-02
Decision round #4
It is obvious from the response from the author that my marble analogy did not clarify the statistical issue associated with the interpretation of the results. That is why I was glad to see the table in the response, because I think that will help me to explain the issue more clearly. If I understand the table correctly, the Revised Sites portion shows the streams that were selected for the new analyses. If that is correct, then
1) it clearly shows that the 2 treatments all replicates have in common are Bank Stabilization and In-Channel Manipulation. This thus means that the inference can only be applied to other streams that have received similar treatments. It does not matter if Riparian Reforestation is present in a lot of them, because that treatment was not part of the selection regime. All replicates in an experiment should be as similar as possible to each other, except for the factors of interest (in this case whether they are Headwaters or Mainstems, and whether they received the treatments Bank Stabilization/In-Channel Manipulation or not). They are not similar in the Riparian Reforestation category.
2) the table also shows that Riparian Reforestation is confounded with the Headwaters and Mainstems category, since Reforestation is present in all Mainstem streams and, where it is absent, it is absent only in the Headwaters category. So any difference (or lack thereof) between Headwaters and Mainstems could potentially be caused by the lack of Reforestation in the Headwaters streams.
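The confounding described in point 2 can be made concrete with a small cross-tabulation. The sketch below is only an illustration: the site list is invented and is NOT the table from the author's response; it merely mimics the pattern in which every mainstem site was reforested while only some headwater sites were.

```python
from collections import Counter

# Hypothetical site table (NOT the real data): stream type and whether
# riparian reforestation was applied. All sites are assumed to have
# received bank stabilization and in-channel manipulation.
sites = [
    ("headwater", True), ("headwater", False), ("headwater", False),
    ("headwater", True), ("mainstem", True), ("mainstem", True),
    ("mainstem", True), ("mainstem", True),
]

def reforested_share(rows, stream_type):
    """Proportion of sites of one stream type that were reforested."""
    flags = [reforested for kind, reforested in rows if kind == stream_type]
    return sum(flags) / len(flags)

# Count sites in each (stream type, reforested) cell; missing cells count as 0.
table = Counter(sites)
for kind in ("headwater", "mainstem"):
    print(kind,
          "| reforested:", table[(kind, True)],
          "| not reforested:", table[(kind, False)],
          "| share reforested:", reforested_share(sites, kind))
```

With an empty "mainstem, not reforested" cell, a headwater versus mainstem contrast cannot be separated from a partly-reforested versus fully-reforested contrast, which is the limitation that should be discussed.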
I recommend that this table should be included in the manuscript, and the author should also discuss these limitations associated with their approach in re-analyzing the data. I don't think that this recommendation diminishes the value or novelty or impact of the manuscript. I just think that this is necessary to be clear and explicit about the correct inferences associated with the analyses.
Revision round #3, 2020-03-08
Decision round #3
The author addressed most of the comments satisfactorily. There is only one sticking point left, regarding the effect of site selection on the inference possible.
At the end of the introduction, the author writes: "I used this hypothesis to guide and inform site selection in my re-analysis. I required sites in the re-analysis to have received both the bank stabilization and in-channel manipulations treatments (hereafter “revised” sites), although sites receiving riparian reforestation were also included if they received both the bank stabilization and in-channel manipulations treatments."
This means that the population from which the samples are drawn, and thus the population to which the inference applies, consists of all sites that received both bank stabilization and in-channel manipulations, period. Since reforestation is not part of the site selection criteria, the author cannot make any inference about reforestation.
This is similar to a bowl of 1000 marbles that have 3 colors (red, blue, and black), and that are either small or big. If you randomly select 10 red or blue marbles (but not black ones) from the bowl, and 8 of the selected red or blue marbles are small, your inference will be about the population of red and blue marbles, not about the black marbles, or about the small marbles. Maybe the black ones are big marbles, but because you did not select these, you will not be able to test that.
I don't know what the range of restoration practices is, but out of all the possible ones, the author only selected the bank stabilization and in-channel manipulations restoration practices. That means, similarly to the marbles, that the population this analysis allows inference for is the population of bank stabilization and in-channel manipulation experiments. To call that subset "restoration", or to say that it reflects reforestation practices, is not statistically correct, I think.
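The marble analogy can also be run as a small simulation. This is a sketch under invented assumptions: the colour and size composition of the bowl is made up, with every black marble set to be big so that the selected sub-population differs from the whole bowl.

```python
import random

random.seed(1)

# Hypothetical bowl of 1000 marbles (composition invented for illustration):
# red and blue marbles are mostly small, every black marble is big.
bowl = (
    [("red", "small")] * 320 + [("red", "big")] * 80
    + [("blue", "small")] * 240 + [("blue", "big")] * 60
    + [("black", "big")] * 300
)

def share_small(marbles):
    """Proportion of small marbles in a collection."""
    return sum(size == "small" for _, size in marbles) / len(marbles)

# Selection rule: only red or blue marbles can ever be drawn.
eligible = [m for m in bowl if m[0] in ("red", "blue")]
sample = random.sample(eligible, 10)

print("sample (red/blue only):", share_small(sample))
print("red/blue population:   ", share_small(eligible))
print("whole bowl:            ", share_small(bowl))
```

The sample estimates the share of small marbles among red and blue marbles only. It carries no information about the black marbles, so it cannot be read as an estimate for the whole bowl.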
Thus, the first sentence of "Effectiveness of local restoration" that reads: "I hypothesized that restoration would have stronger effects in headwaters relative to mainstems." is incorrect. I do think that the author should replace every instance of "restoration" with "bank stabilization and in-channel manipulations restoration practices". Further in the discussion, the author can then potentially make the case that these are the most common restoration practices, or the most effective ones, or some other argument to convince the reader that the analysis applies to restoration in general, but the author will have to make it very explicit that this is speculation, that is not necessarily supported by the data analysis.
When the author writes, correctly: "the hypothesis was intended to guide criteria for site selection and reduce variation in restoration treatments among sites and not to necessarily or strictly compare the effects in-channel manipulations and riparian reforestation treatments on biodiversity in restored streams." The flip side of the reduced variation in restoration treatments among sites is that the population sampled is different, and smaller. I think that this has to be explicitly acknowledged in the manuscript, and that the correct words have to be used.
Finally, I think that the section "Statistical inconsistencies" has indeed been toned down enough. In the response to the comment to "tone it down little", the author writes: "There is an assumption that what researchers write in their manuscripts is an honest representation of the study, but that assumption was broken and that implicit trust between reader and researcher was lost." I have an almost completely opposite view of the original authors' actions: they were extremely open about what they did and how they did it. They did provide all the code and all the data. That they provided this information when asked does not diminish the fact that they did provide all of it, and without it this manuscript would not have been possible. Did they maybe make some mistakes? Yes. Could they have analysed the data better? Yes (see the resulting manuscript). But I am pretty sure that is the case for a lot of published papers, mine included, without there being any . I benefited from reading some of the blog posts from Stephen Heard on the function of the Methods section in scientific papers, especially https://scientistseessquirrel.wordpress.com/2015/02/27/reproducibility-your-methods-section-and-400-years-of-angst/, and maybe that might help the author too.
Revision round #2, 2020-02-05
Decision round #2
I mirror the comments from the two reviewers who applaud the amount of work done by the author in revising and reworking this manuscript. Reviewer 2, however, provided some additional suggestions. The first paragraph of their review provides some context, but the "Few additions" section provides 5 suggestions that should be addressed. In my mind, their suggestion for using linear regression is not necessary, since I think the author wanted to stay as close as possible to the original analysis approaches.
In addition to these changes, I also have some suggestions.
First, a statistical question/suggestion: the author writes: "with stream identity fitted as a random blocking factor in each ANOVA;" Does that mean that the restored and adjacent sample sites within a stream were paired, as they should be? I assume they were, but I would then strongly advise adding a plot of the raw richness and diversity data, similar to figures 1 and 2, showing the richness of the restored and adjacent sites from the same stream connected by a line. If all those lines go up for the headwaters and stay relatively flat for the mainstem streams, that could suggest a lack of power to detect a significant interaction effect. Because figures 1 and 2 show the results of the statistical test, this biological meaning can get lost.
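Why the pairing matters can be illustrated numerically. The sketch below uses invented richness values (not the data from the manuscript) and compares a test statistic computed on within-stream differences with one that treats restored and adjacent sites as two independent groups.

```python
import math
from statistics import mean, stdev

# Hypothetical richness values (invented): one restored and one adjacent
# site per stream, listed in the same stream order.
restored = [18, 25, 31, 22, 27, 35]
adjacent = [15, 23, 28, 20, 24, 33]

def paired_t(x, y):
    """t statistic on within-stream differences (paired design)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

def unpaired_t(x, y):
    """Welch-type t statistic that ignores the pairing."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se

print("paired t:  ", round(paired_t(restored, adjacent), 2))
print("unpaired t:", round(unpaired_t(restored, adjacent), 2))
```

In this made-up example the restored site is consistently richer than its neighbour, but streams differ a lot from each other. The paired statistic is large because the between-stream variation cancels out, while the unpaired statistic is small. A plot with lines connecting the two sites of each stream shows the same thing visually.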
I did not pick up on another issue in the first review round, but there is some ambiguity in the text: The manuscript has a sentence: " I required sites in the re-analysis to have received both the bank stabilization and in-channel manipulations treatments (hereafter “revised” sites), although sites receiving riparian reforestation were also included if they received both the bank stabilization and in-channel manipulations treatments. " The section after "although" seems not necessary, because your criteria are actually "received both bank stabilization and in-channel manipulation treatments". Your selection criteria, I think, do not allow you to say anything about reforestation? This ambiguity is reflected in the first sentence of the Discussion: "I hypothesized that stream-channel manipulations would have a more consistent effect relative to the effect of riparian reforestation between headwaters and mainstems, with stronger effects of restoration in headwaters relative to mainstems. " Again, I think the reference to reforestation should be removed from that statement.
Once I became aware of that, the next sentence: "As there were no significant effects of restoration on any of the community metrics between headwaters and mainstems,..." also became incorrect. Your inference is not about restoration, but about "stream-channel manipulations". I would thus strongly recommend replacing "restoration" with "stream-channel manipulations", and doing this consistently throughout the manuscript. This would explicitly acknowledge throughout the manuscript that the results of this re-analysis are very specific to one type of restoration, and thus explicitly identify the scope of the inference possible. I think that in light of this argument, the second paragraph in "Restoration Ecology & Experimental Design" should be changed considerably. There, too, the author uses "restoration" maybe not indiscriminately, but at least too generally. Also, I would remove the next sentence "and not to necessarily or strictly compare the effects in-channel manipulations and riparian reforestation treatments on biodiversity in restored streams." Your selection criteria indeed do not allow you to say anything about riparian reforestation: because that was not a selection criterion, the reader does not know anything about this condition and how it affects stream communities.
In "Restoration Ecology & Experimental Design" you mentioned that this re-analysis did not include "the time since restoration" as a selection criterion. Does that mean that you do not think it is important, or that, if you included it, there would not be enough degrees of freedom to perform a statistical analysis? This is an important point, I think, because the main differences between this author's approach and the original 2 papers are the inclusion criteria and some important statistical differences. Since this re-analysis did not provide criteria for time since restoration (maybe because this information is not available in any of the original data?), it does illustrate that all these selection criteria are 1) important, but also 2) subjective to a certain degree, and 3) result in trade-offs with sample size and inference. For instance, I argue above that this re-analysis precludes the use of the word restoration in this manuscript, because it is actually about stream-channel modifications. Equally important, if the original data have varying degrees of recovery time, this could very well be more important than the type of stream restoration. Given the importance of selection criteria in this manuscript, this point should be discussed in the manuscript.
I also think that the tone of the "Statistical Inconsistencies" section can still be toned down. While the first paragraph points out the differences between the text and the R code, the second paragraph is different. I would suggest that the author removes the first sentence ("Finally, and of greatest concern, is the wholesale disagreement between the reported analytical procedure and what was actually conducted when analyzing temporal variability.") and the final sentence ("Without consulting the supporting information or if no R code was provided, it would have been assumed the results presented in the erratum (Swan and Brown 2018) were derived from the analytical procedure described in the original study (Swan and Brown 2017), just with the corrected dataset; this assumption would have been incorrect.") of that paragraph. The authors did provide supporting information and R code. In fact, this whole manuscript benefited from the original authors' willingness to practice open and reproducible science.
These are in my opinion only textual changes that should not be a problem for the author to address, and I am looking forward to the next version of this manuscript that provides an example of "something that should be done more often" (reviewer 1).
Reviewed by Eric Harvey, 2020-01-15 16:46
Reviewed by Mariana Perez Rocha, 2020-01-23 23:26
Revision round #1, 2019-11-10
Decision round #1
I have now read the two reviews of the preprint “On the efficacy of restoration in stream networks: comment and critique” by Murray-Stoker. Based on their comments, I have again read the preprint, and here is my recommendation. I think, though, that my recommendation might not be very satisfying for all parties involved. I hope, though, that I provide enough justification to convince the reader that a new approach would be beneficial.
Both reviewers point out the dual objectives of this preprint: on the one hand it provides a reanalysis of existing data from a previous study based on a different opinion on study design and analysis decisions, and on the other hand a critique of the motivation or intent of these differences. I agree with both reviewers that re-analysis of existing data in papers shows the strength of the current practice of open and reproducible research. However, I also agree with one of the reviewers that the manuscript lacks some structure to easily convey the differences among the three papers in question. Their recommendation of a table with three columns, one for the original paper, one for the erratum, and one for this preprint, would be useful. Each row would be a comparison item: what studies were included, what design decisions were made, what statistical decisions were made, and what results and conclusions were reached in each of these 3 approaches. That will make it easier for the reader to compare and contrast, and then also decide where they stand on each of these decisions.
I also agree with one of the reviewers, though, that attributing negative intent to the original authors seems harsh and even unnecessary. They shared their data to make all this re-analysis even possible, so I think it is very unlikely that they intentionally made decisions that would bias their results and thus their conclusions. I think that what we are seeing here is that open science exposes more of the messiness of scientific practice with subjective decisions that have always been present, but now are out in the open and can be questioned and potentially corrected. This productive debate can lead to better or different analyses and conclusions, and it is up to the reader to decide which one is the most convincing one.
This is important for me, because after reading all the different opinions, I think that there is an inherent flaw with the current analysis method. If I understand it correctly, for most of the restored streams, there is a comparison between the restored stream and an adjacent stream. (I tried to find the original data mentioned in the preprint, but for some reason the link did not work.) If this is in essence a comparison between an experimental and control condition, and the study wants to combine results from different experiments, I think that the appropriate statistical approach should be a meta-analysis. This has several advantages. For each experiment, you will be able to directly compute an effect size (which was also advocated by one of the reviewers) for each study (in essence treatment - control, potentially scaled by sample size and/or standard deviation). This direct comparison will ensure that the reach dependency is included in the analysis but not as a random effect, and it will also reflect in the figures the true power of the statistical design. Right now when you look at the figures, this paired nature of the design is not at all represented. Secondly the meta-analytical approach would avoid one of the points raised by the author: what studies to include and which ones not to include. If you work with effect sizes in a meta-analysis, one of the assumptions being made by the different authors (whether bank stabilization and in-channel manipulations treatments are strong enough) will actually not be an assumption but a test. In the meta-analysis, you could test whether the stronger impacts result in larger effect sizes, or whether the different types of treatments (reforestation etc) have different effect sizes. Meta-analysis techniques were developed to exactly address these types of questions. 
Finally, by computing the effect size, you will avoid having to include the random stream identity, and since the effect size is a difference, not only is the interpretation more direct, but the analysis is also more likely to result in a statistical model with assumptions that are easier to meet.
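To make the meta-analytical suggestion more tangible, here is a minimal sketch of pooling per-stream effect sizes within stream types. Everything in it is an assumption for illustration: the effect sizes and their sampling variances are invented, the stream labels are hypothetical, and a simple fixed-effect (inverse-variance) pooling is used, whereas a real analysis would more likely use a random-effects model from a dedicated meta-analysis package.

```python
import math

# Hypothetical per-stream effect sizes (restored minus adjacent richness)
# with sampling variances; all numbers are invented for illustration.
effects = [
    {"stream": "H1", "type": "headwater", "d": 3.0, "var": 1.0},
    {"stream": "H2", "type": "headwater", "d": 2.0, "var": 0.5},
    {"stream": "H3", "type": "headwater", "d": 4.0, "var": 2.0},
    {"stream": "M1", "type": "mainstem", "d": 0.5, "var": 1.0},
    {"stream": "M2", "type": "mainstem", "d": -0.5, "var": 1.0},
    {"stream": "M3", "type": "mainstem", "d": 1.0, "var": 0.5},
]

def pooled_effect(rows):
    """Fixed-effect (inverse-variance weighted) mean and its standard error."""
    weights = [1.0 / r["var"] for r in rows]
    total = sum(weights)
    estimate = sum(w * r["d"] for w, r in zip(weights, rows)) / total
    return estimate, math.sqrt(1.0 / total)

# Pool separately for each level of the moderator (stream type).
for kind in ("headwater", "mainstem"):
    subset = [r for r in effects if r["type"] == kind]
    est, se = pooled_effect(subset)
    print(f"{kind}: pooled effect = {est:.2f}, SE = {se:.2f}")
```

The same structure extends to other moderators, such as treatment type or time since restoration, by changing the grouping key. Whether those moderators can be tested depends on what is recorded in the original data.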
I realize that this recommendation for an additional or different analysis is one of the reasons that started this debate and series of manuscripts. However, I hope that my arguments outlined above will convince somebody that the meta-analytical approach is superior to the analyses from the previous articles and preprint, and that this might provide a more detailed and useful analysis and associated conclusions.