Cazalis et al. assess the effectiveness of protected areas for preserving birds, using both assemblage-level and individual species-level analyses. The main contribution of this paper seems to me to be the explicit consideration of land cover type, such that protected areas are compared to unprotected areas of the same land cover type. At the individual species level, the study evaluates the different effects of protected areas on species with different habitat requirements and human tolerance, highlighting which species are likely to benefit most from protected areas. Overall I find the study interesting and useful, especially because it evaluates whether protected areas are more successful for some habitat types and some types of species; this seems like it could be of direct interest to applied managers. I also find this study interesting in the context of other recent overviews of biodiversity change that, like this study, have found little change in assemblage-level metrics but greater change in species-level metrics. I think this paper would benefit from more explicit discussion of other studies that have found little assemblage-level change, and of what that implies for using assemblage-level metrics (particularly species richness) to measure biodiversity change. I have one concern (discussed below) about the authors’ decision to exclude non-native species from analysis, and another about “winsorizing” the data in the statistical methods (also discussed below). Finally, I have listed some minor comments about word choice and sentence structure; these sentence-level issues are a slight barrier to smooth reading, but generally did not interfere with my ability to understand the article.
The introduction and/or discussion could benefit from addressing other studies that have found small or no changes in species richness over time and in response to disturbance. This study compares species richness of protected and unprotected sites, with the assumption that species richness would decline more in unprotected than in protected sites because of local extinctions and population declines in unprotected areas. This would lead to an observed pattern of lower species richness in unprotected areas. However, Supp & Ernest (2014) reviewed published studies and found that species richness did not respond strongly to disturbance. They suggested that “community-level measures are poor indicators of change...”. Supp & Ernest (2014) did find stronger species-level responses to disturbance, which is in line with the results of this study. I think it would be useful to put this study and its results into the context of broader biodiversity trends, and especially to mention other studies that, like this study, have found that differences in individual species metrics (e.g. individual species abundance changes, species turnover) are larger than differences in species richness.
If I understand correctly, this study excluded non-native species from analyses (line 210). It may be reasonable to limit the study to native species if the goal is to evaluate the effectiveness of PAs in preventing local or regional extinctions and population declines. However, excluding non-native species does not make sense if the objective is to study the effectiveness of PAs in conserving ecological communities, because communities are changed by the addition of non-native species. The abstract of this study states that the study investigated “protected areas effectiveness in conserving bird assemblages” (line 22) and this is also mentioned in the discussion (line 418). In fact, I think this study investigates protected areas’ effectiveness in conserving native bird species, even if the assemblages in which those native species live have changed due to the addition of non-native species. Excluding non-native species could be masking important changes in the ecological community.
I think it would be helpful for the authors to clarify whether this study aims to study native species conservation or entire community assemblages. If the aim is to study community assemblages (e.g. line 22), then I think all species detected should be included in the assemblage analyses. If non-native species are excluded from analyses, then I would find it clearer if the terms “species richness” and “abundance” were changed to “native species richness” and “abundance of native species” throughout the text. This will clarify exactly what aspects of the community are being studied. In particular, claims like that made in line 488 that “...we measured PAs effectiveness as the difference in abundance or richness between protected and unprotected sites” are not quite correct if non-native species have been excluded. The actual difference in species richness or abundance between sites is unknown in this study, and may differ from the difference in native species richness or native species abundance.
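To make the last point concrete, here is a minimal sketch with made-up species lists. The species labels, the `non_native` set and the `richness` helper are all hypothetical and are not taken from the manuscript. It only shows that the protected-minus-unprotected difference in native richness can differ, even in sign, from the difference in total richness.

```python
def richness(species, exclude=frozenset()):
    """Number of distinct species at a site, optionally ignoring some species."""
    return len(set(species) - set(exclude))

# Hypothetical detections at one protected and one unprotected site
protected_site = {"native_a", "native_b", "native_c"}
unprotected_site = {"native_a", "exotic_x", "exotic_y", "exotic_z"}
non_native = {"exotic_x", "exotic_y", "exotic_z"}

# Difference when non-native species are excluded (native richness only)
native_diff = richness(protected_site, non_native) - richness(unprotected_site, non_native)

# Difference when every detected species is counted (total richness)
total_diff = richness(protected_site) - richness(unprotected_site)

print(native_diff, total_diff)  # 2 -1
```

In this toy case the protected site looks richer when only native species are counted, but poorer when all species are counted, which is why I would prefer the metric to be named explicitly.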
I am concerned about the winsorizing of abundance values for species (line 217). If I understand the methods correctly, individual species abundance is analyzed using GAMs with a Poisson error distribution. Did the authors check whether the data fit distributional assumptions before or after winsorizing, e.g. by looking at QQ plots or histograms of residuals (Wood 2017)? In general, it is preferable to model the original data values using an appropriate error distribution rather than transforming or changing extreme values. If the authors still decide to winsorize the data, then they should justify this by reporting whether the winsorized data fit a Poisson error distribution better than the untransformed data did. I do not know how much the choice to winsorize affects the results, but I think it would be preferable to model these data using their true values and an appropriate error distribution, rather than replacing extreme values with the value of the chosen quantile.
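As a rough illustration of the kind of before/after check I have in mind, the sketch below caps counts at an upper quantile and compares the variance-to-mean ratio of the raw and capped counts (a ratio near 1 is what a Poisson distribution would give). This is only a crude, model-free stand-in for proper residual diagnostics on the fitted GAMs. The counts, the 0.9 quantile and the function names are my own assumptions, not the authors' data or settings.

```python
import math
import statistics

def winsorize_upper(counts, q=0.95):
    """Cap every value above the empirical q-quantile (nearest-rank) at that quantile."""
    ordered = sorted(counts)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based nearest-rank position
    cap = ordered[rank - 1]
    return [min(c, cap) for c in counts]

def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts."""
    return statistics.pvariance(counts) / statistics.mean(counts)

# Hypothetical detections of one species across ten routes, with one extreme value
counts = [0, 0, 1, 1, 1, 2, 2, 3, 3, 40]
capped = winsorize_upper(counts, q=0.9)

raw_index = dispersion_index(counts)
capped_index = dispersion_index(capped)
print(capped)
print(raw_index, capped_index)
```

With these made-up numbers the ratio moves from far above 1 to somewhat below 1, so capping changes the apparent dispersion rather than simply "fixing" it. An error distribution that allows overdispersion (e.g. negative binomial) fitted to the original counts would avoid altering the data at all.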
Dornelas, M. et al. (2014) Assemblage time series reveal biodiversity change but not systematic loss. Science 344, 296-299.
McGill, B. J. et al. (2015) Fifteen forms of biodiversity trend in the Anthropocene. Trends in Ecology & Evolution 30, 104-113.
Supp, S. R. & Ernest, S. M. (2014) Species-level and community-level responses to disturbance: a cross-community analysis. Ecology 95, 1717-1723.
Wood, S. N. (2017) Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC.
For spelling or word change suggestions, I have put the proposed new word in bold.
Lines 17-19: “which whereby...” is odd wording. Remove the word “which”?
Lines 27-28: The sentence structure is incorrect. Perhaps delete “are the one avoiding human activities” so that the sentence reads “At the species level, we found that species that avoid human activities tend to be favoured by protected areas.”
84-89: I think the comparison to population trend-based methods can be shortened. A criticism of trend-based methods is not a major development of this study. The example (starting at “Hence...” and ending at “...20 times more important”) can be removed.
96-137: These 2 paragraphs are the most informative and important part of the introduction. The choice of counter-factuals and how that influences the interpretation of measures of PA effectiveness seems to be a major theme of this study. These 2 paragraphs are a well-written, concise overview of the issues.
132: Re-cite the meta-analyses here? They were originally cited in line 98, but I ended up scrolling up through the text looking for those citations.
184: “effectiveness in selecting the most interesting sites...”
185: “effectiveness in creating ...”
202: “presence of PAs ...”
204: “starting stop of each route” is confusing. Maybe better to say “first stop of each route”.
210: I find the phrase “non-native” species more familiar than “non-indigenous species” but this is probably just a matter of convention. The authors could change it or leave it as it currently is.
216: Consider at least mentioning detectability and the difference between abundance and detections. Unless they have already corrected for detection, what the authors did here was sum detections, not abundances. Using this metric for analysis assumes that detectability was constant across routes and protected and unprotected areas. This assumption should be mentioned.
220: Include which software was used (e.g. QGIS, ArcGIS, R).
228: Include a reference for the quoted IUCN definition of a protected area.
244: Regarding “effectiveness can vary with protection level”, has the introduction already cited studies showing this? Perhaps provide a reference here.
250: “(May to June)”
262: “... not analysed because they were too scarce.”
Fig 3: The negative relationship between human-affinity and PAEfor is hard to see on the scatter plot. Drawing the regression line from the LM model reported in Table 1 might help make the relationship more visible.
390-505: Overall, the Discussion is interesting and well written.
422: The claim that forest PAs have “more forest-typical bird assemblages” would be stronger if all species (including non-native species) were included in analysis.
452: “changed” should be “changes”
507-508: This sentence seems weaker than the rest of the Discussion. It does not add anything new.
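Regarding my comment on line 216 about detectability, a minimal sketch of the issue follows. It assumes a simple observation model in which expected detections equal true abundance times a detection probability. All numbers and the `expected_detections` helper are hypothetical and are not derived from the manuscript.

```python
def expected_detections(true_abundance, detection_prob):
    """Expected number of individuals detected under a simple observation model."""
    return true_abundance * detection_prob

# Hypothetical case: identical true abundance, but birds are harder to
# detect at the protected site (e.g. denser vegetation)
protected = expected_detections(true_abundance=50, detection_prob=0.4)
unprotected = expected_detections(true_abundance=50, detection_prob=0.8)

# Ratio of summed detections, which would be read as a ratio of abundances
apparent_ratio = protected / unprotected
print(protected, unprotected, apparent_ratio)
```

Here the protected site would appear to hold half as many birds even though true abundance is the same, which is why the constant-detectability assumption should be stated.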