Combining epidemiological models with statistical inference can detect parasite interactions
Detecting within-host interactions using genotype combination prevalence data
There are several important topics in the study of infectious diseases that have not been well explored because of technical difficulties. One such topic is tackled by Alizon et al. in “Modelling coinfections to detect within-host interactions from genotype combination prevalences”. Both theory and several important examples have demonstrated that interactions among co-infecting strains can have outsized impacts on disease outcomes, transmission dynamics, and epidemiology. Unfortunately, empirical data on pathogen interactions and their outcomes are often correlational, which makes the results difficult to interpret.
The analytical framework developed by Alizon et al. infers the presence and strength of pathogen interactions through their impact on transmission dynamics. It does so with a novel application of Approximate Bayesian Computation (ABC)-regression to epidemiological data. Traditional analytical approaches identify pathogen interactions when the observed distribution of pathogens among hosts differs from ‘neutral’ expectations. However, deviations from this expectation do not only result from inter-strain interactions. They can also be caused by many other ecological processes, such as heterogeneity in host contact networks. To overcome this difficulty, Alizon et al. develop an analytical framework that incorporates explicit epidemiological models. This allows interactions among strains of human papillomaviruses (HPV) to be inferred even when other ecological processes also affect the distribution of strains among hosts. Alizon et al. also demonstrate that using more of the available data leads to more accurate inferences of the strength and direction of within-host interactions among coinfecting strains. This includes the specific combination of strains present in each host and knowledge of host connectivity (e.g., the presence of super-spreaders). The method successfully detected interactions in data generated from models with high and moderate inter-strain interaction intensities when the host population was homogeneous. It was only slightly less successful when the host population was heterogeneous (super-spreaders present). By comparison, some previously published analytical methods detected only some of the inter-strain interactions in datasets generated from models with homogeneous host populations, and host heterogeneity obscured these interactions.
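For readers unfamiliar with ABC-regression, its logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline.

- The toy simulator `simulate_summary` (hypothetical) replaces the full epidemiological model. It has a single interaction parameter k that scales the coinfection probability relative to its neutral value (k = 1).
- A single summary statistic replaces the richer set used by Alizon et al.
- Host number, genotype prevalences, prior range and acceptance fraction are arbitrary choices.

The procedure draws parameters from a prior and keeps the simulations closest to the observed statistic. It then applies a weighted local-linear regression adjustment, the step that distinguishes ABC-regression from plain rejection ABC.

```python
import random


def simulate_summary(k, rng, n_hosts=1000, p_a=0.3, p_b=0.3):
    """Hypothetical toy simulator, not the epidemiological model of the paper.

    Coinfection probability is k * p_a * p_b, so k = 1 is the neutral case.
    Returns the observed coinfection prevalence divided by its neutral expectation.
    """
    p_co = min(1.0, k * p_a * p_b)
    n_co = sum(rng.random() < p_co for _ in range(n_hosts))
    return (n_co / n_hosts) / (p_a * p_b)


def abc_regression(s_obs, rng, n_sims=2000, accept_frac=0.1, prior=(0.2, 3.0)):
    # 1. Draw k from a uniform prior and simulate a summary statistic for each draw.
    draws = []
    for _ in range(n_sims):
        k = rng.uniform(*prior)
        draws.append((k, simulate_summary(k, rng)))
    # 2. Rejection step: keep the simulations whose statistic is closest to s_obs.
    draws.sort(key=lambda ks: abs(ks[1] - s_obs))
    n_keep = max(2, int(round(accept_frac * n_sims)))
    kept = draws[:n_keep]
    h = abs(kept[-1][1] - s_obs) or 1e-12  # tolerance = largest accepted distance
    # 3. Epanechnikov kernel weights: closer simulations count more.
    w = [1 - (abs(s - s_obs) / h) ** 2 for _, s in kept]
    # 4. Weighted least-squares regression of k on (s - s_obs).
    x = [s - s_obs for _, s in kept]
    y = [k for k, _ in kept]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx
    # 5. Regression adjustment: shift each accepted k to what it would be at s = s_obs.
    adjusted = [yi - beta * xi for xi, yi in zip(x, y)]
    posterior_mean = sum(wi * ai for wi, ai in zip(w, adjusted)) / sw
    return posterior_mean, adjusted, w


if __name__ == "__main__":
    rng = random.Random(1)
    s_obs = simulate_summary(2.0, rng)  # pseudo-observed data generated with k = 2
    k_hat, _, _ = abc_regression(s_obs, rng)
    print(f"posterior mean of k: {k_hat:.2f}")
```

The regression step removes the part of the spread in accepted parameter values that is explained by the distance between simulated and observed statistics. This is why ABC-regression can tolerate a looser acceptance threshold than pure rejection.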
This manuscript makes seamless connections between basic viral biology and its epidemiological consequences by tying them together with realistic models, illustrating the fundamental utility of biological modeling. This analytical framework provides crucial tools for experimentalists and facilitates collaborations with theoreticians to better understand the epidemiological consequences of co-infections. In addition, the method is simple enough to be applied by a broad base of experimentalists to the many pathogens where co-infections are common. Thus, this paper has the potential to impact several research fields and public health practice.

Those attempting to apply this method should bear in mind the limitations acknowledged by the authors. For example, the method is not designed to detect the mechanisms of inter-strain interactions, since there is no within-host component in the models. Instead, it identifies the existence of interactions through the patterns they produce, while ruling out other processes that could generate the same patterns. This approach is likely to be most accurate under two conditions. The first is that strain identification within hosts is precise and unbiased. This is unlikely in many systems where only symptomatic cases are sampled or where strain detection is insufficiently sensitive. The second is that host contact networks can be reasonably estimated. Importantly, accurate parameter estimates require a priori knowledge of the set of possible epidemiological models. This knowledge may be available for several prominent pathogens but not for many other pathogens and symbionts. We look forward to future extensions of this framework in which this restriction is relaxed.

Alizon et al. have provided a framework that will facilitate theoretical and empirical work on the impact of coinfections on infectious disease and should shape future public health data collection standards.
Alizon, S., Murall, C. L., Saulnier, E., & Sofonea, M. T. (2018). Detecting within-host interactions using genotype combination prevalence data. bioRxiv 256586, ver. 3, peer-reviewed and recommended by PCI Ecology. doi: 10.1101/256586
Brisson, D. (2018). Combining epidemiological models with statistical inference can detect parasite interactions. Peer Community in Ecology, 100006. doi: 10.24072/pci.ecology.100006
Evaluation round #1, 27 Jul 2018
DOI or URL of the preprint: 10.1101/256586
Version of the preprint: 1
Decision by Dustin Brisson
Editor type comments
Two experts and I have reviewed the preprint entitled “Modelling coinfections to detect within-host interactions from genotype combination prevalences”, and we all came to similar conclusions. First, all of us appreciate the topic and think that the work itself has value. We noted several areas where additional analyses would improve the impact, although most of these were minor. The main area the authors should focus on to improve the manuscript is the presentation. All of the reviewers felt that much of the work was under-presented (not enough detail), ambiguously described, and in several places confusing. Note that none of the reviewers identified all, or even most, of the passages where the presentation needs revision. Instead, each of us gave examples of the kinds of presentation that caused confusion. Overall, we found this an important piece of work and hope to see an improved version in the near future. Below you will find my personal review, for what it is worth. Dustin
There are several important academic areas in the study of infectious diseases that have not been well explored, often because of technical difficulties. The topic of the current work, “Modelling coinfections to detect within-host interactions from genotype combination prevalences,” is certainly one of these. Both theory and several important examples have demonstrated that co-infection dynamics can have outsized impacts on epidemiology. Unfortunately, the outcomes of pathogen interactions are difficult to decipher from correlational data. The current work aims to identify the impacts of pathogen interactions on transmission dynamics using epidemiological data. This is important and would be a major advance.

The primary issue with the current manuscript is the presentation, which I found ambiguous in some places and confusing in others. A potential issue is that you have assumed that I am smarter than I am. However, readers like me will be the primary audience of papers that introduce analytical approaches for empirical data. I would suggest more hand-holding in the presentation. Below I try to provide some examples to help guide the resubmission. At the end, I also point out some areas where additional analyses could be useful.
First, the Methods section must come before the Results and Discussion, as one must read the methods to make sense of the results. For example, parameter set #3 is referred to in the Results without any description of what it is or why it may be important. Additionally, some definitions needed to understand the results appear only in the Methods (i.e., “target runs” and “competition intensity”).
Second, many descriptions lack an explicit purpose, which seems to affect the structure of the manuscript. That is, it is not always clear what each section is trying to accomplish, or whether several of the sections are necessary. This is true for the paper as a whole as well. By the end it is somewhat clear that the purpose is to introduce and validate a method to infer transmission dynamics from epidemiological data in the presence of coinfections (this is my inference, anyway). However, it is difficult to infer this from the abstract or the introduction. Another potential point, or sub-point, is that your method uses information that other methods cannot, so those methods give less precise results. In any case, being more explicit throughout will help structure the manuscript and help the reader.
One of the major issues, again throughout, seems to be a lack of precision in the descriptions. Take the first paragraph of the Discussion as an example (“This is due to the fact that when sharing a host, parasites can interact in various ways. The goal of this study was to determine to what extent the prevalence of parasite combinations can inform us on such interactions.”). It is hard to get a solid footing on what the interactions are, what is meant by parasite combinations, and what information you are looking for. Similarly, better descriptions of the figures in the text are essential, especially regarding what the reader is supposed to learn from them. For example, Fig 3 is intense, but its take-home message is never explicitly stated. I am also not clear on what the numbers in the circles of Fig 2A are, or what is meant by “different prevalences”. The methods are overall pretty clear, but more hand-holding would be helpful. For example, the summary statistics used seem important. Spending more time describing them, and what we learn from the different types, would be useful. In addition, as the ABC is the centerpiece, a fuller description is probably warranted.
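To illustrate the kind of distinction between summary statistics that would help readers, the sketch below computes three types of statistics from genotyping data:

- marginal genotype prevalences;
- the distribution of the number of genotypes per host;
- the prevalence of each exact genotype combination.

It assumes, hypothetically, that the data come as one set of detected genotypes per sampled host. The function name and data layout are illustrative and are not taken from the manuscript.

```python
from collections import Counter


def summary_statistics(hosts, genotypes):
    """Hypothetical helper. hosts: one set of detected genotypes per sampled host
    (an empty set means the host is uninfected)."""
    n = len(hosts)
    # Marginal prevalence: fraction of hosts carrying each genotype, whatever else they carry.
    prevalence = {g: sum(g in h for h in hosts) / n for g in genotypes}
    # Multiplicity: fraction of hosts carrying 0, 1, 2, ... genotypes.
    multiplicity = Counter(len(h) for h in hosts)
    richness = {m: c / n for m, c in sorted(multiplicity.items())}
    # Combination prevalence: fraction of hosts carrying exactly each genotype combination.
    combos = Counter(frozenset(h) for h in hosts)
    combination = {tuple(sorted(c)): x / n for c, x in combos.items()}
    return prevalence, richness, combination


if __name__ == "__main__":
    sample = [set(), {"A"}, {"A", "B"}, {"B"}, {"A", "B"}, set(), {"A"}, {"A", "B", "C"}]
    for stats in summary_statistics(sample, ["A", "B", "C"]):
        print(stats)
```

Two host populations can share identical marginal prevalences and still differ in their combination prevalences. Combination-based statistics capture exactly this extra information.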
Some comments on the science itself.
The ABC is used to infer the parameters of the model, but what happens if you assume the wrong underlying model? Does the method then perform substantially worse than the other, heuristic methods that make fewer assumptions? I would like to see what happens if your assumptions about the epidemiology are incorrect.
Can you explain how the assumption “we assumed that interactions between HPV types take place through the recovery rates” (line 303) affects either the model or the biological inference? I think you may mean that coinfection affects clearance rates, but I still cannot quite figure out how this impacts the interpretation.
It is not clear what happens to the clearance rates when three strains are present. That is, is the recovery rate multiplied by (1+k)^2, or is it still multiplied by 1+k?
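The two readings of this question can be written explicitly. Here γ (the clearance rate of a strain in a singly infected host) and n (the number of strains in the host) are symbols assumed for illustration. Only the factor 1 + k comes from the comment above.

$$
\text{(i) constant: } \gamma_n = \gamma\,(1+k) \ \text{ for all } n \ge 2,
\qquad
\text{(ii) compounding: } \gamma_n = \gamma\,(1+k)^{\,n-1}
$$

Both readings coincide for double infections (n = 2). From triple infections onward they diverge: under (ii) the three-strain rate is γ(1+k)^2. Stating which form is used would remove the ambiguity.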
I don’t understand why the parameter a in Eq. (2) denotes assortment between host types rather than within host types. What is the upper-case delta in the updated master equation (with two host classes)?
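As a point of reference, here is one common convention for mixing among host classes. It is preferential mixing, assumed here and not necessarily the authors' formulation. It sets the fraction of contacts that a class-i host makes with class-j hosts to

$$
\rho_{ij} = a\,\delta_{ij} + (1-a)\,\frac{c_j N_j}{\sum_l c_l N_l}.
$$

Here δ_ij is the Kronecker delta, and N_j and c_j are the size and contact rate of class j. Under this convention, a measures assortment within host types: a = 1 confines contacts to a host's own class, and a = 0 gives proportionate mixing. The function `mixing_matrix` below is a hypothetical illustration.

```python
def mixing_matrix(a, sizes, contact_rates):
    """Hypothetical preferential-mixing matrix (a common textbook convention).

    rho[i][j] is the fraction of contacts of class-i hosts made with class-j hosts:
    a * delta_ij + (1 - a) * c_j N_j / sum_l c_l N_l.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    total = sum(c * n for c, n in zip(contact_rates, sizes))
    # Proportionate part: each class's share of all contacts in the population.
    share = [c * n / total for c, n in zip(contact_rates, sizes)]
    m = len(sizes)
    return [[(a if i == j else 0.0) + (1 - a) * share[j] for j in range(m)]
            for i in range(m)]


if __name__ == "__main__":
    # Example: 90% ordinary hosts (contact rate 1) and 10% super-spreaders (rate 10).
    for a in (0.0, 0.5, 1.0):
        print(a, mixing_matrix(a, [0.9, 0.1], [1.0, 10.0]))
```

The quantity c_i N_i ρ_ij is symmetric in i and j. So this construction keeps the number of contacts between any two classes balanced for every value of a between 0 and 1.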
Some other things to consider
It is not clear to me why the paper is centered on HPV. The method applies to many potential pathogens, and you did not actually use any HPV data.
The sentence describing Fig. 6D is confusing.
The definition of a significant association between parasites should be stated explicitly in the Results section for each test. Currently, this passage is a bit confusing: “Depending on whether k is greater or lower than 1, we expect host classes containing genotypes from the second group to be under- or over-represented respectively”. Why is there no positive correlation between interaction intensity and the probability that the test is significant for the combination network?
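One way to make the definition of a significant association explicit is to state the test and the threshold used. The sketch below uses a Pearson chi-square test of independence (1 degree of freedom) between carrying two genotypes, with a significance level of alpha = 0.05. Both choices are assumptions for illustration and need not match the tests in the manuscript. The sketch also reports whether double infection is over- or under-represented relative to the independence expectation. This mirrors the under/over-representation language in the quoted sentence.

```python
import math


def association_test(hosts, g1, g2, alpha=0.05):
    """Pearson chi-square test (1 df) for independence of genotypes g1 and g2.

    hosts: one set of detected genotypes per sampled host (hypothetical layout).
    Returns (chi2, p_value, significant, direction).
    """
    n = len(hosts)
    both = sum(g1 in h and g2 in h for h in hosts)
    only1 = sum(g1 in h and g2 not in h for h in hosts)
    only2 = sum(g2 in h and g1 not in h for h in hosts)
    neither = n - both - only1 - only2
    observed = [[both, only1], [only2, neither]]
    row_totals = [both + only1, only2 + neither]  # carries g1 / does not
    col_totals = [both + only2, only1 + neither]  # carries g2 / does not
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("each genotype must be both present and absent in the sample")
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    expected_both = row_totals[0] * col_totals[0] / n
    if both > expected_both:
        direction = "over-represented"
    elif both < expected_both:
        direction = "under-represented"
    else:
        direction = "as expected"
    return chi2, p_value, p_value < alpha, direction


if __name__ == "__main__":
    # Toy data in which double infections are more frequent than independence predicts.
    facilitated = [{"A", "B"}] * 30 + [{"A"}] * 10 + [{"B"}] * 10 + [set()] * 50
    print(association_test(facilitated, "A", "B"))
```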