Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/33086
| Title: | Evaluation of variant calling algorithms for wastewater-based epidemiology using mixed populations of SARS-CoV-2 variants in synthetic and wastewater samples |
| Authors: | Bassano, I; Ramachandran, VK; Khalifa, MS; Lilley, CJ; Brown, MR; van Aerle, R; Denise, H; Rowe, W; George, A; Cairns, E; Wierzbicki, C; Pickwell, ND; Carlile, M; Holmes, N; Payne, A; Loose, M; Burke, TA; Paterson, S; Wade, MJ; Grimsley, JMS |
| Keywords: | SARS-CoV-2;sequencing;variant callers;VOC;wastewater |
| Issue Date: | 19-Apr-2023 |
| Publisher: | Microbiology Society |
| Citation: | 'Evaluation of variant calling algorithms for wastewater-based epidemiology using mixed populations of SARS-CoV-2 variants in synthetic and wastewater samples', Microbial Genomics, 9 (4), pp. 1–16. doi: 10.1099/mgen.0.000933. |
| Abstract: | Wastewater-based epidemiology has been used extensively throughout the COVID-19 (coronavirus disease 19) pandemic to detect and monitor the spread and prevalence of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) and its variants. It has proven an excellent, complementary tool to clinical sequencing, supporting the insights gained and helping to make informed public-health decisions. Consequently, many groups globally have developed bioinformatics pipelines to analyse sequencing data from wastewater. Accurate calling of mutations is critical in this process and in the assignment of circulating variants; yet, to date, the performance of variant-calling algorithms in wastewater samples has not been investigated. To address this, we compared the performance of six variant callers (VarScan, iVar, GATK, FreeBayes, LoFreq and BCFtools), used widely in bioinformatics pipelines, on 19 synthetic samples with known ratios of three different SARS-CoV-2 variants of concern (VOCs) (Alpha, Beta and Delta), as well as 13 wastewater samples collected in London between the 15th and 18th December 2021. We used the fundamental parameters of recall (sensitivity) and precision (specificity) to confirm the presence of mutational profiles defining specific variants across the six variant callers. Our results show that BCFtools, FreeBayes and VarScan found the expected variants with higher precision and recall than GATK or iVar, although the latter identified more expected defining mutations than other callers. LoFreq gave the least reliable results due to the high number of false-positive mutations detected, resulting in lower precision. Similar results were obtained for both the synthetic and wastewater samples. |
| Description: | Impact Statement:
Since the declaration of the pandemic in March 2020 by the World Health Organization (WHO), many laboratories have made substantial contributions to understanding the spread of COVID-19 (coronavirus disease 19), the biology and transmission of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), as well as novel studies concerning environmental detection of the virus to support public-health systems around the world. In England, the Environmental Monitoring for Health Protection (EMHP) programme, part of the Department of Health and Social Care, has been monitoring SARS-CoV-2 since 2020, covering up to 75 % of the population. In this article, we have used a portion of the wealth of data collected over the last 2 years to interrogate whether wastewater samples are correctly analysed when it comes to identifying variants of the same circulating virus. While many laboratories have published excellent work in wastewater-based epidemiology (WBE), it is our understanding that a basic investigation regarding the ability of tools to efficiently identify these variants was lacking. Having worked with some of the variant callers within our group, we proposed this article, which we believe covers a good understanding of what tools can be used to analyse wastewater samples. In this work, we describe key parameters used to assess the efficacy of variant callers, namely sensitivity and specificity (also known as recall and precision), as well as comparing the frequency at which known mutations are called. There is a high significance behind these results, as we can clearly show that, using our datasets, some callers do indeed outperform others. To our knowledge, at present, no such comparison work has been done using environmental samples, and we believe this work will lay the basic foundations for future, more detailed and specific studies in WBE. 
Data Summary: All data used to generate figures have been deposited under ENA accession numbers ERA14897883 and ERA14897645. The code used to analyse the data is available on GitHub at https://github.com/mookhalifa/wastewater_variant_caller_comparison.git. Mutation patterns used to describe VOCs and VUIs were taken from https://github.com/phe-genomics/variant_definitions. |
| URI: | https://bura.brunel.ac.uk/handle/2438/33086 |
| DOI: | https://doi.org/10.1099/mgen.0.000933 |
| Other Identifiers: | ORCiD: Irene Bassano https://orcid.org/0000-0002-6948-2568; ORCiD: Mohammad S. Khalifa https://orcid.org/0000-0002-1747-7337; ORCiD: Matthew J. Wade https://orcid.org/0000-0001-9824-7121 |
| Appears in Collections: | Department of Life Sciences Research Papers |
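The recall and precision measures named in the abstract can be illustrated with a minimal sketch. It assumes each caller's output has been reduced to a set of mutation labels and is compared against the set of defining mutations expected for a variant. The function name `precision_recall` and the placeholder labels are hypothetical and are not taken from the authors' pipeline.

```python
def precision_recall(expected, called):
    """Return (precision, recall) for one caller on one sample.

    expected: defining mutations that should be present (assumed known).
    called:   mutations reported by the variant caller.
    """
    expected, called = set(expected), set(called)
    true_pos = len(expected & called)    # expected and reported
    false_pos = len(called - expected)   # reported but not expected
    false_neg = len(expected - called)   # expected but missed
    # Guard against division by zero when a set is empty.
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    return precision, recall


# Placeholder labels, not real SARS-CoV-2 mutations.
expected = {"m1", "m2", "m3", "m4"}
called = {"m1", "m2", "m3", "x9"}
precision, recall = precision_recall(expected, called)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these definitions, a caller that reports many unexpected mutations has lower precision even if it recovers most of the expected ones.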
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FullText.pdf | Copyright © 2023 The Authors This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/). This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution. | 3.12 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License