SARS-CoV-2 wastewater samples are extensively collected and studied because they allow quantitative assessment of viral load in the surrounding population. Additionally, SARS-CoV-2 strain deconvolution provides further insight into pandemic dynamics and the emergence of new strains. Typically, solving the strain deconvolution problem begins with aligning wastewater short-read sequencing data to the SARS-CoV-2 reference genome. After variants are identified and their abundances estimated, a reference database is used to assign variants to strains, select a subset of strains, and infer the relative abundance of these strains based on a mathematical model. Assembly-based methods have their own strengths but currently exist in the shadow of alignment-based methods.
wastewaterSPAdes is available for Linux.
To use wastewaterSPAdes, you’ll need to:
- Pass the --sewage flag to the coronaspades.py script.
- Provide the SARS-CoV-2 reference genome as trusted contigs.
Here’s an example:
/home/dmm2017/algorithmic-biology/assembler/coronaspades.py --sewage -1 R1.fastq -2 R2.fastq --trusted-contigs NC_045512.2.fa -o sewage_assembly_cov
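The absolute path in the example above is specific to one machine. As a minimal sketch, the same invocation can be wrapped in a small shell helper that prints the command for a given pair of read files. The function name `sewage_cmd` and the `SPADES_SCRIPT` variable are hypothetical and not part of SPAdes. Only the flags shown in the example above are used, and paths containing spaces are not handled.

```shell
# Hypothetical helper, not part of SPAdes.
# SPADES_SCRIPT is an assumed variable: set it to the location of your own
# coronaspades.py; by default the script is expected to be on PATH.
SPADES_SCRIPT="${SPADES_SCRIPT:-coronaspades.py}"

# Print the wastewaterSPAdes command for one paired-end sample.
# Arguments: forward reads, reverse reads, reference FASTA, output directory.
sewage_cmd() {
    printf '%s --sewage -1 %s -2 %s --trusted-contigs %s -o %s\n' \
        "$SPADES_SCRIPT" "$1" "$2" "$3" "$4"
}
```

Calling `sewage_cmd R1.fastq R2.fastq NC_045512.2.fa sewage_assembly_cov` prints the command so it can be inspected before it is run.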