SARS-CoV-2 wastewater samples are extensively collected and studied because they allow quantitative assessment of viral load in the surrounding population. In addition, SARS-CoV-2 strain deconvolution offers further insight into pandemic dynamics and the emergence of new strains. Typically, solving the strain deconvolution problem begins with aligning wastewater short-read sequencing data to the SARS-CoV-2 reference genome. After variants are identified and their abundances estimated, a reference database is used to assign variants to strains, select a subset of strains, and infer the relative abundances of these strains with a mathematical model. Assembly-based methods have their own strengths but currently remain in the shadow of alignment-based methods.
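
To make the last step of the alignment-based pipeline concrete, here is a minimal sketch of one possible abundance model. It is not the model used by wastewaterSPAdes or by any particular tool: it simply assumes that each observed variant frequency is the sum of the abundances of the strains carrying that variant, and fits non-negative abundances by projected gradient descent. The function name, strain names, variant names, and frequencies are all hypothetical.

```python
def estimate_strain_abundances(signatures, observed, iterations=2000):
    """Fit non-negative strain abundances to observed variant frequencies.

    signatures: dict mapping strain name -> set of variants it carries
                (stands in for the reference database; hypothetical format).
    observed:   dict mapping variant -> estimated allele frequency in the sample.
    Returns a dict mapping strain name -> relative abundance (sums to 1).
    """
    strains = sorted(signatures)
    variants = sorted(set(observed) | set().union(*signatures.values()))

    # Design matrix: A[i][j] = 1 if strain j carries variant i, else 0.
    A = [[1.0 if v in signatures[s] else 0.0 for s in strains] for v in variants]
    b = [observed.get(v, 0.0) for v in variants]

    # Step size 1 / ||A||_F^2 never exceeds 1 / lambda_max(A^T A),
    # which is enough for gradient descent on least squares to converge.
    step = 1.0 / max(sum(a * a for row in A for a in row), 1.0)

    x = [0.0] * len(strains)
    for _ in range(iterations):
        residual = [sum(a * xj for a, xj in zip(row, x)) - bi
                    for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(variants)))
                for j in range(len(strains))]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xj - step * g) for xj, g in zip(x, grad)]

    total = sum(x)
    return {s: (xj / total if total > 0 else 0.0) for s, xj in zip(strains, x)}


# Toy input: two hypothetical strains sharing one variant.
signatures = {
    "strain_A": {"var1", "var2"},
    "strain_B": {"var2", "var3"},
}
observed = {"var1": 0.7, "var2": 1.0, "var3": 0.3}
print(estimate_strain_abundances(signatures, observed))
```

On this toy input the fit recovers abundances of roughly 0.7 for strain_A and 0.3 for strain_B. Real tools typically add strain selection, noise handling, and sequencing-depth weighting, none of which is modeled here.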

wastewaterSPAdes is available for Linux.

To use wastewaterSPAdes, you’ll need to:

  • Pass the --sewage flag to the coronaspades.py script.
  • Provide the SARS-CoV-2 reference genome as trusted contigs.

Here’s an example:

/home/dmm2017/algorithmic-biology/assembler/coronaspades.py --sewage -1 R1.fastq -2 R2.fastq --trusted-contigs NC_045512.2.fa -o sewage_assembly_cov
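
If you prefer to launch the run from a script, the same invocation can be wrapped in Python. This is only a sketch: the helper names are hypothetical, the script path is a placeholder for wherever coronaspades.py lives on your system, and only the options shown in the example above are used.

```python
import subprocess
from pathlib import Path


def build_sewage_command(script, reads_1, reads_2, reference, out_dir):
    """Assemble the argument list for a wastewaterSPAdes run (hypothetical helper)."""
    return [
        str(script),
        "--sewage",                          # enable the wastewater mode
        "-1", str(reads_1),                  # forward reads
        "-2", str(reads_2),                  # reverse reads
        "--trusted-contigs", str(reference), # SARS-CoV-2 reference genome
        "-o", str(out_dir),                  # output directory
    ]


def run_sewage_assembly(script, reads_1, reads_2, reference, out_dir):
    """Check that the inputs exist, then run the assembler (hypothetical helper)."""
    for path in (script, reads_1, reads_2, reference):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Missing input file: {path}")
    # Assumes the script is executable, as in the command-line example;
    # check=True raises CalledProcessError if it exits with a non-zero status.
    subprocess.run(
        build_sewage_command(script, reads_1, reads_2, reference, out_dir),
        check=True,
    )


# Build (but do not run) the command; the script path is a placeholder.
command = build_sewage_command(
    "coronaspades.py", "R1.fastq", "R2.fastq", "NC_045512.2.fa", "sewage_assembly_cov"
)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.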