SARS-CoV-2 wastewater samples are extensively collected and studied because they allow quantitative assessment of the viral load in the surrounding population. Additionally, SARS-CoV-2 strain deconvolution gives more insight into pandemic dynamics and the emergence of new strains. Usually, the solution to the strain deconvolution problem starts with aligning wastewater short-read sequencing data to the SARS-CoV-2 reference genome. After variants are called and their abundances are estimated, a reference database is used to assign variants to strains, select a subset of strains, and infer the relative abundances of these strains based on a mathematical model. Assembly-based methods have their own strengths but currently remain in the shadow of alignment-based methods.
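To make the abundance-inference step of the alignment-based approach concrete, here is a minimal sketch in Python. It assumes one possible model, non-negative least squares over a binary variant-by-strain signature matrix, solved with plain projected gradient descent. The function name `estimate_strain_abundances`, the toy matrix, and the fixed step size are illustrative assumptions; this is not the model used by wastewaterSPAdes or by any particular alignment-based tool.

```python
def estimate_strain_abundances(signatures, observed, steps=2000, lr=0.1):
    """Projected gradient descent for min ||A x - f||^2 subject to x >= 0.

    signatures[i][j] is 1 if variant i is characteristic of strain j, else 0.
    observed[i] is the estimated frequency of variant i in the sample.
    The fixed step size `lr` suits this toy problem only (assumption).
    """
    n_variants = len(signatures)
    n_strains = len(signatures[0])
    x = [1.0 / n_strains] * n_strains  # start from a uniform mixture
    for _ in range(steps):
        # Residual of the current fit: A x - f
        residual = [
            sum(signatures[i][j] * x[j] for j in range(n_strains)) - observed[i]
            for i in range(n_variants)
        ]
        # Gradient of 0.5 * ||A x - f||^2 with respect to x: A^T (A x - f)
        grad = [
            sum(signatures[i][j] * residual[i] for i in range(n_variants))
            for j in range(n_strains)
        ]
        # Gradient step, then projection onto the non-negative orthant
        x = [max(0.0, x[j] - lr * grad[j]) for j in range(n_strains)]
    total = sum(x)
    # Normalize to relative abundances when possible
    return [v / total for v in x] if total > 0 else x


# Hypothetical toy data: 3 variants, 2 strains.
toy_signatures = [[1, 0], [1, 1], [0, 1]]
toy_observed = [0.7, 1.0, 0.3]
abundances = estimate_strain_abundances(toy_signatures, toy_observed)
```

Real tools typically add strain selection and regularization on top of such a fit; those steps are omitted here.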
wastewaterSPAdes is available here:
To use wastewaterSPAdes, pass the `--sewage` flag to the `coronaspades.py` script and provide the SARS-CoV-2 reference genome as trusted contigs. For example:
```
/home/dmm2017/algorithmic-biology/assembler/coronaspades.py --sewage -1 R1.fastq -2 R2.fastq --trusted-contigs NC_045512.2.fa -o sewage_assembly_cov
```
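
When processing many samples, the invocation above can be assembled programmatically. The following is a minimal sketch in Python, using only the flags shown in the example; the helper names `build_sewage_command` and `run_sewage_assembly` are hypothetical and not part of SPAdes, and the script path is whatever location you installed `coronaspades.py` to.

```python
import subprocess
from pathlib import Path


def build_sewage_command(script, reads_1, reads_2, reference, out_dir):
    """Return the argument list mirroring the example invocation."""
    return [
        str(script),
        "--sewage",
        "-1", str(reads_1),
        "-2", str(reads_2),
        "--trusted-contigs", str(reference),
        "-o", str(out_dir),
    ]


def run_sewage_assembly(script, reads_1, reads_2, reference, out_dir):
    """Check that the inputs exist, then run the assembler.

    Raises FileNotFoundError for a missing input and
    subprocess.CalledProcessError if the assembler exits with an error.
    """
    for path in (reads_1, reads_2, reference):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    cmd = build_sewage_command(script, reads_1, reads_2, reference, out_dir)
    subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.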