We originally developed SPAdes, a de novo genome assembler, to overcome the complications associated with single-cell microbial data generated using multiple displacement amplification (MDA). SPAdes was later recognized by the scientific community as one of the best assemblers for bacterial data sets. This inspired us to extend SPAdes to support sequencing platforms besides Illumina (e.g., IonTorrent, PacBio, and Oxford Nanopore) and to develop a set of novel software tools for related tasks, including assembly of highly polymorphic genomes, plasmid assembly, and metagenome assembly.
Starting with penicillin, natural products have had an exceptional track record in pharmacology: many antibiotics, antiviral and antitumor agents, immunosuppressants, and toxins are natural products. The discovery of teixobactin in early 2015 brought natural products back to the center of attention after a long recession in antibiotic discovery efforts. The launch of the Global Natural Products Social (GNPS) Molecular Networking project, also in 2015, brought together more than a billion mass spectra of natural products generated in over a hundred laboratories around the globe. While these spectra almost certainly contain new natural products, including some of great medical value, identifying them remains a challenging computational problem. Natural products often contain non-standard amino acids and complex modifications, which greatly complicates their discovery.
The Center for Algorithmic Biotechnology, in collaboration with the Center for Computational Mass Spectrometry at UCSD and the Mohimani Lab at Carnegie Mellon University, is working on software for sequencing and dereplication of cyclic and more complex peptides. This software has been used successfully in many collaborative projects.
High-throughput metagenomic sequencing has become one of the most effective ways to study microbial communities sampled from the environment as well as from living organisms. Our group is developing metaSPAdes, a tool for de novo assembly of metagenomic samples, as well as a novel pipeline for analyzing series of metagenomic samples.
RNA-Seq is widely used for well-studied organisms such as mouse and human, for which reference-based analysis methods can be applied. However, many research projects study organisms with previously unsequenced genomes, which creates a need for a de novo transcriptome assembler. Because expression levels vary across genes and isoforms, RNA-Seq data sets are characterized by highly uneven coverage depth. Since the SPAdes assembler can already handle non-uniform coverage (typical for single-cell genomic data), we decided to create rnaSPAdes, a SPAdes-based assembler for RNA-Seq data.
In addition, we complement rnaSPAdes with rnaQUAST, a quality assessment tool for transcriptome assemblers that works both for model organisms with a reference genome and gene database and for organisms whose genome is unknown.