SMMole is a computational pipeline that annotates predicted secondary metabolite (SM) molecules with tentative biological properties, such as the producing organism and bioactivity. Researchers can use SMMole output to (i) eliminate SMs associated with producers that are inconsistent with the known sample environment, and (ii) prioritize molecules with the strongest bioactivity for experimental validation. The tool overcomes limitations of competing approaches: the PubChem chemical search interface provides no information for SMs absent from PubChem, and PASS Online offers no taxonomy predictions. Moreover, SMMole is suitable for high-throughput analysis because it supports batch processing of hundreds of SMs.
Our pipeline takes as input molecules in standard chemical formats (SMILES, MDL MOL, and SDF), converts them to SMILES, and queries PubChem for exact matches and all structurally similar compounds. Users control the similarity level by setting SMMole’s threshold on the Tanimoto coefficient. Since we focus on natural products only, the pipeline computes the NP-likeness score (Ertl et al., 2007) for all PubChem hits and filters out likely synthetic compounds. SMMole obtains taxonomy data and biological test results for the remaining molecules and summarizes them for each input SM. The tool output contains the most common producer taxonomy rank and averaged bioactivity data.
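The filtering and summarization steps described above can be illustrated with a minimal Python sketch. This is not SMMole's actual code: the Hit record, the summarize function, the default thresholds, and the toy data are all hypothetical, fingerprints are modeled as plain sets of bit indices, and the NP-likeness scores are assumed to be precomputed. No PubChem query is performed here.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Hit:
    """Hypothetical record for one PubChem hit (all fields are illustrative)."""
    fingerprint: frozenset  # indices of set bits in a structural fingerprint
    np_likeness: float      # precomputed NP-likeness score (higher = more NP-like)
    producer: str           # producer taxon reported for the compound
    activity: float         # one numeric bioactivity measurement


def tanimoto(a, b):
    """Tanimoto coefficient of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def summarize(query_fp, hits, min_tanimoto=0.8, min_np_likeness=0.0):
    """Keep similar, natural-product-like hits and summarize them.

    Both thresholds are assumed defaults, not values taken from SMMole.
    Returns None when no hit passes the filters.
    """
    kept = [
        h for h in hits
        if tanimoto(query_fp, h.fingerprint) >= min_tanimoto
        and h.np_likeness >= min_np_likeness
    ]
    if not kept:
        return None
    # Most common producer among the retained hits.
    top_producer, _ = Counter(h.producer for h in kept).most_common(1)[0]
    return {
        "producer": top_producer,
        "mean_activity": mean(h.activity for h in kept),
        "n_hits": len(kept),
    }


# Toy data, invented purely for illustration.
query = frozenset({1, 2, 3, 4})
hits = [
    Hit(frozenset({1, 2, 3, 4}), 1.2, "Streptomyces", 0.9),
    Hit(frozenset({1, 2, 3, 4, 5}), 0.7, "Streptomyces", 0.7),
    Hit(frozenset({1, 2, 3, 4, 6}), 0.4, "Aspergillus", 0.2),
    Hit(frozenset({1, 2, 3, 4}), -1.5, "unknown", 0.1),    # fails the NP-likeness filter
    Hit(frozenset({1, 9}), 2.0, "Penicillium", 0.5),       # fails the similarity filter
]
print(summarize(query, hits))
```

In this toy run three hits pass both filters, so the summary reports the producer that occurs most often among them and the mean of their activity values. Raising min_tanimoto narrows the summary toward exact matches, while lowering it admits more distant analogs.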
The tool is being developed within the RSF 20-74-00032 project.


SMMole is a work-in-progress project. We will release the first public version soon. Stay tuned!


If you have any questions/suggestions about SMMole, please write to